The difference you are seeing in this specific usage comes from frun dynamically adjusting its batch size and worker count (by default it always starts with a batch size of 1 and 1 worker). It is pretty darn good at pinning these down quickly, but with only 14k total inputs you are probably ending up with 2-3 times as many jq calls as you would by setting the batch size to 100 inputs from the start, and you may not be fully spawning all 32 workers.
If you want an apples-to-apples comparison, try running the following. This tells frun to use 100 lines per batch (-l 100) and 32 workers (-j 32). Please let me know how this one compares to the rush invocation in terms of runtime.
NOTE: when I posted this reply, using a space as a delimiter was broken. I just pushed a PR to the forkrun main branch that fixes this. If you re-download frun.bash and source it in a new bash instance, then the above space-delimited command should work as well, and it is the most direct apples-to-apples comparison to your rush command.
So I thought about this for a bit, and this actually doesn't surprise me all that much. It makes sense when you consider the following 2 things:
First, 14k items in batches of 100 is only 140 batches. 140 batches in 160 ms is not even 1000 batches per second. For reference, parallel tops out at around 500 batches per second (it is dreadfully slow), and forkrun, in its normal "passing quoted arguments via the cmdline" mode, can do about 10000 batches per second. I have no doubt rush can distribute batches far quicker than parallel, so there's a good chance that "how fast the parallelization engine can distribute work" isn't the main bottleneck for either frun or rush for this particular workload.
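To make that arithmetic concrete, here is a minimal sketch in plain shell. The function name batch_rate is made up for illustration and is not part of frun or rush; the inputs are just the numbers quoted above (14000 inputs, batches of 100, a 160 ms run).

```shell
# batch_rate INPUTS BATCH_SIZE RUNTIME_MS
# Prints "<number of batches> <batches per second>" using integer math.
# Hypothetical helper for illustration only; not part of frun or rush.
batch_rate() (
    inputs=$1
    batch_size=$2
    runtime_ms=$3
    # Round up so a partial final batch still counts as one batch.
    batches=$(( (inputs + batch_size - 1) / batch_size ))
    # Scale to a per-second rate (runtime is given in milliseconds).
    echo "$batches $(( batches * 1000 / runtime_ms ))"
)

batch_rate 14000 100 160   # prints: 140 875
```

At 875 batches per second, the dispatch rate is well under the roughly 10000 batches per second mentioned above, which is why batch distribution is unlikely to be the limiting factor here.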
Second, the way frun distributes batches is very efficient but requires setting up a substantial amount of supporting machinery. This puts (on my system) the "no-load run time" of forkrun at about 80 ms.
time { echo | frun :; }
real 0m0.078s
user 0m0.027s
sys 0m0.064s
And this 80 ms difference is pretty close to the time difference you are seeing. I'd bet the "minimum no-load time" for rush is considerably lower, perhaps a couple of ms.
forkrun is optimized for plowing through MASSIVE amounts of very fast-running inputs...it is capable of plowing through a billion (empty) inputs a second in its fastest mode. 14k inputs just isn't enough to amortize the startup of all the lock-free machinery.
I would venture to guess that if you repeat the same test but with 100x more inputs, the relative difference between frun and rush would be considerably smaller.
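As a rough illustration of that amortization argument, the sketch below compares total runtimes when two tools do the same work but pay different fixed startup costs. Everything here is an assumption except the ~80 ms frun startup measured above: the 2 ms figure for rush is only my guess from earlier, the 80 ms of "actual work" for 14k inputs is a made-up placeholder, and the work is assumed to scale linearly with input count and to be identical for both tools. The function name runtime_ratio is hypothetical.

```shell
# runtime_ratio STARTUP_A_MS STARTUP_B_MS WORK_MS
# Prints (startup_a + work) / (startup_b + work) to two decimals.
# Hypothetical helper; all inputs are assumed numbers, not measurements of rush.
runtime_ratio() (
    awk -v a="$1" -v b="$2" -v w="$3" \
        'BEGIN { printf "%.2f\n", (a + w) / (b + w) }'
)

runtime_ratio 80 2 80     # 14k inputs, assumed 80 ms of work: prints 1.95
runtime_ratio 80 2 8000   # 100x the inputs, 100x the work:    prints 1.01
```

Under these assumed numbers the fixed startup cost makes one tool look almost 2x slower on the small run, but the gap shrinks to about 1% once there is 100x more work to spread it over.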