
Well, this is an entirely different category of optimization - not program performance but model performance.




Yes, in "runtime optimization" the model is just a computation graph, so we can reuse many well-known tricks from compilation, like dead code elimination and the like.
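To make the analogy concrete, here is a minimal sketch of dead code elimination on a computation graph. The graph representation (a dict of node name to input names) is made up for illustration; it is not any particular framework's API.

```python
# Minimal sketch: dead code elimination on a computation graph.
# We keep only nodes reachable from the requested outputs.

def eliminate_dead_code(graph, outputs):
    """graph maps node name -> list of input node names."""
    live = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(graph.get(node, []))
    return {n: deps for n, deps in graph.items() if n in live}

# "dbg" feeds nothing that reaches the output, so it gets pruned.
graph = {
    "x": [], "w": [],
    "matmul": ["x", "w"],
    "relu": ["matmul"],
    "dbg": ["matmul"],   # dead branch
}
pruned = eliminate_dead_code(graph, ["relu"])
print(sorted(pruned))  # → ['matmul', 'relu', 'w', 'x']
```

The same reachability pass that compilers run on IR applies unchanged here, because the model graph is just another dataflow program.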

We are getting closer!

What other optimizations are there beyond the 4 categories that the top commenter here listed out?


For inference, assorted techniques include vectorization, register allocation, scheduling, lock elision, better algorithms, complexity improvements, better data structures, profile-guided specialization, layout/alignment changes, compression, quantization/mixed precision, fused kernels (which go beyond inlining), low-rank adapters, sparsity, speculative decoding, parallel/multi-token decoding, better sampling, prefill/decode separation, analog computation (why not), and so on.
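To pick one item from that list, here is an illustrative sketch of symmetric int8 quantization using NumPy. The function names are made up for this example and don't come from any particular library.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to int8 with one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)

# 4x smaller storage; round-trip error bounded by half a quantization step.
err = np.abs(dequantize(q, s) - w).max()
print(w.nbytes, q.nbytes, err <= s / 2)
```

This is the simplest variant; real deployments often use per-channel scales, asymmetric zero points, or calibration data, but the core idea (trade precision for memory and bandwidth) is the same.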

There is more to it; the 4 categories mentioned are not the only ones - they are not even broad categories.

If somebody likes broad categories, here is a good one: "1s and 0s" - you can compute anything you want, so there you go, a single category for everything. Is it meaningful? Not really.


Thanks!


