
I don't think a high-level representation is necessary for relatively straightforward FMA extensions (either outer products in the case of Apple AMX, or matrix products in the case of CUDA tensor cores and Intel AMX). WebGPU + tensor core support and WASM + AMX support would be simpler to implement, likely more future-proof, and wouldn't require maintaining a massive layer of abstraction.
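For concreteness, here's a hedged NumPy sketch of the two primitive shapes being contrasted: an outer-product accumulate (the Apple AMX style) versus a matrix-product accumulate (the tensor core / Intel AMX style). The shapes are illustrative, not the actual hardware tile sizes.

```python
import numpy as np

# Outer-product accumulate (Apple AMX style): C += x (outer) y
x = np.arange(4.0)
y = np.arange(4.0) + 1.0
C = np.zeros((4, 4))
C += np.outer(x, y)

# Matrix-product accumulate (CUDA tensor cores / Intel AMX style): D += A @ B
A = np.ones((4, 4))
B = np.ones((4, 4))
D = np.zeros((4, 4))
D += A @ B

# The two are interconvertible: a matmul is a sum of outer products
# over the shared dimension, A @ B == sum_k outer(A[:, k], B[k, :]).
acc = np.zeros((4, 4))
for k in range(4):
    acc += np.outer(A[:, k], B[k, :])
assert np.allclose(acc, A @ B)
```

Either primitive is just a wide FMA over a small tile, which is why exposing it directly through WASM/WebGPU is plausible without a whole compiler stack on top.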


The issue is that much of the performance of PyTorch, JAX, et al. comes from running a JIT that is tuned to the underlying hardware, and from support for high-level intrinsic operations that were either hand-tuned or have extra hardware support, especially ops that parallelize computation across multiple cores.

You'd probably end up representing these as external library function calls in WASM, but then the WASM JIT would have to be taught that these are magic functions to be treated specially. At that point you're just embedding HLO ops as library functions and embedding an HLO translator into the WASM runtime, and I'm not sure that's any better.
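A hedged sketch of that "magic import" pattern: the module declares an opaque external call, and the host runtime pattern-matches the imported name to a tuned kernel. The names here ("hlo.dot_general", "hlo.reduce_sum") and the dispatch table are illustrative assumptions, not a real WASM or XLA API.

```python
import numpy as np

# Toy host-side dispatch table. In a real runtime these entries would
# point at hand-tuned or hardware-accelerated kernels, not NumPy.
TUNED_KERNELS = {
    "hlo.dot_general": lambda a, b: a @ b,
    "hlo.reduce_sum": lambda a: a.sum(),
}

def call_import(name, *args):
    """What the host would do when the module calls an imported function:
    recognize the name and dispatch to the specialized kernel."""
    kernel = TUNED_KERNELS.get(name)
    if kernel is None:
        raise NotImplementedError(f"no lowering for {name}")
    return kernel(*args)

a = np.ones((2, 3))
b = np.ones((3, 2))
out = call_import("hlo.dot_general", a, b)
```

Note that `TUNED_KERNELS` is, in effect, a small HLO translator living inside the runtime, which is exactly the redundancy the comment is pointing at.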

By analogy: would it be better to eliminate fragment and vertex shaders and just use WASM for sending shaders to the GPU, or are the domain-specific language and its constraints beneficial to the GPU drivers?


Yes! In short:

Do we leave it to every web app to figure out how best to serve everyone, and have them bundle their own tuning optimizers into each app? Or do we bake in a higher-level abstraction that works for everyone, which the browser itself can help optimize?

There's some risk, and the browser APIs likely won't come with all the escape hatches the full tools have for manually tweaking optimizations. But the idea of getting everyone to DIY seems like a recipe for a bad fit: way too much code when you don't need it, not nearly enough tuning when you do. And there are other risks; the assurance that we just need one or maybe two ops on the web and then everything will be fine forever doesn't wash with me. If we add new ops, the old code won't use them.

And what about hardware that doesn't have any presence on the web? Lots of cheap embedded cores have a couple of TFLOPS of neural coprocessing, but neither WASM nor WebGPU can target them at the moment; those cores are much too simple for that kind of dynamic execution. Targeting that sea of weird, expansive hardware (and targeting it very well indeed) is OpenXLA's chief capability, and I can't imagine forgoing a middleman abstraction like it.



