So what advantage does implementing spiking NNs in hardware have over implementing non-spiking NNs in hardware? Usually people say "better power efficiency", but I have never seen an apples-to-apples comparison. Is a mixed-signal chip running a spiking NN actually more efficient than a mixed-signal chip running a non-spiking (traditional, GEMM-based) NN? Have such comparisons been done in the literature? If not, where is this claim coming from?
Also, why do people try to implement SNNs in hardware, when they don't work well in software? Shouldn't we first try to figure out how the brain actually does it (processes information), and only then try to build expensive specialized hardware for it?
The chip under discussion can do both (analog matrix multiplication and SNN operation). Briefly, there are a bunch of “nano”-devices that are more amenable to spike-based operation. Moreover, analog computation is hard to scale up, so the layer-wise digitization, weight loading, and communication wipe out much of the potential benefit in the case of analog matrix accelerators. Advances in recent years have made SNNs work “well enough” in software, especially considering the relatively smaller overall investment in them.
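To make the digitization/weight-loading point concrete, here is a toy back-of-envelope energy model. All per-operation energies are made-up placeholders for illustration, not measurements from any real chip; the point is only that per-layer ADC conversion and weight streaming can dominate even when the analog MACs themselves are cheap:

```python
# Toy energy model for an analog matmul accelerator.
# All per-operation energies below are hypothetical placeholders.

def layer_energy_pj(macs, mac_pj, adc_samples, adc_pj, weights_loaded, load_pj):
    """Total energy (pJ) for one layer: analog MACs plus the digital
    overheads of per-layer ADC conversion and weight (re)loading."""
    return macs * mac_pj + adc_samples * adc_pj + weights_loaded * load_pj

# Hypothetical layer: 1M MACs, 1k output activations digitized,
# 100k weights streamed onto the array.
compute = 1_000_000 * 0.01   # analog MAC assumed ~0.01 pJ each
adc     = 1_000 * 10.0       # ADC conversion assumed ~10 pJ per sample
load    = 100_000 * 1.0      # weight loading assumed ~1 pJ per weight

total = compute + adc + load
print(f"compute: {compute/total:.0%}, ADC: {adc/total:.0%}, "
      f"weight load: {load/total:.0%}")
```

With these (invented) numbers the weight loading alone eats the bulk of the energy budget, which is the scaling problem described above.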
But the BrainScaleS chip was built to run spiking ops, so even when it does analog matmul, it's not optimized to do that exclusively and end to end, right? How about comparing it to a chip that was designed specifically to perform analog matmul, for example this one: https://www.mythic-ai.com/product/m1076-analog-matrix-proces...
If we measured the forward-pass time of something like ResNet-50 on ImageNet, taking into account any accuracy degradation, and compared that to what BrainScaleS can do with the SNN equivalent of ResNet-50 - that would be interesting.
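A sketch of how such a latency comparison could be harnessed, using only stand-in workloads (the `forward` callables below are placeholders, not either chip's actual API; a real comparison would also report accuracy on ImageNet alongside latency):

```python
import time

def benchmark(forward, inputs, warmup=3, runs=20):
    """Median wall-clock latency (ms) of `forward` over `inputs`.
    `forward` stands in for either network's inference call."""
    for x in inputs[:warmup]:
        forward(x)                       # warm up caches / lazy init
    times = []
    for x in inputs[:runs]:
        t0 = time.perf_counter()
        forward(x)
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return times[len(times) // 2]        # median is robust to outliers

# Usage with dummy workloads (swap in ResNet-50 / the SNN equivalent):
dummy_inputs = list(range(30))
ann_ms = benchmark(lambda x: sum(i * i for i in range(10_000)), dummy_inputs)
snn_ms = benchmark(lambda x: sum(i * i for i in range(20_000)), dummy_inputs)
print(f"ANN: {ann_ms:.2f} ms, SNN: {snn_ms:.2f} ms")
```

Median over several runs (after warmup) is used rather than a single timing, since per-call jitter would otherwise swamp the comparison.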
Don't get me wrong, what you did there is nice (chip in the loop with SNNs), I'm just struggling a bit to understand the motivation. What does "works well enough" mean? Shouldn't it work much better than anything else to deserve custom hardware? Especially if regular matmul-based NNs work better and might actually run faster and be more power efficient (when run on state-of-the-art custom hardware)?
I mean, this would be a no-brainer :) if you told me "this is how our brain works, and we want to emulate it in hardware to speed up neuroscience experiments", but that's just not true, is it? We don't know how the brain processes information - even basic things like how information is actually encoded, or what kind of computation a neuron performs.
Or, if you don't care about the brain, it would make sense if SNN algorithms produced state-of-the-art results and everyone wanted to run them on their iPhones. Or OK, if not state-of-the-art results, at least good results with the best speed/efficiency. But if you have neither the best results nor the best hardware performance, I'm really scratching my head here...
Let's see. First of all, this is a research project, not an attempt to build a commercial product (yet). Indeed, it is not currently optimised to do analog matrix multiplication particularly well, but what prevents it from performing better is well understood and relatively easy to fix. Of course we are aware of commercial efforts, but this is a pretty long-running research project, so not all of those necessarily existed when it was set in motion. As an example, wafer-scale integration was accomplished on BrainScaleS well before Cerebras pushed for it, though clearly they have executed far better on building a commercial product around it.
Our motivation is to build large-scale accelerated neuromorphic hardware and to prove that it can be useful. It is not particularly efficient, or even feasible, to train SNNs on GPUs over large timescales, so eventually we will need on-chip learning. This paper can be seen as an intermediate step: it's useful to know that the hardware can be optimised, at the very least as a baseline for further experiments.
Applications are not our primary concern at the moment; for the most part, we believe that once we have identified the right algorithm(s), and hardware able to support their implementation, it will be possible to apply them to many problems. For ANNs, backpropagation was figured out a long time ago, yet the recent successes only started after 2010. To be clear, for SNN inference our chip is far faster and has far better latency than a GPU, and is roughly 10x faster than Intel's Loihi. Application areas for that are admittedly niche, especially as long as we can't scale to far more neurons.