Well, my take as a non-AI expert is that it was this paper (transformers) in combination with RLHF that made ChatGPT possible (with the former coming out of Google and the latter, as far as I can tell, from a bunch of places including DeepMind, OpenAI, etc).
That said, still mind-blowing how far ahead Google was (is?) on research, data, and compute, and how hard it is to actually ship novel things with that tech (vs OpenAI, which... just ships things).
Sam Altman practicing what he preached at YC for so many years. There's something wholly respectable about someone whose words closely match their actions.
A reminder that there's nothing open about OpenAI. Stable Diffusion is open. OpenAI started spewing morality nonsense once they got something they could capitalize on. From a supposed non-profit when there was no profit to be made, to a fake moral high ground, to becoming completely closed.
Hardly "practice what you preach". They may have delivered technologically, but they have completely backtracked on their openness promises.
I ran stable diffusion at home. Never could do the same with anything out of "Open"AI
I heard the AI Test Kitchen release of their LaMDA work this week is shockingly bad by comparison. Not good news for those hoping Google had been keeping a pound-for-pound competitor to OpenAI in their back pocket.
ImageNet was a breakthrough in that it was a large curated dataset, which was needed for supervised learning, but modern transformers rely on self-supervision.
I would say that BERT was an incremental improvement (as were many others). But the idea of much more efficient sequence processing was introduced in this exact paper.
IMO you should just start playing with them. Midjourney gave by far the most impressive results with images, but I also got bored fast with seeing amazing art.
ChatGPT you just have to interact with to get a feel for its strengths and weaknesses.
Midjourney provides better results out of the box; with Stable Diffusion you need to know how to use custom models and/or LoRAs to get similar or better results. Also, Midjourney does not allow you to generate NSFW content, and thus pornography (as sometimes happens with technology) has pushed SD performance even further. You can look at civitai with its huge collection of models (NSFW images are blurred, but I would not suggest opening the site at work).
"This paper presents the Transformer, a model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output. Experiments on two machine translation tasks show that the Transformer is superior in quality while being more parallelizable and requiring significantly less time to train. On the WMT 2014 English-to-German translation task, the Transformer model achieved a BLEU score of 28.4, improving over the existing best results by over 2 BLEU. On the WMT 2014 English-to-French translation task, the Transformer model achieved a BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. The Transformer was also successfully applied to English constituency parsing both with large and limited training data. The Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution."
What even is a summary? I'm beginning to wonder. Meaning itself is disintegrating. Where is knowledge? Call me a heretic, but I say: "In us, or nowhere."
That's a very good question. I think GPT-3 shows knowledge is not in the brain, or in the neural network, but in the language itself. A human without language is not very smart.
The main contribution of this paper is that the authors found a very computationally efficient way to process sequence data. That was a real breakthrough.
The previous approaches, like LSTMs, GRUs, etc., also worked well on sequence data, but were much less efficient.
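The efficiency win is visible even in a toy sketch: scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, scores every pair of positions in a single matrix multiply, whereas an LSTM/GRU has to step through the sequence one token at a time. A minimal, self-contained sketch (shapes and random inputs are illustrative, not from the paper's experiments):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: all position pairs scored in one matmul,
    # rather than sequentially as in an RNN.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq, seq) scores at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                                     # illustrative sizes
Q = rng.standard_normal((seq_len, d_k))
K = rng.standard_normal((seq_len, d_k))
V = rng.standard_normal((seq_len, d_k))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

All rows of the attention matrix are computed in parallel, which is exactly what lets transformers saturate GPUs where recurrent models can't.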
But what about the inventor of attention itself? And what about the inventors of the MLP? What about the inventor of backpropagation? And who's that guy who invented matrix multiplication?
Apologies for making a more meta-comment, but I think the post title should be the paper title, "Attention Is All You Need", as per the HN guidelines.
"...please use the original title, unless it is misleading or linkbait; don't editorialize." [0]
I'm completely alien to this stuff, so excuse me, but does attention mean that if, on average across all texts seen, "jane visits" occurs more often than "jane africa", then in the sentence "jane visits africa" the second word should pay the most "attention" to the first word?
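Not quite: the weights aren't co-occurrence counts read off a corpus. Each word gets a query and a key vector via learned projections, and the attention "visits" pays to "jane" is a softmax over the dot products of the "visits" query with every word's key in that sentence. A toy sketch with made-up numbers (the embeddings and projection matrices here are hypothetical; in a real model they are learned during training):

```python
import numpy as np

# Hypothetical 2-d word embeddings (learned in a real model).
embed = {
    "jane":   np.array([1.0, 0.2]),
    "visits": np.array([0.3, 1.0]),
    "africa": np.array([0.8, 0.1]),
}
# Hypothetical learned query/key projection matrices.
W_q = np.array([[1.0, 0.0], [0.0, 1.0]])
W_k = np.array([[0.5, 0.5], [0.5, 0.5]])

sentence = ["jane", "visits", "africa"]
q = embed["visits"] @ W_q                        # query for "visits"
keys = np.stack([embed[w] @ W_k for w in sentence])
scores = keys @ q / np.sqrt(2)                   # scaled dot products
weights = np.exp(scores - scores.max())
weights /= weights.sum()                         # softmax -> sums to 1
for word, w in zip(sentence, weights):
    print(f"{word}: {w:.2f}")
```

So the distribution is recomputed per sentence from the vectors, and corpus statistics only enter indirectly, through whatever the training process baked into those projections.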
It's part of the puzzle. You could also mention Schmidhuber who claimed that he invented attention a long time ago (didn't read his paper though, so hard to judge).
It's a pedagogic paper, not a scientific paper. People who invent things and people who teach those things so that they become mainstream are often very different kinds of people.
Also in fairness to LeCun, that sentence started with “in terms of underlying techniques”, and his words reported in the media are guaranteed to be a subset of what he said. A journalist talked to him, found an angle, and quoted to support that angle.
What comes across is someone who downplays both the effort involved with and the importance of the “boring” stuff, the tweaks and product design that tie everything together, but we actually don’t know that he thinks like that. We just know that he said some things that can be excerpted to sound that way.
I don't know, personally I don't find anything revolutionary about ChatGPT either. At best, it's a cute toy that appeals to the masses. I think it's like looking at a cloud: people see what they want to see.
Of course he would say this, as he's a direct competitor that failed to gain popularity with Galactica. The public has voted and agreed it's a breakthrough. Indeed, nothing else like it has existed before, so how is it not?
This is such a salty take by LeCun. It's an important reminder that someone can be incredibly smart and still have bad opinions.
This would be like saying software isn't novel - everything is just an extension of that first NAND/NOR gate or that it's not useful because the computer doesn't actually understand the knowledge it's storing and is therefore no better than a clay tablet.