The paper that made ChatGPT possible (arxiv.org)
173 points by nadermx on Feb 3, 2023 | hide | past | favorite | 55 comments


Well, my take as a non-AI expert is that it was this paper (transformers) in combination with RLHF that made ChatGPT possible (with the former coming out of Google and the latter, as far as I can tell, from a bunch of places including DeepMind, OpenAI, etc.).

That said, it's still mind-blowing how far ahead Google was (is?) on research, data, and compute, and how hard it is to actually ship novel things with that tech (vs OpenAI which... Just ships things).


> vs OpenAI which... Just ships things).

Sam Altman practicing what he preached at YC for so many years. There's something wholly respectable about someone whose words closely match their actions.

And haven't we all been better for it?


A reminder that there's nothing open about OpenAI. Stable Diffusion is open. OpenAI started spewing morality nonsense once they got something they could capitalize on. From supposedly non-profit when there was no profit to be made, to a fake moral high ground, to becoming completely closed.

Hardly "practice what you preach". They may have delivered technologically, but they have completely backtracked on their openness promises.

I ran Stable Diffusion at home. Never could do the same with anything out of "Open"AI.


"nothing open" doesn't fully hold up there: Whisper is a (somewhat groundbreaking) model released by OpenAI that you can run on your own hardware.


> Hardly "practice what you preach".

My comment was about Sam Altman doing what he was telling startups to do for many years at YC, i.e. ship product and iterate.


They are open about their methods and goals, not open source.


Would you mind posting some of the papers (links)? I would love to read them. Thank you.


I heard the AI Test Kitchen release of their LaMDA work this week is shockingly bad by comparison. Not good news for those hoping Google had been keeping a pound-for-pound competitor to OpenAI in their back pocket.


Doesn't Google ship transformer stuff for translation? And some use in the search natural language summary/answer box, or is that hardcoded?


I mean... Plenty of papers did? Might as well just cite the 1986 Hinton backpropagation paper or the 2012 ImageNet paper.


ImageNet was a breakthrough in that it was a large curated dataset, which was needed for supervised learning, but modern transformers rely on self-supervision.


Weird take, but OK... BERT was arguably the real breakthrough moment, where everything started working.


I would say that BERT was an incremental improvement (as were many others). But the idea of much more efficient sequence processing was introduced in this exact paper.


"Attention Is All You Need" is a classic, and the transformer is the foundation. However, there were five years of further work to get to ChatGPT.


Both ChatGPT and Stable Diffusion have passed me by because of work. Does anyone have a reading list to get up to speed in these areas?


Read the InstructGPT paper, it’s quite good: https://arxiv.org/abs/2203.02155


IMO you should just start playing with them. Midjourney gave by far the most impressive results with images, but I also got bored fast with seeing amazing art.

With ChatGPT you just have to interact with it and get a feel for its strengths and weaknesses.


Midjourney provides better results out of the box; with Stable Diffusion you need to know how to use custom models and/or LoRAs to get similar or better results. On the other hand, Midjourney does not allow you to generate NSFW content, and thus pornography (as it often does with technology) has pushed SD performance even further. You can look at civitai with its huge collection of models (NSFW images are blurred, but I would not suggest opening the site at work).


Ran the PDF through https://labs.kagi.com/ai/sum

"This paper presents the Transformer, a model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output. Experiments on two machine translation tasks show that the Transformer is superior in quality while being more parallelizable and requiring significantly less time to train. On the WMT 2014 English-to-German translation task, the Transformer model achieved a BLEU score of 28.4, improving over the existing best results by over 2 BLEU. On the WMT 2014 English-to-French translation task, the Transformer model achieved a BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. The Transformer was also successfully applied to English constituency parsing both with large and limited training data. The Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution."

How accurate is the summary?


What even is a summary? I'm beginning to wonder. Meaning itself is disintegrating. Where is knowledge? Call me a heretic, but I say "In Us or nowhere."


> Where is knowledge?

That's a very good question. I think GPT-3 shows knowledge is not in the brain, or in the neural network, but in the language itself. A human without language is not very smart.


You might find this episode of Radiolab interesting: https://radiolab.org/episodes/91725-words


I read the middle paragraph before the first/last sentence, and thought it was a great summary for what it's worth.


Transformers are much bigger than just ChatGPT!


One might even say they are more than meets the eye.


Indeed, chatbots in disguise


Care to elaborate?


Honestly, barring the whole pandemic, 2017 seems like yesterday. The progress on these models has been so fast.


I need an AI co-brain to even have the ambition of catching up


The main contribution of this paper is that the authors found a very computationally efficient way to process sequence data. That was a real breakthrough.

The previous approaches, like LSTM, GRU, etc., also worked well on sequence data, but were much less efficient.
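A minimal numpy sketch (my own toy code, not the paper's implementation) of why this is so much more efficient: self-attention relates every pair of positions with a single matmul, whereas an RNN has to walk through the sequence one token at a time.

```python
import numpy as np

def self_attention(X):
    # X: (seq_len, d) token embeddings. For simplicity Q = K = V = X;
    # a real Transformer uses learned projection matrices for each.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # (seq_len, seq_len): all pairs in one matmul
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X  # each output is a weighted mix of ALL positions

X = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, dimension 8
out = self_attention(X)
print(out.shape)  # (5, 8), computed with no sequential dependency
```

Nothing in that computation depends on position t-1 being finished before position t, which is what lets the whole thing run in parallel on a GPU.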


But what about the inventor of attention itself? And what about the inventors of the MLP? What about the inventor of backpropagation? And who's that guy who invented matrix multiplication?


Pretty sure all of those apart from matrix multiplication were invented by Turing award loser Jürgen Schmidhuber.


> And who s that guy who invented matrix multiplication?

Incidentally there's a link on the front page about this, https://news.ycombinator.com/item?id=34653462


Backpropagation - Seppo Linnainmaa

https://en.m.wikipedia.org/wiki/Seppo_Linnainmaa


Who counted his fingers first? That's the question.


hi Jürgen


Apologies for making a more meta comment, but I think the title of the post should be the paper title "Attention Is All You Need", as per the HN guidelines.

"...please use the original title, unless it is misleading or linkbait; don't editorialize." [0]

[0] https://news.ycombinator.com/newsguidelines.html


It’s crazy how we went overnight from AI winter to AI summertime.

I truly believe that the next Google will be founded this year (if it is not already here).


I'm completely alien to this stuff, so excuse me, but does attention mean that, if, on average across all texts seen, "jane visits" happens more often than "jane africa", then in the sentence "jane visits africa" the second word should pay the most "attention" to the first word?
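To make my question concrete, here's a toy numpy sketch of what I understand the mechanism to be (made-up vectors, everything hypothetical; corrections welcome):

```python
import numpy as np

words = ["jane", "visits", "africa"]
d = 4
rng = np.random.default_rng(42)
Q = rng.normal(size=(3, d))  # one query vector per word (made up here)
K = rng.normal(size=(3, d))  # one key vector per word (made up here)

scores = Q @ K.T / np.sqrt(d)  # pairwise compatibility scores, (3, 3)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1

# Row i = how much word i attends to each word in this sentence.
for w, row in zip(words, weights):
    print(w, np.round(row, 2))
```

i.e. the weights seem to come from dot products of per-word vectors computed for this particular sentence, rather than directly from corpus-wide co-occurrence counts.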


What about the Neural Machine Translation paper (Bahdanau et al.)? They were the first to propose the attention mechanism.


It's part of the puzzle. You could also mention Schmidhuber, who claimed he invented attention a long time ago (I didn't read his paper though, so it's hard to judge).


You're better off reading "The Illustrated Transformer" by Jay Alammar.


It's a pedagogic paper, not a scientific paper. The people who invent things and the people who teach those things so that they become mainstream are often very different kinds of people.


Another take I can recommend is "The Annotated 'Attention Is All You Need'", which is a Colab notebook with relevant code inserted.


It would be good to know the conceptual breakthroughs required to make these things. Seems Nobel prize worthy.


It follows a carefully curated script. Hardly A.I.


It doesn't. Chatbots from the 80s did.


I tested it, and it admitted to following a script updated by the OpenAI group. It then contradicted itself, claiming it forms its own opinions.


ChatGPT is not the breakthrough it's been made out to be. Yann LeCun has explicated this quite well.


If anything, this is why someone like LeCun isn't good at shipping a product people want.

It seems like we lack a word in English for knowing so much about a subject that you can't see the forest for the trees.

To say ChatGPT is not revolutionary is as good an example of not seeing the forest for the trees as I can think of.


Also in fairness to LeCun, that sentence started with “in terms of underlying techniques”, and his words reported in the media are guaranteed to be a subset of what he said. A journalist talked to him, found an angle, and quoted to support that angle.

What comes across is someone who downplays both the effort involved with and the importance of the “boring” stuff, the tweaks and product design that tie everything together, but we actually don’t know that he thinks like that. We just know that he said some things that can be excerpted to sound that way.


I don't know; personally I don't find anything revolutionary about ChatGPT either. At best, it's a cute toy that appeals to the masses. I think it's like looking at a cloud: people see what they want to see.


Of course he would say this, as he's a direct competitor that failed to gain popularity with Galactica. The public has voted and agreed it's a breakthrough. Indeed, nothing else like it has existed before, so how is it not?


This is such a salty take by LeCun. It's an important reminder that someone can be incredibly smart and still have bad opinions.

This would be like saying software isn't novel - everything is just an extension of that first NAND/NOR gate or that it's not useful because the computer doesn't actually understand the knowledge it's storing and is therefore no better than a clay tablet.


Wow the quality of HN submissions has gone downhill. The world is bigger than ChatGPT. Signed, ML Scientist



