I built and trained a BERT on my gaming laptop (RTX 3070) to ~94% of BERT-base's performance in ~17 hours* (BERT-base was trained on 4 TPUs for 4 days). This notebook goes over the whole process, from implementing and training a tokenizer, to pretraining, to finetuning. One feature that makes this BERT different from most (though not unique) is the use of relative position embeddings.
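For anyone curious about the relative position embeddings: the idea is to add a learned bias to the attention logits based on the distance between query and key positions, rather than (or in addition to) using absolute position embeddings. A rough PyTorch sketch of one common scheme - the names and the clamping are my own illustration, not necessarily what the notebook does:

    import torch
    import torch.nn as nn

    class RelativePositionBias(nn.Module):
        """Learned bias added to attention logits, indexed by the (clamped)
        distance between query and key positions. Illustrative only."""
        def __init__(self, num_heads, max_distance=128):
            super().__init__()
            self.max_distance = max_distance
            # one learnable scalar per head per clamped relative distance
            self.bias = nn.Embedding(2 * max_distance + 1, num_heads)

        def forward(self, seq_len):
            pos = torch.arange(seq_len)
            rel = pos[None, :] - pos[:, None]                      # (seq, seq)
            rel = rel.clamp(-self.max_distance, self.max_distance) + self.max_distance
            # -> (1, num_heads, seq, seq), broadcastable over the batch
            return self.bias(rel).permute(2, 0, 1).unsqueeze(0)

    # usage inside attention: scores = q @ k.transpose(-2, -1) / d**0.5 + rel_bias(seq_len)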
Edit - for anyone unsure about what "BERT" is or its relevance, it's a transformer-based natural language model just like GPT. However, where GPT is used to generate text, BERT is used to generate embeddings for input text that you can then use for predictive models (e.g. sentiment prediction), and that process is also demonstrated in the notebook.
*Edit 2 - The 17 hours are pretraining only, not including the time to train the tokenizer, or finetuning.
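To make the "embeddings you feed into a predictive model" part concrete, the usual pattern is to pool BERT's per-token outputs (often just the [CLS] token) and put a small classifier head on top. Hypothetical sketch using the Hugging Face transformers API with a public checkpoint, not the notebook's own model:

    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    # hypothetical example with an off-the-shelf checkpoint
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")
    classifier = nn.Linear(encoder.config.hidden_size, 2)  # e.g. negative/positive

    inputs = tokenizer("This movie was great", return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state           # (batch, seq_len, hidden)
    cls_embedding = hidden[:, 0]                           # the [CLS] token's embedding
    logits = classifier(cls_embedding)                     # sentiment head on top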
> it's a transformer based natural language model just like GPT
It's an encoder-decoder model whereas GPT is decoder-only. Feels like a pretty big difference, though in practice I honestly still don't have a strong grasp of how encoder-decoder is deficient compared to decoder-only when it comes to text generation. I get that BERT was designed for translation, but why can't we scale it up and use it for textgen just the same?
BERT is encoder only and was designed for classification and natural language inference problems. The original Transformer was encoder-decoder and was designed for translation.
BERT can't be used in an autoregressive way because it doesn't output a new token; it simply generates embeddings from the existing tokens (you get one for each input token).
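You can see the difference in the output shapes. Hypothetical sketch with off-the-shelf Hugging Face checkpoints (not the notebook's model):

    from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    ids = bert_tok("the cat sat", return_tensors="pt")
    # BERT: one embedding per input token, nothing to sample a next token from
    print(bert(**ids).last_hidden_state.shape)   # (1, num_tokens, 768)

    gpt_tok = AutoTokenizer.from_pretrained("gpt2")
    gpt = AutoModelForCausalLM.from_pretrained("gpt2")
    gpt_ids = gpt_tok("the cat sat", return_tensors="pt")
    # decoder-only LM: next-token logits over the whole vocabulary at each position,
    # which is what makes autoregressive generation possible
    print(gpt(**gpt_ids).logits.shape)           # (1, num_tokens, vocab_size)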
> where GPT is used to generate text, BERT is used to generate embeddings for input text that you can then use for predictive models (e.g. sentiment prediction)
Ok, but isn't text generation more general? E.g. you could ask it to predict the sentiment of a sentence and write the result as a sentence?
Yeah, my explanation was definitely a lossy summary. You can do similar things with GPT, but BERT is bidirectional, so for a given token it can take into account both the tokens before and after it, whereas GPT would only take into account the tokens before it. Looking both ways can be helpful. Another comment in this thread explains the same (maybe more clearly).
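The "looking both ways" part is literally just the attention mask. Illustrative sketch:

    import torch

    seq_len = 5

    # BERT-style (bidirectional): every token may attend to every other token
    bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

    # GPT-style (causal): token i may only attend to tokens 0..i
    causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

    print(causal_mask.int())
    # tensor([[1, 0, 0, 0, 0],
    #         [1, 1, 0, 0, 0],
    #         [1, 1, 1, 0, 0],
    #         [1, 1, 1, 1, 0],
    #         [1, 1, 1, 1, 1]])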
Yeah, he's glossing over some things, but with good reason.
Might be more accurate to say BERT is a discriminative model, while GPT is a generative model. BERT was trained with a masked language modeling objective, which is different from the next-token-prediction, decoder-only setup used for the first GPT. Sentiment prediction is just one particular thing BERT is capable of; there are many more, but GPT has sort of steered the industry towards generative models.
https://huggingface.co/docs/transformers/main/tasks/masked_l...
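For reference, the masking step of MLM roughly looks like this (simplified sketch - the real BERT recipe also leaves some selected tokens unchanged or swaps in random tokens instead of [MASK]):

    import torch

    def mask_tokens(input_ids, mask_token_id, mlm_prob=0.15):
        """Pick ~15% of positions, replace them with [MASK], and keep the
        original ids as labels only at those positions (simplified)."""
        labels = input_ids.clone()
        masked = torch.rand(input_ids.shape) < mlm_prob
        labels[~masked] = -100              # ignored by the cross-entropy loss
        corrupted = input_ids.clone()
        corrupted[masked] = mask_token_id
        return corrupted, labels            # model is trained to predict `labels`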
GPT and BERT were actually among the first models published after "Attention Is All You Need" came out of Google.
Haha fair question. I didn't make any special changes, I just left the lid open and put the laptop in a ventilated spot. I'm actually in the tropics, so I guess Lenovo scores some points here (the laptop is a Legion 5 Pro).
If you want to really cook your lap, try running Gentoo! Emerging (compiling) Firefox, glibc, gcc, LibreOffice and a few of their friends will soon show you how good the cooling is.
A few years back *cough* I upgraded gcc from 3 to 4 and emerged system and then world. That was over 1200 packages, and it took about a week. That was in the days when I used a Windows wifi driver and some unholy magic to get a connection. I parked the laptop on a table with the lid open and two metal rods lifting it up 6" for airflow. I left the nearby window open a bit too.
Going off on a tangent: I used Gentoo in the past. I suspect that if you use one of the common processors, compiling your own binaries doesn't really give you any performance benefits, does it?
(I have to admit I stopped using Gentoo mostly because it encouraged me to endlessly fiddle with my system, and it would invariably end up broken somehow. That's entirely my fault, and not Gentoo's. I switched to Archlinux as my distribution of choice, and I manage to hold myself back enough not to destroy my installation.)
Quite a bit of performance. Generally, Linux distros (and really all general-purpose OSes) need to limit the CPU features they target to a minimum common baseline. Does your machine support AVX-512 instructions? Those instructions won't be used by the compiler, because they're not available everywhere the software will run. By compiling yourself, you can specialize the compilation to the features of your machine.
Beyond that, the big win, even over performance, is customization. The most secure code, and the fastest code, is the code that isn't there at all. Do you really need your entire system to support LDAP authentication? Maybe... What about your local email daemon, do you need that? Because cron does, and since your mail daemon also has MySQL support built in, installing cron gets you the MySQL libraries.
I don't use it anymore because of the overhead, but there are a lot of performance and security benefits to be had there.
> Realistically, though, most software that can benefit from specialized instructions already detects their availability at runtime and uses those, even if the code was compiled with -march=x86-64.
It's all behind the submission link! I've set it up so that you can run it start to end, if you want. The only thing I'm not 100% sure about is resource requirements - I have an 8GB GPU and 32GB of RAM, it could be that if you have less than that you'd run into out of memory errors. Those would be fairly straightforward to fix, though (honestly I'd be happy to help if someone runs into this).
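If anyone does hit out-of-memory errors, the usual fix is to shrink the batch size and compensate with gradient accumulation so the effective batch size stays the same. Generic sketch with toy stand-ins, not the notebook's actual training loop:

    import torch
    import torch.nn as nn

    # toy stand-ins; in the notebook this would be the BERT model and real batches
    model = nn.Linear(10, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    data = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

    accumulation_steps = 4              # micro-batches of 8 behave like a batch of 32
    optimizer.zero_grad()
    for step, (x, y) in enumerate(data):
        loss = loss_fn(model(x), y) / accumulation_steps
        loss.backward()                 # gradients accumulate across micro-batches
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()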