> it's a transformer based natural language model just like GPT
It's an encoder-decoder model whereas GPT is decoder-only. Feels like a pretty big difference, though in practice I honestly still don't have a strong grasp of how encoder-decoder falls short of decoder-only when it comes to text generation. I get that BERT was designed for translation, but why can't we scale it up and use it for text generation just the same?
BERT is encoder only and was designed for classification and natural language inference problems. The original Transformer was encoder-decoder and was designed for translation.
BERT can't be used in an autoregressive way because it doesn't output a new token; it simply produces embeddings for the existing tokens (one for each input token).
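To make that concrete, here's a rough sketch using the Hugging Face transformers library (the library and the `bert-base-uncased` / `gpt2` checkpoints are just my choices for illustration, not something from the thread): the base BertModel hands back one embedding per input token and nothing else, while GPT2LMHeadModel hands back next-token logits that autoregressive generation can sample from and append.

```python
import torch
from transformers import BertModel, BertTokenizer, GPT2LMHeadModel, GPT2Tokenizer

prompt = "The cat sat on the"

# Base BERT: one contextual embedding per input token, no prediction of a next token.
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    bert_out = bert(**bert_tok(prompt, return_tensors="pt"))
print(bert_out.last_hidden_state.shape)  # (1, num_tokens, 768) -- embeddings only

# GPT-2 with its LM head: the logits at the last position are a distribution
# over the vocabulary, i.e. a guess at the *next* token, which generation
# repeatedly samples and appends to the input.
gpt_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt = GPT2LMHeadModel.from_pretrained("gpt2")
with torch.no_grad():
    gpt_out = gpt(**gpt_tok(prompt, return_tensors="pt"))
print(gpt_out.logits.shape)              # (1, num_tokens, vocab_size)
next_id = int(gpt_out.logits[0, -1].argmax())
print(gpt_tok.decode([next_id]))         # greedy guess at the next word
```

(BERT does have a masked-LM head, but even that only fills in masked positions inside a fixed-length sequence rather than extending it, which is why it doesn't give you text generation for free.)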