Hacker News

How can you identify content generated with them?


I'm not saying that Meta did it, but recent research shows that it is possible and hard to detect - https://arxiv.org/abs/2204.06974 - so if they really wanted to, they could.


That paper is not about fingerprinting arbitrary output of a specific model, which is what would let Meta track its usage in the wild, e.g. distinguish genuine text from a fake generated by their model. The paper assumes you give the model a specific secret input known only to you.

I think the thread we're in is also based on a similar misunderstanding.
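For concreteness, the secret-key idea can be sketched as a toy watermark. This is an illustrative sketch only, not the paper's actual scheme — the vocabulary, hashing rule, and threshold here are all invented. A secret key splits the vocabulary into a per-context "green list"; generation prefers green tokens, and detection measures how often the text lands on the green list:

```python
import hashlib

SECRET = b"only-the-model-owner-knows-this"  # hypothetical key
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "quickly", "slowly"]

def greenlist(prev_token: str) -> set:
    """Roughly half the vocabulary is 'green' for each context.
    Which half depends on a hash of (secret, previous token, candidate)."""
    out = set()
    for tok in VOCAB:
        h = hashlib.sha256(SECRET + prev_token.encode() + tok.encode()).digest()
        if h[0] % 2 == 0:
            out.add(tok)
    return out

def watermarked_choice(prev_token: str, candidates: list) -> str:
    """Prefer a green-listed candidate when one is available."""
    green = greenlist(prev_token)
    for tok in candidates:
        if tok in green:
            return tok
    return candidates[0]  # fall back if no candidate is green

def detect(tokens: list) -> float:
    """Fraction of tokens on the green list for their context.
    Unkeyed text should hover near 0.5; watermarked text, much higher."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in greenlist(prev))
    return hits / max(1, len(tokens) - 1)
```

Crucially, detection only works because the detector holds the secret — which is exactly why this doesn't let you fingerprint arbitrary output from a model you don't control.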


By training a GAN. A trained GAN will be able to accurately guess whether a block of text was produced by this GPT model, some other GPT model, or is authentic.
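What's described here is really just the discriminator half of a GAN: a classifier trained on labeled samples. A toy pure-Python version over character-trigram features gives the flavor (all data and hyperparameters below are made up for illustration; a real detector would need far richer features and far more data):

```python
import math
from collections import Counter

def trigrams(text: str) -> Counter:
    """Character-trigram counts as crude stylistic features."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

class Discriminator:
    """Tiny logistic regression: estimates P(text was generated by model Q)."""
    def __init__(self, lr=0.5):
        self.w = Counter()  # per-trigram weights (missing keys read as 0)
        self.b = 0.0
        self.lr = lr

    def score(self, text: str) -> float:
        z = self.b + sum(self.w[f] * c for f, c in trigrams(text).items())
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, samples, epochs=50):
        """samples: list of (text, label); label 1 = generated, 0 = human."""
        for _ in range(epochs):
            for text, label in samples:
                err = label - self.score(text)
                for f, c in trigrams(text).items():
                    self.w[f] += self.lr * err * c
                self.b += self.lr * err
```

The open question in this thread is whether such features survive for a strong language model, not whether the training loop exists.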


Just so I understand you properly:

Original Inputs (A) -> NN (Q) -> Output (X)

You are saying you could train something that would take X and identify that it is the product of NN (Q). Even though you don't know A?

So, to simplify and highlight the absurdity: if I made a NN that completed sentences by putting a full stop at the end of open sentences, you could train something that could distinguish that full stop from a human-placed one?

(This seems actually impossible; information is lost in the process that can't be recovered.)


Can you identify GPT text versus authentic text? If so, then there are features in that text that give it away. It stands to reason that there exist other features in the text, based on the training data the model was fed, and other characteristics of the model, that a discriminator model could use to detect, with some confidence, which model produced the text. A discriminator model which can detect a specific generative model essentially captures its "fingerprint".

An example of such features might be the use of specific word pairs around other word pairs, or a peculiar verb conjugation in the presence of a specific preposition.
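As a minimal illustration of the "word pairs" idea, you can reduce a text to its adjacent-word-pair frequencies and compare those profiles; texts from the same source should tend to have more similar profiles (the example sentences below are invented, and a real fingerprint would use far more features):

```python
from collections import Counter
from math import sqrt

def bigram_profile(text: str) -> Counter:
    """Frequencies of adjacent word pairs — one crude fingerprint feature set."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A discriminator model would, in effect, learn which of these (and subtler) statistics separate one model's output from everything else.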


If differentiating between real samples and generated ones were as straightforward as "training a GAN", detecting deep fakes would not be as big of a research topic as it is.


The point is that it's possible and we're improving on it every day.


Know any papers where someone has done this with large language models successfully?




