Hacker News

That's what robots.txt does.

However, you'd have to delist yourself from search engines to fully prevent AIs from reading the content on your website.
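For reference, the opt-out mechanism being discussed is just a plain-text file served at the site root (`/robots.txt`). A sketch of one that disallows a couple of well-known AI-training user agents while leaving ordinary search crawling alone (GPTBot is OpenAI's published crawler token; Google-Extended is Google's AI-training token — the exact set of tokens you'd block is up to you):

```
# Disallow AI-training crawlers by their published user-agent tokens
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everything else, including regular search indexing, stays allowed
User-agent: *
Allow: /
```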



> That's what robots.txt does

It most certainly does not. robots.txt is almost totally worthless against genAI crawlers. Even being unindexed from search engines doesn't keep you safe.


This is factually false.

There's ample documentation of crawlers straight-up ignoring robots.txt.

It's not a legal control, but a technical one - and a voluntary one, which means that it's trivial to ignore.

And there's obviously nowhere to put a robots.txt for a book that you've published.


The biggest, best, most reputable organizations — e.g. Google, Bing, Yahoo, Yandex, Baidu, DuckDuckGo, OpenAI, and Anthropic — have all publicly promised to respect your robots.txt file. If they lie, you can make them hurt, which gives you some reason to believe they're telling the truth. There are some people out there who don't respect robots.txt, like Archive Team, but they're more likely to be treated as folk heroes here on Hacker News than to trigger AI training fears.


That's a naive statement about robots.txt; nothing about it is binding or enforceable. It is a request that well-behaved crawlers heed. Other crawlers treat the Disallow section as a list of targets.
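To make the "voluntary" point concrete: robots.txt compliance lives entirely in the crawler's own code. Python's stdlib parser, for instance, only does anything if the crawler bothers to consult it before fetching. A minimal sketch (the URL and the non-GPTBot agent name are just illustrative):

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that disallows one crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching...
print(rp.can_fetch("GPTBot", "https://example.com/article"))    # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))  # True

# ...but nothing stops a crawler from skipping the check entirely
# and fetching the URL anyway. The file is a request, not a gate.
```

Note there's no enforcement anywhere in that flow: `can_fetch` is advice the client can ignore, which is exactly why a Disallow list can double as a target list for bad actors.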



