
I heard stories of incriminating stuff for higher-ups disappearing from archive.org.


I heard stories about a potential Oracle data breach (I think mainly affecting their customers) being removed from Archive.org too. It’s because in general, they comply with requests to remove stuff, which is understandable from an ethical perspective. But do they at least try to explain the reason for the takedown? Is it just not feasible to do that?


Archive.org honors robots.txt retroactively. So anybody can take down their own stuff by adding a Disallow rule to their robots.txt file.
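
A minimal sketch of the mechanism described here, assuming the crawler identifies itself as `ia_archiver` (an assumption; the comment does not name a user agent). It uses Python's standard `urllib.robotparser` to show how a Disallow rule is evaluated. The helper `is_blocked` and the example.com URLs are hypothetical, and this says nothing about how archive.org actually applies its policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
# "ia_archiver" is an assumed user-agent name, used only for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named crawler is excluded from the whole site...
    print(is_blocked(ROBOTS_TXT, "ia_archiver", "https://example.com/page"))
    # ...while a crawler with no matching rule is not.
    print(is_blocked(ROBOTS_TXT, "otherbot", "https://example.com/page"))
```

A crawler that honors the file "retroactively" would apply the same check to snapshots it already holds, not only to new fetches.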


This is no longer true. They changed their policy to ignore robots.txt in 2017. I seem to recall that they still respected robots.txt later, though I can’t find any more information on it and may be misremembering. Currently, they do not.


Does it mean archive.org works for any sites?

My main use for archive.is is for sites that somehow cannot be archived (a message shows up saying the site cannot be archived, or something along those lines).

archive.is is generally pretty good at forcibly attempting to get an archive: if the HTML doesn't work, the screenshot will work fine. It doesn't seem to handle GIFs/videos, though.


> Does it mean archive.org works for any sites?

They respected exclusion requests after they stopped respecting robots.txt. I don't know their policy for new exclusion requests.


Oh. Did not know that.



