We need to crowdsource a db of websites that have opted out of the #WaybackMachine. The WM has become essential now that so many sites block Tor or sit behind #Cloudflare. I don't want to see WM-excluded sites in my search results. Such archive-resisting sites also degrade the blogs that link to them (a dead link invalidates part of an article when there's no archive).
If you discover a website that has opted out of archive.org's #WaybackMachine, there is now a place where you can list it: https://git.sdf.org/deCloudflare/deCloudflare/src/branch/master/anti-tor_users/misc/blocking_archiveorg.md
@resist1984 is that page still available? I do think there are some legitimate reasons to want to block iabot from archiving your site, just like there are for indexing.
@edsu it has moved to https://git.nogafam.es/deCloudflare/deCloudflare/src/branch/master/anti-tor_users/fqdn/antiarchive.txt I'm not clear on what legitimate reason you have in mind for blocking bots from harvesting. Can you give an example?
@resist1984 thanks! Generally speaking I think that's up to the publisher to decide. The Internet Archive doesn't own the web and if you don't want them to serve up your content in perpetuity I think that's ok.
@resist1984 I don't know if this is helpful, but you could try to collect some leads for the list from Google, for example: https://www.google.com/search?q=filetype%3Atxt+ia_archiver&hl=en&ei=pjOIYO3oGLKy5NoP6c-TkA4&oq=filetype%3Atxt+ia_archiver&gs_lcp=Cgdnd3Mtd2l6EANQxg5YxStgvC5oAXAAeAGAAW2IAeYOkgEEMjQuM5gBAKABAaoBB2d3cy13aXrAAQE&sclient=gws-wiz&ved=0ahUKEwjt0_ew5J7wAhUyGVkFHennBOIQ4dUDCA0&uact=5
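For anyone who wants to vet leads from a search like that before adding them to the list, here is a minimal sketch in Python. It assumes a site signals its opt-out through a robots.txt rule that applies to the `ia_archiver` user agent, which is what the search query above looks for; it would not catch exclusions arranged some other way. The function name `blocks_ia_archiver` and the sample robots.txt text are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocks_ia_archiver(robots_txt: str, path: str = "/") -> bool:
    """Return True if this robots.txt text disallows ia_archiver for `path`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", path)


# Hypothetical robots.txt that shuts out ia_archiver but allows everyone else
sample = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

print(blocks_ia_archiver(sample))
```

Note that a blanket `User-agent: *` / `Disallow: /` rule also counts as blocking here, since it applies to `ia_archiver` too. To check a live site you would fetch its `/robots.txt` yourself and pass the text in.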