We need to crowdsource a db of websites that have opted to exclude themselves from the #WaybackMachine. The WM has become essential now that so many sites block Tor or sit behind #Cloudflare. I don't want to see WM-excluded sites in my search results. Such archive-resisting sites also degrade the blogs that link to them (a dead link invalidates part of an article when there's no archive).
If you discover a website that has opted out of archive.org's #WaybackMachine, there is now a place where you can list it: https://git.sdf.org/deCloudflare/deCloudflare/src/branch/master/anti-tor_users/misc/blocking_archiveorg.md
@resist1984 is that page still available? I do think there are some legitimate reasons to want to block iabot from archiving your site, just like there are for indexing.
@edsu it has moved to https://git.nogafam.es/deCloudflare/deCloudflare/src/branch/master/anti-tor_users/fqdn/antiarchive.txt I'm not clear on what legitimate reason you have in mind for blocking bots from harvesting. Can you give an example?
@edsu Archive.org gives publishers control, most likely to avoid legal problems. So while it is up to the publisher, as users we have a right to judge that choice. Now that the #WaybackMachine has become indispensable (due to Tor-hostility), those who act against WBM act against Tor & thus against privacy. They are not our friends and we have a right to resist propagation of their website URLs.
@resist1984 I don't know if this is helpful, but you could try to collect some leads for the list from Google, for example: https://www.google.com/search?q=filetype%3Atxt+ia_archiver&hl=en&ei=pjOIYO3oGLKy5NoP6c-TkA4&oq=filetype%3Atxt+ia_archiver&gs_lcp=Cgdnd3Mtd2l6EANQxg5YxStgvC5oAXAAeAGAAW2IAeYOkgEEMjQuM5gBAKABAaoBB2d3cy13aXrAAQE&sclient=gws-wiz&ved=0ahUKEwjt0_ew5J7wAhUyGVkFHennBOIQ4dUDCA0&uact=5
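A lead found this way could be confirmed with a rough check like the one below. It is only a sketch: it assumes the site states its opt-out in robots.txt under the `ia_archiver` user agent (the token in the search above), and the function name `blocks_ia_archiver` is made up for illustration. Sites can presumably opt out by other means that this would not detect.

```python
from urllib.robotparser import RobotFileParser


def blocks_ia_archiver(robots_txt: str) -> bool:
    """Return True if this robots.txt text disallows "/" for ia_archiver.

    Assumption: the opt-out is expressed in robots.txt. A blanket
    "User-agent: *" / "Disallow: /" rule also counts as blocking here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", "/")


if __name__ == "__main__":
    sample = "User-agent: ia_archiver\nDisallow: /\n"
    print(blocks_ia_archiver(sample))
```

The robots.txt text has to be fetched separately (over Tor or otherwise), which is why the function takes a string rather than a URL.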
@resist1984 I mean, maybe it's to avoid legal problems, but I like to think it's also because they recognize it's the right thing to do. There are lots of shades of gray in the world and the world wide web is no exception.
@edsu The blocklist is merely objective data for people to use as they see fit. What I hope will happen is someone will cross-reference the wbm blocklist with Tor-blocking sites, and reduce search rankings of sites that block both.
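The cross-referencing step could be as simple as a set intersection. The sketch below assumes both lists are plain text with one FQDN per line and `#` for comments; that format, and the names `load_fqdns` and `blocks_both`, are assumptions for illustration rather than anything the repo defines.

```python
def load_fqdns(lines):
    """Normalize an iterable of text lines into a set of lowercase hostnames."""
    hosts = set()
    for line in lines:
        # Drop trailing comments, whitespace, and any trailing root dot.
        host = line.split("#", 1)[0].strip().lower().rstrip(".")
        if host:
            hosts.add(host)
    return hosts


def blocks_both(antiarchive_lines, tor_blocking_lines):
    """Return the sorted hostnames that appear in both lists."""
    return sorted(load_fqdns(antiarchive_lines) & load_fqdns(tor_blocking_lines))


if __name__ == "__main__":
    # Placeholder domains, not real entries from either list.
    antiarchive = ["Example.com", "# a comment", "news.example"]
    tor_blocking = ["example.com.", "other.example"]
    print(blocks_both(antiarchive, tor_blocking))
```

What a search engine or browser extension then does with the result (down-ranking, flagging, hiding) is left to whoever consumes it.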