We need to crowdsource a db of websites that have opted to exclude themselves from the #WaybackMachine. The WM has become essential with so many Tor-blocking and #Cloudflare blocking sites. I don't want to see WM-excluded sites in my search results. Such archive-resisting sites also degrade the blogs that link to them (a dead link invalidates part of an article when there's no archive).
@resist1984 How can I report sites without registration?
It should be as simple as possible.
@Br0m3x You can simply mention them in this thread. Normally I would be tempted to suggest running something like this: "git clone --config http.proxy=http://127.0.0.1:8118 https://git.sdf.org/deCloudflare/deCloudflare.git" followed by: "git log --format='%aE' | sort -u" and picking an email address, but in this case that's quite a large download and most of the email addresses are fake.
@Br0m3x Perhaps we should designate a hashtag as well: #wbmblocklist (wbm abbreviates #WayBackMachine) so those maintaining the list can search easily.
@Br0m3x There is now a way to post an issue from Mastodon without having an account: https://git.nogafam.es/deCloudflare/deCloudflare/src/branch/master/subfiles/anonymous_issue.md
@resist1984 great!
@resist1984 404
The page you are trying to reach either does not exist or you are not authorized to view it.
@resist1984 is that page still available? I do think there are some legitimate reasons to want to block iabot from archiving your site, just like there are for indexing.
@edsu it has moved to https://git.nogafam.es/deCloudflare/deCloudflare/src/branch/master/anti-tor_users/fqdn/antiarchive.txt I'm not clear on what legitimate reason you have in mind for blocking bots from harvesting. Can you give an example?
@resist1984 thanks! Generally speaking I think that's up to the publisher to decide. The Internet Archive doesn't own the web and if you don't want them to serve up your content in perpetuity I think that's ok.
@edsu Archive.org gives publishers control, most likely to avoid legal problems. So while it is up to the publisher, as users we have a right to judge that. Now that the #WaybackMachine has become indispensable (due to Tor-hostility), those who act against WBM act against Tor & thus against privacy. They are not our friends and we have a right to resist propagation of their website URLs.
@edsu The blocklist is merely objective data for people to use as they see fit. What I hope will happen is someone will cross-reference the wbm blocklist with Tor-blocking sites, and reduce search rankings of sites that block both.
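The cross-referencing idea above could look roughly like the sketch below. It assumes both lists are plain text with one hostname per line and optional `#` comments; that format is a guess, not something confirmed in this thread, and the function names are made up for illustration.

```python
def normalize(line):
    """Drop any trailing '#' comment, surrounding whitespace, and a trailing dot."""
    host = line.split("#", 1)[0].strip().lower()
    return host.rstrip(".")


def load_hosts(lines):
    """Build a set of hostnames from an iterable of lines (assumed one host per line)."""
    return {host for host in (normalize(line) for line in lines) if host}


def blocks_both(antiarchive_lines, tor_blocking_lines):
    """Return the sorted hostnames that appear in both lists."""
    return sorted(load_hosts(antiarchive_lines) & load_hosts(tor_blocking_lines))
```

To run it on real files you would pass open file objects, e.g. `blocks_both(open("antiarchive.txt"), open("tor_blocking.txt"))`, where the second filename is a placeholder. What a search tool does with the result (down-ranking, hiding, flagging) is a separate decision.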
@resist1984 I don't know if this is helpful, but you could try to collect some leads for the list from Google, for example: https://www.google.com/search?q=filetype%3Atxt+ia_archiver&hl=en&ei=pjOIYO3oGLKy5NoP6c-TkA4&oq=filetype%3Atxt+ia_archiver&gs_lcp=Cgdnd3Mtd2l6EANQxg5YxStgvC5oAXAAeAGAAW2IAeYOkgEEMjQuM5gBAKABAaoBB2d3cy13aXrAAQE&sclient=gws-wiz&ved=0ahUKEwjt0_ew5J7wAhUyGVkFHennBOIQ4dUDCA0&uact=5
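Leads found this way could be screened with a small check like the one below. It is only a sketch: it uses Python's standard `urllib.robotparser` to test whether a robots.txt body disallows the `ia_archiver` user agent from the site root. The function name is made up, and a robots.txt rule is at best a hint; it does not prove the site is actually excluded from the Wayback Machine.

```python
from urllib.robotparser import RobotFileParser


def excludes_ia_archiver(robots_txt: str) -> bool:
    """Return True if this robots.txt text disallows 'ia_archiver' from '/'."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", "/")
```

Note that a blanket `User-agent: *` / `Disallow: /` rule also makes this return True, so a hit would still need a manual look before a site goes on the list.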
@resist1984 I mean, maybe it's to avoid legal problems, but I like to think it's also because they recognize it's the right thing to do. There are lots of shades of gray in the world and the world wide web is no exception.
If you discover a website that has opted out of archive.org's #WaybackMachine, there is now a place where you can list them: https://git.sdf.org/deCloudflare/deCloudflare/src/branch/master/anti-tor_users/misc/blocking_archiveorg.md