#CloudFlare is now hitting the archive.org wayback machine with the same #CAPTCHA as #Tor users, thus censoring history too.
@resist1984 One more reason to use Cloudflare. The Wayback Machine keeps data from all websites forever. Maybe it's interesting history for some public information, but they also save social media, forums, everything you ever wrote, with no way to delete it. That's pure evil.
@nipos @resist1984 But that's why they are respecting the same flags as search engines. If you don't allow search engines to index a part of your page, the Wayback Machine won't do so. And without a robots.txt or something comparable, it won't happen...
@frommMoritz @nipos @resist1984 no, they aren't respecting these flags anymore
@dadosch
I think it was later clarified that this was about retroactive deletion.
Before the change, the Wayback Machine had the problem that when someone new bought a domain and restricted access via robots.txt, the whole archive of the previous owner's site was deleted.
This was bad not only because of accidental deletion; people could even intentionally destroy parts of the archive by buying old domains.
If you just want to deny access to the archive bot, have a look at the "noarchive" flag.
@nipos @resist1984 they only save websites that allow crawlers. So disabling crawlers for a website means it won't be saved.
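To make the robots.txt point concrete, here is a minimal sketch of how a crawler could check whether it is allowed to fetch a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the example.org URLs, and the `crawler_allowed` helper are made up for illustration, and `ia_archiver` is assumed here as the user-agent name for the archive crawler; whether the Wayback Machine actually honours such a rule today is exactly what this thread disputes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the (assumed) archive crawler everywhere,
# and block everyone else only from /private/.
EXAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/post/1"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, crawler_allowed(EXAMPLE_ROBOTS, agent, page))
```

Note that robots.txt is only a request: it works only if the crawler chooses to run a check like this one.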
@nedelne_rano @resist1984 @frommMoritz There's a big difference between making content searchable and cloning it completely forever. If it's in the search index and the author decides to delete it, search links will return Error 404 after clicking. Yes, there may be some other cache, but I'm talking about pure search results. This isn't problematic. If you delete it and there's an exact copy of the page which isn't removed, this is a problem in some cases.
The Wayback Machine respects robots.txt.
https://blog.reputationx.com/block-wayback-machine
And claiming that it is "one more reason to use Cloudflare" is kinda weird.
And yes, you can get your site removed from the Wayback Machine.
@fortune
@nipos @nedelne_rano @resist1984 @frommMoritz
You can also request not to be added anymore
@nipos
@nedelne_rano @resist1984 @frommMoritz
Google cache
@nipos @nedelne_rano @resist1984 @frommMoritz
Also one of my hobbies is archiving
@nipos @nedelne_rano @resist1984 @frommMoritz and yet history is important. There is a balance to be found here somewhere.
If you don't want your information public, don't make it public. Facebook already disallows crawling, so does Twitter, by the way. So your point is mostly moot anyway.
CloudFlare unilaterally deciding to screw over one of the main projects keeping Internet history is not that balance.
@nipos
Whether your dislike of the Wayback Machine is justified or not -- putting them behind access restrictions like this only limits access by people who care about privacy, and disabled people -- hardly "just deserts"
@resist1984
@Mr_Teatime @resist1984 Nope because I explicitly whitelisted Tor in my Cloudflare settings resulting in Tor users being able to access the site without seeing a shitty Google captcha 😉
@nipos @Mr_Teatime #CloudFlare w/ #Tor whitelisted is even worse, b/c then Tor users don't know they are interacting w/a CF MitM. Tor users then unwittingly support a Tor adversary.
@Mr_Teatime @nipos archive.org does #Tor users a service b/c it helps bypass the #CAPTCHA (if needed) & ensures the target site is not rewarded w/traffic or interaction.
@nipos @Mr_Teatime there is a very useful browser plugin that detects #CloudFlare & automatically redirects to the archive of the page.
@resist1984
Do you have the name (or a link) to hand?
@nipos
@Mr_Teatime @nipos the Firefox plugin that redirects CF sites is called "Block Cloudflare MITM Attack" and is posted here: https://addons.mozilla.org/en-US/firefox/user/14218621/. The description is in Cyrillic but don't let that scare you off. This other plugin will outright block CF sites: https://gitlab.com/gkrishnaks/cloud-firewall
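For anyone curious how a detect-and-redirect tool like this could work in principle, here is a rough sketch. It is not the plugin's actual code. It assumes that a Cloudflare-fronted response can be recognised by a `Server: cloudflare` or `CF-RAY` response header, and that prefixing a URL with `https://web.archive.org/web/` leads to an archived copy. The function names are made up for illustration.

```python
from typing import Mapping

WAYBACK_PREFIX = "https://web.archive.org/web/"


def looks_like_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic: treat a response as Cloudflare-fronted if it carries
    a CF-RAY header or a Server header equal to 'cloudflare'."""
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    return "cf-ray" in lowered or lowered.get("server") == "cloudflare"


def choose_url(url: str, headers: Mapping[str, str]) -> str:
    """Return the archive URL for Cloudflare-fronted pages, else the original URL."""
    if looks_like_cloudflare(headers):
        return WAYBACK_PREFIX + url
    return url


if __name__ == "__main__":
    cf_headers = {"Server": "cloudflare", "CF-RAY": "0000000000000000-XXX"}
    plain_headers = {"Server": "nginx"}
    print(choose_url("https://example.org/page", cf_headers))
    print(choose_url("https://example.org/page", plain_headers))
```

A real extension would need to decide before the request is sent (or cancel it afterwards), since by the time headers arrive the browser has already talked to the CDN.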
@g at a high level, #CloudFlare is very similar to #SpamHaus. In both cases you have a vigilante extremist org so fixated on attacking their enemy that they have no regard for collateral damage to harmless users. Ppl cannot protect their own #privacy by running their own mail server b/c of SpamHaus, & ppl cannot protect their own identity b/c CF DoS's *all* #Tor users.
@g #CloudFlare also harms non-Tor users by MitMing the connection. CF sees every username & unhashed password even when a TLS padlock is present.
@g w.r.t. fending off DDoS attacks, note 1st that any CDN will offer that.. no reason to use #CloudFlare. Also, once you have a DDoS attack, CF is no longer gratis. CF will force you to upgrade to premium b/c the attack counts toward your bandwidth allowance.
@g I could write a book on this. I'll also mention that #CloudFlare uses #Google's #CAPTCHA, & that's a #privacy abuse in itself. Google links your logged in cookie w/the CF site the CAPTCHA is on.
@g some problems like having visibility on all traffic are shared across all CDNs, so it's best to avoid CDNs entirely if possible. But if you must use a CDN, #CloudFlare is the worst of the worst.. it shelters criminals and harms #humanrights
@g the TLS tunnel terminates at #CloudFlare, so CF sees all traffic. It must work that way. If CF were to simply proxy all encrypted traffic to the origin, then it would fail to relieve the originating server of workload.
@g this article covers it in detail: http://cryto.net/~joepie91/blog/2016/07/14/cloudflare-we-have-a-problem/
@g np. And note there may or may not be a 2nd tunnel between the originating server & CF, but in either case the end user sees a padlock
So if I have an SSL certificate from Let's Encrypt set up, but still sit behind Cloudflare, is my data still compromised?
@resist1984 I know why I never liked them...
@resist1984 Works here without. Seems it's triggered under certain conditions.. https://support.cloudflare.com/hc/en-us/articles/200170136-Understanding-Cloudflare-Captcha
@resist1984
I recently read an article about how those #CAPTCHA things that ask you to do stuff like, "Click all the boxes that have cars" actually have much less to do with proving you're human than with training various AI products to better recognize things in photos. Basically they're outsourcing their AI training to the public and falsely framing it as a security mechanism. It might have been Edward Snowden's book, I'm not sure.
@gerowen indeed, #Google is using #CAPTCHA to exploit labor, which in some situations in the US amounts to a 13th amendment issue (involuntary servitude). Google's CAPTCHA also gives preferential treatment to logged-in Google users, which adds to the tracking of those users by associating them with the site where the captcha is presented.
@resist1984