#Cloudflare is now MitMing more than *1/3rd* of the world's websites, controlling a staggering 34%. And yet privacy seekers naively continue to use so-called "privacy focused" search engines like #DuckDuckGo, which neglects to filter Cloudflare sites out of the results.
@josias This article claimed in 2019 that Cloudflare had 34.55% of the CDN market (French): https://web.archive.org/web/www.zdnet.fr/actualites/quand-cloudflare-begaie-internet-trebuche-39887031.htm If that's true, then Cloudflare probably controls less than 34% of the web as a whole (CDN-backed or not), even with 2 years of growth.
@resist1984 Any suggestions for search engines which *do*?
@Coffee Ss filters out #Cloudflare sites & folds them at the bottom of the results. Clearnet users can reach it at https://sercxi.eu.org/ & Tor users can visit https://sercxi.nnpaefp7pkadbxxkhz2agtbv2a4g5sgo2fbmv3i7czaua354334uqqad.onion/ This is the best search engine in the world. It puts #DuckDuckGo to shame and it's better than any #searx instance i've used.
@resist1984
Thanks! I'll be giving that a try.
@resist1984 I've tried opening it back then, and just tried opening it now, but I can't open it in Lynx. It closes the connection as soon as the HTTP request is received.
@Coffee lynx is broken on Ss for me too. If i run "torsocks lynx -noreferer -head 'DNT: 1' $URL" it's instant death. Strangely, it goes a little further if I run lynx inside of a tor middlebox and firejail.. it prompts for ssl cert acceptance. But then it still fails anyway. Ss requires do-not-track to be enabled, but -head should have achieved that unless i wrote it wrong.
@resist1984 Ah. Good that they provide a helpful error message instead of just closing the conne... oh wait.
Anyway, -head doesn't work the way you think. I checked the man page, and turns out Lynx doesn't have a way to add custom headers, nor a way to turn on DNT.
That said, using links with "-http.do-not-track 1" doesn't work either. It makes me accept the certificate, and then makes 3 connection attempts before reporting "Error reading from socket".
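[For reference: the custom header lynx can't send is easy to construct elsewhere. A minimal Python sketch (stdlib urllib only) that builds a request carrying the Do-Not-Track header, using the clearnet URL from upthread; actually sending the request is left commented out:]

```python
import urllib.request

# lynx offers no flag for arbitrary request headers, but urllib does:
# build a request that carries the Do-Not-Track header.
req = urllib.request.Request(
    "https://sercxi.eu.org/",
    headers={"DNT": "1", "User-Agent": "dnt-header-test"},
)

# urllib normalizes header names with str.capitalize(), so "DNT" is
# stored (and looked up) as "Dnt".
print(req.get_header("Dnt"))  # -> 1
# urllib.request.urlopen(req)  # uncomment to actually send the request
```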
@resist1984 Actually, I only have to accept the certificate when connecting to the .onion domain. Probably because the certificate on sercxi.eu.org is valid.
@resist1984 A search engine should not filter ANY sites from the results...
@datenschutzratgeber If you know of a client-side app that can push junk results down to page 20 so the server can send /all/ of them, I'd be keen to know about it. Until then, we count on search engines not just to find shit but also to organize it, putting the best results in view and hiding the garbage. Research has shown that a link in search results is *twice* as likely to get clicked as the link directly below it.
@datenschutzratgeber to say "A search engine should not filter ANY sites" grossly misunderstands the purpose of a search engine. If a search were to return the whole index known to the engine, you would learn very quickly the value of filtering search results. Filtering is the core of what search engines do.
@resist1984
I remember in the 90s we used to have applications that filtered results from several search engines. It was considered better to check several search engines depending on what you were searching for.
@datenschutzratgeber
@onepict @resist1984 So basically meta search engines?
@datenschutzratgeber @onepict just like DDG is a metasearch engine, so would be any app that harvests results from other engines. Analogous to a search app would be using #YaCy perhaps in combination with searx, but those tools don't have a "filter out Cloudflare" switch.. it would still need to be created. It's feasible but in the end doesn't solve the problem of getting loyal #DDG users off false-privacy.
@otso @onepict @resist1984 Privacy 🇩🇪
@resist1984 I was talking about filtering sites from the search results, not from the search index. Of course, a search engine is supposed to only display relevant results but removing sites from the list because they use Cloudflare is just paternalism.
Also, the search engine would have to maintain an ever-changing second index of Cloudflare sites.
@datenschutzratgeber #Cloudflare results are irrelevant to privacy seekers. There is plenty of choice for privacy-ambivalent users (Google, DuckDuckGo, Bing, Yahoo [#DDG is falsely positioned]). Filtering CF sites doesn't need a site index, just CIDRs. Ss filters out CF sites just fine, and that's basically a garage operation on a shoestring budget.
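[A sketch of that CIDR check in Python. The ranges below are a small subset of Cloudflare's published IPv4 list (https://www.cloudflare.com/ips/), which changes over time, so treat them as illustrative only:]

```python
import ipaddress

# A few of Cloudflare's published IPv4 ranges (subset, for illustration;
# the authoritative list is at https://www.cloudflare.com/ips/).
CLOUDFLARE_CIDRS = [
    ipaddress.ip_network(c) for c in (
        "104.16.0.0/13",
        "172.64.0.0/13",
        "173.245.48.0/20",
    )
]

def is_cloudflare(ip: str) -> bool:
    """Return True if the address falls in a known Cloudflare range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_CIDRS)

# A search engine would resolve each result's hostname and drop or
# demote results whose address matches a Cloudflare range.
print(is_cloudflare("104.16.132.229"))  # in 104.16.0.0/13 -> True
print(is_cloudflare("93.184.216.34"))   # not in any listed range -> False
```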
But if you filter Cloudflare sites, you should also filter the ones that use tracking etc. Which basically reduces the result list by 60+ %.
What's "Ss"? Does that work on a large scale?
@datenschutzratgeber Luckily we're not on a slippery slope here. We can easily nix the worst ~34% and still function. At large scales, nixing the 34% big offenders would have the effect of shrinking the 34% (sites want to be found).
@datenschutzratgeber Ss is handling the current scale just fine but it has a breaking point, which is why other search engines calling themselves "privacy focused" need to actually become privacy focused.
@resist1984 I don't think that would have any significant effect because major search engines like Google, Bing etc. would still list them on top 🤔
@datenschutzratgeber Yes, they would. But the idea @resist1984 offers is to filter client-side. You can manipulate the downloaded content in the browser as you like. I delete cookie-consent dialogs, quite often successfully, or filter certain content by type or origin domain in uMatrix. Why not reorder results or suppress paid ones based on filter rules?
@datenschutzratgeber The rankings of the majors can be discarded since we don't use the majors directly. A metasearch engine (or user app) controls its own ranking. Some searx instances will alternate round-robin: #1 from google, #1 from bing (if different), #1 from giga, #2 from google, etc. No reason bad sites can't be filtered.
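[That round-robin interleaving is simple to sketch in Python; the engine names and result URLs below are made up for illustration:]

```python
from itertools import chain, zip_longest

def round_robin_merge(*result_lists):
    """Interleave ranked result lists (#1 from each source, then #2, ...),
    dropping duplicates so a URL returned by several engines appears once."""
    seen = set()
    merged = []
    for url in chain.from_iterable(zip_longest(*result_lists)):
        if url is not None and url not in seen:
            seen.add(url)
            merged.append(url)
    return merged

# Hypothetical per-engine rankings:
google = ["a.example", "b.example", "c.example"]
bing   = ["a.example", "d.example"]
print(round_robin_merge(google, bing))
# -> ['a.example', 'b.example', 'd.example', 'c.example']
```

A Cloudflare filter would simply run over `merged` afterwards (or over each source list first), as discussed upthread.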
@resist1984 Well, filtering on client-side is something completely different. I was talking about (mandatory) server-side filtering by the search engine itself.
@datenschutzratgeber Ss already demonstrates that it's possible. Why wouldn't it be? If a source decides to give 100% Cloudflare results, they can simply be dropped as a source.
@resist1984 seems like 34% of total traffic is a good reason to abstain from censoring them. Would you be more satisfied with a DDG warning on Cloudflare links?
@Apolyon_LMR If DDG is listening, I would appreciate a search directive that makes it filter out cloudflare sites.
Cloudflare results are useless to me anyway, because they look like this...
@Apolyon_LMR it's not a good reason. My first choice of search engine is sercxi.nnpaefp7pkadbxxkhz2agtbv2a4g5sgo2fbmv3i7czaua354334uqqad.onion precisely because it filters out unusable #Cloudflare garbage sites. Only when desperation calls do I unfold the CF sites at the bottom & click the favicon to visit the IA mirror of those CF sites (in which case I still manage to avoid CF).
@resist1984 Privacy-related advantages of using search engines like https://duckduckgo.com/privacy are not just content control. They step you out of your filter bubble. Today it is as if our library had two different old-fashioned card catalogs, based on our race. Worse, now global communications are mediated by analytic marketing bots, selecting your minuscule filter bubble for you. What better explanation is there for contemporary increases in cross-sector conflicts?
@Delib DDG users are inherently subjected to the same censorship that comes from Bing because Bing is DDG's source. While that censorship hits all DDG users across the board, there is also nothing to stop DDG from putting users in Microsoft's individualized filter bubble because MS sees the IP address & queries of DDG users (since DDG is also MS-hosted).
@Delib And because the public doesn't get to see the agreement between DDG & MS, there's no guarantee that DDG has mirrored through their contract with MS terms & conditions that uphold the effect of DDG's privacy policy.
@resist1984 I thought I read that duckduckgo makes the queries without sending through the user's IP info?
@Delib #DuckDuckGo does not pro-actively send IP info to MS w/the query according to the privacy policy (which is hard to trust considering #DDG has been caught violating it multiple times). But if you trust the privacy policy, MS still sees your IP when you connect since MS hosts DDG. And MS also sees your query which DDG proxies to Bing. So MS can work out for itself IP-query pairings.
@Delib I suggest reading http://techrights.org/2021/03/15/duckduckgo-in-2021/
@resist1984 So there are ways to trace internet users of duckduckgo (DDG), I see that from your refs. And we should always assume that the internet is insecure. It was never designed to be secure (not my preference, of course). But one good thing about DDG's privacy policy is that I can search this analogous 'library catalog' of the internet and be relatively sure I am getting the same results you get when we do the same search, no?
@Delib When #Microsoft censors a result from its searches, so does #DuckDuckGo, because #DDG does not significantly diversify its sources. #MS is the primary source. So effectively you may not be in a personalized filter bubble, but you're in Microsoft's global filter bubble when using DDG.
@resist1984 "you're in Microsoft's global filter bubble when using DDG" . Tnanks that's good to know. Will pay attention to that in the future.
@josias Sorry folks, I was wrong about the 34% figure. That's just the ratio of *gaming* sites that use #Cloudflare. The global count is underway and hopefully will be released soon.