@datenschutzratgeber Luckily we're not on a slippery slope here. We can easily nix the worst ~34% and still function. At large scales, nixing the 34% big offenders would have the effect of shrinking the 34% (sites want to be found).
@cryptoxic that's a CF site, and archive.org is down right now, so visiting https://web.archive.org/web/www.ul.com/news/build-trust-3d-manufactured-buildings-ul-3401 fails. Thus that article is unreachable from the free world.
@datenschutzratgeber @onepict just as DDG is a metasearch engine, so would be any app that harvests results from other engines. Analogous to a search app would be using #YaCy, perhaps in combination with searx, but those tools don't have a "filter out Cloudflare" switch; that would still need to be created. It's feasible, but in the end it doesn't solve the problem of getting loyal #DDG users off false-privacy.
@datenschutzratgeber #Cloudflare results are irrelevant to privacy seekers. There is plenty of choice for privacy-ambivalent users (Google, DuckDuckGo, Bing, Yahoo [#DDG is falsely positioned]). Filtering CF sites doesn't need a site index, just CIDRs. Ss filters out CF sites just fine, and that's basically a garage operation on a shoestring budget.
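The CIDR approach can be sketched in a few lines of Python. This is a minimal illustration, not any engine's actual code: the ranges below are a small subset of Cloudflare's published IP list, and the sample result IPs are made up.

```python
import ipaddress

# A few of Cloudflare's published IPv4 ranges (illustrative subset;
# the full, current list lives at https://www.cloudflare.com/ips/).
CLOUDFLARE_CIDRS = [
    ipaddress.ip_network(c) for c in (
        "104.16.0.0/13",
        "172.64.0.0/13",
        "173.245.48.0/20",
        "188.114.96.0/20",
    )
]

def is_cloudflare(ip: str) -> bool:
    """True if the address falls inside any known Cloudflare CIDR."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_CIDRS)

# Hypothetical result IPs (in practice you'd resolve each result's
# hostname via DNS first); CF-hosted ones get filtered or demoted.
results = ["1.2.3.4", "104.16.132.229"]
kept = [ip for ip in results if not is_cloudflare(ip)]
```

No site index needed: membership in a handful of network ranges is the whole test.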
"'Fuck off!' says Pink Floyd's @rogerwaters to Mark Zuckerberg.
After being offered "a large amount of money" to allow the use of 'Another Brick in the Wall' to promote Instagram & Facebook.
Speaking at another 'Free Assange' forum."
@datenschutzratgeber to say "A search engine should not filter ANY sites" grossly misunderstands the purpose of a search engine. If a search were to return the whole index known to the engine, you would learn very quickly the value of filtering search results. Filtering is the core activity of what search engines do.
@datenschutzratgeber If you know of a client-side app that can push junk results down to page 20 so the server can send /all/, I'd be keen to know about it. Until then, we count on search engines not just to find shit but also to organize it, putting the best results in view and hiding the garbage. Research has shown that a link in search results is *twice* as likely to get clicked as the link below it.
@boud glad to see it made it onto SWH. i'm surprised it renders the markdown for README.md but not for forge_comparison.md or github.md. That may undermine the possibility of citing only the SWH location in journals, since the reader may not automatically think to go to the original source from there.
@Coffee Ss filters out #Cloudflare sites & folds them at the bottom of the results. Clearnet users can reach it at https://sercxi.eu.org/ & Tor users can visit https://sercxi.nnpaefp7pkadbxxkhz2agtbv2a4g5sgo2fbmv3i7czaua354334uqqad.onion/ This is the best search engine in the world. It puts #DuckDuckGo to shame and it's better than any #searx instance i've used.
@anniemo71 @davidoclubb Actually #DDG is a poor choice for #privacy. http://techrights.org/2021/03/15/duckduckgo-in-2021/
#Cloudflare now MitMs more than *1/3rd* of the world's websites, a staggering 34%. And yet privacy seekers naively continue to use so-called "privacy focused" search engines like #DuckDuckGo, which neglects to filter Cloudflare sites out of the results.
@boud @VickyRampin @switchingsoftware @codeberg thanks for adding it to that list. I didn't know specific repos could be requested, so i've added the whole list of forges to this page: https://wiki.softwareheritage.org/wiki/Suggestion_box:_source_code_to_add
@boud @VickyRampin btw here's the link to the zip file: https://web.archive.org/web/20210613163219/https://dl.acm.org/doi/10.1145/2911988
@VickyRampin @boud that's not just a case of missing tex source code; that project also involved writing a bot that crawled the web for banks' privacy policies. I would love to see that code and the raw data, but it was lost. If it had been integrated into the PDF, it would have persisted.
@boud @VickyRampin Consider this failure case regarding a study at Carnegie Mellon by Dr. Cranor: https://web.archive.org/web/20170329100626/dl.acm.org/citation.cfm?id=2911988 They attempted to distribute a zip file w/ all the raw data but they botched it: the zip contains only the same PDF file that sits next to it. Really there was no need for the zip; the PDF should have included the files.
@VickyRampin @boud i often embed the tex files in the PDF produced by the tex code for ad hoc collaboration, particularly if I don't know if the recipient knows what to do with a tex file. And in some cases the other person extracts the tex files, makes some edits, and sends them back to me.
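A minimal sketch of that embedding, assuming the attachfile package and a hypothetical main.tex (the filename and description are placeholders):

```latex
\documentclass{article}
% attachfile stores files inside the PDF as standard attachments
\usepackage{attachfile}
\begin{document}
Source for this document:
% clickable text in the PDF; the .tex file travels with it
\textattachfile[description={LaTeX source}]{main.tex}{main.tex}
\end{document}
```

A recipient with any mainstream PDF viewer can extract the attachment, edit it, and send it back.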
@boud @VickyRampin If the attachfile pkg is incompatible, another approach is to have the makefile run a "pdftk attach_files" command. I think a PDF gets more widely distributed and outlives the *.tex that it came from, so embedding the source in the PDF ensures that the source inherits the longevity of the PDF in a reproducible way.
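A makefile rule along those lines might look like this (a sketch; paper.pdf, main.tex, and refs.bib are placeholder names, and pdftk must be installed):

```makefile
# Build the normal PDF first, then emit a copy with the sources attached.
paper-src.pdf: paper.pdf main.tex refs.bib
	pdftk paper.pdf attach_files main.tex refs.bib output paper-src.pdf
```

This keeps the authoring workflow unchanged and makes source-embedding a one-line post-processing step.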
@werwolf @fedeproxy @dachary If #fedeproxy made #Gitlab.com bug reports viewable, rendering archive.org redundant, that would be a great thing.
@dachary @fedeproxy @werwolf Note that #Gitlab.com is a much bigger offender here. Gitlab is a #Cloudflared tor-hostile walled-garden, where I cannot even /view/ bug reports (unless I were to hypothetically solve their captcha)