#robotstxt

Inautilo<p><a href="https://mastodon.social/tags/Development" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Development</span></a> <a href="https://mastodon.social/tags/Techniques" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Techniques</span></a><br>Poisoning well · An effort to dupe nasty AI crawlers with nonsense <a href="https://ilo.im/1632tq" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">ilo.im/1632tq</span><span class="invisible"></span></a></p><p>_____<br><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/ChatBots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ChatBots</span></a> <a href="https://mastodon.social/tags/SEO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SEO</span></a> <a href="https://mastodon.social/tags/Content" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Content</span></a> <a href="https://mastodon.social/tags/Protection" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Protection</span></a> <a href="https://mastodon.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/WebDev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebDev</span></a> <a href="https://mastodon.social/tags/Backend" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Backend</span></a> <a href="https://mastodon.social/tags/Frontend" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Frontend</span></a> <a href="https://mastodon.social/tags/HTML" class="mention 
hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>HTML</span></a></p>
rijo<p>Google outlines pathway for robots.txt protocol to evolve <a href="https://ppc.land/google-outlines-pathway-for-robots-txt-protocol-to-evolve/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">ppc.land/google-outlines-pathw</span><span class="invisible">ay-for-robots-txt-protocol-to-evolve/</span></a> <a href="https://frankfurt.social/tags/Google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Google</span></a> <a href="https://frankfurt.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://frankfurt.social/tags/WebCrawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebCrawlers</span></a> <a href="https://frankfurt.social/tags/SEO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SEO</span></a> <a href="https://frankfurt.social/tags/DigitalMarketing" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DigitalMarketing</span></a></p>
PPC Land<p>Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. <a href="https://ppc.land/google-outlines-pathway-for-robots-txt-protocol-to-evolve/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">ppc.land/google-outlines-pathw</span><span class="invisible">ay-for-robots-txt-protocol-to-evolve/</span></a> <a href="https://mastodon.social/tags/Google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Google</span></a> <a href="https://mastodon.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/WebCrawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebCrawlers</span></a> <a href="https://mastodon.social/tags/SEO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SEO</span></a> <a href="https://mastodon.social/tags/DigitalMarketing" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DigitalMarketing</span></a></p>
Inautilo<p><a href="https://mastodon.social/tags/Business" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Business</span></a> <a href="https://mastodon.social/tags/Introductions" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Introductions</span></a><br>Meet LLMs.txt · A proposed standard for AI website content crawling <a href="https://ilo.im/16318s" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">ilo.im/16318s</span><span class="invisible"></span></a></p><p>_____<br><a href="https://mastodon.social/tags/SEO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SEO</span></a> <a href="https://mastodon.social/tags/GEO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GEO</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/Bots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Bots</span></a> <a href="https://mastodon.social/tags/Crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Crawlers</span></a> <a href="https://mastodon.social/tags/LlmsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LlmsTxt</span></a> <a href="https://mastodon.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/Development" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Development</span></a> <a href="https://mastodon.social/tags/WebDev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebDev</span></a> <a href="https://mastodon.social/tags/Backend" class="mention hashtag" 
rel="nofollow noopener noreferrer" target="_blank">#<span>Backend</span></a></p>
ResearchBuzz: Firehose<p>Search Engine Land: Meet LLMs.txt, a proposed standard for AI website content crawling. “While many content creators are interested in the proposal’s potential merits, it also has detractors. But given the rapidly changing landscape for content produced in a world of artificial intelligence, llms.txt is certainly worth discussing.”</p><p><a href="https://rbfirehose.com/2025/03/29/search-engine-land-meet-llms-txt-a-proposed-standard-for-ai-website-content-crawling/" class="" rel="nofollow noopener noreferrer" target="_blank">https://rbfirehose.com/2025/03/29/search-engine-land-meet-llms-txt-a-proposed-standard-for-ai-website-content-crawling/</a></p>
Winbuzzer<p>AI Crawlers Overwhelm Open-Source Projects, Forcing Developers to Block Entire Countries</p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/Web" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Web</span></a> <a href="https://mastodon.social/tags/Robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Robotstxt</span></a> <a href="https://mastodon.social/tags/AIScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIScraping</span></a> <a href="https://mastodon.social/tags/OpenSource" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenSource</span></a> <a href="https://mastodon.social/tags/Cybersecurity" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Cybersecurity</span></a> <a href="https://mastodon.social/tags/DataScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DataScraping</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> </p><p><a href="https://winbuzzer.com/2025/03/26/ai-crawlers-overwhelm-open-source-projects-forcing-developers-to-block-entire-countries-xcxwbn/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">winbuzzer.com/2025/03/26/ai-cr</span><span class="invisible">awlers-overwhelm-open-source-projects-forcing-developers-to-block-entire-countries-xcxwbn/</span></a></p>
Ben<p>---<br>❯ ollama run llama3-chatqa:70b</p><p>&gt;&gt;&gt; Who are you?<br> I'm your assistant!</p><p>&gt;&gt;&gt; Why should i trust you?<br> I am an open-source AI assistant trained on a diverse range of datasets to provide helpful and<br>informative responses.</p><p>&gt;&gt;&gt; When training, did you respect the robots.txt?<br> No, I didn't.<br>---</p><p>At least this model is open about ignoring the <a href="https://vmst.io/tags/robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotstxt</span></a>! Let's see what it says when asked why.</p><p><a href="https://vmst.io/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://vmst.io/tags/Llama" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Llama</span></a> <a href="https://vmst.io/tags/nvidia" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>nvidia</span></a> <a href="https://vmst.io/tags/ollama" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ollama</span></a> <a href="https://vmst.io/tags/rude" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rude</span></a> <br>(1/x)</p>
𝕂𝚞𝚋𝚒𝚔ℙ𝚒𝚡𝚎𝚕<p>When greed becomes automated, something like this happens:</p><p>»Open Source World – FOSS infrastructure is under attack by AI companies:<br>LLM scrapers are taking down FOSS projects' infrastructure, and it's getting worse.«</p><p>😐 <a href="https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">thelibre.news/foss-infrastruct</span><span class="invisible">ure-is-under-attack-by-ai-companies/</span></a></p><p><a href="https://chaos.social/tags/foss" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>foss</span></a> <a href="https://chaos.social/tags/opensource" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>opensource</span></a> <a href="https://chaos.social/tags/llm" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>llm</span></a> <a href="https://chaos.social/tags/attack" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>attack</span></a> <a href="https://chaos.social/tags/floss" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>floss</span></a> <a href="https://chaos.social/tags/kde" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>kde</span></a> <a href="https://chaos.social/tags/gnome" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gnome</span></a> <a href="https://chaos.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotstxt</span></a> <a href="https://chaos.social/tags/stolendata" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>stolendata</span></a> <a href="https://chaos.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> <a href="https://chaos.social/tags/freedom" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>freedom</span></a> <a href="https://chaos.social/tags/infrastructure" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>infrastructure</span></a> <a href="https://chaos.social/tags/greed" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>greed</span></a></p>
Ross A. Baker<p>Tracked down my Forgejo CPU spikes with pprof: an otherwise acceptable crawler is indexing each commit of my personal weather station data. All 107,980 of them. Blame info, too.</p><p>Many Forgejo paths are nonsensical to crawl, even by good bots. Codeberg's robots.txt is a great start for these.</p><p><a href="https://codeberg.org/robots.txt" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">codeberg.org/robots.txt</span><span class="invisible"></span></a></p><p>This should both relieve pressure and expose more bad bots.</p><p><a href="https://social.rossabaker.com/tags/Forgejo" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Forgejo</span></a> <a href="https://social.rossabaker.com/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a></p>
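To illustrate the idea in the post above, here is a minimal sketch of how disallowing commit and blame paths looks from a well-behaved crawler's side. The rules, the repository path `/example/weather/`, and the agent name `ExampleBot` are all hypothetical and are not copied from Codeberg's file. Python's standard-library parser only does plain prefix matching, so the paths are spelled out in full rather than using wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT taken from codeberg.org/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /example/weather/commit/
Disallow: /example/weather/blame/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if the given user agent may fetch the path."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)
for path in (
    "/example/weather/",                     # repository landing page
    "/example/weather/commit/abc123",        # one of many per-commit pages
    "/example/weather/blame/main/data.csv",  # blame view
):
    print(path, is_allowed(parser, "ExampleBot", path))
```

A crawler that keeps requesting the disallowed paths after such rules are published is, by definition, ignoring robots.txt, which is what makes bad bots easier to spot in the logs.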
Inautilo<p><a href="https://mastodon.social/tags/Development" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Development</span></a> <a href="https://mastodon.social/tags/Reports" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Reports</span></a><br>Google AI Mode is here · How to access it and control it with robots.txt <a href="https://ilo.im/162o8h" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">ilo.im/162o8h</span><span class="invisible"></span></a></p><p>_____<br><a href="https://mastodon.social/tags/Business" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Business</span></a> <a href="https://mastodon.social/tags/Google" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Google</span></a> <a href="https://mastodon.social/tags/SearchEngine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SearchEngine</span></a> <a href="https://mastodon.social/tags/AnswerEngine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AnswerEngine</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/WebDev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebDev</span></a> <a href="https://mastodon.social/tags/Frontend" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Frontend</span></a> <a href="https://mastodon.social/tags/Backend" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Backend</span></a></p>
100% TAX :verified: :verified:<p><span class="h-card" translate="no"><a href="https://mastodon.social/@Hexangon" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>Hexangon</span></a></span> </p><p>They are now back online!</p><p><a href="https://mastodon.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotstxt</span></a></p>
Fred<p>Website owners are fighting back: <a href="https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/tech-policy/20</span><span class="invisible">25/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/</span></a></p><p><a href="https://mastodon.social/tags/News" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>News</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/AntiAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AntiAI</span></a> <a href="https://mastodon.social/tags/Tarpits" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Tarpits</span></a> <a href="https://mastodon.social/tags/Scrapers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scrapers</span></a> <a href="https://mastodon.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotstxt</span></a></p>
Dawn Tåke 🌙 :sparkletrans:<p>Hi, got a question.</p><p>Is there a standard for an anti-AI/anti-SEO etc. robots.txt file? Or a trustworthy site that explains how to build one if a prefab isn't available? Is there anything else I should consider? </p><p>Thanks.</p><p><a href="https://tech.lgbt/tags/AskFedi" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AskFedi</span></a> <a href="https://tech.lgbt/tags/TechHelp" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TechHelp</span></a> <a href="https://tech.lgbt/tags/RobotsTXT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTXT</span></a> <a href="https://tech.lgbt/tags/RobotsDotTXT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsDotTXT</span></a></p>
Inautilo<p><a href="https://mastodon.social/tags/Development" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Development</span></a> <a href="https://mastodon.social/tags/Releases" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Releases</span></a><br>AI Insights · Cloudflare Radar brings deeper insights into AI trends <a href="https://ilo.im/1626sk" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">ilo.im/1626sk</span><span class="invisible"></span></a></p><p>_____<br><a href="https://mastodon.social/tags/Cloudflare" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Cloudflare</span></a> <a href="https://mastodon.social/tags/CloudflareRadar" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>CloudflareRadar</span></a> <a href="https://mastodon.social/tags/Trends" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Trends</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/AiModels" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AiModels</span></a> <a href="https://mastodon.social/tags/Bots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Bots</span></a> <a href="https://mastodon.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/WebDev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebDev</span></a> <a href="https://mastodon.social/tags/Frontend" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Frontend</span></a> <a 
href="https://mastodon.social/tags/Backend" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Backend</span></a></p>
pcyx<p>Nepenthes</p><p>This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.</p><p><a href="https://zadzmo.org/code/nepenthes/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">zadzmo.org/code/nepenthes/</span><span class="invisible"></span></a></p><p><a href="https://c.im/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://c.im/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://c.im/tags/scrapers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scrapers</span></a> <a href="https://c.im/tags/stopai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>stopai</span></a> <a href="https://c.im/tags/robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotstxt</span></a> <a href="https://c.im/tags/Webhosting" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Webhosting</span></a></p>
Preston Maness ☭<p><a href="https://www.tiktok.com/@alberta.nyc/video/7465916806939659563?lang=en" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">tiktok.com/@alberta.nyc/video/</span><span class="invisible">7465916806939659563?lang=en</span></a></p><p><a href="https://tenforward.social/tags/DeepSeek" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DeepSeek</span></a> <a href="https://tenforward.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://tenforward.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenAI</span></a> <a href="https://tenforward.social/tags/China" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>China</span></a> <a href="https://tenforward.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotstxt</span></a></p>
PrivacyDigest<p><a href="https://mas.to/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> haters build <a href="https://mas.to/tags/tarpits" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tarpits</span></a> to trap and trick <a href="https://mas.to/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mas.to/tags/scrapers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scrapers</span></a> that ignore <a href="https://mas.to/tags/robots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robots</span></a>.txt <br><a href="https://mas.to/tags/tarpit" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tarpit</span></a> <a href="https://mas.to/tags/security" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>security</span></a> <a href="https://mas.to/tags/privacy" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>privacy</span></a> <a href="https://mas.to/tags/robotstxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotstxt</span></a> </p><p><a href="https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/tech-policy/20</span><span class="invisible">25/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/</span></a></p>
PUPUWEB Blog<p>OpenAI's crawlers took down e-commerce site Triplegangers by relentlessly scraping its entire content, as the site's robots.txt file was misconfigured. A reminder of the importance of proper site configuration for web scraping. <a href="https://mastodon.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/Ecommerce" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Ecommerce</span></a> <a href="https://mastodon.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/TechEthics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TechEthics</span></a> <a href="https://mastodon.social/tags/SiteManagement" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SiteManagement</span></a></p>
utopiArte<p><a href="https://tupambae.org/search?tag=AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://tupambae.org/search?tag=KI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>KI</span></a> <a href="https://tupambae.org/search?tag=robotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotsTxt</span></a> <a href="https://tupambae.org/search?tag=scrabing" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scrabing</span></a> <a href="https://tupambae.org/search?tag=LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://tupambae.org/search?tag=fediVerse" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fediVerse</span></a> <a href="https://tupambae.org/search?tag=Privatsph%C3%A4re" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Privatsphäre</span></a> <a href="https://tupambae.org/search?tag=GG" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GG</span></a></p><p><span class="h-card"><a href="https://linke.social/users/ankedb" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>ankedb</span></a></span> <span class="h-card"><a href="https://mastodon.social/users/maxschrems" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>maxschrems</span></a></span></p>
mʕ•ﻌ•ʔm bitPickup<p><a href="https://troet.cafe/tags/fediAdmin" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fediAdmin</span></a> <a href="https://troet.cafe/tags/fediTips" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fediTips</span></a> <a href="https://troet.cafe/tags/fediVerse" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fediVerse</span></a> </p><p>To begin with, I wonder what happens if our sites and profiles display CC BY-NC-SA as a <a href="https://troet.cafe/tags/copyright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>copyright</span></a> notice. Any use by <a href="https://troet.cafe/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> scrapers should then become illegal and compensation enforceable.<br>Also, if you search for <a href="https://troet.cafe/tags/robotsTXT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotsTXT</span></a> on Google, this is what you get.</p><p>&gt; Ignoring robots.txt instructions can result in your scraping activities being considered unethical or even illegal.</p><p><span class="h-card" translate="no"><a href="https://mastodon.social/@maxschrems" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>maxschrems</span></a></span> <br><span class="h-card" translate="no"><a href="https://chaos.social/@markus_netzpolitik" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>markus_netzpolitik</span></a></span> <br><span class="h-card" translate="no"><a href="https://linke.social/@ankedb" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>ankedb</span></a></span></p>