Sabot in the Age of AI
A list of offensive methods & strategic approaches for facilitating (algorithmic) sabotage, framework disruption, & intentional data poisoning.
Selected Tools & Frameworks
- Nepenthes — Endless crawler trap.
- Babble — Standalone LLM crawler tarpit.
- Markov Tarpit — Traps AI bots & feeds them useless data.
- Sarracenia — Loops bots into fake pages.
- Antlion — Express.js middleware for infinite sinkholes.
- Infinite Slop — Garbage web page generator.
- Poison the WeLLMs — Reverse proxy for LLM confusion.
- Marko — Dissociated Press CLI/lib.
- django-llm-poison — Serves poisoned content to crawlers.
- konterfAI — Model-poisoner for LLMs.
- Quixotic — Static site LLM confuser.
- toxicAInt — Replaces text with slop.
- Iocaine — Defense against unwanted scrapers.
- Caddy Defender — Blocks bots & pollutes training data.
- GzipChunk — Inserts compressed junk into live gzip streams.
- Chunchunmaru — Go-based web scraper tarpit.
- IED — ZIP bombs for web scrapers.
- FakeJPEG — Endless fake JPEGs.
- Pyison — AI crawler tarpit.
- HalluciGen — WP plugin that scrambles content.
- Spigot — Hierarchical Markov page generator.
This is a living resource — regularly updated to reflect the shifting terrain of collective techno-disobedience and algorithmic Luddism.
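Several of the tools above (Markov Tarpit, Marko, Spigot) rest on the same trick: an order-n Markov chain trained on real text emits grammatical-looking slop that is statistically plausible but semantically empty, which is cheap to generate and expensive for a crawler to filter. A minimal sketch of the technique (my own illustration, not any listed tool's actual code):

```python
import random

def build_chain(text, order=2):
    """Map each word-tuple prefix to the words observed after it."""
    words = text.split()
    chain = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain.setdefault(key, []).append(words[i + order])
    return chain

def generate(chain, length=50, seed=None):
    """Walk the chain to emit plausible-looking but meaningless prose."""
    rng = random.Random(seed)
    key = rng.choice(list(chain))
    out = list(key)
    for _ in range(length - len(out)):
        choices = chain.get(tuple(out[-len(key):]))
        if not choices:
            # Dead end: restart from a random prefix.
            out.extend(rng.choice(list(chain)))
            continue
        out.append(rng.choice(choices))
    return " ".join(out[:length])
```

A tarpit then serves `generate()` output on every URL, so each page a crawler follows yields more unique garbage to ingest.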
Mr_Hat_2010 (in reply to ASRG):
@Fingel maybe add Glaze to the list?
glaze.cs.uchicago.edu/index.ht…
Glaze - Protecting Artists from Generative AI (glaze.cs.uchicago.edu)

Nicole Parsons (in reply to ASRG):
@Fingel
Larry Ellison, Oracle, and fossil fuel funded fascism...
cbsnews.com/news/trump-announc…
sfchronicle.com/tech/article/p…
washingtonpost.com/politics/20…
arstechnica.com/information-te…
propublica.org/article/project…
oracle.com/jo/news/announcemen…
arstechnica.com/tech-policy/20…
Larry Ellison chips in a cool billion toward Musk’s Twitter takeover (Financial Times, via Ars Technica)
Matt 🔶 (LordMatt) (in reply to ASRG):

Marcos Dione (in reply to ASRG):
@Fingel another take that I hope I have time to write:
An app that serves either static text or poisoned Markov-chain output, but writes it back one byte at a time, delaying the client as much as possible. It would probably have to start with a big delay; every time the client disconnects, it records the IP and the delay in a database, so that next time it tries a lower delay, until it finds the best delay for each client.
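The idea above can be sketched as follows. The starting delay, backoff factor, and helper names are my own assumptions; the in-memory dict stands in for the database the comment suggests:

```python
import asyncio

START_DELAY = 5.0   # assumed initial per-byte delay, deliberately too slow
BACKOFF = 0.8       # assumed shrink factor after a client gives up
FLOOR = 0.05        # never drip faster than this

def tune_delay(table, ip, start=START_DELAY):
    """Look up the delay we last settled on for this client."""
    return table.get(ip, start)

def record_disconnect(table, ip, delay, backoff=BACKOFF, floor=FLOOR):
    """Client bailed out: try a slightly faster drip next time."""
    table[ip] = max(delay * backoff, floor)

PAYLOAD = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
           + b"<p>slop</p>" * 1000)

best_delay = {}  # ip -> seconds between bytes; a real DB would persist this

async def drip(reader, writer):
    """Serve the payload one byte at a time at the client's tuned delay."""
    ip = writer.get_extra_info("peername")[0]
    delay = tune_delay(best_delay, ip)
    try:
        for b in PAYLOAD:
            writer.write(bytes([b]))
            await writer.drain()
            await asyncio.sleep(delay)
    except ConnectionError:
        # Client disconnected before the end: remember a lower delay.
        record_disconnect(best_delay, ip, delay)
    finally:
        writer.close()

async def main(port=8080):
    server = await asyncio.start_server(drip, "0.0.0.0", port)
    async with server:
        await server.serve_forever()
```

Each disconnect ratchets that client's delay down by the backoff factor, converging on the slowest drip the crawler will tolerate.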
Marcos Dione (in reply to Marcos Dione):

Mr Salteena is not quite a gentleman (in reply to ASRG):

Aaron (in reply to Mr Salteena is not quite a gentleman):
A medium-term plan for Nepenthes is to coordinate data among instances to conclusively identify crawlers, and hopefully allow people to ban them preemptively.
Still thinking through it. No ETA.
@asrg
[moved] Floppy 💾 (in reply to ASRG):
Thank you for this comprehensive list; this is great! I believe I will need one of these defenses sooner or later.
I worry a bit that looking at IP ranges and User-Agent strings might not be enough long-term. User-Agents can be faked easily, and crawlers can switch to different IP ranges.
Are there other reliable ways to identify bots without giving away any important content (e.g. beyond the landing page)? Maybe looking at behavioural data, such as the rate at which URLs are accessed?
@aaron @marcusb @mike @Fingel
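One behavioural signal mentioned above, request rate, is straightforward to profile: humans rarely fetch dozens of distinct URLs in a few seconds, while crawlers routinely do. A minimal sliding-window sketch; the window size and threshold are illustrative guesses, not tuned values:

```python
import time
from collections import defaultdict, deque

class RateProfiler:
    """Flag clients whose request rate exceeds a human-plausible ceiling."""

    def __init__(self, window=10.0, max_requests=30):
        self.window = window              # seconds of history to keep
        self.max_requests = max_requests  # illustrative threshold
        self.hits = defaultdict(deque)    # client_id -> request timestamps

    def observe(self, client_id, now=None):
        """Record a request; return True if the client now looks like a bot."""
        now = time.monotonic() if now is None else now
        q = self.hits[client_id]
        q.append(now)
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_requests
```

A positive result could route the client into a tarpit rather than block it outright, which avoids tipping off the operator that they have been detected.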