Sabot in the Age of AI
Here is a curated list of strategies, offensive methods, and tactics for (algorithmic) sabotage, disruption, and deliberate poisoning.
🔻 iocaine
The deadliest AI poison—iocaine generates garbage rather than slowing crawlers.
🔗 git.madhouse-project.org/alger…
🔻 Nepenthes
A tarpit designed to catch web crawlers, especially those scraping for LLMs. It devours anything that gets too close. @aaron
🔗 zadzmo.org/code/nepenthes/
🔻 Quixotic
Feeds fake content to bots and robots.txt-ignoring #LLM scrapers. @marcusb
🔗 marcusb.org/hacks/quixotic.htm…
🔻 Poison the WeLLMs
A reverse proxy that serves dissociated-press style reimaginings of your upstream pages, poisoning any LLMs that scrape your content. @mike
🔗 codeberg.org/MikeCoats/poison-…
🔻 Django-llm-poison
A Django app that poisons content when served to #AI bots. @Fingel
🔗 github.com/Fingel/django-llm-p…
🔻 KonterfAI
A model poisoner that generates nonsense content to degenerate LLMs.
🔗 codeberg.org/konterfai/konterf…
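Several of the tools above serve generated nonsense instead of real pages. As a rough illustration of the "dissociated press" idea only, and not the code of any project listed here, a word-level Markov chain can be sketched in a few lines. The names `build_chain` and `babble` and the default chain order of 2 are assumptions made for this example.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the words that follow it in the source."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def babble(chain, length=30, seed=None):
    """Walk the chain to produce `length` words of plausible-looking nonsense."""
    if not chain:
        raise ValueError("source text is too short for this chain order")
    rng = random.Random(seed)
    states = sorted(chain)            # stable order so a seed is reproducible
    order = len(states[0])
    out = list(rng.choice(states))
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            # Dead end (the last words of the source): jump to a random state.
            out.extend(rng.choice(states))
            continue
        out.append(rng.choice(followers))
    return " ".join(out[:length])
```

Feeding it the text of a real page, for example `babble(build_chain(page_text), length=200)`, yields text that reuses the page's vocabulary and local word order but not its meaning.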
Mr_Hat_2010
in reply to ASRG
@Fingel maybe add Glaze to the list?
glaze.cs.uchicago.edu/index.ht…
Glaze - Protecting Artists from Generative AI
glaze.cs.uchicago.edu
Nicole Parsons
in reply to ASRG
@Fingel
Larry Ellison, Oracle, and fossil fuel funded fascism...
cbsnews.com/news/trump-announc…
sfchronicle.com/tech/article/p…
washingtonpost.com/politics/20…
arstechnica.com/information-te…
propublica.org/article/project…
oracle.com/jo/news/announcemen…
arstechnica.com/tech-policy/20…
Larry Ellison chips in a cool billion toward Musk’s Twitter takeover
Financial Times (Ars Technica)
GhostOnTheHalfShell
in reply to Nicole Parsons
Private sector AI infrastructure is tech building the pervasive surveillance state.
All the lunatic babble from the right about the deep state, but as usual, every accusation was a confession. These guys own the state, and everybody in it is or is going to be working for them.
Uncountable government bureaucrats all working for people like Larry Ellison.
Lord Matt ✔️✔️✔️✔️✔️
in reply to ASRG
Marcos Dione
in reply to ASRG
@Fingel another take that I hope I have time to write:
An app that feeds either static text or a poisoned Markov chain, but writes back one byte at a time and tries to delay the client as much as possible. It would probably have to start with a big delay, and every time the client disconnects, it registers the IP and the delay in a db, so next time it tries a lower delay until it finds the best delay for each client.
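A minimal sketch of that idea, under stated assumptions: the "db" is an in-memory dict, the delay is halved after each early disconnect (the post does not say how fast to lower it), and `DelayBook`, `drip`, and the `send` callback are hypothetical names. A real server would plug its own socket write in as `send`.

```python
import time

class DelayBook:
    """In-memory stand-in for the per-client delay database (assumption)."""

    def __init__(self, start=30.0, floor=0.5, factor=0.5):
        self.start = start      # delay tried first for an unknown client
        self.floor = floor      # never go below this many seconds per byte
        self.factor = factor    # how much to shrink the delay after a disconnect
        self.delays = {}        # ip -> delay to use on the next visit

    def delay_for(self, ip):
        return self.delays.get(ip, self.start)

    def record_disconnect(self, ip):
        """Client gave up early: try a shorter delay next time."""
        self.delays[ip] = max(self.floor, self.delay_for(ip) * self.factor)

    def record_completed(self, ip):
        """Client sat through the whole response: keep using this delay."""
        self.delays[ip] = self.delay_for(ip)

def drip(payload, ip, book, send, sleep=time.sleep):
    """Send `payload` one byte at a time, pausing between bytes.

    Returns True if the client received everything, False if it hung up.
    """
    delay = book.delay_for(ip)
    try:
        for i in range(len(payload)):
            send(payload[i:i + 1])
            sleep(delay)
    except (BrokenPipeError, ConnectionResetError):
        book.record_disconnect(ip)
        return False
    book.record_completed(ip)
    return True
```

With these defaults a client that keeps hanging up is offered 30 s, then 15 s, then 7.5 s per byte, and so on, until it stays connected; that last value is the longest delay found to hold that client.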
Marcos Dione
in reply to Marcos Dione
Mr Salteena is not quite a gentleman
in reply to ASRG
Aaron
in reply to Mr Salteena is not quite a gentleman
A medium-term plan for Nepenthes is to coordinate data among instances to conclusively identify crawlers, and hopefully allow people to ban them preemptively.
Still thinking through it. No ETA.
@asrg
Floppy 💾
in reply to ASRG
Thank you for this comprehensive list, this is great! I believe I will need one of these defenses sooner or later.
I worry a bit that looking at IP ranges and User-Agent strings might not be enough long-term. User-Agent strings are easy to fake, and crawlers can move to different IP ranges.
Are there other reliable ways to identify bots without giving away any important content (e.g. anything beyond the landing page)? Maybe looking at behavioural data (e.g. the rate of accessing URLs)?
@aaron @marcusb @mike @Fingel
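The rate idea in the question above could be sketched as a sliding-window counter per client. This is only an illustration of the behavioural signal, not a claim that it is reliable on its own or that any tool in the list works this way; `RateWatcher`, the 20-requests-per-10-seconds threshold, and the choice of client key are all assumptions.

```python
from collections import defaultdict, deque

class RateWatcher:
    """Flag clients that request more than `max_hits` URLs within `window` seconds."""

    def __init__(self, max_hits=20, window=10.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)   # client key -> recent request timestamps

    def looks_automated(self, client, now):
        """Record one request at time `now` (seconds) and report the verdict."""
        recent = self.hits[client]
        recent.append(now)
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_hits
```

In a request handler this might be called as `watcher.looks_automated(client_key, time.monotonic())`, where `client_key` could be an IP address, a session cookie, or some combination, and a flagged client is routed to a tarpit instead of the real page.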