

Sabot in the Age of AI

Here is a curated list of strategies, offensive methods, and tactics for (algorithmic) sabotage, disruption, and deliberate poisoning.

🔻 iocaine
The deadliest AI poison—iocaine generates garbage rather than slowing crawlers.
🔗 git.madhouse-project.org/alger…

🔻 Nepenthes
A tarpit designed to catch web crawlers, especially those scraping for LLMs. It devours anything that gets too close. @aaron
🔗 zadzmo.org/code/nepenthes/

🔻 Quixotic
Feeds fake content to bots and robots.txt-ignoring #LLM scrapers. @marcusb
🔗 marcusb.org/hacks/quixotic.htm…

🔻 Poison the WeLLMs
A reverse-proxy that serves dissociated-press style reimaginings of your upstream pages, poisoning any LLMs that scrape your content (a minimal sketch of this style of text mangling follows the list). @mike
🔗 codeberg.org/MikeCoats/poison-…

🔻 Django-llm-poison
A django app that poisons content when served to #AI bots. @Fingel
🔗 github.com/Fingel/django-llm-p…

🔻 KonterfAI
A model poisoner that generates nonsense content to degenerate LLMs.
🔗 codeberg.org/konterfai/konterf…
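
Most of these tools share the same core trick regardless of how they are deployed (tarpit, reverse proxy, Django app): statistically plausible nonsense generated from real text, usually via a Markov chain. A minimal sketch of that idea in Python, not taken from any of the projects above; the corpus filename is just an example:

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each `order`-word prefix to the words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain

def generate(chain, length=200):
    """Walk the chain to emit plausible-looking but meaningless text."""
    prefix = random.choice(list(chain.keys()))
    out = list(prefix)
    for _ in range(length):
        followers = chain.get(prefix)
        if not followers:                      # dead end: restart somewhere else
            prefix = random.choice(list(chain.keys()))
            followers = chain[prefix]
        word = random.choice(followers)
        out.append(word)
        prefix = (*prefix[1:], word)
    return " ".join(out)

if __name__ == "__main__":
    corpus = open("my_articles.txt").read()    # any text you already publish
    print(generate(build_chain(corpus)))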

in reply to Nicole Parsons

Private-sector AI infrastructure is the tech industry building the pervasive surveillance state.

All the lunatic babble from the right about the deep state, but as usual, every accusation was a confession. These guys own the state, and everybody in it is, or is going to be, working for them.

Uncountable government bureaucrats all working for people like Larry Ellison.

in reply to ASRG

@Fingel This very much reminds me of the infinitely crawlable nonsense first designed (by me) to give Microsoft Recall a headache.
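
Roughly, "infinitely crawlable nonsense" means every URL deterministically produces junk plus links to more junk, so a crawl never terminates. A toy sketch of that pattern using only the Python standard library (the word list and link scheme are made up, not how the original worked):

```python
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["lorem", "ipsum", "recall", "synergy", "vector", "quantum", "latent"]

class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Seed the RNG from the path so every URL is stable but unique.
        seed = hashlib.sha256(self.path.encode()).hexdigest()
        rng = random.Random(seed)
        text = " ".join(rng.choice(WORDS) for _ in range(300))
        # Each page links to more generated pages, so the crawl never ends.
        links = "".join(
            f'<a href="/{seed[:8]}/{i}">more</a> ' for i in range(rng.randint(3, 8))
        )
        body = f"<html><body><p>{text}</p>{links}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), MazeHandler).serve_forever()
```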
in reply to ASRG

@Fingel another take that I hope I have time to write:

An app that feeds either static text or a poisoned Markov chain, but writes it back one byte at a time, trying to delay the client as much as possible. It would probably have to start with a big delay, and every time the client disconnects it would register the IP and the delay in a DB, so next time it tries a lower delay until it finds the best delay for each client.
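
A very rough sketch of that scheme, assuming the Python standard library plus SQLite for the per-IP delay table (the constants and the behaviour when the bait is fully delivered are guesses, not a finished design):

```python
import sqlite3
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DB = sqlite3.connect("tarpit.db", check_same_thread=False)
DB.execute("CREATE TABLE IF NOT EXISTS delays (ip TEXT PRIMARY KEY, delay REAL)")
LOCK = threading.Lock()

START_DELAY = 5.0   # seconds between bytes for a client we have never seen
STEP = 0.5          # how much to lower the delay after each disconnect
BAIT = b"<html><body>" + b"nonsense " * 1000 + b"</body></html>"

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ip = self.client_address[0]
        with LOCK:
            row = DB.execute("SELECT delay FROM delays WHERE ip = ?", (ip,)).fetchone()
        delay = row[0] if row else START_DELAY
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        try:
            for byte in BAIT:               # drip the bait out one byte at a time
                self.wfile.write(bytes([byte]))
                self.wfile.flush()
                time.sleep(delay)
        except (BrokenPipeError, ConnectionResetError):
            # Client gave up: try a slightly lower delay next time.
            delay = max(0.0, delay - STEP)
        with LOCK:
            DB.execute("INSERT OR REPLACE INTO delays VALUES (?, ?)", (ip, delay))
            DB.commit()

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), TarpitHandler).serve_forever()
```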

in reply to Marcos Dione

@Fingel is there a site where some of the craziest delusions from the original LLMs are recorded? We should feed them that back.
in reply to ASRG

@Fingel I have been doing something primitive with fail2ban and a "trigger" URL. But what I see is that the latest in scraping is to use a rotating set of IPs or proxies, so requests never seem to come from the same IP address, and with plausible user agents. I'm struggling with this: although I can see the overall behaviour, it only becomes clear after the fact that a given request was part of a scrape session, and blocking that IP won't stop the remaining scrapes. Firms are offering this kind of service commercially, and there are plenty of writeups on how to do it.
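
For context, the "trigger URL" approach usually amounts to a path that robots.txt disallows and that no human-visible link points to, so any hit on it is by definition a misbehaving bot; the hit is logged in a form fail2ban can match. An illustrative WSGI sketch (the paths and log format are invented):

```python
import logging

# Any request to this path is, by construction, a robots.txt violation:
# robots.txt says "Disallow: /trap/" and no human-visible link points there.
TRAP_PREFIX = "/trap/"

logging.basicConfig(filename="bot-trap.log",
                    format="%(asctime)s %(message)s", level=logging.INFO)

def application(environ, start_response):
    """Minimal WSGI app: log trap hits so fail2ban can ban the source IP."""
    path = environ.get("PATH_INFO", "")
    ip = environ.get("HTTP_X_FORWARDED_FOR", environ.get("REMOTE_ADDR", "-"))
    if path.startswith(TRAP_PREFIX):
        logging.info("TRAP %s %s %s", ip, path, environ.get("HTTP_USER_AGENT", "-"))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"nothing to see here\n"]
```

A fail2ban filter matching the "TRAP <HOST>" lines then does the banning; as noted above, this still falls over once the scraper rotates IPs.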
in reply to Mr Salteena is not quite a gentleman

A medium-term plan for Nepenthes is to coordinate data amongst instances to conclusively identify crawlers, and hopefully allow people to ban them preemptively.

Still thinking through it. No ETA.
@asrg
in reply to ASRG

Thank you for this comprehensive list, this is great! I believe I will need one of these defenses sooner or later.

I worry a bit that looking at IP ranges and User-Agent strings might not be enough long-term. User-Agent strings can be faked easily, and requests can come from many different ranges.

Are there other reliable ways employed to identify bots, without giving away any important content (e.g. beyond landing page)? Maybe looking at behavioural data (e.g. rate of accessing URLs)?

@aaron @marcusb @mike @Fingel
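
One behavioural signal that is cheap to compute is sustained request rate per client: keep a short sliding window of timestamps and flag anything no human would produce. A minimal, illustrative sketch (the thresholds are made up, and against rotating proxies the key has to be coarser than a single IP):

```python
import time
from collections import defaultdict, deque

WINDOW = 60          # seconds of history to keep per client
MAX_HITS = 120       # sustaining more than ~2 requests/second looks like a bot

history = defaultdict(deque)   # client key -> recent request timestamps

def looks_like_a_bot(client_key: str, now: float | None = None) -> bool:
    """Record one request and report whether this client exceeds the rate limit.

    `client_key` can be an IP, but against rotating proxies it is better to
    key on something coarser (an IP range, a TLS fingerprint, a cookie...).
    """
    now = time.time() if now is None else now
    hits = history[client_key]
    hits.append(now)
    while hits and now - hits[0] > WINDOW:   # drop entries outside the window
        hits.popleft()
    return len(hits) > MAX_HITS
```

Behind a reverse proxy this would be fed from the access-log stream, with flagged clients getting a 403 or, in the spirit of this thread, poisoned content instead of the real pages.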
