Sad ideas for the AI data scraping era:

* "Access to this website is only available from a library. Find your nearest library here: "
* "To access data from this website, please send your SD card and a pre-stamped envelope"

Any other ideas?

in reply to Nina Kalinina

If it's a "darknet" website, you'd have to meet a stranger in a dark parking garage and you'd be handed a folder with the data.
Nina Kalinina
@adipoeserPursch floppy disks are an option, though they're far less durable in the post. I can do CDs, too.
in reply to Nina Kalinina

(I used to have a GPS where in order to update its maps, you had to request the company mail you a new SD card)
in reply to Nina Kalinina

This website is only available for access between 08.30 and 10.00 weekday mornings.

No, wait... that's my GP's booking website.

in reply to Jonathan Lamothe

@me When they did appointments by phone, they had two short (60-90 minute) periods in a day when you could phone up for an urgent appointment that day: one early, one at lunchtime. The rest of the time, you could only book an appointment for two weeks' time.

I guess they kept the same process for the website.

in reply to Nina Kalinina

One more:

For one-to-many limited communication:

* subscribe to the magazine, receive a copy with a CD, send your feedback over mail, we will reroute the messages to the authors

in reply to Nina Kalinina

"To use this website, please email xyz@example.com with your source IP address and the time that you would like to access it."
in reply to Nina Kalinina

This website comes as a downloadable zip archive (index made with frameset/frame)
in reply to Nina Kalinina

@Nina Kalinina Host over Gemini or Gopher instead? Sure, they can be scraped too, but is anyone actually doing that?

I'm inclined to just write the web off as a lost cause at this point.

in reply to Jonathan Lamothe

@Nina Kalinina Also, aren't there tools that detect AI scrapers and instead of blocking them, they just feed them garbage to poison the training data? I fully endorse the use of such tools.
in reply to Nina Kalinina

@me Also, I'm unaware of any projects that feed "poison" text in the same way as those image poisoners. There are projects to generate absolute rubbish text, but compared to the image-poisoning work it's orders of magnitude less harmful to the LLM trained on it, just statistical noise.
I would *love* to see a comparable project that generated memetically poisonous text.
in reply to Nina Kalinina

the equivalent of a dead drop:

Be at this IP (somewhere in AWS space) at 10:05 UTC

listen on UDP, port 69

A packet will be transmitted three times, containing the url for the dead drop.
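For anyone who wants to play along, the receiving end of that dead drop could look roughly like this. It is only a sketch: `open_listener` and `receive_drop` are made-up names, the payload is assumed to be a plain UTF-8 URL, and the three copies are reconciled by majority vote, which the post does not specify. Binding to port 69 normally needs elevated privileges, and it is also the TFTP port.

```python
import socket
from collections import Counter


def open_listener(port: int = 69, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket (hypothetical helper; port 69 usually needs root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_drop(sock: socket.socket, copies: int = 3, timeout: float = 5.0) -> str:
    """Collect up to `copies` datagrams and return the most common payload."""
    sock.settimeout(timeout)
    seen = Counter()
    for _ in range(copies):
        try:
            data, _addr = sock.recvfrom(2048)
        except socket.timeout:
            break  # UDP is lossy; work with whatever copies arrived
        # Assumption: the packet body is just the URL as UTF-8 text.
        seen[data.decode("utf-8", errors="replace").strip()] += 1
    if not seen:
        raise TimeoutError("no packet arrived; you missed the drop")
    return seen.most_common(1)[0][0]
```

Sending the packet three times is what makes this tolerable over UDP: one or two copies can get lost and the listener still ends up with the URL.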

in reply to Nina Kalinina

website can only be accessed via a mobile phone with live GPS showing it isn't within x km of any known data centre.
in reply to Nina Kalinina

be so unremarkable that nothing indexes you in the first place

apart from that i can't think of a thing that'd deter scraping without being very annoying for either the website owner or the user

in reply to evv42

@evv42 my little, unindexed web server was killed by ddos from scrapers. 🙁 and all it hosted was some old C sources in archives
in reply to evv42

maybe making it accessible only through something like gopher?
in reply to evv42

@evv42 I'm pretty sure gopher is scraped already. But it probably helps with the ddos from scrapers
in reply to Nina Kalinina

I haven't looked closely into this, just saw it on my feed a few min ago, but seems interesting: github.com/TecharoHQ/anubis
in reply to vae

@bex tis used to have GenAI mascot :3 made by GenAI lovers who came to realise how damaging GenAI is, resulting in making web inaccessible to old phones and computers ❤ :akko_yay:
@vae
in reply to Nina Kalinina

wait.. is this tool making the web inaccessible to oldies or did they make the tool to try to make the web more accessible to oldies? My brain isn't braining very good right now.. ​:neocat_laugh_tears:
in reply to vae

@bex it requires CPU power to open a website. No CPU power no website for you
@vae
Sundew
@troublewithwords Off topic, but reminiscent!
youtube.com/watch?v=MRV8mFWwtS…
in reply to Nina Kalinina

In Clifford Stoll's book "The Cuckoo's Egg" he tricks a hacker into sending a written letter by posting a bulletin "to receive more information about the SDInet program, mail a letter to blank and we will send a packet back." Of course everyone on the internet was *too* human at that point.
in reply to Nina Kalinina

- this website is available only if your browser doesn't use javascript
- this website requires macromedia flash player
- this website is only available over ipv6 (I don't think I've seen any scraper bots there so far?)
in reply to Nina Kalinina

* This website uses a barter system. Please upload 2.7 attention-minutes of content to proceed.
in reply to Nina Kalinina

I think forcing TOS per TLS would be helpful

mastodon.social/@ShadSterling/…


I keep thinking the technical part should be for the server to enforce acceptance of the terms by requiring something like a cookie that matches the TLS cert and the server's record that that cert has accepted the terms, and if that's not the case you get redirected to the page that offers the terms. That way there's no possibility of a reasonable belief that something being openly accessible implies any broader terms of use than were explicitly agreed to.
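A rough sketch of that gate, under some loud assumptions: "the TLS cert" is read here as a client certificate identified by its fingerprint, the cookie is an HMAC of that fingerprint under a server secret, and the acceptance record is an in-memory set. `SERVER_SECRET`, `TERMS_PATH`, `accept_terms` and `route` are names invented for the example, not part of any real server.

```python
import hashlib
import hmac
from typing import Optional, Tuple

SERVER_SECRET = b"replace-me"   # hypothetical; keep a real secret out of source
TERMS_PATH = "/terms"           # hypothetical page that offers the terms

# Server-side record of certificate fingerprints that accepted the terms.
accepted_fingerprints = set()


def cookie_for(cert_fingerprint: str) -> str:
    """Derive the cookie value that 'matches' a given certificate."""
    return hmac.new(SERVER_SECRET, cert_fingerprint.encode(), hashlib.sha256).hexdigest()


def accept_terms(cert_fingerprint: str) -> str:
    """Record acceptance for this certificate and return the cookie to set."""
    accepted_fingerprints.add(cert_fingerprint)
    return cookie_for(cert_fingerprint)


def route(cert_fingerprint: Optional[str], cookie: Optional[str], path: str) -> Tuple[str, str]:
    """Serve `path` only if the cert is on record AND the cookie matches it."""
    if cert_fingerprint is None or cert_fingerprint not in accepted_fingerprints:
        return ("redirect", TERMS_PATH)
    expected = cookie_for(cert_fingerprint)
    if cookie is None or not hmac.compare_digest(cookie.encode(), expected.encode()):
        return ("redirect", TERMS_PATH)
    return ("serve", path)
```

Both checks have to pass, so a cookie copied to a client presenting a different certificate lands back on the terms page.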

in reply to Nina Kalinina

Somehow encrypt the website, require a password that can only be obtained by email and passing the various hoops the site administrator sets you.
in reply to Nina Kalinina

* "Visit this brick wall in a public park with an inconspicuous usb-c socket to gain access"
* "Tune in to this radio frequency to get a data drop being cycled every hour on the hour"