BlueSky’s “user intents” is a good proposal, and it’s weird to see some people flaming them for it as though this is equivalent to them welcoming in AI scraping (rather than trying to add a consent signal to allow users to communicate preferences for the scraping that is already happening).

github.com/bluesky-social/prop…

#BlueSky #AI

in reply to Molly White

I think the weakness with this and Creative Commons’ similar proposal for “preference signals” is that they rely on scrapers to respect these signals out of some desire to be good actors. We’ve already seen some of these companies blow right past robots.txt or pirate material to scrape.

ietf.org/slides/slides-aicontr…

#BlueSky #AI
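
(For a concrete sense of the kind of signal at issue: the closest thing deployed today is a robots.txt directive addressed to specific AI crawlers, which compliant bots honor and bad actors simply ignore. GPTBot and CCBot are the published user agents for OpenAI's and Common Crawl's crawlers.)

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /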

in reply to Molly White

I do think that they are good technical foundations, and there is the potential for enforcement to be layered atop them.

Technology alone won’t solve this issue, nor will it provide the levers for enforcement, so it’s somewhat reasonable that they don’t attempt to.

But it would be nice to see some more proactive recognition from groups proposing these signals that enforcement is going to be needed, and perhaps some ideas for how their signals could be incorporated into such a regime.

#BlueSky #AI

in reply to Molly White

is there a reason why it would be a bad idea to just use IP block lists for known bad actors?
in reply to flaeky pancako

@fleeky that could be part of it, but it’s a game of whack-a-mole and AI companies are used to circumventing IP blocks. plus with federated protocols, maintaining and enforcing a blocklist is considerably more challenging.
in reply to Molly White

i ran an email server for like 10-15 years and with SpamAssassin i just never had spam in my personal email..

spamassassin.apache.org/

maybe it wouldn't be foolproof, but every time they get through you plug the hole. it's a game of whack-a-mole, but it costs them money..

in reply to flaeky pancako

side note, i enjoyed this random reddit user's idea:
*random reddit user*

The best technique I've seen to combat this is:

Put a random, bad link in robots.txt. No human will ever read this.

Monitor your logs for hits to that URL. All those IPs are LLM scraping bots.

Take that IP and tarpit it.
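
(A minimal sketch of that recipe, assuming an nginx-style combined access log; the decoy path, log location, and filename are all hypothetical.)

    # scan_honeypot.py - collect the IPs that fetched the robots.txt decoy.
    #
    # The matching robots.txt entries would be:
    #   User-agent: *
    #   Disallow: /honeypot-do-not-crawl/
    #
    # The decoy path is listed only in robots.txt and linked from nowhere
    # else, so anything requesting it read robots.txt and ignored the
    # Disallow line.
    import re

    LOG_PATH = "/var/log/nginx/access.log"
    DECOY = "/honeypot-do-not-crawl/"

    # In combined log format, the first field is the client IP and the
    # request line is the first quoted field: "GET /path HTTP/1.1".
    line_re = re.compile(r'^(\S+) .*?"(?:GET|HEAD) (\S+)')

    offenders = set()
    with open(LOG_PATH) as log:
        for line in log:
            m = line_re.match(line)
            if m and m.group(2).startswith(DECOY):
                offenders.add(m.group(1))

    # Hand these off to your firewall or tarpit of choice.
    for ip in sorted(offenders):
        print(ip)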

in reply to flaeky pancako

@fleeky this assumes that scrapers are a) looking at robots.txt at all and b) forming their crawl strategy based on disallowed paths, which seems unlikely
in reply to Molly White

If the only thing you care about is sucking up as much data as possible, any path you know about is something to raid
@fleeky
in reply to Molly White

what if there isn't any legal lever for enforcement? Or do you see some other potential?
in reply to Molly White

I think you are right. I am not holding my breath, though; the state of the world makes that seem far away.
in reply to Molly White

I keep thinking the technical part should be for the server to enforce acceptance of the terms: require something like a cookie tied to the client's TLS cert, keep a server-side record that that cert has accepted the terms, and redirect anyone without such a record to the page that offers the terms. That way there's no possibility of a reasonable belief that something being openly accessible implies any broader terms of use than were explicitly agreed to.
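
(Roughly, as a sketch: assume the TLS-terminating proxy forwards the client certificate's fingerprint in a request header; the header name, the acceptance store, and the paths here are all hypothetical.)

    # terms_gate.py - serve nothing to a client whose TLS cert has no
    # recorded acceptance of the terms. WSGI middleware; assumes the
    # reverse proxy passes the cert fingerprint in a header.

    accepted_certs = set()  # in practice, a persistent record

    def terms_gate(app):
        def middleware(environ, start_response):
            fp = environ.get("HTTP_X_CLIENT_CERT_FINGERPRINT")
            if environ.get("PATH_INFO", "") == "/terms":
                # The terms page is always reachable; a POST here records
                # acceptance for this cert.
                if environ.get("REQUEST_METHOD") == "POST" and fp:
                    accepted_certs.add(fp)
                return app(environ, start_response)
            if fp is None or fp not in accepted_certs:
                # No recorded acceptance: redirect to the terms page.
                start_response("302 Found", [("Location", "/terms")])
                return [b""]
            return app(environ, start_response)
        return middleware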
in reply to Molly White

At least in the EU this could probably be constructed to have legal recognition under the DSM Text and Data Mining exemption.
in reply to Molly White

The disregard for enforcement issues is mostly a consequence of such proposals never starting from a data protection perspective, because that would lead to entirely different conceptualizations.

Instead, concepts are based on lists of "who wants to ingest, and for what", because the goal is to make data "available" to the respective entities.

This takes away the legitimization burden from those entities, and puts the consequences onto the shoulders of individuals: we're expected to familiarize ourselves with a growing typology of data krakens, and what stance we should take towards each of them.

Proposals like the ones you've mentioned also normalize this work, as if said entities were absolutely entitled to us doing it.

in reply to katzenberger 🇺🇦

@katzenberger I think it makes sense for projects like CC and Bluesky/ATproto to start from this perspective because they are fundamentally open. I agree that closed ecosystems should take a very different approach.
in reply to Molly White

Interesting. I'd call individuals and their communities on platforms "closed" with respect to their reason for being there.

IMHO what matters isn't the "open" technicalities of the platform that hosts them. Open protocols that facilitate data exchange between servers are not per se a kind of permission to tap into the exchange.

In that respect, communities and their platforms cannot be considered an "ecosystem" with the same "openness" rules applying to both components.

E.g., recently, I've started to become very suspicious of "APIs" being defined that have but one purpose, from a developer's perspective: extracting content, and helping with ingestion.

This is in total disregard for what brings individuals and communities together (informal, psychologically safe, and free conversation among like-minded people).

It subjugates communities to technical considerations, refusing to abide by conventions (or even laws): You don't want me to get hold of your "content"? Find a technically watertight way to prevent me from doing it, or be ready to be ridiculed for your naivete.

in reply to Molly White

We'd have to convince lots of people and companies to use something like Solid pods, or adopt standards.ieee.org/ieee/7012/7…
in reply to an unknown parent

Molly White

@hellpie I agree that hoping crawlers will respect these signals out of the pureness of their hearts is not sufficient.

I disagree that copyright is “the legally binding version of user preferences”, though. There is no good legal framework for consent when it comes to AI training, and so a lot of people are falling back to copyright, since there are laws on the books. But I don’t think copyright is the weapon with which to fight this battle — in my view, AI training pretty clearly falls under fair use (and should).
