We Always Knew Anyone Could Take Our Content

Published on Aug 29, 2024 (updated Oct 25, 2024), filed under misc, ai. (Share this post, e.g., on Mastodon or on Bluesky.)

Everyone who publishes on the Web knows that their content can be copied and used by anyone for any purpose.

The impact of this is influenced by the quality, popularity, and value of the content (no one steals bad content), and by one’s ability to detect and take action against theft and misuse.

The Greatest Strength as the Greatest Weakness

This has always been a risk of providing content on the Web.

Over the years, this risk has moved from being a potential one to becoming more and more actual.

While there have been bad actors all along, the first major shift may have occurred with search engines: While these have forever featured content from websites to provide context for the respective results, that came with an unspoken agreement: I show your content, but you get the click.

Search engines broke this contract when they began featuring content in ways that reduced the likelihood of users visiting the original websites. Featuring content on a results page draws a fine line, but there is one.

Over the last years, we’ve been experiencing another major shift with AI actors gulping down web content to feed their models. For them, there is no contract and no line.

Here, the risk has turned into actual exploitation. AI actors would take our content, without our consent, without attribution, and use it for purposes we’re not being informed about and have no say about.

Adding insult to injury, AI actors use this content for their users to generate more content, which may compete with and bury our original material.

The Web Contract

The topic is all over the Web, but this is how I would describe what has happened to what is the web contract. ^*

Personally, I see no amicable solution that wouldn’t require respect (for content creators and owners) and trust (in search engines and scrapers).

From here on, we likely end up with some laws and their enforcement. In the end, what AI and greed will have accomplished is to destroy a good part of the respect and trust we’ve still had on the Web. While this may already be clear to everyone, it doesn’t make it any more great.

^* …or a variation thereof [though not what contractfortheweb.org describes]. The web contract as I understand it here is based on respect for people’s content and their preferences for its use (as indicated through things like the robots exclusion standard), in return for allowing people to access and make fair use of it.

About Me

I’m Jens (long: Jens Oliver Meiert), and I’m an engineering lead, guerrilla philosopher, and indie publisher. I’ve worked as a technical lead and engineering manager at various companies, including Google; I’m an open-source developer and a contributor to web standards (like HTML, CSS, WCAG); and I write and review books for O’Reilly and Frontend Dogma.

I love trying things, not only in web development and engineering management, but also with respect to politics and philosophy. Here on meiert.com I talk about some of my experiences and perspectives. (Please share feedback—interpret charitably, keep it friendly, but do be critical.)