Why I Don’t Block AI Scrapers

Published on Aug 29, 2024 (updated Sep 25, 2024), filed under development, misc, ai.

This is one of 180 articles that you can also read in an ebook: On Web Development II.

The basic contract of the Web seems to have been called off, with AI scrapers taking content from everywhere to train their models, regardless of content licenses and preferences, without attribution or compensation.

For an increasing number of site and content owners, this has since meant blocking AI scrapers. (For that purpose, there are also increasingly good tools, like Dark Visitors.)

I, for my part, running sites like meiert.com, Frontend Dogma, and WebGlossary.info, first tried excluding and blocking AI scrapers but ultimately stopped.

With scrapers ignoring robots.txt directives from the start, and with existing and new scrapers still ignoring robots.txt preferences, the approach isn’t just failing to work well: AI companies have probably changed the game for good.

A Different Strategy

Personally, I’m not going to engage in an arms race in which we try to block more and more scrapers. I’d rather watch this unfold legally.

Just like on your websites, the content on my websites is under specific licenses. While usually generous, some require attribution, and others specifically cover derivative use. Still, even where there’s no license specified, it’s not anyone else’s content.

So what I’m betting on instead is more legal action, by other businesses and other corporate interests, against what looks like theft.

Will this take a long time to have an effect? Very likely so.

Could this mean the respective work will never get attributed, and its owner (here, me) never be compensated for it? That seems likely, too.

Will one even be able to join any cases, to invoke one’s rights? Given how we think about law in Europe (with few class actions), probably not even that.

Still, let’s face it: If anyone walks around and copies content, to reuse it and resell it, then that’s theft regardless of whether you have put up a sign saying, “no stealing, please.” And as there hasn’t even been an unwritten “contract” with any AI company, AI scraping the Web appears to be nothing but theft.

That’s why I don’t block AI scrapers—and let thieves do thief things until our justice system(s) do justice system things.

(And yet, I may be wrong all over the place. I’ll be following the development just as you do, and perhaps make further adjustments depending on how it goes.)

About Me

I’m Jens (long: Jens Oliver Meiert), and I’m an engineering lead, guerrilla philosopher, and indie publisher. I’ve worked as a technical lead and engineering manager for companies you use every day (like Google) and companies you’ve never heard of. I’m an occasional contributor to web standards (like HTML, CSS, WCAG), and I write and review books for O’Reilly and Frontend Dogma.

I love trying things, not only in web development and engineering management, but also with respect to politics and philosophy. Here on meiert.com I talk about some of my experiences and perspectives. (Please share feedback: Interpret charitably, but do be critical.)