Why I Don’t Block AI Scrapers

Published on August 29, 2024.

The basic contract of the Web seems to have been called off, with AI scrapers taking content from everywhere to train their models, regardless of content licenses and preferences, and without attribution or compensation.

For an increasing number of site and content owners, this has meant blocking AI scrapers. (For that purpose, there are also increasingly good tools, like Dark Visitors.)

I, for my part, running sites like meiert.com, Frontend Dogma, and WebGlossary.info, first tried excluding and blocking AI scrapers but ultimately stopped.

Scrapers ignored robots.txt directives from the start, and we keep seeing existing and new scrapers that ignore robots.txt preferences. So the approach isn’t just working poorly; AI companies have probably changed the game for good.
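To illustrate why robots.txt is only a request, here is a minimal Python sketch using the standard library’s `urllib.robotparser`. The robots.txt content, the example URL, and the `is_allowed` helper are illustrative assumptions, not taken from the sites mentioned above; `GPTBot` merely stands in for any AI crawler’s user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler runs on its own side
    before requesting a page. Nothing on the server enforces it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-post/"  # placeholder URL
    print("GPTBot allowed:", is_allowed("GPTBot", url))
    print("Other agent allowed:", is_allowed("SomeBrowser", url))
```

The point of the sketch is where the check runs: on the crawler’s side. A scraper that never performs such a check, or that disregards the result, simply fetches the page anyway.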

Personally, I’m not going to engage in an arms race of trying to block more and more scrapers. I’d rather watch this unfold legally.

Just like on your websites, the content on my websites is under specific licenses. While usually generous, some require attribution, and others specifically cover derivative use. Still, even where there’s no license specified, it’s not anyone else’s content.

So what I’m betting on instead is more legal action, by other businesses and other corporate interests, against what looks like theft.

Will this take a long time to have an effect? Very likely so.

Could this mean that the respective work will never get attributed, and that its owner (here, me) will never be compensated for it? That seems likely, too.

Will one even be able to join any cases, to invoke one’s rights? Given how we think about law in Europe (with few, if any, class actions), probably not even that.

Still, let’s face it: If anyone walks around copying content to reuse and resell it, that’s theft, regardless of whether you put up a sign saying “no stealing, please.” And as there hasn’t even been an unwritten “contract” with any AI company, AI scraping of the Web appears to be nothing but theft.

That’s why I don’t block AI scrapers—and let thieves do thief things until our justice system(s) do justice system things.

(And yet, I may be wrong all over the place. I’ll be following the development just as you do, and perhaps make further adjustments depending on how it goes.)


About Me


I’m Jens (long: Jens Oliver Meiert), and I’m a frontend engineering leader and tech author/publisher. I’ve worked as a technical lead for companies like Google and as an engineering manager for companies like Miro. I’m somewhat close to W3C and WHATWG, and I write and review books for O’Reilly and Frontend Dogma.

I love trying things, not only in web development (and engineering management), but also in other areas like philosophy. Here on meiert.com I share some of my views and experiences.

If you’d like to do me a favor, interpret charitably (I speak three languages, and they do collide), yet be critical and give feedback, so that I can make improvements. Thank you!