Why I Don’t Block AI Scrapers
Published on August 29, 2024 (updated September 25, 2024), filed under Development and Everything Else.
The basic contract of the Web seems to have been called into question, with AI scrapers taking content from everywhere to train their models, regardless of content licenses and preferences, without attribution or compensation.
For an increasing number of site and content owners, this has since meant blocking AI scrapers. (For that purpose, there are also increasingly better helpers, like Dark Visitors.)
I, for my part, running sites like meiert.com, Frontend Dogma, and WebGlossary.info, first tried excluding and blocking AI scrapers, but ultimately stopped.
Scrapers started off ignoring robots.txt directives, and we keep seeing existing and new scrapers that ignore robots.txt preferences. So the approach isn’t just working poorly: AI companies have probably changed the game for good.
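To illustrate why robots.txt alone cannot enforce anything, here is a minimal Python sketch using the standard library’s `urllib.robotparser`. The robots.txt content, the user agent name `ExampleAIBot`, the helper `is_allowed`, and the example.com URL are all made up for illustration; they are not taken from any of my sites or from any real scraper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleAIBot" is an invented user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler runs a check like this before every request.
    print(is_allowed("ExampleAIBot", "https://example.com/post"))
    print(is_allowed("SomeBrowser", "https://example.com/post"))
```

The check runs entirely on the crawler’s side. A scraper that never performs it, or that ignores the result, fetches the page all the same, which is why robots.txt expresses a preference rather than a block.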
Personally, I’m not going to engage in an arms race of trying to block more and more scrapers. I’d rather watch this unfold legally.
Just like on your websites, the content on my websites is under specific licenses. While usually generous, some require attribution, and others specifically cover derivative use. Still, even where there’s no license specified, it’s not anyone else’s content.
So what I’m betting on instead is more legal action, by other businesses and other corporate interests, against what looks like theft.
Will this take a long time to have an effect? Very likely so.
Could this mean the respective work will never be attributed, and its owner (here, me) never compensated for it? That seems likely, too.
Will one even be able to join any cases, to invoke one’s rights? Given how we think about law in Europe (with few class actions), probably not even that.
Still, let’s face it: If anyone walks around and copies content to reuse and resell it, then that’s theft, regardless of whether you had put up a sign saying “no stealing, please.” And as there hasn’t even been an unwritten “contract” with any AI company, AI scraping of the Web appears to be nothing but theft.
That’s why I don’t block AI scrapers—and let thieves do thief things until our justice system(s) do justice system things.
(And yet, I may be wrong all over the place. I’ll be following the development just as you do, and perhaps make further adjustments depending on how it goes.)
About Me
I’m Jens (long: Jens Oliver Meiert), and I’m a frontend engineering leader and tech author/publisher. I’ve worked as a technical lead for companies like Google and as an engineering manager for companies like Miro, I’m a contributor to several web standards, and I write and review books for O’Reilly and Frontend Dogma.
I love trying things, not only in web development (and engineering management), but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (Please be critical, interpret charitably, and give feedback.)