0 of the Global Top 100 Websites Use Valid HTML (in 2022)

Post from September 12, 2022 (updated September 29, 2022).

When we looked at the Top 100 U.S. websites in 2021, we learned that 98% of them used invalid HTML.

If we do the same for the global Top 100 this year, have things improved? Using Ahrefs’s list, here’s the latest on HTML and CSS conformance worldwide.

The short answer first: We’re at an absolute low point with HTML, as not a single Top 100 website uses valid HTML. CSS, on the other hand, has improved, with more websites using valid CSS than last year.

The Analysis

The analysis involved pulling Ahrefs’s Top 100 global websites into a spreadsheet, running every site through the W3C’s HTML and CSS validators, and documenting the results.


Figure: Looking for an extremely popular website using HTML according to the HTML specification? It seems we’re out of stock.

This is a rough analysis, however: It assumes that the validators are accurate, and it requires that they not be blocked from accessing the sites. (Sometimes they were, which meant testing manually or skipping the site.) The spreadsheet highlights and explains some special cases.

Still, the analysis is not about great precision: It’s about whether the tested websites use valid HTML and CSS. A site fails this with 1 error as much as it does with 500 (a number some sites exceeded). A website cannot be half-valid.
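The per-site check described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure behind the spreadsheet: it assumes the W3C Nu HTML Checker’s JSON output (requested with the `out=json` query parameter), where each entry in `messages` has a `type` of `error`, `info`, or `non-document-error`, the last signaling that the checker could not fetch or process the page (for example, because it was blocked). The helper names `checker_url` and `classify` are hypothetical, and fetching is left out, so `classify` works on an already-downloaded report.

```python
import json
from urllib.parse import urlencode

# Public endpoint of the W3C Nu HTML Checker (assumed here; adjust for a local instance).
NU_CHECKER = "https://validator.w3.org/nu/"


def checker_url(page_url: str) -> str:
    """Hypothetical helper: build a checker request URL asking for JSON output."""
    return NU_CHECKER + "?" + urlencode({"doc": page_url, "out": "json"})


def classify(report: dict) -> tuple:
    """Hypothetical helper: return ("untestable" | "valid" | "invalid", error_count)."""
    messages = report.get("messages", [])
    # A non-document error means the page itself was never checked,
    # so the site must be tested manually or skipped.
    if any(m.get("type") == "non-document-error" for m in messages):
        return "untestable", 0
    # Count only errors; info messages (including warnings) do not affect validity.
    errors = sum(1 for m in messages if m.get("type") == "error")
    # Validity is binary: 1 error fails a page as much as 500 do.
    return ("valid" if errors == 0 else "invalid"), errors


if __name__ == "__main__":
    sample = json.loads('{"messages": [{"type": "error", "message": "Stray end tag div."}]}')
    print(checker_url("https://example.com/"))
    print(classify(sample))  # ('invalid', 1)
```

Treating blocked pages as a separate outcome, rather than as zero errors, keeps an inaccessible site from being counted as valid.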

Let’s go through the results:

HTML: 0 Valid Sites (–2)

2021 was already shocking, even ridiculous: Among companies with revenues in the billions of dollars, and with some of our greatest peers on the payroll, only 2 websites were valid.

2022 is worse: 0 of 99 testable websites use valid HTML.

The average number of HTML errors is 125.63 (+0.41, which indicates remarkable consistency).

25 websites return fewer than 10 validation errors.

34 websites return more than 100 validation errors. (Is it inappropriate to say that when you ship that many errors, you don’t know much about HTML? I asked, and most of you would not trust these peers to develop your own sites.)

96 of 99 HTML-testable websites appear to use the standard HTML doctype. 1 website declares XHTML 1.0 Transitional, 1 suggests XHTML 1.1, and 1 uses no doctype.
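The summary figures in this section, and in the CSS section that follows (valid sites, average error count, and how many sites fall below 10 or above 100 errors), all derive from one error count per testable site. Here is a minimal Python sketch of that tally, assuming a plain list of integer counts and assuming that the “fewer than 10” bucket includes error-free sites; the function name `summarize` is hypothetical.

```python
from statistics import mean


def summarize(error_counts: list) -> dict:
    """Hypothetical helper: tally per-site validator error counts."""
    return {
        "tested": len(error_counts),
        # A site counts as valid only with exactly 0 errors.
        "valid": sum(1 for n in error_counts if n == 0),
        "average_errors": round(mean(error_counts), 2),
        # Assumption: error-free sites are included in this bucket.
        "fewer_than_10": sum(1 for n in error_counts if n < 10),
        "more_than_100": sum(1 for n in error_counts if n > 100),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the spreadsheet.
    print(summarize([0, 3, 150, 12]))
```

Note that `statistics.mean` raises an error on an empty list, so untestable sites should be dropped before the tally rather than recorded as zero.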

CSS: 13 Valid Sites (+10)

In 2021, only 3 websites used valid CSS.

In 2022, this changed: 13 of 94 testable websites use valid, error-free CSS.

The average number of CSS errors is 26.33 (–22.07).

35 websites return fewer than 10 validation errors.

5 websites return more than 100 validation errors.

Wikipedia and Amazon each have several homepages in the 2022 Top 100. Interestingly, the Wikipedia homepages differ considerably from one another, whereas the Amazon homepages seem quite similar. I didn’t review the respective style sheets, but judging by the error counts, Amazon’s seem to be shared.

An Interpretation

A Glimmer of Hope for CSS?

If you’re like me, you may wonder most about the marked improvement in CSS conformance: More than four times as many websites using valid CSS now? The average error count nearly halved?

As this analysis is deliberately black and white, aiming only to answer the question “valid or not?”, I haven’t dug into this. (Yet.) But my hunch is that this is due to improvements on the validator end: The W3C CSS validator has been known, and feared, for its false positives, and I recall there have been efforts to change that.

A Global Inability to Produce Valid HTML Output

After all, our field is not known for caring about conformance. Among the most-followed developers, close to no one seems to push the topic (many an expert developer sits on an invalid site, which keeps them from engaging with the issue). Nor was there a major initiative to improve CSS on websites (or was there?).

What else is there to say?

I don’t feel like saying much here. Personally, I do push this topic, and in the days leading up to this post I covered conformance and its importance in the upcoming HTML chapter of the HTTP Archive’s 2022 Web Almanac.

But how do you interpret a data set showing that not a single document is free of fantasy HTML, when these are the most frequented documents on the planet, written and maintained by some of the best-trained and highest-paid of our peers?

I don’t know about you, but this triggers me, because something here can’t be right. Documents don’t write themselves, and pay in our industry is good, so maybe it’s our training that isn’t.

And that would make sense: Only a fundamental and literally global training problem, fueled by the widespread self-elevation of everyone who claims to know and master HTML, could explain this. It is a problem so visible by now that you can nearly see it from space, or at least on the world’s most popular websites, not one of which is free of HTML errors.

If conformance is the canary in the coal mine of frontend development craft, then see for yourself how that craft is doing.


About Me

Jens Oliver Meiert, on September 30, 2021.

I’m Jens, and I’m an engineering lead and author. I’ve worked as a technical lead for Google, I’m close to W3C and WHATWG, and I write and review books for O’Reilly. I love trying things, sometimes including philosophy, art, and adventure. Here on meiert.com I share some of my views and experiences.
