2024: 0.5% of the Global Top 200 Websites Use Valid HTML
Published on September 11, 2024, filed under Development.
HTML conformance data for 2024 are in, and the good news first: There's an increase in the number of valid website home pages, going from 0 to 1.
The flip side: 199 of the 200 most popular websites use HTML that's faulty, that doesn't exist, and/or that doesn't work.
The usual disclaimer: The following data gives an impression of precision that is greater than warranted. The point of this annual analysis is to check on full HTML conformance (the absence of markup errors) on home pages; that is, the specific error counts don't matter for the purpose of telling conformance or non-conformance, and other pages aren't being checked. I'm providing the data for further study and for comparability to previous years.
Also, before we begin: Where's CSS? I took CSS validation out because 1) HTML quality is much more important than CSS quality (cf. The Most Important Thing Is to Get the HTML Right, though it could be clearer), and 2) the W3C CSS validator seems understandably but chronically behind the specifications, with false positives and false negatives (my experience thus far, which carries no judgment).
Analysis
Like every year since 2021 (cf. 2022, 2023), I used the annual update of the Ahrefs Top 1,000 websites to check the home pages of the first 200 websites (up from 100 in 2023) for HTML conformance. *
For this purpose I took all the respective URLs, prepared HTML validation URLs, validated the respective pages, and documented the error counts of the 200 tests in a spreadsheet:
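Preparing the validation URLs can be scripted. The following is a minimal sketch, not the exact procedure used for this analysis: it assumes the public W3C Nu checker at `validator.w3.org/nu/` with its `doc` and `out` query parameters, and the helper name `validation_url` and the sample home pages are hypothetical.

```python
from urllib.parse import urlencode

# Base address of the public W3C "Nu" HTML checker (assumed endpoint).
NU_VALIDATOR = "https://validator.w3.org/nu/"


def validation_url(page_url: str, json_output: bool = False) -> str:
    """Return a checker URL that validates the given page.

    With json_output=True, the checker is asked for machine-readable results.
    """
    params = {"doc": page_url}
    if json_output:
        params["out"] = "json"
    return f"{NU_VALIDATOR}?{urlencode(params)}"


# Hypothetical sample; the real list came from the Ahrefs ranking.
home_pages = ["https://example.com/", "https://example.org/"]
validation_urls = [validation_url(url) for url in home_pages]
```

Percent-encoding the page address keeps any query string in the tested URL from being confused with the checker's own parameters.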
HTML Conformance in 2024
“Improved”: 1 of the 200 home pages has 0 HTML conformance errors: a splash screen of the Unique Identification Authority of India.
Improved: 5 home pages (of Adobe, Speedtest, Poki, the NHS, and Gramedia) were super-close to conformance, with 1 error each.
The issues (error messages) on these single-error home pages?
- “An `img` element must have an `alt` attribute, except under certain conditions.”
- “Bad value `true` for attribute `async` on element `script`.”
- “No `li` element in scope but a `li` end tag seen.”
- “Bad value `tel: 111` for attribute `href` on element `a`: Illegal character in scheme data: space is not allowed.”
- “Element `head` is missing a required instance of child element `title`.”
Improved: 44 more home pages had a single-digit number of issues.
Degraded: 56 were far away from conformance with any specification (100 or more errors).
Improved: The average number of HTML errors is 99.34.
Improved: The HTML errors median is 35.5.
(Improved: The mode is 5.)
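The summary figures above can be derived from a plain list of per-page error counts. Here is a rough sketch under that assumption; the function name `summarize` is hypothetical, and the sample counts are made up for illustration and are not the spreadsheet data.

```python
from statistics import mean, median, multimode


def summarize(error_counts: list[int]) -> dict:
    """Group pages by error count and compute mean, median, and mode(s)."""
    return {
        "conforming": sum(1 for c in error_counts if c == 0),
        "single_error": sum(1 for c in error_counts if c == 1),
        # "More" single-digit pages: 2 to 9 errors, excluding the two groups above.
        "single_digit": sum(1 for c in error_counts if 2 <= c <= 9),
        "far_away": sum(1 for c in error_counts if c >= 100),
        "mean": mean(error_counts),
        "median": median(error_counts),
        # multimode returns every most-common value, in case of ties.
        "modes": multimode(error_counts),
    }


# Made-up sample, not the real data.
sample_counts = [0, 1, 5, 5, 12, 35, 36, 250]
summary = summarize(sample_counts)
```

A single page with hundreds of errors pulls the average up far more than the median, which is one reason to report both.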
HTML Conformance over Time
We need to test more websites for this to be significant, but after this new analysis, here is how error counts developed over the years:
| | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|
| Average number of HTML errors on home page | 125.22 | 125.63 ↑ | 132.14 ↑ | 99.34 ↓ |
| Home pages without errors | 2% | 0% ↓ | 0% → | 0.5% ↑ |
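To put the averages into perspective, the year-over-year change can be worked out from the table. This is a small sketch using only the averages reported above; the helper name `relative_change` is hypothetical.

```python
# Average number of HTML errors per home page, as reported in the table.
averages = {2021: 125.22, 2022: 125.63, 2023: 132.14, 2024: 99.34}


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new in percent."""
    return (new - old) / old * 100


# Map each year (from 2022 on) to its change against the previous year.
years = sorted(averages)
changes = {
    year: relative_change(averages[prev], averages[year])
    for prev, year in zip(years, years[1:])
}

for year, change in changes.items():
    print(f"{year}: {change:+.1f}%")
```

For 2024, this comes out at a drop of roughly a quarter compared to 2023, though the sample also doubled in size, so the two years aren't strictly comparable.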
Notes and Observations
First, the W3C HTML validator (i.e., the “Nu” portion handling living HTML) is too helpful and makes work on fixing and analyzing HTML unnecessarily difficult. If you're using it, too, you'll know how it mixes HTML errors with CSS and other errors. This has significantly slowed down the work on this analysis, and introduced a potential error source by requiring me to manually deduct non-HTML errors from the error counts. This is a design and usability issue, though, so I hope the team can improve this, for example by grouping and counting errors by language and topic, or by making this configurable. (Subscribe to and chime in on Nu Validator issue #940 if you'd like to see this improved, too.)
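Until the validator offers such grouping, deducting non-HTML errors could be scripted against its JSON output. The following is only a sketch, not how the counts in this analysis were produced: it assumes a report with a `messages` list whose entries carry `type` and `message` fields, and it assumes that CSS findings can be recognized by a `CSS:` prefix in the message text. The function name and the sample report are hypothetical.

```python
def count_html_errors(report: dict) -> int:
    """Count error messages that don't look like CSS findings."""
    return sum(
        1
        for msg in report.get("messages", [])
        if msg.get("type") == "error"
        # Assumption: CSS findings start with "CSS:" in the message text.
        and not msg.get("message", "").startswith("CSS:")
    )


# Hypothetical report, shaped like the assumed JSON output.
sample_report = {
    "messages": [
        {"type": "error", "message": "Element head is missing a required instance of child element title."},
        {"type": "error", "message": "CSS: Parse Error."},
        {"type": "info", "message": "Trailing slash on void elements has no effect."},
    ]
}

html_error_count = count_html_errors(sample_report)
```

A prefix check like this is a heuristic; other non-HTML messages would still need manual review.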
Many more companies and organizations than ever block the W3C HTML validator. In 2023, 12 of 100 websites (12%) couldn't be validated; in 2024, the share doubled to 48 of 200 websites (24%). I didn't (and might not have been able to) analyze the nature of the blocking, though: it could be intentional (as with geo-blocking), or it could be unintentional (perhaps caused by some overeager edge tooling).
Redirects, interstitials, and cookie pages made validation hard as well. In the legend I'm emphasizing how I'm not caring about or highlighting these instances anymore; after all, the analysis checks whether or not the most popular websites use valid HTML, and for that purpose, it's interesting but not decisive how many issues there are, or if a page to be tested doesn't happen to be the main page.
There was one website in the sample, Fast.com, that used HTML 4.01 Strict! Unfortunately, the document was written following XHTML-HTML, which came with many related conformance issues.
There was also one instance that couldn't be validated at all. I've passed on my observations to the W3C team.
Interpretation
I'm sharing this annual report for your and the community's interpretation.
Yet here are two cents:
For the new data, there are several improvements (notably a lower average number of HTML errors). That's better than if we had observed more degradations, but it's not clear whether we're dealing with any significant shift. Sustained improvements with more websites using valid HTML would underscore such a shift.
I'm writing a lot about HTML conformance, and I aim to write more about it. For those of you who don't know this part of my work, I think we all benefit from ensuring HTML conformance because it's a foundational quality attribute that hedges against shipping unnecessary and dysfunctional code and payload to our users, and because it's the easiest-to-implement quality bar we can employ in our profession. Professional web developers write valid HTML.
If you're working on a website covered in the analysis (really, if you're working on any website), write HTML, and check (validate) that what you ship is valid HTML.
* I do so faithfully, that is, I don't check and question Ahrefs' methodology. There had been concerns about some of these top sites in the past (including that a few seemed spammy), but I'm leaving that issue in Ahrefs' part of the field. This is not to say that you shouldn't be critical about any of this.
About Me
I'm Jens (long: Jens Oliver Meiert), and I'm a frontend engineering leader and tech author/publisher. I've worked as a technical lead for companies like Google and as an engineering manager for companies like Miro, I'm a contributor to several web standards, and I write and review books for O'Reilly and Frontend Dogma.
I love trying things, not only in web development (and engineering management), but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (Be critical, interpret charitably, and give feedback.)