HTML 2022: 20 Additional Observations from Analyzing the Web Almanac Data

Post from October 10, 2022.

You saw the release of the HTTP Archive’s 2022 Web Almanac? Yes, it’s live—with enough chapters to make me inform Frontend Dogma readers about a large number of articles coming up. (If you’re a developer and don’t know the Web Almanac yet—you’ll probably like it!)

The Web Almanac is turning into an institution, one of those publications to look forward to each year, much like the State of JavaScript and the State of CSS.

This year, I again had the pleasure of analyzing and documenting the data for HTML in the Web Almanac’s Markup chapter. So while I’d feel honored if you checked out that chapter, let me honor you by sharing 20 things that I didn’t get to call out in it.

20 More Observations from 30 Sheets of Data

  1. The no-doctype regression: The chapter mentions it in passing, but the 2.5% → 2.7% regression of more pages not using a doctype (mobile; 2.7% → 3.0% on desktop) is another worrying indicator of a decaying craft. It’s a small and perhaps temporary dent, but the trend is negative.

  2. Conditional Comments zombies: The mobile data set revealed 2,885,132 “conditional comments.” All past loathing aside, why are so many of these still around?

  3. SVG use on the rise: In 2021, 46.4% of (unspecified?) pages used at least one SVG. In 2022, this is up 8.3 percentage points (to 54.7%).

  4. Bring out the elements trash: Get yourself a nice cup of specialty coffee and scan the list of elements in use. (Pause.) Let’s please always validate our sites’ HTML output. Doing so contributes to a higher-quality Web and a greater career (and a shorter Web Almanac elements list).

  5. Long live isindex: One element can still be found in the Web Almanac data. And don’t ask me why, I still love it. (It was deprecated with HTML 4.01.)

  6. Mind the “embed” elements: object, embed, and param are still alive.

  7. Pornhub uses custom elements: The HTML analysis included a sheet about “top pages with custom elements.” Pornhub is one of them, though only using one nineteenth of what mercari uses (2 vs. 38).

  8. Someone uses 108 custom elements—and 7 other (desktop) pages use more than 100 custom elements, too. I suppose these don’t have to be 100+ unique elements, but didn’t dig into that.

  9. 65.7% of pages contain a form: Rick called that out in the data, but I didn’t get to review and discuss forms in the Markup chapter. The number seems big to me, though likely related to the data set still relying largely on homepages. How does the number look to you?

  10. 18.5% of all inputs are of type “text”—counting both those input elements explicitly setting type=text, and those that omit it because it’s the default.

  11. There are almost as many verbose instances of defining a submit button as there are concise ones: On 41% of pages we find buttons with no type specified, on 32% we find buttons of type “submit.” But buttons without a type are submit buttons, too—i.e., <button> suffices.

  12. The median form contains 4 input elements; forms at the 10th and 25th percentiles contain 2, at the 75th percentile 7, and at the 90th percentile 14 inputs.

  13. We’re using too many classes (4,300,024,711 on 7,940,685 pages). (Just as we’re using too many divs.)

  14. We’re dealing with too much metadata cruft. 107 different metadata directives, each one added with the idea it was relevant, even important? We’re adding metadata too easily. (Update Your HTML IV—coming out in November—will have a chapter about “metadata madness.”)

  15. It’s great to see strong use of data-*. data-* attributes allow embedding “custom non-visible data,” and websites are making ample use of them. The reason blossoming use of data-* is so much better than blooming use of meta elements is that data-* use is usually driven by site owner and developer needs (which they typically know), while meta elements are typically dictated by third parties (who know their own needs, too, though these may or may not fit those of site owners and their developers).

  16. 7.3% of “mobile pages” and 11.53% of “desktop pages” set no viewport information: Unsurprising (this information is more useful on mobile) and fascinating (no regard or awareness for mobile on some sites, at all?) at the same time.

  17. PNG is the most popular favicon format, and it’s becoming more popular: 2021, 35.3% of favicons were PNGs; 2022, it’s 37.7%. (On 10,035 pages in the mobile set, it’s spelled “pnj.”) But SVG is on the rise, too: 2021, 0.4%; 2022, 1%.

  18. 80% of links use target=_blank? (Did I read this right? And if so, why?)

  19. There are too many javascript: links: 25ish% of all links are of this type. (You probably thought, too, that these died in the early 2000s.) mailto: (0.3%) and tel: (0.5%), more useful schemes, are far less popular.

  20. However, going by what you can find per page, mailto: (29.5%) and tel: (26.6%) are more popular than javascript: (22.2%). That is, on 3 of 10 pages you find a mailto: reference, on 1 of 4 a tel: one, and on 1 of 5 a javascript: one. Next? whatsapp: and viber:—on about 1 of 200 pages.
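
A side note on observations 10 and 11: Both rest on type defaults, so any count has to separate the type a page declares from the type the browser ends up using. Here is a minimal sketch of such a count, assuming Python and its standard html.parser module. It is not how the Web Almanac queries work, and the names ControlTypeCounter and count_control_types as well as the sample markup are made up for illustration. It only covers a missing or empty type attribute, not invalid values.

```python
from collections import Counter
from html.parser import HTMLParser


class ControlTypeCounter(HTMLParser):
    """Tally input and button elements by effective type (hypothetical helper)."""

    # Types that apply when the type attribute is missing, per the HTML spec.
    DEFAULTS = {"input": "text", "button": "submit"}

    def __init__(self):
        super().__init__()
        self.effective = Counter()  # (tag, effective type) -> count
        self.implicit = Counter()   # tag -> elements relying on the default

    def handle_starttag(self, tag, attrs):
        if tag not in self.DEFAULTS:
            return
        declared = dict(attrs).get("type")
        # A missing or empty type attribute falls back to the default.
        if declared is None or not declared.strip():
            self.implicit[tag] += 1
            declared = self.DEFAULTS[tag]
        self.effective[(tag, declared.strip().lower())] += 1


def count_control_types(html):
    """Parse an HTML string and return the filled counter object."""
    parser = ControlTypeCounter()
    parser.feed(html)
    parser.close()
    return parser


# Illustrative markup only, not taken from the Web Almanac data.
SAMPLE = """
<form>
  <input name="q">
  <input type="text" name="city">
  <input type=email name="mail">
  <button>Send</button>
  <button type="submit">Send, verbosely</button>
  <button type="button">Toggle</button>
</form>
"""

result = count_control_types(SAMPLE)
print(result.effective[("input", "text")])     # text inputs, explicit or implicit
print(result.effective[("button", "submit")])  # submit buttons, explicit or implicit
print(result.implicit["button"])               # buttons relying on the default
```

For the sample markup, this counts two text inputs and two submit buttons, and in each pair one element relies on the default. That is the concise variant from observation 11.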

This is it! This is what I found combing through the data once more. Did I make a mistake? Did I miss something else that’s worth highlighting (I’m sure I did)? Have you shared your own highlights? Respond to the tweet for this post, and let’s start adding life to the #htmlalmanac tag. And yes!—if you’re into minimal, quality HTML, maybe you’ll enjoy my HTML book series.

About Me

Jens Oliver Meiert, on September 30, 2021.

I’m Jens, and I’m an engineering lead and author. I’ve worked as a technical lead for Google, I’m close to W3C and WHATWG, and I write and review books for O’Reilly. I love trying things, sometimes including philosophy, art, and adventure. Here I share some of my views and experiences.

If you have a question or suggestion about what I write, please leave a comment (if available) or a message. Thank you!