HTML 2022: 20 Additional Observations From Analyzing the Web Almanac Data
Published on Oct 10, 2022 (updated Oct 18, 2024), filed under development.
You saw the release of the HTTP Archive’s 2022 Web Almanac? Yes, it’s live—with enough chapters to make me inform Frontend Dogma readers about a large number of articles coming up. (If you’re a developer and don’t know the Web Almanac yet—you’ll probably like it!)
The Web Almanac is turning into an institution, one of those publications to look forward to each year, much like the State of JavaScript and the State of CSS.
This year, I again had the pleasure of analyzing and documenting the data for HTML, in the Web Almanac’s Markup chapter. So while I’d feel honored if you checked out that chapter, let me honor you by sharing 20 things that I didn’t get to call out in it.
20 More Observations from 30 Sheets of Data
The no-doctype regression: The chapter mentions it in passing, but the 2.5% → 2.7% regression of more pages not using a doctype (mobile; 2.7% → 3.0% on desktop) is another worrying indicator of a decaying craft. It’s a small and perhaps temporary dent, but the trend is negative.
Conditional Comments zombies: The mobile data set revealed 2,885,132 “conditional comments.” All past loathing aside, why are so many of these still around?
SVG use on the rise: In 2021, 46.4% of (unspecified?) pages used at least one SVG. In 2022, this is up 8.3 percentage points (to 54.7%).
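A quick aside on the arithmetic, since shares like these are easy to misread: the step from 46.4% to 54.7% is 8.3 percentage points, while the relative growth is larger. A minimal sketch (the function name `change` is made up for this illustration):

```python
def change(old_pct, new_pct):
    """Return (percentage-point difference, relative change in percent)."""
    points = new_pct - old_pct
    relative = points / old_pct * 100
    return round(points, 1), round(relative, 1)

# SVG use, 2021 vs. 2022, as quoted above
print(change(46.4, 54.7))  # (8.3, 17.9)
```

In other words, SVG use grew by roughly 18% relative to 2021.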
Bring out the elements trash: Get yourself a nice cup of specialty coffee and scan the list of elements in use. (Pause.) Let’s please always validate our sites’ HTML output. Doing so contributes to a higher-quality Web and a greater career (and a shorter Web Almanac elements list).
Long live isindex: One element can still be found in the Web Almanac data. And don’t ask me why, I still love it. (It was deprecated with HTML 4.01.)
Mind the “embed” elements: object, embed, and param are still alive.
Pornhub uses custom elements: The HTML analysis included a sheet about “top pages with custom elements.” Pornhub is one of them, though only using one nineteenth of what Mercari uses (2 vs. 38).
Someone uses 108 custom elements—and 7 other (desktop) pages use more than 100 custom elements, too. I suppose these don’t have to be 100+ unique elements, but didn’t dig into that.
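Whether such a count refers to all custom elements on a page or to unique ones makes a difference. As a minimal sketch, and not how the Web Almanac data was produced, here is one way to get both numbers for a single page with Python’s standard library. The class and function names are hypothetical, and the custom element check is a simplified heuristic (a hyphen in the name, minus the hyphenated names reserved by SVG and MathML).

```python
from html.parser import HTMLParser

# Hyphenated element names reserved by SVG/MathML; these are not custom elements
RESERVED = {
    "annotation-xml", "color-profile", "font-face", "font-face-src",
    "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
}

class CustomElementCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method
        if "-" in tag and tag[0].isalpha() and tag not in RESERVED:
            self.names.append(tag)

def count_custom_elements(html):
    """Return (total custom elements, unique custom element names)."""
    parser = CustomElementCounter()
    parser.feed(html)
    parser.close()
    return len(parser.names), len(set(parser.names))

sample = "<my-card><my-icon></my-icon></my-card><my-card></my-card><p>Text</p>"
print(count_custom_elements(sample))  # (3, 2)
```

The sample page uses three custom elements but only two distinct ones, which is the distinction left open above.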
65.7% of pages contain a form: Rick called that out in the data, but I didn’t get to review and discuss forms in the Markup chapter. The number seems big to me, though likely related to the data set still relying largely on homepages. How does the number look to you?
18.5% of all inputs are of type “text,” counting both those input elements explicitly setting type=text and those that omit it because it’s the default.
There are almost as many verbose instances of defining a submit button as there are concise ones: On 41% of pages we find buttons with no type specified, on 32% we find buttons of type “submit.” But buttons without a type are submit buttons, too; that is, <button> suffices.
The median form contains 4 input elements; the 10th and 25th percentile contain 2, the 75th percentile 7, and the 90th percentile 14 inputs.
We’re using too many classes (4,300,024,711 on 7,940,685 pages, or roughly 540 per page). (Just as we’re using too many divs.)
We’re dealing with too much metadata cruft. 107 different metadata directives, each one added with the idea it was relevant, even important? We’re adding metadata too easily. (Update Your HTML IV, coming out in November, will have a chapter about “metadata madness.”)
It’s great to see strong use of data-*. data-* attributes allow embedding “custom non-visible data,” and websites are making ample use of them. The reason blossoming use of data-* is so much better than blooming use of meta elements is that data-* use is usually driven by site owner and developer needs (which they typically know), while meta elements are typically dictated by third parties (who know their own needs, too, but which may or may not fit those of site owners and their developers).
7.3% of “mobile pages” and 11.53% of “desktop pages” set no viewport information: Unsurprising (this information is more useful on mobile) and fascinating (no regard or awareness for mobile on some sites, at all?) at the same time.
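To make the earlier observations about input and button types more tangible: the point is that a missing type attribute still has a meaning (“text” for input, “submit” for button). Here is a minimal sketch of how one could count explicit versus implicit cases on a single page. It is an illustration with hypothetical names, not the Web Almanac’s methodology, and it ignores that invalid type values fall back to the same defaults.

```python
from collections import Counter
from html.parser import HTMLParser

class FormControlCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "button"):
            return
        # attrs is a list of (name, value) pairs; a bare attribute has value None
        type_value = (dict(attrs).get("type") or "").strip().lower()
        default = "text" if tag == "input" else "submit"
        if type_value == "":
            self.counts[f"{tag} {default} (implicit)"] += 1
        elif type_value == default:
            self.counts[f"{tag} {default} (explicit)"] += 1
        # Other types (email, button, reset, ...) are not counted in this sketch

def count_default_types(html):
    parser = FormControlCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)

sample = """
<form>
  <input name="q">
  <input type="text" name="city">
  <input type=email name="mail">
  <button>Send</button>
  <button type="submit">Send again</button>
  <button type="button">Cancel</button>
</form>
"""
print(count_default_types(sample))
```

For the sample form, each of the four categories is counted once: the two inputs and the two buttons in each pair behave the same, and only the markup differs in length.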
PNG is the most popular favicon format, and it’s becoming more popular: 2021, 35.3% of favicons were PNGs; 2022, it’s 37.7%. (On 10,035 pages in the mobile set, it’s spelled “pnj.”) But SVG is on the rise, too: 2021, 0.4%; 2022, 1%.
80% of links use target=_blank? (Did I read this right; why!)
There are too many javascript: links: About 25% of all links are of this type. (You probably thought, too, that these died in the early 2000s.) mailto: (0.3%) and tel: (0.5%), more useful schemes, are far less popular.
However, going by what you can find per page, mailto: (29.5%) and tel: (26.6%) are more popular than javascript: (22.2%). That is, on 3 of 10 pages you find a mailto: reference, on 1 of 4 a tel: one, and on 1 of 5 a javascript: one. Next? whatsapp: and viber: (on about 1 of 200 pages).
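The last two observations rest on two different ways of counting: the share of all links that use a scheme, and the share of pages that contain at least one such link. A scheme can rank high on one and low on the other when a few pages use it very heavily. Here is a minimal sketch with made-up toy data (not Web Almanac data); the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def scheme_of(href):
    # Relative URLs have no scheme; label them so they still count as links
    return urlsplit(href.strip()).scheme or "(relative)"

def scheme_shares(pages):
    """pages: a list of pages, each given as a list of href strings."""
    per_link = Counter()
    per_page = Counter()
    for hrefs in pages:
        schemes = [scheme_of(href) for href in hrefs]
        per_link.update(schemes)       # every single link counts
        per_page.update(set(schemes))  # each scheme counts once per page
    total_links = sum(per_link.values())
    link_share = {s: n / total_links for s, n in per_link.items()}
    page_share = {s: n / len(pages) for s, n in per_page.items()}
    return link_share, page_share

# Toy data: one page is full of javascript: links, all three pages have one mailto: link
pages = [
    ["javascript:void(0)"] * 8 + ["https://example.com/", "mailto:hi@example.com"],
    ["/about", "mailto:info@example.org", "tel:+15550100"],
    ["/contact", "mailto:team@example.net"],
]
link_share, page_share = scheme_shares(pages)
print(round(link_share["javascript"], 2), round(link_share["mailto"], 2))
print(round(page_share["javascript"], 2), round(page_share["mailto"], 2))
```

In this toy set, javascript: accounts for more than half of all links but appears on only one of three pages, while mailto: is a small share of links yet present on every page. That is the same pattern as in the numbers above, just exaggerated.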
This is it! This is what I found combing through the data once more. Did I make a mistake? Did I miss something else that’s worth highlighting (I’m sure I did)? Have you shared your own highlights? Respond to the tweet for this post, and let’s start adding life to the #htmlalmanac tag. And yes!—if you’re into minimal, quality HTML, maybe you’ll enjoy my HTML book series.
About Me
I’m Jens (long: Jens Oliver Meiert), and I’m a web developer, manager, and author. I’ve worked as a technical lead and engineering manager for small and large enterprises, I’m an occasional contributor to web standards (like HTML, CSS, WCAG), and I write and review books for O’Reilly and Frontend Dogma.
I love trying things, not only in web development and engineering management, but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (I value you being critical, interpreting charitably, and giving feedback.)