On Semantics in HTML

Published on October 26, 2011 (↻ February 5, 2024), filed under Web Development.

As web developers we like to talk about “semantic markup,” a somewhat inaccurate short form for “markup that is meaningful and used how it’s supposed to be used.” We also like discussions about what markup is appropriate when, and to ramble about markup that is “meaningless.” In many cases markup decisions and discussions don’t stop at HTML elements but also cover ID and class names.

Now when we’re talking about “semantic markup,” where is all that meaning actually coming from?

In essence, semantics in HTML is all about who agrees on the meaning, and how many agree, both for elements and for ID and class names.

More thoughts.

In general you can say that initially, HTML and especially XML elements have no meaning.

1. In a way, however, we grant standards bodies like the W3C the authority (or accept their mandate) to tell us what HTML elements mean and what their purpose is. That is, only through these bodies’ directive do we agree on p elements representing paragraphs, ul and ol elements representing lists, and also div elements carrying little semantic weight.

   It would well be possible either not to assign these elements any meaning (which is what authors involuntarily did when using tables for layout) or to assign them a different meaning (why would p not be great for parentheses?).

2. Similarly, we accept certain communities’ interpretation of what markup may mean. Think microformats. Their markup constructs don’t have any meaning, either, per se, but with a lot of people agreeing on their purpose they do become meaningful.

3. Next in line is common sense. A class like “error” or an ID “author” has meaning because it defines a purpose that can be understood and also agreed on. With these names also being advisable for maintainability reasons, we now know why functional ID and class names are most useful.

4. Then the terrain gets a bit rougher with generic names like “aux” or “alt[ernative]”. Here we’re leaving the semantics trail, as generic names are harder to grasp, yet their purpose is less to add meaning than to avoid pseudo-meaning and to serve as helper constructs.

5. Last are obfuscated, random, or presentational names. They are meaningless and should be avoided, presentational names especially, as they pose the biggest threat to maintainability.

As this list goes from “most meaning” to “least meaning,” you can see why blockquote is more readily accepted to mean a quote than “vcard” is to mean an hCard container, “vcard” more readily than “login” for a sign-in field, “login” more readily than “aux” for a helper class, and “aux” more readily than “red” for I don’t know what. It also shows why you don’t need a class “list-item” on an li element: It is already defined as a list item on a higher level, namely that of the spec.
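
To make the ranking more tangible, here is a minimal sketch of one example per level, using the names mentioned above. The content (“Jane Doe,” the label and error texts) and the choice of an ID rather than a class for “login” are assumptions for illustration only.

```html
<!-- 1. Meaning defined by the spec: a quote and a list.
     No class="list-item" is needed; li already means "list item". -->
<blockquote>
  <p>Quoted text goes here.</p>
</blockquote>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>

<!-- 2. Meaning agreed on by a community: an hCard container (microformats). -->
<div class="vcard">
  <span class="fn">Jane Doe</span>
</div>

<!-- 3. Meaning through common sense: functional names. -->
<label for="login">User name</label>
<input type="text" id="login" name="login">
<p class="error">Please enter your user name.</p>

<!-- 4. Generic helper name: adds little meaning but avoids pseudo-meaning. -->
<div class="aux">Helper content</div>

<!-- 5. Presentational name: meaningless, and it breaks
     as soon as errors are no longer shown in red. -->
<p class="red">Please enter your user name.</p>
```

Note how the last two paragraphs carry the same text: “error” still describes the purpose after a redesign, while “red” only describes one possible look.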

As I love disclaimers, this may not be all there is to say about the topic but it was good enough for a write-down-everything-that-comes-to-mind-on-semantics-now post.

Update (August 5, 2014)

In the meantime, in 2012, I also wrote an article about semantics for Google. I believe it adds detail and value to the points made here.

Update (November 3, 2014)

Web Components and custom elements rank the same as IDs or class names in terms of meaning, and the same naming best practices apply.
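
As a rough sketch of what that means in practice, compare a functional and a presentational custom element name. Both names are hypothetical and not taken from any library; the only hard requirement assumed here is the one from the HTML specification that custom element names contain a hyphen.

```html
<!-- Functional name: describes the purpose and survives a redesign. -->
<author-bio>Text about the author</author-bio>

<!-- Presentational name: ties the markup to one particular look. -->
<red-box>Text about the author</red-box>
```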

About Me

Jens Oliver Meiert, on September 30, 2021.

I’m Jens, and I’m an engineering lead and author. I’ve worked as a technical lead for companies like Google and as an engineering manager for companies like Miro, I’m close to W3C and WHATWG, and I write and review books for O’Reilly and Frontend Dogma.

I love trying things, not only in web development, but also in other areas like philosophy. Here on meiert.com I share some of my views and experiences.

Comments (Closed)

On October 26, 2011, 22:06 CEST, John Foliot said:
Hey Jens,

You wrote:

A class like “error” or an ID “author” has meaning because it defines a purpose that can be understood and also agreed on.

I have to disagree here, as the first minute this runs up against internationalization issues it breaks. For example <div class="poszukiwanie"> means what, exactly? (Hint: that’s Polish)

This is the primary reason why HTML5 has introduced a number of new landmark elements, and why previously ARIA introduced both landmark and structural roles - to disambiguate author IDs and classes.

Using ID and class names that have a meaning to the author is helpful, but only to the author (or perhaps authoring team); no other real value is derived by the browsers or Adaptive Technology. So it’s not so much that I disagree with your sentiment, but rather that it is not as important as you seemingly imply.

Just my $0.02 from the accessibility trenches.

On October 29, 2011, 22:33 CEST, Simon Schick said:
Very good article.
I just read through it briefly, but I agree on all points.

@John: As a programmer I like to define English class names and IDs even if my native language is not English.

On October 30, 2011, 4:34 CET, karl said:
@Simon: And as a websmith (a native French speaker, but working mainly in English), I like to define my class names in French.

One size doesn’t fit all. I agree with John on that.

On October 30, 2011, 12:46 CET, Andy Mabbett said:
Good article. When you wrote:

Their markup constructs don’t have any meaning either, per se, but with a lot of people agreeing on their purpose they do become meaningful

it struck me that this is little different to spoken language, although its evolution is less controlled.

However, when John says:

no other real value is derived by the browsers or Adaptive Technology

he overlooks that that’s exactly what browsers (via add-ons) do with microformats, and what ATs could do.

If people start to use meaningful class names, then common patterns will emerge, just as with spoken language.