On Semantics in HTML
Post from October 26, 2011 (↻ June 5, 2021), filed under Web Development.
As web developers we like to talk about “semantic markup,” a somewhat inaccurate short form for “markup that is meaningful and used how it’s supposed to be used.” We also like to discuss what markup is appropriate when, and to ramble on about markup that is “meaningless.” In many cases markup decisions and discussions don’t stop at HTML elements but also cover ID and class names.
Now when we’re talking about “semantic markup,” where is all that meaning actually coming from?
In essence, semantics in HTML is all about who agrees on the meaning, and how many do, both for elements and for ID and class names.
In general you can say that initially, HTML and especially XML elements have no meaning.
In a way, however, we assign standards bodies like the W3C the authority (or accept their mandate) to tell us what HTML elements mean and what their purpose is. That is, only through these bodies’ directive do we agree on
p elements representing paragraphs,
ol elements representing lists, and also
div elements carrying little semantic weight.
It would well be possible either to assign these elements no meaning at all (that’s what authors involuntarily did when using tables for layout) or to assign them a different meaning (why would p not be great for parentheses?).
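To make that agreement concrete, here is a minimal sketch of markup that relies on the spec-assigned meanings of the three elements named above. The text content is made up for illustration.

```html
<!-- p: by specification, a paragraph -->
<p>Semantics comes from agreement.</p>

<!-- ol: by specification, an ordered list -->
<ol>
  <li>Elements</li>
  <li>Class names</li>
</ol>

<!-- div: a generic container with little semantic weight -->
<div>
  <p>Grouped for styling or scripting only.</p>
</div>
```

Nothing in the letters p, ol, or div carries these meanings by itself; they hold only because authors and user agents follow the same specification.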
Similarly, we accept certain communities’ interpretation of what markup may mean. Think microformats. Their markup constructs don’t have any meaning, either, per se, but with a lot of people agreeing on their purpose they do become meaningful.
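As an illustration of such community-agreed meaning, here is a small hCard sketch. The class names “vcard,” “fn,” “org,” and “url” come from the hCard microformat; the person, organization, and address are placeholders.

```html
<!-- "vcard" marks the hCard container; "fn", "org", and "url" are hCard property names -->
<div class="vcard">
  <a class="url fn" href="https://example.com/">Jane Doe</a>
  <span class="org">Example Inc.</span>
</div>
```

To HTML itself these are ordinary class names. They become meaningful only to people and tools that have agreed on the microformat.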
Next in line is common sense. A class like “error” or an ID “author” has meaning because it defines a purpose that can be understood and also agreed on. With these names also being advisable for maintainability reasons we now know why functional ID and class names are most useful.
Then the terrain gets a bit rougher with generic names like “aux” or “alt”. Here we’re leaving the semantics trail, as generic names are harder to grasp; their purpose, however, is less to add meaning than to avoid pseudo-meaning and serve as helper constructs.
Last are obfuscated, random, or presentational names. They are meaningless and should be avoided, presentational names especially, as they pose the biggest threat to maintainability.
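The tiers of ID and class names described above might look like this side by side. This is only a sketch: “error,” “author,” and “aux” are the names used in this post, while “red,” “x7f3,” and the text content are hypothetical examples chosen for illustration.

```html
<!-- Functional names: the purpose can be understood and agreed on -->
<p class="error">Please enter a valid email address.</p>
<p id="author">Written by Jane Doe</p>

<!-- Generic name: adds no meaning, but claims none either; a helper hook -->
<div class="aux">
  <p>Supplementary content.</p>
</div>

<!-- Presentational name: meaningless, and misleading once the design changes -->
<p class="red">Please enter a valid email address.</p>

<!-- Obfuscated or random name: meaningless to everyone -->
<p class="x7f3">Please enter a valid email address.</p>
```

If the error color later changes to orange, class="error" still holds, whereas class="red" has to be renamed in every template and style sheet or left in place as a false statement.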
As this list goes from “most meaning” to “least meaning,” you can see why blockquote is more readily accepted to mean a quote than “vcard” to mean an hCard container, which in turn is more readily accepted than “login” for a sign-in field, than “aux” for a helper class, and than “red” for I don’t know what. It also shows why you don’t need a class “list-item” on an li element: it is already defined as a list item at a higher level, namely the spec.
As I love disclaimers, this may not be all there is to say about the topic but it was good enough for a write-down-everything-that-comes-to-mind-on-semantics-now post.
Update (August 5, 2014)
In the meantime, in 2012, I also wrote an article about semantics for Google. I believe it adds detail and value to the points made here.
Update (November 3, 2014)
Web Components and custom elements rank the same as IDs or class names in terms of meaning, and the same naming best practices apply.
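A brief sketch of what that means for naming. Both element names below are hypothetical and only meant to mirror the class name examples; the one hard rule assumed here is that custom element names must contain a hyphen.

```html
<!-- Functional custom element name: describes a purpose, like class="error" -->
<error-message>Please enter a valid email address.</error-message>

<!-- Presentational custom element name: as problematic as class="red" -->
<red-box>Please enter a valid email address.</red-box>
```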
I’m Jens Oliver Meiert, and I’m an engineering manager and author. I’ve worked as a technical lead for Google, I’m close to the W3C and the WHATWG, and I write and review books for O’Reilly. Other than that, I love trying things, sometimes including philosophy, art, and adventure. Here on meiert.com I share some of my views and experiences.
A class like “error” or an ID “author” has meaning because it defines a purpose that can be understood and also agreed on.
I have to disagree here, as the first minute this runs up against internationalization issues it breaks. For example, <div class="poszukiwanie"> means what, exactly? (Hint: that’s Polish.)
This is the primary reason why HTML5 has introduced a number of new landmark elements, and why previously ARIA introduced both landmark and structural roles - to disambiguate author IDs and classes.
Using ID and class names that have a meaning to the author is helpful, but only to the author (or perhaps the authoring team); no other real value is derived by the browsers or Adaptive Technology. So it’s not so much that I disagree with your sentiment, but rather that it is not as important as you seemingly imply.
Just my $0.02 from the accessibility trenches.
Very good article.
I just read through it briefly, but I agree on all points.
@John: As a programmer I like to define English class names and IDs even if my national language is not English.
@Simon: And as a websmith (a native French speaker, but working mainly in English), I like to define my class names in French.
One size doesn’t fit all. I agree with John on that.
Good article. When you wrote:
Their markup constructs don’t have any meaning either, per se, but with a lot of people agreeing on their purpose they do become meaningful
it struck me that this is little different to spoken language, although its evolution is less controlled.
However, when John says:
no other real value is derived by the browsers or Adaptive Technology
he overlooks that that’s exactly what browsers (via add-ons) do with microformats, and what ATs could do.
If people start to use meaningful class names, then common patterns will emerge, just as with spoken language.