Write HTML, the HTML Way (Not the XHTML Way)
Published on May 17, 2022 (updated Oct 3, 2024), filed under development, html, minimalism.
This article first appeared at CSS-Tricks.
You may not use XHTML (anymore), but when you write HTML, you may be more influenced by XHTML than you think. You are very likely writing HTML, the XHTML way.
What is the XHTML way of writing HTML, and what is the HTML way of writing HTML? Let’s have a look.
HTML, XHTML, HTML
In the 90s, there was HTML. In the 2000s, there was XHTML. Then, in the 2010s, we switched back to HTML. That’s the simple story.
You can tell by the rough dates of the specifications, too: HTML “1” 1992, HTML 2.0 1995, HTML 3.2 1997, HTML 4.01 1999; XHTML 1.0 2000, XHTML 1.1 2001; “HTML5” (I dislike the space-phobic spelling), 2007.
XHTML became popular when everyone believed XML and XML derivatives were the future. “XML all the things.”
For HTML, this had a profound effect: We learned to write it the XHTML way.
The XHTML Way of Writing HTML
The XHTML way is well-documented, because XHTML 1.0 describes it in great detail in its section on “Differences with HTML 4”:
- Documents must be well-formed
- Element and attribute names must be in lower case
- For non-empty elements, end tags are required
- Attribute values must always be quoted
- Attribute minimization [is not supported]
- Empty elements [need to be closed]
- White space handling in attribute values [is done according to XML]
- Script and style elements [need CDATA sections]
- SGML exclusions [are not possible]
- The elements with id and name attributes [like a, applet, form, frame, iframe, img, and map, should only use id]
- Attributes with pre-defined value sets [are case-sensitive]
- Entity references as hex values [must be in lower case]
Does this look familiar? With the exception of marking CDATA content, as well as dealing with SGML exclusions, you probably follow all of these rules. All of them.
Although XHTML is dead, many of these rules have never been questioned again. Some have even been elevated to “best practices” for HTML.
That is the XHTML way of writing HTML, and its lasting impact on the field.
The HTML Way of Writing HTML
One way of walking us back is to negate the rules imposed by XHTML. Let’s actually do this (without the SGML part, because HTML isn’t based on SGML anymore):
- Documents may not be well-formed
- Element and attribute names may not be in lower case
- For non-empty elements, end tags are not [always] required
- Attribute values may not always be quoted
- Attribute minimization is supported
- Empty elements don’t need to be closed
- White space handling in attribute values isn’t done according to XML
- Script and style elements don’t need CDATA sections
- The elements with id and name attributes may not only use id
- Attributes with pre-defined value sets are not case-sensitive
- Entity references as hex values may not only be in lower case
Let’s remove the esoteric things, the things that don’t seem relevant. This includes XML whitespace handling, CDATA sections, doubling of name attribute values, the case of pre-defined value sets, and hexadecimal entity references:
- Documents may not be well-formed
- Element and attribute names may not be in lower case
- For non-empty elements, end tags are not always required
- Attribute values may not always be quoted
- Attribute minimization is supported
- Empty elements don’t need to be closed
Peeling away from these rules, this looks a lot less like we’re working with XML, and more like working with HTML. But we’re not done yet.
“Documents may not be well-formed” suggests that it was fine if HTML code was invalid. It was fine for XHTML to point to wellformedness because of XML’s strict error handling. But while HTML documents work even when they contain severe syntax and wellformedness issues, it’s neither useful for the professional, nor our field, to use and abuse this resilience. (I’ve argued this case before; see In Critical Defense of Frontend Development.)
The HTML way would therefore not suggest “documents may not be well-formed.” It would also be clear that not only end, but also start tags aren’t always required. Rephrasing and reordering, this seems to be the essence:
- Start and end tags are not always required
- Empty elements don’t need to be closed
- Element and attribute names may be lower or upper case
- Attribute values may not always be quoted
- Attribute minimization is supported
Examples
What does this look like in practice?
For start and end tags, be aware that many tags are optional. A paragraph and a list, for example, are written like this in XHTML:
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<ul>
<li>Praesent augue nisl</li>
<li>Lobortis nec bibendum ut</li>
<li>Dictum ac quam</li>
</ul>
In HTML, however, you can write them using only this code (which is valid):
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<ul>
<li>Praesent augue nisl
<li>Lobortis nec bibendum ut
<li>Dictum ac quam
</ul>
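Start tags can be omitted in places, too. As a rough sketch (the title and text are placeholders, not taken from any real page), the following should be a complete, valid document even though it contains no html, head, or body tags. The parser infers all three elements:

```html
<!DOCTYPE html>
<!-- No <html>, <head>, or <body> tags: all three are optional and get inferred -->
<title>Optional tags</title>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<ul>
  <li>Praesent augue nisl
  <li>Lobortis nec bibendum ut
</ul>
```

A validator may still recommend a lang attribute, which would require writing out the html start tag. That is a recommendation, not a syntax requirement.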
Developers also learned to write void elements like this:
<br />
This is something XHTML brought to HTML, but as the slash has no effect on void elements, you only need
<br>
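The same holds for the other void elements. A small sketch, with made-up file names and text:

```html
<!-- Void elements have no content and no end tag; no trailing slash is needed -->
<meta charset="utf-8">
<img src="photo.jpg" alt="A placeholder photo">
<hr>
```

There is also a practical reason to drop the slash: when it directly follows an unquoted attribute value, the parser treats it as part of that value instead of discarding it.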
In HTML, you can also just write everything in all caps:
<A HREF="https://css-tricks.com/">CSS-Tricks</A>
It looks like you’re yelling and you may not like it, but it’s okay to write HTML like this.
When you want to condense that link, HTML offers you the option to leave out certain quotes:
<A HREF=https://css-tricks.com/>CSS-Tricks</A>
(As a rule of thumb, when the attribute value doesn’t contain a space or an equal sign, it’s usually fine to drop the quotes.)
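To make the rule of thumb a little more precise: An unquoted value must not be empty and must not contain whitespace, quotation marks, apostrophes, equal signs, angle brackets, or backticks. A sketch of both cases, using hypothetical class names:

```html
<!-- Fine without quotes: neither value contains any of the characters above -->
<a href=https://css-tricks.com/ class=external>CSS-Tricks</a>

<!-- Quotes needed: the class value contains a space -->
<a href=https://css-tricks.com/ class="external featured">CSS-Tricks</a>
```

Mixing quoted and unquoted values in one tag is allowed, although you may prefer to settle on one style per project for consistency.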
Finally, HTML-HTML, as opposed to XHTML-HTML, also lets you minimize attributes. That is, instead of marking an input element as required and read-only like this:
<input type="text" required="required" readonly="readonly">
You can minimize the attributes:
<input type="text" required readonly>
And if you’re not only taking advantage of the fact that the quotes aren’t needed, but that text is the default for the type attribute here (there are more such unneeded attribute–value combinations), you get an example that shows HTML in all its minimal beauty:
<input required readonly>
Write HTML, the HTML Way
The above isn’t a representation of where HTML was in the 90s. HTML, back then, suffered from table-itis, was packed with presentational code, was largely invalid (like today), and had wildly varying support in user agents. Yet it’s the essence of what we would have wanted to keep if XML and XHTML hadn’t come around.
If you’re open to a suggestion of what a more comprehensive, contemporary way of writing HTML could look like, I have one. (HTML is my main focus area, so I’m augmenting this with links to some of my articles.)
- Respect syntax and semantics
- Use the options HTML gives you, as long as you do so consistently
- Remember that element and attribute names may be lower or upper case
- Keep use of HTML to the absolute minimum
- Remember that presentational and behavioral markup is to be handled by CSS and JavaScript instead
- Remember that start and end tags are not always required
- Remember that empty elements don’t need to be closed
- Remember that some attributes have defaults that allow these attribute–value pairs to be omitted
- Remember that attribute values may not always be quoted
- Remember that attribute minimization is supported
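To illustrate how these points combine, here is a sketch of the same fragment written both ways. The file names, the /search URL, and the q field are made up for illustration. The second version relies on type defaults for script, input, and button, on get being the default form method, and on link needing no type attribute for a style sheet:

```html
<!-- The XHTML way -->
<link rel="stylesheet" type="text/css" href="default.css" />
<script type="text/javascript" src="default.js"></script>
<form method="get" action="/search">
  <input type="text" name="q" required="required" />
  <button type="submit">Search</button>
</form>

<!-- The HTML way -->
<link rel=stylesheet href=default.css>
<script src=default.js></script>
<form action=/search>
  <input name=q required>
  <button>Search</button>
</form>
```

Note that the script, form, and button end tags stay in place. Their end tags are not optional, so leaving them out would no longer respect the syntax.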
It’s not a coincidence that this resembles the three ground rules for HTML, that it works with the premise of a smaller payload also leading to faster sites, and that this follows the school of minimal web development. None of this is new; our field could merely decide to rediscover it. (Tooling is available, too: html-minifier is probably the most established tool, being able to handle all HTML optimizations.)
You’ve learned HTML the XHTML way. HTML isn’t XHTML. Rediscover HTML, and help shape a new, modern way of writing HTML, one that acknowledges, but isn’t necessarily based on, XML.
About Me
I’m Jens (long: Jens Oliver Meiert), and I’m a web developer, manager, and author. I’ve been working as a technical lead and engineering manager for companies you’ve never heard of and companies you use every day, I’m an occasional contributor to web standards (like HTML, CSS, WCAG), and I write and review books for O’Reilly and Frontend Dogma.
I love trying things, not only in web development and engineering management, but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (I value you being critical, interpreting charitably, and giving feedback.)