HTML vs. XHTML: Why HTML Wins
Post from December 19, 2008 (↻ March 15, 2018), filed under Web Development.
Document types are cool, and there are plenty of them. There are countless discussions about the “right” document type, too. Alas, these discussions often deal with irrelevant details or miss the point. A decisive factor is performance. And that suggests the choice: Just use HTML 4 or, at least “formally,” HTML 5.
How does performance or load time, which we can deem critical, influence a document type decision? Compared with valid HTML that omits optional tags, XHTML documents are about 5% (large documents) to 10% (small documents) larger, which adds a realistic file size overhead of at least 400 bytes. I ran several tests in that regard which I intend to publish at some point, and the numbers are pretty reliable and easy to reproduce.
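One rough way to reproduce this kind of measurement is to write the same page twice and compare byte counts. The sketch below is only an illustration: both sample documents and the `size_overhead` helper are made up for this example and are not the test files mentioned above. On a page this tiny the doctype and namespace boilerplate dominate, so the percentage comes out far higher than it would on a realistic document.

```python
# Two hypothetical versions of the same tiny page (not the author's test files).
# HTML 4.01 Strict with optional tags omitted:
HTML_DOC = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<title>Test</title>
<p>First paragraph
<p>Second paragraph
<ul>
<li>One
<li>Two
</ul>
"""

# XHTML 1.0 Strict, where every element must be written out and closed:
XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Test</title>
</head>
<body>
<p>First paragraph</p>
<p>Second paragraph</p>
<ul>
<li>One</li>
<li>Two</li>
</ul>
</body>
</html>
"""


def size_overhead(smaller: str, larger: str) -> tuple:
    """Return (extra bytes, extra percent) of `larger` relative to `smaller`."""
    small = len(smaller.encode("utf-8"))
    large = len(larger.encode("utf-8"))
    return large - small, (large - small) / small * 100


extra_bytes, extra_percent = size_overhead(HTML_DOC, XHTML_DOC)
print(f"XHTML version is {extra_bytes} bytes ({extra_percent:.0f}%) larger")
```

Feeding two real, equivalent pages into `size_overhead` instead of the samples gives a figure that can be compared with the 5 to 10% quoted above.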
So no matter what flavor you typically prefer (keep in mind that the popular document types basically feature the same elements), using HTML will bring the most benefits. Not only will you be able to save some bytes, you’ll also be untouched by MIME type questions and the like.
Regarding XHTML, all the classical arguments for XHTML seem to be related to its interoperability potential as well as the semantic ideal XHTML 2.0 tries to achieve. However, these arguments are as of yet hypothetical and hence not practically relevant.
This site itself still uses supposed XHTML for historical as well as consistency reasons (the English part of this site uses WordPress, which suggested XHTML, too). Even though I’ve doubted whether the cost of the problem really outweighs the cost of the solution, this site’s main document type is subject to change: the plan is to switch to HTML 5, just like most of my personal projects already have.
About the Author
Jens Oliver Meiert is a technical lead and author (sum.cumo, W3C, O’Reilly). He loves trying things, including in the realms of philosophy, art, and adventure. Here on meiert.com he shares and generalizes and exaggerates some of his thoughts and experiences.
If you have any thoughts or questions (or recommendations) about what he writes, leave a comment or a message.
On December 19, 2008, 21:28 CET, molily said:
I just don’t get the point of this article. I’d consider it incredibly shortsighted to omit tags wherever it’s possible only to achieve a smaller file size. I’d recommend everyone not to omit any of the mentioned end tags. At SELFHTML, we suggest writing HTML 4 the way XHTML requires it: Don’t omit optional start and end tags, don’t use the dubious SGML features. This is about code readability, maintainability and consistency - and this affects beginners and professionals. But we already had this discussion when you posted the list of optional tags.
“[…] these discussions often deal with irrelevant details or miss the point entirely. The only decisive factor seems to be performance.”
I think this supposed “performance” benefit is actually the least decisive factor. In my opinion, you don’t do justice to those discussions.
XHTML and HTML both have their uses. Personally I prefer HTML for most uses because it won’t hurt the user if I make a mistake and don’t notice.
If file size is a concern, then HTML gives more opportunities for saving space, but if you’re using gzip even that’s not really a big deal.
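The gzip point can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the list markup is a made-up sample and `sizes` is a hypothetical helper, so the numbers only show the tendency, not what any particular site would see.

```python
import gzip

# Hypothetical list-heavy page body, written with and without optional end tags.
ITEMS = [f"Item {n}" for n in range(1, 51)]
html_body = "<ul>\n" + "".join(f"<li>{item}\n" for item in ITEMS) + "</ul>\n"
xhtml_body = "<ul>\n" + "".join(f"<li>{item}</li>\n" for item in ITEMS) + "</ul>\n"


def sizes(markup: str) -> tuple:
    """Return (raw bytes, gzip-compressed bytes) for a markup string."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


html_raw, html_gz = sizes(html_body)
xhtml_raw, xhtml_gz = sizes(xhtml_body)

print("raw difference: ", xhtml_raw - html_raw, "bytes")
print("gzip difference:", xhtml_gz - html_gz, "bytes")
```

Repeated end tags are exactly the kind of redundancy that compression removes, which is why the compressed difference should come out much smaller than the raw one.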
On December 19, 2008, 23:44 CET, rohnn said:
Having 200-300kb of js libs and images, you want to produce some low quality code that’ll be hard to read and maintain, just to win like a couple of bytes…
Sorry I don’t see the point of this approach.
I completely agree with you. I’ve sided with HTML 4.01 Strict for a while now. I see that your own site uses XHTML though.
On December 20, 2008, 6:06 CET, Jin said:
The most valid argument against XHTML is that IE doesn’t support true XML.
The End. 😊
On December 20, 2008, 6:46 CET, Duluoz said:
You could also make the case that it’s quicker to whip up some HTML vs. XHTML as well. View performance in terms of labor.
On December 20, 2008, 10:44 CET, wow said:
Come on, a website is not a video stream. The few extra bytes won’t affect your site!
Jens, if you know what you’re doing (and I’m sure you do), go ahead. But this advice should not be given to others, IMHO.
If you’re going to save some bytes, why don’t you throw out all indents and line breaks? That would save a lot more bytes than some omitted tags. Would that make the code less maintainable? You bet. But so does omitting tags.
Your comparison of HTML and XHTML sources of an empty webpage is not fair since it does not reflect a real webpage. I had made a real-world example about the problems with omitted tags.
As rohnn has pointed out, bytes can be saved elsewhere: optimize images (i.e. use the adequate graphics format and adequate compression rate, remove meta information).
Just don’t make the HTML code vulnerable to errors; never omit optional tags.
Once we’re there, we can write XHTML instead of HTML. @Jin: I don’t see the benefit of XHTML in serving it as application/xhtml+xml to the client, but on the author’s side: stricter syntax rules and the possibility of using XML tools.
I wholeheartedly agree with molily: XHTML wins over HTML.
On December 20, 2008, 14:15 CET, rohnn said:
By “low quality” I mean that the code will most probably be very hard to maintain and/or update/upgrade.
I didn’t think in terms of the end user (for once :)) but rather in terms of the producer.
Having a quite big page from a quite big site… I think people are really rapidly going to be lost in the page tree/structure.
As for js libs & image size, all I am saying is that you prefer HTML vs xHTML because it saves a few bytes when pages do actually load hundreds of KB in libs & images and I believe this is not a fair argument.
I don’t agree with you.
A few bytes are not the reason a web designer chooses HTML instead of XHTML.
I think the future of web code is semantic, and HTML can’t offer any instruments for that.
On December 20, 2008, 18:39 CET, Matthew said:
I think this is absurd. It is not professional. It goes against what most of the community has been fighting for. It’s a step backward, not forward. Not to mention the possible SEO ramifications this mentality of ‘it’s OK if I miss a couple of tags now’ would cause, as at least one commenter has suggested.
Sorry if I’m coming off strong, this is just insane to me.
You seem to be basing your whole argument on the axiom that “The only decisive factor seems to be performance”. Unfortunately that’s not the case for everyone: decision factors vary widely depending on one’s context.
The only relevant factor for your decision is performance, and the conclusions you draw from that are genuinely interesting.
I think your study/article would be more valuable to the community and less controversial if you wrote it “IF your main decision factor is performance THEN you want HTML because […]”.
Indeed a table giving the pros and cons of various formats (which I guess could include not only XHTML, HTML5, HTML4 but also the likes of flash, SVG etc.) against a number of constraints and priorities would be very nice. Has anybody ever worked on such a thing?
I wholeheartedly disagree - XHTML gives structure to an otherwise unstructured format. If you are really that concerned about bandwidth, try removing all the unnecessary whitespace from your HTML - each line break, tab, and space adds extra bytes to the stream, so if you don’t care about formatting the markup, you should start there. It’s just as crazy as having unstructured markup.
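For anyone who wants to see how much whitespace removal actually saves, here is a naive sketch. The sample markup and the `strip_inter_tag_whitespace` helper are assumptions made up for this illustration; a real minifier has to be more careful, as the docstring notes.

```python
import re

# Hypothetical indented markup, used only to illustrate the measurement.
PRETTY = """<ul>
    <li>One</li>
    <li>Two</li>
    <li>Three</li>
</ul>
"""


def strip_inter_tag_whitespace(markup: str) -> str:
    """Remove whitespace-only runs between a closing '>' and the next '<'.

    Naive on purpose: it would also alter <pre> content and can remove
    spaces between inline elements that affect rendering.
    """
    return re.sub(r">\s+<", "><", markup).strip()


compact = strip_inter_tag_whitespace(PRETTY)
saved = len(PRETTY.encode("utf-8")) - len(compact.encode("utf-8"))
print(compact)
print(f"saved {saved} bytes")
```

Running the same function over a real template and comparing `saved` with the bytes gained from omitted tags would show which of the two optimizations matters more for that page.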
I think the bigger advantage is not forcing neophytes to type every single thing in lower case, which everyone screws up from time to time.
On December 22, 2008, 3:35 CET, Jeremy Weiskotten said:
-1 Premature optimization.
On December 22, 2008, 7:06 CET, Website Design said:
True that most of the people prefer HTML but XHTML also has users.
The structure I’m referring to is just the standard XML rules, which are not difficult at all to follow: Ensuring all opening tags have a closing tag (or a slash at the end), and making sure attributes have values. If you follow XHTML, anyone using an XML parser can read your markup and actually find what they’re looking for - there’s no chance of a BR or IMG without a closing slash, or attributes without real values. In addition, in today’s world of nested div inside of nested div inside of nested div, you can easily screw up the markup on a page if you’re not careful. If you validate your markup against XML rules, you’ll easily determine if your markup is really valid before publishing.
When I say regular HTML is unstructured, I mean that it is very close to following a set of rules, but there are a few “exceptions”, which throw a wrench in the whole deal. XML doesn’t have those exceptions - every tag closes, every attribute has a value. Every XML document, with no exception, follows these rules. In my opinion, rules are very important when dealing with technology, especially with a language that’s interpreted by dozens of different client apps (browsers). You know how IE has generally been more “forgiving” with bad markup than other browsers? That’s mostly due to the fact that IE accepts more bad XML (tags not closing, etc.) than other browsers. This gives developers and designers an excuse to be lazy by not using valid XHTML (or even valid HTML), and creates conformance issues when working with other browsers and OS’s. With so many browsers out there each doing different things, the more standardized your code, the better chance your stuff will function on all browsers.
But if you really want to save a few bytes here and there by not closing out your BR tags, more power to you.
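The XML rules described in this comment (every tag closed, every attribute with a value) can be checked with any XML parser. Below is a minimal sketch using Python’s standard library; `is_well_formed` is a hypothetical helper name, and it only tests well-formedness, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if `fragment` parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<p>Line one<br />line two</p>'))   # BR with closing slash
print(is_well_formed('<p>Line one<br>line two</p>'))     # BR left open
print(is_well_formed('<input disabled="disabled" />'))   # attribute with a value
print(is_well_formed('<input disabled />'))              # attribute without a value
```

The first and third fragments parse, while the second and fourth are rejected, which is the strictness the comment refers to.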
On December 22, 2008, 18:01 CET, Duluoz said:
What I am seeing from these comments is that those who disagree that one should spend time exploring the optimization options of HTML vs. XHTML are the same people who make excuses for other forms of invalidity or for failing to do due diligence to the craft. The point of the article is quite clear - given you’ve done your job, at the end of the day, which would give you optimal performance? What am I missing?
I do hope you don’t close comments, it’s got to be more constructive allowing discussion. 😊
I’m personally a fan of XHTML because it’s valid XML, and can be parsed as such whether by browsers, XSLT, or magic.
That said, I’d be curious to see what kind of render times we get from XML versus HTML parsers on larger documents these days. Or is that being pedantic now? >_>
On December 23, 2008, 7:32 CET, Fred Boulton said:
Just to throw this in, I am seeing more and more, as I look at source code of Web pages, the omission altogether of a DTD.
I wonder how much load time this would save?
I don’t see that it’s possible to leave out anything that should be there (HTML or XHTML) if developers validated their pages, and who would build a Web site without validating every page?
Comments once again prove that the biggest XHTML advocates know the least about both XHTML and HTML 😔
Or maybe they do know about SHORTTAG, about XHTML not being supported by IE, about the fact that unless you specify the proper MIME type, XHTML is handled by the same HTML parser (and relies on the misimplementation of the aforementioned SHORTTAG feature), etc. etc.
If someone is not capable of writing clean HTML s/he should not touch XHTML at all.
And the “possibility of using XML tools”… what can I say.
@Ash: there was a case where browsers could do incremental rendering for HTML documents, but had to wait for an XHTML document to be loaded fully before rendering it. That is with the proper MIME type of course; serving XHTML with text/html makes no sense whatsoever. Not sure if that is still the case though.
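The MIME type point can be summed up as a small decision function. This is a deliberately simplified sketch: `parser_mode` is a made-up name, and real browsers also deal with content sniffing, other XML media types, and doctype-based rendering modes.

```python
def parser_mode(content_type: str) -> str:
    """Return which kind of parser a browser would typically pick.

    Simplified illustration only; see the caveats in the text above.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/xhtml+xml":
        return "xml"   # strict: a well-formedness error stops processing
    if mime == "text/html":
        return "html"  # lenient: parsed as HTML, XHTML doctype or not
    return "unknown"


print(parser_mode("text/html; charset=utf-8"))
print(parser_mode("application/xhtml+xml"))
```

In this model, an XHTML document sent as text/html never reaches the XML parser, so none of the strictness applies to it.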
Just wondering, but speaking about performance only (as this seems to be the point of the article)
Isn’t html more performance-heavy to render just because it is more lenient? xhtml is strict so in theory you could throw this draconian error-handling right out of the window.
Html is more or less a mess when it comes to closing tags, so it seems a little harder to process and to get things as intended by the author?
Just a question, really interested in the answer.
On December 24, 2008, 21:28 CET, Amber said:
I tend to favor HTML. I don’t really see the point of XHTML (well, I do if I look) and the benefit. IE, which is the browser of choice …for now, only renders sites with MIME. Why spend the time when you can simply stick with HTML - less work - more universal - it’s not broken, why change?
I especially like Ian’s point about making mistakes and still getting by without a total revamp.
I am going to be the boy who spots the emperor in his new clothes and say that shouldn’t we really be aiming for one standard rather than multiple standards.
It may well be oversimplification but why on earth can’t we have one or the other and be done with it, otherwise the current divergence is only going to get worse.
I tend to think that the 5%-10% saving quoted is only likely to be significant in already bloated pages, for which it is more productive to focus on the bigger savings gained by optimising images etc.
I would think that a 5-10% saving on load time and bandwidth is not (only) significant for large pages, but more importantly for popular websites. It’s no coincidence that Google has done everything it can to reduce the size of their main page, including sticking with HTML (saving 1.25% of their file size just by omitting a doctype), removing line breaks and spaces, using inline scripts and styles as opposed to loading external files, and using the deprecated FONT tag for some styling instead of having a DIV assigned a class where the style is defined elsewhere in the page.
By doing all of that they have probably shaved 15% or more from the size of the page, which is very significant, especially for a website that gets as many hits as Google does.
To rule out performance optimisations in favour of “validity” and “structure” is ignoring real world client needs. Every bit helps.
Most XHTML docs are served as HTML anyway, so I’ll choose HTML,
but my blog uses XHTML (valid XML too, I guess... but still useless, I know) for experimental and learning purposes.
On March 31, 2009, 1:46 CEST, cid said:
HTML wins because interoperability loses its meaning when we talk about a web publishing language and the most popular web browser doesn’t support the correct MIME type.
Besides, I don’t understand why param’s name attribute is implied in XHTML 1.0 Strict while it’s required in other XHTML 1.0 and HTML 4.01 DTDs.
If it’s not an error then what is it?
On April 23, 2009, 16:13 CEST, Francesco said:
@Ben Reimers: Please look at this article. Google doesn’t omit the doctype to reduce file size. They do it because they don’t care.
On May 29, 2009, 7:48 CEST, John B said:
Short answer: Xhtml wins!
Why: Valid XHTML is more strict and less forgiving. This makes layout engines’ code paths easier. With all of HTML’s omitted tags and “messiness”, today’s layout engines have to work harder to understand just what the hell the page means. Simple is faster.
HTML is loose and forgiving, why? Because it was defined for humans to write it. XHTML on the other hand is more redundant and something more geared for a machine to produce. Every day, fewer and fewer pages are written by hand. Eventually the code quality of the pages should increase, which would lessen the need for layout engines that handle sloppy code. Eventually browser manufacturers can drop the HTML engines and just use the faster and simpler XHTML engines.
If you’re not sure about XHTML being easier to parse into the DOM tree, just read up on XML parser internals!
On May 31, 2009, 17:18 CEST, Duri [SVK] said:
I hate XHTML restrictions. In XHTML, I must use quotes around attribute values, close every li, tr, td etc., write disabled="disabled"… HTML doesn’t have such strict rules.
XHTML is also slower than HTML.
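The looser HTML rules listed in this comment are easy to see with Python’s built-in HTML tokenizer. This is a sketch for illustration: `TagLogger` is a made-up class, and `html.parser` only reports tags as it meets them; it does not build a DOM or infer omitted end tags the way a browser does.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start tags and their attributes as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


logger = TagLogger()
# Unquoted attribute value, minimized attribute, omitted </li> end tags.
logger.feed("<ul><li><input type=text disabled><li>Second</ul>")
logger.close()

for tag, attrs in logger.seen:
    print(tag, attrs)
```

The input is accepted without any error; the minimized `disabled` attribute is simply reported with a value of `None`, whereas an XML parser would reject the same markup outright.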
On May 6, 2010, 16:54 CEST, Mike Hopley said:
Omitting optional tags is a nice idea, but you have to be careful. For instance, consider the following valid HTML document (rendered in pseudo-code for the comments):
[!doctype html]
[html lang="en"]
[title]Implied DOM test (IE)[/title]
[div]
[form action=""]
[p][input]Test
[/form]
[/div]
Here, omitting the optional [head] tag mangles the DOM in IE. Just try it.
I’m astonished that Google is recommending such dangerous practices. Theory is all very well, but could we please stick to recommending things that actually work? And by “work”, I mean “work robustly”, rather than “work most of the time and then fail unpredictably”.