HTML vs. XHTML: Why HTML Wins
Published on December 19, 2008 (updated October 18, 2024), filed under Development.
Document types are cool, and there are plenty of them. There are countless discussions about the “right” document type, too. Alas, these discussions may deal with irrelevant details or miss the point. A decisive factor is performance. And that suggests the choice: Use HTML 4 or, at least “formally,” HTML 5.
How does performance or load time, which we can deem critical, influence a document type decision? In comparison with valid HTML omitting optional tags, file size of XHTML documents is about 5% (large documents) to 10% (small documents) bigger, adding a realistic file size overhead of at least 400 bytes. I did several tests in that regard which I intend to publish at some point, and the numbers are pretty reliable and easy to reproduce.
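The kind of comparison described above can be sketched in a few lines of Python. This is not the test setup behind the numbers in this post; the two sample documents and the helper names `byte_size` and `overhead_percent` are made up for illustration, and a sample this tiny exaggerates the relative overhead compared to a real page.

```python
# Hypothetical sample documents (not the author's test files): the same
# content once as XHTML 1.0 Strict and once as HTML omitting optional tags.
XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample</title>
</head>
<body>
<p>First paragraph.</p>
<ul>
<li>One</li>
<li>Two</li>
</ul>
</body>
</html>
"""

HTML_DOC = """<!DOCTYPE html>
<title>Sample</title>
<p>First paragraph.
<ul>
<li>One
<li>Two
</ul>
"""


def byte_size(markup: str) -> int:
    """Size of the markup in bytes when encoded as UTF-8."""
    return len(markup.encode("utf-8"))


def overhead_percent(larger: str, smaller: str) -> float:
    """How much bigger `larger` is than `smaller`, in percent of `smaller`."""
    base = byte_size(smaller)
    return (byte_size(larger) - base) / base * 100


if __name__ == "__main__":
    extra = byte_size(XHTML_DOC) - byte_size(HTML_DOC)
    print(f"XHTML: {byte_size(XHTML_DOC)} bytes")
    print(f"HTML:  {byte_size(HTML_DOC)} bytes")
    print(f"Overhead: {extra} bytes ({overhead_percent(XHTML_DOC, HTML_DOC):.1f}%)")
```

To get figures that mean something for a given site, the two strings would have to be replaced with real pages saved in both flavors.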
That is, no matter what flavor you typically prefer (keep in mind that the popular document types basically feature the same elements), using HTML offers the most benefits. Not only will you be able to save some bytes, you’ll also be free from MIME type questions and such.
Regarding XHTML, all the classical arguments for XHTML seem to be related to its interoperability potential as well as the semantic ideal XHTML 2.0 tries to achieve. However, these arguments are as of yet hypothetical and hence not practically relevant.
This site itself still uses supposed XHTML due to historical as well as consistency reasons (the English part of this site uses WordPress which suggested XHTML, too). Even though I’ve doubted whether cost of problem outweighs cost of solution, this site’s main document type is subject to change, to switch to HTML 5 just like most of my personal projects have.
About Me
I’m Jens (long: Jens Oliver Meiert), and I’m a frontend engineering leader and tech author/publisher. I’ve worked as a technical lead for companies like Google and as an engineering manager for companies like Miro, I’m a contributor to several web standards, and I write and review books for O’Reilly and Frontend Dogma.
I love trying things, not only in web development (and engineering management), but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (Please be critical, interpret charitably, and give feedback.)
Comments (Closed)
-
On December 19, 2008, 21:28 CET, molily said:
I just don’t get the point of this article. I’d consider it incredibly shortsighted to omit tags wherever it’s possible only to achieve a smaller file size. I’d recommend everyone not to omit any of the mentioned end tags. At SELFHTML, we suggest writing HTML 4 the way XHTML requires it: Don’t omit optional start and end tags, don’t use the dubious SGML features. This is about code readability, maintainability and consistency, and this affects beginners and professionals. But we already had this discussion when you posted the list of optional tags.
“these discussions often deal with irrelevant details or miss the point entirely. The only decisive factor seems to be performance.”
I think this supposed “performance” benefit is actually the least decisive factor. In my opinion, you don’t do justice to those discussions.
-
On December 19, 2008, 21:37 CET, Jens Oliver Meiert said:
Mathias,
“This is about code readability, maintainability and consistency”
yet how is
<!DOCTYPE html> <title></title>
less readable, maintainable, and consistent than
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title></title> </head> <body></body> </html>
? (Featuring a minimal, valid HTML 5 sample and a minimal, valid XHTML 1 sample.)
“I think this supposed ‘performance’ benefit is actually the least decisive factor.”
This seems to depend on the goals. When you make speed a top priority (personally, I see few reasons not to), HTML can give you even more options to reduce file size.
I agree, however, that a novice might not necessarily want to start this way. He should probably fully understand (X)HTML first.
-
On December 19, 2008, 21:58 CET, Ian Hickson said:
XHTML and HTML both have their uses. Personally I prefer HTML for most uses because it won’t hurt the user if I make a mistake and don’t notice.
If file size is a concern, then HTML gives more opportunities for saving space, but if you’re using gzip even that’s not really a big deal.
-
On December 19, 2008, 22:11 CET, Jens Oliver Meiert said:
Thank you, Ian! As for gzip compression, it’s true that the difference is not that big, but still there is a difference, and that is what I like to underscore here.
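Whether a difference survives gzip can be checked with a rough sketch like the following. The generated pages and the helper names `build_pages` and `sizes` are hypothetical and not taken from the post, and the result depends heavily on the actual markup.

```python
import gzip


def build_pages(items: int = 50) -> tuple:
    """Build the same list page as XHTML 1.0 Strict and as HTML without optional tags."""
    xhtml_items = "".join(f"<li>Item {i}</li>\n" for i in range(items))
    html_items = "".join(f"<li>Item {i}\n" for i in range(items))
    xhtml = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n<title>List</title>\n</head>\n<body>\n<ul>\n"
        + xhtml_items
        + "</ul>\n</body>\n</html>\n"
    )
    html = "<!DOCTYPE html>\n<title>List</title>\n<ul>\n" + html_items + "</ul>\n"
    return xhtml, html


def sizes(markup: str) -> tuple:
    """Return (raw bytes, gzip-compressed bytes) for the given markup."""
    data = markup.encode("utf-8")
    return len(data), len(gzip.compress(data))


if __name__ == "__main__":
    xhtml, html = build_pages()
    raw_x, gz_x = sizes(xhtml)
    raw_h, gz_h = sizes(html)
    print(f"raw:  XHTML {raw_x} bytes, HTML {raw_h} bytes, difference {raw_x - raw_h}")
    print(f"gzip: XHTML {gz_x} bytes, HTML {gz_h} bytes, difference {gz_x - gz_h}")
```

Repetitive markup like this compresses unusually well, so the numbers printed here say little about real pages; the sketch only shows how to measure.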
-
On December 19, 2008, 23:44 CET, rohnn said:
Having 200-300kb of js libs and images, you want to produce some low quality code that’ll be hard to read and maintain, just to win like a couple of bytes…
Sorry, I don’t see the point of this approach.
-
On December 19, 2008, 23:58 CET, Neal G said:
I completely agree with you. I’ve sided with HTML 4.01 Strict for a while now. I see that your own site uses XHTML though.
-
On December 20, 2008, 6:06 CET, Jin said:
The most valid argument against XHTML is that IE doesn’t support true XML.
The End.
-
On December 20, 2008, 6:46 CET, Duluoz said:
You could also make the case that it’s quicker to whip up some HTML vs. XHTML as well. View performance in terms of labor.
-
On December 20, 2008, 10:44 CET, wow said:
Come on, a website is not a video stream. The few extra bytes won’t affect your site!
-
On December 20, 2008, 11:21 CET, Jens Oliver Meiert said:
Rohnn, well, what’s your point? Why is HTML “low quality”? Why is valid HTML omitting optional tags “low quality”? (And could there be another problem if you’re using 200–300 KB of scripts?)
Neal, fair enough. As for the latter, please see my note on pseudo-XHTML on this site.
As a general note, the post may come across a bit strong. Think more about the performance potential: if performance is a priority, HTML does give more options than XHTML, which in turn cannot even really be used (MIME type). And of course, goals like keeping things simple, doing the HTML right, and focusing on load time everywhere are still valid.
-
On December 20, 2008, 12:35 CET, Gunnar Bittersmann said:
Jens, if you know what you’re doing (and I’m sure you do), go ahead. But this advice should not be given to others, IMHO.
If you’re going to save some bytes, why don’t you throw out all indents and line breaks? That would save a lot more bytes than some omitted tags. Would that make the code less maintainable? You bet. But so does omitting tags.
Your comparison of HTML and XHTML sources of an empty webpage is not fair since it does not reflect a real webpage. I had made a real world example about the problems with omitted tags.
As rohnn has pointed out, bytes can be saved elsewhere: optimize images (i.e. use the adequate graphics format and adequate compression rate, remove meta information).
Just don’t make the HTML code vulnerable to errors; never omit optional tags.
Once we’re there, we can write XHTML instead of HTML. @Jin: I don’t see the benefit of XHTML in serving it as application/xhtml+xml to the client, but on the author’s side: stricter syntax rules, possibility of using XML tools.
I wholeheartedly agree with molily: XHTML wins over HTML.
-
On December 20, 2008, 14:15 CET, rohnn said:
Jens,
By “low quality” I mean that the code will most probably be very hard to maintain and/or update/upgrade.
I didn’t think in terms of the end user (for once :) ) but rather in terms of the producer.
Having a quite big page from a quite big site… I think people are really rapidly going to be lost in the page tree/structure.
As for js libs & image size, all I am saying is that you prefer HTML vs XHTML because it saves a few bytes when pages do actually load hundreds of KB in libs & images, and I believe this is not a fair argument.
-
On December 20, 2008, 14:55 CET, Cisco said:
Hi,
I don’t agree with you.
A few bytes are not the reason a web designer chooses HTML instead of XHTML.
I think the future of web code is semantic, and HTML can’t offer any instruments for this.
XML, tags, and semantic use of XML: that is what’s important, in my opinion. Not a few bytes… Because when we (web designers) regularly use JavaScript libraries heavier than an HTML page, your reasons…
-
On December 20, 2008, 18:39 CET, Matthew said:
I think this is absurd. It is not professional. It goes against what most of the community has been fighting for. It’s a step backward, not forward. Not to mention the possible SEO ramifications this mentality of “it’s OK if I miss a couple of tags now” would cause, as at least one commenter has suggested.
Sorry if I’m coming off strong, this is just insane to me.
-
On December 20, 2008, 19:33 CET, Jens Oliver Meiert said:
Gentlemen! This post is only about (X)HTML, and just diagnoses that, assuming valid output, HTML offers most potential when it comes to file size. This seems to be a statement of fact. (Maybe I should have just used this paragraph for the post.)
The post does not suggest that everyone please goes ahead and switches to HTML and starts omitting tags. It does not deal with scripts or images or other things you should of course pay attention to as well when it comes to performance, not at all. And it does not target education or outreach either.
What I find most interesting in the discussion so far is that leaving things out is considered to make maintenance harder (sounds like I should get more furniture to make my next relocation run smoother…), and that there is some, well, “unconcern” regarding performance (I recently dropped a script that was 20 KB big because it would have increased load time by 50%, so don’t ask for my thoughts on 200 KB scripts).
-
On December 21, 2008, 4:29 CET, olivier said:
You seem to be basing your whole argument on the axiom that “The only decisive factor seems to be performance”. Unfortunately that’s not the case for everyone: decision factors vary widely depending on one’s context.
The only relevant factor for your decision is performance, and the conclusions you draw from that are genuinely interesting.
I think your study/article would be more valuable to the community and less controversial if you wrote it “IF your main decision factor is performance THEN you want HTML because […]”.
Indeed a table giving the pros and cons of various formats (which I guess could include not only XHTML, HTML5, HTML4 but also the likes of flash, SVG etc.) against a number of constraints and priorities would be very nice. Has anybody ever worked on such a thing?
-
On December 21, 2008, 17:13 CET, Joe Enos said:
I wholeheartedly disagree - XHTML gives structure to an otherwise unstructured format. If you are really that concerned about bandwidth, try removing all the unnecessary whitespace from your HTML - each line break, tab, and space adds extra bytes to the stream, so if you don’t care about formatting the markup, you should start there. It’s just as crazy as having unstructured markup.
-
On December 21, 2008, 22:35 CET, Joe Clark said:
I think the bigger advantage is not forcing neophytes to type every single thing in lower case, which everyone screws up from time to time.
-
On December 22, 2008, 3:35 CET, Jeremy Weiskotten said:
-1 Premature optimization.
-
On December 22, 2008, 7:06 CET, Website Design said:
True that most of the people prefer HTML but XHTML also has users.
-
On December 22, 2008, 16:30 CET, Joe Enos said:
The structure I’m referring to is just the standard XML rules, which are not difficult at all to follow: Ensuring all opening tags have a closing tag (or a slash at the end), and making sure attributes have values. If you follow XHTML, anyone using an XML parser can read your markup and actually find what they’re looking for - there’s no chance of a BR or IMG without a closing slash, or attributes without real values. In addition, in today’s world of nested div inside of nested div inside of nested div, you can easily screw up the markup on a page if you’re not careful. If you validate your markup against XML rules, you’ll easily determine if your markup is really valid before publishing.
When I say regular HTML is unstructured, I mean that it is very close to following a set of rules, but there are a few “exceptions”, which throw a wrench in the whole deal. XML doesn’t have those exceptions - every tag closes, every attribute has a value. Every XML document, with no exception, follows these rules. In my opinion, rules are very important when dealing with technology, especially with a language that’s interpreted by dozens of different client apps (browsers). You know how IE has generally been more “forgiving” with bad markup than other browsers? That’s mostly due to the fact that IE accepts more bad XML (tags not closing, etc.) than other browsers. This gives developers and designers an excuse to be lazy by not using valid XHTML (or even valid HTML), and creates conformance issues when working with other browsers and OSes. With so many browsers out there each doing different things, the more standardized your code, the better chance your stuff will function on all browsers.
But if you really want to save a few bytes here and there by not closing out your BR tags, more power to you.
-
On December 22, 2008, 18:01 CET, Duluoz said:
What I am seeing from these comments is that those who disagree that one should spend time exploring the optimization options of HTML vs. XHTML are those who give excuses for other forms of invalidity or failure to do due diligence to the craft. The point of the article is quite clear - given you’ve done your job, at the end of the day, which would give you optimal performance? What am I missing?
-
On December 23, 2008, 7:11 CET, Ash said:
I do hope you don’t close comments, it’s got to be more constructive allowing discussion.
I’m personally a fan of XHTML because it’s valid XML, and can be parsed as such whether by browsers, XSLT, or magic.
That said, I’d be curious to see what kind of render times we get from XML versus HTML parsers on larger documents these days. Or is that being pedantic now? >_>
-
On December 23, 2008, 7:32 CET, Fred Boulton said:
Just to throw this in, I am seeing more and more, as I look at source code of Web pages, the omission altogether of a DTD.
I wonder how much load time this would save?
I don’t see that it’s possible to leave out anything that should be there (HTML or XHTML) if developers validated their pages, and who would build a Web site without validating every page?
:)
-
On December 23, 2008, 10:32 CET, Rimantas said:
Comments once again prove that the biggest XHTML advocates know the least about both XHTML and HTML.
Or maybe they do know about SHORTTAG, about XHTML not being supported by IE, about the fact that unless you specify the proper MIME type XHTML is handled by the same HTML parser (and relies on the misimplementation of the aforementioned SHORTTAG feature), etc. etc.
456bereastreet.com/…
lachy.id.au/…
If someone is not capable of writing clean HTML, s/he should not touch XHTML at all.
And the “possibility of using XML tools”… what can I say.
@Ash: there was a case that browser could do incremental rendering for the HTML documents, but had to wait for the XHTML document to be loaded fully before rendering it. That with the proper MIME type of course, serving XHTML with text/html makes no sense whatsoever. Not sure if that is still the case though.
-
On December 23, 2008, 12:46 CET, Niels Matthijs said:
Just wondering, but speaking about performance only (as this seems to be the point of the article)
Isn’t HTML more performance-heavy to render just because it is more lenient? XHTML is strict, so in theory you could throw this draconian error-handling right out of the window.
HTML is more or less a mess when it comes to closing tags, so it seems a little harder to process and to get things as intended by the author?
Just a question, really interested in the answer.
-
On December 24, 2008, 21:28 CET, Amber said:
I tend to favor HTML. I don’t really see the point of XHTML (well I do if I look) and the benefit. IE, which is the browser of choice… for now, only renders sites with MIME. Why spend the time when you can simply stick with HTML - less work - more universal - it’s not broken, why change?
I especially like Ian’s point about making mistakes and still getting by without a total revamp.
-
On December 29, 2008, 11:50 CET, Richard Morton said:
I am going to be the boy who spots the emperor in his new clothes and say that shouldn’t we really be aiming for one standard rather than multiple standards.
It may well be oversimplification but why on earth can’t we have one or the other and be done with it, otherwise the current divergence is only going to get worse.
I tend to think that the 5%-10% saving quoted is only likely to be significant in already bloated pages, for which it is more productive to focus on the bigger savings gained by optimising images etc.
And while I am on my hobbyhorse about simplification, why can’t we have a more unified coding environment for the web overall, so that there is much more in common between (X)HTML, CSS, JavaScript, PHP etc. I know they are doing different things but why do they need to be like foreign languages to each other.
-
On January 1, 2009, 8:23 CET, Ben Reimers said:
I would think that a 5-10% saving on load time and bandwidth is not (only) significant for large pages, but more importantly for popular websites. It’s no coincidence that Google has done everything it can to reduce the size of their main page, including sticking with HTML (saving 1.25% of their file size just by omitting a doctype), removing line breaks and spaces, using inline scripts and styles as opposed to loading external files, and using the deprecated FONT tag for some styling instead of having a DIV assigned a class where the style is defined elsewhere in the page.
By doing all of that they have probably shaved 15% or more from the size of the page, which is very significant, especially for a website that gets as many hits as Google does.
To rule out performance optimisations in favour of “validity” and “structure” is ignoring real world client needs. Every bit helps.
-
On January 31, 2009, 4:28 CET, dani said:
most xhtml docs are served as html anyway, so I’ll choose html
but my blog uses xhtml (valid xml too, I guess… but still useless, I know) for experimental and learning purposes
-
On February 16, 2009, 11:51 CET, Jens Oliver Meiert said:
[…] this form of markup is not sloppy, but rather precisely defined in the HTML 4.01 DTD as allowed per SGML rules and as such cannot be considered “less strict.”
-
On March 31, 2009, 1:46 CEST, cid said:
HTML wins because interoperability loses its meaning when we talk about a web publishing language and the most popular web browser doesn’t support the correct MIME type.
Besides, I don’t understand why param’s name attribute is implied in XHTML 1.0 Strict while it’s required in other XHTML 1.0 and HTML 4.01 DTDs.
If it’s not an error then what is it?
-
On April 23, 2009, 16:13 CEST, Francesco said:
@Ben Reimers: Please look at this article. Google doesn’t omit the doctype to reduce file size. They do it because they don’t care.
-
On May 29, 2009, 7:48 CEST, John B said:
Short answer: Xhtml wins!
Why: Valid XHTML is more strict and less forgiving. This makes layout engines’ code paths easier. With all of HTML’s omitted tags and “messiness”, today’s layout engines have to work harder to understand just what the hell the page means. Simple is faster.
HTML is loose and forgiving, why? Because it was defined for humans to write it. XHTML on the other hand is more redundant and something more geared for a machine to produce. Everyday, fewer and fewer pages are written by hand. Eventually the code quality of the pages should increase which would lessen the need for layout engines that handle sloppy code. Eventually browser manufacturers can drop the HTML engines and just use the faster and simpler XHTML engines.
If you’re not sure about XHTML being easier to parse into the DOM tree, just read up on XML parser internals!
-
On May 31, 2009, 17:18 CEST, Duri [SVK] said:
HTML wins!
I hate XHTML restrictions. In XHTML, I must use quotes around attribute values, close every li, tr, td etc., write disabled="disabled"… HTML doesn’t have such strict rules.
XHTML is also slower than HTML.
-
On May 6, 2010, 16:54 CEST, Mike Hopley said:
Omitting optional tags is a nice idea, but you have to be careful. For instance, consider the following valid HTML document (rendered in pseudo-code for the comments):
[!doctype html] [html lang="en"] [title]Implied DOM test (IE)[/title] [div] [form action=""] [p][input]Test [/form] [/div]
Here, omitting the optional [head] tag mangles the DOM in IE. Just try it.
I’m astonished that Google is recommending such dangerous practices. Theory is all very well, but could we please stick to recommending things that actually work? And by “work”, I mean “work robustly”, rather than “work most of the time and then fail unpredictably”.
-
On May 6, 2010, 19:07 CEST, Jens Oliver Meiert said:
Mike, I think there’s a limited number of edge cases (consider <p><img> and <p></p><img>), one that seems manageable.
We’re talking about a feature of HTML that is almost 20 years old. It is robust.