HTML vs. XHTML: Why HTML Wins

Published on December 19, 2008 (↻ October 18, 2024).

Document types are cool, and there are plenty of them. There are also countless discussions about the “right” document type. Alas, these discussions may deal with irrelevant details or miss the point. A decisive factor is performance, and that suggests the choice: Use HTML 4 or, at least “formally,” HTML 5.

How does performance or load time, which we can deem critical, influence a document type decision? Compared with valid HTML that omits optional tags, XHTML documents are about 5% (large documents) to 10% (small documents) bigger, a realistic file size overhead of at least 400 bytes. I ran several tests in that regard, which I intend to publish at some point; the numbers are pretty reliable and easy to reproduce.
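
As a rough illustration of how such a comparison could be measured, here is a minimal Python sketch. The two sample documents and the function names are assumptions made for this example; they are not the tests mentioned above. Because the sample is tiny, the document type declaration dominates and the resulting percentage will be far higher than the 5% to 10% quoted for realistic documents.

```python
from typing import Tuple

# Hypothetical sample: valid HTML that omits optional tags.
HTML_DOC = """<!DOCTYPE html>
<title>Sample</title>
<p>First paragraph.
<p>Second paragraph.
<ul>
<li>One
<li>Two
</ul>
"""

# Hypothetical sample: the same content written as XHTML 1.0 Strict.
XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample</title>
</head>
<body>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<ul>
<li>One</li>
<li>Two</li>
</ul>
</body>
</html>
"""


def size_in_bytes(markup: str) -> int:
    """Return the size of the markup when encoded as UTF-8."""
    return len(markup.encode("utf-8"))


def overhead(html: str, xhtml: str) -> Tuple[int, float]:
    """Return the extra bytes of the XHTML version, absolute and as a percentage of the HTML size."""
    html_size = size_in_bytes(html)
    extra = size_in_bytes(xhtml) - html_size
    return extra, extra / html_size * 100


if __name__ == "__main__":
    extra, percent = overhead(HTML_DOC, XHTML_DOC)
    print(f"XHTML version is {extra} bytes ({percent:.1f}%) larger")
```

To get numbers that mean something, the same two functions would have to be fed real pages in both flavors rather than a toy sample like this one.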

That is, no matter what flavor you typically prefer (keep in mind that the popular document types basically feature the same elements), using HTML offers the most benefits. Not only will you be able to save some bytes, you’ll also be free from MIME type questions and such.

All the classical arguments for XHTML seem to be related to its interoperability potential as well as the semantic ideal XHTML 2.0 tries to achieve. However, these arguments are as yet hypothetical and hence not practically relevant.

This site itself still uses supposed XHTML for historical as well as consistency reasons (the English part of this site uses WordPress, which suggested XHTML, too). Even though I’ve doubted whether the cost of the problem outweighs the cost of the solution, this site’s main document type is subject to change: it is to switch to HTML 5, just as most of my personal projects have.

About Me

Jens Oliver Meiert, on November 9, 2024.

I’m Jens (long: Jens Oliver Meiert), and I’m a web developer, manager, and author. I’ve worked as a technical lead and engineering manager for a few companies, I’m a contributor to several web standards, and I write and review books for O’Reilly and Frontend Dogma.

I love trying things, not only in web development and engineering management, but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (I value you being critical, interpreting charitably, and giving feedback.)

Comments (Closed)

  1. On December 19, 2008, 21:28 CET, molily said:

    I just don’t get the point of this article. I’d consider it incredibly shortsighted to omit tags wherever possible only to achieve a smaller file size. I’d recommend everyone not to omit any of the mentioned end tags. At SELFHTML, we suggest writing HTML 4 the way XHTML requires it: Don’t omit optional start and end tags, don’t use the dubious SGML features. This is about code readability, maintainability and consistency - and this affects beginners and professionals. But we already had this discussion when you posted the list of optional tags.

    these discussions often deal with irrelevant details or miss the point entirely. The only decisive factor seems to be performance.

    I think this supposed “performance” benefit is actually the least decisive factor. In my opinion, you don’t do justice to those discussions.

  2. On December 19, 2008, 21:37 CET, Jens Oliver Meiert said:

    Mathias,

    This is about code readability, maintainability and consistency

    yet how is

    <!DOCTYPE html>
    <title></title>

    less readable, maintainable, and consistent than

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title></title>
      </head>
      <body></body>
    </html>

    
    …? (Featuring a minimal, valid XHTML 1 sample and a minimal, valid HTML 5 sample.)

    I think this supposed “performance” benefit is actually the least decisive factor.

    This seems to depend on the goals. When you make speed a top priority—personally, I see few reasons not to—, HTML can give you even more options to reduce file size.

    I agree, however, that a novice might not necessarily want to start this way. He should probably fully understand (X)HTML first.

  3. On December 19, 2008, 21:58 CET, Ian Hickson said:

    XHTML and HTML both have their uses. Personally I prefer HTML for most uses because it won’t hurt the user if I make a mistake and don’t notice.

    If file size is a concern, then HTML gives more opportunities for saving space, but if you’re using gzip even that’s not really a big deal.

  4. On December 19, 2008, 22:11 CET, Jens Oliver Meiert said:

    Thank you, Ian! As for gzip compression it’s true that the difference is not that big, but still there is a difference, and that is what I like to underscore here.

  5. On December 19, 2008, 23:44 CET, rohnn said:

    Having 200-300kb of js libs and images, you want to produce some low quality code that’ll be hard to read and maintain, just to win like a couple of bytes

    Sorry I don’t see the point of this approach.

  6. On December 19, 2008, 23:58 CET, Neal G said:

    I completely agree with you. I’ve sided with HTML 4.01 Strict for a while now. I see that your own site uses XHTML though.

  7. On December 20, 2008, 6:06 CET, Jin said:

    The most valid argument against XHTML is that IE doesn’t support true XML.

    The End. 😊

  8. On December 20, 2008, 6:46 CET, Duluoz said:

    You could also make the case that it’s quicker to whip up some HTML vs. XHTML as well. View performance in terms of labor.

  9. On December 20, 2008, 10:44 CET, wow said:

    Come on, a website is not a video stream. The few extra bytes won’t affect your site!

  10. On December 20, 2008, 11:21 CET, Jens Oliver Meiert said:

    Rohnn, well, what’s your point? Why is HTML “low quality”? Why is valid HTML omitting optional tags “low quality”? (And could there be another problem if you’re using 200–300 KB of scripts?)

    Neal, fair enough. As for the latter, please see my note on pseudo-XHTML on this site.

    As a general note, the post may come across a bit strong. Think more about the performance potential—if performance is a priority, HTML does give more options than XHTML, which in turn cannot even really be used (MIME type). And of course, goals like keeping things simple, doing the HTML right, and focusing on load time everywhere are still valid.

  11. On December 20, 2008, 12:35 CET, Gunnar Bittersmann said:

    Jens, if you know what you’re doing (and I’m sure you do), go ahead. But this advice should not be given to others, IMHO.

    If you’re going to save some bytes, why don’t you throw out all indents and line breaks? That would save a lot more bytes than some omitted tags. Would that make the code less maintainable? You bet. But so does omitting tags.

    Your comparison of HTML and XHTML sources of an empty webpage is not fair since it does not reflect a real webpage. I had made a real world example about the problems with omitted tags.

    As rohnn has pointed out, bytes can be saved elsewhere: optimize images (i.e. use the adequate graphics format and adequate compression rate, remove meta information).

    Just don’t make the HTML code vulnerable to errors; never omit optional tags.

    Once we’re there, we can write XHTML instead of HTML. @Jin: I don’t see the benefit of XHTML in serving it as application/xhtml+xml to the client, but on the author’s side: stricter syntax rules and the possibility of using XML tools.

    I wholeheartedly agree with molily: XHTML wins over HTML.

  12. On December 20, 2008, 14:15 CET, rohnn said:

    Jens,

    By “low quality” I mean that the code will most probably be very hard to maintain and/or update/upgrade.
    I didn’t think in terms of the end user (for once. :) ) but rather in terms of the producer.
    Having a quite big page from a quite big site… I think people are really rapidly going to be lost in the page tree/structure.

    As for js libs & image size, all I am saying is that you prefer HTML vs xHTML because it saves a few bytes when pages do actually load hundreds of KB in libs & images and I believe this is not a fair argument.

  13. On December 20, 2008, 14:55 CET, Cisco said:

    Hi,
    I don’t agree with you.
    A few bytes are not the reason a web designer chooses HTML instead of XHTML.

    I think the future of web code is semantic, and HTML can’t offer any instruments for this.
    XML, tags, and the semantic use of XML: that is what’s important, in my opinion. Not a few bytes.
    Because when we (web designers) regularly use JavaScript libraries heavier than an HTML page, your reasons…

  14. On December 20, 2008, 18:39 CET, Matthew said:

    I think this is absurd. It is not professional. It goes against what most of the community has been fighting for. It’s a step backward, not forward. Not to mention the possible SEO ramifications this mentality of ‘it’s OK if I miss a couple of tags now’ would cause, as at least one commenter has suggested.

    Sorry if I’m coming off strong, this is just insane to me.

  15. On December 20, 2008, 19:33 CET, Jens Oliver Meiert said:

    Gentlemen! This post is only about (X)HTML, and just diagnoses that—assuming valid output—HTML offers most potential when it comes to file size. This seems to be a statement of fact. (Maybe I should have just used this paragraph for the post.)

    The post does not suggest that everyone please goes ahead and switches to HTML and starts omitting tags. It does not deal with scripts or images or other things you should of course pay attention to as well when it comes to performance, not at all. And it does not target education or outreach either.

    What I find most interesting in the discussion so far is that leaving things out is considered to make maintenance harder (sounds like I should get more furniture to make my next relocation run smoother…), and that there is some, well, “unconcern” regarding performance (I recently dropped a script that was 20 KB big because it would have increased load time by 50%, so don’t ask for my thoughts on 200 KB scripts).

  16. On December 21, 2008, 4:29 CET, olivier said:

    You seem to be basing your whole argument on the axiom that “The only decisive factor seems to be performance”. Unfortunately that’s not the case for everyone: decision factors vary widely depending on one’s context.

    The only relevant factor for your decision is performance, and the conclusions you draw from that are genuinely interesting.

    I think your study/article would be more valuable to the community and less controversial if you wrote it “IF your main decision factor is performance THEN you want HTML because […]”.

    Indeed a table giving the pros and cons of various formats (which I guess could include not only XHTML, HTML5, HTML4 but also the likes of flash, SVG etc.) against a number of constraints and priorities would be very nice. Has anybody ever worked on such a thing?

  17. On December 21, 2008, 17:13 CET, Joe Enos said:

    I wholeheartedly disagree - XHTML gives structure to an otherwise unstructured format. If you are really that concerned about bandwidth, try removing all the unnecessary whitespace from your HTML - each line break, tab, and space adds extra bytes to the stream, so if you don’t care about formatting the markup, you should start there. It’s just as crazy as having unstructured markup.

  18. On December 21, 2008, 22:35 CET, Joe Clark said:

    I think the bigger advantage is not forcing neophytes to type every single thing in lower case, which everyone screws up from time to time.

  19. On December 22, 2008, 3:35 CET, Jeremy Weiskotten said:

    -1 Premature optimization.

  20. On December 22, 2008, 7:06 CET, Website Design said:

    True that most of the people prefer HTML but XHTML also has users.

  21. On December 22, 2008, 16:30 CET, Joe Enos said:

    The structure I’m referring to is just the standard XML rules, which are not difficult at all to follow: Ensuring all opening tags have a closing tag (or a slash at the end), and making sure attributes have values. If you follow XHTML, anyone using an XML parser can read your markup and actually find what they’re looking for - there’s no chance of a BR or IMG without a closing slash, or attributes without real values. In addition, in today’s world of nested div inside of nested div inside of nested div, you can easily screw up the markup on a page if you’re not careful. If you validate your markup against XML rules, you’ll easily determine if your markup is really valid before publishing.

    When I say regular HTML is unstructured, I mean that it is very close to following a set of rules, but there are a few “exceptions”, which throw a wrench in the whole deal. XML doesn’t have those exceptions - every tag closes, every attribute has a value. Every XML document, with no exception, follows these rules. In my opinion, rules are very important when dealing with technology, especially with a language that’s interpreted by dozens of different client apps (browsers). You know how IE has generally been more “forgiving” with bad markup than other browsers? That’s mostly due to the fact that IE accepts more bad XML (tags not closing, etc.) than other browsers. This gives developers and designers an excuse to be lazy by not using valid XHTML (or even valid HTML), and creates conformance issues when working with other browsers and OS’s. With so many browsers out there each doing different things, the more standardized your code, the better chance your stuff will function on all browsers.

    But if you really want to save a few bytes here and there by not closing out your BR tags, more power to you.

  22. On December 22, 2008, 18:01 CET, Duluoz said:

    What I am seeing from these comments are those who disagree that one should spend time exploring the optimization options of HTML vs. XHTML are those who give excuse for other forms of invalidity or failure to do due diligence to the craft. The point of the article is quite clear and to the point - given you’ve done your job, at the end of the day, which would give you optimal performance? What am I missing?

  23. On December 23, 2008, 7:11 CET, Ash said:

    I do hope you don’t close comments, it’s got to be more constructive allowing discussion. 😊

    I’m personally a fan of XHTML because it’s valid XML, and can be parsed as such whether by browsers, XSLT, or magic.

    That said, I’d be curious to see what kind of render times we get from XML versus HTML parsers on larger documents these days. Or is that being pedantic now? >_>

  24. On December 23, 2008, 7:32 CET, Fred Boulton said:

    Just to throw this in, I am seeing more and more, as I look at source code of Web pages, the omission altogether of a DTD.

    I wonder how much load time this would save?

    I don’t see that it’s possible to leave out anything that should be there (HTML or XHTML) if developers validated their pages, and who would build a Web site without validating every page?

    :)

  25. On December 23, 2008, 10:32 CET, Rimantas said:

    Comments once again prove that the biggest XHTML advocates know the least about both XHTML and HTML 😔

    Or maybe they do know about SHORTTAG, about XHTML not being supported by IE, about the fact that unless you specify the proper MIME type XHTML is handled by the same HTML parser (and relies on the misimplementation of the aforementioned SHORTTAG feature), etc. etc.

    456bereastreet.com/

    lachy.id.au/


    If someone is not capable of writing clean HTML s/he should not touch XHTML at all.

    And the “possibility of using XML tools”… what can I say.

    @Ash: there was a case where browsers could do incremental rendering for HTML documents, but had to wait for an XHTML document to be loaded fully before rendering it. That with the proper MIME type of course; serving XHTML with text/html makes no sense whatsoever. Not sure if that is still the case though.

  26. On December 23, 2008, 12:46 CET, Niels Matthijs said:

    Just wondering, but speaking about performance only (as this seems to be the point of the article)

    Isn’t html more performance-heavy to render just because it is more lenient? xhtml is strict so in theory you could throw this draconian error-handling right out of the window.

    Html is more or less a mess when it comes to closing tags, so it seems a little harder to process and to get things as intended by the author?

    Just a question, really interested in the answer.

  27. On December 24, 2008, 21:28 CET, Amber said:

    I tend to favor HTML. I don’t really see the point of XHTML (well, I do if I look) and the benefit. IE, which is the browser of choice… for now, only renders sites with MIME. Why spend the time when you can simply stick with HTML - less work - more universal - it’s not broken, why change?

    i especially like Ian’s point about making mistakes and still getting by without a total revamp.

  28. On December 29, 2008, 11:50 CET, Richard Morton said:

    I am going to be the boy who spots the emperor in his new clothes and say that shouldn’t we really be aiming for one standard rather than multiple standards.
    It may well be oversimplification but why on earth can’t we have one or the other and be done with it, otherwise the current divergence is only going to get worse.
    I tend to think that the 5%-10% saving quoted is only likely to be significant in already bloated pages, for which it is more productive to focus on the bigger savings gained by optimising images etc.

    And while I am on my hobbyhorse about simplification, why can’t we have a more unified coding environment for the web overall, so that there is much more in common between (X)HTML, CSS, JavaScript, PHP etc.? I know they are doing different things, but why do they need to be like foreign languages to each other?

  29. On January 1, 2009, 8:23 CET, Ben Reimers said:

    I would think that a 5-10% saving on load time and bandwidth is not (only) significant for large pages, but more importantly for popular websites. It’s no coincidence that Google has done everything it can to reduce the size of their main page, including sticking with HTML (saving 1.25% of their file size just by omitting a doctype), removing line breaks and spaces, using inline scripts and styles as opposed to loading external files, and using the deprecated FONT tag for some styling instead of having a DIV assigned a class where the style is defined elsewhere in the page.

    By doing all of that they have probably shaved 15% or more from the size of the page, which is very significant, especially for a website that gets as many hits as Google does.

    To rule out performance optimisations in favour of “validity” and “structure” is ignoring real world client needs. Every bit helps.

  30. On January 31, 2009, 4:28 CET, dani said:

    Most XHTML docs are served as HTML anyway, so I’ll choose HTML.

    But my blog uses XHTML (valid XML too, I guess, but still useless, I know) for experimental and learning purposes.

  31. On February 16, 2009, 11:51 CET, Jens Oliver Meiert said:

    […] this form of markup is not sloppy, but rather precisely defined in the HTML 4.01 DTD as allowed per SGML rules and as such cannot be considered “less strict.”

  32. On March 31, 2009, 1:46 CEST, cid said:

    HTML wins because interoperability loses its meaning when we talk about a web publishing language and the most popular web browser doesn’t support the correct MIME type.

    Besides, I don’t understand why param’s name attribute is implied in XHTML 1.0 Strict while it’s required in other XHTML 1.0 and HTML 4.01 DTDs.

    If it’s not an error then what is it?

  33. On April 23, 2009, 16:13 CEST, Francesco said:

    @Ben Reimers: Please look at this article. Google doesn’t omit the doctype to reduce file size. They do it because they don’t care.

  34. On May 29, 2009, 7:48 CEST, John B said:

    Short answer: Xhtml wins!
    Why: Valid XHTML is more strict and less forgiving. This makes layout engines’ code paths easier. With all of HTML’s omitted tags and “messiness”, today’s layout engines have to work harder to understand just what the hell the page means. Simple is faster.
    HTML is loose and forgiving, why? Because it was defined for humans to write it. XHTML on the other hand is more redundant and something more geared for a machine to produce. Every day, fewer and fewer pages are written by hand. Eventually the code quality of the pages should increase, which would lessen the need for layout engines that handle sloppy code. Eventually browser manufacturers can drop the HTML engines and just use the faster and simpler XHTML engines.
    If you’re not sure about XHTML being easier to parse into the DOM tree, just read up on XML parser internals!

  35. On May 31, 2009, 17:18 CEST, Duri [SVK] said:

    HTML wins!
    I hate XHTML restrictions. In XHTML, I must use quotes around attribute values, close every li, tr, td etc., write disabled="disabled"… HTML doesn’t have such strict rules.
    XHTML is also slower than HTML.

  36. On May 6, 2010, 16:54 CEST, Mike Hopley said:

    Omitting optional tags is a nice idea, but you have to be careful. For instance, consider the following valid HTML document:

    <!doctype html>
    <html lang="en">
    <title>Implied DOM test (IE)</title>
    <div>
        <form action="">
            <p><input>Test
        </form>
    </div>

    Here, omitting the optional <head> tag mangles the DOM in IE. Just try it.

    I’m astonished that Google is recommending such dangerous practices. Theory is all very well, but could we please stick to recommending things that actually work? And by “work”, I mean “work robustly”, rather than “work most of the time and then fail unpredictably”.

  37. On May 6, 2010, 19:07 CEST, Jens Oliver Meiert said:

    Mike, I think there’s a limited number of edge cases (consider <p><img> and <p></p><img>), one that seems manageable.

    We’re talking about a feature of HTML that is almost 20 years old. It is robust.