Optional “@lang”

Published on March 21, 2019 (↻ November 27, 2023), filed under (RSS feed for all categories).

My concerns about requiring lang to be set on the html start tag had first been based on an insufficient differentiation between (and missing reconciliation of) text-processing language and language(s) of the intended audience. While not meant to be the same, in reality, they end up being used the same way. Under that premise, the argument made should be more understandable.

The lang attribute is one of HTML’s global and with that one of the more popular attributes. If one doesn’t simply take it for granted, however, it also begs some questions—after all it’s not obvious how accessibility techniques and internationalization practices must, by necessity, require it:

The language of paper documents is not usually marked.
Therefore, the language of all electronic documents needs to be marked.

Of course this mock argument isn’t what’s happening, as we don’t label language merely because we can; there has been a strong motivation to provide this meta information for both users and machines.

And yet, one can also find a real argument questioning @lang.

If a task can be done by software, then it isn’t necessary (because superfluous) to manually perform parts of that task.
The detection of language in and of HTML documents is a task that can be done by software.
Therefore, it isn’t necessary to manually perform parts of the task of detection of language in and of HTML documents, like marking (and being required to mark) the language in and of documents.

I’ve brought up the same argument in a different manner back in 2014, yet as said accessibility and internationalization standards still require to use the lang attribute to indicate document language and changes thereof—that is, as nothing changed—it’s time to renew it.

The argument itself is valid (P → Q; P; ∴ Q). Is it cogent? Let’s look at the premises.

P.1 appears evident if we can take for granted that something already being done sufficiently in an automated fashion doesn’t need intervention or assistance, particularly not in a less efficient and reliable manual fashion.

P.2 is the more interesting piece, and it depends on data, data of which we actually (and also) need more of. Yet this need for more data doesn’t mean P.2 is lost, I argue, for there are but three things to consider when judging P.2.

One: As I have emphasized in my original post on the datedness of @lang, there generally is value (efficiency) in moving the responsibility for language detection from humans to machines. (This suggests that the detection of language should be done by software.)

Two: We find flaws and we may always find flaws in humans marking up language. This lacks more data than the question how effective machines are at making out language and changes in language; yet if we find that humans are really bad (ignorant, lazy, imprecise, inefficient) at marking up language, the @lang case may already be lost and we should not force anyone anymore to mark up languages. (This, too, suggests that the detection of language should be done by software.)

Three, and back to the premise: There are many indicators that software has become sufficiently good, really good, in automatically detecting languages. Consider Google Translate and its “Detect language” feature, upcoming (or long available) language detection in Google Assistant, the language processing features of Amazon Comprehend, or automatic language detection in software like HP IDOL. This is exactly what user agents and assistive technology can be and must be capable of, too. (And this point, then, suggests that the detection of language can indeed be done by software.)

To accurately assess the argument I presented, we still need more data. But I believe we long have enough to tell that marking up language shouldn’t be be done by, let alone be required from humans. At the end of the day, what has been asked of developers all this time is a grave violation of “Joe’s Law,” after Joe Clark:

“If a browser or adaptive technology can or should handle an accessibility [or internationalization, my edit] issue, I won’t”—that is, we, as web developers, shouldn’t.

User agents and assistive technology can and should handle language detection, without manual preparation. Therefore: No @lang requirement. Keep markup clean. Let software do the job.

Val arms lightly and at a postern gate bids his wife their first farewell. Then he walks swiftly away, for her eyes show things a brave smile cannot hide.

Figure: When you want to say “no,” but can’t. (Copyright King Features Syndicate, Inc., distr. Bulls.)

Was this useful or interesting? Share (toot) this post, or maybe treat me to a coffee. Thanks!

About Me

Jens Oliver Meiert, on September 30, 2021.

I’m Jens (long: Jens Oliver Meiert), and I’m a frontend engineering leader and tech author/publisher. I’ve worked as a technical lead for companies like Google and as an engineering manager for companies like Miro, I’m close to W3C and WHATWG, and I write and review books for O’Reilly and Frontend Dogma.

I love trying things, not only in web development (and engineering management), but also in other areas like philosophy. Here on meiert.com I share some of my views and experiences.

If you want to do me a favor, interpret charitably (I speak three languages, and they can collide), yet be critical and give feedback for me to learn and improve. Thank you!