Post from March 21, 2019 (↻ September 1, 2021), filed under Web Development.
The lang attribute is one of HTML’s global attributes and, with that, one of the more popular ones. If one doesn’t simply take it for granted, however, it also raises some questions. After all, it’s not obvious how accessibility techniques and internationalization practices must, by necessity, require it:
- The language of paper documents is not usually marked.
- Therefore, the language of all electronic documents needs to be marked.
Of course this mock argument isn’t what’s happening, as we don’t label language merely because we can; there has been a strong motivation to provide this meta information for both users and machines.
And yet, one can also find a real argument questioning @lang:
- P.1: If a task can be done by software, then it isn’t necessary (because superfluous) to manually perform parts of that task.
- P.2: The detection of language in and of HTML documents is a task that can be done by software.
- C: Therefore, it isn’t necessary to manually perform parts of the task of detecting language in and of HTML documents, such as marking (and being required to mark) the language in and of documents.
I brought up the same argument in a different form back in 2014. Yet since, as said, accessibility and internationalization standards still require the lang attribute to indicate document language and changes thereof (that is, since nothing has changed), it’s time to renew it.
The argument itself is valid (P → Q; P; ∴ Q). Is it cogent? Let’s look at the premises.
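For readers who like to see the form checked mechanically, here is a minimal Lean 4 sketch of the argument’s shape. It only shows that the conclusion follows from the premises; it says nothing about whether the premises are true. All names in it (Task, doableBySoftware, manualWorkNeeded, langDetection) are hypothetical labels chosen for illustration.

```lean
-- Sketch of the argument's form only; every name here is an illustrative assumption.
theorem no_manual_marking {Task : Type}
    (doableBySoftware manualWorkNeeded : Task → Prop)
    (langDetection : Task)
    -- P.1: whatever software can do needs no manual work
    (p1 : ∀ t, doableBySoftware t → ¬ manualWorkNeeded t)
    -- P.2: language detection is something software can do
    (p2 : doableBySoftware langDetection) :
    -- Conclusion: language detection needs no manual work
    ¬ manualWorkNeeded langDetection :=
  p1 langDetection p2
```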
P.1 appears evident if we can take for granted that something already done sufficiently well in an automated fashion doesn’t need intervention or assistance, particularly not in a less efficient and less reliable manual fashion.
P.2 is the more interesting piece, and it depends on data, of which we actually need more. Yet this need for more data doesn’t mean P.2 is lost, I argue, for there are but three things to consider when judging P.2.
One: As I have emphasized in my original post on the datedness of @lang, there generally is value (efficiency) in moving the responsibility for language detection from humans to machines. (This suggests that the detection of language should be done by software.)
Two: We find flaws, and we may always find flaws, in humans marking up language. Here we have even less data than on the question of how effective machines are at making out language and changes in language; yet if we find that humans are really bad (ignorant, lazy, imprecise, inefficient) at marking up language, the @lang case may already be lost and we should no longer force anyone to mark up languages. (This, too, suggests that the detection of language should be done by software.)
Three, and back to the premise: There are many indicators that software has become sufficiently good, really good, at automatically detecting languages. Consider Google Translate and its “Detect language” feature, upcoming (or long available) language detection in Google Assistant, the language processing features of Amazon Comprehend, or automatic language detection in software like HP IDOL. This is exactly what user agents and assistive technology can be and must be capable of, too. (And this point, then, suggests that the detection of language can indeed be done by software.)
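To make the idea of software-side detection a little more concrete, here is a deliberately tiny Python sketch. It is not how any of the products named above work; it is a toy that counts common function words per language, and the word lists, the function name guess_language, and the fallback code "und" (the BCP 47 subtag for undetermined) are all assumptions made for illustration.

```python
import re

# Tiny, hand-picked stopword lists (an assumption for illustration only;
# real detectors rely on much larger statistical models).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "ein", "zu"},
    "fr": {"le", "la", "et", "est", "les", "des", "un", "une"},
}


def guess_language(text, default="und"):
    """Return the code of the language whose stopwords occur most often.

    Falls back to `default` when no known stopword is found.
    """
    # Split into lowercase words made of letters only (Unicode-aware).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(guess_language("The language of the document is obvious to the reader"))
    print(guess_language("Der Hund ist nicht im Haus und die Katze auch nicht"))
```

A user agent could, in principle, run something along these lines (with a far better model) on text nodes that carry no language information, which is the kind of division of labor the argument favors.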
To accurately assess the argument I presented, we still need more data. But I believe we have long had enough to tell that marking up language shouldn’t be done by, let alone be required from, humans. At the end of the day, what has been asked of developers all this time is a grave violation of “Joe’s Law,” after Joe Clark:
“If a browser or adaptive technology can or should handle an accessibility issue, I won’t”—that is, we, as web developers, shouldn’t.
User agents and assistive technology can and should handle language detection, without manual preparation. Therefore: No @lang requirement. Keep markup clean. Let software do the job.
Figure: When you want to say “no,” but can’t. (Copyright King Features Syndicate, Inc., distr. Bulls.)
I’m Jens Oliver Meiert, and I’m an engineering manager and author. I’ve worked as a technical lead for Google, I’m close to the W3C and the WHATWG, and I write and review books for O’Reilly. Other than that, I love trying things, sometimes including philosophy, art, and adventure. Here on meiert.com I share some of my views and experiences.