Post from March 21, 2019 (↻ April 12, 2023), filed under Web Development (feed).
My concerns about requiring
lang to be set on the
html start tag had first been based on an insufficient differentiation between (and missing reconciliation of) text-processing language and language(s) of the intended audience. While not meant to be the same, in reality, they end up being used the same way. Under that premise, the argument made should be more understandable.
lang attribute is one of HTML’s global and with that one of the more popular attributes. If one doesn’t simply take it for granted, however, it also begs some questions—after all it’s not obvious how accessibility techniques and internationalization practices must, by necessity, require it:
- The language of paper documents is not usually marked.
- Therefore, the language of all electronic documents needs to be marked.
Of course this mock argument isn’t what’s happening, as we don’t label language merely because we can; there has been a strong motivation to provide this meta information for both users and machines.
And yet, one can also find a real argument questioning
- If a task can be done by software, then it isn’t necessary (because superfluous) to manually perform parts of that task.
- The detection of language in and of HTML documents is a task that can be done by software.
- Therefore, it isn’t necessary to manually perform parts of the task of detection of language in and of HTML documents, like marking (and being required to mark) the language in and of documents.
I’ve brought up the same argument in a different manner back in 2014, yet as said accessibility and internationalization standards still require to use the
lang attribute to indicate document language and changes thereof—that is, as nothing changed—it’s time to renew it.
The argument itself is valid (P → Q; P; ∴ Q). Is it cogent? Let’s look at the premises.
P.1 appears evident if we can take for granted that something already being done sufficiently in an automated fashion doesn’t need intervention or assistance, particularly not in a less efficient and reliable manual fashion.
P.2 is the more interesting piece, and it depends on data, data of which we actually (and also) need more of. Yet this need for more data doesn’t mean P.2 is lost, I argue, for there are but three things to consider when judging P.2.
One: As I have emphasized in my original post on the datedness of
@lang, there generally is value (efficiency) in moving the responsibility for language detection from humans to machines. (This suggests that the detection of language should be done by software.)
Two: We find flaws and we may always find flaws in humans marking up language. This lacks more data than the question how effective machines are at making out language and changes in language; yet if we find that humans are really bad (ignorant, lazy, imprecise, inefficient) at marking up language, the
@lang case may already be lost and we should not force anyone anymore to mark up languages. (This, too, suggests that the detection of language should be done by software.)
Three, and back to the premise: There are many indicators that software has become sufficiently good, really good, in automatically detecting languages. Consider Google Translate and its “Detect language” feature, upcoming (or long available) language detection in Google Assistant, the language processing features of Amazon Comprehend, or automatic language detection in software like HP IDOL. This is exactly what user agents and assistive technology can be and must be capable of, too. (And this point, then, suggests that the detection of language can indeed be done by software.)
To accurately assess the argument I presented, we still need more data. But I believe we long have enough to tell that marking up language shouldn’t be be done by, let alone be required from humans. At the end of the day, what has been asked of developers all this time is a grave violation of “Joe’s Law,” after Joe Clark:
“If a browser or adaptive technology can or should handle an accessibility issue, I won’t”—that is, we, as web developers, shouldn’t.
User agents and assistive technology can and should handle language detection, without manual preparation. Therefore: No
@lang requirement. Keep markup clean. Let software do the job.
Figure: When you want to say “no,” but can’t. (Copyright King Features Syndicate, Inc., distr. Bulls.)
I’m Jens, and I’m an engineering lead and author. I’ve worked as a technical lead for Google, I’m close to W3C and WHATWG, and I write and review books for O’Reilly. I love trying things, sometimes including philosophy, art, and adventure. Here on meiert.com I share some of my views and experiences.
If you have a question or suggestion about what I write, please leave a comment (if available) or a message. Thank you!
Maybe this is interesting to you, too:
- Next: Print Styling, the 3 Basics
- Previous: Highlights from “Free Thought and Official Propaganda” (Bertrand Russell)
- More under Web Development, or from 2019
- Most popular posts
Looking for a way to comment? Comments have been disabled, unfortunately.
Get a good look at web development? Try The Web Development Glossary (2020). With explanations and definitions for literally thousands of terms from Web Development and related fields, building on Wikipedia as well as the MDN Web Docs. Available at Apple Books, Kobo, Google Play Books, and Leanpub.