Optional “@lang”
Published on March 21, 2019 (updated November 27, 2023), filed under Development.
My concerns about requiring lang to be set on the html start tag were initially based on insufficient differentiation between (and a missing reconciliation of) the text-processing language and the language(s) of the intended audience. While the two are not meant to be the same, in reality they end up being used the same way. Under that premise, the argument made here should be easier to follow.
The lang attribute is one of HTML’s global attributes and, with that, one of the more popular ones. If one doesn’t simply take it for granted, however, it also raises some questions; after all, it’s not obvious why accessibility techniques and internationalization practices must, by necessity, require it:
- P: The language of paper documents is not usually marked.
- C: Therefore, the language of all electronic documents needs to be marked.
Of course, this mock argument isn’t what’s happening, as we don’t label language merely because we can; there has been a strong motivation to provide this meta information for both users and machines.
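To make “meta information for machines” concrete, here is a minimal sketch that is not part of the original argument. It uses Python’s standard html.parser module to read lang values from a small, made-up HTML sample, covering both a document language and a change of language within the text. The names LangCollector and SAMPLE are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect (tag, lang) pairs for every start tag that carries a lang attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "lang":
                self.found.append((tag, value))


# Made-up sample document: one page-level language, one inline language change.
SAMPLE = """<!DOCTYPE html>
<html lang="en">
  <title>Optional lang</title>
  <p>The German word <span lang="de">Sprache</span> means language.</p>
</html>"""

collector = LangCollector()
collector.feed(SAMPLE)
print(collector.found)  # [('html', 'en'), ('span', 'de')]
```

Every one of these values has to be supplied and maintained by an author, which is the manual effort the following argument questions.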
And yet, one can also find a real argument questioning @lang.
- P.1: If a task can be done by software, then it isn’t necessary (because superfluous) to manually perform parts of that task.
- P.2: The detection of language in and of HTML documents is a task that can be done by software.
- C: Therefore, it isn’t necessary to manually perform parts of the task of detection of language in and of HTML documents, like marking (and being required to mark) the language in and of documents.
I brought up the same argument in a different manner back in 2014. Yet as the accessibility and internationalization standards mentioned still require the use of the lang attribute to indicate document language and changes thereof (that is, as nothing has changed), it’s time to renew it.
The argument itself is valid (P → Q; P; ∴ Q). Is it cogent? Let’s look at the premises.
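Before turning to the premises, the validity claim can be checked mechanically. The following is a minimal sketch in Lean 4; the first theorem is plain modus ponens, and the second spells out the argument with P.1 read as a general rule that P.2 instantiates. The names Task, doableBySoftware, manualWorkNeeded, and languageDetection are hypothetical placeholders chosen for this illustration, not terms from any standard.

```lean
-- Modus ponens: from P → Q and P, conclude Q.
theorem modus_ponens (P Q : Prop) (hpq : P → Q) (hp : P) : Q :=
  hpq hp

-- The argument with P.1 as a general rule over tasks (hypothetical names).
theorem no_manual_marking
    {Task : Type}
    (doableBySoftware manualWorkNeeded : Task → Prop)
    (languageDetection : Task)
    (p1 : ∀ t, doableBySoftware t → ¬ manualWorkNeeded t)  -- P.1
    (p2 : doableBySoftware languageDetection) :            -- P.2
    ¬ manualWorkNeeded languageDetection :=                 -- C
  p1 languageDetection p2
```

That both proofs go through only shows that the conclusion follows from the premises; whether the premises hold is a separate question.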
P.1 appears evident if we can take for granted that something already being done sufficiently well in an automated fashion doesn’t need intervention or assistance, particularly not in a less efficient and less reliable manual fashion.
P.2 is the more interesting piece, and it depends on data, of which we actually need more. Yet this need for more data doesn’t mean P.2 is lost, I argue, for there are three things to consider when judging P.2.
One: As I emphasized in my original post on the datedness of @lang, there generally is value (efficiency) in moving the responsibility for language detection from humans to machines. (This suggests that the detection of language should be done by software.)
Two: We find flaws, and may always find flaws, in humans marking up language. There is less data on this than on the question of how effective machines are at making out language and changes in language; yet if we find that humans are really bad (ignorant, lazy, imprecise, inefficient) at marking up language, the @lang case may already be lost and we should no longer force anyone to mark up languages. (This, too, suggests that the detection of language should be done by software.)
Three, and back to the premise: There are many indicators that software has become sufficiently good, really good, at automatically detecting languages. Consider Google Translate and its “Detect language” feature, upcoming (or long available) language detection in Google Assistant, the language processing features of Amazon Comprehend, or automatic language detection in software like HP IDOL. This is exactly what user agents and assistive technology can be and must be capable of, too. (And this point, then, suggests that the detection of language can indeed be done by software.)
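As a rough illustration of the principle, here is a deliberately naive sketch of language detection by stopword counting. It is not how the products named above work (their methods are not described here); the word lists, the STOPWORDS name, and the detect_language function are assumptions made up for this example, and real detectors rely on far richer statistical models and much more data.

```python
import re
from typing import Optional

# Tiny, hand-picked stopword lists; an assumption for illustration only.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it", "for", "with"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "von"},
    "fr": {"le", "la", "les", "et", "est", "une", "des", "que", "pas", "dans"},
}


def detect_language(text: str) -> Optional[str]:
    """Return the code of the language whose stopwords occur most often.

    Returns None when no known stopword appears at all. On a tie, the
    language listed first in STOPWORDS wins.
    """
    # Split into runs of letters (Unicode-aware), ignoring digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(detect_language("The language of the document is not marked."))
print(detect_language("Die Sprache des Dokuments ist nicht ausgezeichnet."))
print(detect_language("La langue du document n'est pas indiquée."))
```

Even this toy version tells the three sample sentences apart without any markup, though it would fail quickly on short strings, mixed-language text, or languages outside its lists.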
To accurately assess the argument I presented, we still need more data. But I believe we have long had enough to tell that marking up language shouldn’t be done by, let alone be required of, humans. At the end of the day, what has been asked of developers all this time is a grave violation of “Joe’s Law,” after Joe Clark:
“If a browser or adaptive technology can or should handle an accessibility [or internationalization, my edit] issue, I won’t”; that is, we, as web developers, shouldn’t.
User agents and assistive technology can and should handle language detection, without manual preparation. Therefore: No @lang requirement. Keep markup clean. Let software do the job.
Figure: When you want to say “no,” but can’t. (Copyright King Features Syndicate, Inc., distr. Bulls.)
About Me
I’m Jens (long: Jens Oliver Meiert), and I’m a frontend engineering leader and tech author/publisher. I’ve worked as a technical lead for companies like Google and as an engineering manager for companies like Miro, I’m a contributor to several web standards, and I write and review books for O’Reilly and Frontend Dogma.
I love trying things, not only in web development (and engineering management), but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (Please be critical, interpret charitably, and give feedback.)