Optional “@lang”
Published on Mar 21, 2019 (updated Nov 27, 2023), filed under development.
My concerns about requiring lang to be set on the html start tag were first based on an insufficient differentiation between (and missing reconciliation of) text-processing language and the language(s) of the intended audience. While not meant to be the same, in reality they end up being used the same way. Under that premise, the argument made here should be more understandable.
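For orientation, the requirement under discussion amounts to writing something like <html lang="en"> at the top of a document. The following is a minimal Python sketch, using only the standard library's html.parser, of how a tool might read that declared language. The names HtmlLangReader and declared_lang are hypothetical and chosen for illustration; this is not how any particular user agent does it.

```python
from html.parser import HTMLParser


class HtmlLangReader(HTMLParser):
    """Records the lang attribute of the first html start tag, if any."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def declared_lang(markup):
    """Return the language declared on the html start tag, or None."""
    reader = HtmlLangReader()
    reader.feed(markup)
    reader.close()
    return reader.lang


print(declared_lang('<!DOCTYPE html><html lang="en"><title>Hi</title>'))  # en
print(declared_lang("<!DOCTYPE html><html><title>Hi</title>"))  # None
```

The second call illustrates the case the post is concerned with: a document that does not mark its language at all, leaving the software to work it out.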
The lang attribute is one of HTML’s global attributes and, with that, one of the more popular ones. If one doesn’t simply take it for granted, however, it also raises some questions. After all, it’s not obvious how accessibility techniques and internationalization practices must, by necessity, require it:
- P: The language of paper documents is not usually marked.
- C: Therefore, the language of all electronic documents needs to be marked.
Of course this mock argument isn’t what’s happening, as we don’t label language merely because we can; there has been a strong motivation to provide this meta information for both users and machines.
And yet, one can also find a real argument questioning @lang.
- P.1: If a task can be done by software, then it isn’t necessary (because superfluous) to manually perform parts of that task.
- P.2: The detection of language in and of HTML documents is a task that can be done by software.
- C: Therefore, it isn’t necessary to manually perform parts of the task of detection of language in and of HTML documents, like marking (and being required to mark) the language in and of documents.
I brought up the same argument in a different manner back in 2014. Yet as the accessibility and internationalization standards mentioned still require using the lang attribute to indicate document language and changes thereof (that is, as nothing has changed), it’s time to renew it.
The argument itself is valid (P → Q; P; ∴ Q). Is it cogent? Let’s look at the premises.
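The validity claim refers to the form known as modus ponens. As a small sketch, it can be checked mechanically by enumerating every truth assignment and confirming that none makes both premises true and the conclusion false. The function names here are hypothetical helpers; note that this only checks the form, not whether P.1 and P.2 are actually true.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def modus_ponens_is_valid():
    """True if no assignment makes both premises true and the conclusion false."""
    for p, q in product([True, False], repeat=2):
        premises_hold = implies(p, q) and p
        if premises_hold and not q:
            return False  # counterexample found
    return True


print(modus_ponens_is_valid())  # True
```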
P.1 appears evident if we can take for granted that something already being done sufficiently in an automated fashion doesn’t need intervention or assistance, particularly not in a less efficient and reliable manual fashion.
P.2 is the more interesting piece, and it depends on data, of which we actually need more. Yet this need for more data doesn’t mean P.2 is lost, I argue, for there are but three things to consider when judging P.2.
One: As I have emphasized in my original post on the datedness of @lang, there generally is value (efficiency) in moving the responsibility for language detection from humans to machines. (This suggests that the detection of language should be done by software.)
Two: We find flaws, and may always find flaws, in humans marking up language. We have less data on this than on the question of how effective machines are at making out language and changes in language; yet if we find that humans are really bad (ignorant, lazy, imprecise, inefficient) at marking up language, the @lang case may already be lost and we should no longer force anyone to mark up languages. (This, too, suggests that the detection of language should be done by software.)
Three, and back to the premise: There are many indicators that software has become sufficiently good, really good, at automatically detecting languages. Consider Google Translate and its “Detect language” feature, upcoming (or long available) language detection in Google Assistant, the language processing features of Amazon Comprehend, or automatic language detection in software like HP IDOL. This is exactly what user agents and assistive technology can be and must be capable of, too. (And this point, then, suggests that the detection of language can indeed be done by software.)
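To make the idea of software detecting language concrete, here is a deliberately naive Python sketch that guesses a language by counting common function words. Everything in it is an assumption for illustration: the tiny word lists, the three supported languages, and the name guess_language. It is not how any of the products named above work, and a production detector would presumably rely on much richer statistical models.

```python
import re

# Tiny, hand-picked stopword lists (an assumption for illustration only).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "une", "des", "que"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    # Extract runs of letters only (no digits, no underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Without a single stopword hit there is no basis for a guess.
    return best if scores[best] > 0 else None


print(guess_language("The language of the document is marked in the markup."))  # en
print(guess_language("Die Sprache ist nicht ausgezeichnet und das ist gut."))  # de
print(guess_language("La langue est une question que les machines traitent."))  # fr
```

Even a heuristic this crude separates the three sample sentences, which hints at why dedicated detection software can be expected to do considerably better.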
To accurately assess the argument I presented, we still need more data. But I believe we have long had enough to tell that marking up language shouldn’t be done by, let alone be required from, humans. At the end of the day, what has been asked of developers all this time is a grave violation of “Joe’s Law,” after Joe Clark:
“If a browser or adaptive technology can or should handle an accessibility [or internationalization, my edit] issue, I won’t” (that is, we, as web developers, shouldn’t).
User agents and assistive technology can and should handle language detection, without manual preparation. Therefore: No @lang requirement. Keep markup clean. Let software do the job.
About Me
I’m Jens (long: Jens Oliver Meiert), and I’m a web developer, manager, and author. I’ve worked as a technical lead and engineering manager for small and large enterprises, I’m an occasional contributor to web standards (like HTML, CSS, WCAG), and I write and review books for O’Reilly and Frontend Dogma.
I love trying things, not only in web development and engineering management, but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (I value you being critical, interpreting charitably, and giving feedback.)