Comparing Page Language Declaration Setups in Screen Readers

Post from September 28, 2021 (↻ January 26, 2022).

One best practice in web development is to declare the document language via the lang attribute on the html start tag. That is useful, as it aims to ensure that user agents can present each document correctly, including reading it aloud. It’s also controversial, because using the Content-Language HTTP header is more efficient, and because language detection software has become more and more effective, perhaps more effective than authors and editors are at marking up language.

To survey where we stand on language declaration, I set up a test page in German, so that screen readers developed in English-speaking countries wouldn’t get it right by accident. The test page covers five conditions: no language declared; language declared correctly through the lang attribute; language declared correctly through the Content-Language HTTP header; language declared correctly through both; and language declared incorrectly through both, with conflicting values.
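The five conditions differ in only two inputs: whether the html start tag carries a lang attribute, and whether the response carries a Content-Language header. Here is a minimal sketch of how such a test page could be generated. The names `Condition` and `render` and the German sample sentence are hypothetical. The conflicting lang value "ru" follows from the Russian readouts in the results below, but the post doesn’t state the conflicting header value, so "en" is only a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    name: str
    lang_attr: Optional[str]         # value of <html lang>, None if absent
    content_language: Optional[str]  # Content-Language header, None if absent


CONDITIONS = [
    Condition("No language declared", None, None),
    Condition("lang attribute", "de", None),
    Condition("Content-Language HTTP header", None, "de"),
    Condition("lang and HTTP header", "de", "de"),
    # Both values are wrong for a German page and conflict with each other.
    # "ru" matches the Russian readouts; "en" is a hypothetical placeholder.
    Condition("Conflicting lang and HTTP header values", "ru", "en"),
]


def render(cond: Condition,
           body: str = "<p>Dies ist eine Testseite.</p>") -> tuple[list[tuple[str, str]], str]:
    """Return (response headers, HTML document) for one test condition."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if cond.content_language is not None:
        headers.append(("Content-Language", cond.content_language))
    lang = f' lang="{cond.lang_attr}"' if cond.lang_attr is not None else ""
    html = f"<!DOCTYPE html>\n<html{lang}>\n<title>Test</title>\n{body}\n</html>\n"
    return headers, html
```

Keeping the document body identical across conditions means any difference in how a screen reader reads the page comes from the two declarations alone.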

I tested this page in four of the most popular screen readers: I ran the VoiceOver test on my own machine, and the NVDA, JAWS, and Narrator tests with Assistiv Labs. Big thanks to Assistiv Labs here, not only for their generally great product, but also for their kind support after I had struggled to find help with JAWS testing.

Here are the findings:

| Test | NVDA | JAWS | VoiceOver | Narrator |
|---|---|---|---|---|
| No Language Declared | English ❌ * | English ❌ * | English ❌ * | English ❌ * |
| lang Attribute | German ✅ | German ✅ | German ✅ | German ✅ |
| Content-Language HTTP Header | German ✅ | German ✅ | German ✅ | German ✅ |
| lang and HTTP Header | German ✅ | German ✅ | German ✅ | German ✅ |
| Conflicting lang and HTTP Header Values | English? * | Russian (following lang) | Russian (following lang) | Russian (following lang), English * |
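For comparison, this is roughly how the HTML standard resolves the language of the root element, as I read it. The lang attribute wins. Failing that, a single language from a higher-level protocol such as HTTP serves as the final fallback. If there is none, or if the header lists several languages, the language is unknown. What follows is a simplified sketch, not a conformant implementation: it skips the `<meta http-equiv="content-language">` pragma that the spec also considers, and `effective_language` is a hypothetical name.

```python
from typing import Optional


def effective_language(lang_attr: Optional[str],
                       content_language: Optional[str]) -> Optional[str]:
    """Resolve the root element's language; None means "unknown".

    Simplified: ignores the <meta http-equiv="content-language"> pragma
    and any lang attributes on descendant elements.
    """
    if lang_attr is not None:
        # lang wins over HTTP; lang="" explicitly marks the language as unknown.
        return lang_attr or None
    if content_language:
        tags = [t.strip() for t in content_language.split(",") if t.strip()]
        # Only a single language from HTTP serves as the fallback;
        # a list of several languages leaves the language unknown.
        if len(tags) == 1:
            return tags[0]
    return None
```

Under these rules, the conflicting case resolves to the lang value, which matches the “following lang” results for JAWS, VoiceOver, and Narrator. An unknown language leaves the choice to the screen reader, which fits the English fallback in the first row.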

What does this mean?

First, I’m still cautious about the findings, as I don’t regularly test with screen readers, and as all the software I used was recent. Maybe I missed something an experienced accessibility tester would know, and perhaps older tooling would produce different results.

But then, what do you think the results mean? As an HTML minimalist, I’ve already been vocal about my take on the topic. Here I simply want to provide a few data points for validation. The question of whether and how to declare page language is going to stay with us for a while, so we’ll probably see it covered again.

Many thanks to Thomas Steiner for reviewing this post.

Update (November 30, 2021)

My concerns about the use of @lang may so far have been based on an insufficient differentiation between, and reconciliation of, the text-processing language and the language(s) of the intended audience. I’m working on getting, and adding, more clarity.

A clarification, rather than an update. The W3C I18N Activity’s language Q&A and RFC 2616 differentiate between a document’s language and the language of its intended audience. Based on that differentiation, there’s an argument against using the Content-Language HTTP header, and for @lang in every document.

I don’t think this is useful. In practice, there doesn’t seem to be a difference between a document’s language and the language of its intended audience: when you write a document in English, you expect your audience to be people who speak English. Accordingly, there doesn’t seem to be any website that actually works like this, either; instead, the languages declared through Content-Language and html@lang usually match. (If the two meant entirely different things, it wouldn’t make sense to use the HTTP header as a fallback for determining a document’s language, either.)
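Whether the two declarations match on a given page is easy to check. The following is a minimal sketch using Python’s standard `html.parser` module. `root_lang` and `declarations_match` are hypothetical helpers, and comparing only the primary subtag (so that "de-DE" matches "de") is a simplifying assumption.

```python
from html.parser import HTMLParser
from typing import Optional


class _RootLangParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            self.lang = dict(attrs).get("lang")


def root_lang(document: str) -> Optional[str]:
    parser = _RootLangParser()
    parser.feed(document)
    parser.close()
    return parser.lang


def _primary_subtag(tag: str) -> str:
    return tag.strip().split("-")[0].lower()


def declarations_match(document: str, content_language: Optional[str]) -> bool:
    """True if html@lang and a single-valued Content-Language share a primary language."""
    lang = root_lang(document)
    if not lang or not content_language or "," in content_language:
        return False
    return _primary_subtag(lang) == _primary_subtag(content_language)
```

Running this over a site’s pages, paired with each response’s header, would show how often the two declarations actually diverge.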

Therefore, unless you share your web pages on DVDs (that is, without any HTTP headers), the argument is not a good one. It even seems weak, as it affects code economy and maintainability without taking them into account: replacing a single server-side Content-Language header with @lang in every document means repeating the same declaration on every page, which results in poor economy and maintainability.
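For reference, a site-wide header really is a one-place setting. The following is a minimal sketch as Python WSGI middleware, assuming a single-language site in German. `content_language_middleware` and the demo app `hello` are hypothetical names, and in practice the same header would more likely be set in the web server’s configuration. The middleware leaves alone any Content-Language that a page sets itself.

```python
def content_language_middleware(app, language="de"):
    """Wrap a WSGI app so each response gets one Content-Language header."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Respect a Content-Language the wrapped app already set for a page.
            if not any(name.lower() == "content-language" for name, _ in headers):
                headers = list(headers) + [("Content-Language", language)]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


def hello(environ, start_response):
    """Tiny example app that knows nothing about languages."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<!DOCTYPE html><title>Test</title><p>Hallo Welt</p>"]


app = content_language_middleware(hello)
```

To try it locally, `app` can be served with the standard library’s `wsgiref.simple_server.make_server`. Note that the header only travels with the HTTP response, so a copy of the page saved or distributed offline loses it, which is the DVD case.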

No matter where you look (okay—where I look), the argument made for html@lang is not strong. (If I don’t get something, tell me—just don’t quote the very resources I’m already considering.)

* As all screen reader installations were in English, this is likely to mean that the fallback is the language of the screen reader, rather than that the fallback is always going to be English.


About Me

Jens Oliver Meiert, on September 30, 2021.

I’m Jens, and I’m an engineering lead and author. I’ve worked as a technical lead for Google, I’m close to W3C and WHATWG, and I write and review books for O’Reilly. I love trying things, sometimes including philosophy, art, and adventure. Here I share some of my views and experiences.

If you have a question or suggestion about what I write, please leave a comment (if available) or a message. Thank you!