Word Division: On “word-break,” Soft Hyphens, and Zero-Width Spaces
Published on Feb 8, 2007 (updated Aug 17, 2024), filed under development.
This post is outdated. (Consider the CSS hyphens property instead!)
Word breaks and hyphenation can become a problem when long words meet little available space: the longer the word and the narrower the space, the more the layout is at risk. English appears to be less affected than other languages (I suspect Finnish and German to be good examples of languages with overly long words), but every once in a while a developer looks for ways to break words “automatically.”
Let’s take a look at possible solutions (example page).
word-break
word-break is a formerly proprietary property introduced by Microsoft (who don’t care much about vendor-specific extensions), which has since been included in CSS 3.
Though word-break sounds quite promising, my reading of the specification is that it doesn’t require implementations to take grammar into account, but rather provides word breaking on a per-letter basis. That’s how it currently works when you try break-all (see example), as long as you test with Internet Explorer, which supports parts of this formerly unstandardized property.
Since this is just a little round-up, I will state that word-break
- does not provide the kind of hyphenation we usually need, and
- is not yet broadly supported.
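To make “per-letter basis” concrete, here’s a rough Python model of what break-all does to a line. It’s a sketch under simplifying assumptions: every character has the same width, lines are filled greedily, and the helper name break_all is made up for illustration. Real browsers measure glyphs and follow the CSS line-breaking rules.

```python
def break_all(text: str, width: int) -> list[str]:
    """Rough model of `word-break: break-all` (hypothetical helper).

    Assumes a monospace line of `width` characters. Lines are filled greedily
    and may be cut between any two letters, with no hyphen and no regard for
    syllables or grammar.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines: list[str] = []
    line = ""
    for ch in " ".join(text.split()):  # collapse runs of white space
        if len(line) == width:
            lines.append(line.rstrip())  # line is full: wrap right here
            line = ""
        if ch == " " and not line:
            continue  # don't start a new line with a space
        line += ch
    if line:
        lines.append(line)
    return lines


for row in break_all("Die Donaudampfschifffahrt", 8):
    print(row)
```

This prints “Die Dona”, “udampfsc”, “hifffahr”, and “t” on separate lines: the compound is cut wherever the line happens to end, which is why break-all doesn’t give us the hyphenation we usually need.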
Soft Hyphen
The soft hyphen, U+00AD from Unicode’s C1 Controls and Latin-1 Supplement chart (PDF), is usually inserted via the entity reference &shy; or its numeric equivalent, &#173;. Skipping over the details of Jukka Korpela’s earlier article on SHY, we must note that it
- can also “consider grammar” (when used in the right places and in conjunction with a language that splits words using a hyphen, of course), but
- is not yet supported, at least not in Gecko-based browsers like Firefox.
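Since the browser only breaks at a soft hyphen where we put one, the break points have to come from somewhere else, such as a hyphenation dictionary or manual work. Here’s a small Python sketch that marks up a word with &shy; and checks that &shy; and &#173; both stand for U+00AD. The helper insert_soft_hyphens is hypothetical, and the break positions are supplied by hand.

```python
import html

SHY = "\u00ad"  # SOFT HYPHEN, U+00AD (decimal 173)


def insert_soft_hyphens(word: str, breaks: list[int], ref: str = "&shy;") -> str:
    """Insert `ref` at each index in `breaks` (hypothetical helper).

    The function knows nothing about syllables: the caller must pass
    grammatically sensible break positions, e.g. from a hyphenation dictionary.
    """
    parts: list[str] = []
    start = 0
    for pos in sorted(set(breaks)):
        if not 0 < pos < len(word):
            raise ValueError(f"break position {pos} is outside the word")
        parts.append(word[start:pos])
        start = pos
    parts.append(word[start:])
    return ref.join(parts)


marked = insert_soft_hyphens("Donaudampfschifffahrt", [5, 10, 16])
print(marked)  # Donau&shy;dampf&shy;schiff&shy;fahrt

# Both references decode to the same invisible character.
assert html.unescape("&shy;") == html.unescape("&#173;") == SHY
# Removing the soft hyphens gives back the original word.
assert html.unescape(marked).replace(SHY, "") == "Donaudampfschifffahrt"
```

A browser that supports soft hyphens shows the word unbroken if it fits, and otherwise breaks it at one of the marked positions, displaying a hyphen only there.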
Zero-Width Space
Zero-width spaces (U+200B, see Unicode’s General Punctuation chart (PDF)) are used the same way as soft hyphens, namely by placing &#8203; character references in your HTML. So what’s to note when you bank on zero-width spaces?
- Depending on the language, they might be a good but also a poor choice, as “just splitting up words” can produce what looks like spelling mistakes, and
- they appear to be the best supported, despite causing additional white space in Internet Explorer 6 (which isn’t necessarily a problem since IE 7 is doing well) and, unlike soft hyphens, coming with some uncertainty about whether there are user agents that don’t display the character correctly […].
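Zero-width spaces can be added the same way. The Python sketch below inserts &#8203; every few characters into overly long words; add_break_opportunities and its max_len and step parameters are made up for illustration. It also produces exactly the kind of “just splitting up words” result mentioned above: the breaks ignore syllables, and no hyphen appears where a line wraps.

```python
import html

ZWSP = "\u200b"  # ZERO WIDTH SPACE, U+200B (decimal 8203)


def add_break_opportunities(text: str, max_len: int = 12, step: int = 6,
                            ref: str = "&#8203;") -> str:
    """Insert `ref` every `step` characters into words longer than `max_len`.

    Hypothetical helper: it splits purely by character count, not by syllables.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    words = []
    for word in text.split(" "):
        if len(word) > max_len:
            word = ref.join(word[i:i + step] for i in range(0, len(word), step))
        words.append(word)
    return " ".join(words)


marked = add_break_opportunities("Die Donaudampfschifffahrt")
print(marked)  # Die Donaud&#8203;ampfsc&#8203;hifffa&#8203;hrt

# The reference decodes to U+200B, and the visible text is unchanged.
assert html.unescape("&#8203;") == ZWSP
assert html.unescape(marked).replace(ZWSP, "") == "Die Donaudampfschifffahrt"
```

A line wrapping after “Donaud” shows just that, with no hyphen, which is why soft hyphens at real syllable boundaries tend to be the safer choice for languages like German.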
You’re probably as wise as before, but besides correcting me on details of this hastily written post (other nifty Unicode characters?), please take another look at the aforementioned test page.
Update (July 9, 2007)
Breaking: The soft hyphen has been fixed in the Gecko core.
About Me
I’m Jens (long: Jens Oliver Meiert), and I’m a web developer, manager, and author. I’ve worked as a technical lead and engineering manager for small and large enterprises, I’m an occasional contributor to web standards (like HTML, CSS, WCAG), and I write and review books for O’Reilly and Frontend Dogma.
I love trying things, not only in web development and engineering management, but also in other areas like philosophy. Here on meiert.com I share some of my experiences and views. (I value you being critical, interpreting charitably, and giving feedback.)