Word Division: On “word-break”, Soft Hyphens, and Zero Width Spaces
Jens Meiert, February 8, 2007 / March 7, 2008.
This entry is filed under Web Development.
Word breaks and hyphenation can become a problem when long words meet little available space: The longer the word and the less space available, the more a layout is “in danger”. English appears to be less affected than other languages (I suspect Finnish and German are sufficient examples of languages with overly long words), but once in a while every developer looks for ways to “automatically” break words.
Let’s take a look at possible solutions (example page).
word-break
word-break is a formerly proprietary property introduced by Microsoft (which doesn’t care about vendor-specific extensions anyway), and it has meanwhile been included in CSS 3.
Though word-break sounds quite promising, as I read the specification it does not require implementations to take grammar into account; it rather provides word breaking on a per-letter basis. That’s how it works now when you try break-all (see example), as long as you test with Internet Explorer, which of course supports (parts of) this formerly unstandardized property.
Since this is just a little round-up, I’ll simply state that word-break
- does not provide the kind of hyphenation we usually need, and
- is not yet broadly supported.
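To make the per-letter behavior concrete, here is a minimal sketch. The class name, the width, and the German sample word are my own illustrative choices, not taken from the example page. In a browser that supports break-all, the word should wrap wherever the line happens to end, without a hyphen and without regard to syllables.

```html
<!-- Hypothetical demo: class name, width, and sample word are illustrative only. -->
<style>
  .narrow {
    width: 8em;            /* deliberately too narrow for the word below */
    border: 1px solid;     /* makes the box edge visible */
    word-break: break-all; /* allow a break between any two characters */
  }
</style>

<p class="narrow" lang="de">Donaudampfschifffahrtsgesellschaft</p>
```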
Soft Hyphen
The soft hyphen (from Unicode’s C1 Controls and Latin-1 Supplement, PDF) is usually injected via &shy; or &#173;, respectively. Entirely skipping Jukka Korpela’s ancient article on SHY, we must note that it
- can also “consider grammar” (when used in the right places and in conjunction with a language that splits words using a hyphen, of course), but
- is not yet supported, at least in Gecko-based browsers (like Firefox, for example).
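As a rough illustration of “the right places”, the following sketch puts soft hyphens at the compound boundaries of a long German word. The word and the inline width are assumptions chosen for the demo. A supporting browser should break only at one of the marked positions and show a hyphen there, while unused soft hyphens stay invisible.

```html
<!-- Hypothetical demo: the sample word and the width are illustrative only. -->
<p style="width: 8em; border: 1px solid" lang="de">
  <!-- &shy; marks a permitted break; a hyphen is rendered only where the line actually breaks -->
  Donau&shy;dampf&shy;schiff&shy;fahrts&shy;gesell&shy;schaft
</p>
```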
Zero Width Space
Zero width spaces (see Unicode’s General Punctuation chart, PDF) are used the same way as soft hyphens, namely by placing &#8203; character references in your HTML. So what’s to remark when you bank on zero width spaces (summarized as confusingly as possible …)?
- Depending on the language, they might be a good choice or the wrong one, namely when “just splitting up words” means creating spelling mistakes, and/although
- they appear to be the best supported, but they may cause additional whitespace in at least Internet Explorer 6 (not necessarily a problem for you, since IE 7 is doing well), and, as opposed to soft hyphens, I “heard of problems” with certain user agents that couldn’t display the character correctly
[…].
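One case where a break without a hyphen seems appropriate is a long URL, so here is a small sketch along those lines. The example.com address and the width are made up for the demo. Each &#8203; marks a spot where the line may wrap, and no visible character should be added at the break.

```html
<!-- Hypothetical demo: the URL and the width are illustrative only. -->
<p style="width: 12em; border: 1px solid">
  <!-- &#8203; is U+200B ZERO WIDTH SPACE: an invisible break opportunity -->
  http://www.example.com/&#8203;a/&#8203;very/&#8203;long/&#8203;path/&#8203;to/&#8203;a/&#8203;resource
</p>
```

Note that anyone copying such text also copies the invisible characters, which matters for URLs in particular.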
Finally, you’re probably as wise as before, but besides correcting me on details (other nifty Unicode characters?) in this hastily written post, you may also want to take another look at the aforementioned test page. Thanks, and you’re welcome.
Comments
- On July 9, 2007, 22:57 CEST, Jens Meiert said:
Quick update: The soft hyphen has been fixed (Gecko core).
- On August 3, 2007, 10:34 CEST, Miha Hribar said:
Great article. I haven’t heard much of the mentioned zero width space, but the fact that it doesn’t work in IE6 still is a major issue (as a vast majority of users still haven’t upgraded their IE versions).
On that note, you forgot to mention the wbr tag, as it is currently the only piece of code that works well in all browsers I tested. Though it is not a valid tag, it still gets the job done (if you can live with all those failed XHTML validation results). For more info on the wbr tag, check out quirksmode:
http://www.quirksmode.org/oddsandends/wbr.html