Deeper Study into Digital Text and Character Encoding

Posted on 22nd June 2011 by robtinbc in internet

Most web designers and developers seldom need instructions for dealing with character encoding. In general the browser handles the process, and it isn’t often a topic of discussion amongst programmers. However, the tools and standards that govern document encoding give you leverage for debugging and for producing professional typography on a website.

Although the topic is brief, I’ve gone over some of the quick highlights and important notations. You should be able to breeze through these topics if you have at least a beginner’s understanding of digital documents. Knowledge of basic HTML isn’t required, although these techniques mostly apply to web files such as HTML, CSS, and PHP.

Along with encoding types there are many standards for digital typography. These become more convoluted as you move into web projects. We’ve written on great CSS techniques which may also be of use to any frontend developers.

Document Character Encoding

For a proper introduction, character encoding must be understood as a purely digital phenomenon. We do not face similar problems with type written on paper or any other physical surface.

This is because a computer stores each character as a number. A character set assigns every character a numerical value (its code point), and an encoding determines how that number is written out as bytes in storage. When saving any type of web document you must choose a character encoding (generally UTF-8, ASCII, or “ANSI”, the label Windows editors use for a legacy code page such as Windows-1252).
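As a rough illustration, the short Python sketch below looks up the number behind the copyright sign and shows the bytes a few encodings would write for it. Python and the variable names are simply a choice for illustration, and cp1252 stands in for “ANSI” on the assumption of a Western European Windows machine.

```python
# Each character has a numeric code point; an encoding turns it into bytes.
text = "©"

code_point = ord(text)                   # the number assigned to the character
utf8_bytes = text.encode("utf-8")        # two bytes in UTF-8
latin1_bytes = text.encode("latin-1")    # one byte in ISO 8859-1
cp1252_bytes = text.encode("cp1252")     # one byte in Windows-1252 ("ANSI", assumed)

# Plain ASCII only covers code points 0-127, so the copyright sign does not fit.
try:
    text.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(code_point, utf8_bytes, latin1_bytes, cp1252_bytes, fits_in_ascii)
```

The same character ends up as different bytes depending on the encoding you pick, which is why the choice made when saving the file matters.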

The most visible flaws with character encoding show up as missing or garbled pieces of your page content. A character such as the copyright sign may be misinterpreted under the wrong encoding, and you’ll get a missing-character symbol in its place. This may be a black diamond containing a question mark, an empty rectangle, or any number of other symbols.
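You can reproduce this kind of breakage in a few lines. The Python sketch below is only an illustration: it saves a short string as UTF-8 and then reads the bytes back with the wrong encoding, much as a browser would if it guessed incorrectly. The sample string and variable names are made up for the example.

```python
# Bytes written by an editor that saved the file as UTF-8.
raw = "© 2011".encode("utf-8")

# Reading them back with the right encoding restores the text.
right = raw.decode("utf-8")

# Reading them as ISO 8859-1 turns the two-byte sequence into two stray characters.
garbled = raw.decode("latin-1")

# Reading them as ASCII fails; with errors="replace" each bad byte
# becomes U+FFFD, the replacement character browsers show for unreadable data.
replaced = raw.decode("ascii", errors="replace")

print(repr(right), repr(garbled), repr(replaced))
```

If a stray “Â” appears before a symbol on your page, a UTF-8 file being read as a single-byte encoding is a likely cause.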

As HTML developers know, there are shortcuts for placing special characters in your page. These are known as HTML entities, which are written with a reserved name or number. For example, &copy; will display the exact same character as &#169;. The numerical value gives the position (code point) at which the symbol is stored, in this case 169.
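To check that the named and numeric forms really do resolve to the same character, here is a small Python sketch using the standard library’s html.unescape function. The hexadecimal form is an extra added for illustration and is not mentioned above.

```python
import html

# Named, decimal, and hexadecimal references to the copyright sign.
named = html.unescape("&copy;")
numeric = html.unescape("&#169;")
hexadecimal = html.unescape("&#xA9;")

# All three decode to one character whose code point is 169.
all_same = named == numeric == hexadecimal
print(named, ord(named), all_same)
```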

Importance for Designers

You may still be wondering why this is so important. As a developer you may not run into much trouble with the Roman alphabet. English-language pages have the advantage of working under many popular character encodings. But consider pages written in languages that differ not only in a few letters, but in their entire alphabet!

Languages such as German or Spanish may need nothing more than a common Western encoding such as ISO 8859-1. Their special characters occupy reserved positions alongside the basic Latin letters, which means you may combine English and accented characters in the same page. But consider languages such as Japanese or Russian, where you’ll need an entirely different set of glyphs.
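The difference is easy to test. The Python sketch below is a minimal illustration, with sample words and a helper name (can_encode) invented for the purpose: it checks which short phrases fit in ISO 8859-1 and which need something broader such as UTF-8.

```python
# Sample words chosen only for illustration.
samples = {
    "German": "Straße",
    "Spanish": "mañana",
    "Russian": "Привет",
    "Japanese": "日本語",
}


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Map each language to the encodings that can represent its sample.
coverage = {
    language: {enc: can_encode(word, enc) for enc in ("latin-1", "utf-8")}
    for language, word in samples.items()
}

for language, result in coverage.items():
    print(language, result)
```

German and Spanish fit in either encoding, while the Russian and Japanese samples only fit in UTF-8, which is one reason UTF-8 is a sensible default for multilingual pages.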

You may not find yourself writing Chinese webpages all that often. However, if you understand how the technology works you’ll be better off in the long run as a webmaster. The HTML character encodings wiki page should answer a few of the more obvious questions.

Prime Examples

Once we examine HTML pages in depth, you may be surprised at just how little is required. We can place a simple meta tag in the head of any web document to define the character set, or charset. Below is a brief example:
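A tag of this kind usually has the shape held in the PAGE string of the Python sketch below. The utf-8 value, the surrounding markup, and the CharsetFinder class are assumptions made for illustration. The sketch uses the standard library’s html.parser module to pull the charset back out of the tag, roughly as a tool inspecting the page might.

```python
from html.parser import HTMLParser

# A hypothetical page; the meta tag sits inside the head element.
PAGE = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    "<title>Example</title>"
    "</head><body></body></html>"
)


class CharsetFinder(HTMLParser):
    """Records the charset named in a Content-Type meta tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=utf-8".
        for part in (attributes.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.lower() == "charset":
                self.charset = value.strip().lower()


finder = CharsetFinder()
finder.feed(PAGE)
print(finder.charset)
```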

The charset is given as a parameter inside the content attribute. This may sound a little confusing, and thankfully it represents an older way of declaring the encoding; HTML5 also allows the shorter <meta charset="utf-8"> form. However it’s not a bad solution and will still be processed properly by older and modern web browsers alike.

Even if charsets don’t interest you much, there isn’t a great deal to remember. If you keep the names of the important sets on hand as code snippets you may end up saving loads of time on project work. For those interested, look up the ISO 8859-1 charset details, which lay out its characters in table format.

If you’ll be creating an XML or XHTML document then a simpler declaration may be placed before all other code. This line states the XML version and the encoding used for everything that follows in the file. Within XHTML it’s possible to define your character set this way, based on newer XML principles.
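The declaration in question is the `<?xml ... ?>` line built inside the Python sketch below. This is a minimal illustration, with the build_document helper and the sample paragraph invented for the example. It parses the same small document under two declared encodings with the standard library’s xml.etree.ElementTree, then shows what happens when the declaration and the actual bytes disagree.

```python
import xml.etree.ElementTree as ET


def build_document(declared, actual):
    """Return XML bytes that declare one encoding and are saved in another."""
    text = f'<?xml version="1.0" encoding="{declared}"?><p>© 2011</p>'
    return text.encode(actual)


# Declaration and bytes agree: the parser recovers the copyright sign.
from_utf8 = ET.fromstring(build_document("UTF-8", "utf-8")).text
from_latin1 = ET.fromstring(build_document("ISO-8859-1", "latin-1")).text

# Declaration says UTF-8 but the file was saved as ISO 8859-1:
# the lone 0xA9 byte is not valid UTF-8, so parsing fails.
try:
    ET.fromstring(build_document("UTF-8", "latin-1"))
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(from_utf8, from_latin1, mismatch_rejected)
```

The takeaway is that the declared encoding has to match the one your editor actually used when saving the file.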

The XML declaration does not by itself set a webpage to XHTML standards; that also takes the appropriate doctype and namespace. What it does is state the XML version and the document’s encoding. XML 1.0 is the version on which XHTML is built. You don’t need to understand the declaration all too well, but understanding the UTF-8 encoding style may shine some light on your other options.

Conclusion

The concept of page encoding is absolutely an important one. It may not be required to build a basic website, but after some time you will almost certainly run into trouble with typography. It helps to understand just how each letter is stored and where the computer is reading this information from.

You may find it difficult to choose a character set without further guidance. The system isn’t all too complicated and just requires a bit of study time on your part. Ultimately you hold all the tools you need within whatever development environment you’re currently using. Both Mac and PC offer viable options for choosing a character set, even with Notepad or TextEdit as your default editor.

What are your thoughts on character sets and digital reference guides? We’ve got a small collection of tips for improving your webpage typography. If you know of similar resources or fantastic designers please share your ideas in the comments below.