If you have added the DOCTYPE declaration to your web pages but still can’t get them to validate (the validator responds with “No Character Encoding Found! Falling back to UTF-8.”), you are missing an important piece of information that both the validator and browsers need to render your web page correctly. So what is this character encoding the validator is talking about?
Before we continue, some definitions/descriptions:
Unicode: A universal character set that defines the characters used in the majority of the world's languages. It can handle pages and forms that mix several languages within the same page.
ISO-8859-1: Also known as Latin1, it covers the Latin-based languages of the world, including most Western European languages.
Character set: The set of characters required for a certain language.
What is Character Encoding?
To start with, you have to understand a little bit about computers. Information on a computer is stored and transmitted as bits, usually grouped into bytes of eight bits. Certain bytes, or combinations of bytes, correspond to certain characters.
…The “charset” parameter identifies a character encoding, which is a method of converting a sequence of bytes into a sequence of characters. This conversion fits naturally with the scheme of Web activity: servers send HTML documents to user agents as a stream of bytes; user agents interpret them as a sequence of characters. The conversion method can range from simple one-to-one correspondence to complex switching schemes or algorithms…
Reference: Section 5.2, “Character encodings”, in the “HTML Document Representation” chapter of the W3C HTML 4.01 Recommendation
In other words, the character encoding tells the browser and the validator which mapping to use when converting the bytes of a document into characters.
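As a rough illustration of that conversion, the Python sketch below encodes the same text under two encodings and shows that the resulting bytes differ. The sample word and variable names are our own and are not taken from the specification.

```python
# Illustrative sketch: the sample text and names are assumptions for this example.
text = "café"

# The same characters become different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")         # 'é' takes two bytes in UTF-8
latin1_bytes = text.encode("iso-8859-1")  # 'é' takes one byte in ISO-8859-1

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# Decoding with the encoding that produced the bytes restores the original text.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("iso-8859-1") == text
```

Because the bytes alone do not say which encoding produced them, whoever reads them has to be told.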
Why do I need Character Encoding?
You need to include character encoding because:
- The declaration of character encoding is required as of the HTML 4.01 specification.
- When a browser parses a web document that does not declare a character encoding, it has to guess which one to use. If it guesses wrong, the web page renders incorrectly.
- The visitor may have changed the default character encoding in their browser (Internet Explorer: View, Encoding), and it may not match the character encoding intended for the web document.
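A wrong guess can be simulated in a few lines of Python. This is only a sketch with a made-up page text: the bytes are written as UTF-8 and then read back as ISO-8859-1, which is roughly what happens when a browser picks the wrong encoding.

```python
# Illustrative sketch: the page text is a made-up example.
page_bytes = "café".encode("utf-8")  # the bytes the server sends

# A reader that assumes ISO-8859-1 misinterprets the two-byte 'é'.
wrong_guess = page_bytes.decode("iso-8859-1")
# A reader that knows the declared encoding gets the text back intact.
right_guess = page_bytes.decode("utf-8")

print(wrong_guess)  # cafÃ©
print(right_guess)  # café
```

The garbled "Ã©" is the kind of output visitors see when the encoding is not declared and the guess goes wrong.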
Now that you understand what character encoding is and why it is needed, it’s time to choose a character encoding for the web documents on your website.
Choosing a Character Encoding
When choosing a character encoding, pick one that is versatile enough to cover all the languages and requirements of your intended audience.
As previously mentioned, Unicode (UTF-8) is a very versatile character encoding to choose. Note: Old browsers may have issues with UTF-8. It may be wise to consider one of the other character encodings until UTF-8 is more widely supported.
ISO-8859-1 could also be suitable for the web documents on your web site.
If neither of these covers your intended audience, you can consult the IANA registry for other character encoding names.
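One way to test whether a candidate encoding covers your content is to try encoding some representative text with it. The helper name `fits_encoding` and the sample strings below are hypothetical, chosen only for illustration.

```python
def fits_encoding(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample content: French, Greek, and a price with a euro sign.
samples = ["déjà vu", "Ελληνικά", "price: 5 €"]

for sample in samples:
    print(sample,
          "| ISO-8859-1:", fits_encoding(sample, "iso-8859-1"),
          "| UTF-8:", fits_encoding(sample, "utf-8"))
```

In this sketch only the French sample fits ISO-8859-1, while all three fit UTF-8, which is why UTF-8 is the more versatile choice for mixed-language content.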
Character encoding names are not case sensitive, but it is good practice to write the name the way the registry lists it, so make sure you note the correct spelling of the character encoding you choose.
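A misspelled name is easy to catch before it reaches your pages. The sketch below uses Python's own codec registry as a stand-in; it is not the IANA registry, so treat a match as a sanity check only. The helper name `is_known_encoding` and the deliberately misspelled third name are our own.

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Return True if Python's codec registry recognizes the encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# The last name has two digits transposed on purpose.
for name in ["UTF-8", "ISO-8859-1", "ISO-8895-1"]:
    print(name, is_known_encoding(name))
```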
Character Encoding References
To learn more about character encoding, character sets, and Unicode, the following references provide more technical explanations of the topics covered in this article.
- W3C I18N Tutorial: Character sets & encodings in XHTML, HTML and CSS (Draft)
- IANA character set registry
- Authoring Techniques for XHTML & HTML Internationalization: Characters and Encodings 1.0 (This is a work in progress)
- Unicode in XML and other Markup Languages
- W3C I18N Topic Index (covers character sets, character encodings, and escapes, among other information)
- Unicode Home Page
Once you have chosen a character encoding for your website, you will need to add it to each web page. This is called declaring the character encoding.