XHTML vs. HTML (part 3)

This is part 3 of the XHTML vs. HTML tutorial (see part 1 and part 2). I recommend reading those first.

In January 2000, when XHTML 1.0 became a W3C Recommendation, it was brand new. At the time of writing, more than three years later, some browsers still do not fully support it. The following guidelines help make your XHTML documents render in existing HTML user agents.

XML Declaration

Some user agents that don't understand the XML declaration (<?xml ... ?>) treat the document as unrecognized XML rather than HTML, and may not render it as expected. Leaving the XML declaration out avoids this. Without the XML declaration, however, the document can only use the default character encodings, UTF-8 or UTF-16.

Empty elements

In XHTML, even empty elements must be closed. Put a space before the trailing /> so that existing HTML user agents still recognize the tag, for example <br />, <hr /> and <img src="..." alt="..." />. Also use the minimized tag syntax (<br />) for empty elements. The alternative syntax <br></br>, which XML allows, gives unpredictable results in existing user agents.
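If you generate XHTML from a program, the serializer can handle this for you. The sketch below is minimal. It assumes Python's standard xml.etree.ElementTree as the generator, which is my choice for illustration, and the file name logo.gif is made up. ElementTree writes childless elements in the minimized form, with a space before the />.

```python
import xml.etree.ElementTree as ET

def empty_tag(tag, **attrs):
    """Serialize a childless element with ElementTree.

    ElementTree writes empty elements as '<tag />': the minimized form
    with a space before '/>', which existing HTML user agents accept.
    """
    elem = ET.Element(tag, attrs)
    return ET.tostring(elem, encoding="unicode")

print(empty_tag("br"))                               # <br />
print(empty_tag("img", alt="Logo", src="logo.gif"))  # <img alt="Logo" src="logo.gif" />
```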

Element minimization

Do not use the minimized form for an empty instance of an element whose content model is not EMPTY, such as an empty paragraph or title. Write <p> </p>, not <p />.
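ElementTree applies the minimized form to every childless element, including p and title. On its own it would therefore produce exactly the <p /> that this rule warns against. One possible workaround, sketched minimally with the same ElementTree setup, is to give each empty non-EMPTY element a single space of content before serializing. The EMPTY_ELEMENTS set below is a partial, illustrative list, not the complete set from the XHTML 1.0 DTDs, and pad_non_empty_elements is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative list of XHTML 1.0 elements whose content model is EMPTY.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

def pad_non_empty_elements(root):
    """Give every childless, textless element that is NOT in EMPTY_ELEMENTS
    a single space of text, so ElementTree writes <p> </p> instead of <p />."""
    for elem in root.iter():
        if elem.tag not in EMPTY_ELEMENTS and len(elem) == 0 and not elem.text:
            elem.text = " "
    return root

doc = ET.fromstring("<div><p></p><br/><title/></div>")
print(ET.tostring(pad_non_empty_elements(doc), encoding="unicode"))
# <div><p> </p><br /><title> </title></div>
```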

Embedded style sheets and scripts

It is common practice to enclose scripts and style sheets in <!-- --> to keep old browsers from displaying them. Browsers that understand the script or style element ignore the comment markers and use the content. Older browsers simply treat it as a comment and leave it alone. XML parsers, however, may silently remove comments, so a script hidden this way is never executed. Therefore, don't "hide" your scripts and style sheets in comments, because this no longer works in XML-based user agents. Move them into external files instead.
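You can see the effect with any XML parser. The demonstration below is small and uses Python's xml.etree.ElementTree as a stand-in for an XML-based user agent. This is an assumption, since real browsers differ in the details. The script hidden in a comment comes back with no content at all, while the unhidden script is intact.

```python
import xml.etree.ElementTree as ET

hidden = ET.fromstring('<script type="text/javascript"><!-- document.write("hi"); --></script>')
plain = ET.fromstring('<script type="text/javascript">document.write("hi");</script>')

# The XML parser discards the comment, so nothing is left to execute.
print(repr(hidden.text))  # None
# Without the comment wrapper, the script text survives parsing.
print(repr(plain.text))   # 'document.write("hi");'
```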

Document language

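Specify the language of an element with both the lang and the xml:lang attribute, e.g. <p lang="nl" xml:lang="nl">. HTML user agents understand lang, while XML processing relies on xml:lang. If the two values differ, xml:lang takes precedence.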
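The sketch below illustrates the precedence rule minimally, again assuming ElementTree. effective_language is a hypothetical helper that looks only at the element itself. A real user agent would also inherit the language from ancestor elements. ElementTree reports xml:lang under the fixed XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute (the xml prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_language(elem):
    """Return the element's language: xml:lang if present, otherwise lang.
    Does not look at ancestors (a simplification)."""
    value = elem.get(XML_LANG)
    return value if value is not None else elem.get("lang")

print(effective_language(ET.fromstring('<p lang="en" xml:lang="nl">Hallo</p>')))  # nl
print(effective_language(ET.fromstring('<p lang="en">Hello</p>')))                # en
```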

Fragment identifiers (id and name)

In XHTML, a URI reference ending in a fragment identifier of the form #foo does not refer to an element with name="foo". It refers to the element with id="foo". However, many existing HTML clients do not support id as a link target. Supply identical values for both attributes to ensure maximum forward and backward compatibility, e.g. <a id="foo" name="foo">...</a>.

The values of id and name must be unique within a document, and they must be valid. A valid value starts with a letter (A-Z, a-z), followed by any number of letters, digits (0-9), hyphens (-), underscores (_), colons (:) and periods (.). On any one element, the id and name values must be the same. Finally, XHTML 1.0 deprecates the name attribute on the a, applet, form, frame, iframe, img and map elements, and subsequent versions of XHTML will remove it.
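These rules are easy to check mechanically. The Python sketch below is minimal. It assumes you have already collected the (id, name) pairs of your a elements. check_anchors is a hypothetical helper, not part of any library.

```python
import re

# ID/NAME token rule: a letter first, then letters, digits, '-', '_', ':' or '.'
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")

def check_anchors(pairs):
    """Check (id, name) pairs taken from <a id="..." name="..."> elements.

    Returns a list of problems; an empty list means every value is a valid
    token, id equals name on each element, and no id is used twice.
    """
    problems = []
    seen = set()
    for id_value, name_value in pairs:
        if id_value != name_value:
            problems.append(f"id {id_value!r} and name {name_value!r} differ")
        for value in (id_value, name_value):
            if NAME_TOKEN.fullmatch(value) is None:
                problems.append(f"{value!r} is not a valid id/name value")
        if id_value in seen:
            problems.append(f"id {id_value!r} is used more than once")
        seen.add(id_value)
    return problems

print(check_anchors([("foo", "foo"), ("top", "top")]))  # []
print(check_anchors([("foo", "bar"), ("2nd", "2nd")]))
```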

Character encoding

In HTML, the character encoding of a document is declared either by the server (in the HTTP Content-Type header) or by a meta element in the document itself. In XML, and therefore in XHTML, it is declared in the XML declaration (e.g. <?xml version="1.0" encoding="iso-8859-1"?>). The best way to ensure portability is to let the web server send the correct header. If that is not possible, a document that sets its encoding explicitly must include both the XML declaration and a meta http-equiv element (e.g. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />). In XHTML user agents, the encoding given in the XML declaration takes precedence.
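To keep the two declarations from drifting apart, you can generate both from one value. The sketch below is minimal, and encoding_declarations is a hypothetical helper name. The last lines use ElementTree to show that an XML parser decodes the bytes according to the encoding named in the XML declaration.

```python
import xml.etree.ElementTree as ET

def encoding_declarations(charset):
    """Return (XML declaration, meta element) declaring the same charset."""
    xml_decl = f'<?xml version="1.0" encoding="{charset}"?>'
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}" />'
    return xml_decl, meta

xml_decl, meta = encoding_declarations("iso-8859-1")
print(xml_decl)
print(meta)

# The byte 0xE9 is decoded as 'é' because the XML declaration says iso-8859-1.
data = (xml_decl + "<p>caf\u00e9</p>").encode("iso-8859-1")
print(ET.fromstring(data).text)  # café
```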

Ampersands

In XML and XHTML, the ampersand character ("&") marks the beginning of an entity reference. Many HTML user agents have silently tolerated incorrect use of this character in HTML documents, treating ampersands that do not look like entity references as literal ampersands. XML does not tolerate this: a bare ampersand makes the document not well-formed. To stay compatible with both HTML and XHTML, write every literal ampersand as an entity reference ("&amp;"). For example, when the href attribute of an a element points to a CGI or PHP script that takes more than one parameter, write it as http://my.site.com/myscript.php?id=foo&amp;name=bar rather than http://my.site.com/myscript.php?id=foo&name=bar.
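When a script builds the URL, let the language's escaping function handle the ampersands. The Python sketch below is minimal and uses the standard html.escape. href_attribute is a hypothetical helper. The try block shows that an XML parser (ElementTree here) rejects the unescaped version outright.

```python
import xml.etree.ElementTree as ET
from html import escape

def href_attribute(url):
    """Return an href attribute with &, <, > and quotes escaped for (X)HTML."""
    return f'href="{escape(url, quote=True)}"'

url = "http://my.site.com/myscript.php?id=foo&name=bar"
print(href_attribute(url))
# href="http://my.site.com/myscript.php?id=foo&amp;name=bar"

# The escaped link is well-formed, and the parser turns &amp; back into &.
link = ET.fromstring(f"<a {href_attribute(url)}>link</a>")
print(link.get("href") == url)  # True

# The raw ampersand is a well-formedness error.
try:
    ET.fromstring(f'<a href="{url}">link</a>')
except ET.ParseError as err:
    print("not well-formed:", err)
```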

Apostrophe

The entity reference &apos;, which produces an apostrophe ('), was introduced in XML 1.0 but does not exist in HTML. Use &#39; instead of &apos; so that it works as expected in HTML 4 user agents.
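If a script generates your pages, check which reference its escaping function emits. The sketch below is minimal and uses Python's xml.sax.saxutils.escape, whose optional second argument maps extra characters to replacement strings. escape_text is a hypothetical helper name. For comparison, Python's html.escape emits the hexadecimal form &#x27;, which is also a numeric character reference.

```python
from xml.sax.saxutils import escape

def escape_text(text):
    """Escape &, < and > as usual, and the apostrophe as &#39;
    (understood by HTML 4 user agents) rather than &apos;."""
    return escape(text, {"'": "&#39;"})

print(escape_text("It's <b>bold</b> & fun"))
# It&#39;s &lt;b&gt;bold&lt;/b&gt; &amp; fun
```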

Personal note

With this third part of the XHTML vs. HTML tutorial, I think you will be able to cope with most of the problems that occur in the transition from HTML to XHTML. My suggestion at this time is still to use XHTML 1.0 Transitional. It is the easiest choice, especially if you use frames or open external links in a new window (target="_blank").
