XHTML
(eXtensible HyperText Markup Language)

by Ed Lein © May 2003
(Because this was written in 2003, some observations, such as those about web browser quirks at that time, will no longer be relevant.)

Introduction

In January 2000, the World-Wide Web Consortium (W3C) declared XHTML the standard for web pages. With XHTML, HTML has been reformulated as an application of XML (eXtensible Markup Language), which is itself a restricted form of SGML (Standard Generalized Markup Language). The point (beyond creating excellent acronyms) is to make the data that's contained in web pages accessible for many years to come, and by a wide variety of applications.

XHTML is a standardized, non-proprietary encoding language that provides a means for sharing data online. In other words, it's an "adaptable internet tagging system" that anyone can use to make web pages.

XHTML is "purer" than old-fashioned HTML. It makes it easier for non-PC platforms (like Web-TV, palm computers, cell phones, etc.) and non-visual devices (like voice or Braille readers) to process web page data. It also makes for more efficient indexing of web pages by search engine robots.

But, files created with XHTML are still called "HTML" documents.

Writing HTML Files

All you need to create an HTML document is a plain text editor, such as Notepad. Simply save the document with the filename extension .html or .htm.

To open Notepad, click "Start" on your desktop taskbar, then "Programs," "Accessories," and "Notepad."

HTML File Structure & Layout

There are two main parts to the web page:

  1. The DOCTYPE (i.e., Document Type declaration), and
  2. The html document proper (or, more precisely, the "root element")

There are two main elements of the HTML document proper:

  1. The head of the document (which must include the document's title), and
  2. The body of the document (which includes everything that actually shows up in the web browser window).

Thus, the five required elements are:
DOCTYPE, html, head, title, and body.

Basic Web Page Layout

It is not required that you type your tags and data on separate lines, or use double-spacing or indentations to separate different elements. These are used at the webmaster's discretion just to make the code easier for people to read.

<!DOCTYPE ... >

<html ... >
<head>
    <title>[Web page title goes here]</title>
</head>

<body>

[Web page content goes here]

</body>
</html>

See also: Web Page Template

About HTML Tags

XHTML elements are defined by prescribed tags between less-than (<) and greater-than (>) angle brackets. Tags identify and delimit the various parts of the web page, and they help control the display of text and graphics.

First Things First:
The Document Type Declaration
providing the Document Type Definition (DTD)

The DOCTYPE precedes the html root element, and consists of just one (XML) tag enclosed in angle brackets. It starts off with an exclamation point, uses upper- and lowercase letters, and it isn't "closed" (i.e., there's no closing tag containing a slash mark).

The DOCTYPE enables different types of applications to display the web page consistently; without it the browser (or other device) must guess at what you intend, sometimes with unfortunate results. It also enables you to use W3C's HTML Validator, a very useful online tool that identifies any problems with your code.

There are three choices for an HTML DOCTYPE:

Strict:
use when the document complies with all current standards.

"99.9% of Websites are Obsolete." PC-based browsers (like Internet Explorer, Netscape, etc.) will likely always be able to read outdated HTML encoding because they have the memory capacity to cope with it. And you will still find tons of the old stuff as you look at examples on the internet. But that doesn't mean that it's good to perpetuate it ...

Transitional:
use when the document contains disapproved (or, "deprecated") HTML tags, in an attempt to make the web page "backward-compatible" with outdated browsers.

Frameset:
use when the web page has an outdated "frames" structure, in which a patchwork of separate html files is combined to display all at once.
(In truth, current browsers aren't all that successful handling frameset-like displays using just "style" elements, so the sometimes annoying framesets may be around for a good while.)

The specifics of the DOCTYPE are prescribed, so just copy and paste whichever one is appropriate:

<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
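
As a rough sketch of how the DOCTYPE fits together with the basic layout shown earlier, a complete minimal page in Strict mode might look like the one below. The xmlns, xml:lang, and lang attributes on the html tag, the title text, and the sample paragraph are all illustrative assumptions; they are not covered in the discussion above.

```html
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>My First XHTML Page</title>
</head>

<body>
    <!-- In Strict mode, text sits inside a block element such as <p> -->
    <p>Hello, world!</p>
</body>
</html>
```

Saved from a plain text editor with the extension .html, a file like this should open in any browser and can be submitted to the W3C validator.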

A digression
You can please some of the people ...

According to 2002 statistics, about 88% of the world uses either IE5 or IE6. But some people cling to outdated browsers (like the bug-infested Netscape 4, even though they're up to Netscape 7.02), so many (probably most) web page designers use the W3C-authorized "transitional" mode, which mixes deprecated HTML tags with the standard XHTML ones. (But in this guide we will avoid using outdated coding--no use confusing the issue by learning stuff that you will have to "unlearn" later.)

Recognizing some particular problems with "legacy" browsers, the W3C has put together some HTML Compatibility Guidelines. (NOTE: Their compatibility guidelines have been built into our practical discussion. For instance, the "XML Declaration" that might precede the DOCTYPE has been omitted.)

Thus, there's a great deal of effort made trying to get web pages to "look" the same, regardless of the age of the browser. Chances are good, however, that those who use outdated browsers aren't that interested in "style" anyway--if they were they would upgrade since current versions of most browsers can be downloaded for free. So, you may not want to clutter up your code with unnecessary "work-arounds" trying to be all things to all browsers (which is impossible, anyway).

Just make sure that your content flows logically even if the style elements are removed.

XHTML Tag Syntax

In the HTML document proper, besides being enclosed in angle brackets (< >) all XHTML tags must be written in lowercase letters. They also must be "closed," meaning they either are paired with a separate closing tag, or are self-closing "empty" tags.

Paired opening and closing tags are used to enclose all web page content. For example, <body> </body> respectively show the beginning and ending of all the web page data that will appear in the browser window. Note that in the closing tag a slash mark always precedes the tag name.

Opening and closing tag pairs must be properly "nested." This means you may not close one element until all the elements it contains have been closed first.

GOOD: <tag1> <tag2> data </tag2> </tag1> 
BAD: <tag1> <tag2> data </tag1> </tag2>
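
With real tags in place of tag1 and tag2, the same rule might look like this (the paragraph text is only a made-up illustration):

```html
<!-- GOOD: <em> opens inside <p>, so it closes before <p> does -->
<p>This word is <em>emphasized</em>.</p>

<!-- BAD: <p> is closed while <em> is still open -->
<p>This word is <em>emphasized.</p></em>
```

A forgiving browser may display both versions the same way, but only the first one is valid XHTML.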

The self-closing slash in "empty" tags is new with XHTML; it does not appear in the earlier HTML encoding that is still seen in abundance on the internet.

Self-closing tags are complete in themselves, and include everything they need inside a single pair of angle brackets. (In other words, they do not mark the beginning or end of a data string which they "contain.") These "empty" tags have a blank space and a slash mark before the closing angle bracket (e.g., <hr />). (The blank space before the slash isn't actually part of the XHTML specification; it's included so outdated browsers can properly process these tags, but might be disapproved later on.)
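
A short sketch of the most common "empty" tags in use follows. The file name photo.jpg and the text are hypothetical placeholders.

```html
<p>
    First line<br />
    Second line
</p>

<hr />

<p>
    <img src="photo.jpg" alt="A short description of the photo" />
</p>
```

Notice that <br />, <hr />, and <img /> never have a separate closing tag, while the <p> tags that surround them still come in pairs.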

See also: HTML Root & Head Elements, & <body> Tags; Basic Tags for Text Markup; Tags for Adding Links and Images; Tags for Tables and Lists

Tag Attributes

Most tags can have identifying or modifying attributes. Attributes must have the following structure, in which the attribute's name and value are variables, and the equals sign and double quotation marks are the prescribed punctuation:

attribute_name="Value"

Note that the attribute_name may use only lowercase letters, but, depending on the tag it applies to, the value might use upper- and lowercase letters, numbers, spaces, punctuation marks, and even full sentences.

See also: HTML Color Codes

Example:

If you want white text on a black background, you can modify the <body> tag using the attribute named "style," with a detailed "value" that specifies these display features:

<body style="color: white; background-color: black">

  • "Standard attributes" usually either identify the element or determine how the element's contents will be displayed.

  • "Event attributes" provoke some reaction on the web page to an action by the user (for example, moving the cursor over a specified element, usually a link). These reactions are carried out by a script, most commonly written in JavaScript.
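
To make the two kinds of attributes concrete, here is a small sketch. The id value "intro," the text, and the one-line scripts are assumptions chosen just for illustration.

```html
<!-- Standard attributes: id identifies the element, style controls its display -->
<p id="intro" style="color: navy">Welcome to the guide.</p>

<!-- Event attributes: each value is a short JavaScript statement -->
<p>
    <a href="#intro"
       onmouseover="this.style.color = 'red'"
       onmouseout="this.style.color = ''">Back to the introduction</a>
</p>
```

The event attribute names (onmouseover, onmouseout) follow the same rules as any other attribute: lowercase letters, an equals sign, and a value in double quotation marks. That is why the script inside the value uses single quotation marks.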

"Core" Standard Attributes

XHTML Syntax Summary

  • XHTML tags must be written between angle brackets in lowercase letters.
  • XHTML tags must be closed.
  • XHTML tags must be nested properly.
  • Values of XHTML attributes follow the equals sign and must be enclosed between double quotation marks.

XHTML vs. HTML

  • XHTML is a stricter, cleaner version of HTML.
  • XHTML does a better job of separating "style" from "content."
  • XHTML tags are almost identical to HTML tags, but many HTML display-related tags and attributes have been replaced with "class" or "style" attributes in XHTML (especially those related to fonts and text alignment).
  • XHTML is interoperable among a variety of applications, whereas HTML is limited pretty much to personal computers.
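
As a rough before-and-after illustration of these differences, the same line might be written both ways as follows. The "Sale ends Friday" text is made up, and the old-fashioned version is shown only inside a comment.

```html
<!-- Old-fashioned HTML (deprecated display tags, uppercase, unclosed <P>):
     <CENTER><FONT COLOR="red"><P>Sale ends Friday</FONT></CENTER>
-->

<!-- XHTML: the same content, with the display moved into a style attribute -->
<p style="color: red; text-align: center">Sale ends Friday</p>
```

The XHTML version says what the content is (a paragraph) in the tag, and how it should look in the style value.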

Data vs. Style:

What really distinguishes XHTML from HTML

XHTML more thoroughly separates the data content (i.e., what the document says) from its stylistic presentation (i.e., how the document looks, including fonts, background colors, margin widths, text-alignment, and the like).

Cascading Style Sheets (CSS)

Style features are assigned through the use of "cascading style sheets" (CSS), which were first used with HTML 4.0. They are called "cascading" because they may be assigned at different levels that flow from one level into the next to create the desired stylistic effects.

Cascading style instructions can be encoded using:

  • External style sheets, in which style instructions are linked from separate CSS files. External style sheets can be shared by multiple web pages, so they make it very easy to remain consistent from web page to web page, and to make global style changes almost instantly. You can use Notepad or other plain text editors to write CSS documents, too--just use the filename extension .css when you save it.

  • Internal style sheets, in which style instructions are given on an individual web page to specify, for example, how links display according to their status:

              <style type="text/css">
              a:link { color: blue; text-decoration: none; }
              a:visited { color: purple; text-decoration: none; }
              a:hover { color: red; text-decoration: underline; }
              </style>


  • Inline style definitions, in which style instructions are declared in the tag where the change occurs in the text, by using the style="?" attribute. These inline instructions can be applied both at the "block" level (for example, to headings, to paragraphs, by using the <div> tag, etc.), and phrase-by-phrase, word-by-word, and even letter-by-letter (by using the <span> tag).

Successive levels inherit most style properties from the preceding levels, unless they redefine the property.

See also: Approved Properties and Values
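
The three levels described above can appear together on one page. The fragment below is only a sketch: the file name site.css is hypothetical, and it assumes that file contains no more specific rule for paragraphs. The DOCTYPE and html tags are left out to keep it short.

```html
<head>
    <title>Style Levels</title>

    <!-- External: rules shared by every page that links to this file -->
    <link rel="stylesheet" type="text/css" href="site.css" />

    <!-- Internal: rules for this page only -->
    <style type="text/css">
    p { color: navy; }
    </style>
</head>

<body>
    <p>This paragraph is navy, as the internal style sheet says.</p>

    <!-- Inline: this one paragraph redefines the color -->
    <p style="color: green">This paragraph is green.</p>
</body>
```

Each level needs to state only what differs from the level before it; anything a level does not redefine simply flows through.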

CSS Syntax

For each style definition, there is a "selector" (that specifies what you want to change) and a "declaration" (that tells how you want it to look).

  1. The selector in both external and internal style sheets is simply listed. For inline definitions the selector is the tag to which the style="?" attribute is applied. 

  2. The declaration has two parts, separated by a colon:

       property: value

       1. property (the characteristic you wish to change)
       2. value (what it will change to)

     For example:

       color: red

In the style sheets the declaration is surrounded by curly braces:

{ color: red }

With inline style definitions the declaration becomes the value of the style attribute:

style="color: red"

If one selector has multiple declarations, the declarations are always separated by semicolons.

EXAMPLE:

Say you want your section headings (for which you've used <h2> </h2> as the markup) to have white text on a black background ...

  • With style sheets (assign once for the whole document):
      h2 { color: white; background-color: black }

  • With inline style definitions (repeat each time this type of heading is used):
      <h2 style="color: white; background-color: black"> ... </h2>

The "Class" Attribute

Using either external or internal style sheets, it is possible to create one "class" selector that can be applied to any number of elements that will share the same style properties. Choose a name for the selector, such as "red" for a class that will change the font color to red. In the style sheet you list the selector's name beginning with the prescribed punctuation mark for classes, the period. Then set up your declaration:

.red { color: red }

Every time you want red letters just add the class attribute to whatever tag you're dealing with, and it will refer back to the specified declaration in the style sheet:

<p class="red"> </p> 
<span class="red"> </span>
<h2 class="red"> </h2>
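
Put together on one page, the class might be used something like this. It is a sketch only: the heading and paragraph text are invented, and the DOCTYPE and html tags are omitted for brevity.

```html
<head>
    <title>Class Example</title>
    <style type="text/css">
    .red { color: red }
    </style>
</head>

<body>
    <h2 class="red">A red heading</h2>
    <p>An ordinary paragraph with <span class="red">two red words</span> in it.</p>
    <p class="red">A paragraph that is red throughout.</p>
</body>
```

If you later decide that "red" should really be maroon, you change the one declaration in the style sheet and every element with that class follows.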

The "ID" Attribute

Also used in style sheets, the id attribute works similarly to the class attribute, but in reverse. Instead of referring elements back to the style sheet, the id refers style sheet definition(s) to just one specific element, because a given id value may be used only once on a page. In the styles list, the id's name begins with # instead of the period.

But why?

Say you have a group of paragraphs that you want to show up as white text in a navy box in the middle of the page. So, you group the paragraphs within a division that you id as "navybox." (The "name" attribute that is sometimes paired with id for "backward compatibility" applies only to a few tags, such as <a> and <img>; <div> does not accept it.) Only then you realize that you can't see the blue-colored links -- you need the links in "navybox" to be in a contrasting color from the background, but you want the links outside "navybox" to stay as they are. You could try inline style attributes on all the "navybox" links, but the style sheet is easier, plus you can add some other effects not possible with the inline definitions.

EXAMPLE:
In the style sheet:

#navybox { color: white; background-color: navy; width: 50%; text-align: center; }
#navybox a:link { color: yellow } /* "a" for link "anchor" tags */
#navybox a:visited { color: silver }
#navybox a:hover { color: red; background-color: white }

      /* a:link, a:visited, a:hover are examples of "pseudo-classes" */

And in the body of the document:

  <div id="navybox">
    <p>A paragraph with an <a href="#example">internal link</a>.</p>
    <p>A paragraph with an <a href="https://www.loc.gov" target="_blank">offsite link</a> (to the Library of Congress).</p>
  </div>

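This displays as a navy box half the width of the page, with centered white text and yellow links (silver once visited, and red on white while the cursor hovers over them).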

As you see, CSS is a complex topic all by itself, and there are many more effects possible. The W3C offers a brief tutorial called Adding a Touch of Style, by Dave Raggett. Another free online CSS Tutorial is available from www.w3schools.com.

Conclusion

In truth, XHTML isn't quite as easy as old-fashioned HTML (and certainly not as forgiving). Now you have the basic tools, though. Yes, there's still plenty more to learn, but you can do it!



