HTML has come a long way since its initial version. However, although you can use HTML for much more than serving up static text documents, the basic organization and structure of an HTML document remains the same.
Before we dive into the specifics of the various HTML elements, it is important to summarize what each element is, what it is used for, and how it affects other elements in the document. A high-level overview of the standard HTML document and its elements is given below.
Specifying Document Type
The <!DOCTYPE> tag in an HTML document specifies a Document Type Definition (DTD). Many web developers overlook this tag. It precedes all other tags in an HTML document and specifies the format of the content that follows: which tags to expect, which methods to support, and so forth.
Validation systems use DTDs to actually perform the validation, using the DTD contents as a road map and a syntax guide. HTML editors use the DTD to provide tag auto completion and on-the-fly syntax checking.
The <!DOCTYPE> tag is generally written as follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
The tag above specifies the following information:
- The document's top-level tag is HTML (html).
- The document adheres to the formal public identifier (FPI) "W3C HTML 4.01 Strict English" standard (PUBLIC "-//W3C//DTD HTML 4.01//EN").
- The full DTD can be found at the URL http://www.w3.org/TR/html4/strict.dtd.
HTML Document Structure
All HTML documents have three document-level tags in common. These essential tags, <html>, <head> and <body>, delimit certain sections of the HTML document.
The <html> tag
The <html> tags enclose the entire HTML document. They tell the browser where the HTML markup begins and ends, and that the content is a web page. We can think of the opening and closing <html> tags as the virtual top and bottom of a web page.
Additional attributes can be specified on the <html> tag. The lang attribute tells the browser the language of the document, and the dir attribute tells it the direction of the text. The possible values of dir are ltr (left to right) and rtl (right to left).
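As a minimal sketch of that wrapper (the comment stands in for content that is not specified here):

```html
<html>
  <!-- the head and body sections go here -->
</html>
```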
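A sketch of the lang and dir attributes in use. The language code ar (Arabic) is an illustrative assumption, chosen only because Arabic is written right to left:

```html
<!-- An Arabic-language page, read right to left (illustrative values) -->
<html lang="ar" dir="rtl">
  <!-- the head and body sections go here -->
</html>
```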
The <head> tag
The <head> tag delimits the header section of an HTML document. The document's title, meta information, scripts and style information are all contained in the <head> section. Most of the information in the head section is not visible to the user.
The information specified in the meta tags is useful to search engines such as Google and Yahoo. The <head> section also contains one of the most important tags in HTML, the <title> tag, which specifies the title of the web page.
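A head section along these lines might look as follows. The title and the meta values are made-up placeholders, not taken from any real page:

```html
<head>
  <!-- Shown in the browser's title bar or tab -->
  <title>Sunrise Photo Gallery</title>
  <!-- Meta information: read by search engines, not displayed on the page -->
  <meta name="description" content="A small gallery of sunrise photographs.">
  <meta name="keywords" content="sunrise, photography, gallery">
</head>
```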
The <body> tag
The HTML document’s main visual content is contained within the <body> tags. That’s not to say that everything appearing between the <body> tags will be visible, but, like a printed document, this is where the main body of the document is placed and appears.
Styles affect how the elements in a web page are presented to the user, and this mechanism should not be neglected while developing web pages. Style definitions may appear in the head section of a document, be linked from a separate file, or be specified on individual tags by using the style attribute.
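The three placements can be sketched in one fragment. The file name styles.css and the particular style rules are hypothetical:

```html
<head>
  <title>Style placement sketch</title>
  <!-- 1. Linked from a separate file (styles.css is a hypothetical name) -->
  <link rel="stylesheet" type="text/css" href="styles.css">
  <!-- 2. Defined in the head section -->
  <style type="text/css">
    h1 { color: navy; }
  </style>
</head>
<body>
  <h1>Styled by the rule in the head section</h1>
  <!-- 3. Specified on the tag itself with the style attribute -->
  <p style="font-style: italic;">Styled inline.</p>
</body>
```

Keeping the rules in a separate file is what makes it practical to swap the look of a whole document by changing a single link.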
The important point about styles is that they enable you to radically change a document’s appearance by simply applying new styles. This enables you to display the document differently for different uses – different display or output devices or to provide a different look and feel for different audiences.
Like word processors (for example, Microsoft Word or WordPad), HTML includes several tags to format blocks of text. These tags include the following:
- <p> – Paragraphs
- <h1> through <h6> – Headings
- <blockquote> – Quoted text
- <pre> – Preformatted text
- <ul>, <ol> and <dl> – Lists
- <div> – Divisions
Each block element results in a line break and noticeable padding after the closing tag. Block elements work only on blocks of text; they cannot be used to format characters or words inside a block of text.
Paragraphs: The paragraph tag (<p>) is used to delimit entire paragraphs of text.
Headings: HTML supports six levels of headings. A heading is generally displayed in large, bold type to identify it as a heading. A level 1 heading (<h1>) is the largest and a level 6 heading (<h6>) is the smallest.
Quoted Text: The <blockquote> tag displays the text as a quote instead of plain text. The text inside a block quote will be offset from the left margin.
List Elements: The list elements are used to display a list of items in a web page. There are three types of lists: unordered lists (<ul>), ordered lists (<ol>) and definition lists (<dl>).
Preformatted Text: When displaying text, a web browser normally collapses extra spaces, tabs and line breaks. To prevent that and display the text with all of its spaces, tabs and line breaks intact, use the preformatted text tag (<pre>).
Divisions: Divisions are a higher level of block formatting, usually reserved for groups of related elements, entire pages or sometimes just a single element. Divisions are generally used as containers for elements in a web page. Divisions make it easier to apply styles to a set of elements.
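The block elements described above can be sketched together in one fragment. All of the text content is placeholder material:

```html
<div>
  <h1>Main heading</h1>
  <p>A paragraph of ordinary text.</p>
  <blockquote>
    <p>A quotation, offset from the left margin.</p>
  </blockquote>
<pre>
Column A    Column B
   1           2
</pre>
  <ul>
    <li>Unordered item</li>
  </ul>
  <ol>
    <li>Ordered item</li>
  </ol>
  <dl>
    <dt>Term</dt>
    <dd>Its definition</dd>
  </dl>
</div>
```

The enclosing <div> acts as the container, so a single style applied to it would affect every element inside.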
The finest level of markup possible in HTML is the character level. As in a word processor, HTML allows formatting of individual characters. Inline formatting elements include the following:
- <b> – Bold
- <i> – Italics
- <big> – Big text
- <small> – Small text
- <em> – Emphasized text
- <strong> – Strong text
- <tt> – Teletype (monospaced) text
Several inline tags, such as the strikethrough (<strike>) and underline (<u>) tags, have been deprecated in the current HTML specifications. Even the <font> tag has been deprecated in favor of styles. The <strike> and <u> tags have been replaced with <del> (deleted) and <ins> (inserted).
Many experienced web developers recommend using the <strong> and <em> tags for strong and emphasized text instead of <b> and <i> respectively. The reasoning has to do with what is to be accomplished: conveying that the text is strong or emphasized (its meaning), or only changing its look (bold or italic).
The span tag (<span>) is used to span styles across one or more inline characters or words. In effect, the <span> tag enables you to apply your own inline styles.
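A short sketch of inline formatting inside a paragraph. The red color is an arbitrary illustrative style:

```html
<p>
  This is <strong>strong</strong> text and this is <em>emphasized</em> text.
  Edits can be marked as <del>deleted</del> or <ins>inserted</ins>.
  A <span style="color: red;">span</span> applies a custom inline style.
</p>
```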
Special Characters (Character Entities)
Some special characters must be referenced by a code instead of simply typed into the document. Some of these characters cannot be typed on a standard keyboard, such as the trademark symbol (™) or the copyright symbol (©). Others, such as the angle brackets (< and >), could confuse the HTML client. These specially coded characters are commonly referred to as character entities.
Entities are referenced by using a particular code in your documents. This code always begins with an ampersand (&) and ends with a semicolon (;). Three different ways to specify an entity exist:
- mnemonic code (such as copy for the copyright symbol)
- decimal value corresponding to the character (such as #169 for the copyright symbol)
- hexadecimal value corresponding to the character (such as #xA9 for the copyright symbol)
The following are all examples of valid entities:
- &nbsp; – A nonbreaking space (used to keep words together)
- &lt; – The less-than symbol, or left-angle bracket (<)
- &copy; – The copyright symbol (©)
- &amp; – An ampersand (&)
- &mdash; – An em dash (—)
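The three ways of writing an entity can be compared side by side. In this sketch the first paragraph should render the copyright symbol three times; the sample sentences are placeholders:

```html
<!-- Mnemonic, decimal and hexadecimal forms of the same character -->
<p>&copy; &#169; &#xA9;</p>

<!-- Escaping characters that would otherwise be read as markup -->
<p>Use &lt;p&gt; for paragraphs &amp; &lt;h1&gt; for headings.</p>

<!-- A nonbreaking space keeps the number and its unit on one line -->
<p>The trail is 10&nbsp;km long.</p>
```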
Two HTML elements allow us to organize information in a document: tables and forms. Tables allow us to present data in rows and columns, like a spreadsheet. Forms enable us to collect and present data using elements common to graphical user interfaces (text boxes, radio buttons, check boxes and so on).
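A small sketch of each. The table data, the field names and the /subscribe address are all hypothetical placeholders:

```html
<!-- A table: one header row and one data row -->
<table border="1">
  <tr>
    <th>Item</th>
    <th>Quantity</th>
  </tr>
  <tr>
    <td>Apples</td>
    <td>3</td>
  </tr>
</table>

<!-- A form with a text box, radio buttons and a check box -->
<form action="/subscribe" method="post">
  <p>
    <label for="email">Email:</label>
    <input type="text" id="email" name="email">
  </p>
  <p>
    <input type="radio" name="format" value="html" checked> HTML
    <input type="radio" name="format" value="text"> Plain text
  </p>
  <p>
    <input type="checkbox" name="weekly" value="yes"> Weekly digest
  </p>
  <p><input type="submit" value="Subscribe"></p>
</form>
```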
The main advantage of the World Wide Web (WWW) is the ability to link to other documents on the web. A link typically appears as underlined text and is often rendered in a different color than normal text. Hyperlinks (links) are created by using the anchor tag (<a>). The href attribute of the anchor tag specifies the address (URL) of the destination resource. For example, a link to Google sets href to Google's address and uses the word Google as the link text.
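A link to Google might be written as follows. The exact URL is an assumption, since the text does not give one:

```html
<!-- href holds the destination; the text between the tags is what the user clicks -->
<a href="https://www.google.com/">Google</a>
```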
One of the great innovations that the World Wide Web (WWW) and HTTP brought to the internet was the ability to serve up multimedia to the clients. The precursors to full-motion video and audio were graphical images in the Web-friendly Graphics Interchange Format (GIF) and Joint Photographic Experts Group (JPEG) format.
Images can be inserted into web pages by using the image tag (<img>). A typical image tag looks as shown below:
<img src="/images/sun.jpg" alt="Sunrise" width="200" height="100" />
The src attribute specifies the location of the image. The alt attribute specifies alternate text for the image, shown when the image cannot be displayed, and the width and height attributes specify the dimensions of the image.
Although HTML documents tend to be fairly legible on their own, there are several advantages to adding comments to our HTML code. Typical uses include organizing the document and marking particular sections for later reference.
Comments in HTML begin with <!-- and end with -->. A comment can span multiple lines. None of the text between the comment delimiters is visible to the end user.
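A sketch of a single-line and a multi-line comment. The section name is a placeholder:

```html
<!-- Navigation section starts here -->
<p>This paragraph is visible to the user.</p>
<!--
  A comment can span
  multiple lines.
-->
```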
HTML is used to deploy content that is static in nature. The static content is sent to a user agent (browser), where it is rendered and presented to the user, but once it is sent, it doesn't change. However, HTML documents sometimes need decision-making ability, form validation and, in the case of Dynamic HTML (DHTML), dynamic changes to object attributes. Scripts, placed in the document with the <script> tag, supply these capabilities.
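As one illustration of form validation, a script can refuse to submit a form while a field is empty. This is only a sketch: the function name checkName, the field name username and the /register address are hypothetical:

```html
<script type="text/javascript">
  // Hypothetical helper: block submission when the name field is empty.
  function checkName(form) {
    if (form.elements["username"].value === "") {
      alert("Please enter a name.");
      return false; // returning false cancels the submit
    }
    return true;
  }
</script>

<form action="/register" method="post" onsubmit="return checkName(this);">
  <p>
    <label for="username">Name:</label>
    <input type="text" id="username" name="username">
    <input type="submit" value="Send">
  </p>
</form>
```

The check runs in the browser, so the page can respond without another round trip to the server.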
After looking at all the elements that can be placed inside an HTML document, it can be said that every HTML document should have a <!DOCTYPE> declaration, the <html>, <head> and <body> tags, and at least a <title> tag within the <head> section. The rest of the elements are strictly optional, but they help define a document's purpose, style and ultimately its usability.
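Putting those required pieces together, a minimal document might look like the sketch below. It reuses the HTML 4.01 Strict declaration discussed earlier; the title and paragraph text are placeholders:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <title>A minimal page</title>
</head>
<body>
  <p>Hello, world.</p>
</body>
</html>
```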