This article explains the structure of an XML document. We will learn what an XML document contains and how entities are used in XML documents.
An XML document often uses two supplementary files. One file, a document type definition or an XML schema, specifies the syntactic rules. The other, a style sheet, specifies how the content of the document is displayed.
An XML document contains one or more entities that are logically related collections of information, ranging in size from a single character to a book chapter. One of these entities, called the document entity, is always physically in the file that represents the document. A document entity might contain references to entities in other documents.
Many documents include information that cannot be represented as text, such as images. Such information units are generally stored as binary data and must be specified separately. Such entities are called binary entities.
Entity names can be of any length. They must begin with a letter, an underscore or a colon. After the first character, a name can have letters, digits, periods, hyphens, underscores or colons. A reference to an entity is its name with a prepended ampersand and an appended semicolon. For example, if sun_image is the name of an entity, &sun_image; is a reference to it.
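The naming rule and the reference syntax can be sketched in a few lines of Python. This is only an illustration, not a full implementation of the XML specification: the pattern covers ASCII characters only (XML also permits many non-ASCII letters) and follows the XML 1.0 rule that a name starts with a letter, an underscore or a colon. The helper names is_valid_entity_name and make_reference, the entity sun_name and the note element are all made up for this example. The last lines rely on the standard xml.etree.ElementTree parser expanding an entity declared in the document itself.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the naming rule (an assumption for brevity):
# first a letter, underscore or colon, then any mix of letters, digits,
# periods, hyphens, underscores or colons.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_valid_entity_name(name: str) -> bool:
    """Return True if the whole string matches the simplified name rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def make_reference(name: str) -> str:
    """Build an entity reference: ampersand, name, semicolon."""
    if not is_valid_entity_name(name):
        raise ValueError("not a valid entity name: " + repr(name))
    return "&" + name + ";"


# Hypothetical document that declares a text entity and then refers to it.
document = (
    '<!DOCTYPE note [<!ENTITY sun_name "the Sun">]>'
    "<note>Light from " + make_reference("sun_name") + " reaches us.</note>"
)

# The parser replaces the reference with the entity's declared text.
expanded_text = ET.fromstring(document).text
print(expanded_text)
```

Calling make_reference("sun_image") yields the reference &sun_image; from the example above, while a name such as 1sun is rejected because it starts with a digit.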
When several predefined entities must appear near each other in an XML document, their references clutter the content and make it difficult to read. In such cases, a character data section can be used. The content of a character data section is not parsed by the XML parser, so it cannot contain markup: anything that looks like a tag is treated as literal text. The general form of a character data section is:
<![CDATA[ content ]]>
An example of a character data section is given below:
<![CDATA[The last word of the line is >>> here <<<]]>
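To see the readability difference, the short Python sketch below writes the same text twice, once with predefined entity references and once inside a character data section, and then checks that a parser recovers the same content from both. It uses escape from the standard xml.sax.saxutils module and the standard xml.etree.ElementTree parser; the element name line is an assumption chosen only for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "The last word of the line is >>> here <<<"

# Version 1: each special character becomes a predefined entity reference.
escaped_doc = "<line>" + escape(text) + "</line>"

# Version 2: the raw text is wrapped in a character data section instead.
cdata_doc = "<line><![CDATA[" + text + "]]></line>"

print(escaped_doc)
print(cdata_doc)

# Once parsed, both documents carry the same character content.
escaped_text = ET.fromstring(escaped_doc).text
cdata_text = ET.fromstring(cdata_doc).text
print(escaped_text == cdata_text)
```

The first printed line contains six entity references in a row, while the second keeps the angle brackets readable as written.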
As the content of a character data section is not parsed by the XML parser, any entity references that are included are not expanded. For example, the content of the line:
<![CDATA[The form of a tag is &lt;tag name&gt;]]>
is as follows:
The form of a tag is &lt;tag name&gt;
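This behavior can be checked with a parser. The sketch below, which assumes Python's standard xml.etree.ElementTree module and a made-up note element, parses the same characters once inside a character data section and once as ordinary content, so the two results can be compared.

```python
import xml.etree.ElementTree as ET

# Inside a character data section, "&lt;" stays as four ordinary characters.
inside_cdata = ET.fromstring(
    "<note><![CDATA[The form of a tag is &lt;tag name&gt;]]></note>"
).text

# In ordinary parsed content, the same references are expanded.
outside_cdata = ET.fromstring(
    "<note>The form of a tag is &lt;tag name&gt;</note>"
).text

print(inside_cdata)
print(outside_cdata)
```

Only the second string contains literal angle brackets, because the references were expanded only where the parser was allowed to interpret them.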