Table of contents
  1. 9 The XHTML syntax
    1. 9.1 Writing XHTML documents

    2. 9.2 Parsing XHTML documents

    3. 9.3 Serializing XHTML fragments

    4. 9.4 Parsing XHTML fragments

9 The XHTML syntax

This section only describes the rules for XML resources. Rules for text/html resources are discussed in the section above entitled "The HTML syntax".

9.1 Writing XHTML documents

The syntax for using HTML with XML, whether in XHTML documents or embedded in other XML documents, is defined in the XML and Namespaces in XML specifications. [XML] [XMLNS]

This specification does not define any syntax-level requirements beyond those defined for XML proper.

XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification. This specification does not define a public or system identifier, nor does it provide a formal DTD.

According to the XML specification, XML processors are not guaranteed to process the external DTD subset referenced in the DOCTYPE. This means, for example, that using entity references for characters in XHTML documents is unsafe if they are defined in an external file (except for &lt;, &gt;, &amp;, &quot; and &apos;).
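The risk described above can be observed with any XML processor that does not load external DTDs. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name paragraph_text is an assumption made for illustration and is not part of this specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def paragraph_text(body):
    """Parse body as the content of an XHTML p element and return its text.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring('<p xmlns="%s">%s</p>' % (XHTML_NS, body))
    return root.text


# The predefined XML entities and numeric character references always resolve.
print(repr(paragraph_text("fish &amp; chips")))
print(repr(paragraph_text("a&#160;b")))

# A named entity that only an external DTD would define is rejected,
# because this parser never loads such a DTD.
try:
    paragraph_text("a&nbsp;b")
except ET.ParseError as exc:
    print("rejected:", exc)
```

Authors who need such characters in XHTML can use numeric character references or the literal character instead of a named entity.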

9.2 Parsing XHTML documents

This section describes the relationship between XML and the DOM, with a particular emphasis on how this interacts with HTML.

An XML parser, for the purposes of this specification, is a construct that follows the rules given in the XML specification to map a string of bytes or characters into a Document object.

An XML parser is either associated with a Document object when it is created, or creates one implicitly.

This Document must then be populated with DOM nodes that represent the tree structure of the input passed to the parser, as defined by the XML specification, the Namespaces in XML specification, and the DOM Core specification. DOM mutation events must not fire for the operations that the XML parser performs on the Document's tree, but the user agent must act as if elements and attributes were individually appended and set respectively so as to trigger rules in this specification regarding what happens when an element is inserted into a document or has its attributes set. [XML] [XMLNS] [DOMCORE] [DOMEVENTS]
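As a rough illustration of an XML parser populating a Document with namespace-aware DOM nodes, the sketch below uses Python's xml.dom.minidom. It is not a user agent and does not implement any of the insertion or attribute-setting rules mentioned above; it only shows the mapping from a string to a tree.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

source = '<html xmlns="%s"><body><p>Hi</p></body></html>' % XHTML_NS

# parseString creates the Document implicitly and fills it with nodes.
document = minidom.parseString(source)
root = document.documentElement
paragraph = root.getElementsByTagNameNS(XHTML_NS, "p")[0]

print(root.localName, root.namespaceURI)
print(paragraph.firstChild.data)
```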

Between the time an element's start tag is parsed and the time either the element's end tag is parsed or the parser detects a well-formedness error, the user agent must act as if the element was in a stack of open elements.

This is used by the object element to avoid instantiating plugins before the param element children have been parsed.
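The notion of an element being "open" between its start tag and its end tag can be sketched with an event-driven parser. The example below uses Python's xml.parsers.expat; the function name trace_open_elements and the snapshot list are assumptions for illustration, not part of any API defined here.

```python
import xml.parsers.expat


def trace_open_elements(xml_text):
    """Return a snapshot of the open-element stack at each start tag."""
    stack = []
    snapshots = []

    def on_start(name, attrs):
        # The element is considered open from this point on.
        stack.append(name)
        snapshots.append(tuple(stack))

    def on_end(name):
        # The end tag has been parsed, so the element is no longer open.
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return snapshots


for snapshot in trace_open_elements(
    '<object><param name="a"/><param name="b"/></object>'
):
    print(snapshot)
```

In this trace the object element is still on the stack while each param child is seen, which is the condition a user agent can check before instantiating a plugin.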

This specification provides the following additional information that user agents should use when retrieving an external entity: the public identifiers given in the following list all correspond to the URL given by this link.

Furthermore, user agents should attempt to retrieve the above external entity's content when one of the above public identifiers is used, and should not attempt to retrieve any other external entity's content.

This is not strictly a violation of the XML specification, but it does contradict the spirit of the XML specification's requirements. This is motivated by a desire for user agents to all handle entities in an interoperable fashion without requiring any network access for handling external subsets. [XML]

When an XML parser creates a script element, it must be marked as being "parser-inserted". If the parser was originally created for the XML fragment parsing algorithm, then the element must be marked as "already started" also. When the element's end tag is parsed, the user agent must run the script element. If this causes there to be a pending parsing-blocking script, then the user agent must run the following steps:

  1. Block this instance of the XML parser, such that the event loop will not run tasks that invoke it.

  2. Spin the event loop until the parser's Document has no style sheet that is blocking scripts and the pending parsing-blocking script's "ready to be parser-executed" flag is set.

  3. Unblock this instance of the XML parser, such that tasks that invoke it can again be run.

  4. Execute the pending parsing-blocking script.

  5. There is no longer a pending parsing-blocking script.
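The five steps above can be modelled as a toy simulation. Everything in this sketch is hypothetical: the ToyParser class, the dictionary standing in for the Document's state, and the task queue standing in for the event loop are assumptions chosen only to make the ordering of the steps concrete.

```python
from collections import deque


class ToyParser:
    """Hypothetical stand-in for one XML parser instance."""

    def __init__(self):
        self.blocked = False


def run_pending_parsing_blocking_script(parser, document, task_queue):
    # Step 1: block the parser so that no task may invoke it.
    parser.blocked = True

    # Step 2: spin the (toy) event loop until no style sheet is blocking
    # scripts and the script is ready to be parser-executed.
    while document["blocking_style_sheets"] > 0 or not document["script_ready"]:
        if not task_queue:
            raise RuntimeError("no tasks left but the script is still not ready")
        task = task_queue.popleft()
        task(document)

    # Step 3: unblock the parser.
    parser.blocked = False

    # Step 4: execute the pending parsing-blocking script.
    document["pending_script"]()

    # Step 5: there is no longer a pending parsing-blocking script.
    document["pending_script"] = None


log = []


def style_sheet_loaded(document):
    document["blocking_style_sheets"] -= 1
    log.append("style sheet loaded")


def script_fetched(document):
    document["script_ready"] = True
    log.append("script fetched")


toy_document = {
    "blocking_style_sheets": 1,
    "script_ready": False,
    "pending_script": lambda: log.append("script ran"),
}
toy_parser = ToyParser()
run_pending_parsing_blocking_script(
    toy_parser, toy_document, deque([style_sheet_loaded, script_fetched])
)
print(log)
```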

Since the document.write() API is not available for XML documents, much of the complexity in the HTML parser is not needed in the XML parser.

Certain algorithms in this specification spoon-feed the parser characters one string at a time. In such cases, the XML parser must act as it would have if faced with a single string consisting of the concatenation of all those characters.
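The equivalence between incremental and one-shot input can be sketched with Python's xml.etree.ElementTree, whose XMLParser accepts data in pieces. The helper name parse_in_chunks is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def parse_in_chunks(chunks):
    """Feed the parser one string at a time and return the root element."""
    parser = ET.XMLParser()
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()


# The chunk boundaries deliberately fall inside text and inside a tag.
chunked = parse_in_chunks(["<a><b>he", "llo</b", "></a>"])
whole = ET.fromstring("<a><b>hello</b></a>")

print(ET.tostring(chunked) == ET.tostring(whole))
```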

When an XML parser reaches the end of its input, it must stop parsing, following the same rules as the HTML parser. An XML parser can also be aborted, which must again be done in the same way as for an HTML parser.

For the purposes of conformance checkers, if a resource is determined to be in the XHTML syntax, then it is an XML document.

9.3 Serializing XHTML fragments

The XML fragment serialization algorithm for a Document or Element node either returns a fragment of XML that represents that node or raises an exception.

For Documents, the algorithm must return a string in the form of a document entity, if none of the error cases below apply.

For Elements, the algorithm must return a string in the form of an internal general parsed entity, if none of the error cases below apply.

In both cases, the string returned must be XML namespace-well-formed and must be an isomorphic serialization of all of that node's child nodes, in tree order. User agents may adjust prefixes and namespace declarations in the serialization (and indeed might be forced to do so in some cases to obtain namespace-well-formed XML). User agents may use a combination of regular text, character references, and CDATA sections to represent text nodes in the DOM (and indeed might be forced to use representations that don't match the DOM's, e.g. if a CDATASection node contains the string "]]>").
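The remark about CDATA sections can be made concrete with a small sketch. The function serialize_text and its prefer_cdata parameter are hypothetical; the sketch shows one possible policy, falling back to escaped text whenever the data contains "]]>", and is not the serialization a user agent is required to produce.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def serialize_text(data, prefer_cdata=False):
    """Serialize character data as a CDATA section or as escaped text."""
    if prefer_cdata and "]]>" not in data:
        return "<![CDATA[" + data + "]]>"
    # "]]>" cannot appear inside a CDATA section, so escape instead.
    return escape(data)


for sample in ["x < y & z", "a ]]> b"]:
    serialized = serialize_text(sample, prefer_cdata=True)
    # Parsing the serialization back must yield the original text.
    round_tripped = ET.fromstring("<r>" + serialized + "</r>").text
    print(serialized, round_tripped == sample)
```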

For Elements, if any of the elements in the serialization are in no namespace, the default namespace in scope for those elements must be explicitly declared as the empty string. (This doesn't apply in the Document case.) [XML] [XMLNS]
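The reason for the explicit empty default namespace becomes visible when a serialized fragment is parsed back inside an element that declares a default namespace. The sketch below uses Python's xml.etree.ElementTree; the helper child_tags and the wrapping div are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def child_tags(fragment):
    """Parse fragment inside an XHTML div and return the children's names."""
    wrapper = ET.fromstring('<div xmlns="%s">%s</div>' % (XHTML_NS, fragment))
    return [child.tag for child in wrapper]


# With xmlns="" the element stays in no namespace.
print(child_tags('<x xmlns=""/>'))

# Without it, the element silently inherits the XHTML namespace.
print(child_tags("<x/>"))
```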

For the purposes of this section, an internal general parsed entity is considered XML namespace-well-formed if a document consisting of an element with no namespace declarations whose contents are the internal general parsed entity would itself be XML namespace-well-formed.

If any of the following error cases are found in the DOM subtree being serialized, then the algorithm must raise an INVALID_STATE_ERR exception instead of returning a string:

These are the only ways to make a DOM unserializable. The DOM enforces all the other XML constraints; for example, trying to append two elements to a Document node will raise a HIERARCHY_REQUEST_ERR exception.
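The example of appending two elements to a Document node can be tried with Python's xml.dom.minidom, which raises its own HierarchyRequestErr exception class in this case. This is a sketch against that library only; it is not the DOM implementation of a user agent.

```python
from xml.dom import minidom, HierarchyRequestErr

document = minidom.Document()
document.appendChild(document.createElement("html"))

try:
    # A Document may have only one element child.
    document.appendChild(document.createElement("extra"))
except HierarchyRequestErr as exc:
    print("refused:", exc)

print(len(document.childNodes))
```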

9.4 Parsing XHTML fragments

The XML fragment parsing algorithm either returns a Document or raises a SYNTAX_ERR exception. Given a string input and an optional context element context, the algorithm is as follows:

  1. Create a new XML parser.

  2. If there is a context element, feed the parser just created the string corresponding to the start tag of that element, declaring all the namespace prefixes that are in scope on that element in the DOM, as well as declaring the default namespace (if any) that is in scope on that element in the DOM.

    A namespace prefix is in scope if the DOM Core lookupNamespaceURI() method on the element would return a non-null value for that prefix.

    The default namespace is the namespace for which the DOM Core isDefaultNamespace() method on the element would return true.

    If there is a context element, no DOCTYPE is passed to the parser, and therefore no external subset is referenced, and therefore no entities will be recognized.

  3. Feed the parser just created the string input.

  4. If there is a context element, feed the parser just created the string corresponding to the end tag of that element.

  5. If there is an XML well-formedness or XML namespace well-formedness error, then raise a SYNTAX_ERR exception and abort these steps.

  6. If there is a context element, then return the child nodes of the root element of the resulting Document, in tree order.

    Otherwise, return the children of the Document object, in tree order.
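The steps above can be sketched with Python's xml.etree.ElementTree. This is an approximation under several assumptions: the context element is described by a name and a prefix-to-URI mapping (with the empty string standing for the default namespace) rather than by a DOM node, FragmentSyntaxError is a hypothetical stand-in for SYNTAX_ERR, and ElementTree exposes only element children, so text nodes, comments and processing instructions are not returned as separate nodes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"


class FragmentSyntaxError(Exception):
    """Hypothetical stand-in for the SYNTAX_ERR exception."""


def parse_xml_fragment(text, context_name=None, namespaces=None):
    parser = ET.XMLParser()  # Step 1: create a new XML parser.
    try:
        if context_name is not None:
            # Step 2: feed the context element's start tag, declaring the
            # namespaces in scope ("" is the default namespace).
            declarations = "".join(
                " xmlns%s=%s" % (":" + prefix if prefix else "", quoteattr(uri))
                for prefix, uri in (namespaces or {}).items()
            )
            parser.feed("<%s%s>" % (context_name, declarations))
        parser.feed(text)  # Step 3: feed the input string.
        if context_name is not None:
            parser.feed("</%s>" % context_name)  # Step 4: feed the end tag.
        root = parser.close()
    except ET.ParseError as exc:
        # Step 5: well-formedness and namespace errors become one exception.
        raise FragmentSyntaxError(str(exc)) from exc
    # Step 6: return the root's children, or the document's single element.
    return list(root) if context_name is not None else [root]


nodes = parse_xml_fragment("<b>hi</b><i/>", "p", {"": XHTML_NS})
print([node.tag for node in nodes])

try:
    # No DOCTYPE is passed, so a named entity such as &nbsp; is not recognized.
    parse_xml_fragment("a&nbsp;b", "p", {"": XHTML_NS})
except FragmentSyntaxError as exc:
    print("syntax error:", exc)
```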