From 1a06d7d3de3d2f30eaf3d56b7b2d0fa3446d46d8 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine
Date: Tue, 18 Nov 2014 09:30:19 -0800
Subject: docs: Regenerated documentation

Also fix documentation jam rules for Windows.
---
 docs/manual/loading.html | 260 +++++++++++++++++++++++++----------------------
 1 file changed, 139 insertions(+), 121 deletions(-)

diff --git a/docs/manual/loading.html b/docs/manual/loading.html
index e18cde6..d302f73 100644
--- a/docs/manual/loading.html
+++ b/docs/manual/loading.html
@@ -4,15 +4,15 @@
 Loading document
-
-
+
+
-pugixml 1.4 manual |
+pugixml 1.5 manual |
 Overview | Installation | Document:
@@ -28,16 +28,16 @@

 pugixml provides several functions for loading XML data from various places
@@ -49,26 +49,25 @@
 EOL handling or attribute value normalization) can impact parsing speed and
 thus can be disabled. However for vast majority of XML documents there is
 no performance difference between different parsing options. Parsing options also
-control whether certain XML nodes are parsed; see Parsing options for
+control whether certain XML nodes are parsed; see Parsing options for
 more information.

-XML data is always converted to internal character format (see Unicode interface)
+XML data is always converted to internal character format (see Unicode interface)
 before parsing. pugixml supports all popular Unicode encodings (UTF-8, UTF-16
 (big and little endian), UTF-32 (big and little endian); UCS-2 is naturally
 supported since it's a strict subset of UTF-16) and handles all encoding
 conversions automatically. Unless explicit encoding is specified, loading
 functions perform automatic encoding detection based on first few characters
 of XML data, so in almost all cases you do not have to specify document
 encoding. Encoding
-conversion is described in more detail in Encodings.
+conversion is described in more detail in Encodings.
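
The automatic detection mentioned above can be illustrated with a byte order mark check. This is only a sketch under stated assumptions: `guess_encoding_from_bom` is a hypothetical helper, not a pugixml function, and it covers just the BOM case, whereas the manual's Encodings section describes further rules ending in a UTF-8 fallback.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of pugixml: guesses an encoding name from a
// byte order mark at the start of the buffer. Returns "unknown" when no BOM
// is present (a real loader would continue with other heuristics).
std::string guess_encoding_from_bom(const unsigned char* data, std::size_t size)
{
    // UTF-32 marks are tested first: FF FE 00 00 also starts with the UTF-16 LE mark.
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
        return "utf-32be";
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
        return "utf-32le";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "utf-16be";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "utf-16le";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "utf-8";
    return "unknown";
}
```

The order of the checks matters because the UTF-32 little endian mark begins with the same two bytes as the UTF-16 little endian one.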

-

- The - most common source of XML data is files; pugixml provides dedicated functions +

+ The most common source of XML data is files; pugixml provides dedicated functions for loading an XML document from file:

xml_parse_result xml_document::load_file(const char* path, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
@@ -76,8 +75,8 @@
 

 These functions accept the file path as its first argument, and also two
-optional arguments, which specify parsing options (see Parsing options)
-and input data encoding (see Encodings). The path has the target
+optional arguments, which specify parsing options (see Parsing options)
+and input data encoding (see Encodings). The path has the target
 operating system format, so it can be a relative or absolute one, it should
 have the delimiters of the target system, it should have the exact case if
 the target file system is case-sensitive, etc.
@@ -95,12 +94,13 @@
 The result of the operation is returned in an xml_parse_result object; this
 object contains the operation status and the related information (i.e. last
 successfully parsed position in the input file, if parsing fails).
-See Handling parsing errors for error handling details.
+See Handling parsing errors for error handling details.

This is an example of loading XML document from file (samples/load_file.cpp):

+

pugi::xml_document doc;
 
@@ -113,19 +113,19 @@
 
-

- Sometimes XML data should be - loaded from some other source than a file, i.e. HTTP URL; also you may want - to load XML data from file using non-standard functions, i.e. to use your - virtual file system facilities or to load XML from gzip-compressed files. - All these scenarios require loading document from memory. First you should - prepare a contiguous memory block with all XML data; then you have to invoke - one of buffer loading functions. These functions will handle the necessary - encoding conversions, if any, and then will parse the data into the corresponding - XML tree. There are several buffer loading functions, which differ in the - behavior and thus in performance/memory usage: +

+ Sometimes XML data should be loaded from some other source than a file, i.e. + HTTP URL; also you may want to load XML data from file using non-standard + functions, i.e. to use your virtual file system facilities or to load XML + from gzip-compressed files. All these scenarios require loading document + from memory. First you should prepare a contiguous memory block with all + XML data; then you have to invoke one of buffer loading functions. These + functions will handle the necessary encoding conversions, if any, and then + will parse the data into the corresponding XML tree. There are several buffer + loading functions, which differ in the behavior and thus in performance/memory + usage:

xml_parse_result xml_document::load_buffer(const void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
 xml_parse_result xml_document::load_buffer_inplace(void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
@@ -135,7 +135,7 @@
         All functions accept the buffer which is represented by a pointer to XML
         data, contents, and data
         size in bytes. Also there are two optional arguments, which specify parsing
-        options (see Parsing options) and input data encoding (see Encodings).
+        options (see  Parsing options) and input data encoding (see  Encodings).
         The buffer does not have to be zero-terminated.
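
Preparing the contiguous memory block could look roughly like the sketch below. `read_file_to_buffer` is a hypothetical helper introduced only for illustration; it is not part of pugixml, and the pugixml call appears in a comment so that the block compiles without the library.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not a pugixml function): reads a whole file into one
// contiguous block. Returns false if the file cannot be opened.
bool read_file_to_buffer(const std::string& path, std::vector<char>& out)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;

    // Copy every byte of the stream; no terminating zero is appended,
    // which is fine because the buffer functions take an explicit size.
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// With pugixml available, the block could then be parsed like this:
//   std::vector<char> buffer;
//   if (read_file_to_buffer("file.xml", buffer))
//   {
//       pugi::xml_document doc;
//       pugi::xml_parse_result result = doc.load_buffer(buffer.data(), buffer.size());
//   }
```

Because `load_buffer` treats its input as immutable, the vector can be destroyed or reused once the call returns; the in-place variants place stricter requirements on the block, as the sample code in this section shows.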
       

@@ -163,12 +163,11 @@ is the recommended function if you have to load the document from memory and performance is critical.

-

- There is also a simple helper function - for cases when you want to load the XML document from null-terminated character - string: +

+ There is also a simple helper function for cases when you want to load the + XML document from null-terminated character string:

-
xml_parse_result xml_document::load(const char_t* contents, unsigned int options = parse_default);
+
xml_parse_result xml_document::load_string(const char_t* contents, unsigned int options = parse_default);
 

It is equivalent to calling load_buffer @@ -184,6 +183,7 @@ (samples/load_memory.cpp):

+

const char source[] = "<mesh name='sphere'><bounds>0 0 1 1</bounds></mesh>";
 size_t size = sizeof(source);
@@ -191,57 +191,61 @@
 

+

-
// You can use load_buffer to load document from immutable memory block:
-pugi::xml_parse_result result = doc.load_buffer(source, size);
+
// You can use load_buffer to load document from immutable memory block:
+pugi::xml_parse_result result = doc.load_buffer(source, size);
 

+

-
// You can use load_buffer_inplace to load document from mutable memory block; the block's lifetime must exceed that of document
-char* buffer = new char[size];
+
// You can use load_buffer_inplace to load document from mutable memory block; the block's lifetime must exceed that of document
+char* buffer = new char[size];
 memcpy(buffer, source, size);
 
-// The block can be allocated by any method; the block is modified during parsing
-pugi::xml_parse_result result = doc.load_buffer_inplace(buffer, size);
+// The block can be allocated by any method; the block is modified during parsing
+pugi::xml_parse_result result = doc.load_buffer_inplace(buffer, size);
 
-// You have to destroy the block yourself after the document is no longer used
-delete[] buffer;
+// You have to destroy the block yourself after the document is no longer used
+delete[] buffer;
 

+

-
// You can use load_buffer_inplace_own to load document from mutable memory block and to pass the ownership of this block
-// The block has to be allocated via pugixml allocation function - using i.e. operator new here is incorrect
-char* buffer = static_cast<char*>(pugi::get_memory_allocation_function()(size));
+
// You can use load_buffer_inplace_own to load document from mutable memory block and to pass the ownership of this block
+// The block has to be allocated via pugixml allocation function - using i.e. operator new here is incorrect
+char* buffer = static_cast<char*>(pugi::get_memory_allocation_function()(size));
 memcpy(buffer, source, size);
 
-// The block will be deleted by the document
-pugi::xml_parse_result result = doc.load_buffer_inplace_own(buffer, size);
+// The block will be deleted by the document
+pugi::xml_parse_result result = doc.load_buffer_inplace_own(buffer, size);
 

+

-
// You can use load to load document from null-terminated strings, for example literals:
-pugi::xml_parse_result result = doc.load("<mesh name='sphere'><bounds>0 0 1 1</bounds></mesh>");
+
// You can use load_string to load document from null-terminated strings, for example literals:
+pugi::xml_parse_result result = doc.load_string("<mesh name='sphere'><bounds>0 0 1 1</bounds></mesh>");
 

-

- To enhance interoperability, pugixml - provides functions for loading document from any object which implements - C++ std::istream interface. This allows you to load - documents from any standard C++ stream (i.e. file stream) or any third-party - compliant implementation (i.e. Boost Iostreams). There are two functions, - one works with narrow character streams, another handles wide character ones: +

+ To enhance interoperability, pugixml provides functions for loading document + from any object which implements C++ std::istream + interface. This allows you to load documents from any standard C++ stream + (i.e. file stream) or any third-party compliant implementation (i.e. Boost + Iostreams). There are two functions, one works with narrow character streams, + another handles wide character ones:

xml_parse_result xml_document::load(std::istream& stream, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
 xml_parse_result xml_document::load(std::wistream& stream, unsigned int options = parse_default);
@@ -271,6 +275,7 @@
         the sample code for more complex examples involving wide streams and locales:
       

+

std::ifstream stream("weekly-utf-8.xml");
 pugi::xml_parse_result result = doc.load(stream);
@@ -280,14 +285,12 @@
 
-

- All document loading functions return the - parsing result via xml_parse_result - object. It contains parsing status, the offset of last successfully parsed - character from the beginning of the source stream, and the encoding of the - source stream: +

+ All document loading functions return the parsing result via xml_parse_result object. It contains parsing + status, the offset of last successfully parsed character from the beginning + of the source stream, and the encoding of the source stream:

struct xml_parse_result
 {
@@ -299,9 +302,8 @@
     const char* description() const;
 };
 
-

- Parsing - status is represented as the xml_parse_status +

+ Parsing status is represented as the xml_parse_status enumeration and can be one of the following:

    @@ -309,6 +311,7 @@ status_ok means that no error was encountered during parsing; the source stream represents the valid XML document which was fully parsed and converted to a tree.

    +
  • status_file_not_found is only @@ -327,6 +330,7 @@
  • status_internal_error means that something went horribly wrong; currently this error does not occur

    +
  • status_unrecognized_tag means @@ -373,12 +377,14 @@ indicates an empty or invalid document
-

- description() member function can be used to convert - parsing status to a string; the returned message is always in English, so - you'll have to write your own function if you need a localized string. However - please note that the exact messages returned by description() function may change from version to version, - so any complex status handling should be based on status +

+ description() + member function can be used to convert parsing status to a string; the returned + message is always in English, so you'll have to write your own function if + you need a localized string. However please note that the exact messages + returned by description() + function may change from version to version, so any complex status handling + should be based on status value. Note that description() returns a char string even in PUGIXML_WCHAR_MODE; you'll have to call as_wide to get the wchar_t string. @@ -393,18 +399,16 @@ attribute attr will contain the string value>some data</node>.

-

- In addition to the status code, parsing - result has an offset member, - which contains the offset of last successfully parsed character if parsing - failed because of an error in source data; otherwise offset - is 0. For parsing efficiency reasons, pugixml does not track the current - line during parsing; this offset is in units of pugi::char_t - (bytes for character mode, wide characters for wide character mode). Many - text editors support 'Go To Position' feature - you can use it to locate - the exact error position. Alternatively, if you're loading the document from - memory, you can display the error chunk along with the error description - (see the example code below). +

+ In addition to the status code, parsing result has an offset + member, which contains the offset of last successfully parsed character if + parsing failed because of an error in source data; otherwise offset is 0. For parsing efficiency reasons, + pugixml does not track the current line during parsing; this offset is in + units of pugi::char_t (bytes for character + mode, wide characters for wide character mode). Many text editors support + 'Go To Position' feature - you can use it to locate the exact error position. + Alternatively, if you're loading the document from memory, you can display + the error chunk along with the error description (see the example code below).
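
Since the parser reports only an offset, a line and column can be derived afterwards when the source buffer is still available. The following is a minimal sketch for character (non-wide) mode; `text_position` and `locate_offset` are hypothetical names, not part of pugixml, and columns are counted in bytes, so multi-byte UTF-8 sequences and `\r` each add to the column.

```cpp
#include <cstddef>

// Hypothetical result type: both fields are 1-based.
struct text_position
{
    std::size_t line;
    std::size_t column;
};

// Hypothetical helper (not part of pugixml): converts an offset such as
// xml_parse_result::offset into a line/column pair by scanning the buffer.
// Offsets outside the buffer are clamped to its bounds.
text_position locate_offset(const char* text, std::size_t size, std::ptrdiff_t offset)
{
    std::size_t end = offset < 0 ? 0 : static_cast<std::size_t>(offset);
    if (end > size) end = size;

    text_position pos = {1, 1};
    for (std::size_t i = 0; i < end; ++i)
    {
        if (text[i] == '\n')
        {
            ++pos.line;      // a newline starts the next line
            pos.column = 1;
        }
        else
        {
            ++pos.column;
        }
    }
    return pos;
}
```

Scanning only on failure keeps the cost out of the parsing path, which matches the efficiency reason the manual gives for not tracking lines during parsing.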

@@ -417,17 +421,16 @@ track the error position.

-

- Parsing result also has an encoding member, which can be used to check - that the source data encoding was correctly guessed. It is equal to the exact - encoding used during parsing (i.e. with the exact endianness); see Encodings for - more information. -

-

- Parsing result object can be implicitly - converted to bool; if you do - not want to handle parsing errors thoroughly, you can just check the return - value of load functions as if it was a bool: +

+ Parsing result also has an encoding + member, which can be used to check that the source data encoding was correctly + guessed. It is equal to the exact encoding used during parsing (i.e. with + the exact endianness); see Encodings for more information. +

+

+ Parsing result object can be implicitly converted to bool; + if you do not want to handle parsing errors thoroughly, you can just check + the return value of load functions as if it was a bool: if (doc.load_file("file.xml")) { ... } else { ... }.

@@ -435,9 +438,10 @@ This is an example of handling loading errors (samples/load_error_handling.cpp):

+

pugi::xml_document doc;
-pugi::xml_parse_result result = doc.load(source);
+pugi::xml_parse_result result = doc.load_string(source);
 
 if (result)
     std::cout << "XML [" << source << "] parsed without errors, attr value: [" << doc.child("node").attribute("attr").value() << "]\n\n";
@@ -453,7 +457,7 @@
 

All document loading functions accept the optional parameter options. This is a bitmask that customizes @@ -485,12 +489,14 @@ document declaration (node with type node_declaration) is to be put in DOM tree. If this flag is off, it is not put in the tree, but is still parsed and checked for correctness. This flag is off by default.

+

  • parse_doctype determines if XML document type declaration (node with type node_doctype) is to be put in DOM tree. If this flag is off, it is not put in the tree, but is still parsed and checked for correctness. This flag is off by default.

    +
  • parse_pi determines if processing instructions @@ -498,18 +504,21 @@ in DOM tree. If this flag is off, they are not put in the tree, but are still parsed and checked for correctness. Note that <?xml ...?> (document declaration) is not considered to be a PI. This flag is off by default.

    +
  • parse_comments determines if comments (nodes with type node_comment) are to be put in DOM tree. If this flag is off, they are not put in the tree, but are still parsed and checked for correctness. This flag is off by default.

    +
  • parse_cdata determines if CDATA sections (nodes with type node_cdata) are to be put in DOM tree. If this flag is off, they are not put in the tree, but are still parsed and checked for correctness. This flag is on by default.

    +
  • parse_trim_pcdata determines if leading @@ -518,6 +527,7 @@ often the application only cares about the non-whitespace contents so it's easier to trim whitespace from text during parsing. This flag is off by default.

    +
  • parse_ws_pcdata determines if PCDATA @@ -536,6 +546,7 @@ one child when parse_ws_pcdata is not set. This flag is off by default.

    +
  • parse_ws_pcdata_single determines @@ -554,6 +565,7 @@ This flag has no effect if parse_ws_pcdata is enabled. This flag is off by default.

    +
  • parse_fragment determines if document @@ -598,6 +610,7 @@ ones). If character/entity reference can not be expanded, it is left as is, so you can do additional processing later. Reference expansion is performed on attribute values and PCDATA content. This flag is on by default.

    +
  • parse_eol determines if EOL handling (that @@ -607,6 +620,7 @@ be performed on input data (that is, comments contents, PCDATA/CDATA contents and attribute values). This flag is on by default.

    +
  • parse_wconv_attribute determines @@ -617,6 +631,7 @@ is set, i.e. \r\n is converted to a single space. This flag is on by default.

    +
  • parse_wnorm_attribute determines @@ -656,6 +671,7 @@ so theoretically it is the fastest mode. However, as mentioned above, in practice parse_default is usually equally fast.

    +
  • parse_default is the default set of flags, @@ -665,6 +681,7 @@ in attribute values and performing EOL handling. Note, that PCDATA sections consisting only of whitespace characters are not parsed (by default) for performance reasons.

    +
  • parse_full is the set of flags which adds @@ -681,23 +698,24 @@ This is an example of using different parsing options (samples/load_options.cpp):

    +

    const char* source = "<!--comment--><node>&lt;</node>";
     
    -// Parsing with default options; note that comment node is not added to the tree, and entity reference &lt; is expanded
    -doc.load(source);
    +// Parsing with default options; note that comment node is not added to the tree, and entity reference &lt; is expanded
    +doc.load_string(source);
     std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
     
    -// Parsing with additional parse_comments option; comment node is now added to the tree
    -doc.load(source, pugi::parse_default | pugi::parse_comments);
    +// Parsing with additional parse_comments option; comment node is now added to the tree
    +doc.load_string(source, pugi::parse_default | pugi::parse_comments);
     std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
     
    -// Parsing with additional parse_comments option and without the (default) parse_escapes option; &lt; is not expanded
    -doc.load(source, (pugi::parse_default | pugi::parse_comments) & ~pugi::parse_escapes);
    +// Parsing with additional parse_comments option and without the (default) parse_escapes option; &lt; is not expanded
    +doc.load_string(source, (pugi::parse_default | pugi::parse_comments) & ~pugi::parse_escapes);
     std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
     
    -// Parsing with minimal option mask; comment node is not added to the tree, and &lt; is not expanded
    -doc.load(source, pugi::parse_minimal);
    +// Parsing with minimal option mask; comment node is not added to the tree, and &lt; is not expanded
    +doc.load_string(source, pugi::parse_minimal);
     std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
     

    @@ -705,15 +723,14 @@

  • -

    - pugixml supports all popular Unicode encodings - (UTF-8, UTF-16 (big and little endian), UTF-32 (big and little endian); UCS-2 - is naturally supported since it's a strict subset of UTF-16) and handles - all encoding conversions. Most loading functions accept the optional parameter - encoding. This is a value - of enumeration type xml_encoding, +

    + pugixml supports all popular Unicode encodings (UTF-8, UTF-16 (big and little + endian), UTF-32 (big and little endian); UCS-2 is naturally supported since + it's a strict subset of UTF-16) and handles all encoding conversions. Most + loading functions accept the optional parameter encoding. + This is a value of enumeration type xml_encoding, that can have the following values:

      @@ -751,6 +768,7 @@
    • Otherwise encoding is assumed to be UTF-8.

      +
    @@ -822,7 +840,7 @@

    pugixml is not fully W3C conformant - it can load any valid XML document, @@ -879,7 +897,7 @@


-pugixml 1.4 manual |
+pugixml 1.5 manual |
 Overview | Installation | Document:
--
cgit v1.2.3