From ef50ba81d403514b603dbff366d3e1e504f73717 Mon Sep 17 00:00:00 2001
From: "arseny.kapoulkine"
Date: Mon, 6 Nov 2006 19:01:17 +0000
Subject: Updated documentation (email, name, license information, child_value and new eol flags, etc.)

git-svn-id: http://pugixml.googlecode.com/svn/trunk@4 99668b35-9821-0410-8761-19e4c4f06640
---
 docs/index.html | 163 +++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 109 insertions(+), 54 deletions(-)

diff --git a/docs/index.html b/docs/index.html
index f56af74..06045cc 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -51,8 +51,8 @@ from scratch). The main features (call it USP) are:

times, Xerces-DOM - ~20 times 1
  • extremely high parsing speed (well, I'm repeating myself, but it's so fast, that it outperforms expat by 2 times on test XML) 2
  • -
  • more or less standard-conformant (it will parse any standard-compliant file correctly in w3c- -compliance mode, with the exception of DTD related issues and XML namespaces)
  • +
  • more or less standard-conformant (it will parse any standard-compliant file correctly in w3c-compliance +mode, with the exception of DTD related issues and XML namespaces)
  • pretty much error-ignorant (it will not choke on something like <text>You & Me</text>, like expat will; it will try to recover the state even if meeting an error (like finding matching tags for closing ones); it will parse files with data in wrong encoding; and so on)
  • @@ -170,7 +170,7 @@ the necessary amount of memory (equivalent to stream's size) and read everything
    
    -        void parse(std::istream& stream, unsigned int optmsk = parse_noset);
    _Winnie C++ Colorizer
    + void parse(std::istream& stream, unsigned int optmsk = parse_noset);
    This function will create a buffer with the size equal to that of provided stream, read the chunk of data from the stream and parse it with provided options (optmsk). The stream does not have to persist after the call to the function, the lifetime of internal buffer @@ -181,7 +181,7 @@ with stream's data is managed by pugixml.
    
             char* parse(char* xmlstr, unsigned int optmsk = parse_noset);
    -
    _Winnie C++ Colorizer
    +
    This function parses the provided string with provided options, and returns the position where the parsing stopped (do not expect, that parsing will stop on every error, or on most of them - as I've said, pugixml is error ignorant). The input string is modified. The string must persist for the @@ -190,13 +190,13 @@ lifetime of the parser.
     
    
    -        xml_parser(std::istream& stream, unsigned int optmsk = parse_default);
    _Winnie C++ Colorizer
    + xml_parser(std::istream& stream, unsigned int optmsk = parse_default);
    Just a convenience ctor, that calls the corresponding parse() function.
     
    
    -        xml_parser(char* xmlstr, unsigned int optmsk = parse_default);
    _Winnie C++ Colorizer
    + xml_parser(char* xmlstr, unsigned int optmsk = parse_default);
    Just a convenience ctor, that calls the corresponding parse() function.
    @@ -211,7 +211,7 @@ using the following functions:

    
             operator xml_node() const;
             xml_node document() const;
    -
    _Winnie C++ Colorizer
    +

    Ok, easy part behind - now let's dive into parsing options. There is a variety of them, and you must choose them wisely to get the needed results and the best speed/least memory overhead. At first, @@ -262,8 +262,9 @@ is performed for PCDATA content
    Default value: on
    In W3C mode: off

  • If parse_trim_attribute is on, then the trimming of leading/trailing space-like characters -is performed for attribute values -
    Default value: on +is performed for attribute values (this is non-standard behavior and is here only for compatibility +reasons (PugXML had this flag)). +
    Default value: off
    In W3C mode: off
  • If parse_escapes_pcdata is on, then the character reference expansion is done for PCDATA content (replacing &lt; with <, &#x4c; with L, etc.). @@ -287,10 +288,15 @@ values (this is a subset of whitespace normalization, and includes only replacing with spaces). If parse_wnorm_attribute is on, this flag has no effect.
    Default value: on
    In W3C mode: on
  • -
  • If parse_eol_cdata is on, then the end-of-line handling is done for CDATA content (this +
  • If parse_eol_pcdata is on, then the end-of-line handling is done for PCDATA content (this includes converting any pair of 0x0d 0x0a characters to a single 0x0a and converting any standalone -0x0d to 0x0a). Note, that end-of-line handling is done for all content (PCDATA, attribute values) -except CDATA sections (if this flag is off). +0x0d to 0x0a). +
    Default value: on +
    In W3C mode: on
  • +
  • If parse_eol_attribute is on, then the end-of-line handling is done for attribute values. +
    Default value: on +
    In W3C mode: on
  • +
  • If parse_eol_cdata is on, then the end-of-line handling is done for CDATA content.
    Default value: on
    In W3C mode: on
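The end-of-line rule shared by the three parse_eol_* flags can be sketched as follows. This is only an illustration of the conversion described above, not pugixml's own code: normalize_eol is a hypothetical helper, and it builds a new std::string rather than rewriting the buffer in place.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of pugixml: converts every 0x0d 0x0a pair
// and every standalone 0x0d to a single 0x0a.
std::string normalize_eol(const std::string& in)
{
    std::string out;
    out.reserve(in.size());

    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        if (in[i] == '\r')
        {
            out += '\n';
            // Skip the LF of a CR LF pair so the pair collapses to one LF.
            if (i + 1 < in.size() && in[i + 1] == '\n') ++i;
        }
        else
        {
            out += in[i];
        }
    }

    assert(out.size() <= in.size()); // normalization never grows the text
    return out;
}
```

Since the result is never longer than the input, the same conversion can presumably be done in place on the parser's buffer, which fits the note that parse() modifies the input string.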
  • @@ -333,7 +339,7 @@ You can access the current options of parser by options() method:
    
             unsigned int options() const;
             unsigned int options(unsigned int optmsk);
    -
    _Winnie C++ Colorizer
    + (the latter one returns previous options). These options are used when parse_noset flag set is passed to parse() functions (which is the default value of corresponding parameter).

    @@ -369,14 +375,14 @@ strings like 'ell_23_xref', 'cell_0_x' or 'cell_0a_x'.

    /// Access iterators for this node's collection of siblings. iterator siblings_begin() const; iterator siblings_end() const; -_Winnie C++ Colorizer +

    Functions, returning the iterators to walk through children/siblings/attributes. More on that in Iterators section.

    
             operator unspecified_bool_type() const;
    -
    _Winnie C++ Colorizer
    +

    This is a safe bool-like conversion operator. You can check node's validity (if (xml_node), if (!xml_node), if (node1 && node2 && !node3 && cond1 && ...) - you get the idea) with @@ -390,13 +396,13 @@ it. bool operator>(const xml_node& r) const; bool operator<=(const xml_node& r) const; bool operator>=(const xml_node& r) const; -_Winnie C++ Colorizer +

    Comparison operators

    
             bool empty() const;
    -
    _Winnie C++ Colorizer
    +

    if (node.empty()) is equivalent to if (!node)
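For readers unfamiliar with the safe-bool idiom, here is a rough sketch of how such a conversion operator is commonly written. The class node_handle and the exact definition of unspecified_bool_type are assumptions made for the example; the real xml_node may define them differently.

```cpp
#include <cassert>

// Hypothetical handle type illustrating the safe-bool idiom.
class node_handle
{
    const int* data_; // stands in for the pointer to the real node storage

    // A pointer-to-member type converts to bool in conditions, but it does
    // not support arithmetic, so "handle + 1" does not compile.
    typedef const int* node_handle::*unspecified_bool_type;

public:
    node_handle(): data_(0) {}
    explicit node_handle(const int* data): data_(data) {}

    // Non-null pointer-to-member for a valid handle, null otherwise.
    operator unspecified_bool_type() const
    {
        return data_ ? &node_handle::data_ : 0;
    }

    bool empty() const { return data_ == 0; }

    // Precondition: the handle is valid.
    int value() const { assert(data_); return *data_; }
};
```

With this in place, if (handle), if (!handle) and handle1 && !handle2 all work, and handle.empty() gives the same answer as !handle.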

    @@ -404,7 +410,7 @@ it. xml_node_type type() const; const char* name() const; const char* value() const; -_Winnie C++ Colorizer +

    Access node's properties (type, name and value). If there is no name/value, the corresponding functions return "" - they never return NULL.

    @@ -412,7 +418,7 @@ return "" - they never return NULL.

    
             xml_node child(const char* name) const;
             xml_node child_w(const char* name) const;
    -
    _Winnie C++ Colorizer
    +

    Get a child node with specified name, or xml_node() (this is an invalid node) if nothing is found

    @@ -420,7 +426,7 @@ found

    
             xml_attribute attribute(const char* name) const;
             xml_attribute attribute_w(const char* name) const;
    -
    _Winnie C++ Colorizer
    +

    Get an attribute with specified name, or xml_attribute() (this is an invalid attribute) if nothing is found

    @@ -428,7 +434,7 @@ nothing is found

    
             xml_node sibling(const char* name) const;
             xml_node sibling_w(const char* name) const;
    -
    _Winnie C++ Colorizer
    +

    Get a node's sibling with specified name, or xml_node() if nothing is found.
    node.sibling(name) is equivalent to node.parent().child(name).

    @@ -437,7 +443,7 @@ nothing is found

    xml_node next_sibling(const char* name) const; xml_node next_sibling_w(const char* name) const; xml_node next_sibling() const; -_Winnie C++ Colorizer +

    These functions get the next sibling, that is, one of the siblings of that node, that is to the right. next_sibling() just returns the right brother of the node (or xml_node()), @@ -447,29 +453,41 @@ the two other functions are searching for the sibling with the given name

    xml_node previous_sibling(const char* name) const; xml_node previous_sibling_w(const char* name) const; xml_node previous_sibling() const; -_Winnie C++ Colorizer +

    These functions do exactly the same as next_sibling ones, with the exception that they search for the left siblings.

    
             xml_node parent() const;
    -
    _Winnie C++ Colorizer
    +

    Get a parent node. The parent node for the root one (the document) is considered to be the document itself.

    
             const char* child_value() const;
    -
    _Winnie C++ Colorizer
    +

    Look for the first node of type node_pcdata or node_cdata among the children of the current node and return its contents (or "" if nothing is found)

    +
    
    +    const char* child_value(const char* name) const;
    +
    + +

    This is the convenient way of looking into child's child value - that is, node.child_value(name) is equivalent to node.child(name).child_value().

    + +
    
    +    const char* child_value_w(const char* name) const;
    +
    + +

    This is the convenient way of looking into child's child value - that is, node.child_value_w(name) is equivalent to node.child_w(name).child_value().
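The equivalence between child_value(name) and child(name).child_value() can be illustrated with a toy tree. Everything below (sketch_node, the free functions, the use of plain pointers instead of node handles) is a hypothetical stand-in written for this example, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum sketch_type { sketch_element, sketch_pcdata, sketch_cdata };

struct sketch_node
{
    sketch_type type;
    std::string name;   // used by elements
    std::string value;  // used by PCDATA/CDATA nodes
    std::vector<sketch_node> children;
};

// Returns the first child with the given name, or NULL where pugixml
// would return an invalid xml_node().
const sketch_node* child(const sketch_node* node, const char* name)
{
    if (!node) return NULL;
    for (std::size_t i = 0; i < node->children.size(); ++i)
        if (node->children[i].name == name) return &node->children[i];
    return NULL;
}

// Contents of the first PCDATA/CDATA child, or "" if there is none.
const char* child_value(const sketch_node* node)
{
    if (!node) return "";
    for (std::size_t i = 0; i < node->children.size(); ++i)
    {
        const sketch_node& c = node->children[i];
        if (c.type == sketch_pcdata || c.type == sketch_cdata)
            return c.value.c_str();
    }
    return "";
}

// The convenience form simply chains the two lookups.
const char* child_value(const sketch_node* node, const char* name)
{
    assert(name);
    return child_value(child(node, name));
}
```

Because a missing child yields an invalid node and an invalid node yields "", the chained call never needs an explicit check.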

    +
    
             xml_attribute first_attribute() const;
             xml_attribute last_attribute() const;
    -
    _Winnie C++ Colorizer
    +

    These functions get the first and last attributes of the node (or xml_attribute() if the node has no attributes).

    @@ -477,7 +495,7 @@ has no attributes).

    
             xml_node first_child() const;
             xml_node last_child() const;
    -
    _Winnie C++ Colorizer
    +

    These functions get the first and last children of the node (or xml_node() if the node has no children).

    @@ -485,7 +503,7 @@ no children).

    
             template <typename OutputIterator> void all_elements_by_name(const char* name, OutputIterator it) const;
             template <typename OutputIterator> void all_elements_by_name_w(const char* name, OutputIterator it) const;
    -
    _Winnie C++ Colorizer
    +

    Get all elements with the specified name in the subtree (depth-first search) and return them with the help of an output iterator (e.g. std::back_inserter)

    @@ -494,7 +512,7 @@ the help of output iterator (i.e. std::back_inserter)

    template <typename Predicate> xml_attribute find_attribute(Predicate pred) const; template <typename Predicate> xml_node find_child(Predicate pred) const; template <typename Predicate> xml_node find_element(Predicate pred) const; -_Winnie C++ Colorizer +

    Find attribute, child or a node in the subtree (find_element - depth-first search) with the help of the given predicate. Predicate should behave like a function which accepts an xml_node or @@ -514,7 +532,7 @@ or xml_attribute() is returned.

    xml_node first_element_by_attribute(const char* attr_name, const char* attr_value) const; xml_node first_element_by_attribute_w(const char* attr_name, const char* attr_value) const; -_Winnie C++ Colorizer +

    Find the first node (depth-first search), which corresponds to the given criteria (i.e. either has a matching name, or a matching value, or has an attribute with given name/value, or has an attribute @@ -522,20 +540,20 @@ and has a matching name). Note that _w versions treat all parameters as w
    
             xml_node first_node(xml_node_type type) const;
    -
    _Winnie C++ Colorizer
    +

    Return a first node (depth-first search) with a given type, or xml_node().

    
             std::string path(char delimiter = '/') const;
    -
    _Winnie C++ Colorizer
    +

    Get a path of the node (i.e. the string of names of the nodes on the path from the DOM tree root to the node, separated with delimiter (/ by default)).

    
             xml_node first_element_by_path(const char* path, char delimiter = '/') const;
    -
    _Winnie C++ Colorizer
    +

    Get the first element that has the following path. The path can be absolute (beginning with delimiter) or relative, '..' means 'up-level' (so if we are at the path mesh/fragment/geometry/stream, ../.. @@ -543,7 +561,7 @@ will lead us to mesh/fragment, and /mesh will lead us to mesh

    
             bool traverse(xml_tree_walker& walker) const;
    -
    _Winnie C++ Colorizer +

    Traverse the subtree (beginning with current node) with the walker, return the result. See Miscellaneous section for details.

    @@ -560,19 +578,19 @@ will lead us to mesh/fragment, and /mesh will lead us to meshbool operator>(const xml_attribute& r) const; bool operator<=(const xml_attribute& r) const; bool operator>=(const xml_attribute& r) const; -_Winnie C++ Colorizer +

    Comparison operators.

    
             operator unspecified_bool_type() const;
    -
    _Winnie C++ Colorizer
    +

    Safe bool conversion - like in xml_node, use this to check for validity.

    
             bool empty() const;
    -
    _Winnie C++ Colorizer
    +

    Like with xml_node, if (attr.empty()) is equivalent to if (!attr).

    @@ -580,7 +598,7 @@ will lead us to mesh/fragment, and /mesh will lead us to mesh
    
             xml_attribute next_attribute() const;
             xml_attribute previous_attribute() const;
    -
    _Winnie C++ Colorizer +

    Get the next/previous attribute of the node, that owns the current attribute. Return xml_attribute() if no such attribute is found.

    @@ -588,7 +606,7 @@ if no such attribute is found.

    
             const char* name() const;
             const char* value() const;
    -
    _Winnie C++ Colorizer
    +

    Get the name and value of the attribute. These methods never return NULL - they return "" instead.

    @@ -596,14 +614,14 @@ if no such attribute is found.

    int as_int() const; double as_double() const; float as_float() const; -_Winnie C++ Colorizer +

    Convert the value of an attribute to the desired type. If the conversion is not successful, return default value (0 for int, 0.0 for double, 0.0f for float). These functions rely on CRT functions ato*.
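Since the conversions rely on the ato* family, their corner cases follow from the CRT. A small sketch of that behaviour, using hypothetical wrappers text_as_int and text_as_double rather than the real xml_attribute methods:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical wrappers over the CRT calls the text mentions.
int text_as_int(const char* value)
{
    assert(value); // value() is documented to never return NULL
    return std::atoi(value);
}

double text_as_double(const char* value)
{
    assert(value);
    return std::atof(value);
}
```

Note that std::atoi stops at the first non-digit, so "12px" converts to 12, and that a failed conversion and a genuine "0" both yield 0. If the difference matters, check the string from value() before converting.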

    
             bool as_bool() const;
    -
    _Winnie C++ Colorizer
    +

    Convert the value of an attribute to bool. This method returns true if the first character of the value is '1', 't', 'T', 'y' or 'Y'. Otherwise it returns false.
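The rule for as_bool() is simple enough to write down directly. The function below is a hypothetical free-standing version of it, intended only to make the first-character check explicit.

```cpp
#include <cassert>

// Hypothetical helper mirroring the rule described above: only the first
// character of the value is examined.
bool value_as_bool(const char* value)
{
    assert(value); // value() is documented to never return NULL
    const char first = value[0];
    return first == '1' || first == 't' || first == 'T'
        || first == 'y' || first == 'Y';
}
```

So "1", "true" and "Yes" are true, while "", "0", "no" and also "on" are false, because 'o' is not in the accepted set.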

    @@ -629,7 +647,7 @@ do something like: xml_attribute last_attrib = *(--node.attributes_end()); ... } -_Winnie C++ Colorizer +

    @@ -642,7 +660,7 @@ do something like:
    
             virtual bool begin(const xml_node&);
             virtual bool end(const xml_node&);
    -
    _Winnie C++ Colorizer
    +

    These functions are called when the processing of the node starts/ends. First begin() is called, then all children of the node are processed recursively, then end() is called. If @@ -652,14 +670,14 @@ returns false.

    
             virtual void push();
             virtual void pop();
    -
    _Winnie C++ Colorizer
    +

    These functions are called before and after the processing of node's children. If node has no children, none of these is called. The default behavior is to increment/decrement current node depth.

    
             virtual int depth() const;
    -
    _Winnie C++ Colorizer
    +

    Get the current depth. You can use this function to do your own indentation, for example.
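The order in which begin(), push(), pop() and end() are called can be sketched on a toy tree. The types toy_node, toy_walker and outline_walker are hypothetical stand-ins for xml_node and xml_tree_walker, and stopping the traversal as soon as a callback returns false is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct toy_node
{
    std::string name;
    std::vector<toy_node> children;
};

class toy_walker
{
    int depth_;

public:
    toy_walker(): depth_(0) {}
    virtual ~toy_walker() {}

    virtual bool begin(const toy_node&) { return true; }
    virtual bool end(const toy_node&) { return true; }

    // Default behavior: track the current depth.
    virtual void push() { ++depth_; }
    virtual void pop() { --depth_; }

    virtual int depth() const { return depth_; }
};

// begin(), then the children (wrapped in push()/pop() only if there are
// any), then end().
bool traverse(const toy_node& node, toy_walker& walker)
{
    if (!walker.begin(node)) return false;

    if (!node.children.empty())
    {
        walker.push();
        for (std::size_t i = 0; i < node.children.size(); ++i)
            if (!traverse(node.children[i], walker)) return false;
        walker.pop();
    }

    return walker.end(node);
}

// Example walker: builds an indented outline using depth().
class outline_walker: public toy_walker
{
public:
    std::string text;

    virtual bool begin(const toy_node& node)
    {
        assert(depth() >= 0);
        text += std::string(static_cast<std::size_t>(depth()) * 2, ' ');
        text += node.name + "\n";
        return true;
    }
};
```

For a tree mesh > fragment > geometry, outline_walker produces one name per line, indented by two spaces per level.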

    @@ -667,7 +685,7 @@ none of these is called. The default behavior is to increment/decrement current
    
             bool value = node.child("stream").attribute("compress").as_bool();
    -
    _Winnie C++ Colorizer
    + If node has a child with the name 'stream', and this child has an attribute 'compress', then everything is ok. If node has a child with the name 'stream' with no attribute 'compress', then attribute("compress") @@ -768,7 +786,7 @@ it (name, value, attributes list, nearby nodes in a tree - siblings, parent and } } } -
    _Winnie C++ Colorizer +

    We can also write a class that will traverse the DOM tree and store the information from nodes based on their names, depths, attributes, etc. This way is well known by the users of SAX parsers. To do that, @@ -830,7 +848,7 @@ we have to write an implementation of xml_tree_walker interface

    if (!parser.document().traverse(mp)) // generate an error } -_Winnie C++ Colorizer +
    @@ -921,7 +939,7 @@ parsers already.

    FAQ

    -

    I'm always open for questions; feel free to write them to zeux@mathcentre.com. +

    I'm always open for questions; feel free to write them to arseny.kapoulkine@gmail.com.


    @@ -929,12 +947,15 @@ parsers already.

    Bugs

    -

    I'm always open for bug reports; feel free to write them to zeux@mathcentre.com. +

    I'm always open for bug reports; feel free to write them to arseny.kapoulkine@gmail.com. Please provide as much information as possible - version of pugixml, compiling and OS environment (compiler and its version, STL version, OS version, etc.), the description of the situation in which the bug arises, the code and data files that show the bug, etc. - the more, the better. Though, please, do not send executable files.

    +

    Note, that you can also submit bug reports/suggestions at +project page. +


    @@ -952,7 +973,7 @@ if necessary) changes)
  • Externally provided entity reference table (or perhaps even taken from DOCTYPE?)
  • More intelligent parsing of DOCTYPE (it does not always skip DOCTYPE for now) -
  • XML 1.1 changes (changed EOL handling, normalization issues, +
  • XML 1.1 changes (changed EOL handling, normalization issues, etc.)
  • XPath support
  • Name your own? @@ -966,6 +987,15 @@ changes)
    15.07.2006 - v0.1
    First private release for testing purposes +
    6.11.2006 - v0.2 +
    First public release. Changes:
      +
    • Introduced child_value(name) and child_value_w(name) +
    • Fixed child_value() (for empty nodes) +
    • Fixed xml_parser_impl warning at W4 +
    • parse_eol_pcdata and parse_eol_attribute flags + parse_minimal optimizations +
    • Optimizations of strconv_t +
    +
    @@ -983,11 +1013,36 @@ changes)

    License

    -

    The pugixml parser is released into the public domain (though this may change).

    +

    The pugixml parser is distributed under the MIT license:

    + +
    +Copyright (c) 2006 Arseny Kapoulkine
    +
    +Permission is hereby granted, free of charge, to any person
    +obtaining a copy of this software and associated documentation
    +files (the "Software"), to deal in the Software without
    +restriction, including without limitation the rights to use,
    +copy, modify, merge, publish, distribute, sublicense, and/or sell
    +copies of the Software, and to permit persons to whom the
    +Software is furnished to do so, subject to the following
    +conditions:
    +
    +The above copyright notice and this permission notice shall be
    +included in all copies or substantial portions of the Software.
    +
    +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
    +OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
    +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
    +OTHER DEALINGS IN THE SOFTWARE.
    +

    -

    Revised 15 July, 2006

    -

    © Copyright Zeux 2006. All Rights Reserved.

    +

    Revised 6 November, 2006

    +

    © Copyright Arseny Kapoulkine 2006. All Rights Reserved.

    -- cgit v1.2.3