pugixml.git - Mirror for https://github.com/zeux/pugixml

author	Arseny Kapoulkine <arseny.kapoulkine@gmail.com>	2015-03-18 08:34:23 -0700
committer	Arseny Kapoulkine <arseny.kapoulkine@gmail.com>	2015-03-18 09:59:17 -0700
commit	5f996eba6deaa804bf4caced8acc65d8626720d6 (patch)
tree	6f950e655956c17b657f1239ab0a9f655bf83c87 /contrib
parent	51da129b50a0b99ee85af20cc4a4b77f6bc823ff (diff)

Do not emit surrounding whitespace for text nodes

Previously we omitted extra whitespace for single PCDATA/CDATA children, but in mixed content there was extra indentation before/after text nodes.

One of the problems with that is that the text that you saved is not exactly the same as the parsing result using default flags (parse_trim_pcdata helps).

Another problem is that parse-format cycles do not have a fixed point for mixed content - the result expands indefinitely. Some XML libraries, like Python minidom, have the same issue, but this is definitely a problem.

Pretty-printing mixed content is hard. It seems that the only other sensible choice is to switch mixed content nodes to raw formatting. In a way the code in this change is a weaker version of that - it removes indentation around text nodes but still keeps it around element siblings/children. Thus we can switch to mixed-raw formatting at some point later, which will be a superset of the current behavior.

To do this we have to either switch at the first text node (.NET XmlDocument does that), or scan the children of each element for a possible text node and switch before we output the first child. The former behavior seems non-intuitive (and a bit broken); unfortunately, the latter behavior can cost up to 20% of the output time for trees *without* mixed content.

Fixes #13.
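A rough sketch of the rule described above, written against a toy tree rather than pugixml's own types: newline and indentation are emitted before an element or a closing tag only when the previously written child was not a text node, and text nodes themselves are written verbatim. The names Node, write_element and format_tree are hypothetical and exist only for this illustration; this is not the code from the change itself.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal tree for illustration; not pugixml's xml_node.
struct Node {
    bool is_text;               // true: text node, false: element
    std::string value;          // text content, or tag name for elements
    std::vector<Node> children; // always empty for text nodes
};

// Assumed rule: whitespace is added only between element siblings/children,
// never directly before or after a text node.
void write_element(const Node& el, int depth, std::string& out) {
    if (el.children.empty()) {
        out += "<" + el.value + "/>";
        return;
    }

    out += "<" + el.value + ">";

    bool after_text = false; // was the last written child a text node?
    for (const Node& child : el.children) {
        if (child.is_text) {
            // Text is written as-is, with no indentation around it.
            out += child.value;
            after_text = true;
        } else {
            if (!after_text) {
                out += "\n";
                out.append((depth + 1) * 2, ' ');
            }
            write_element(child, depth + 1, out);
            after_text = false;
        }
    }

    // The closing tag goes on its own line only if the last child was an element.
    if (!after_text) {
        out += "\n";
        out.append(depth * 2, ' ');
    }
    out += "</" + el.value + ">";
}

std::string format_tree(const Node& root) {
    std::string out;
    write_element(root, 0, out);
    return out;
}
```

For an element a with children b (containing the text x), the text t, and the empty elements c and d, this sketch produces b on its own indented line, writes t and c directly after it with no whitespace, and puts d and the closing tag on their own lines. Because no whitespace is ever added next to a text node, the text content is not altered by formatting, which is the property the commit message is after.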

Diffstat (limited to 'contrib')

0 files changed, 0 insertions, 0 deletions

