docs: Clarify Unicode validation behavior

pugixml has never performed Unicode validation or validated the Unicode character classes of tag and attribute names, but this was not obvious from the documentation. Fixes #162
author: Arseny Kapoulkine <arseny.kapoulkine@gmail.com> 2017-08-29 20:46:30 -0700
committer: Arseny Kapoulkine <arseny.kapoulkine@gmail.com> 2017-08-29 20:46:30 -0700
commit: 900a1cc94353b9202dcaee66b95d67e31331940e (patch)
tree: ef31da1cfd0ca8175dca90aaf2cd1fbe2b7c861a /docs
parent: 4f2ad720c867f29f3156b953eadfe9be5efb511a (diff)
2 files changed, 7 insertions, 3 deletions
diff --git a/docs/manual.adoc b/docs/manual.adoc
index 7f4fc8b..b901a54 100644
--- a/docs/manual.adoc
+++ b/docs/manual.adoc
@@ -811,12 +811,13 @@ There is only one non-conformant behavior when dealing with valid XML documents:
 As for rejecting invalid XML documents, there are a number of incompatibilities with W3C specification, including:
 
 * Multiple attributes of the same node can have equal names.
-* All non-ASCII characters are treated in the same way as symbols of English alphabet, so some invalid tag names are not rejected.
+* Tag and attribute names are not fully validated for consisting of allowed characters, so some invalid tags are not rejected
 * Attribute values which contain `<` are not rejected.
 * Invalid entity/character references are not rejected and are instead left as is.
 * Comment values can contain `--`.
 * XML data is not required to begin with document declaration; additionally, document declaration can appear after comments and other nodes.
 * Invalid document type declarations are silently ignored in some cases.
+* Unicode validation is not performed so invalid UTF sequences are not rejected.
 
 [[access]]
 == Accessing document data
diff --git a/docs/manual.html b/docs/manual.html
index 627f570..1bed481 100644
--- a/docs/manual.html
+++ b/docs/manual.html
@@ -1941,7 +1941,7 @@ The current behavior for Unicode conversion is to skip all invalid UTF sequences
 <p>Multiple attributes of the same node can have equal names.</p>
 </li>
 <li>
-<p>All non-ASCII characters are treated in the same way as symbols of English alphabet, so some invalid tag names are not rejected.</p>
+<p>Tag and attribute names are not fully validated for consisting of allowed characters, so some invalid tags are not rejected</p>
 </li>
 <li>
 <p>Attribute values which contain <code>&lt;</code> are not rejected.</p>
@@ -1958,6 +1958,9 @@ The current behavior for Unicode conversion is to skip all invalid UTF sequences
 <li>
 <p>Invalid document type declarations are silently ignored in some cases.</p>
 </li>
+<li>
+<p>Unicode validation is not performed so invalid UTF sequences are not rejected.</p>
+</li>
 </ul>
 </div>
 </div>
@@ -5672,7 +5675,7 @@ If exceptions are disabled, then in the event of parsing failure the query is in
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2017-08-21 08:46:53 DST
+Last updated 2017-08-29 20:45:58 DST
 </div>
 </div>
 </body>