author | Arseny Kapoulkine <arseny.kapoulkine@gmail.com> | 2017-08-29 20:46:30 -0700 |
---|---|---|
committer | Arseny Kapoulkine <arseny.kapoulkine@gmail.com> | 2017-08-29 20:46:30 -0700 |
commit | 900a1cc94353b9202dcaee66b95d67e31331940e | |
tree | ef31da1cfd0ca8175dca90aaf2cd1fbe2b7c861a /docs/manual.adoc | |
parent | 4f2ad720c867f29f3156b953eadfe9be5efb511a |
docs: Clarify Unicode validation behavior
It has always been the case that pugixml does not perform Unicode
validation or name/tag Unicode character class validation, but it wasn't
very obvious from documentation.
Fixes #162
Diffstat (limited to 'docs/manual.adoc')
-rw-r--r-- | docs/manual.adoc | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/docs/manual.adoc b/docs/manual.adoc
index 7f4fc8b..b901a54 100644
--- a/docs/manual.adoc
+++ b/docs/manual.adoc
@@ -811,12 +811,13 @@ There is only one non-conformant behavior when dealing with valid XML documents:
 As for rejecting invalid XML documents, there are a number of incompatibilities with W3C specification, including:
 
 * Multiple attributes of the same node can have equal names.
-* All non-ASCII characters are treated in the same way as symbols of English alphabet, so some invalid tag names are not rejected.
+* Tag and attribute names are not fully validated for consisting of allowed characters, so some invalid tags are not rejected
 * Attribute values which contain `<` are not rejected.
 * Invalid entity/character references are not rejected and are instead left as is.
 * Comment values can contain `--`.
 * XML data is not required to begin with document declaration; additionally, document declaration can appear after comments and other nodes.
 * Invalid document type declarations are silently ignored in some cases.
+* Unicode validation is not performed so invalid UTF sequences are not rejected.
 
 [[access]]
 == Accessing document data
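
Since the manual now states that invalid UTF sequences are not rejected, a caller that needs well-formed input has to check it before parsing. The following is a minimal sketch of such a check, assuming the input is UTF-8 held in a `std::string`. The helper name `is_valid_utf8` is hypothetical and is not part of pugixml; it rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (NOT a pugixml API): returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char lead = p[i];
        std::size_t len = 0;
        unsigned int cp = 0;

        if (lead < 0x80) { ++i; continue; }                           // ASCII
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;                  // stray continuation or invalid lead byte

        if (n - i < len) return false;      // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings (code point would fit in a shorter form).
        if (len == 2 && cp < 0x80) return false;
        if (len == 3 && cp < 0x800) return false;
        if (len == 4 && cp < 0x10000) return false;

        // Reject UTF-16 surrogates and values beyond the Unicode range.
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;

        i += len;
    }
    return true;
}
```

A caller could run this over the raw buffer and refuse to parse when it returns `false`. It does not cover the other bullets in the list, such as name character classes or duplicate attribute names.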