WIP: Replace parameter tree parser with a new parser
It comes with two test suites: the old suite, available via `parametertreetest`, which tests general parameter tree functionality, and a new test suite, available via `iniparsertest`, which is entirely dedicated to parsing.
The language that the parser accepts is:
- Each line either contains no content, a scope, or an assignment (not both). A line ends at the first newline character. There is one exception: An assignment with a quoted string may contain multiple newlines.
- Each of the above three may be followed by a comment, which starts with a `#`. It may not be followed by anything else.
- A scope is made up of the character `[`, a prefix, and the character `]`. The prefix is a (potentially zero-length) string made up of `[a-zA-Z0-9._+- \t]` (that's what the parser sees; it's easy to restrict that further, but that can happen outside of the parser). Each of the three aforementioned tokens may be preceded or followed by an arbitrary amount of whitespace, i.e. `[ \t]`. Leading and trailing whitespace are stripped from the prefix.
- An assignment is made up of a key (a non-empty string made up of `[a-zA-Z0-9._+- \t]`; again, it would be easy to restrict this further from outside the parser), the character `=`, and a right-hand side. Each of these three tokens may be preceded or followed by an arbitrary amount of whitespace. Leading and trailing whitespace are stripped from the key. The right-hand side can be any of the following three: a simple string, a single-quoted string, or a double-quoted string.
- A simple string is a (potentially zero-length) sequence of characters other than `[#'"\\]`. Leading and trailing whitespace are ignored.
- A single-quoted string starts and ends with a single quote: `'`. Inside, any character is allowed and treated as literal, except for two characters: `\` and `'`. A single `'` may not appear: it needs to be escaped like this: `\'`. A single `\` may not appear: it needs to be escaped like this: `\\`. The special sequence `\n` is treated as a newline character. Escaping any other character by preceding it with a backslash is an error. In addition to the special sequence `\n`, a single-quoted string may contain actual newline characters. They are preserved.
- Double-quoted strings behave exactly like single-quoted strings with `"` in place of `'`. Thus, inside such a string, `"` needs to be escaped but `'` must not be.
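The escaping rules for quoted strings can be illustrated with a small C++ sketch. It is not code from this merge request: the function name `unescapeQuoted` and the choice of `std::invalid_argument` for errors are assumptions made for illustration, and the function only handles the text between the two quote characters.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the merge request): decode the body of a
// quoted string, i.e. the text between the opening and closing quote.
// `quote` is '\'' for single-quoted and '"' for double-quoted strings.
std::string unescapeQuoted(const std::string& body, char quote)
{
  std::string result;
  for (std::string::size_type i = 0; i < body.size(); ++i) {
    const char c = body[i];
    if (c == quote)
      throw std::invalid_argument("unescaped quote inside quoted string");
    if (c != '\\') {
      result += c;  // literal character; real newlines are preserved
      continue;
    }
    // A backslash must be followed by the quote, a backslash, or 'n'.
    if (++i == body.size())
      throw std::invalid_argument("dangling backslash");
    const char next = body[i];
    if (next == quote || next == '\\')
      result += next;
    else if (next == 'n')
      result += '\n';
    else
      throw std::invalid_argument("invalid escape sequence");
  }
  return result;
}
```

Under these assumptions, the body `a\'b` of a single-quoted string decodes to `a'b`, while the same body inside a double-quoted string is rejected, because there `'` is an ordinary character and escaping it is an error.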
The main deviations from the current parser that I see are thus:
- You can now put comments everywhere. The `#` character is treated properly inside and outside of strings.
- Multiline strings now reliably preserve newlines.
- The selection of admissible characters for a prefix is much narrower, maybe too narrow.
- The selection of admissible characters for a key is much narrower, maybe too narrow.
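To make the narrowness concrete, here is a rough C++ sketch of the character class `[a-zA-Z0-9._+- \t]` together with the trimming described for keys. The helper names `isIdentifierChar`, `trim`, and `normalizeKey` are hypothetical and do not come from the merge request; the actual parser may organise this differently.

```cpp
#include <string>

// Hypothetical helper: true if `c` belongs to [a-zA-Z0-9._+- \t].
bool isIdentifierChar(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') ||
         std::string("._+- \t").find(c) != std::string::npos;
}

// Strip leading and trailing spaces and tabs.
std::string trim(const std::string& s)
{
  const std::string::size_type first = s.find_first_not_of(" \t");
  if (first == std::string::npos)
    return "";
  const std::string::size_type last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Hypothetical helper: check a raw key against the whitelist and store the
// trimmed key in `key`. Returns false for a forbidden character or for a key
// that is empty after trimming (keys must be non-empty).
bool normalizeKey(const std::string& raw, std::string& key)
{
  for (char c : raw)
    if (!isIdentifierChar(c))
      return false;
  key = trim(raw);
  return !key.empty();
}
```

With this sketch, a raw key such as `  max iterations` followed by a tab is accepted and trimmed to `max iterations`, whereas `grid/size` is rejected because `/` is not in the character class. Loosening the rule would only mean extending the list of characters in `isIdentifierChar`.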
Activity
mentioned in issue #54 (closed)
added 2 commits
mentioned in merge request !248 (closed)
added 8 commits
added 1 commit
- f8484a95 - Make whitespace part of the identifierWhitelist
added 2 commits
added 2 commits
added 21 commits
- e0646bde - Import a new ini parser
- e24b12d1 - Handle empty prefix
- 6168f634 - Fix error messages
- 403a3a59 - Clean up test
- e16e9d3f - Forbid a few evil characters in simple strings
- 8a47e99d - Cleanup
- 6cffd0df - Use *Whitelist/Blacklist names
- a77127cf - Add comments to tests
- 750d1d4c - Allow [+-] in names
- 0ce62b5c - Allow = in simple strings
- 618d133b - Add a comment
- db3e4e83 - Make tests emit diagnostic output
- 6bbb33fb - Allow whitespace in unquoted strings
- 04454da9 - Permit whitespace in prefixes
- 85ce7fca - Permit whitespace in keys
- 20e5bc6c - Replace unbounded lookahead by trimming (1/2)
- c3c0653a - Make whitespace part of the identifierWhitelist
- 6b30d182 - Replace unbounded lookahead by trimming (2/2)
- a8332f49 - Cleanup
- c2c22b4e - Bug fix: Detect empty key
- a1ad3a89 - Fully replace old parameter tree parser
Alright, here is what I think:
- This parser is ugly and hard to understand. But that seems to be the common theme among all hand-written parsers I have seen so far, so it does not really count as a negative point against this parser.
- On the plus side, it does seem to support most of the things needed, and the syntax is pretty consistent.
What I haven't found time to do: look at the test cases and check that they cover a reasonable amount of the syntax, in particular the syntax errors that should produce exceptions. As I said, the parser is ugly, so this is of particular importance. @core Since I'm busy preparing the mailing lists for the move to Münster at the moment, maybe someone else can have a look?
- @joe What's wrong with asking Elias to do this? Or are you asking for someone to cooperate with him?