WIP: Replace parameter tree parser with a new parser

Elias Pipping requested to merge feature/new-ini-parser into master

The new parser comes with two test suites: the old suite, which is available via

  parametertreetest

and tests general parameter tree functionality, as well as a new test suite available via

  iniparsertest

which is entirely dedicated to parsing.

The language that the parser accepts is:

  • Each line contains either no content, a scope, or an assignment (never more than one of these). A line ends at the first newline character. There is one exception: an assignment whose right-hand side is a quoted string may span multiple lines.
  • Each of these three may be followed by a comment, which starts with a # and runs to the end of the line. Nothing else may follow.
  • A scope is made up of the character [, a prefix, and the character ]. The prefix is a (potentially zero-length) string made up of [a-zA-Z0-9._+- \t]. (That is what the parser sees; it would be easy to restrict this further, but that can happen outside of the parser.) Each of the three aforementioned tokens may be preceded or followed by an arbitrary amount of whitespace, i.e. [ \t]. Leading and trailing whitespace is stripped from the prefix.
  • An assignment is made up of a key (a non-empty string made up of [a-zA-Z0-9._+- \t]; again, it would be easy to restrict this further from outside the parser), the character =, and a right-hand side. Each of these three tokens may be preceded or followed by an arbitrary amount of whitespace. Leading and trailing whitespace is stripped from the key. The right-hand side can be one of the following three: a simple string, a single-quoted string, or a double-quoted string.
  • A simple string is a (potentially zero-length) sequence of characters other than those in [#'"\\]. Leading and trailing whitespace is ignored.
  • A single-quoted string starts and ends with a single quote: '. Inside, any character is allowed and treated as literal, except for two characters: \ and '. An unescaped ' may not appear; it needs to be escaped like this: \'. An unescaped \ may not appear either; it needs to be escaped like this: \\. The special sequence \n is treated as a newline character. Escaping any other character by preceding it with a backslash is an error. In addition to the special sequence \n, a single-quoted string may contain actual newline characters. They are preserved.
  • Double-quoted strings behave exactly like single-quoted strings with " in place of '. Thus, inside such a string, " needs to be escaped, whereas ' must not be escaped (writing \' there is an error).
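
To make the rules above easier to check against concrete inputs, here is a minimal sketch of them in Python. It is not the parser from this merge request, only an illustration. The names parse_ini, parse_quoted and parse_rhs are hypothetical, and returning a dict keyed by (prefix, key) pairs is an assumption made for the example, since the description does not say how prefixes and keys are combined. The sketch also assumes that a key consisting only of whitespace counts as empty.

```python
import re

# Character class for prefixes and keys, copied from the description above.
NAME = r"[a-zA-Z0-9._+\- \t]"
SCOPE_RE = re.compile(r"[ \t]*\[(" + NAME + r"*)\]")
KEY_RE = re.compile(r"(" + NAME + r"*)=[ \t]*")
# A simple string: anything except #, ', ", backslash, and newline.
SIMPLE_RE = re.compile(r"[^#'\"\\\n]*")
# Optional whitespace, optional comment, then end of line or end of input.
TAIL_RE = re.compile(r"[ \t]*(?:#[^\n]*)?(?:\n|\Z)")


def parse_quoted(text, pos):
    """Parse a quoted string starting at text[pos]; return (value, new_pos)."""
    quote = text[pos]
    out = []
    pos += 1
    while pos < len(text):
        ch = text[pos]
        if ch == quote:
            return "".join(out), pos + 1
        if ch == "\\":
            nxt = text[pos + 1:pos + 2]
            if nxt == "n":
                out.append("\n")
            elif nxt in ("\\", quote):
                out.append(nxt)
            else:
                raise ValueError(f"invalid escape at offset {pos}")
            pos += 2
        else:
            # Literal character; actual newlines are preserved.
            out.append(ch)
            pos += 1
    raise ValueError("unterminated quoted string")


def parse_rhs(text, pos):
    """Parse a right-hand side: quoted string or simple string."""
    if pos < len(text) and text[pos] in "'\"":
        return parse_quoted(text, pos)
    m = SIMPLE_RE.match(text, pos)
    return m.group(0).strip(), m.end()


def parse_ini(text):
    """Return a dict mapping (prefix, key) to the parsed value."""
    result = {}
    prefix = ""
    pos = 0
    while pos < len(text):
        m = SCOPE_RE.match(text, pos)
        if m:
            prefix = m.group(1).strip()
            pos = m.end()
        else:
            m = KEY_RE.match(text, pos)
            if m:
                key = m.group(1).strip()
                if not key:
                    raise ValueError(f"empty key at offset {pos}")
                value, pos = parse_rhs(text, m.end())
                result[(prefix, key)] = value
        # Whatever remains on the line must be whitespace or a comment.
        tail = TAIL_RE.match(text, pos)
        if tail is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = tail.end()
    return result
```

With this sketch, a line such as name = "a # b" # trailing yields the value a # b, because the # inside the quotes is literal and only the second one starts a comment. A line such as key = 'a\tb' raises an error, since \t is not one of the permitted escape sequences.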

The main deviations from the current parser that I see are thus:

  • You can now put comments everywhere. The # character is treated properly inside and outside of strings.
  • Multiline strings now reliably preserve newlines.
  • The selection of admissible characters for a prefix is much narrower, maybe too narrow.
  • The selection of admissible characters for a key is much narrower, maybe too narrow.
Edited by Christian Engwer