Parser Combinator Experiments in Rust - Part 3: Performance and impl Trait

07 Sep 2015

Performance update

Geoffroy Couprie, the creator of Nom, suggested to me earlier that I should replace the is_token implementation from the previous performance tests since it is pretty slow. The previous version was a pretty simple one-liner with some significant overhead since it was written to be easy to read and to be comparable to the notInClass function from the Haskell parsers in terms of readability:

fn is_token(c: u8) -> bool {
    c < 128 && c > 31 && b"()<>@,;:\\\"/[]?={} \t".iter()
                           .position(|&i| i == c).is_none()
}

The version above is performing a naive linear search, which is far from the most efficient method of determining if a character is in the given set. Attoparsec actually creates a binary search tree, which is much faster. Here is the optimized version Geoffroy suggested:

fn is_token(c: u8) -> bool {
    // roughly follows the order of ascii chars: "\"(),/:;<=>?@[\\]{} \t"
    c < 128 && c > 32 && c != b'\t' && c != b'"' && c != b'(' && c != b')' &&
        c != b',' && c != b'/' && !(c > 57 && c < 65) && !(c > 90 && c < 94) &&
        c != b'{' && c != b'}'
}

This is of course much harder to read but gives a decent performance boost. Since it affects all Rust parser-combinators I have replaced their is_token implementations with the optimized version from above:

Parser	Time, 21 kB	Time, 2 MB	Time, 204 MB
C http-parser	0.003 s	0.009 s	0.62 s
Experiment¹	0.003 s	0.009 s	0.71 s
Nom	0.003 s	0.013 s	1.01 s
Attoparsec	0.004 s	0.021 s	1.45 s
Combine²	0.004 s	0.067 s	6.10 s
Parsec³	0.009 s	0.490 s	47.75 s

1: Library code from last post, with verbose errors enabled.

2: Combine is now stable. Sadly it does not include the Ranged Stream pull-request which was used in the Combine-pre benchmark in the last post, this means that the parser tested here is not a zero-copy parser and should be compared with the Combine-beta from the last post.

3: Uses the exact same parser-code as the Attoparsec benchmark.

Experimental `impl Trait`

There is an experimental branch maintained by eddyb which implements the now closed RFC PR #105. This implementation allows for unboxed abstract return types through the use of the impl keyword in a function return type.

The interesting thing with this in the context of monadic parser-combinator is that it is suddenly no longer required to box closures whenever any are returned. This avoids a lot of allocations as well as enables for more optimizations at the same time it makes it somewhat easier to manage the types.

Monadic parser combinator

The basics of a monadic parser-combinator is to use the combinator functions to create new functions which are to be executed later once a parsing context exists (ie. some input to parse). This means that the monadic type Parser<T> in simplified terms is Fn(&[u8]) -> Option<(T, &[u8])>, where u8 is the input type, and the combinators themselves as well as the functions which create parsers in the Parser<T> monad are of the type Fn(...) -> (Fn(&[u8]) -> Option<(T, &[u8])>).

In the manual threading-of-state the input is present from the start and the functions can immediately operate on the input: Fn(&[u8], ...) -> Option<(T, &[u8])>. This is not as composable as the purely monadic parser-combinator since the state always needs to be considered during combination and not only when building the combinators and primitive parsers.

The unboxed `Parser<T>`

The boxed example from the first article of this series is using the (simplified) type Box<FnBox(&[u8]) -> Option<(T, &[u8])>> which is heap-allocated. But with the experimental branch we can use the impl Trait notation to make this closure stack-allocated: impl Fn(&[u8]) -> Option<(T, &[u8])>, of course this is not something we want to copy all over the place, so we implement a trait which is implemented for Fn(&[u8]) -> Option<(T, &[u8])>:

pub trait Parser<'a, T> {
    fn parse(self, &'a [u8]) -> Option<(T, &'a [u8])>;
}

impl<'a, T> Parser<'a, T> for F
  where F: FnOnce(&'a [u8]) -> Option<(T, &'a [u8])> {
    fn parse(self, i: &'a [u8]) -> Option<(T, &'a [u8])> {
        self(i)
    }
}

This is a very simplified signature which does not support any error handling or incomplete input, or different input types. The 'a lifetime is exposed through the Parser<'a, T> type to allow zero-copy parsers to return Parser<'a, &'a [u8]>, typically this 'a will be free unless constructions like this are needed.

The above allows us to write:

struct MyData<'a> {
    name:      &'a [u8],
    last_name: &'a [u8],
}

fn my_parser<'a>() -> impl Parser<'a, MyStruct<'a>> {
    bind(take_till(|c| c == b' '), |name|
        bind(char(b' '), |_|
            bind(take_till(|c| == b'\n'), |last_name|
                MyData{
                    name:      name,
                    last_name: last_name,
                })))
}

fn main() {
    let buf = b"Martin Wernstål\n";

    let parser = my_parser();

    println!("{:?}", parser.parse(buf));
    // MyData{name: &b"Martin", last_name: &b"Wernstål"}
}

Performance

After making some very suspect (and test breaking) changes to eddyb’s code I finally got the code to compile. This also includes avoiding any kind of linking since metadata is broken for impl Trait at the moment of writing. (Diff: Fixed a probable typo, added monomorphize at some spots and finally removed the check for has_escaping_regions specifically for impl Trait syntax.)

The same files were used as in the last test:

Parser	Time, 21 kB	Time, 2 MB	Time, 204 MB
[slow `is_token`]	0.003 s	0.014 s	1.16 s
fast `is_token`	0.003 s	0.010 s	0.80 s

This is actually a very competitive result, seeing as the hand-written C-parser clocks in at 0.6 seconds, and neither the parser combinator nor the specific modifications to rustc have been optimized. The code is used as is and only --release was supplied to cargo when compiling the binary. To put it in perspective; it is only 30% slower than the hand-written C-parser which uses switch-case-goto and 15% slower than the manual-threading of state variant. But it is 25% faster than Nom and 80% faster compared to Attoparsec.

These are really promising results and I am hoping that Rust can land this feature in the near future, unboxed abstract returns and higher kinded types would be amazing tools to have in an already great language.

EDIT: Published on reddit: /r/rust