Crate chomp

Chomp is a fast monadic-style parser combinator library for the Rust programming language. It was written as the culmination of the experiments detailed in these blog posts:

Within its current capabilities, you will find that Chomp consistently performs as well as, if not better than, optimized C parsers, while being vastly more expressive. For an example that builds a performant HTTP parser out of smaller parsers, see http_parser.rs.

Example

#[macro_use]
extern crate chomp;

use chomp::prelude::*;

#[derive(Debug, Eq, PartialEq)]
struct Name<B: Buffer> {
    first: B,
    last:  B,
}

fn name<I: U8Input>(i: I) -> SimpleResult<I, Name<I::Buffer>> {
    parse!{i;
        let first = take_while1(|c| c != b' ');
                    token(b' ');  // skipping this char
        let last  = take_while1(|c| c != b'\n');

        ret Name{
            first: first,
            last:  last,
        }
    }
}

fn main() {
    assert_eq!(parse_only(name, "Martin Wernstål\n".as_bytes()), Ok(Name{
        first: &b"Martin"[..],
        last: "Wernstål".as_bytes()
    }));
}

Usage

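Chomp's functionality is split between three modules: parsers, which contains the basic parsers; combinators, which contains functions that take parsers and build new parsers from them; and primitives, which is used to construct new fundamental parsers and combinators.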

A parser is, at its simplest, a function that takes some input and returns a ParseResult<I, T, E>, where I, T, and E are the input, output, and error types, respectively. Parsers are usually parameterized over values or other parsers as well, so these appear as extra arguments in the parsing function. As an example, here is the signature of the token parser, which matches a particular token:

fn token<I: Input>(i: I, t: I::Token) -> ParseResult<I, I::Token, Error<I::Token>> { ... }

Notice that the first argument is the input i, of some type I implementing the Input trait, and the second argument is the token t of type I::Token to match. The input value represents the current state of the parser together with the remaining input, and prevents the parser writer from accidentally mutating the state of the parser. Later, when we introduce the parse! macro, we will see that using a parser in this macro just means supplying all of the arguments but the input, like so:

token(b'T');

Note that this shorthand only works inside the parse! macro; outside of it, the input must be passed explicitly, as in token(i, b'T'). SimpleResult<I, T> is a convenience type alias for a ParseResult that uses Error as its error type; for byte input (U8Input) this is ParseResult<I, T, Error<u8>>. Error is a "default" error type that is sufficient for most uses. For more sophisticated usage, you can always write a custom error type.
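To make the idea of threading the input concrete, here is a toy model in plain Rust, without chomp. It is a sketch under loud assumptions, not chomp's implementation: the input is a byte slice, and a hypothetical token function returns Some((remaining input, matched byte)) on success or None on failure, standing in for ParseResult.

```rust
// Toy model only (plain Rust, NOT chomp's types): a "parser" is a function
// that takes the input and returns Some((remaining input, value)) on success
// or None on failure. The caller passes the remaining input to the next parser.

// Hypothetical stand-in for `token`: matches exactly the byte `t`.
fn token(i: &[u8], t: u8) -> Option<(&[u8], u8)> {
    match i.split_first() {
        // The next byte is `t`: consume it and return it.
        Some((&c, rest)) if c == t => Some((rest, c)),
        // Mismatch or end of input.
        _ => None,
    }
}
```

Called outside any macro, the input is passed explicitly: token(b"True", b'T') yields the remaining input b"rue" together with b'T', while token(b"false", b'T') fails with None.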

A very useful parser is the satisfy parser:

fn satisfy<I: Input, F>(mut i: I, f: F) -> ParseResult<I, I::Token, Error<I::Token>>
  where F: FnOnce(I::Token) -> bool { ... }

Besides the input, satisfy takes a single parameter, a predicate function, and succeeds only if the next token of input satisfies that predicate. Here's an example that might be used in the parse! macro:

satisfy(|c| {
    match c {
        b'c' | b'h' | b'a' | b'r' => true,
        _ => false,
    }
})

This parser will only succeed if the character is one of the characters in "char".
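In the same toy plain-Rust model (a hypothetical stand-in, not chomp's satisfy), the predicate alone decides whether the next byte is consumed:

```rust
// Toy model (plain Rust, not chomp): a parser takes a byte slice and returns
// Some((remaining input, value)) on success or None on failure.

// Hypothetical stand-in for `satisfy`: consumes the next byte only if `f` accepts it.
fn satisfy<F>(i: &[u8], f: F) -> Option<(&[u8], u8)>
where
    F: FnOnce(u8) -> bool,
{
    // At end of input there is nothing to test, so fail.
    let (&c, rest) = i.split_first()?;
    if f(c) {
        Some((rest, c))
    } else {
        None
    }
}

// The predicate from the example above: accept any byte of "char".
fn is_char_byte(c: u8) -> bool {
    match c {
        b'c' | b'h' | b'a' | b'r' => true,
        _ => false,
    }
}
```

Here satisfy(b"hat", is_char_byte) consumes the h and leaves b"at", while satisfy(b"xyz", is_char_byte) fails.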

Lastly, here is the combinator count, which runs the parser p exactly num times on its input and collects the results into a T, which can be any type implementing FromIterator<U>:

pub fn count<I: Input, T, E, F, U>(i: I, num: usize, p: F) -> ParseResult<I, T, E>
  where F: FnMut(I) -> ParseResult<I, U, E>,
        T: FromIterator<U> { ... }
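A toy version of count in the same plain-Rust model, again a hypothetical sketch rather than chomp's code, shows how the remaining input is handed from one run to the next and how the FromIterator bound lets the caller pick the collection type:

```rust
use std::iter::FromIterator;

// Toy model (plain Rust, not chomp): a parser takes a byte slice and returns
// Some((remaining input, value)) on success or None on failure.

// Hypothetical stand-in for `count`: runs `p` exactly `num` times, feeding each
// run the input left over by the previous one, and collects the outputs into `T`.
fn count<'a, T, U, F>(mut i: &'a [u8], num: usize, mut p: F) -> Option<(&'a [u8], T)>
where
    F: FnMut(&'a [u8]) -> Option<(&'a [u8], U)>,
    T: FromIterator<U>,
{
    let mut items = Vec::with_capacity(num);
    for _ in 0..num {
        // `?` makes the whole combinator fail as soon as one run fails.
        let (rest, u) = p(i)?;
        items.push(u);
        i = rest;
    }
    Some((i, items.into_iter().collect()))
}

// A toy digit parser to use with `count`.
fn digit(i: &[u8]) -> Option<(&[u8], u8)> {
    let (&c, rest) = i.split_first()?;
    if c.is_ascii_digit() {
        Some((rest, c - b'0'))
    } else {
        None
    }
}
```

With a Vec<u8> as the target type, count(b"123x", 3, digit) collects [1, 2, 3] and leaves b"x" unconsumed; in this sketch a single failed run, as in count(b"12x", 3, digit), makes the whole combinator fail.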

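Parsers are almost always used through the parse! macro, which lets us do three distinct things: sequence parsers over the remaining input, store intermediate results in variables, and return a value at the end, which may be the result of any computation over the intermediate results.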

In other words, just as a normal Rust function usually looks something like this:

fn f() -> (u8, u8, u8) {
    let a = read_number();
    let b = read_number();
    launch_missiles();
    return (a, b, a + b);
}

A Chomp parser with a similar structure looks like this:

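#[macro_use]
extern crate chomp;

use chomp::prelude::*;

fn f<I: U8Input>(i: I) -> SimpleResult<I, (u8, u8, u8)> {
    parse!{i;
        let a = digit();             // like read_number()
        let b = digit();
                string(b"missiles"); // run for its effect only, like launch_missiles()
        ret (a, b, a + b)
    }
}

// Parses one ASCII digit and converts it to its numeric value.
fn digit<I: U8Input>(i: I) -> SimpleResult<I, u8> {
    satisfy(i, |c| b'0' <= c && c <= b'9').map(|c| c - b'0')
}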

Readers familiar with Haskell or F# will recognize this as a "monadic computation" or "computation expression".
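To see that monadic structure explicitly, here is the f parser written without any macro in the toy plain-Rust model. The bind and ret helpers are hypothetical and are not chomp's API: bind runs a parser and hands its output, together with the remaining input, to the rest of the computation, and ret succeeds with a value without consuming input.

```rust
// Toy model (plain Rust, not chomp): a parser takes a byte slice and returns
// Some((remaining input, value)) on success or None on failure.

// Parses one ASCII digit and converts it to its numeric value.
fn digit(i: &[u8]) -> Option<(&[u8], u8)> {
    let (&c, rest) = i.split_first()?;
    if c.is_ascii_digit() {
        Some((rest, c - b'0'))
    } else {
        None
    }
}

// Matches the literal `s` at the front of the input.
fn string<'a>(i: &'a [u8], s: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if i.starts_with(s) {
        let (matched, rest) = i.split_at(s.len());
        Some((rest, matched))
    } else {
        None
    }
}

// `ret`: succeed with `v` without consuming any input.
fn ret<T>(i: &[u8], v: T) -> Option<(&[u8], T)> {
    Some((i, v))
}

// `bind`: run `p`; if it succeeds, pass the remaining input and its output to `k`.
fn bind<'a, A, B, P, K>(i: &'a [u8], p: P, k: K) -> Option<(&'a [u8], B)>
where
    P: FnOnce(&'a [u8]) -> Option<(&'a [u8], A)>,
    K: FnOnce(&'a [u8], A) -> Option<(&'a [u8], B)>,
{
    let (rest, a) = p(i)?;
    k(rest, a)
}

// The `f` parser from above, desugared into nested binds.
fn f(i: &[u8]) -> Option<(&[u8], (u8, u8, u8))> {
    bind(i, digit, |i, a| {
        bind(i, digit, |i, b| {
            bind(i, |i| string(i, b"missiles"), |i, _| ret(i, (a, b, a + b)))
        })
    })
}
```

The nesting deepens with every step; do-notation, and parse! in Chomp, lets you write the same chain as a flat sequence.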

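You use the parse! macro as follows: write the input variable first, followed by a semicolon; then write any number of parser actions of the form parser(args); or identifier bindings of the form let name = parser(args);, in both cases omitting the input argument; finally, end with either a parser action or a return statement of the form ret expression, whose type becomes the output type of the parser.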

The entire grammar for the macro is listed elsewhere in this documentation.
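The rules above can be mimicked with a small macro_rules! macro over the toy plain-Rust parsers. The toy_parse! macro below is hypothetical and is not chomp's parse! implementation; it supports only let bindings, bare parser actions, and a final ret, and it threads the remaining input between steps automatically.

```rust
// Toy model (plain Rust, not chomp): a parser takes a byte slice and returns
// Some((remaining input, value)) on success or None on failure.

fn digit(i: &[u8]) -> Option<(&[u8], u8)> {
    let (&c, rest) = i.split_first()?;
    if c.is_ascii_digit() { Some((rest, c - b'0')) } else { None }
}

fn string<'a>(i: &'a [u8], s: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if i.starts_with(s) {
        let (matched, rest) = i.split_at(s.len());
        Some((rest, matched))
    } else {
        None
    }
}

// Hypothetical do-notation macro: prepends the current input to every parser
// call, rebinds the remaining input after each step, and returns None early
// (via `?`) as soon as a step fails.
macro_rules! toy_parse {
    // `ret e`: succeed with `e` on the current remaining input.
    ($i:expr; ret $e:expr) => {
        Some(($i, $e))
    };
    // `let v = p(args);`: run p, bind its output to v, continue on the rest.
    ($i:expr; let $v:ident = $f:ident($($arg:expr),*); $($tail:tt)*) => {{
        let (rest, $v) = $f($i $(, $arg)*)?;
        toy_parse!(rest; $($tail)*)
    }};
    // `p(args);`: run p for its effect only and discard its output.
    ($i:expr; $f:ident($($arg:expr),*); $($tail:tt)*) => {{
        let (rest, _) = $f($i $(, $arg)*)?;
        toy_parse!(rest; $($tail)*)
    }};
}

fn f(i: &[u8]) -> Option<(&[u8], (u8, u8, u8))> {
    toy_parse!{i;
        let a = digit();
        let b = digit();
        string(b"missiles");
        ret (a, b, a + b)
    }
}
```

Each step expands into a call with the current input prepended, followed by a rebinding of the remaining input, which is the "threading" the macro automates.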

Features

Modules

ascii

Utilities and parsers for dealing with ASCII data in u8 format.

buffer

Utilities for parsing streams of data.

combinators

Basic combinators.

parsers

Basic parsers.

prelude

Basic prelude.

primitives

Module used to construct fundamental parsers and combinators.

types

Types which facilitate the chaining of parsers and their results.

Macros

parse

Macro emulating do-notation for the parser monad, automatically threading the linear type.

parser

Macro wrapping an invocation to parse! in a closure, useful for creating parsers inline.

Functions

parse_only

Runs the given parser on the supplied finite input.

parse_only_str

Runs the given parser on the supplied string.

run_parser

Runs the supplied parser over the input.