Macro chomp::parse

macro_rules! parse {
    ( $($t:tt)* ) => { ... };
}

Macro emulating do-notation for the parser monad, automatically threading the linear type.

parse!{input;
                parser("parameter");
    let value = other_parser();

    ret do_something(value)
}
// is equivalent to:
parser(input, "parameter").bind(|i, _|
    other_parser(i).bind(|i, value|
        i.ret(do_something(value))))
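The following sketch writes a desugaring like the one above by hand in plain Rust, so you can see how the input is threaded through. It does not use chomp. `Parsed`, `bind`, `ret`, `token` and `ab` are hypothetical stand-ins that only mimic the shape of chomp's `bind` and `ret`. Each continuation receives the remaining input along with the value.

```rust
/// Toy parse state (hypothetical, not chomp's ParseResult):
/// the remaining input plus either a value or an error.
#[derive(Debug, PartialEq)]
pub struct Parsed<'a, T, E> {
    pub rest: &'a [u8],
    pub result: Result<T, E>,
}

impl<'a, T, E> Parsed<'a, T, E> {
    /// On success, hand the remaining input and the value to `f`;
    /// on error, short-circuit and keep the error.
    pub fn bind<U, F>(self, f: F) -> Parsed<'a, U, E>
    where
        F: FnOnce(&'a [u8], T) -> Parsed<'a, U, E>,
    {
        match self.result {
            Ok(value) => f(self.rest, value),
            Err(e) => Parsed { rest: self.rest, result: Err(e) },
        }
    }
}

/// Lift a plain value into the toy parser without consuming input.
pub fn ret<'a, T, E>(rest: &'a [u8], value: T) -> Parsed<'a, T, E> {
    Parsed { rest, result: Ok(value) }
}

/// Toy parser: accept exactly the byte `t`.
pub fn token<'a>(i: &'a [u8], t: u8) -> Parsed<'a, u8, &'static str> {
    match i.split_first() {
        Some((&c, rest)) if c == t => Parsed { rest, result: Ok(c) },
        _ => Parsed { rest: i, result: Err("unexpected byte") },
    }
}

/// Hand-written expansion of the hypothetical block
///     token(b'a'); let b = token(b'b'); ret (b, 1)
pub fn ab<'a>(input: &'a [u8]) -> Parsed<'a, (u8, u32), &'static str> {
    token(input, b'a').bind(|i, _| token(i, b'b').bind(|i, b| ret(i, (b, 1))))
}
```

Each nested closure corresponds to one statement of the block. An error from any step skips all the closures after it.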

Examples

Parsing into a struct using the basic provided parsers:

use chomp::prelude::{Buffer, Error, Input, ParseResult, parse_only, take_while1, token};

#[derive(Debug, Eq, PartialEq)]
struct Name<B: Buffer> {
    first: B,
    last:  B,
}

fn parser<I: Input<Token=u8>>(i: I) -> ParseResult<I, Name<I::Buffer>, Error<I::Token>> {
    parse!{i;
        let first = take_while1(|c| c != b' ');
                    token(b' ');
        let last  = take_while1(|c| c != b'\n');

        ret @ _, Error<u8>: Name{
            first: first,
            last:  last,
        }
    }
}

assert_eq!(parse_only(parser, "Martin Wernstål\n".as_bytes()), Ok(Name{
    first: &b"Martin"[..],
    last: "Wernstål".as_bytes()
}));

Parsing an IP address with a string prefix and a terminating semicolon, using the <* (skip) operator to make it more succinct:

use chomp::prelude::{U8Input, SimpleResult, parse_only, string, token};
use chomp::ascii::decimal;

fn parse_ip<I: U8Input>(i: I) -> SimpleResult<I, (u8, u8, u8, u8)> {
    parse!{i;
                string(b"ip:");
        let a = decimal() <* token(b'.');
        let b = decimal() <* token(b'.');
        let c = decimal() <* token(b'.');
        let d = decimal();
                token(b';');
        ret (a, b, c, d)
    }
}

assert_eq!(parse_only(parse_ip, b"ip:192.168.0.1;"), Ok((192, 168, 0, 1)));

Parsing a log-level using the <|> alternation (or) operator:

use chomp::prelude::{parse_only, string};

#[derive(Debug, Eq, PartialEq)]
enum Log {
    Error,
    Warning,
    Info,
    Debug,
}

let level        = |i, b, r| string(i, b).map(|_| r);
let log_severity = parser!{
    level(b"ERROR", Log::Error)   <|>
    level(b"WARN",  Log::Warning) <|>
    level(b"INFO",  Log::Info)    <|>
    level(b"DEBUG", Log::Debug)
};

assert_eq!(parse_only(log_severity, b"INFO"), Ok(Log::Info));

Grammar

EBNF using $ty, $expr, $ident and $pat for the equivalent Rust macro patterns.

Block     ::= Statement* Expr
Statement ::= Bind ';'
            | Expr ';'
Bind      ::= 'let' Var '=' Expr
Var       ::= $pat
            | $ident ':' $ty

/* Expr is split this way to allow for operator precedence */
Expr      ::= ExprAlt
            | ExprAlt   ">>" Expr
ExprAlt   ::= ExprSkip
            | ExprSkip "<|>" ExprAlt
ExprSkip  ::= Term
            | Term     "<*" ExprSkip

Term      ::= Ret
            | Err
            | '(' Expr ')'
            | Inline
            | Named

Ret       ::= "ret" Typed
            | "ret" $expr
Err       ::= "err" Typed
            | "err" $expr
Typed     ::= '@' $ty ',' $ty ':' $expr
Inline    ::= $ident "->" $expr
Named     ::= $ident '(' ($expr ',')* (',')* ')'

Statement

A statement is a line ending in a semicolon. This must be followed by either another statement or by an expression which ends the block.

parse!{i;
    token(b':');
    let n: u32 = decimal();
    ret n * 2
}

A bind statement uses a let-binding to bind the value of a parser expression within the parsing context. The expression to the right of the equals sign is evaluated. If the parser is still in a success state, the resulting value is bound to the pattern following let.

The pattern can be a plain identifier, but it can also be any irrefutable match pattern. Types can also be declared with identifier: type when necessary (eg. to declare the integer type produced by the decimal parser).
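The sketch below shows why the type annotation matters. The hypothetical `decimal` is generic over any `FromStr` type, much as the integer type of chomp's decimal is left to the caller. It works on `&str` and `Option` rather than chomp's Input and ParseResult. `colon_then_double` is a hand-written analogue of the block above, and the `?` operator stands in for the early exit on error.

```rust
use std::str::FromStr;

/// Hypothetical toy `decimal`: parse leading ASCII digits into any `FromStr` type.
/// Returns None if there are no digits or the value does not fit the chosen `T`.
pub fn decimal<T: FromStr>(i: &str) -> Option<(&str, T)> {
    let end = i.find(|c: char| !c.is_ascii_digit()).unwrap_or(i.len());
    if end == 0 {
        return None; // no digits at all
    }
    let value = i[..end].parse::<T>().ok()?;
    Some((&i[end..], value))
}

/// Hand-written analogue of: token(b':'); let n: u32 = decimal(); ret n * 2
pub fn colon_then_double(i: &str) -> Option<(&str, u32)> {
    let i = i.strip_prefix(':')?; // action: value discarded, failure exits early
    // The `u32` annotation tells the generic `decimal` which type to produce.
    let (i, n): (&str, u32) = decimal(i)?;
    Some((i, n * 2))
}
```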

Action

An action is any parser expression followed by a semicolon. It is executed, and its result is discarded before proceeding to the next statement or the final expression. Any error exits the block early and is propagated.

Expression

A parser expression can have three roles. It can be the whole body of a parse! macro (eg. for alternation as seen above), part of a bind or action statement, or the final result of a parse block.

Named

A named action is like a function call, but it is expanded to include the parsing context (Input) as the first parameter. The syntax is currently limited to a Rust identifier followed by a parameter list within parentheses. The parentheses are mandatory.

fn do_it<'a, I: U8Input>(i: I, s: &'a str) -> SimpleResult<I, &'a str> { i.ret(s) }

parse!{i;
    do_it("second parameter")
}
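The expansion of a Named term can be imitated with a tiny macro_rules! rule that prepends the input to the argument list. It is modelled on the @TERM rule quoted in the Debugging section below. `call_with_input!` and this `do_it` are hypothetical sketches, not part of chomp. They show only the rewrite, not chomp's actual macro.

```rust
/// Hypothetical macro: rewrite `name(args...)` into `name(input, args...)`,
/// accepting an optional trailing comma like the Named rule.
macro_rules! call_with_input {
    ($input:expr; $func:ident ( $($param:expr),* $(,)? )) => {
        $func($input, $($param),*)
    };
}

/// Stand-in for a user parser: consumes nothing and returns its parameter.
pub fn do_it<'a>(input: &'a [u8], s: &'a str) -> (&'a [u8], &'a str) {
    (input, s)
}
```

Because the input is added silently, a function that does not take it as its first parameter fails with an arity error. That is the situation described under "Function error pointing to macro code".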

Ret and Err

Many times you want to move a value into the parser monad, eg. to return a result or report an error. The ret and err keywords provide this functionality inside of parse!-expressions.

let r: Result<_, (_, ())> = parse_only(
    parser!{ ret "some success data" },
    b"input data"
);

assert_eq!(r, Ok("some success data"));

In the example above, the Result<_, (_, ())> type annotation is required because ret leaves the error type E free, so the parser! expression cannot infer the error type without it. ret and err both provide a mechanism to supply this information inline:

let r = parse_only(parser!{ err @ u32, _: "some error data" }, b"input data");

assert_eq!(r, Err((&b"input data"[..], "some error data")));

Note that we only declare the success type (u32 above) and leave out the type of the error (by using _) since that can be uniquely inferred.
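Plain Rust has the same inference problem with any constructor that fixes only one type parameter. In the sketch below, `ret` and `err` are hypothetical free functions that build a std `Result`; they are not chomp's API. The turbofish `err::<u32, _>` plays the role of `err @ u32, _:`. Writing `let r = ret("x");` with no annotation anywhere would be rejected by rustc with a "type annotations needed" error.

```rust
/// Hypothetical stand-in for `ret`: fixes the success type, leaves `E` free.
pub fn ret<T, E>(value: T) -> Result<T, E> {
    Ok(value)
}

/// Hypothetical stand-in for `err`: fixes the error type, leaves `T` free.
pub fn err<T, E>(error: E) -> Result<T, E> {
    Err(error)
}
```

Usage mirrors the two examples above. Annotate the whole result, as in `let r: Result<&str, ()> = ret("some success data");`, or name only the free type and let `_` be inferred, as in `err::<u32, _>("some error data")`.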

Inline

An inline expression is essentially a closure where the parser state (Input type) is exposed. This is useful for doing eg. inline match statements or to delegate to another parser which requires some plain Rust logic:

fn other_parser<I: Input>(i: I) -> ParseResult<I, &'static str, &'static str> {
    i.ret("Success!")
}

let condition = true;

let p = parser!{
    state -> match condition {
        true  => other_parser(state),
        false => Input::err(state, "failure"),
    }
};

assert_eq!(parse_only(p, b""), Ok("Success!"));

Operators

Expressions also support operators between sub-expressions to make common actions more succinct. These are infix operators with right associativity (ie. they are placed between expression terms and are grouped towards the right). The operator decides what the expression as a whole results in.

Ordered after operator precedence:

<*, skip

Evaluates the parser to the left first and on success evaluates the parser on the right, skipping its result.
```
let p = parser!{ decimal() <* token(b';') };

assert_eq!(parse_only(p, b"123;"), Ok(123u32));
```
<|>, or

Attempts to evaluate the parser on the left and, if that fails, backtracks and retries with the parser on the right. This is equivalent to chaining or combinators.
```
let p = parser!{ token(b'a') <|> token(b'b') };

assert_eq!(parse_only(p, b"b"), Ok(b'b'));
```
>>, then

Evaluates the parser to the left, then throws away any value and evaluates the parser on the right.
```
let p = parser!{ token(b'a') >> token(b';') };

assert_eq!(parse_only(p, b"a;"), Ok(b';'));
```

These operators correspond to the equivalent operators found in Haskell's Alternative, Applicative and Monad typeclasses, with the exception of being right-associative (the operators are left-associative in Haskell).
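The three operators can be sketched as ordinary functions over a toy parser to show their behaviour and their right-associative grouping. `Res`, `token`, `skip`, `or` and `then` are hypothetical and much simpler than chomp's types. `or` retries the right-hand parser on the original input (backtracking). `skip` and `then` thread the remaining input from left to right.

```rust
/// Toy parser result (hypothetical): remaining input plus a value or a unit error.
pub type Res<'a, T> = (&'a [u8], Result<T, ()>);

/// Toy parser: accept exactly the byte `t`.
pub fn token(i: &[u8], t: u8) -> Res<'_, u8> {
    match i.split_first() {
        Some((&c, rest)) if c == t => (rest, Ok(c)),
        _ => (i, Err(())),
    }
}

/// `p <* q`: run `p`, then `q` on what is left, and keep `p`'s value.
pub fn skip<'a, T, U>(
    i: &'a [u8],
    p: impl FnOnce(&'a [u8]) -> Res<'a, T>,
    q: impl FnOnce(&'a [u8]) -> Res<'a, U>,
) -> Res<'a, T> {
    match p(i) {
        (rest, Ok(t)) => match q(rest) {
            (rest, Ok(_)) => (rest, Ok(t)),
            (rest, Err(e)) => (rest, Err(e)),
        },
        (rest, Err(e)) => (rest, Err(e)),
    }
}

/// `p <|> q`: try `p`; on failure backtrack and run `q` on the original input.
pub fn or<'a, T>(
    i: &'a [u8],
    p: impl FnOnce(&'a [u8]) -> Res<'a, T>,
    q: impl FnOnce(&'a [u8]) -> Res<'a, T>,
) -> Res<'a, T> {
    match p(i) {
        (rest, Ok(t)) => (rest, Ok(t)),
        (_, Err(_)) => q(i),
    }
}

/// `p >> q`: run `p`, discard its value, then run `q` on what is left.
pub fn then<'a, T, U>(
    i: &'a [u8],
    p: impl FnOnce(&'a [u8]) -> Res<'a, T>,
    q: impl FnOnce(&'a [u8]) -> Res<'a, U>,
) -> Res<'a, U> {
    match p(i) {
        (rest, Ok(_)) => q(rest),
        (rest, Err(e)) => (rest, Err(e)),
    }
}
```

In this sketch, a chain such as `a <|> b <|> c` corresponds to `or(i, a, |i| or(i, b, c))`, which matches the right-associative grouping described above.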

An Inline expression needs to be wrapped in parentheses when used with an operator. This is because a $expr fragment in a macro pattern may only be followed by =>, , or ; at the same nesting level:

let p = parser!{ (i -> Input::err(i, "foo")) <|> (i -> Input::ret(i, "bar")) };

assert_eq!(parse_only(p, b"a;"), Ok("bar"));

Debugging

Errors in Rust macros can be hard to decipher at times, especially when using very complex macros which incrementally parse their input. This section is provided to give some hints and solutions for common problems. If this still does not solve the problem, feel free to ask questions on GitHub or via email or open an issue.

Macro recursion limit

The parse! macro expands by recursively invoking itself, parsing a bit of the input in each iteration. This sometimes hits the recursion limit for macros in Rust:

src/macros.rs:439:99: 439:148 error: recursion limit reached while expanding the macro `__parse_internal`
src/macros.rs:439     ( @EXPR_SKIP($input:expr; $($lhs:tt)*) $t1:tt $t2:tt )                                   => { __parse_internal!{@TERM($input) $($lhs)* $t1 $t2} };

The default recursion limit was 64 when this was written (current Rust releases default to 128). It can be raised with a crate-level attribute in the crate where the limit is reached. Choose a value above the default:

#![recursion_limit="256"]
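A minimal tt-muncher shows why long parse! blocks run into this limit. Each invocation consumes one token tree and invokes itself on the rest, so expansion depth grows with the length of the input. `count_tts!` is a hypothetical illustration of the technique. It is not chomp's `__parse_internal!`, although it expands in the same recursive style.

```rust
/// Hypothetical tt-muncher: strips one token tree per recursive invocation,
/// so an input of N token trees needs N + 1 nested expansions.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}
```

Operators such as <* are not single Rust tokens, so `decimal() <* token(b';')` counts as six token trees: `decimal`, `()`, `<`, `*`, `token` and `(b';')`. The separate < and * also appear in the trace output shown in the next section.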

Debugging macro expansion

If you are using the nightly version of Rust, you can use the trace_macros feature to see how the macro is expanded:

#![feature(trace_macros)]

trace_macros!(true);
let p = parser!{ decimal() <* token(b';') };
trace_macros!(false);

This will result in a printout similar to this:

parser! { decimal (  ) < * token ( b';' ) }
parse! { i ; decimal (  ) < * token ( b';' ) }
__parse_internal! { i ; decimal (  ) < * token ( b';' ) }
__parse_internal! { @ STATEMENT ( ( i ; _ ) ) decimal (  ) < * token ( b';' ) }
__parse_internal! { @ BIND ( ( i ; _ ) decimal (  ) < * token ( b';' ) ) }
__parse_internal! { @ EXPR ( i ; ) decimal (  ) < * token ( b';' ) }
__parse_internal! { @ EXPR_ALT ( i ; ) decimal (  ) < * token ( b';' ) }
__parse_internal! { @ EXPR_SKIP ( i ; ) decimal (  ) < * token ( b';' ) }
__parse_internal! { @ TERM ( i ) decimal (  ) }
__parse_internal! { @ EXPR_SKIP ( i ; ) token ( b';' ) }
__parse_internal! { @ TERM ( i ) token ( b';' ) }

Output like the above can make it clearer where it is actually failing, and can sometimes highlight the exact problem (with the help of looking at the grammar found above).

Function error pointing to macro code

Sometimes non-syntax errors occur in macro code, and rustc (on stable) currently has trouble pointing at the user code that causes the problem. Instead, the macro definition is highlighted as the cause of the issue:

src/macros.rs:431:71: 431:97 error: this function takes 1 parameter but 2 parameters were supplied [E0061]
src/macros.rs:431     ( @TERM($input:expr) $func:ident ( $($param:expr),* $(,)*) ) => { $func($input, $($param),*) };
                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~
src/macros.rs:489:99: 489:148 note: in this expansion of __parse_internal! (defined in src/macros.rs)
...

Usually this is caused by a Named expression that invokes a function whose parameters do not match the ones supplied. Check all Named invocations in the macro invocation, and keep in mind that the Input is automatically added as the first parameter. If that still does not help, try using nightly with the trace_macros feature to see what is expanded.

`error: expected ident, found foo`

This error (with foo being a user-defined symbol) can be caused by having a Bind statement as the last statement in a parse! block. The last part of a parse! block must be an expression.

src/macros.rs:551:111: 551:116 error: expected ident, found foo
src/macros.rs:551     ( $input:expr ; let $name:pat = $($tail:tt)+ )
    => { __parse_internal!{@STATEMENT(($input; $name)) $($tail)+} };
                                               ^~~~~