Macro chomp::parse
macro_rules! parse { ( $($t:tt)* ) => { ... }; }
Macro emulating do-notation for the parser monad, automatically threading the linear type.
parse!{input;
    parser("parameter");
    let value = other_parser();

    ret do_something(value)
}
// is equivalent to:
parser(input, "parameter").bind(|i, _|
    other_parser(i).bind(|i, value|
        i.ret(do_something(value))))
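To make the threading concrete, the sketch below hand-writes the same kind of bind chain over a toy result type. Everything in it (`Step`, `bind`, `ret`, `token`, `digit`, `doubled`) is a hypothetical stand-in written for this illustration and is not chomp's API; it only assumes that a parser takes the remaining input and returns either the rest plus a value, or an error.

```rust
// Toy stand-ins for illustration only; these are NOT chomp's real types.
// A parse step yields the remaining input plus a value, or an error.
type Step<'a, T> = Result<(&'a [u8], T), &'static str>;

// `bind`: continue with `f` only if the previous step succeeded.
fn bind<'a, T, U, F>(step: Step<'a, T>, f: F) -> Step<'a, U>
where
    F: FnOnce(&'a [u8], T) -> Step<'a, U>,
{
    step.and_then(|(rest, value)| f(rest, value))
}

// `ret`: lift a plain value into the parsing context without consuming input.
fn ret<'a, T>(input: &'a [u8], value: T) -> Step<'a, T> {
    Ok((input, value))
}

// Hypothetical parser: match one specific byte.
fn token<'a>(input: &'a [u8], expected: u8) -> Step<'a, u8> {
    match input.split_first() {
        Some((&b, rest)) if b == expected => Ok((rest, b)),
        _ => Err("unexpected token"),
    }
}

// Hypothetical parser: take a single ASCII digit and return its numeric value.
fn digit<'a>(input: &'a [u8]) -> Step<'a, u8> {
    match input.split_first() {
        Some((&b, rest)) if b.is_ascii_digit() => Ok((rest, b - b'0')),
        _ => Err("expected digit"),
    }
}

// Hand-written counterpart of a block shaped like:
//     parse!{input; token(b':'); let n = digit(); ret n * 2}
// Each closure receives the remaining input `i`, which is what the macro
// threads through automatically.
fn doubled(input: &[u8]) -> Step<'_, u8> {
    bind(token(input, b':'), |i, _| bind(digit(i), |i, n| ret(i, n * 2)))
}
```

Calling `doubled(b":4;")` in this toy model yields the value 8 with `;` left as remaining input, and a failure in any step short-circuits the rest of the chain.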
Examples
Parsing into a struct using the basic provided parsers:
use chomp::prelude::{Buffer, Error, Input, ParseResult, parse_only, take_while1, token};

#[derive(Debug, Eq, PartialEq)]
struct Name<B: Buffer> {
    first: B,
    last: B,
}

fn parser<I: Input<Token=u8>>(i: I) -> ParseResult<I, Name<I::Buffer>, Error<I::Token>> {
    parse!{i;
        let first = take_while1(|c| c != b' ');
        token(b' ');
        let last = take_while1(|c| c != b'\n');

        ret @ _, Error<u8>: Name{
            first: first,
            last: last,
        }
    }
}

assert_eq!(parse_only(parser, "Martin Wernstål\n".as_bytes()), Ok(Name{
    first: &b"Martin"[..],
    last: "Wernstål".as_bytes()
}));
Parsing an IP-address with a string-prefix and terminated with a semicolon, using the <* (skip) operator to make it more succinct:
use chomp::prelude::{U8Input, SimpleResult, parse_only, string, token};
use chomp::ascii::decimal;

fn parse_ip<I: U8Input>(i: I) -> SimpleResult<I, (u8, u8, u8, u8)> {
    parse!{i;
        string(b"ip:");
        let a = decimal() <* token(b'.');
        let b = decimal() <* token(b'.');
        let c = decimal() <* token(b'.');
        let d = decimal();
        token(b';');

        ret (a, b, c, d)
    }
}

assert_eq!(parse_only(parse_ip, b"ip:192.168.0.1;"), Ok((192, 168, 0, 1)));
Parsing a log-level using the <|> alternation (or) operator:
use chomp::prelude::{parse_only, string};

#[derive(Debug, Eq, PartialEq)]
enum Log {
    Error,
    Warning,
    Info,
    Debug,
}

let level = |i, b, r| string(i, b).map(|_| r);
let log_severity = parser!{
    level(b"ERROR", Log::Error) <|>
    level(b"WARN", Log::Warning) <|>
    level(b"INFO", Log::Info) <|>
    level(b"DEBUG", Log::Debug)
};

assert_eq!(parse_only(log_severity, b"INFO"), Ok(Log::Info));
Grammar
EBNF using $ty, $expr, $ident and $pat for the equivalent Rust macro patterns.
Block ::= Statement* Expr
Statement ::= Bind ';'
| Expr ';'
Bind ::= 'let' Var '=' Expr
Var ::= $pat
| $ident ':' $ty
/* Expr is split this way to allow for operator precedence */
Expr ::= ExprAlt
| ExprAlt ">>" Expr
ExprAlt ::= ExprSkip
| ExprSkip "<|>" ExprAlt
ExprSkip ::= Term
| Term "<*" ExprSkip
Term ::= Ret
| Err
| '(' Expr ')'
| Inline
| Named
Ret ::= "ret" Typed
| "ret" $expr
Err ::= "err" Typed
| "err" $expr
Typed ::= '@' $ty ',' $ty ':' $expr
Inline ::= $ident "->" $expr
Named ::= $ident '(' ($expr ',')* (',')* ')'
Statement
A statement is a line ending in a semicolon. This must be followed by either another statement or by an expression which ends the block.
parse!{i; token(b':'); let n: u32 = decimal(); ret n * 2 }
Bind
A bind statement uses a let-binding to bind the value of a parser-expression within the parsing context. The expression to the right of the equal-sign will be evaluated and, if the parser is still in a success state, the value will be bound to the pattern following let.
The pattern can be a plain identifier or any irrefutable match-pattern. A type can also be declared with identifier: type when necessary (e.g. to declare the integer type used with the decimal parser).
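As a plain-Rust analogy for the two binding forms (no chomp involved), the sketch below uses `str::parse` as a stand-in for a parser whose output integer type is generic. The function name `bind_forms` and the sample values are made up for this illustration.

```rust
// Plain-Rust analogy; `str::parse` stands in for a parser that is generic
// over its integer output type.
fn bind_forms() -> (String, u64) {
    // Irrefutable tuple pattern, comparable to `let (a, b) = some_parser();`.
    let (name, raw) = ("port", "8080");

    // `identifier: type` form, comparable to `let n: u16 = decimal();`.
    // Without the annotation nothing here would pin down the integer type.
    let n: u16 = raw.parse().expect("digits only");

    (name.to_string(), u64::from(n) * 2)
}
```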
Action
An action is any parser-expression, ended with a semicolon. This will be executed and its result will be discarded before proceeding to the next statement or the ending expression. Any error will exit early and will be propagated.
Expression
A parser expression can be the only part of a parse! macro (e.g. for alternating as seen above), a part of a bind or action statement, or the final result of a parse-block.
Named
A named action is like a function call, but will be expanded to include the parsing context (Input) as the first parameter. The syntax is currently limited to a Rust identifier followed by a parameter list within parentheses. The parentheses are mandatory.
fn do_it<'a, I: U8Input>(i: I, s: &'a str) -> SimpleResult<I, &'a str> {
    i.ret(s)
}

parse!{i;
    do_it("second parameter")
}
Ret and Err
Many times you want to move a value into the parser monad, e.g. to return a result or report an error. The ret and err keywords provide this functionality inside of parse!-expressions.
let r: Result<_, (_, ())> = parse_only(
    parser!{ ret "some success data" },
    b"input data"
);

assert_eq!(r, Ok("some success data"));
In the example above the Result<_, (_, ())> type-annotation is required since ret leaves the error type E free, which means that the parser! expression above cannot infer the error type without the annotation. ret and err both provide a mechanism to supply this information inline:
let r = parse_only(parser!{ err @ u32, _: "some error data" }, b"input data");

assert_eq!(r, Err((&b"input data"[..], "some error data")));
Note that we only declare the success type (u32 above) and leave out the type of the error (by using _) since that can be uniquely inferred.
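The same inference gap exists in plain Rust whenever a `Result` is built from only one of its variants. The sketch below is an analogy using `std::result::Result` directly, not chomp code; the function names are invented for this illustration.

```rust
// Plain-Rust analogy, no chomp involved: a value built with only `Ok` or
// only `Err` says nothing about the other type parameter.

fn annotated() -> bool {
    // The annotation on `r` fixes the otherwise free error type, much like
    // the `Result<_, (_, ())>` annotation in the text.
    let r: Result<_, ()> = Ok("some success data");
    r == Ok("some success data")
}

fn inline_hint() -> String {
    // Naming the missing success type inline is roughly the role that
    // `err @ u32, _: ...` plays inside the macro. Without `::<u32, _>`
    // this `let` would be rejected with "type annotations needed".
    let r = Err::<u32, _>("some error data");
    format!("{:?}", r)
}
```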
Inline
An inline expression is essentially a closure where the parser state (Input type) is exposed. This is useful for doing e.g. inline match statements or to delegate to another parser which requires some plain Rust logic:
fn other_parser<I: Input>(i: I) -> ParseResult<I, &'static str, &'static str> {
    i.ret("Success!")
}

let condition = true;

let p = parser!{
    state -> match condition {
        true => other_parser(state),
        false => Input::err(state, "failure"),
    }
};

assert_eq!(parse_only(p, b""), Ok("Success!"));
Operators
Expressions also support operators between sub-expressions to make common actions more succinct. These are infix operators with right associativity (i.e. they are placed between expression terms and are grouped towards the right). The result of the expression as a whole is decided by the operator.
Ordered by operator precedence, tightest-binding first (matching the ExprSkip, ExprAlt and Expr levels of the grammar):
<*, skip: Evaluates the parser to the left first and on success evaluates the parser on the right, skipping its result.
let p = parser!{ decimal() <* token(b';') };

assert_eq!(parse_only(p, b"123;"), Ok(123u32));
<|>, or: Attempts to evaluate the parser on the left and if that fails it will backtrack and retry with the parser on the right. Is equivalent to stacking or combinators.
let p = parser!{ token(b'a') <|> token(b'b') };

assert_eq!(parse_only(p, b"b"), Ok(b'b'));
>>, then: Evaluates the parser to the left, then throws away any value and evaluates the parser on the right.
let p = parser!{ token(b'a') >> token(b';') };

assert_eq!(parse_only(p, b"a;"), Ok(b';'));
These operators correspond to the equivalent operators found in Haskell's Alternative, Applicative and Monad typeclasses, with the exception of being right-associative (the operators are left-associative in Haskell).
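The behaviour of the three operators, and how a mixed expression groups under the grammar above, can be sketched with toy combinators. `Step`, `token`, `skip`, `or`, `then` and `grouped` below are hypothetical helpers written for this illustration; they model the described semantics and are not chomp's implementation.

```rust
// Toy model for illustration; these are not chomp's combinators.
type Step<'a, T> = Result<(&'a [u8], T), &'static str>;

// Hypothetical single-byte parser.
fn token(input: &[u8], expected: u8) -> Step<'_, u8> {
    match input.split_first() {
        Some((&b, rest)) if b == expected => Ok((rest, b)),
        _ => Err("unexpected token"),
    }
}

// Models `l <* r`: run both parsers in order, keep the left value.
fn skip<'a, T, U>(
    input: &'a [u8],
    l: impl FnOnce(&'a [u8]) -> Step<'a, T>,
    r: impl FnOnce(&'a [u8]) -> Step<'a, U>,
) -> Step<'a, T> {
    let (rest, value) = l(input)?;
    let (rest, _) = r(rest)?;
    Ok((rest, value))
}

// Models `l <|> r`: try the left parser; on failure retry the right parser
// from the original input position.
fn or<'a, T>(
    input: &'a [u8],
    l: impl FnOnce(&'a [u8]) -> Step<'a, T>,
    r: impl FnOnce(&'a [u8]) -> Step<'a, T>,
) -> Step<'a, T> {
    l(input).or_else(|_| r(input))
}

// Models `l >> r`: run both parsers in order, keep the right value.
fn then<'a, T, U>(
    input: &'a [u8],
    l: impl FnOnce(&'a [u8]) -> Step<'a, T>,
    r: impl FnOnce(&'a [u8]) -> Step<'a, U>,
) -> Step<'a, U> {
    let (rest, _) = l(input)?;
    r(rest)
}

// Models: token(b'a') <* token(b';') <|> token(b'b') >> token(b'!')
// Under the grammar this groups as ((a <* ;) <|> b) >> !
// because `<*` binds tightest and `>>` loosest.
fn grouped(input: &[u8]) -> Step<'_, u8> {
    then(
        input,
        |i| {
            or(
                i,
                |i| skip(i, |i| token(i, b'a'), |i| token(i, b';')),
                |i| token(i, b'b'),
            )
        },
        |i| token(i, b'!'),
    )
}
```

In this model both `a;!` and `b!` are accepted and produce `!`, while `a!` fails: the left alternative stops at the missing `;` and the right alternative does not match `a`.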
An Inline expression needs to be wrapped in parentheses to parse (the $expr pattern in macros requires ; or , to be terminated at the same nesting level):
let p = parser!{ (i -> Input::err(i, "foo")) <|> (i -> Input::ret(i, "bar")) };

assert_eq!(parse_only(p, b"a;"), Ok("bar"));
Debugging
Errors in Rust macros can be hard to decipher at times, especially when using very complex macros which incrementally parse their input. This section is provided to give some hints and solutions for common problems. If this still does not solve the problem, feel free to ask questions on GitHub or via email or open an issue.
Macro recursion limit
The parse! macro expands by recursively invoking itself, parsing a bit of the input on each iteration. This sometimes reaches the recursion limit for macros in Rust:
src/macros.rs:439:99: 439:148 error: recursion limit reached while expanding the macro `__parse_internal`
src/macros.rs:439 ( @EXPR_SKIP($input:expr; $($lhs:tt)*) $t1:tt $t2:tt ) => { __parse_internal!{@TERM($input) $($lhs)* $t1 $t2} };
The default recursion limit is 64. It can be raised with a crate-level attribute in the crate where the recursion limit is an issue:
#![recursion_limit="100"]
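The reason long inputs hit the limit is easier to see on a small macro written in the same incremental style. The `count_tokens!` macro below is a hypothetical example for this illustration and is not part of chomp; it consumes one token per invocation, so its expansion depth grows with the length of the input.

```rust
// Hypothetical token-counting macro that consumes a bit of its input per
// invocation. It is not part of chomp.
macro_rules! count_tokens {
    // No tokens left: the recursion ends here.
    () => { 0usize };
    // Peel off one token and recurse on the remainder.
    ($head:tt $($tail:tt)*) => { 1usize + count_tokens!($($tail)*) };
}

// Five tokens need one expansion per token plus the base case. An input
// whose expansion nests deeper than the recursion limit fails to compile
// unless the limit is raised with the crate-level attribute.
fn five() -> usize {
    count_tokens!(a b c d e)
}
```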
Debugging macro expansion
If you are using the nightly version of Rust you can use the trace_macros feature to see how the macro is expanded:
#![feature(trace_macros)]

trace_macros!(true);
let p = parser!{ decimal() <* token(b';') };
trace_macros!(false);
This will result in a printout similar to this:
parser! { decimal ( ) < * token ( b';' ) }
parse! { i ; decimal ( ) < * token ( b';' ) }
__parse_internal! { i ; decimal ( ) < * token ( b';' ) }
__parse_internal! { @ STATEMENT ( ( i ; _ ) ) decimal ( ) < * token ( b';' ) }
__parse_internal! { @ BIND ( ( i ; _ ) decimal ( ) < * token ( b';' ) ) }
__parse_internal! { @ EXPR ( i ; ) decimal ( ) < * token ( b';' ) }
__parse_internal! { @ EXPR_ALT ( i ; ) decimal ( ) < * token ( b';' ) }
__parse_internal! { @ EXPR_SKIP ( i ; ) decimal ( ) < * token ( b';' ) }
__parse_internal! { @ TERM ( i ) decimal ( ) }
__parse_internal! { @ EXPR_SKIP ( i ; ) token ( b';' ) }
__parse_internal! { @ TERM ( i ) token ( b';' ) }
Output like the above can make it clearer where it is actually failing, and can sometimes highlight the exact problem (with the help of looking at the grammar found above).
Function error pointing to macro code
Sometimes non-syntax errors will occur in macro code. rustc currently (on stable) has trouble displaying the code which actually causes the problem; instead the macro-part will be highlighted as the cause of the issue:
src/macros.rs:431:71: 431:97 error: this function takes 1 parameter but 2 parameters were supplied [E0061]
src/macros.rs:431 ( @TERM($input:expr) $func:ident ( $($param:expr),* $(,)*) ) => { $func($input, $($param),*) };
^~~~~~~~~~~~~~~~~~~~~~~~~~
src/macros.rs:489:99: 489:148 note: in this expansion of __parse_internal! (defined in src/macros.rs)
...
Usually this is related to a Named expression which is used to invoke a function, but the function-parameters do not match what is expected. Check all the named invocations in the macro-invocation and keep in mind that the first parameter will be an Input<I> which is added automatically. If that still does not help, try using nightly and the trace_macros feature to see what is expanded.
error: expected ident, found foo
This error (with foo being a user-defined symbol) can be caused by having a Bind statement as the last statement in a parse! block. The last part of a parse! block must be an expression.
src/macros.rs:551:111: 551:116 error: expected ident, found foo
src/macros.rs:551 ( $input:expr ; let $name:pat = $($tail:tt)+ )
=> { __parse_internal!{@STATEMENT(($input; $name)) $($tail)+} };
^~~~~