<h1>Chomp and impl Trait, revisited</h1>
<p>Martin Wernstål, 2016-08-24</p>
<p>Now that <a href="https://github.com/rust-lang/rust/pull/35091">conservative <code class="highlighter-rouge">impl Trait</code></a> has landed in nightly, I decided to finally take the time to reimplement all parsers of <a href="http://github.com/m4rw3r/chomp">Chomp</a> using <code class="highlighter-rouge">impl Trait</code> to create a proper parser monad.</p>
<p>Before doing this I finished up most of the work required for version 0.3 of Chomp: <a href="https://github.com/m4rw3r/chomp/pull/45">abstracting over the input type</a>, which will enable users to plug in the appropriate input type. Hopefully this will open the door to more specialized input types in the future, like rope-based structures which are filled and dropped as the parsing proceeds, as well as <a href="https://github.com/m4rw3r/chomp/pull/49">implementations wrapping iterators</a>.</p>
<p>But before finishing up 0.3 I just had to give <code class="highlighter-rouge">impl Trait</code> a try, and I did not want to wait too long before releasing the results. The branch containing the code can be found in the <a href="https://github.com/m4rw3r/chomp/tree/experiment/impl_trait">Chomp repo</a>.</p>
<p>I am not yet totally sure if this will actually be the 1.0 of Chomp, or if it will be a separate crate called <code class="highlighter-rouge">chomp2</code> (or something else), but I am leaning towards creating a new crate. <code class="highlighter-rouge">chomp</code> still works on stable while the <code class="highlighter-rouge">impl Trait</code> version does not (and probably will not in the foreseeable future, since it uses <code class="highlighter-rouge">conservative_impl_trait</code>, <code class="highlighter-rouge">fn_traits</code> and <code class="highlighter-rouge">unboxed_closures</code>).</p>
<h2 id="chomp-and-monad-like-parsers">Chomp and monad-like parsers</h2>
<p>The monad-like state of Chomp makes it possible to easily compose parser actions: each parser is passed an input state followed by any parameters it requires, and returns a <code class="highlighter-rouge">ParseResult</code> containing the remainder of the input state as well as a success or failure value.</p>
<p>Parsers themselves do not implement any specific trait; instead they all follow roughly the same function signature:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="n">Fn</span><span class="o">*<</span><span class="n">I</span><span class="p">:</span> <span class="n">Input</span><span class="o">></span><span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="err">...</span><span class="p">)</span> <span class="k">-></span> <span class="n">ParseResult</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span>
</code></pre>
</div>
<p>where <code class="highlighter-rouge">Fn*</code> is the appropriate function trait, depending on the internal implementation of the parser (<code class="highlighter-rouge">many</code>, for example, requires an <code class="highlighter-rouge">FnMut</code> parser to be able to repeat it), and <code class="highlighter-rouge">T</code> and <code class="highlighter-rouge">E</code> are the success and error types respectively.</p>
<p>There are two issues with this way of structuring parsers. First, the input state needs to be threaded through every parser by hand, making usage slightly more complicated. Second, parsers are not unified under one type, which can make it slightly harder to compose them.</p>
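<p>To make the threading concrete, here is a minimal sketch of the monad-like style. It is not Chomp's actual API: <code class="highlighter-rouge">ParseResult</code> is reduced to a hypothetical tuple alias over <code class="highlighter-rouge">&str</code>, and <code class="highlighter-rouge">any</code> and <code class="highlighter-rouge">two_chars</code> are illustrative stand-ins.</p>

```rust
// Hypothetical, simplified stand-in for Chomp's `ParseResult`:
// the remaining input plus either a value or an error.
type ParseResult<'a, T, E> = (&'a str, Result<T, E>);

// A monad-like parser is just a function that takes the input first.
fn any(i: &str) -> ParseResult<'_, char, &'static str> {
    match i.chars().next() {
        Some(c) => (&i[c.len_utf8()..], Ok(c)),
        None => (i, Err("unexpected end of input")),
    }
}

// Composing two parsers means handing the remaining input
// from the first call to the second one by hand.
fn two_chars(i: &str) -> ParseResult<'_, (char, char), &'static str> {
    let (i, a) = match any(i) {
        (i, Ok(a)) => (i, a),
        (i, Err(e)) => return (i, Err(e)),
    };
    match any(i) {
        (i, Ok(b)) => (i, Ok((a, b))),
        (i, Err(e)) => (i, Err(e)),
    }
}
```

<p>Every combinator written in this style has to accept the input, pass it on and return it again, which is the bookkeeping the first issue refers to.</p>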
<h2 id="a-proper-parser-monad">A proper parser monad</h2>
<p>A proper monad on the other hand does no actual parsing in the parser-functions themselves (like <code class="highlighter-rouge">any</code>, <code class="highlighter-rouge">string</code>, <code class="highlighter-rouge">take_while</code>, <code class="highlighter-rouge">many</code> and even user-defined parsers). The <code class="highlighter-rouge">ParseResult</code> type no longer exists and instead all parser-functions produce types implementing the <code class="highlighter-rouge">Parser</code> trait which are later invoked with a parser input:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">trait</span> <span class="n">Parser</span><span class="o"><</span><span class="n">I</span><span class="p">:</span> <span class="n">Input</span><span class="o">></span> <span class="p">{</span>
<span class="k">type</span> <span class="n">Output</span><span class="p">;</span>
<span class="k">type</span> <span class="n">Error</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">parse</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">I</span><span class="p">)</span> <span class="k">-></span> <span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">Result</span><span class="o"><</span><span class="nn">Self</span><span class="p">::</span><span class="n">Output</span><span class="p">,</span> <span class="nn">Self</span><span class="p">::</span><span class="n">Error</span><span class="o">></span><span class="p">);</span>
<span class="p">}</span>
</code></pre>
</div>
<p>This trait is analogous to the following closure-signature:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nf">Fn</span><span class="p">(</span><span class="n">input</span><span class="p">)</span> <span class="k">-></span> <span class="p">(</span><span class="n">input</span><span class="o">-</span><span class="n">remainder</span><span class="p">,</span> <span class="n">Result</span><span class="o"><</span><span class="n">success</span><span class="p">,</span> <span class="n">error</span><span class="o">></span><span class="p">)</span>
</code></pre>
</div>
<p>which will create a Parser monad when used with the appropriate <code class="highlighter-rouge">bind</code> and <code class="highlighter-rouge">return</code> implementations:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="k">return</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">-></span> <span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">Result</span><span class="o"><</span><span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">);</span>
<span class="k">fn</span> <span class="nf">bind</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">)</span> <span class="k">-></span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">-></span> <span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">Result</span><span class="o"><</span><span class="n">U</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">)</span>
<span class="n">where</span> <span class="n">A</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">-></span> <span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">Result</span><span class="o"><</span><span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">)</span>
<span class="n">B</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">-></span> <span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">Result</span><span class="o"><</span><span class="n">U</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">);</span>
</code></pre>
</div>
<p>Preferably <code class="highlighter-rouge">impl Trait</code> should be used as much as possible since it avoids heavyweight syntax, but trait methods cannot return anonymized types (yet). This means that the basic combinators provided as methods on the <code class="highlighter-rouge">Parser</code> trait (like <code class="highlighter-rouge">bind</code>, <code class="highlighter-rouge">then</code> and <code class="highlighter-rouge">map</code>) will, just like those on <a href="http://alexcrichton.com/futures-rs/futures/trait.Future.html"><code class="highlighter-rouge">Future</code> in the <code class="highlighter-rouge">futures</code> crate</a>, return concrete types.</p>
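<p>The signatures above can be sketched as free functions. This is a simplified illustration rather than Chomp's code: the input type <code class="highlighter-rouge">I</code> is left unbounded instead of requiring <code class="highlighter-rouge">Input</code>, <code class="highlighter-rouge">return</code> is renamed to <code class="highlighter-rouge">ret</code> because it is a keyword, and the blanket implementation for closures and the <code class="highlighter-rouge">any</code> parser are assumptions made to keep the example self-contained.</p>

```rust
// Simplified stand-in for the `Parser` trait shown above.
trait Parser<I> {
    type Output;
    type Error;
    fn parse(self, i: I) -> (I, Result<Self::Output, Self::Error>);
}

// Assumption: any closure with the analogous signature is a parser.
impl<I, T, E, F> Parser<I> for F
where
    F: FnOnce(I) -> (I, Result<T, E>),
{
    type Output = T;
    type Error = E;
    fn parse(self, i: I) -> (I, Result<T, E>) {
        self(i)
    }
}

// `return`: succeed with `t` without touching the input.
fn ret<I, T, E>(t: T) -> impl Parser<I, Output = T, Error = E> {
    move |i: I| -> (I, Result<T, E>) { (i, Ok(t)) }
}

// `bind`: run `a`, then feed its value to `f` to obtain the next parser.
fn bind<I, A, B, P>(a: A, f: B) -> impl Parser<I, Output = P::Output, Error = A::Error>
where
    A: Parser<I>,
    B: FnOnce(A::Output) -> P,
    P: Parser<I, Error = A::Error>,
{
    move |i: I| -> (I, Result<P::Output, A::Error>) {
        match a.parse(i) {
            (i, Ok(t)) => f(t).parse(i),
            (i, Err(e)) => (i, Err(e)),
        }
    }
}

// Hypothetical primitive parser: consume one character of a `&str`.
fn any<'a>() -> impl Parser<&'a str, Output = char, Error = &'static str> {
    |i: &'a str| -> (&'a str, Result<char, &'static str>) {
        match i.chars().next() {
            Some(c) => (&i[c.len_utf8()..], Ok(c)),
            None => (i, Err("unexpected end of input")),
        }
    }
}
```

<p>Nothing is parsed while the parsers are being combined. Work only happens once <code class="highlighter-rouge">parse</code> is called on the outermost value, which is what lets the input disappear from the combinator signatures.</p>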
<h2 id="converting-chomp-to-a-proper-monad">Converting Chomp to a proper monad</h2>
<p>The basic parsers and combinators were straightforward to convert: mainly the function signature had to change to return an <code class="highlighter-rouge">impl Parser</code>, and the function body had to be wrapped in a closure. Here is the <code class="highlighter-rouge">or</code> combinator after the conversion:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">or</span><span class="o"><</span><span class="n">I</span><span class="p">:</span> <span class="n">Input</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">G</span><span class="o">></span><span class="p">(</span><span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">,</span> <span class="n">g</span><span class="p">:</span> <span class="n">G</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">Parser</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">Output</span><span class="o">=</span><span class="nn">F</span><span class="p">::</span><span class="n">Output</span><span class="p">,</span> <span class="n">Error</span><span class="o">=</span><span class="nn">F</span><span class="p">::</span><span class="n">Error</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="n">Parser</span><span class="o"><</span><span class="n">I</span><span class="o">></span><span class="p">,</span>
<span class="n">G</span><span class="p">:</span> <span class="n">Parser</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">Output</span><span class="o">=</span><span class="nn">F</span><span class="p">::</span><span class="n">Output</span><span class="p">,</span> <span class="n">Error</span><span class="o">=</span><span class="nn">F</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span>
<span class="k">move</span> <span class="p">|</span><span class="n">i</span><span class="p">:</span> <span class="n">I</span><span class="p">|</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">m</span> <span class="o">=</span> <span class="n">i</span><span class="nf">.mark</span><span class="p">();</span>
<span class="k">match</span> <span class="n">f</span><span class="nf">.parse</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">d</span><span class="p">))</span> <span class="k">=></span> <span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">d</span><span class="p">)),</span>
<span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="nf">Err</span><span class="p">(</span><span class="n">_</span><span class="p">))</span> <span class="k">=></span> <span class="n">g</span><span class="nf">.parse</span><span class="p">(</span><span class="n">b</span><span class="nf">.restore</span><span class="p">(</span><span class="n">m</span><span class="p">)),</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>It is almost identical to the monad-like definition:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">or</span><span class="o"><</span><span class="n">I</span><span class="p">:</span> <span class="n">Input</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">G</span><span class="o">></span><span class="p">(</span><span class="n">i</span><span class="p">:</span> <span class="n">I</span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">,</span> <span class="n">g</span><span class="p">:</span> <span class="n">G</span><span class="p">)</span> <span class="k">-></span> <span class="n">ParseResult</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">-></span> <span class="n">ParseResult</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">,</span>
<span class="n">G</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">-></span> <span class="n">ParseResult</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">m</span> <span class="o">=</span> <span class="n">i</span><span class="nf">.mark</span><span class="p">();</span>
<span class="k">match</span> <span class="nf">f</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="nf">.into_inner</span><span class="p">()</span> <span class="p">{</span>
<span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">d</span><span class="p">))</span> <span class="k">=></span> <span class="n">b</span><span class="nf">.ret</span><span class="p">(</span><span class="n">d</span><span class="p">),</span>
<span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="nf">Err</span><span class="p">(</span><span class="n">_</span><span class="p">))</span> <span class="k">=></span> <span class="nf">g</span><span class="p">(</span><span class="n">b</span><span class="nf">.restore</span><span class="p">(</span><span class="n">m</span><span class="p">)),</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
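<p>For readers who want to try the converted shape without the nightly features, here is a hedged sketch of <code class="highlighter-rouge">or</code> over plain <code class="highlighter-rouge">&str</code> input. It assumes a simplified <code class="highlighter-rouge">Parser</code> trait with a blanket implementation for closures, replaces Chomp's <code class="highlighter-rouge">mark</code>/<code class="highlighter-rouge">restore</code> with a <code class="highlighter-rouge">Copy</code> bound on the input, and uses a hypothetical <code class="highlighter-rouge">token</code> parser.</p>

```rust
// Simplified stand-in for the post's `Parser` trait.
trait Parser<I> {
    type Output;
    type Error;
    fn parse(self, i: I) -> (I, Result<Self::Output, Self::Error>);
}

// Assumption: closures of the analogous signature are parsers.
impl<I, T, E, F> Parser<I> for F
where
    F: FnOnce(I) -> (I, Result<T, E>),
{
    type Output = T;
    type Error = E;
    fn parse(self, i: I) -> (I, Result<T, E>) {
        self(i)
    }
}

// Try `f`; if it fails, run `g` on the original input.
// Keeping a copy of `i` stands in for `mark` and `restore`.
fn or<I: Copy, F, G>(f: F, g: G) -> impl Parser<I, Output = F::Output, Error = F::Error>
where
    F: Parser<I>,
    G: Parser<I, Output = F::Output, Error = F::Error>,
{
    move |i: I| -> (I, Result<F::Output, F::Error>) {
        match f.parse(i) {
            (b, Ok(d)) => (b, Ok(d)),
            (_, Err(_)) => g.parse(i),
        }
    }
}

// Hypothetical primitive: match exactly the character `c`.
fn token<'a>(c: char) -> impl Parser<&'a str, Output = char, Error = String> {
    move |i: &'a str| -> (&'a str, Result<char, String>) {
        match i.chars().next() {
            Some(d) if d == c => (&i[d.len_utf8()..], Ok(d)),
            _ => (i, Err(format!("expected {:?}", c))),
        }
    }
}
```

<p>With these definitions, <code class="highlighter-rouge">or(token('a'), token('b'))</code> builds a parser value up front, and the input is only supplied when <code class="highlighter-rouge">parse</code> is finally called on it.</p>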
<p>The same goes for the bounded combinators <code class="highlighter-rouge">many</code>, <code class="highlighter-rouge">many1</code>, <code class="highlighter-rouge">skip_many</code> and <code class="highlighter-rouge">many_till</code>, with one difference: the parameter is not the parser itself but a parser constructor (i.e. an <code class="highlighter-rouge">FnMut() -> Parser</code>, to be able to reuse parsers without requiring them to be <code class="highlighter-rouge">Clone</code>). The parser constructor and the inner iterator type become part of the returned closure; the iterator instance is constructed and passed to <code class="highlighter-rouge">FromIterator</code> when the parser is run.</p>
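<p>The parser-constructor idea can be sketched as follows. This is an unbounded <code class="highlighter-rouge">many</code> written for illustration only, not Chomp's implementation: the simplified <code class="highlighter-rouge">Parser</code> trait, the <code class="highlighter-rouge">Copy</code> bound on the input, the use of <code class="highlighter-rouge">std::iter::from_fn</code> and the <code class="highlighter-rouge">token</code> parser are all assumptions.</p>

```rust
// Simplified stand-in for the post's `Parser` trait.
trait Parser<I> {
    type Output;
    type Error;
    fn parse(self, i: I) -> (I, Result<Self::Output, Self::Error>);
}

// Assumption: closures of the analogous signature are parsers.
impl<I, T, E, F> Parser<I> for F
where
    F: FnOnce(I) -> (I, Result<T, E>),
{
    type Output = T;
    type Error = E;
    fn parse(self, i: I) -> (I, Result<T, E>) {
        self(i)
    }
}

// `ctor` builds a fresh parser for every repetition, so the inner
// parser only needs to be usable once and does not have to be `Clone`.
fn many<I, T, F, P>(mut ctor: F) -> impl Parser<I, Output = T, Error = P::Error>
where
    I: Copy,
    F: FnMut() -> P,
    P: Parser<I>,
    T: std::iter::FromIterator<P::Output>,
{
    move |mut i: I| -> (I, Result<T, P::Error>) {
        // The iterator is created when the parser runs and is handed
        // to `FromIterator` through `collect`. It stops at the first
        // failure and leaves `i` at the last successful position.
        // Note: a parser that succeeds without consuming input would
        // loop forever in this sketch.
        let out: T = std::iter::from_fn(|| match ctor().parse(i) {
            (rest, Ok(v)) => {
                i = rest;
                Some(v)
            }
            (_, Err(_)) => None,
        })
        .collect();
        (i, Ok(out))
    }
}

// Hypothetical primitive: match exactly the character `c`.
fn token<'a>(c: char) -> impl Parser<&'a str, Output = char, Error = String> {
    move |i: &'a str| -> (&'a str, Result<char, String>) {
        match i.chars().next() {
            Some(d) if d == c => (&i[d.len_utf8()..], Ok(d)),
            _ => (i, Err(format!("expected {:?}", c))),
        }
    }
}
```

<p>The caller passes a constructor such as <code class="highlighter-rouge">|| token('a')</code> and picks the collection through the <code class="highlighter-rouge">T</code> parameter, for example <code class="highlighter-rouge">String</code> or <code class="highlighter-rouge">Vec&lt;char&gt;</code>.</p>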
<p><code class="highlighter-rouge">sep_by</code> on the other hand was a bit problematic due to the lifetime of the supplied closures since they need to be tied to the return value for as long as it has not yet been used. But by adding a parser for <code class="highlighter-rouge">Option<Parser></code> the parser could be unified as a single type and be used with the appropriate <code class="highlighter-rouge">many</code> combinator.</p>
<p>Sadly, for the bounded combinators I had to do the same thing as with the <code class="highlighter-rouge">Parser</code> methods: implement concrete structures for each combinator, and split them up into different traits due to the different requirements on the generics. This also made it awkward to write a generic <code class="highlighter-rouge">sep_by</code> taking any type implementing <code class="highlighter-rouge">BoundedMany</code>, since the closure used in <code class="highlighter-rouge">sep_by</code> had to be converted to a struct implementing <code class="highlighter-rouge">FnMut</code> so that the types could be fully described.</p>
<p>This made the <code class="highlighter-rouge">sep_by</code> function kinda ugly, but the alternative was to have 5 copies of <code class="highlighter-rouge">sep_by</code> for the different types of ranges:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">sep_by</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">G</span><span class="p">,</span> <span class="n">P</span><span class="p">,</span> <span class="n">Q</span><span class="p">,</span> <span class="n">R</span><span class="o">></span><span class="p">(</span><span class="n">r</span><span class="p">:</span> <span class="n">R</span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">,</span> <span class="n">sep</span><span class="p">:</span> <span class="n">G</span><span class="p">)</span> <span class="k">-></span> <span class="nn">R</span><span class="p">::</span><span class="n">ManyParser</span>
<span class="n">where</span> <span class="n">I</span><span class="p">:</span> <span class="n">Input</span><span class="p">,</span>
<span class="n">T</span><span class="p">:</span> <span class="n">FromIterator</span><span class="o"><</span><span class="nn">P</span><span class="p">::</span><span class="n">Output</span><span class="o">></span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="nf">FnMut</span><span class="p">()</span> <span class="k">-></span> <span class="n">P</span><span class="p">,</span>
<span class="n">G</span><span class="p">:</span> <span class="nf">FnMut</span><span class="p">()</span> <span class="k">-></span> <span class="n">Q</span><span class="p">,</span>
<span class="n">P</span><span class="p">:</span> <span class="n">Parser</span><span class="o"><</span><span class="n">I</span><span class="o">></span><span class="p">,</span>
<span class="n">Q</span><span class="p">:</span> <span class="n">Parser</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">Error</span><span class="o">=</span><span class="nn">P</span><span class="p">::</span><span class="n">Error</span><span class="o">></span><span class="p">,</span>
<span class="n">R</span><span class="p">:</span> <span class="n">BoundedMany</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">SepByInnerParserCtor</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">G</span><span class="o">></span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="nn">P</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span>
<span class="nn">BoundedMany</span><span class="p">::</span><span class="nf">many</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">SepByInnerParserCtor</span> <span class="p">{</span>
<span class="n">item</span><span class="p">:</span> <span class="k">false</span><span class="p">,</span>
<span class="n">f</span><span class="p">:</span> <span class="n">f</span><span class="p">,</span>
<span class="n">sep</span><span class="p">:</span> <span class="n">sep</span><span class="p">,</span>
<span class="n">_i</span><span class="p">:</span> <span class="n">PhantomData</span><span class="p">,</span>
<span class="p">})</span>
<span class="p">}</span>
</code></pre>
</div>
<h2 id="performance">Performance</h2>
<p>Now, for the interesting part. How well does the conservative <code class="highlighter-rouge">impl Trait</code> version of Chomp perform?</p>
<p>Using the benchmarks present in the <a href="https://github.com/m4rw3r/chomp/tree/a8fe651dbf19c9cf1a53bfd36c48f150f01859e2/benches"><code class="highlighter-rouge">benches</code> directory</a> for both the <a href="https://github.com/m4rw3r/chomp/tree/a8fe651dbf19c9cf1a53bfd36c48f150f01859e2">input trait branch</a> and the <a href="https://github.com/m4rw3r/chomp/tree/68fa6941cf01b13280f1c692817516a022891c45/benches"><code class="highlighter-rouge">impl Trait</code> branch</a> as well as using the <code class="highlighter-rouge">http_parser.rs</code> example used in some of the <a href="http://m4rw3r.github.io/parser-combinators-road-chomp-0-1/">previous</a> <a href="http://m4rw3r.github.io/parser-combinator-experiments-rust/">posts</a> we obtain the following numbers:</p>
<table>
<thead>
<tr>
<th>Test</th>
<th style="text-align: right"><code class="highlighter-rouge">impl Trait</code></th>
<th style="text-align: left">(+/-)</th>
<th style="text-align: right">PR: input trait</th>
<th style="text-align: left">(+/-)</th>
</tr>
</thead>
<tbody>
<tr>
<td>count_vec_10k</td>
<td style="text-align: right">6,802 ns/iter</td>
<td style="text-align: left">(+/- 2,478)</td>
<td style="text-align: right">6,856 ns/iter</td>
<td style="text-align: left">(+/- 694)</td>
</tr>
<tr>
<td>count_vec_10k_maybe_incomplete</td>
<td style="text-align: right">6,094 ns/iter</td>
<td style="text-align: left">(+/- 2,092)</td>
<td style="text-align: right">5,887 ns/iter</td>
<td style="text-align: left">(+/- 1,828)</td>
</tr>
<tr>
<td>count_vec_1k</td>
<td style="text-align: right">725 ns/iter</td>
<td style="text-align: left">(+/- 265)</td>
<td style="text-align: right">724 ns/iter</td>
<td style="text-align: left">(+/- 106)</td>
</tr>
<tr>
<td>many1_vec_10k</td>
<td style="text-align: right">6,576 ns/iter</td>
<td style="text-align: left">(+/- 1,741)</td>
<td style="text-align: right">6,574 ns/iter</td>
<td style="text-align: left">(+/- 1,518)</td>
</tr>
<tr>
<td>many1_vec_10k_maybe_incomplete</td>
<td style="text-align: right">6,590 ns/iter</td>
<td style="text-align: left">(+/- 2,868)</td>
<td style="text-align: right">6,698 ns/iter</td>
<td style="text-align: left">(+/- 1,254)</td>
</tr>
<tr>
<td>many1_vec_1k</td>
<td style="text-align: right">958 ns/iter</td>
<td style="text-align: left">(+/- 128)</td>
<td style="text-align: right">961 ns/iter</td>
<td style="text-align: left">(+/- 164)</td>
</tr>
<tr>
<td>many_vec_10k</td>
<td style="text-align: right">6,560 ns/iter</td>
<td style="text-align: left">(+/- 918)</td>
<td style="text-align: right">6,609 ns/iter</td>
<td style="text-align: left">(+/- 1,104)</td>
</tr>
<tr>
<td>many_vec_10k_maybe_incomplete</td>
<td style="text-align: right">6,641 ns/iter</td>
<td style="text-align: left">(+/- 2,140)</td>
<td style="text-align: right">6,657 ns/iter</td>
<td style="text-align: left">(+/- 1,750)</td>
</tr>
<tr>
<td>many_vec_1k</td>
<td style="text-align: right">948 ns/iter</td>
<td style="text-align: left">(+/- 108)</td>
<td style="text-align: right">957 ns/iter</td>
<td style="text-align: left">(+/- 36)</td>
</tr>
<tr>
<td>multiple_requests</td>
<td style="text-align: right">44,617 ns/iter</td>
<td style="text-align: left">(+/- 1,953)</td>
<td style="text-align: right">44,681 ns/iter</td>
<td style="text-align: left">(+/- 5,344)</td>
</tr>
<tr>
<td>single_request</td>
<td style="text-align: right">606 ns/iter</td>
<td style="text-align: left">(+/- 96)</td>
<td style="text-align: right">610 ns/iter</td>
<td style="text-align: left">(+/- 175)</td>
</tr>
<tr>
<td>single_request_large</td>
<td style="text-align: right">980 ns/iter</td>
<td style="text-align: left">(+/- 329)</td>
<td style="text-align: right">979 ns/iter</td>
<td style="text-align: left">(+/- 52)</td>
</tr>
<tr>
<td>single_request_minimal</td>
<td style="text-align: right">107 ns/iter</td>
<td style="text-align: left">(+/- 9)</td>
<td style="text-align: right">114 ns/iter</td>
<td style="text-align: left">(+/- 7)</td>
</tr>
<tr>
<td>http_parser.rs, 204 MB</td>
<td style="text-align: right">~0.548 s</td>
<td style="text-align: left"> </td>
<td style="text-align: right">~0.559 s</td>
<td style="text-align: left"> </td>
</tr>
</tbody>
</table>
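<p>These numbers are very promising: the <code class="highlighter-rouge">impl Trait</code> version is on par with the input trait branch in every benchmark, with the differences falling inside the reported variance, and that includes the “real-world” usage in the HTTP-parser reading from a file on disk.</p>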
<p>Kudos to Eddyb and everyone involved who managed to make <code class="highlighter-rouge">impl Trait</code> happen!</p>
<h1>Rust: The <code class="highlighter-rouge">?</code> operator</h1>
<p>2016-01-28</p>
<p>For people who are not familiar with Haskell or Scala, Rust’s <code class="highlighter-rouge">Option</code> and <code class="highlighter-rouge">Result</code> types might
feel a bit cumbersome and verbose to work with. To make it easier and less verbose to use them
the RFC <a href="https://github.com/rust-lang/rfcs/pull/243">PR #243: Trait-based exception handling</a> has
been proposed.</p>
<p>In this blog post I will go through some basics of the RFC and then compare with a hypothetical
<code class="highlighter-rouge">do</code>-notation.</p>
<p>The RFC proposes a <code class="highlighter-rouge">?</code> operator which is a compiler-assisted rewrite of expressions around <code class="highlighter-rouge">?</code>
characters. It is a unary suffix operator which can be placed on an expression to unwrap the value
on the left hand side of <code class="highlighter-rouge">?</code> while propagating any error through an early return:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="err">?</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">)</span>
</code></pre>
</div>
<p>This would be transformed to:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">match</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">Ok</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">=></span> <span class="n">t</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">),</span>
<span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">=></span> <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="nf">.into</span><span class="p">()),</span>
<span class="p">}</span>
</code></pre>
</div>
<p>On its own <code class="highlighter-rouge">?</code> is just syntactic sugar for the <code class="highlighter-rouge">try!</code> macro, making it easier to write code
chaining expressions which can fail:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nd">try!</span><span class="p">(</span><span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">))</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">)</span>
</code></pre>
</div>
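<p>As a rough illustration of the rewrite, the following sketch assumes a toolchain in which the <code class="highlighter-rouge">?</code> operator is available, and uses string-to-integer parsing instead of file I/O so that it has no side effects. The function names are hypothetical; the second function spells out by hand the <code class="highlighter-rouge">match</code> that the first one relies on.</p>

```rust
use std::num::ParseIntError;

// Written with `?`: unwrap the `Ok` value, or return the error early.
fn double_with_question_mark(s: &str) -> Result<i32, ParseIntError> {
    Ok(s.trim().parse::<i32>()? * 2)
}

// The same logic with the early return written out as a `match`,
// including the `From` conversion applied to the error.
fn double_with_match(s: &str) -> Result<i32, ParseIntError> {
    let n = match s.trim().parse::<i32>() {
        Ok(n) => n,
        Err(e) => return Err(From::from(e)),
    };
    Ok(n * 2)
}
```

<p>Both functions return the same value for the same input, whether parsing succeeds or fails.</p>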
<h1 id="try-and-catch"><code class="highlighter-rouge">try</code> and <code class="highlighter-rouge">catch</code></h1>
<p>The RFC also details a <code class="highlighter-rouge">try</code>-<code class="highlighter-rouge">catch</code> expression which would “catch” any early returns performed by
the <code class="highlighter-rouge">?</code> operator. Essentially the early returns would jump to the <code class="highlighter-rouge">catch</code> block and the whole
<code class="highlighter-rouge">try</code>-<code class="highlighter-rouge">catch</code> expression would assume that value. If no <code class="highlighter-rouge">catch</code> block is provided the <code class="highlighter-rouge">try</code> block
will return a wrapped result:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="n">try</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">f</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="err">?</span><span class="p">;</span>
<span class="n">f</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">)</span><span class="err">?</span>
<span class="p">}</span>
<span class="c">// can also be written as</span>
<span class="n">try</span> <span class="p">{</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="err">?</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world"</span><span class="p">)</span><span class="err">?</span> <span class="p">}</span>
</code></pre>
</div>
<p>Note that the <code class="highlighter-rouge">?</code> is required on the last line since we want a <code class="highlighter-rouge">Result<(), io::Error></code>, not a
<code class="highlighter-rouge">Result<Result<(), io::Error>, io::Error></code>. The <code class="highlighter-rouge">try</code> block automatically re-wraps the
value of the block if there is no <code class="highlighter-rouge">catch</code> block, so that the whole expression assumes
a <code class="highlighter-rouge">Result<T, E></code> without the need to wrap the return value yourself.</p>
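<p>The <code class="highlighter-rouge">try</code> block described here is the RFC's proposed syntax. On a toolchain that has <code class="highlighter-rouge">?</code> but no such block, a similar scoping effect can be approximated with an immediately invoked closure. This is a sketch of that workaround, not the RFC's semantics: the function name is hypothetical, and unlike the proposed block the closure does not re-wrap its final value, so the <code class="highlighter-rouge">Ok</code> is written explicitly.</p>

```rust
use std::num::ParseIntError;

// Hypothetical helper: add two numbers given as strings, falling back
// to 0 if either of them fails to parse.
fn sum_or_zero(a: &str, b: &str) -> i32 {
    // The closure plays the role of the `try` block: a failing `?`
    // returns from the closure only, not from `sum_or_zero`.
    let result = (|| -> Result<i32, ParseIntError> {
        let x = a.parse::<i32>()?;
        let y = b.parse::<i32>()?;
        Ok(x + y)
    })();

    // Execution continues here in both the success and the error case.
    result.unwrap_or(0)
}
```

<p>The explicit return type on the closure tells the compiler which error type the <code class="highlighter-rouge">?</code> conversions should target.</p>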
<p>Adding the <code class="highlighter-rouge">catch</code> would be equivalent to using <code class="highlighter-rouge">Result::or_else</code> with <code class="highlighter-rouge">try</code> and <code class="highlighter-rouge">match</code>:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="n">try</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">f</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="err">?</span><span class="p">;</span>
<span class="n">f</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">)</span><span class="err">?</span>
<span class="p">}</span>
<span class="n">catch</span> <span class="p">{</span>
<span class="c">// we only have one type to match on</span>
<span class="n">e</span> <span class="k">=></span> <span class="p">{</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>Is equivalent to:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="n">try</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">f</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="err">?</span><span class="p">;</span>
<span class="n">f</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">)</span><span class="err">?</span>
<span class="p">}</span><span class="nf">.or_else</span><span class="p">(|</span><span class="n">e</span><span class="p">|</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
<span class="p">)</span>
</code></pre>
</div>
<p>The difference here is that any <code class="highlighter-rouge">return</code> inside of <code class="highlighter-rouge">Result::or_else</code> cannot immediately result in
an early return.</p>
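<p>The difference can be sketched on stable Rust, using an immediately invoked closure as a stand-in for the <code class="highlighter-rouge">try</code>-block. The function names <code class="highlighter-rouge">doubled_plus_one</code> and <code class="highlighter-rouge">doubled_or_bail</code> are made up for this illustration and are not part of the proposal:</p>

```rust
use std::num::ParseIntError;

// Hypothetical example: the closure `attempt` stands in for a `try` block,
// and `or_else` stands in for `catch`.
fn doubled_plus_one(input: &str) -> i32 {
    let attempt = || -> Result<i32, ParseIntError> {
        let n = input.trim().parse::<i32>()?;
        Ok(n * 2)
    };

    // This `return` only leaves the closure passed to `or_else`;
    // `doubled_plus_one` itself keeps running afterwards.
    let recovered: Result<i32, ParseIntError> = attempt().or_else(|_e| return Ok(0));

    recovered.unwrap_or(-1) + 1
}

// With a plain `match`, a `return` in an arm leaves the enclosing function.
fn doubled_or_bail(input: &str) -> i32 {
    let n = match input.trim().parse::<i32>() {
        Ok(n) => n,
        Err(_) => return -1,
    };

    n * 2
}
```

<p>Calling <code class="highlighter-rouge">doubled_plus_one("x")</code> still reaches the final <code class="highlighter-rouge">+ 1</code>, whereas <code class="highlighter-rouge">doubled_or_bail("x")</code> exits at the <code class="highlighter-rouge">return</code>.</p>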
<p>The <code class="highlighter-rouge">?</code> also allows us to use it at an arbitrary nesting within the <code class="highlighter-rouge">try</code> block (and in code in
general):</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">logging_on</span><span class="p">()</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="nb">bool</span><span class="p">,</span> <span class="nn">io</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="k">fn</span> <span class="nf">read_values</span><span class="p">()</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="n">SomeData</span><span class="p">,</span> <span class="nn">io</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="k">fn</span> <span class="nf">log_values</span><span class="p">(</span><span class="n">values</span><span class="p">:</span> <span class="o">&</span><span class="n">SomeData</span><span class="p">)</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="p">(),</span> <span class="nn">io</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="n">try</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nf">read_values</span><span class="p">()</span><span class="err">?</span><span class="p">;</span>
<span class="k">if</span> <span class="nf">logging_on</span><span class="p">()</span><span class="err">?</span> <span class="p">{</span>
<span class="nf">log_values</span><span class="p">(</span><span class="o">&</span><span class="n">data</span><span class="p">)</span><span class="err">?</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">data</span>
<span class="p">}</span>
</code></pre>
</div>
<h1 id="do-notation">Do-notation</h1>
<p>So-called <code class="highlighter-rouge">do</code>-notation is syntactic sugar which allows us to write statements and expressions
dealing with computation within a context. For example, values of types like <code class="highlighter-rouge">Option</code> and
<code class="highlighter-rouge">Result</code> enable us to perform operations inside of the <code class="highlighter-rouge">do</code>-expression without having to
worry about their failure state:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">do</span> <span class="p">{</span>
<span class="k">mut</span> <span class="n">f</span> <span class="o"><-</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">);</span>
<span class="n">f</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">)</span>
<span class="p">}</span>
</code></pre>
</div>
<p>The first line inside of the <code class="highlighter-rouge">do</code>-block is a so called monadic bind: it will bind the value
contained inside of the type on the right of <code class="highlighter-rouge"><-</code> to the identifier on the left for the rest of
the block. The result of the rest of the block will be merged with the context from the value on
the right of <code class="highlighter-rouge"><-</code>. In the case of <code class="highlighter-rouge">Result</code> and <code class="highlighter-rouge">Option</code> this is very simple and would just not
evaluate the rest of the block if the right hand side is of an error-variant.</p>
<p>The second line is an expression which is evaluated within the context of the first line: <code class="highlighter-rouge">f</code>
is available and can be mutated and the result is another <code class="highlighter-rouge">Result</code> which will be returned to the
monadic-bind method of the first <code class="highlighter-rouge">Result</code> value for merging (in <code class="highlighter-rouge">Result</code> this would be a no-op
since there is no state to merge).</p>
<p>The result of the <code class="highlighter-rouge">do</code>-block expression is <code class="highlighter-rouge">Result<(), io::Error></code>, since that is the return value
of <code class="highlighter-rouge">write_all</code>. The two expressions are compatible since they both return a value of type
<code class="highlighter-rouge">Result<T, io::Error></code>.</p>
<p>The above code desugars to:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="nf">.and_then</span><span class="p">(|</span><span class="k">mut</span> <span class="n">f</span><span class="p">|</span>
<span class="n">f</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">))</span>
</code></pre>
</div>
<p>Each expression in the <code class="highlighter-rouge">do</code>-notation above evaluates to some <code class="highlighter-rouge">Result<T, io::Error></code> (for any <code class="highlighter-rouge">T</code>) which
means that when adding expressions not resulting in the <code class="highlighter-rouge">Result<T, io::Error></code> type they need to be
wrapped (this is called “lifting” in Haskell terminology) to produce a <code class="highlighter-rouge">Result</code> which then fits into the
<code class="highlighter-rouge">do</code>-block:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">let</span> <span class="n">h</span> <span class="o">=</span> <span class="s">"Hello"</span><span class="nf">.to_owned</span><span class="p">();</span>
<span class="k">do</span> <span class="p">{</span>
<span class="k">mut</span> <span class="n">f</span> <span class="o"><-</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">);</span>
<span class="n">s</span> <span class="o"><-</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">h</span> <span class="o">+</span> <span class="s">" world!"</span><span class="p">);</span>
<span class="n">f</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">s</span><span class="nf">.as_bytes</span><span class="p">())</span>
<span class="p">}</span>
</code></pre>
</div>
<p>In the <code class="highlighter-rouge">try</code>-block we could just add it as usual:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">let</span> <span class="n">h</span> <span class="o">=</span> <span class="s">"Hello"</span><span class="nf">.to_owned</span><span class="p">();</span>
<span class="n">try</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">f</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="err">?</span><span class="p">;</span>
<span class="k">let</span> <span class="n">s</span> <span class="o">=</span> <span class="n">h</span> <span class="o">+</span> <span class="s">" world!"</span><span class="p">;</span>
<span class="n">f</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">s</span><span class="nf">.as_bytes</span><span class="p">())</span><span class="err">?</span>
<span class="p">}</span>
</code></pre>
</div>
<p>In the code examples above it would be more suitable to just move the expression assigned to
<code class="highlighter-rouge">s</code> into the call to <code class="highlighter-rouge">write_all</code>. It would also be desirable to allow the use of normal
<code class="highlighter-rouge">let</code>-binds inside of <code class="highlighter-rouge">do</code>-blocks to allow the declaration of variables without having to use
monadic bind.</p>
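<p>On stable Rust the lifting step can be tried out with nested <code class="highlighter-rouge">and_then</code> calls, which is what a <code class="highlighter-rouge">do</code>-block would desugar to. This is only a sketch: <code class="highlighter-rouge">parse</code> and <code class="highlighter-rouge">sum_with_offset</code> are hypothetical helpers, and integer parsing is used instead of file I/O to keep the example self-contained:</p>

```rust
use std::num::ParseIntError;

// Hypothetical helper producing a `Result` to bind on.
fn parse(s: &str) -> Result<i32, ParseIntError> {
    s.trim().parse::<i32>()
}

// Roughly corresponds to:
//
//   do {
//       x      <- parse(a);
//       offset <- Ok(10);   // a plain value lifted into `Result`
//       y      <- parse(b);
//       Ok(x + y + offset)
//   }
fn sum_with_offset(a: &str, b: &str) -> Result<i32, ParseIntError> {
    parse(a).and_then(|x| {
        // The plain value has to be wrapped in `Ok` to fit into the chain.
        Ok::<i32, ParseIntError>(10).and_then(|offset| {
            parse(b).map(|y| x + y + offset)
        })
    })
}
```

<p>The final bind is written with <code class="highlighter-rouge">map</code>, which is the same as binding and then wrapping the result in <code class="highlighter-rouge">Ok</code>.</p>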
<p>Another difference is that <code class="highlighter-rouge">do</code>-notation only works on the statement-level whereas <code class="highlighter-rouge">?</code> works at
any nesting inside of the <code class="highlighter-rouge">try</code>-block. A direct translation of the nesting-example of the
<code class="highlighter-rouge">try</code>-block would look like this:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">logging_on</span><span class="p">()</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="nb">bool</span><span class="p">,</span> <span class="nn">io</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="k">fn</span> <span class="nf">read_values</span><span class="p">()</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="n">SomeData</span><span class="p">,</span> <span class="nn">io</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="k">fn</span> <span class="nf">log_values</span><span class="p">(</span><span class="n">values</span><span class="p">:</span> <span class="o">&</span><span class="n">SomeData</span><span class="p">)</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="p">(),</span> <span class="nn">io</span><span class="p">::</span><span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="k">do</span> <span class="p">{</span>
<span class="n">data</span> <span class="o"><-</span> <span class="nf">read_values</span><span class="p">();</span>
<span class="k">log</span> <span class="o"><-</span> <span class="nf">logging_on</span><span class="p">();</span>
<span class="k">if</span> <span class="k">log</span> <span class="p">{</span>
<span class="nf">log_values</span><span class="p">(</span><span class="o">&</span><span class="n">data</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nf">Ok</span><span class="p">(())</span>
<span class="p">};</span>
<span class="nf">Ok</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="p">}</span>
</code></pre>
</div>
<p>Note that we cannot use <code class="highlighter-rouge">logging_on</code> directly as the condition of the <code class="highlighter-rouge">if</code>-expression and that the
<code class="highlighter-rouge">if</code>-expression needs to return a <code class="highlighter-rouge">Result</code> from both branches.</p>
<p>Utility functions can easily alleviate some of this, but Higher Kinded Types are required to make
many of them generic enough.</p>
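<p>For comparison, the nesting example can be written on current stable Rust both with explicit <code class="highlighter-rouge">and_then</code> calls (what the <code class="highlighter-rouge">do</code>-block would desugar to) and with <code class="highlighter-rouge">?</code> inside a function. The function bodies and the extra <code class="highlighter-rouge">log</code> parameter are assumptions made to keep the sketch self-contained; the original signatures leave them elided:</p>

```rust
use std::io;

// Placeholder data type and function bodies, assumed for illustration only.
struct SomeData(Vec<u32>);

fn logging_on() -> Result<bool, io::Error> {
    Ok(true)
}

fn read_values() -> Result<SomeData, io::Error> {
    Ok(SomeData(vec![1, 2, 3]))
}

// Logs into a `Vec` (an assumption) instead of performing real I/O.
fn log_values(values: &SomeData, log: &mut Vec<String>) -> Result<(), io::Error> {
    log.push(format!("{:?}", values.0));
    Ok(())
}

// Bind-style: every step is a closure and both `if` branches yield a `Result`.
fn read_and_maybe_log(log: &mut Vec<String>) -> Result<SomeData, io::Error> {
    read_values().and_then(|data| {
        logging_on().and_then(|on| {
            let logged = if on { log_values(&data, log) } else { Ok(()) };

            logged.map(|()| data)
        })
    })
}

// `?`-style: the operator can be used at any nesting level.
fn read_and_maybe_log_q(log: &mut Vec<String>) -> Result<SomeData, io::Error> {
    let data = read_values()?;

    if logging_on()? {
        log_values(&data, log)?;
    }

    Ok(data)
}
```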
<h1 id="monad-with-state">Monad with state</h1>
<p>The <code class="highlighter-rouge">Result</code> and <code class="highlighter-rouge">Option</code> monads only carry state in the type, eg. <code class="highlighter-rouge">Option</code> has <code class="highlighter-rouge">Some(T)</code> and
<code class="highlighter-rouge">None</code> but there is no extra value describing any state. But there are monads which carry state
as another data-item, like the State, Iterator and Parser monads (the State monad would probably
not be very interesting for Rust, but list-comprehension and parser combinators certainly are).</p>
<p>The signature of the proposed <code class="highlighter-rouge">?</code> is “<code class="highlighter-rouge">M<T, E> -> T</code>”, essentially the same as <code class="highlighter-rouge">unwrap</code> but with
the invisible addition that an early return or jump will be performed if the type decides that it
is in an “error” state.</p>
<p>Monadic bind on the other hand has the signature <code class="highlighter-rouge">M<T> -> (Fn*(T) -> M<U>) -> M<U></code>; we have an
initial context <code class="highlighter-rouge">M<T></code> which is then unwrapped to let the closure <code class="highlighter-rouge">Fn*(T) -> M<U></code> act on it to
produce another wrapped value and then the returned <code class="highlighter-rouge">M<U></code> is merged with the remaining state of
the original <code class="highlighter-rouge">M<T></code> (this is a simplification). Both the unwrapping and the merging of the value are
under the control of the type which implements the bind-operator, and the original state is still available,
which means that context can be carried through in an appropriate way.</p>
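<p>A minimal sketch of a bind which carries state as a separate data item could look like the following. The <code class="highlighter-rouge">Logged</code> type and its methods are invented for this illustration (it is essentially a writer-style monad); it is not taken from the proposal or from any library:</p>

```rust
// Hypothetical type: a value plus a log which acts as the carried state.
struct Logged<T> {
    value: T,
    log: Vec<String>,
}

impl<T> Logged<T> {
    // M<T> -> (FnOnce(T) -> M<U>) -> M<U>
    fn bind<U, F>(self, f: F) -> Logged<U>
    where
        F: FnOnce(T) -> Logged<U>,
    {
        // Unwrap the value but hold on to the state of the original M<T>.
        let Logged { value, mut log } = self;
        let next = f(value);

        // Merge the state of the returned M<U> into the original state.
        log.extend(next.log);

        Logged { value: next.value, log }
    }
}

// A step which produces a value and records what it did.
fn step(name: &str, n: i32) -> Logged<i32> {
    Logged { value: n, log: vec![format!("{} -> {}", name, n)] }
}

fn pipeline() -> Logged<i32> {
    step("start", 1)
        .bind(|a| step("double", a * 2))
        .bind(|b| step("add", b + 3))
}
```

<p>The closures never see the log, yet every entry ends up in the final value since <code class="highlighter-rouge">bind</code> controls both the unwrapping and the merging.</p>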
<p>The closure denoted <code class="highlighter-rouge">Fn*</code> is one of the three closure types <code class="highlighter-rouge">FnOnce</code>, <code class="highlighter-rouge">FnMut</code> and <code class="highlighter-rouge">Fn</code> since
different types of bind-implementations have different requirements (eg. <code class="highlighter-rouge">Result</code> would use
<code class="highlighter-rouge">FnOnce</code> since it is just run once, whereas an iterator would need <code class="highlighter-rouge">FnMut</code> since the closure would
be executed once for each item). Letting a trait signature be generic over closures in this way is
not yet possible in Rust, but once it is, <code class="highlighter-rouge">do</code>-notation
should not be far off.</p>
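<p>The differing closure requirements are already visible in the standard library: <code class="highlighter-rouge">Option::and_then</code> accepts an <code class="highlighter-rouge">FnOnce</code> whereas <code class="highlighter-rouge">Iterator::flat_map</code> requires an <code class="highlighter-rouge">FnMut</code>. A small sketch, where <code class="highlighter-rouge">bind_once</code> and <code class="highlighter-rouge">bind_many</code> are hypothetical names:</p>

```rust
// `Option::and_then` takes an `FnOnce`, so the closure is allowed to consume
// a captured value (here `prefix` is moved into the resulting string).
fn bind_once(name: Option<String>) -> Option<String> {
    let prefix = String::from("Hello, ");

    name.and_then(move |n| Some(prefix + &n))
}

// `Iterator::flat_map` takes an `FnMut`, since the closure is called once per
// item; it may mutate captured state but cannot consume it.
fn bind_many(items: Vec<u32>) -> (Vec<u32>, usize) {
    let mut calls = 0;

    let out: Vec<u32> = items
        .into_iter()
        .flat_map(|x| {
            calls += 1;

            vec![x, x * 10]
        })
        .collect();

    (out, calls)
}
```

<p>Passing the closure from <code class="highlighter-rouge">bind_once</code> to <code class="highlighter-rouge">flat_map</code> would be rejected by the compiler since it can only be called once, which is why a single bind signature cannot cover both types today.</p>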
<p>The proposal also mentions that the signature of <code class="highlighter-rouge">?</code> inside of <code class="highlighter-rouge">try</code> could be written as
<code class="highlighter-rouge">M<T, E> -> (FnOnce(T) -> R, FnOnce(E) -> R) -> R</code> if the compiler rewrites it to the trait-part of
the proposal. This <code class="highlighter-rouge">R</code> would then be wrapped using the static method <code class="highlighter-rouge">M<R, E>::normal</code> once it is
returned from the <code class="highlighter-rouge">try</code>-block, which has the signature <code class="highlighter-rouge">R -> M<R, E></code>.</p>
<p>This is very similar to monadic bind, but there are some key differences in its use, and the return value
<code class="highlighter-rouge">R</code> is not actually merged into the monadic context <code class="highlighter-rouge">M</code>. By adding a <code class="highlighter-rouge">= M<U></code> constraint to <code class="highlighter-rouge">R</code> we
can allow the wrapping method to investigate and update the state of any returned <code class="highlighter-rouge">M<U></code> and
actually provide a proper monadic bind for the type. Though the proposal is lacking one very
important piece which is needed to make it at all possible, and that is how the state should be
carried over from the original <code class="highlighter-rouge">M<T, E></code> to the new <code class="highlighter-rouge">M<R, E></code>.</p>
<p>The trait-version of the proposal for <code class="highlighter-rouge">?</code> and <code class="highlighter-rouge">try</code>-blocks is essentially a do-notation for a
bi-monad (ie. a monad carrying two values) but without the needed restrictions on the types or
any way of carrying state from the left hand side to the return value. This makes it impossible
to actually use for anything but the most simple monad types.</p>
<h1 id="try-vs-do"><code class="highlighter-rouge">try</code> vs <code class="highlighter-rouge">do</code></h1>
<p>Differences:</p>
<ul>
<li><code class="highlighter-rouge">do</code> requires users to lift expressions into the used type whereas <code class="highlighter-rouge">try</code> requires users to unwrap
values <em>out</em> of the type.</li>
<li><code class="highlighter-rouge">do</code> only works on one level, requiring nested expressions to use their own <code class="highlighter-rouge">do</code>-blocks when
necessary. <code class="highlighter-rouge">try</code> allows the <code class="highlighter-rouge">?</code> to short-circuit the whole thing whenever needed.</li>
<li><code class="highlighter-rouge">try</code> automatically wraps the resulting value from the block whereas <code class="highlighter-rouge">do</code> requires the block to
return a wrapped value.</li>
<li><code class="highlighter-rouge">do</code> allows the type to control the state-management between statements, <code class="highlighter-rouge">try</code> explicitly
disallows the carrying of state between the original type and the resulting type.</li>
</ul>
<p>Similarities:</p>
<ul>
<li>Both result in a wrapped value</li>
</ul>
<h1 id="alternatives">Alternatives</h1>
<p>The <code class="highlighter-rouge">try</code>-blocks and <code class="highlighter-rouge">?</code>-operator are intrinsically tied to execution around a “failure-state”
and do not consider any other type of state. There are alternatives which are more general and
would open up for different types of contextual execution.</p>
<p>Personally I do not see any gain from the <code class="highlighter-rouge">catch</code> itself, since it can easily be constructed using
existing constructs in the language. The <code class="highlighter-rouge">try</code>-blocks or <code class="highlighter-rouge">do</code>-notation are another matter:
if implemented properly, either would be a nice composable way of dealing with computations in a
context.</p>
<h2 id="method-position-macros">Method-position macros</h2>
<p>Method-position macros could make the <code class="highlighter-rouge">?</code> operator without <code class="highlighter-rouge">try</code>-blocks superfluous, since we would
be able to write the following:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nn">File</span><span class="p">::</span><span class="nf">create</span><span class="p">(</span><span class="s">"foo.txt"</span><span class="p">)</span><span class="py">.try</span><span class="o">!</span><span class="nf">.write_all</span><span class="p">(</span><span class="n">b</span><span class="s">"Hello world!"</span><span class="p">)</span><span class="py">.try</span><span class="o">!</span><span class="p">;</span>
</code></pre>
</div>
<p>It does not replace the <code class="highlighter-rouge">?</code> + <code class="highlighter-rouge">try</code>-blocks functionality but could serve as a good complement to
some kind of <code class="highlighter-rouge">do</code>-notation.</p>
<h2 id="do-notation-1"><code class="highlighter-rouge">do</code>-notation</h2>
<p>As detailed above it would be much more composable with different types compared to the
<code class="highlighter-rouge">try</code>-blocks which would be limited to just short-circuiting types with simple control-flow.</p>
<p>If we take a few of the examples from the RFC comments and look at how they read when
we use <code class="highlighter-rouge">do</code>-notation, we can see that they are not so bad:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">self</span><span class="py">.type_variables</span><span class="nf">.borrow</span><span class="p">()</span>
<span class="nf">.probe</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>
<span class="nf">.map</span><span class="p">(|</span><span class="n">t</span><span class="p">|</span> <span class="k">self</span><span class="nf">.shallow_resolve</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
<span class="nf">.unwrap_or</span><span class="p">(</span><span class="n">typ</span><span class="p">)</span>
<span class="c">// is equivalent to:</span>
<span class="n">try</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">t</span> <span class="o">=</span> <span class="k">self</span><span class="py">.type_variables</span><span class="nf">.borrow</span><span class="p">()</span><span class="nf">.probe</span><span class="p">(</span><span class="n">v</span><span class="p">)</span><span class="err">?</span><span class="p">;</span>
<span class="k">self</span><span class="nf">.shallow_resolve</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
<span class="p">}</span> <span class="n">catch</span> <span class="p">{</span>
<span class="n">_</span> <span class="k">=></span> <span class="n">typ</span>
<span class="p">}</span>
<span class="c">// which is equivalent to:</span>
<span class="k">do</span> <span class="p">{</span>
<span class="n">t</span> <span class="o"><-</span> <span class="k">self</span><span class="py">.type_variables</span><span class="nf">.borrow</span><span class="p">()</span><span class="nf">.probe</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
<span class="nf">Ok</span><span class="p">(</span><span class="k">self</span><span class="nf">.shallow_resolve</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
<span class="p">}</span><span class="nf">.unwrap_or</span><span class="p">(</span><span class="n">typ</span><span class="p">)</span>
</code></pre>
</div>
<p>Making an <code class="highlighter-rouge">Option<(A, B)></code> from two <code class="highlighter-rouge">Option</code> values:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="c">// current:</span>
<span class="n">a</span><span class="nf">.and_then</span><span class="p">(|</span><span class="n">x</span><span class="p">|</span> <span class="n">b</span><span class="nf">.map</span><span class="p">(|</span><span class="n">y</span><span class="p">|</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)))</span>
<span class="c">// try + ?:</span>
<span class="n">try</span> <span class="p">{</span> <span class="p">(</span><span class="n">a</span><span class="err">?</span><span class="p">,</span> <span class="n">b</span><span class="err">?</span><span class="p">)</span> <span class="p">}</span>
<span class="c">// do and map:</span>
<span class="k">do</span> <span class="p">{</span> <span class="n">x</span> <span class="o"><-</span> <span class="n">a</span><span class="p">;</span> <span class="n">b</span><span class="nf">.map</span><span class="p">(|</span><span class="n">y</span><span class="p">|</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span> <span class="p">}</span>
<span class="c">// only do:</span>
<span class="k">do</span> <span class="p">{</span> <span class="n">x</span> <span class="o"><-</span> <span class="n">a</span><span class="p">;</span> <span class="n">y</span> <span class="o"><-</span> <span class="n">b</span><span class="p">;</span> <span class="nf">Some</span><span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span> <span class="p">}</span>
</code></pre>
</div>
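<p>The variants which do not need new syntax can be checked on current stable Rust, where <code class="highlighter-rouge">?</code> works on <code class="highlighter-rouge">Option</code> inside functions returning <code class="highlighter-rouge">Option</code> and <code class="highlighter-rouge">Option::zip</code> covers this particular case directly. The function names below are made up for the sketch:</p>

```rust
// The `and_then` + `map` version, i.e. what the `do`-block would desugar to.
fn pair_and_then<A, B>(a: Option<A>, b: Option<B>) -> Option<(A, B)> {
    a.and_then(|x| b.map(|y| (x, y)))
}

// Using `?`; the function body plays the role of the `try`-block, but the
// result has to be wrapped in `Some` by hand.
fn pair_question<A, B>(a: Option<A>, b: Option<B>) -> Option<(A, B)> {
    Some((a?, b?))
}

// The standard library helper for exactly this case.
fn pair_zip<A, B>(a: Option<A>, b: Option<B>) -> Option<(A, B)> {
    a.zip(b)
}
```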
<p>Multiple <code class="highlighter-rouge">try!</code> macros in a row would also be easier to deal with, especially if their
success-value is not needed:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="c">// from libsyntax, printing code fragments:</span>
<span class="nd">try!</span><span class="p">(</span><span class="k">self</span><span class="nf">.space_if_not_bol</span><span class="p">());</span>
<span class="nd">try!</span><span class="p">(</span><span class="k">self</span><span class="nf">.ibox</span><span class="p">(</span><span class="n">indent_unit</span><span class="p">));</span>
<span class="nd">try!</span><span class="p">(</span><span class="k">self</span><span class="nf">.word_nbsp</span><span class="p">(</span><span class="s">"let"</span><span class="p">));</span>
<span class="c">// can be written as:</span>
<span class="k">do</span> <span class="p">{</span>
<span class="k">self</span><span class="nf">.space_if_not_bol</span><span class="p">();</span>
<span class="k">self</span><span class="nf">.ibox</span><span class="p">(</span><span class="n">indent_unit</span><span class="p">);</span>
<span class="k">self</span><span class="nf">.word_nbsp</span><span class="p">(</span><span class="s">"let"</span><span class="p">)</span>
<span class="p">}</span><span class="py">.try</span><span class="o">!</span>
</code></pre>
</div>
<p>Personally I am a fan of <code class="highlighter-rouge">do</code>-notation since it is much more general than just a control-flow-specific
language construct and allows much more advanced ways of composing operations.</p>
<p><strong>EDIT:</strong> Posted on reddit: <a href="https://www.reddit.com/r/rust/comments/435572/blog_the_operator_and_try_vs_do/">/r/rust</a></p>
Parser Combinators: The road to Chomp 0.12015-11-28T00:00:00+00:00http://m4rw3r.github.io//parser-combinators-the-road-to-0_1<p>A few months ago I wrote some <a href="http://m4rw3r.github.io/parser-combinator-experiments-rust">articles</a> <a href="http://m4rw3r.github.io/parser-combinator-experiments-errors">about</a> <a href="http://m4rw3r.github.io/parser-combinator-experiments-part-3">parser-combinators</a>. Now I finally caved in and decided to settle with “good enough”.</p>
<p>I have now released the initial version of <a href="https://github.com/m4rw3r/chomp">Chomp</a>, a monadic-style parser-combinator compatible with stable Rust.</p>
<h1 id="monadic-style">Monadic-style?</h1>
<p>I call it monadic-style since it is not strictly monadic. Instead of implementing a real monad, I decided to go with the idea behind the third version of my <a href="https://github.com/m4rw3r/rust_parser_experiments/tree/third">parser combinator experiments</a> and explicitly thread the state through the parsers. To write a real monad which can carry state besides what is present in its type (eg. <code class="highlighter-rouge">Option</code>, <code class="highlighter-rouge">Result</code>) boxing the values and/or actions is almost mandatory.</p>
<p>In the case of a parser combinator like Parsec and Attoparsec and my own <a href="https://github.com/m4rw3r/rust_parser_experiments/blob/fourth/src/lib.rs">boxed implementation</a> it requires stacking of closures which means that we either have to settle for severely limited expressiveness in our parsers or that we have to box every single parser (in some cases both apply, see <code class="highlighter-rouge">Fn</code> vs <code class="highlighter-rouge">FnOnce</code> vs <code class="highlighter-rouge">FnMut</code>).</p>
<p>I selected manual threading of state instead of any of the other versions since it produced a parser with good performance, simple parsers, short function declarations (compare to stacking structs) for the parsers themselves as well as being straightforward to use.</p>
<h1 id="why-decide-to-write-chomp">Why decide to write Chomp?</h1>
<p>The goals for Chomp are to make a parser combinator which…</p>
<ul>
<li>…does not rely on lots of macros.</li>
<li>…does not allocate anything unless absolutely necessary.</li>
<li>…allows for monadic-style computation and composition.</li>
<li>…lends itself well to writing lots of small composable functions.</li>
<li>…hides the parser state from the user unless asked for.</li>
</ul>
<p>And then a few personal reasons:</p>
<ul>
<li>It is fun and a challenge.</li>
<li>Exploring what Rust is capable of.</li>
<li>I needed something to replace the hand-written parser in one of my own projects.</li>
</ul>
<h1 id="features">Features</h1>
<p>I am going to describe two of the more important features of Chomp: The approximation of linear types used to make sure the parser state is threaded through all the parsers and the macro which makes monadic composition so much more pleasant to write and read.</p>
<h2 id="input-and-parseresult-are-opaque-linear-types"><code class="highlighter-rouge">Input</code> and <code class="highlighter-rouge">ParseResult</code> are opaque linear types</h2>
<p>The idea of explicitly passing a state parameter through all functions is definitely not a new idea, but languages like <a href="https://en.wikipedia.org/wiki/Clean_%28programming_language%29">Clean</a> have taken this a step further and use something called <a href="https://en.wikipedia.org/wiki/Substructural_type_system#Linear_type_systems">linear types</a>. A linear type is a type which cannot be cloned or destroyed and has to be used exactly once. In a lazy language this will serialize all the side-effects but in a language like Rust we can use it to make sure that we do not let the state get misused nor forgotten.</p>
<p>Rust does not have support for linear types but its ownership model allows us to get close. The first thing we do is to limit the creation and destruction of values of the type to a library function which will pass it on to a closure and then expect the linear type back. The second thing is to prevent clones and copies; this is the default in Rust, so we just neglect to derive <code class="highlighter-rouge">Clone</code> or <code class="highlighter-rouge">Copy</code>. The third part, that the value is actually used, we cannot strictly enforce, but we can get close by using the <code class="highlighter-rouge">must_use</code> annotation.</p>
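<p>The three steps can be sketched in isolation as follows. This is not Chomp's actual code: the <code class="highlighter-rouge">linear</code> module, the <code class="highlighter-rouge">Token</code> type and <code class="highlighter-rouge">with_token</code> are hypothetical names used only to show the pattern:</p>

```rust
mod linear {
    // 3. `must_use` makes the compiler warn if a returned `Token` is ignored.
    // 2. No `Clone` or `Copy` is derived, so the value cannot be duplicated.
    #[must_use]
    pub struct Token {
        // Private field: code outside of this module cannot construct a `Token`.
        consumed: usize,
    }

    impl Token {
        // Takes `self` by value, so the old state cannot be used again.
        pub fn advance(self, n: usize) -> Token {
            Token { consumed: self.consumed + n }
        }
    }

    // 1. The only place where a `Token` is created and destroyed; the closure
    //    receives the token and has to hand one back.
    pub fn with_token<F>(f: F) -> usize
    where
        F: FnOnce(Token) -> Token,
    {
        let t = f(Token { consumed: 0 });

        t.consumed
    }
}
```

<p>A caller can then write <code class="highlighter-rouge">linear::with_token(|t| t.advance(3).advance(4))</code>, but has no way of creating a fresh <code class="highlighter-rouge">Token</code> to return instead of the one it was given.</p>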
<p>The parser type is actually a pair of linear types, <code class="highlighter-rouge">Input<'a, I></code> and <code class="highlighter-rouge">ParseResult<'a, I, T, E></code>. <code class="highlighter-rouge">Input</code> is a completely separate type to make sure that they are not confused and to prevent accidental data loss (eg. passing a <code class="highlighter-rouge">ParseResult</code> to a function which continues to parse without using the value), and to carry some extra state which <code class="highlighter-rouge">ParseResult</code> does not carry in all its variants.</p>
<p>Example of it in action with comments and full type annotations (skipping input and error type):</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="c">// `Stream::parse` accepts a closure taking an `Input` and expects a</span>
<span class="c">// `ParseResult` as a return value.</span>
<span class="k">let</span> <span class="n">r</span><span class="p">:</span> <span class="n">Result</span><span class="o"><</span><span class="n">MyData</span><span class="p">,</span> <span class="n">_</span><span class="o">></span> <span class="o">=</span> <span class="n">buffer</span><span class="nf">.parse</span><span class="p">(|</span><span class="n">i</span><span class="p">:</span> <span class="n">Input</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">|</span> <span class="k">-></span> <span class="n">ParseResult</span><span class="o"><</span><span class="n">_</span><span class="p">,</span> <span class="n">MyData</span><span class="p">,</span> <span class="n">_</span><span class="o">></span> <span class="p">{</span>
<span class="c">// We call a parser, this consumes `i` and moves the state into `d`.</span>
<span class="k">let</span> <span class="n">d</span><span class="p">:</span> <span class="n">ParseResult</span><span class="o"><</span><span class="n">_</span><span class="p">,</span> <span class="n">Value</span><span class="p">,</span> <span class="n">_</span><span class="o">></span> <span class="o">=</span> <span class="nf">some_parser</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="s">"some param"</span><span class="p">);</span>
<span class="c">// if we forget to use `d` here we get a warning, and probably also a</span>
<span class="c">// compile error since `buffer.parse` expects a `ParseResult` back with</span>
<span class="c">// the same lifetime as the initial `i`.</span>
<span class="n">d</span><span class="nf">.bind</span><span class="p">(|</span><span class="n">i2</span><span class="p">:</span> <span class="n">Input</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">,</span> <span class="n">t</span><span class="p">:</span> <span class="n">Value</span><span class="p">|</span> <span class="k">-></span> <span class="n">ParseResult</span><span class="o"><</span><span class="n">_</span><span class="p">,</span> <span class="n">MyData</span><span class="p">,</span> <span class="n">_</span><span class="o">></span> <span class="p">{</span>
<span class="c">// bind extracts the value of the `ParseResult` and also provides the</span>
<span class="c">// state we need to continue parsing, same rules apply as above.</span>
<span class="c">// Here we pass on `i2` to a parser and then `map` over the result:</span>
<span class="nf">decimal</span><span class="p">(</span><span class="n">i2</span><span class="p">)</span><span class="nf">.map</span><span class="p">(|</span><span class="n">n</span><span class="p">|</span> <span class="nf">MyData</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>
<span class="p">})</span>
<span class="c">// And here we hand the `ParseResult` back to `parse` which destructures it</span>
<span class="c">// for the user and also manages any input state.</span>
<span class="p">});</span>
</code></pre>
</div>
<p>To prevent the state from being observed in an uncontrolled manner there are no public properties or methods which allow the state to be seen. This makes it hard to read data out-of-band and disrupt the parsing, which in turn makes the high-level parsers very composable.</p>
<p>Of course you cannot write parsers without being able to observe the data which you are parsing. For this Chomp has the <a href="http://m4rw3r.github.io/chomp/chomp/primitives/index.html">primitives</a> module which contains traits to do just this. The <code class="highlighter-rouge">combinators</code> and <code class="highlighter-rouge">parsers</code> modules use this module to observe and modify the parser state in a controlled manner.</p>
<p>And it is a public module so writing these fundamental parsers is not exclusive to the Chomp crate.</p>
<h2 id="the-parse-macro">The <code class="highlighter-rouge">parse!</code> macro</h2>
<p><em>One macro to rule them all.</em></p>
<p>This feature was a must. I started detailing the syntax I wanted for it months ago but only recently got to implementing the macro itself. Flexibility and simplicity were key for the syntax, yet it needed to be powerful enough that people would not avoid using it whenever they needed some more powerful features.</p>
<p>Normally you would have to keep on chaining calls to <code class="highlighter-rouge">bind</code> and wrap your code in closure after closure like this (this is typical of monadic code):</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nf">take_while1</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">is_token</span><span class="p">)</span><span class="nf">.bind</span><span class="p">(|</span><span class="n">i</span><span class="p">,</span> <span class="n">method</span><span class="p">|</span>
<span class="nf">take_while1</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">is_space</span><span class="p">)</span><span class="nf">.then</span><span class="p">(|</span><span class="n">i</span><span class="p">|</span>
<span class="nf">take_while1</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">is_not_space</span><span class="p">)</span><span class="nf">.bind</span><span class="p">(|</span><span class="n">i</span><span class="p">,</span> <span class="n">uri</span><span class="p">|</span>
<span class="nf">take_while1</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">is_space</span><span class="p">)</span><span class="nf">.then</span><span class="p">(|</span><span class="n">i</span><span class="p">|</span>
<span class="nf">http_version</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="nf">.bind</span><span class="p">(|</span><span class="n">i</span><span class="p">,</span> <span class="n">version</span><span class="p">|</span>
<span class="n">i</span><span class="nf">.ret</span><span class="p">(</span><span class="n">Request</span> <span class="p">{</span>
<span class="n">method</span><span class="p">:</span> <span class="n">method</span><span class="p">,</span>
<span class="n">uri</span><span class="p">:</span> <span class="n">uri</span><span class="p">,</span>
<span class="n">version</span><span class="p">:</span> <span class="n">version</span><span class="p">,</span>
<span class="p">}))))))</span>
</code></pre>
</div>
<p>This is not easy for a human to parse: it is way too noisy and the assignment to values through <code class="highlighter-rouge">bind</code> is not easy to spot. If you read it line by line it might be simple, but not as a whole.</p>
<p>Using the <a href="https://github.com/m4rw3r/chomp/blob/6bb50e22513c6b670dd1c22ba144be2b6884c8ab/src/macros.rs#L76">parse!</a> macro we can instead write the above code like this:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nd">parse!</span><span class="p">{</span><span class="n">i</span><span class="p">;</span>
<span class="k">let</span> <span class="n">method</span> <span class="o">=</span> <span class="nf">take_while1</span><span class="p">(</span><span class="n">is_token</span><span class="p">);</span>
<span class="nf">take_while1</span><span class="p">(</span><span class="n">is_space</span><span class="p">);</span>
<span class="k">let</span> <span class="n">uri</span> <span class="o">=</span> <span class="nf">take_while1</span><span class="p">(</span><span class="n">is_not_space</span><span class="p">);</span>
<span class="nf">take_while1</span><span class="p">(</span><span class="n">is_space</span><span class="p">);</span>
<span class="k">let</span> <span class="n">version</span> <span class="o">=</span> <span class="nf">http_version</span><span class="p">();</span>
<span class="n">ret</span> <span class="n">Request</span> <span class="p">{</span>
<span class="n">method</span><span class="p">:</span> <span class="n">method</span><span class="p">,</span>
<span class="n">uri</span><span class="p">:</span> <span class="n">uri</span><span class="p">,</span>
<span class="n">version</span><span class="p">:</span> <span class="n">version</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>The result is the same, the performance is the same and the compile time is not affected much at all, but it is much easier to read and understand what is happening.</p>
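<p>To give a feel for how such a rewrite can work, here is a minimal sketch. It is not the actual <code class="highlighter-rouge">parse!</code> implementation: the hypothetical <code class="highlighter-rouge">chain!</code> macro below turns the same <code class="highlighter-rouge">let</code>/<code class="highlighter-rouge">ret</code> notation into nested <code class="highlighter-rouge">and_then</code> calls on <code class="highlighter-rouge">Option</code>, which stands in for <code class="highlighter-rouge">bind</code> on a parser. The helper names <code class="highlighter-rouge">digit</code>, <code class="highlighter-rouge">add_digits</code> and <code class="highlighter-rouge">second_digit</code> are also made up for the example.</p>

```rust
/// Hypothetical do-notation for `Option`; a stand-in for the idea behind
/// `parse!`, not its real definition.
macro_rules! chain {
    // `ret value` ends the block and wraps the value.
    (ret $e:expr) => {
        Some($e)
    };
    // `let name = action; rest` binds the result of `action` to `name`.
    (let $name:ident = $e:expr; $($rest:tt)*) => {
        $e.and_then(|$name| chain!($($rest)*))
    };
    // `action; rest` runs `action` and discards its result.
    ($e:expr; $($rest:tt)*) => {
        $e.and_then(|_| chain!($($rest)*))
    };
}

/// Succeeds only for the characters '0' to '9'.
fn digit(c: char) -> Option<u32> {
    c.to_digit(10)
}

/// Both characters must be digits; returns their sum.
fn add_digits(a: char, b: char) -> Option<u32> {
    chain! {
        let x = digit(a);
        let y = digit(b);
        ret x + y
    }
}

/// The first digit is required but its value is thrown away.
fn second_digit(a: char, b: char) -> Option<u32> {
    chain! {
        digit(a);
        let y = digit(b);
        ret y
    }
}
```

<p>Each <code class="highlighter-rouge">let</code> line becomes one closure, so the expanded code has the same shape as the hand-written chain of <code class="highlighter-rouge">bind</code> calls; the macro only hides the nesting.</p>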
<h1 id="performance">Performance</h1>
<p>The performance is pretty good, especially when using the specific buffering available in the <a href="http://m4rw3r.github.io/chomp/chomp/buffer/index.html">buffer</a> module of Chomp.</p>
<p>The <a href="https://raw.githubusercontent.com/Geal/nom/95c228c75c2964b20f0e1e42ee11a3877c1725ef/assets/bigbuckbunny.mp4">bigbuckbunny.mp4</a> and <a href="https://raw.githubusercontent.com/Geal/nom/95c228c75c2964b20f0e1e42ee11a3877c1725ef/assets/small.mp4">small.mp4</a> are parsed using <a href="https://github.com/Geal/nom_benchmarks/tree/f12814b075145795fb2cc26c279ca5a1d7d17e34">nom_benchmarks</a>. The code for Nom, Attoparsec and Chomp can be found in the repository.</p>
<p>The HTTP file tests are the same as I used in my previous blog posts, and they involve more than just parsing since they are reading from a file on disk and parsing it as they are reading.</p>
<table>
<thead>
<tr>
<th>Parser</th>
<th style="text-align: right">bigbuckbunny.mp4 bench</th>
<th style="text-align: right">small.mp4 bench</th>
<th style="text-align: right">HTTP file, 2 MB</th>
<th style="text-align: right">HTTP file, 204 MB</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/m4rw3r/chomp/blob/6bb50e22513c6b670dd1c22ba144be2b6884c8ab/examples/http_parser.rs">Chomp</a><sup>1</sup></td>
<td style="text-align: right">307 ns</td>
<td style="text-align: right">355 ns</td>
<td style="text-align: right">8 ms</td>
<td style="text-align: right">486 ms</td>
</tr>
<tr>
<td><a href="https://github.com/m4rw3r/chomp/blob/6bb50e22513c6b670dd1c22ba144be2b6884c8ab/examples/http_parser.rs">Chomp</a><sup>2</sup></td>
<td style="text-align: right">384 ns</td>
<td style="text-align: right">436 ns</td>
<td style="text-align: right">9 ms</td>
<td style="text-align: right">512 ms</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/df10573763ff838988a8">Nom</a><sup>3</sup></td>
<td style="text-align: right">379 ns</td>
<td style="text-align: right">395 ns</td>
<td style="text-align: right">9 ms</td>
<td style="text-align: right">572 ms</td>
</tr>
<tr>
<td><a href="https://github.com/bos/attoparsec/blob/4f137347be02106765f6897059b88219c79bb86c/examples/rfc2616.c">Joyent http-parser</a></td>
<td style="text-align: right">–</td>
<td style="text-align: right">–</td>
<td style="text-align: right">9 ms</td>
<td style="text-align: right">626 ms</td>
</tr>
<tr>
<td><a href="https://github.com/bos/attoparsec/blob/4f137347be02106765f6897059b88219c79bb86c/examples/RFC2616.hs">Attoparsec</a></td>
<td style="text-align: right">829 ns</td>
<td style="text-align: right">833 ns</td>
<td style="text-align: right">20 ms</td>
<td style="text-align: right">1,382 ms</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/a1f8d17e120828963a19">Combine</a><sup>4</sup></td>
<td style="text-align: right">–</td>
<td style="text-align: right">–</td>
<td style="text-align: right">26 ms</td>
<td style="text-align: right">2,151 ms</td>
</tr>
</tbody>
</table>
<p>1: Chomp was compiled without <code class="highlighter-rouge">verbose_error</code> (<code class="highlighter-rouge">--no-default-features</code>).</p>
<p>2: Chomp was compiled with the <code class="highlighter-rouge">verbose_error</code> feature, which is the default.</p>
<p>3: Nom seems to have issues supporting buffered reading: if an incomplete state is encountered inside of a <code class="highlighter-rouge">many0!</code> or <code class="highlighter-rouge">many1!</code> it will cause the parser to error. By modifying the <code class="highlighter-rouge">many1!</code> macro <a href="https://gist.github.com/m4rw3r/e7018d0b9689530c18f7">accordingly</a> (NOTE: Breaks inference for some code) I could use the <a href="https://github.com/m4rw3r/chomp/issues/14">Nom adapter</a> in Chomp to drive Nom using the buffers provided by Chomp.</p>
<p>4: Compiled with <code class="highlighter-rouge">range_stream</code> feature to enable zero-copy parsing. Combine does not have support for incomplete parsing so the whole file was loaded into memory before parsing started.</p>
<p>Note that the <code class="highlighter-rouge">verbose_error</code> version is much slower for the mp4-tests. It seems like the culprit is the <code class="highlighter-rouge">string</code> parser and its verbose error message, since it copies the string it was trying to match on error. This causes a lot of small allocations when trying to match the mp4 box tags through an alternation. <a href="https://github.com/m4rw3r/chomp/issues/15">An issue</a> has been created for it and the mp4-parser currently works around it by not using <code class="highlighter-rouge">string</code> in a hot-path.</p>
<p>I might want to overhaul the error handling later, but the core should still be fine since it does not actually care about the type of the error itself (it is local to the <code class="highlighter-rouge">parsers</code> module).</p>
<table>
<thead>
<tr>
<th>Parser</th>
<th>Version</th>
</tr>
</thead>
<tbody>
<tr>
<td>Attoparsec</td>
<td>0.13.0.1</td>
</tr>
<tr>
<td>Chomp</td>
<td>0.1.1</td>
</tr>
<tr>
<td>Combine</td>
<td>1.1.0</td>
</tr>
<tr>
<td>Nom</td>
<td>1.0.1</td>
</tr>
</tbody>
</table>
<p><strong>EDIT:</strong> Posted on reddit: <a href="https://www.reddit.com/r/rust/comments/3ulf3k/parser_combinators_the_road_to_chomp_01/">/r/rust</a></p>
Rust and the Monad trait - Not just higher kinded types2015-09-19T00:00:00+00:00http://m4rw3r.github.io//rust-and-monad-trait<p>Higher kinded types is something which has been discussed a lot related to Rust in the past year,
both as a feature people want, but also as a feature people do not really know what to do with.
I am going to focus on the <code class="highlighter-rouge">Monad</code> trait in this post, since that is one of the things which
makes higher kinded types so appealing.</p>
<p>Rust is strict in trait definitions, both types and lifetimes need to match the definition exactly
and there is very little leeway except for using more generics. This is both good and bad, it
guarantees a lot of invariants for the trait but for higher kinded types like <code class="highlighter-rouge">Monad</code> and
<code class="highlighter-rouge">Functor</code> it is maybe a bit too restrictive in its current form.</p>
<p>Of course this is not a full proposal — and most of this does not actually make sense as a
proper feature or syntax in a programming language — but it is more of an exploration of
what would be needed to properly implement a generic <code class="highlighter-rouge">Monad</code> trait. Hopefully this article can
serve as some kind of start for a discussion about a real RFC.</p>
<h1 id="simple-monad-definition">Simple <code class="highlighter-rouge">Monad</code> definition</h1>
<p>This is a definition of <code class="highlighter-rouge">Monad</code> which is similar to the definition Scala uses, with the
difference that the <code class="highlighter-rouge">Applicative</code> trait is not involved:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">trait</span> <span class="n">Monad</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">;</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
</div>
<p>This looks pretty neat and works nicely for <code class="highlighter-rouge">Option<T></code>:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">impl</span> <span class="n">Monad</span><span class="p">[</span><span class="nb">Option</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="nb">Option</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="k">match</span> <span class="n">m</span> <span class="p">{</span>
<span class="nf">Some</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">=></span> <span class="nf">f</span><span class="p">(</span><span class="n">t</span><span class="p">),</span>
<span class="nb">None</span> <span class="k">=></span> <span class="nb">None</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="nf">Some</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
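<p>The trait syntax above is hypothetical, but the two operations themselves can be written as plain functions in today's Rust. The following sketch assumes nothing beyond the standard library; <code class="highlighter-rouge">half</code> is a made-up helper used to exercise <code class="highlighter-rouge">bind</code>, which behaves like the existing <code class="highlighter-rouge">Option::and_then</code>.</p>

```rust
/// `unit` for `Option`: wrap a plain value.
fn unit<T>(t: T) -> Option<T> {
    Some(t)
}

/// `bind` for `Option`: run `f` on the contained value, if there is one.
fn bind<T, U, F>(m: Option<T>, f: F) -> Option<U>
where
    F: FnOnce(T) -> Option<U>,
{
    match m {
        Some(t) => f(t),
        None => None,
    }
}

/// Hypothetical helper: halves even numbers and fails on odd ones.
fn half(n: i32) -> Option<i32> {
    if n % 2 == 0 {
        Some(n / 2)
    } else {
        None
    }
}

/// Chains two fallible steps; the first failure short-circuits the rest.
fn quarter(n: i32) -> Option<i32> {
    bind(bind(unit(n), half), half)
}
```

<p>What the free functions cannot express is the abstraction over <code class="highlighter-rouge">Option</code> itself, which is exactly the part that needs higher kinded types.</p>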
<p>Though once we try to apply this on <code class="highlighter-rouge">Result<T, E></code> we run into a problem: what do we do with
<code class="highlighter-rouge">E</code>? <code class="highlighter-rouge">impl<T, E> Monad<Result<T>> for Result<T, E></code>? That will not work since <code class="highlighter-rouge">Monad</code>
expects a type with only one type-parameter and <code class="highlighter-rouge">Result</code> has two. That leaves us with two
options, either denote the higher kinded parameter with a separate syntax, or allow for partial
application of type-constructors:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">E</span><span class="o">></span> <span class="n">Monad</span><span class="p">[</span><span class="n">R</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="n">R</span>
<span class="n">where</span> <span class="n">R</span><span class="o"><</span><span class="n">O</span><span class="o">></span> <span class="o">=</span> <span class="n">Result</span><span class="o"><</span><span class="n">O</span><span class="p">,</span> <span class="n">E</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="n">R</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">R</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">R</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="k">match</span> <span class="n">m</span> <span class="p">{</span>
<span class="nf">Ok</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">=></span> <span class="nf">f</span><span class="p">(</span><span class="n">t</span><span class="p">),</span>
<span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">=></span> <span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">),</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">R</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="nf">Ok</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>The purpose of the syntax above would be to create a type alias <code class="highlighter-rouge">R<O></code> as <code class="highlighter-rouge">Result<O, E></code> for
any fixed <code class="highlighter-rouge">E</code>, this would allow us to define <code class="highlighter-rouge">Monad</code> on <code class="highlighter-rouge">R<O></code> since it only requires a
single type-parameter.</p>
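<p>For comparison, later versions of Rust (generic associated types, stable since 1.65) allow an approximation of this partial application. The sketch below is one possible encoding and not a proposal from this post: the names <code class="highlighter-rouge">MonadFamily</code>, <code class="highlighter-rouge">ResultFamily</code> and <code class="highlighter-rouge">parse_and_double</code> are assumptions made for the illustration. A marker type stands in for the unapplied constructor and fixes <code class="highlighter-rouge">E</code>.</p>

```rust
use std::marker::PhantomData;

/// A "family" type plays the role of the unapplied constructor `M<_>`.
trait MonadFamily {
    type M<T>;

    fn unit<T>(t: T) -> Self::M<T>;

    fn bind<T, U, F>(m: Self::M<T>, f: F) -> Self::M<U>
    where
        F: FnOnce(T) -> Self::M<U>;
}

/// Stands for `Result<_, E>` with the error type `E` already fixed.
struct ResultFamily<E>(PhantomData<E>);

impl<E> MonadFamily for ResultFamily<E> {
    type M<T> = Result<T, E>;

    fn unit<T>(t: T) -> Self::M<T> {
        Ok(t)
    }

    fn bind<T, U, F>(m: Self::M<T>, f: F) -> Self::M<U>
    where
        F: FnOnce(T) -> Self::M<U>,
    {
        match m {
            Ok(t) => f(t),
            Err(e) => Err(e),
        }
    }
}

/// Parses an integer and doubles it, using only `unit` and `bind`.
fn parse_and_double(s: &str) -> Result<i32, String> {
    type R = ResultFamily<String>;
    let parsed = s.parse::<i32>().map_err(|e| e.to_string());
    R::bind(parsed, |n| R::unit(n * 2))
}
```

<p>The price of this encoding is that the trait is implemented on the marker type rather than on <code class="highlighter-rouge">Result</code> itself, so method-call syntax such as <code class="highlighter-rouge">m.bind(f)</code> is not available without extra glue.</p>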
<h2 id="fn-fnmut-vs-fnonce"><code class="highlighter-rouge">Fn</code>, <code class="highlighter-rouge">FnMut</code> vs <code class="highlighter-rouge">FnOnce</code></h2>
<p>Another problematic issue with <code class="highlighter-rouge">Monad</code> is the type of the function <code class="highlighter-rouge">F</code> passed to <code class="highlighter-rouge">bind</code>;
it will require either <code class="highlighter-rouge">Fn</code>, <code class="highlighter-rouge">FnMut</code> or <code class="highlighter-rouge">FnOnce</code> depending on how it is used. Some monads,
like <code class="highlighter-rouge">Option<T></code>, <code class="highlighter-rouge">Result<T, E></code> and my own <code class="highlighter-rouge">Parser</code> monad, happily accept <code class="highlighter-rouge">FnOnce</code>,
which is the most permissive bound for the user of <code class="highlighter-rouge">bind</code> since they are allowed to do
almost anything inside of <code class="highlighter-rouge">F</code>.</p>
<p>On the other hand, implementing <code class="highlighter-rouge">Monad</code> for <code class="highlighter-rouge">Iterator<Item=T></code> requires an <code class="highlighter-rouge">FnMut</code> bound
on <code class="highlighter-rouge">F</code>, since <code class="highlighter-rouge">F</code> will be executed once for each item in the iterator and that disqualifies
<code class="highlighter-rouge">FnOnce</code>. And for some kind of <code class="highlighter-rouge">Future<T></code> <code class="highlighter-rouge">F</code> will probably need to be something like
<code class="highlighter-rouge">FnOnce + Send + 'static</code>, and some parallel monad would want <code class="highlighter-rouge">Fn + Send + 'static</code>.</p>
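<p>The iterator case can be observed in today's standard library, where <code class="highlighter-rouge">Iterator::flat_map</code> plays the role of <code class="highlighter-rouge">bind</code> and takes an <code class="highlighter-rouge">FnMut</code> closure. A small sketch, with <code class="highlighter-rouge">neighbours</code> as a made-up example function:</p>

```rust
/// Expands every item into its two neighbours and counts how often the
/// closure was called.
fn neighbours(xs: Vec<i32>) -> (Vec<i32>, usize) {
    let mut calls = 0;
    let out: Vec<i32> = xs
        .into_iter()
        .flat_map(|x| {
            // Called once per item, and it mutates captured state,
            // so this closure is `FnMut` and cannot be `FnOnce`-only.
            calls += 1;
            vec![x - 1, x + 1]
        })
        .collect();
    (out, calls)
}
```

<p>A closure which moves a captured value out of its environment would be rejected here, since it could only be called for the first item.</p>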
<p><code class="highlighter-rouge">FnOnce</code> is desirable since it allows the user of the monad to do simple sequencing without
putting any requirements on the code used, like this:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="c">// Note: Not copy, and not clone</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">struct</span> <span class="n">Foo</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">make_foo</span><span class="p">()</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="n">Foo</span><span class="p">,</span> <span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">Foo</span><span class="p">)</span> <span class="p">}</span>
<span class="k">fn</span> <span class="nf">the_answer</span><span class="p">()</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="nb">i32</span><span class="p">,</span> <span class="n">Error</span><span class="o">></span> <span class="p">{</span> <span class="nf">Ok</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span> <span class="p">}</span>
<span class="k">let</span> <span class="n">r</span> <span class="o">=</span> <span class="nf">make_foo</span><span class="p">()</span>
<span class="nf">.bind</span><span class="p">(|</span><span class="n">foo</span><span class="p">|</span> <span class="nf">the_answer</span><span class="p">()</span>
<span class="nf">.bind</span><span class="p">(</span><span class="k">move</span> <span class="p">|</span><span class="n">data</span><span class="p">|</span> <span class="nf">Ok</span><span class="p">((</span><span class="n">foo</span><span class="p">,</span> <span class="n">data</span><span class="p">))));</span>
<span class="k">match</span> <span class="n">r</span> <span class="p">{</span>
<span class="nf">Ok</span><span class="p">((</span><span class="n">foo</span><span class="p">,</span> <span class="n">data</span><span class="p">))</span> <span class="k">=></span> <span class="nd">println!</span><span class="p">(</span><span class="s">"Foo: {:?}, the answer is: {}"</span><span class="p">,</span> <span class="n">foo</span><span class="p">,</span> <span class="n">data</span><span class="p">),</span>
<span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">=></span> <span class="nd">println!</span><span class="p">(</span><span class="s">"Something went wrong: {:?}"</span><span class="p">,</span> <span class="n">e</span><span class="p">),</span>
<span class="p">}</span>
</code></pre>
</div>
<p>This typechecks nicely with a <code class="highlighter-rouge">bind(M<T>, FnOnce(T) -> M<U>)</code>, but would fail if <code class="highlighter-rouge">bind</code>
required <code class="highlighter-rouge">F</code> to be a <code class="highlighter-rouge">FnMut</code> or <code class="highlighter-rouge">Fn</code> since <code class="highlighter-rouge">Foo</code> in the code above is not <code class="highlighter-rouge">Clone</code> or
<code class="highlighter-rouge">Copy</code>. This kind of sequencing is necessary for any real-world usage of <code class="highlighter-rouge">std::io</code> or
<code class="highlighter-rouge">std::fs</code> since most structures in those modules do not implement <code class="highlighter-rouge">Clone</code> (this includes
<code class="highlighter-rouge">File</code>, <code class="highlighter-rouge">Metadata</code>, <code class="highlighter-rouge">DirEntry</code> and the list goes on).</p>
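<p>The same sequencing can be tried out in current Rust with <code class="highlighter-rouge">Result::and_then</code>, which takes an <code class="highlighter-rouge">FnOnce</code>. This is a sketch under the assumption that <code class="highlighter-rouge">and_then</code> stands in for <code class="highlighter-rouge">bind</code> and that <code class="highlighter-rouge">String</code> is used as the error type; <code class="highlighter-rouge">sequence</code> is a made-up wrapper function.</p>

```rust
// Deliberately neither `Clone` nor `Copy`.
#[derive(Debug, PartialEq)]
struct Foo;

fn make_foo() -> Result<Foo, String> {
    Ok(Foo)
}

fn the_answer() -> Result<i32, String> {
    Ok(42)
}

/// Runs both steps in order and returns both results.
fn sequence() -> Result<(Foo, i32), String> {
    make_foo().and_then(|foo| {
        // The inner closure takes ownership of `foo` and moves it into
        // the returned tuple, which only an `FnOnce` closure may do.
        the_answer().and_then(move |data| Ok((foo, data)))
    })
}
```

<p>No copy of <code class="highlighter-rouge">Foo</code> is ever made; the single value is moved from closure to closure until it ends up in the result.</p>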
<p>Another reason for allowing <code class="highlighter-rouge">FnOnce</code> for <code class="highlighter-rouge">bind</code> is to avoid unnecessary allocations and copies
which both degrade performance and increase the risk of users being confused and frustrated by
accidentally operating on copies of the data or having to jump through hoops to satisfy the
type-checker.</p>
<p>This means that the <code class="highlighter-rouge">Monad</code> trait needs to also somehow be generic over the function-type
used by the concrete implementation while still enforcing the general signature of
<code class="highlighter-rouge">Fn*(T) -> M<U></code>. This will require <a href="https://github.com/rust-lang/rfcs/pull/1210">impl specialization</a>
since every <code class="highlighter-rouge">Fn</code> closure also implements <code class="highlighter-rouge">FnMut</code> and every <code class="highlighter-rouge">FnMut</code> closure also implements <code class="highlighter-rouge">FnOnce</code>, which means that any attempt
to unify all three <code class="highlighter-rouge">Fn</code>* types under one generic will fail due to conflicting impl-blocks.</p>
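<p>How the three closure traits relate can be checked with a few small functions. The sketch below uses only the standard closure traits; the function names are made up for the example. A closure that is callable through <code class="highlighter-rouge">Fn</code> is accepted wherever <code class="highlighter-rouge">FnMut</code> or <code class="highlighter-rouge">FnOnce</code> is required, but not the other way around.</p>

```rust
/// Calls the closure exactly once; every closure qualifies.
fn run_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

/// Calls the closure twice; it may mutate what it captured.
fn run_twice_mut<F: FnMut() -> usize>(mut f: F) -> usize {
    f();
    f()
}

/// Calls the closure twice through a shared reference.
fn run_twice<F: Fn() -> i32>(f: F) -> i32 {
    f() + f()
}

fn demo() -> (String, usize, i32, String) {
    let owned = String::from("moved");
    let mut count = 0;

    // Moves `owned` out of the closure: `FnOnce` only.
    let a = run_once(move || owned);
    // Mutates `count`: `FnMut` (and therefore also `FnOnce`).
    let b = run_twice_mut(|| {
        count += 1;
        count
    });
    // Captures nothing: `Fn` (and therefore also `FnMut` and `FnOnce`).
    let c = run_twice(|| 21);
    // An `Fn` closure is accepted where only `FnOnce` is required.
    let d = run_once(|| String::from("fn closure"));

    (a, b, c, d)
}
```

<p>Because of this overlap, a blanket implementation for "anything that is <code class="highlighter-rouge">FnOnce</code>" also covers every <code class="highlighter-rouge">FnMut</code> and <code class="highlighter-rouge">Fn</code> closure, which is where the conflicting impl-blocks come from.</p>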
<p>Another possibility is to have the separate traits <code class="highlighter-rouge">Monad</code>, <code class="highlighter-rouge">MonadMut</code> and
<code class="highlighter-rouge">MonadOnce</code> where they all have a different <code class="highlighter-rouge">Fn*</code> type. In addition to this we provide default
implementations for the other compatible variants so that a <code class="highlighter-rouge">MonadOnce</code> can be used as a <code class="highlighter-rouge">Monad</code>:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">trait</span> <span class="n">Unit</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
<span class="k">trait</span> <span class="n">Monad</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]:</span> <span class="n">Unit</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">Fn</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
<span class="k">trait</span> <span class="n">MonadMut</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]:</span> <span class="n">Unit</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnMut</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
<span class="k">trait</span> <span class="n">MonadOnce</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]:</span> <span class="n">Unit</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
<span class="k">mod</span> <span class="n">impls</span> <span class="p">{</span>
<span class="k">use</span> <span class="nn">super</span><span class="p">::{</span><span class="n">Monad</span><span class="p">,</span> <span class="n">MonadMut</span><span class="p">,</span> <span class="n">MonadOnce</span><span class="p">};</span>
<span class="k">impl</span><span class="o"><</span><span class="n">M</span><span class="p">:</span> <span class="n">MonadOnce</span><span class="o">></span> <span class="n">MonadMut</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="n">M</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnMut</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="nn">MonadOnce</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">M</span><span class="p">:</span> <span class="n">MonadMut</span><span class="o">></span> <span class="n">Monad</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="n">M</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">Fn</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="nn">MonadMut</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>Example without higher kinded types: <a href="https://play.rust-lang.org/?gist=4480a38393cd27097bb2&version=stable">playground</a>.</p>
<p>The drawback of this is that the trait needs to be specified for each use, and the traits
themselves are incompatible in the reverse direction of the inheritance chain. This is not very
ergonomic for the user: which monad trait should they use?</p>
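<p>A minimal stable-Rust sketch of that one-way compatibility, specialised to <code class="highlighter-rouge">Option</code> because there is no HKT to abstract over. The names <code class="highlighter-rouge">bind_once</code>, <code class="highlighter-rouge">bind_mut</code>, <code class="highlighter-rouge">bind_fn</code> and <code class="highlighter-rouge">demo</code> are made up for this illustration and are not part of any library:</p>

```rust
// Hypothetical helpers: one `bind` per closure trait, for Option only.

fn bind_once<T, U, F>(m: Option<T>, f: F) -> Option<U>
where
    F: FnOnce(T) -> Option<U>,
{
    match m {
        Some(t) => f(t),
        None => None,
    }
}

// Every FnMut closure is also FnOnce, so this can delegate.
fn bind_mut<T, U, F>(m: Option<T>, f: F) -> Option<U>
where
    F: FnMut(T) -> Option<U>,
{
    bind_once(m, f)
}

// Every Fn closure is also FnMut, so this can delegate as well.
fn bind_fn<T, U, F>(m: Option<T>, f: F) -> Option<U>
where
    F: Fn(T) -> Option<U>,
{
    bind_mut(m, f)
}

fn demo() -> (Option<u32>, Option<String>) {
    // A closure that only reads its environment is accepted by all three.
    let offset = 1;
    let a = bind_fn(Some(2), |x| Some(x + offset));

    // A closure that moves a captured value out is FnOnce only;
    // passing it to bind_mut or bind_fn would be rejected by the compiler.
    let name = String::from("chomp");
    let b = bind_once(Some(3), move |_| Some(name));

    (a, b)
}
```

<p>The delegation only works in one direction: <code class="highlighter-rouge">bind_once</code> could not be written in terms of <code class="highlighter-rouge">bind_fn</code>, which is the incompatibility described above.</p>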
<h1 id="the-iterator-monad">The <code class="highlighter-rouge">Iterator</code> monad</h1>
<p>Attempting to define the <code class="highlighter-rouge">Monad</code> implementation for an arbitrary <code class="highlighter-rouge">I: Iterator</code> is problematic
due to the type-parameters for the return types of <code class="highlighter-rouge">bind</code> and <code class="highlighter-rouge">F</code> since neither <code>I<U></code> nor
<code class="highlighter-rouge">Iterator<Item=U></code> can be used because the source iterator (i.e. the type after the <code class="highlighter-rouge">for</code>
keyword) is not of the same concrete type as the iterator returned by either <code class="highlighter-rouge">F</code> or <code class="highlighter-rouge">bind</code>,
and plain traits are not allowed as a return-type.</p>
<p>My own <code class="highlighter-rouge">Parser</code> monad also has this issue, and it is also shared by the <code class="highlighter-rouge">State</code>–,
<code class="highlighter-rouge">Future</code>–, and possibly some async-IO–monad. The base-type is generic but <code class="highlighter-rouge">F</code>,
<code class="highlighter-rouge">bind</code> and <code class="highlighter-rouge">unit</code> do not return the same concrete type (since they generally are all closures,
which means that their actual type is a compiler implementation-detail and cannot be described and
is always unique).</p>
<p>This would require us to be able to implement a trait for another trait, and not just a generic,
since the generic type would be the same <code class="highlighter-rouge">Self</code> in the return from <code class="highlighter-rouge">F</code>, <code class="highlighter-rouge">bind</code> and <code class="highlighter-rouge">unit</code>.
If this constraint remains in place it would be impossible to make a generic and extensible
monad implementing the <code class="highlighter-rouge">Monad</code> trait without having to box everything on every <code class="highlighter-rouge">bind</code> and
<code class="highlighter-rouge">unit</code> operation which will cause horrible performance issues.</p>
<p>Here I assume that we can use <code class="highlighter-rouge">impl Trait</code> to denote that it is some concrete type implementing
<code class="highlighter-rouge">Trait</code> (but not necessarily the same concrete type), which would allow us to use
<code class="highlighter-rouge">impl I<T> where I<X> = Iterator<Item=X></code> (we need to somehow alias the associated type into a
normal generic, or otherwise be able to use an associated type in HKT):</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">I</span><span class="o">></span> <span class="n">Monad</span><span class="p">[</span><span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="k">impl</span> <span class="n">I</span>
<span class="n">where</span> <span class="n">I</span><span class="o"><</span><span class="n">X</span><span class="o">></span> <span class="o">=</span> <span class="n">Iterator</span><span class="o"><</span><span class="n">Item</span><span class="o">=</span><span class="n">X</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnMut</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="c">// Create new iterator wrapping self and F yielding items from</span>
<span class="c">// the iterator f(self.next()). Essentially flat_map but</span>
<span class="c">// without the need for allocations.</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">unit</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="c">// Iterator yielding the item once</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
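<p>Without HKT the trait itself cannot be written, but the two operations can be sketched as free functions on stable Rust, using return-position <code class="highlighter-rouge">impl Trait</code> together with <code class="highlighter-rouge">Iterator::flat_map</code> and <code class="highlighter-rouge">std::iter::once</code> from the standard library. This is only an approximation under those assumptions; the function names <code class="highlighter-rouge">bind</code>, <code class="highlighter-rouge">unit</code> and <code class="highlighter-rouge">demo</code> are chosen for the example:</p>

```rust
use std::iter;

// `bind` for iterators as a free function: the source, the closure and the
// iterator returned by the closure may all have different concrete types.
fn bind<I, J, F>(m: I, f: F) -> impl Iterator<Item = J::Item>
where
    I: Iterator,
    J: Iterator,
    F: FnMut(I::Item) -> J,
{
    m.flat_map(f)
}

// `unit`: an iterator yielding the item exactly once.
fn unit<T>(t: T) -> impl Iterator<Item = T> {
    iter::once(t)
}

fn demo() -> Vec<u32> {
    // For each x in 1..4, yield x * 10 + y for every y in 0..x.
    bind(1..4u32, |x| bind(0..x, move |y| unit(x * 10 + y))).collect()
}
```

<p>Each call returns its own opaque type, which is exactly what the hypothetical <code class="highlighter-rouge">impl I<_></code> notation above is meant to express inside a trait.</p>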
<p>This monad would allow us to build lazy list-comprehensions and similar constructions:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">let</span> <span class="n">r</span><span class="p">:</span> <span class="nb">Vec</span><span class="o"><</span><span class="p">(</span><span class="nb">u32</span><span class="p">,</span> <span class="nb">u32</span><span class="p">)</span><span class="o">></span> <span class="o">=</span> <span class="mi">0</span><span class="err">..</span><span class="mi">10</span><span class="nf">.iter</span><span class="p">()</span>
<span class="nf">.bind</span><span class="p">(|</span><span class="n">i</span><span class="p">|</span> <span class="n">i</span><span class="err">..</span><span class="mi">10</span><span class="nf">.iter</span><span class="p">()</span>
<span class="nf">.bind</span><span class="p">(|</span><span class="n">j</span><span class="p">|</span> <span class="nn">Monad</span><span class="p">::</span><span class="nf">unit</span><span class="p">((</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">))))</span>
<span class="nf">.collect</span><span class="p">();</span>
<span class="c">// r = [(0, 0), (0, 1), ..., (1, 1), (1, 2), ..., (8, 8), (8, 9), (9, 9)]</span>
</code></pre>
</div>
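<p>For comparison, the same comprehension can be written today with the standard iterator adaptors, where <code class="highlighter-rouge">flat_map</code> plays the role of <code class="highlighter-rouge">bind</code> and <code class="highlighter-rouge">map</code> stands in for a final <code class="highlighter-rouge">bind</code> followed by <code class="highlighter-rouge">unit</code>. The function name <code class="highlighter-rouge">pairs</code> is just for this sketch:</p>

```rust
// All pairs (i, j) with 0 <= i <= j < 10, built lazily and collected at the end.
fn pairs() -> Vec<(u32, u32)> {
    (0..10u32)
        // The inner closure must own its copy of `i`, hence `move`.
        .flat_map(|i| (i..10).map(move |j| (i, j)))
        .collect()
}
```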
<p>Another possibility to solve this would be to use a newtype allowing <code class="highlighter-rouge">impl Trait</code> inside it. But
it might be problematic and confusing for the user since suddenly you have a type which acts as
a <code class="highlighter-rouge">Sized</code> type in most aspects (can be stack allocated, can be passed by value, can be put inside
of structs without <code class="highlighter-rouge">Box</code>es) but does not have the same size as any other instance of the same
type (eg. this will prevent you from putting multiple instances inside of a <code class="highlighter-rouge">Vec</code> without
<code class="highlighter-rouge">Box</code>ing the items first).</p>
<p>Existing closures behave just like that, so there is precedent for allowing types which behave
like this. The downside of this approach is that the user would be required to explicitly convert
the value by wrapping it in its newtype before being able to treat it as a <code class="highlighter-rouge">Monad</code>.</p>
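<p>The closure precedent can be observed directly. The following sketch (the names <code class="highlighter-rouge">demo</code>, <code class="highlighter-rouge">add_one</code> and <code class="highlighter-rouge">add_n</code> are made up for the example) shows two closures with the same call signature but different concrete types and sizes, and the boxing needed to store them together:</p>

```rust
use std::mem::size_of_val;

fn demo() -> (bool, Vec<u64>) {
    let n: u64 = 5;
    let add_one = |x: u64| x + 1; // captures nothing
    let add_n = move |x: u64| x + n; // captures a u64 by value

    // Both closures implement Fn(u64) -> u64, but they are distinct
    // types, and here they also differ in size.
    let sizes_differ = size_of_val(&add_one) != size_of_val(&add_n);

    // Storing them side by side requires erasing the concrete type,
    // for example by boxing them as trait objects.
    let fs: Vec<Box<dyn Fn(u64) -> u64>> = vec![Box::new(add_one), Box::new(add_n)];
    let results = fs.iter().map(|f| f(10)).collect();

    (sizes_differ, results)
}
```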
<h2 id="lifetimes">Lifetimes</h2>
<p>So, let’s say we got all the above working, now we have our <code class="highlighter-rouge">Monad</code> trait and we are done with this,
right? No, we still have another class of types which exists in Rust: lifetimes.</p>
<p>Lifetimes which are a part of the monad can be moved out by using partial application of
type-constructors and treating the lifetime as just another type parameter:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">M</span><span class="o">></span> <span class="n">Monad</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="n">M</span>
<span class="n">where</span> <span class="n">M</span><span class="o"><</span><span class="n">X</span><span class="o">></span> <span class="o">=</span> <span class="n">MyType</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">X</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>On the other hand, a monad which is lazy (eg. the <code class="highlighter-rouge">Iterator</code> monad) will require <code class="highlighter-rouge">m</code> and
<code class="highlighter-rouge">F</code> to have the same lifetime as the returned <code>M<U></code> (It will also require the same
<code class="highlighter-rouge">impl Trait</code> or similar feature to allow different concrete types):</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">I</span><span class="p">,</span> <span class="n">M</span><span class="o">></span> <span class="n">Monad</span><span class="p">[</span><span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="k">impl</span> <span class="n">I</span>
<span class="n">where</span> <span class="n">I</span><span class="o"><</span><span class="n">X</span><span class="o">></span> <span class="o">=</span> <span class="n">Iterator</span><span class="o"><</span><span class="n">Item</span><span class="o">=</span><span class="n">X</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">S</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="n">S</span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span>
<span class="n">where</span> <span class="n">S</span><span class="p">:</span> <span class="n">I</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="nf">FnMut</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span> <span class="p">{</span>
<span class="n">BindIter</span> <span class="p">{</span> <span class="n">source</span><span class="p">:</span> <span class="n">m</span><span class="p">,</span> <span class="n">fun</span><span class="p">:</span> <span class="n">f</span><span class="p">,</span> <span class="n">current</span><span class="p">:</span> <span class="nf">UnitIter</span><span class="p">(</span><span class="nb">None</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="nf">UnitIter</span><span class="p">(</span><span class="nf">Some</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
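<p>The <code class="highlighter-rouge">BindIter</code> and <code class="highlighter-rouge">UnitIter</code> types are not shown above, so here is one possible way to write them on stable Rust. This is a guess at their shape rather than the actual definitions: in particular the <code class="highlighter-rouge">current</code> field is assumed to be an <code class="highlighter-rouge">Option</code> holding the inner iterator, and <code class="highlighter-rouge">bind</code> and <code class="highlighter-rouge">unit</code> are plain functions returning the concrete types instead of trait methods:</p>

```rust
// Yields its item once, then nothing.
struct UnitIter<T>(Option<T>);

impl<T> Iterator for UnitIter<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        self.0.take()
    }
}

// Lazily applies `fun` to each item of `source` and yields everything
// produced by the resulting inner iterators, in order.
struct BindIter<S, F, J> {
    source: S,
    fun: F,
    current: Option<J>,
}

impl<S, F, J> Iterator for BindIter<S, F, J>
where
    S: Iterator,
    F: FnMut(S::Item) -> J,
    J: Iterator,
{
    type Item = J::Item;

    fn next(&mut self) -> Option<J::Item> {
        loop {
            // Drain the current inner iterator first, if there is one.
            if let Some(inner) = self.current.as_mut() {
                if let Some(item) = inner.next() {
                    return Some(item);
                }
            }
            // Inner iterator missing or exhausted: advance the source.
            match self.source.next() {
                Some(t) => self.current = Some((self.fun)(t)),
                None => return None,
            }
        }
    }
}

fn bind<S, F, J>(m: S, f: F) -> BindIter<S, F, J>
where
    S: Iterator,
    F: FnMut(S::Item) -> J,
    J: Iterator,
{
    BindIter { source: m, fun: f, current: None }
}

fn unit<T>(t: T) -> UnitIter<T> {
    UnitIter(Some(t))
}

fn demo() -> Vec<(u32, u32)> {
    bind(0..3u32, |i| bind(i..3, move |j| unit((i, j)))).collect()
}
```

<p>Nothing runs until the result is consumed, and both the source and the closure are stored inside the returned value, which is why their lifetimes matter.</p>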
<p>Note the <code class="highlighter-rouge">'a</code> constraint on <code class="highlighter-rouge">S</code>, <code class="highlighter-rouge">F</code> and the return of <code class="highlighter-rouge">bind</code>. This is necessary
because lifetime elision will restrict the lifetimes specified on <code class="highlighter-rouge">bind</code> to be (using
<code class="highlighter-rouge">+ 'lifetime</code> to detail the restriction)
<code>fn bind(m + 'a, f: F + 'b) -> I<U> + 'a where F: FnMut(T + 'c) -> I<U> + 'c</code> and <code class="highlighter-rouge">'c</code>
shorter than <code class="highlighter-rouge">'b</code> which is shorter than <code class="highlighter-rouge">'a</code>. So either <code class="highlighter-rouge">F</code> (elide all lifetimes) or
<code class="highlighter-rouge">S</code> (when <code class="highlighter-rouge">'a</code> is added to <code class="highlighter-rouge">F</code> and the return of <code class="highlighter-rouge">bind</code>) will not live long enough
without these extra annotations.</p>
<p>The <code class="highlighter-rouge">Option</code>, <code class="highlighter-rouge">Result</code> — and other types which do not need <code class="highlighter-rouge">F</code> and/or <code class="highlighter-rouge">S</code> to live as
long as the return from <code class="highlighter-rouge">bind</code> — will not use this constraint at all.</p>
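<p>A small sketch of the lazy case on stable Rust, assuming <code class="highlighter-rouge">flat_map</code> as the implementation of <code class="highlighter-rouge">bind</code>. The lifetime <code class="highlighter-rouge">'a</code> is spelled out on the source, the closure, the inner iterator and the return type, so a closure borrowing data for <code class="highlighter-rouge">'a</code> can be stored in the result. The names <code class="highlighter-rouge">label_all</code> and <code class="highlighter-rouge">demo</code> are hypothetical:</p>

```rust
// The returned iterator holds both `m` and `f`, so neither may be
// outlived by it: everything is tied to the same lifetime 'a.
fn bind<'a, S, F, J>(m: S, f: F) -> impl Iterator<Item = J::Item> + 'a
where
    S: Iterator + 'a,
    F: FnMut(S::Item) -> J + 'a,
    J: Iterator + 'a,
{
    m.flat_map(f)
}

// The closure captures `labels`, a borrow valid for 'a, and the
// result is allowed to live exactly that long.
fn label_all<'a>(labels: &'a [&'a str]) -> impl Iterator<Item = (u32, &'a str)> + 'a {
    bind(0..2u32, move |i| labels.iter().map(move |l| (i, *l)))
}

fn demo() -> Vec<(u32, &'static str)> {
    let labels: &'static [&'static str] = &["a", "b"];
    label_all(labels).collect()
}
```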
<p>This means that <code class="highlighter-rouge">Monad[M<_>]</code> not only needs to be generic over the type it is implemented on
and the <code class="highlighter-rouge">Fn*</code> type it uses, it also needs to be generic over the lifetimes in the signature
of <code class="highlighter-rouge">bind</code>. Another way to accomplish this would be to let traits be used as associated types,
then the desired function-trait can be used as an associated type by using the <code class="highlighter-rouge">unboxed_closures</code>
feature.</p>
<p>So now we have a <code class="highlighter-rouge">Monad[M<_>]</code> trait which looks more like this:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="c">// Need to be separate due to inference issues</span>
<span class="k">trait</span> <span class="n">Unit</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
<span class="c">// Separate trait just for bind since F is intended to be used as a free</span>
<span class="c">// generic specified by the trait-implementation, and T needs to be available</span>
<span class="c">// for the Fn* generic</span>
<span class="k">trait</span> <span class="n">Monad</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span><span class="o"><</span><span class="n">T</span><span class="p">,</span> <span class="n">F</span><span class="o">></span><span class="p">:</span> <span class="n">Unit</span><span class="p">[</span><span class="n">M</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="c">// Let's assume we can use the trait Func and its associated types</span>
<span class="c">// Input and Output to enforce a function-signature</span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="n">Func</span><span class="p">,</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Input</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span><span class="p">,),</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Output</span> <span class="o">=</span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
</div>
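<p>The shape of this trait, with <code class="highlighter-rouge">T</code> and <code class="highlighter-rouge">F</code> as parameters of the trait rather than of <code class="highlighter-rouge">bind</code>, can be approximated on stable Rust if <code>M<U></code> is replaced by an associated type. This loses the guarantee that the output is the same monad, so it is only a rough stand-in. The names <code class="highlighter-rouge">Bind</code>, <code class="highlighter-rouge">Lazy</code> and <code class="highlighter-rouge">demo</code> are invented for the sketch:</p>

```rust
use std::iter::FlatMap;

// T and F live on the trait, so each implementation can pick its own
// closure trait (and lifetime bounds) for F.
trait Bind<T, F> {
    type Output;
    fn bind(self, f: F) -> Self::Output;
}

// Option calls the function at most once, so FnOnce is enough.
impl<T, U, F> Bind<T, F> for Option<T>
where
    F: FnOnce(T) -> Option<U>,
{
    type Output = Option<U>;

    fn bind(self, f: F) -> Option<U> {
        match self {
            Some(t) => f(t),
            None => None,
        }
    }
}

// Newtype around any iterator; it calls the function once per item,
// so it asks for FnMut instead.
struct Lazy<I>(I);

impl<I, J, F> Bind<I::Item, F> for Lazy<I>
where
    I: Iterator,
    J: Iterator,
    F: FnMut(I::Item) -> J,
{
    type Output = Lazy<FlatMap<I, J, F>>;

    fn bind(self, f: F) -> Self::Output {
        Lazy(self.0.flat_map(f))
    }
}

fn demo() -> (Option<u32>, Vec<u32>) {
    let halved = Some(8u32).bind(|x: u32| if x % 2 == 0 { Some(x / 2) } else { None });
    let Lazy(iter) = Lazy(1..4u32).bind(|x: u32| 0..x);
    (halved, iter.collect())
}
```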
<p>This enables us to add lifetimes to <code class="highlighter-rouge">F</code>, and with the help of <code class="highlighter-rouge">impl Trait</code> in type-position
we can also add the same lifetime to <code class="highlighter-rouge">m</code> and the return of <code class="highlighter-rouge">bind</code>. It is a bit cumbersome,
but looks promising:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="c">// Option monad</span>
<span class="k">impl</span> <span class="n">Unit</span><span class="p">[</span><span class="nb">Option</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="nb">Option</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span> <span class="nf">Some</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">,</span> <span class="n">F</span><span class="o">></span> <span class="n">Monad</span><span class="p">[</span><span class="nb">Option</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span><span class="o"><</span><span class="n">T</span><span class="p">,</span> <span class="n">F</span><span class="o">></span> <span class="k">for</span> <span class="nb">Option</span>
<span class="c">// We leave out the return type definition to bind using the feature</span>
<span class="c">// unboxed_closures to be able to use the <> notation. If we are not</span>
<span class="c">// allowed to use this feature we also need to move U to the Monad trait</span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="n">FnMut</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,)</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="n">Func</span><span class="p">,</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Input</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span><span class="p">,),</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Output</span> <span class="o">=</span> <span class="nb">Option</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="k">match</span> <span class="n">m</span> <span class="p">{</span>
<span class="nf">Some</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">=></span> <span class="nf">f</span><span class="p">(</span><span class="n">t</span><span class="p">),</span>
<span class="nb">None</span> <span class="k">=></span> <span class="nb">None</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">// Iterator monad</span>
<span class="k">impl</span> <span class="n">Unit</span><span class="p">[</span><span class="n">I</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span> <span class="k">for</span> <span class="k">impl</span> <span class="n">I</span>
<span class="n">where</span> <span class="n">I</span><span class="o"><</span><span class="n">X</span><span class="o">></span> <span class="o">=</span> <span class="n">Iterator</span><span class="o"><</span><span class="n">Item</span><span class="o">=</span><span class="n">X</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span> <span class="err">...</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">F</span><span class="o">></span> <span class="n">Monad</span><span class="p">[</span><span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">_</span><span class="o">></span><span class="p">]</span><span class="o"><</span><span class="n">T</span><span class="p">,</span> <span class="n">F</span><span class="o">></span> <span class="k">for</span> <span class="k">impl</span> <span class="n">I</span>
<span class="c">// Use the lifetime 'a here to restrict all uses of I<_></span>
<span class="n">where</span> <span class="n">I</span><span class="o"><</span><span class="n">X</span><span class="o">></span> <span class="o">=</span> <span class="n">Iterator</span><span class="o"><</span><span class="n">Item</span><span class="o">=</span><span class="n">X</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span><span class="p">,</span>
<span class="c">// Here we can add that F needs to live at least as long as</span>
<span class="c">// the return from bind (and Self):</span>
<span class="n">F</span><span class="p">:</span> <span class="n">FnMut</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,)</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span> <span class="p">{</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="n">Func</span><span class="p">,</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Input</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span><span class="p">,),</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Output</span> <span class="o">=</span> <span class="k">impl</span> <span class="n">I</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="c">// Now we can put both self and f inside of something and</span>
<span class="c">// return it safely</span>
<span class="err">...</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>And now we actually have a <code class="highlighter-rouge">Monad</code> trait which allows for different <code class="highlighter-rouge">Fn*</code> types and different
lifetime requirements on the function passed to <code class="highlighter-rouge">bind</code>, and which can also be implemented for
N-ary types as well as for traits themselves.</p>
<h2 id="receiver-types">Receiver types</h2>
<p>An additional, and slightly smaller, problem is receiver types. All parameters to <code class="highlighter-rouge">bind</code> have
so far been taken by value, which means that unless the type implementing the monad trait is
<code class="highlighter-rouge">Copy</code> the original value would be invalidated once it is passed to <code class="highlighter-rouge">bind</code>. This is a bit
problematic when you want to reuse the same piece of data multiple times without having to build
the monad computation anew every time it has to be used.</p>
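<p>A minimal sketch of the problem, using the standard library's <code class="highlighter-rouge">Option::and_then</code> as a stand-in for a by-value <code class="highlighter-rouge">bind</code>. The function names <code class="highlighter-rouge">describe_len</code> and <code class="highlighter-rouge">describe_twice</code> are made up for this illustration and are not part of any library:</p>

```rust
// Hypothetical helper: a "bind" that takes the monadic value by value,
// expressed here with the standard library's Option::and_then.
fn describe_len(m: Option<String>) -> Option<String> {
    m.and_then(|s| {
        if s.is_empty() {
            None
        } else {
            Some(format!("{} bytes", s.len()))
        }
    })
}

// Reusing a non-Copy value means paying for an explicit clone, since the
// first call would otherwise move `m` and invalidate it.
fn describe_twice(m: Option<String>) -> (Option<String>, Option<String>) {
    let first = describe_len(m.clone());
    // Without the clone above, this second use would be rejected by the
    // compiler because `m` has already been moved.
    let second = describe_len(m);
    (first, second)
}
```

<p>The clone is cheap for a small <code class="highlighter-rouge">Option</code>, but for a monad wrapping a large computation it is exactly the cost the paragraph above is worried about.</p>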
<p>Haskell’s GHC does some limited memoization to avoid this issue, e.g. when performing repeated
calls to the same function with the same value it will only calculate the value once, which solves
this issue. Rust cannot do that since there is no runtime, and the value itself would have to be
<code class="highlighter-rouge">Clone</code>.</p>
<p>I am not sure I see too much use of <code class="highlighter-rouge">&self</code>, and <code class="highlighter-rouge">&mut self</code> seems to be somewhat redundant
since you cannot use <code class="highlighter-rouge">&mut self</code> for anything else until the specific monad instance has been
computed.</p>
<p>The solution to these two would probably be to implement specific monad-traits for the
receiver-types. Hopefully this will not have to be combined with different traits for <code class="highlighter-rouge">Fn*</code> types
since that would result in a lot of different, somewhat-incompatible, monad-traits.</p>
<h1 id="attempting-to-use-the-monad-trait-in-a-generic-way">Attempting to use the <code class="highlighter-rouge">Monad</code> trait in a generic way</h1>
<p>The code above is all ok as long as we do not try to actually be generic over the <code class="highlighter-rouge">Monad</code>
trait and let the compiler infer all the types. Then the <code class="highlighter-rouge">Monad::bind</code> and <code class="highlighter-rouge">Monad::unit</code>
functions work pretty well. But what if we try to define functions generic over ANY <code class="highlighter-rouge">Monad</code>?</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="n">liftM2</span><span class="o"><</span><span class="n">M</span><span class="p">,</span> <span class="n">MT</span><span class="p">,</span> <span class="n">MU</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">H</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">,</span> <span class="n">m1</span><span class="p">:</span> <span class="n">MT</span><span class="p">,</span> <span class="n">m2</span><span class="p">:</span> <span class="n">MU</span><span class="p">)</span>
<span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="nn">F</span><span class="p">::</span><span class="n">Output</span><span class="p">,</span> <span class="n">H</span><span class="o">></span>
<span class="n">where</span> <span class="n">M</span><span class="p">:</span> <span class="n">Monad</span><span class="p">,</span>
<span class="n">MT</span> <span class="o">=</span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="n">MU</span> <span class="o">=</span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="n">Fn</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="p">)</span><span class="o">></span><span class="p">,</span>
<span class="n">H</span><span class="p">:</span> <span class="n">Fn</span><span class="o"><</span><span class="p">(</span><span class="n">U</span><span class="p">,)</span><span class="o">></span> <span class="p">{</span>
<span class="nn">Monad</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="k">move</span> <span class="p">|</span><span class="n">x1</span><span class="p">|</span> <span class="nn">Monad</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m2</span><span class="p">,</span> <span class="k">move</span> <span class="p">|</span><span class="n">x2</span><span class="p">|</span> <span class="nn">Unit</span><span class="p">::</span><span class="nf">unit</span><span class="p">(</span><span class="nf">f</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">))))</span>
<span class="p">}</span>
</code></pre>
</div>
<p>The above will actually not work. Firstly, it is not possible to infer the generic <code class="highlighter-rouge">H</code>.
Secondly, all the different <code class="highlighter-rouge">Fn*</code> parameters create a lot of unnecessary noise and
restrict the user to the most basic use of <code class="highlighter-rouge">bind</code>. By using “associated traits” we can
simplify it a tiny bit:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="n">liftM2</span><span class="o"><</span><span class="n">M</span><span class="p">,</span> <span class="n">MT</span><span class="p">,</span> <span class="n">MU</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">,</span> <span class="n">m1</span><span class="p">:</span> <span class="n">MT</span><span class="p">,</span> <span class="n">m2</span><span class="p">:</span> <span class="n">MU</span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="nn">F</span><span class="p">::</span><span class="n">Output</span><span class="o">></span>
<span class="n">where</span> <span class="n">M</span><span class="p">:</span> <span class="n">Monad</span><span class="p">,</span>
<span class="n">MT</span> <span class="o">=</span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="n">MU</span> <span class="o">=</span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="n">Fn</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="p">)</span><span class="o">></span> <span class="p">{</span>
<span class="nn">Monad</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="k">move</span> <span class="p">|</span><span class="n">x1</span><span class="p">|</span> <span class="nn">Monad</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m2</span><span class="p">,</span> <span class="k">move</span> <span class="p">|</span><span class="n">x2</span><span class="p">|</span> <span class="nn">Unit</span><span class="p">::</span><span class="nf">unit</span><span class="p">(</span><span class="nf">f</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">))))</span>
<span class="p">}</span>
</code></pre>
</div>
<p>It is still very far from being ergonomic to use. I suspect that we will need to be able to
implement traits for types without having to specify their type-parameters, as well as to use
partially applied traits and types as types and type-parameters. This would enable us
to use both the “associated traits” and the types implementing traits as type-constructors if
their type-parameters are not fully specified:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="n">liftM2</span><span class="o"><</span><span class="n">M</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">,</span> <span class="n">m1</span><span class="p">:</span> <span class="n">M</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">m2</span><span class="p">:</span> <span class="n">M</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">)</span> <span class="k">-></span> <span class="n">M</span><span class="o"><</span><span class="nn">F</span><span class="p">::</span><span class="n">Output</span><span class="o">></span>
<span class="n">where</span> <span class="n">M</span><span class="p">:</span> <span class="n">Monad</span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="nn">M</span><span class="p">::</span><span class="n">BindFn</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="p">)</span><span class="o">></span> <span class="p">{</span>
<span class="nn">Monad</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m1</span><span class="p">,</span> <span class="k">move</span> <span class="p">|</span><span class="n">x1</span><span class="p">|</span> <span class="nn">Monad</span><span class="p">::</span><span class="nf">bind</span><span class="p">(</span><span class="n">m2</span><span class="p">,</span> <span class="k">move</span> <span class="p">|</span><span class="n">x2</span><span class="p">|</span> <span class="nn">Unit</span><span class="p">::</span><span class="nf">unit</span><span class="p">(</span><span class="nf">f</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">))))</span>
<span class="p">}</span>
</code></pre>
</div>
<p>The above code requires the use of traits in place of concrete types, having the compiler
monomorphize them to concrete types without having to be explicitly generic over them. This is
starting to look a lot more like how <code class="highlighter-rouge">Monad</code> is declared and used in Haskell, more so than how
it is in Scala, and it cuts down on a lot of noise which the extra generics otherwise would have
added.</p>
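<p>For comparison, here is roughly what <code class="highlighter-rouge">liftM2</code> looks like when it is written separately for two concrete monads on stable Rust. This is only a sketch: the names <code class="highlighter-rouge">lift_m2_option</code> and <code class="highlighter-rouge">lift_m2_vec</code> are made up, and <code class="highlighter-rouge">Vec</code> stands in for the list monad. Note how the two versions need different <code class="highlighter-rouge">Fn*</code> bounds, which is the part a generic <code class="highlighter-rouge">Monad</code> trait would have to abstract over:</p>

```rust
// liftM2 specialised to Option: every closure runs at most once, so FnOnce
// is enough and nothing needs to be cloned.
fn lift_m2_option<T, U, R, F>(f: F, m1: Option<T>, m2: Option<U>) -> Option<R>
where
    F: FnOnce(T, U) -> R,
{
    m1.and_then(move |x1| m2.and_then(move |x2| Some(f(x1, x2))))
}

// liftM2 specialised to Vec (the list monad): `f` runs once per pair, so it
// has to be Fn, and the items have to be Clone since each one is reused.
fn lift_m2_vec<T, U, R, F>(f: F, m1: Vec<T>, m2: Vec<U>) -> Vec<R>
where
    T: Clone,
    U: Clone,
    F: Fn(T, U) -> R,
{
    m1.into_iter()
        // The inner "bind" over m2 is collected eagerly so that no borrow of
        // `x1` escapes the closure.
        .flat_map(|x1| {
            m2.iter()
                .cloned()
                .map(|x2| f(x1.clone(), x2))
                .collect::<Vec<_>>()
        })
        .collect()
}
```

<p>The bodies have the same shape (two nested binds followed by a unit), yet the signatures cannot be unified without the features discussed in this section.</p>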
<h1 id="what-is-needed-to-be-able-to-implement-a-generic-monad-trait">What is needed to be able to implement a generic <code class="highlighter-rouge">Monad</code> trait?</h1>
<ul>
<li>Proper higher kinded types. Currently this can be emulated using associated types and
trait-inheritance, but this has the downside that it cannot enforce the signature of the types
used by <code class="highlighter-rouge">F</code> or returned from <code class="highlighter-rouge">bind</code>.</li>
<li>Partial application of type-constructors, to allow for implementing <code class="highlighter-rouge">Monad</code> for N-arity
type-constructors as well as associated types.</li>
<li><a href="https://github.com/rust-lang/rfcs/pull/1210">impl specialization</a> or otherwise some unification for <code class="highlighter-rouge">Fn*</code> traits to enforce the function
signature of <code class="highlighter-rouge">F</code> passed to <code class="highlighter-rouge">bind</code>, since the <code class="highlighter-rouge">Monad</code> trait needs to be generic over the
<code class="highlighter-rouge">Fn*</code> type itself.</li>
<li>Type-equality constraints in <code class="highlighter-rouge">where</code>-clauses (<a href="https://github.com/rust-lang/rust/issues/20041">#20041</a>).
This will allow the <code class="highlighter-rouge">F</code> passed to <code class="highlighter-rouge">bind</code> to be a generic specified by the implementor and
be enforced to have the correct function signature.</li>
<li>Implementing a trait for another trait needs to have some kind of mechanism which allows the
concrete type used to be different within the same <code class="highlighter-rouge">impl</code>-block without failing to typecheck.
E.g. all <code class="highlighter-rouge">impl Trait</code> inside of the same <code class="highlighter-rouge">impl</code>-block could be considered to be equivalent to
the generic <code class="highlighter-rouge">T: Trait</code> for typechecking of traits. Otherwise the <code>M<U></code> return
of <code class="highlighter-rouge">F</code> and <code class="highlighter-rouge">bind</code> will not typecheck for monads more complex than <code class="highlighter-rouge">Option</code> or <code class="highlighter-rouge">Result</code>.
Another solution would be to allow traits to be used in type-position and then let the compiler
monomorphize.</li>
</ul>
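<p>To make the first item in the list concrete, here is a minimal sketch of the associated-type emulation on stable Rust. The trait name <code class="highlighter-rouge">Bind</code> and its associated types are hypothetical, chosen only for this illustration:</p>

```rust
// Hypothetical emulation of a higher-kinded `bind`: the trait is
// parameterised over the *target* item type U, and "the same type
// constructor applied to U" is merely an associated type.
trait Bind<U> {
    type Item;
    // Intended to be `Self` with `Item` swapped for `U`, but the trait
    // cannot state that; an impl is free to pick any type here.
    type Output;

    fn bind<F>(self, f: F) -> Self::Output
    where
        F: FnOnce(Self::Item) -> Self::Output;
}

impl<T, U> Bind<U> for Option<T> {
    type Item = T;
    type Output = Option<U>;

    fn bind<F>(self, f: F) -> Option<U>
    where
        F: FnOnce(T) -> Option<U>,
    {
        match self {
            Some(t) => f(t),
            None => None,
        }
    }
}
```

<p>Nothing stops another implementation from declaring, say, <code class="highlighter-rouge">type Output = Vec<U></code> for an <code class="highlighter-rouge">Option<T></code>, which is the missing enforcement the list item refers to. The <code class="highlighter-rouge">FnOnce</code> bound is also fixed in the trait, so this sketch does not address the <code class="highlighter-rouge">Fn*</code> problem either.</p>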
<p>The requirements listed above also hold for the <code class="highlighter-rouge">Functor</code> trait, since <code class="highlighter-rouge">Option</code> and <code class="highlighter-rouge">Result</code> can take
<code class="highlighter-rouge">F = FnOnce</code> while <code class="highlighter-rouge">Iterator</code> can at most take <code class="highlighter-rouge">F = FnMut</code>. <code class="highlighter-rouge">Iterator</code>’s <code class="highlighter-rouge">map</code> also
returns a different concrete type compared to the original type.</p>
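<p>Both differences can be seen with the standard library's own <code class="highlighter-rouge">map</code> methods. The helper names <code class="highlighter-rouge">greet</code> and <code class="highlighter-rouge">numbered</code> below are made up for the example, and the nested <code class="highlighter-rouge">impl FnMut</code> in the return type assumes a compiler where <code class="highlighter-rouge">impl Trait</code> in return position is available:</p>

```rust
// Option::map accepts FnOnce: the closure may consume what it captured.
fn greet(name: Option<String>, greeting: String) -> Option<String> {
    name.map(move |n| {
        // Moving `greeting` out of the closure makes the closure FnOnce only.
        let mut g = greeting;
        g.push_str(&n);
        g
    })
}

// Iterator::map needs FnMut since the closure runs once per item, and the
// result is a new concrete type, std::iter::Map<I, F>, not the type of `iter`.
fn numbered<I>(iter: I) -> std::iter::Map<I, impl FnMut(String) -> String>
where
    I: Iterator<Item = String>,
{
    let mut n = 0;
    iter.map(move |s| {
        // Mutating captured state is fine for FnMut, but the closure must
        // stay callable for every following item.
        n += 1;
        format!("{}: {}", n, s)
    })
}
```

<p>A single <code class="highlighter-rouge">Functor</code> trait would have to describe both the <code class="highlighter-rouge">FnOnce</code>/<code class="highlighter-rouge">FnMut</code> split and the change from <code class="highlighter-rouge">I</code> to <code class="highlighter-rouge">Map<I, F></code>.</p>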
<p>There are a few additional features which can be added to this list if ease of use is considered,
and it is not just limited to these:</p>
<ul>
<li>Allowing types and function signatures to be generic over type-constructors.</li>
<li>Allowing traits to be used in type-position, letting the compiler monomorphize the type.</li>
<li>Allowing restrictions defined in traits on associated types.</li>
</ul>
<p>This would allow us to hopefully write <code class="highlighter-rouge">Monad</code> as something more like this:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">trait</span> <span class="n">Monad</span>
<span class="c">// The type monad is implemented on must be a 1-arity type-constructor</span>
<span class="n">where</span> <span class="n">Type</span><span class="p">:</span> <span class="o"><</span><span class="n">_</span><span class="o">></span> <span class="k">-></span> <span class="p">{</span>
<span class="c">// Trait-parameter of 1-arity, inheriting from the Func trait</span>
<span class="k">trait</span> <span class="n">BindFn</span><span class="o"><</span><span class="n">A</span><span class="o">></span><span class="p">:</span> <span class="n">Func</span><span class="p">;</span>
<span class="c">// Here we assume we can use the syntax Type<...> to parameterize</span>
<span class="c">// the self type, same for BindFn:</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="n">Type</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">Self</span> <span class="o">=</span> <span class="n">Type</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="nn">Type</span><span class="p">::</span><span class="n">BindFn</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,)</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span><span class="p">,</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Output</span> <span class="o">=</span> <span class="n">Type</span><span class="o"><</span><span class="n">U</span><span class="o">></span><span class="p">;</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">Type</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
<span class="k">impl</span> <span class="n">Monad</span> <span class="k">for</span> <span class="nb">Option</span> <span class="p">{</span>
<span class="k">trait</span> <span class="n">BindFn</span><span class="o"><</span><span class="n">A</span><span class="o">></span> <span class="o">=</span> <span class="n">FnOnce</span><span class="o"><</span><span class="n">A</span><span class="o">></span><span class="p">;</span>
<span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">U</span><span class="o">></span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">U</span><span class="o">></span>
<span class="n">where</span> <span class="n">Self</span> <span class="o">=</span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="nn">Type</span><span class="p">::</span><span class="n">BindFn</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,)</span><span class="o">></span> <span class="o">+</span> <span class="err">'</span><span class="n">a</span><span class="p">,</span>
<span class="nn">F</span><span class="p">::</span><span class="n">Output</span> <span class="o">=</span> <span class="nb">Option</span><span class="o"><</span><span class="n">U</span><span class="o">></span> <span class="p">{</span>
<span class="k">match</span> <span class="k">self</span> <span class="p">{</span>
<span class="nf">Some</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">=></span> <span class="nf">f</span><span class="p">(</span><span class="n">t</span><span class="p">),</span>
<span class="nb">None</span> <span class="k">=></span> <span class="nb">None</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">unit</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span> <span class="nf">Some</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
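<p>The syntax above is hypothetical. As a rough approximation that does compile, assuming a compiler with generic associated types (stable since Rust 1.65), the type-constructor can be represented by a marker type. The names <code class="highlighter-rouge">Of</code> and <code class="highlighter-rouge">OptionM</code> are invented for this sketch:</p>

```rust
// Assumption: generic associated types are available. `Monad` is
// implemented on a marker type standing in for the type-constructor.
trait Monad {
    type Of<T>;

    fn unit<T>(t: T) -> Self::Of<T>;

    // The closure bound is hard-coded to FnOnce here, which is the
    // limitation the hypothetical `BindFn` associated trait would lift.
    fn bind<T, U, F>(m: Self::Of<T>, f: F) -> Self::Of<U>
    where
        F: FnOnce(T) -> Self::Of<U>;
}

// Hypothetical marker type representing `Option<_>`.
struct OptionM;

impl Monad for OptionM {
    type Of<T> = Option<T>;

    fn unit<T>(t: T) -> Option<T> {
        Some(t)
    }

    fn bind<T, U, F>(m: Option<T>, f: F) -> Option<U>
    where
        F: FnOnce(T) -> Option<U>,
    {
        match m {
            Some(t) => f(t),
            None => None,
        }
    }
}
```

<p>This covers the “1-arity type-constructor” requirement, but not the rest of the wish list: the <code class="highlighter-rouge">Fn*</code> kind is fixed by the trait, so an iterator-like monad that needs <code class="highlighter-rouge">FnMut</code> would not fit.</p>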
<p><strong>EDIT:</strong> Posted on reddit: <a href="https://www.reddit.com/r/rust/comments/3li3by/rust_and_the_monad_trait_not_just_higher_kinded/">/r/rust</a></p>
<p><strong>EDIT 2015-09-21:</strong> <a href="https://github.com/m4rw3r/m4rw3r.github.io/commit/68bceb3f427b9f24ba0ec60f1b749a569a4a8981">diff</a></p>
<p>Changed <code class="highlighter-rouge">Monad</code> definition to not include the <code class="highlighter-rouge">T</code> generic in the impl but only in <code class="highlighter-rouge">bind</code>
where possible. The HKT syntax was also changed to <code class="highlighter-rouge">Trait[Type<_>]</code> to indicate a type of
kind <code class="highlighter-rouge">* -> *</code> required for the impl of <code class="highlighter-rouge">Trait</code>.</p>
Parser Combinator Experiments in Rust - Part 3: Performance and impl Trait2015-09-07T00:00:00+00:00http://m4rw3r.github.io//parser-combinator-experiments-performance<h1 id="performance-update">Performance update</h1>
<p><a href="https://github.com/Geal">Geoffroy Couprie</a>, the creator of <a href="https://github.com/Geal/nom">Nom</a>,
suggested to me earlier that I should replace the <code class="highlighter-rouge">is_token</code> implementation from the previous
performance tests since it is pretty slow. The previous version was a pretty simple one-liner with
some significant overhead since it was written to be easy to read and to be comparable to the
<code class="highlighter-rouge">notInClass</code> function from the Haskell parsers in terms of readability:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">is_token</span><span class="p">(</span><span class="n">c</span><span class="p">:</span> <span class="nb">u8</span><span class="p">)</span> <span class="k">-></span> <span class="nb">bool</span> <span class="p">{</span>
<span class="n">c</span> <span class="o"><</span> <span class="mi">128</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">></span> <span class="mi">31</span> <span class="o">&&</span> <span class="n">b</span><span class="s">"()<>@,;:</span><span class="se">\\</span><span class="err">\</span><span class="s">"</span><span class="err">/</span><span class="p">[]</span><span class="err">?</span><span class="o">=</span><span class="p">{}</span> <span class="err">\</span><span class="n">t</span><span class="s">".iter()
.position(|&i| i == c).is_none()
}
</span></code></pre>
</div>
<p>The version above is performing a naive linear search, which is far from the most efficient method
of determining if a character is in the given set. Attoparsec actually creates a binary search
tree, which is much faster. Here is the optimized version Geoffroy suggested:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">is_token</span><span class="p">(</span><span class="n">c</span><span class="p">:</span> <span class="nb">u8</span><span class="p">)</span> <span class="k">-></span> <span class="nb">bool</span> <span class="p">{</span>
<span class="c">// roughly follows the order of ascii chars: "\"(),/:;<=>?@[\\]{} \t"</span>
<span class="n">c</span> <span class="o"><</span> <span class="mi">128</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">></span> <span class="mi">32</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">'\t'</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">'"'</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">'('</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">')'</span> <span class="o">&&</span>
<span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">','</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">'/'</span> <span class="o">&&</span> <span class="o">!</span><span class="p">(</span><span class="n">c</span> <span class="o">></span> <span class="mi">57</span> <span class="o">&&</span> <span class="n">c</span> <span class="o"><</span> <span class="mi">65</span><span class="p">)</span> <span class="o">&&</span> <span class="o">!</span><span class="p">(</span><span class="n">c</span> <span class="o">></span> <span class="mi">90</span> <span class="o">&&</span> <span class="n">c</span> <span class="o"><</span> <span class="mi">94</span><span class="p">)</span> <span class="o">&&</span>
<span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">'{'</span> <span class="o">&&</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">b</span><span class="sc">'}'</span>
<span class="p">}</span>
</code></pre>
</div>
<p>This is of course much harder to read but gives a decent performance boost. Since it affects all
Rust parser-combinators I have replaced their <code class="highlighter-rouge">is_token</code> implementations with the optimized
version from above:</p>
<table>
<thead>
<tr>
<th>Parser</th>
<th style="text-align: right">Time, 21 kB</th>
<th style="text-align: right">Time, 2 MB</th>
<th style="text-align: right">Time, 204 MB</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/bos/attoparsec/blob/4f137347be02106765f6897059b88219c79bb86c/examples/rfc2616.c">C http-parser</a></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.009 s</td>
<td style="text-align: right">0.62 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/cda66a9308ecb91f7147">Experiment</a><sup>1</sup></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.009 s</td>
<td style="text-align: right">0.71 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/54f7d80a3a5232c85d79">Nom</a></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.013 s</td>
<td style="text-align: right">1.01 s</td>
</tr>
<tr>
<td><a href="https://github.com/bos/attoparsec/blob/4f137347be02106765f6897059b88219c79bb86c/examples/RFC2616.hs">Attoparsec</a></td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.021 s</td>
<td style="text-align: right">1.45 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/4e82c4ee10deb1e141fc">Combine</a><sup>2</sup></td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.067 s</td>
<td style="text-align: right">6.10 s</td>
</tr>
<tr>
<td>Parsec<sup>3</sup></td>
<td style="text-align: right">0.009 s</td>
<td style="text-align: right">0.490 s</td>
<td style="text-align: right">47.75 s</td>
</tr>
</tbody>
</table>
<p>1: <a href="https://github.com/m4rw3r/rust_parser_experiments/tree/fifth">Library code from last post</a>,
with verbose errors enabled.</p>
<p>2: Combine is now stable. Sadly it does not include the <a href="https://github.com/Marwes/combine/pull/42">Ranged Stream</a>
pull-request which was used in the Combine-pre benchmark in the last post. This means that the
parser tested here is not a zero-copy parser and should be compared with the Combine-beta from
the last post.</p>
<p>3: Uses the exact same parser-code as the Attoparsec benchmark.</p>
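<p>Since the optimized <code class="highlighter-rouge">is_token</code> is hard to verify by eye, one way to gain confidence in it is to compare it with the readable version for every possible byte. The sketch below copies both bodies from earlier in this section; the names <code class="highlighter-rouge">is_token_linear</code>, <code class="highlighter-rouge">is_token_fast</code> and <code class="highlighter-rouge">versions_agree</code> are mine:</p>

```rust
// The readable linear-search version (function name is hypothetical).
fn is_token_linear(c: u8) -> bool {
    c < 128 && c > 31 && b"()<>@,;:\\\"/[]?={} \t".iter()
        .position(|&i| i == c).is_none()
}

// The optimized comparison-chain version (function name is hypothetical).
fn is_token_fast(c: u8) -> bool {
    // roughly follows the order of ascii chars: "\"(),/:;<=>?@[\\]{} \t"
    c < 128 && c > 32 && c != b'\t' && c != b'"' && c != b'(' && c != b')' &&
    c != b',' && c != b'/' && !(c > 57 && c < 65) && !(c > 90 && c < 94) &&
    c != b'{' && c != b'}'
}

// Exhaustively compares both versions; a u8 only has 256 possible values.
fn versions_agree() -> bool {
    (0..=255u8).all(|c| is_token_linear(c) == is_token_fast(c))
}
```

<p>The two range checks cover <code class="highlighter-rouge">:;<=>?@</code> (58 to 64) and <code class="highlighter-rouge">[\]</code> (91 to 93), and <code class="highlighter-rouge">c > 32</code> takes care of the space, so the two functions accept the same set of bytes.</p>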
<h1 id="experimental-impl-trait">Experimental <code class="highlighter-rouge">impl Trait</code></h1>
<p>There is an <a href="https://github.com/eddyb/rust/commits/calendar-driven-development">experimental branch maintained by eddyb</a>
which implements the now closed <a href="https://github.com/rust-lang/rfcs/pull/105">RFC PR #105</a>. This
implementation allows for unboxed abstract return types through the use of the <code class="highlighter-rouge">impl</code> keyword
in a function return type.</p>
<p>The interesting thing with this in the context of monadic parser-combinators is that it is suddenly
no longer required to box closures whenever any are returned. This avoids a lot of allocations
and enables more optimizations, while also making it somewhat easier to manage
the types.</p>
<h2 id="monadic-parser-combinator">Monadic parser combinator</h2>
<p>The basic idea of a monadic parser-combinator is to use the combinator functions to create new
functions which are to be executed later once a parsing context exists (i.e. some input to parse).
This means that the monadic type <code class="highlighter-rouge">Parser<T></code> in simplified terms is <code class="highlighter-rouge">Fn(&[u8]) -> Option<(T, &[u8])></code>,
where <code class="highlighter-rouge">u8</code> is the input type, and the combinators themselves as well as the functions which create
parsers in the <code class="highlighter-rouge">Parser<T></code> monad are of the type <code class="highlighter-rouge">Fn(...) -> (Fn(&[u8]) -> Option<(T, &[u8])>)</code>.</p>
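<p>A minimal sketch of these shapes, assuming a compiler where <code class="highlighter-rouge">impl Trait</code> in return position is available (stable since Rust 1.26). The functions <code class="highlighter-rouge">token</code>, <code class="highlighter-rouge">ret</code> and <code class="highlighter-rouge">bind</code> are simplified stand-ins written for this example, not the API of the experiment library:</p>

```rust
// A primitive parser constructor: Fn(...) -> (Fn(&[u8]) -> Option<(T, &[u8])>).
// No input is needed to build the parser, only to run it.
fn token<'a>(t: u8) -> impl Fn(&'a [u8]) -> Option<(u8, &'a [u8])> {
    move |i: &'a [u8]| match i.split_first() {
        Some((&c, rest)) if c == t => Some((c, rest)),
        _ => None,
    }
}

// Monadic unit: a parser which consumes nothing and yields `t`.
fn ret<'a, T>(t: T) -> impl FnOnce(&'a [u8]) -> Option<(T, &'a [u8])> {
    move |i: &'a [u8]| Some((t, i))
}

// Monadic bind: run `p`, feed its result to `f`, then run the parser `f`
// returned on the remaining input. The state never shows up at the call site.
fn bind<'a, T, U, P, F, Q>(p: P, f: F) -> impl FnOnce(&'a [u8]) -> Option<(U, &'a [u8])>
where
    P: FnOnce(&'a [u8]) -> Option<(T, &'a [u8])>,
    F: FnOnce(T) -> Q,
    Q: FnOnce(&'a [u8]) -> Option<(U, &'a [u8])>,
{
    move |i: &'a [u8]| match p(i) {
        Some((t, rest)) => f(t)(rest),
        None => None,
    }
}
```

<p>A parser for <code class="highlighter-rouge">a</code> followed by <code class="highlighter-rouge">b</code> can then be built without any input in sight, as <code class="highlighter-rouge">bind(token(b'a'), |a| bind(token(b'b'), move |b| ret((a, b))))</code>, and is only run once it is called with a slice.</p>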
<p>With manual threading of state, the input is present from the start and the functions can
immediately operate on the input: <code class="highlighter-rouge">Fn(&[u8], ...) -> Option<(T, &[u8])></code>. This is not as
composable as the purely monadic parser-combinator since the state always needs to be considered
during combination and not only when building the combinators and primitive parsers.</p>
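<p>For contrast, a sketch of the state-threading style, where the input is an ordinary argument. The names <code class="highlighter-rouge">parse_token</code> and <code class="highlighter-rouge">parse_pair</code> are hypothetical:</p>

```rust
// Primitive parser in the state-threading style: the input is passed in
// directly and the remaining input is handed back to the caller.
fn parse_token(i: &[u8], t: u8) -> Option<(u8, &[u8])> {
    match i.split_first() {
        Some((&c, rest)) if c == t => Some((c, rest)),
        _ => None,
    }
}

// Combining two parsers means passing the remaining input along by hand
// at every step.
fn parse_pair(i: &[u8], a: u8, b: u8) -> Option<((u8, u8), &[u8])> {
    let (x, i) = parse_token(i, a)?;
    let (y, i) = parse_token(i, b)?;
    Some(((x, y), i))
}
```

<p>Every combination site has to mention <code class="highlighter-rouge">i</code>, which is the bookkeeping the monadic formulation hides.</p>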
<h2 id="the-unboxed-parsert">The unboxed <code class="highlighter-rouge">Parser<T></code></h2>
<p>The boxed example from <a href="/parser-combinator-experiments-rust">the first article of this series</a>
uses the (simplified) type <code class="highlighter-rouge">Box<FnBox(&[u8]) -> Option<(T, &[u8])>></code> which is heap-allocated.
But with the experimental branch we can use the <code class="highlighter-rouge">impl Trait</code> notation to make this closure
stack-allocated: <code class="highlighter-rouge">impl Fn(&[u8]) -> Option<(T, &[u8])></code>. Of course this is not something we want
to copy all over the place, so we define a trait which is implemented for <code class="highlighter-rouge">Fn(&[u8]) -> Option<(T, &[u8])></code>:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">pub</span> <span class="k">trait</span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="nf">parse</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span><span class="o">></span><span class="p">;</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">F</span><span class="o">></span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="k">for</span> <span class="n">F</span>
<span class="n">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="nf">parse</span><span class="p">(</span><span class="k">self</span><span class="p">,</span> <span class="n">i</span><span class="p">:</span> <span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span><span class="o">></span> <span class="p">{</span>
<span class="k">self</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
<p>This is a very simplified signature which does not support any error handling, incomplete input
or different input types. The <code class="highlighter-rouge">'a</code> lifetime is exposed through the <code class="highlighter-rouge">Parser<'a, T></code> type to
allow zero-copy parsers to return <code class="highlighter-rouge">Parser<'a, &'a [u8]></code>. Typically this <code class="highlighter-rouge">'a</code> will be free
unless constructions like this are needed.</p>
<p>The above allows us to write:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">struct</span> <span class="n">MyData</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="o">></span> <span class="p">{</span>
<span class="n">name</span><span class="p">:</span> <span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">],</span>
<span class="n">last_name</span><span class="p">:</span> <span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">],</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">my_parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="o">></span><span class="p">()</span> <span class="k">-></span> <span class="k">impl</span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">MyData</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="o">>></span> <span class="p">{</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_till</span><span class="p">(|</span><span class="n">c</span><span class="p">|</span> <span class="n">c</span> <span class="o">==</span> <span class="n">b</span><span class="sc">' '</span><span class="p">),</span> <span class="p">|</span><span class="n">name</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">char</span><span class="p">(</span><span class="n">b</span><span class="sc">' '</span><span class="p">),</span> <span class="p">|</span><span class="n">_</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_till</span><span class="p">(|</span><span class="n">c</span><span class="p">|</span> <span class="n">c</span> <span class="o">==</span> <span class="n">b</span><span class="sc">'\n'</span><span class="p">),</span> <span class="p">|</span><span class="n">last_name</span><span class="p">|</span>
<span class="n">MyData</span><span class="p">{</span>
<span class="n">name</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span>
<span class="n">last_name</span><span class="p">:</span> <span class="n">last_name</span><span class="p">,</span>
<span class="p">})))</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">buf</span> <span class="o">=</span> <span class="s">"Martin Wernstål</span><span class="se">\n</span><span class="s">"</span><span class="nf">.as_bytes</span><span class="p">();</span>
<span class="k">let</span> <span class="n">parser</span> <span class="o">=</span> <span class="nf">my_parser</span><span class="p">();</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{:?}"</span><span class="p">,</span> <span class="n">parser</span><span class="nf">.parse</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>
<span class="c">// MyData{name: &b"Martin", last_name: &b"Wernstål"}</span>
<span class="p">}</span>
</code></pre>
</div>
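<p>The example above relies on <code class="highlighter-rouge">bind</code>, <code class="highlighter-rouge">take_till</code> and <code class="highlighter-rouge">char</code> without showing them. The following is a rough sketch of how such combinators could be written against the simplified <code class="highlighter-rouge">Parser</code> trait on current stable Rust. It is not the code from the experiment repository: <code class="highlighter-rouge">ret</code> and <code class="highlighter-rouge">byte</code> (standing in for <code class="highlighter-rouge">char</code>) are names chosen for this illustration, and <code class="highlighter-rouge">take_till</code> is assumed to consume the rest of the input when the predicate never matches.</p>

```rust
pub trait Parser<'a, T> {
    fn parse(self, i: &'a [u8]) -> Option<(T, &'a [u8])>;
}

// Every closure of the right shape is a parser.
impl<'a, T, F> Parser<'a, T> for F
where
    F: FnOnce(&'a [u8]) -> Option<(T, &'a [u8])>,
{
    fn parse(self, i: &'a [u8]) -> Option<(T, &'a [u8])> {
        self(i)
    }
}

/// Succeeds with `t` without consuming any input.
pub fn ret<'a, T>(t: T) -> impl Parser<'a, T> {
    move |i: &'a [u8]| Some((t, i))
}

/// Runs `p`, hands its value to `f`, then runs the parser `f` returns
/// on the remaining input. No boxing is involved.
pub fn bind<'a, T, U, P, Q, F>(p: P, f: F) -> impl Parser<'a, U>
where
    P: Parser<'a, T>,
    Q: Parser<'a, U>,
    F: FnOnce(T) -> Q,
{
    move |i: &'a [u8]| match p.parse(i) {
        Some((t, rest)) => f(t).parse(rest),
        None => None,
    }
}

/// Matches exactly the byte `c`.
pub fn byte<'a>(c: u8) -> impl Parser<'a, u8> {
    move |i: &'a [u8]| match i.split_first() {
        Some((&b, rest)) if b == c => Some((b, rest)),
        _ => None,
    }
}

/// Zero-copy: returns the input up to (not including) the first byte
/// for which `pred` is true, or all of it if there is no such byte.
pub fn take_till<'a, F>(pred: F) -> impl Parser<'a, &'a [u8]>
where
    F: Fn(u8) -> bool,
{
    move |i: &'a [u8]| {
        let n = i.iter().position(|&b| pred(b)).unwrap_or(i.len());
        Some(i.split_at(n))
    }
}
```

<p>Note that in this sketch the innermost closure passed to <code class="highlighter-rouge">bind</code> has to return a parser, so a plain value needs to be wrapped with <code class="highlighter-rouge">ret</code>.</p>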
<h2 id="performance">Performance</h2>
<p>After making some very suspect (and test breaking) changes to eddyb’s code I finally got
<a href="https://github.com/m4rw3r/rust_parser_experiments/tree/ebedd36f2f7e19171c65e38fdee3822d5daa4090">the code</a>
to compile. This also includes avoiding any kind of linking since metadata is broken for
<code class="highlighter-rouge">impl Trait</code> at the moment of writing. (<a href="https://gist.github.com/m4rw3r/9128819a56db444ba402">Diff</a>:
Fixed a probable typo, added monomorphize at some spots and finally removed the check for
<code class="highlighter-rouge">has_escaping_regions</code> specifically for <code class="highlighter-rouge">impl Trait</code> syntax.)</p>
<p>The same files were used as in the <a href="/parser-combinator-experiments-rust">last test</a>:</p>
<table>
<thead>
<tr>
<th>Parser</th>
<th style="text-align: right">Time, 21 kB</th>
<th style="text-align: right">Time, 2 MB</th>
<th style="text-align: right">Time, 204 MB</th>
</tr>
</thead>
<tbody>
<tr>
<td>slow <code class="highlighter-rouge">is_token</code></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.014 s</td>
<td style="text-align: right">1.16 s</td>
</tr>
<tr>
<td>fast <code class="highlighter-rouge">is_token</code></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.010 s</td>
<td style="text-align: right">0.80 s</td>
</tr>
</tbody>
</table>
<p>This is actually a very competitive result, seeing as the hand-written C-parser clocks in at 0.6
seconds, and neither the parser combinator nor the specific modifications to <code class="highlighter-rouge">rustc</code> have been
optimized. The code is used as is and only <code class="highlighter-rouge">--release</code> was supplied to cargo when compiling
the binary. To put it in perspective: it is only 30% slower than the hand-written C-parser which uses
switch-case-goto and 15% slower than the manual-threading of state variant. But it is 25% faster
than Nom and 80% faster compared to Attoparsec.</p>
<p>These are really promising results and I am hoping that Rust can land this feature in the near
future. Unboxed abstract returns and higher kinded types would be amazing tools to have in an
already great language.</p>
<p><strong>EDIT:</strong> Published on reddit: <a href="https://www.reddit.com/r/rust/comments/3k0d0d/parser_combinator_experiments_part_3_performance/">/r/rust</a></p>
Parser Combinator Experiments in Rust - Part 2: Error handling 2015-08-30T00:00:00+00:00 http://m4rw3r.github.io//parser-combinator-experiments-errors
<p>I published the <a href="/parser-combinator-experiments-rust">previous blog post</a> on reddit
and got some interesting feedback from that. I also talked a bit to the creators of
<a href="https://github.com/Geal/nom">Nom</a> and <a href="https://github.com/Marwes/combine">Combine</a> which finally
helped me make a decision regarding manual threading-of-state vs boxed closures.</p>
<p>I have decided to proceed with the manual threading-of-state since it is performing a lot better,
is just as modular (boxing of parsers can be done if you need some dynamic dispatch) and it is not
too different from the boxed-closure version. If Rust improves enough for unboxed closure returns
I might be able to completely update the parser to use closures instead of manual passing of state
in an <code class="highlighter-rouge">X.0</code> breaking update without making it difficult to upgrade.</p>
<p>This will be a pretty long post since it deals with quite a lot of the different things which can
be done in relation to error handling.</p>
<h1 id="comparison-with-combine">Comparison with Combine</h1>
<p>Since people wanted me to include more parser-generators in my simple benchmark I have had some
help from the creator of Combine to produce two versions of the benchmark for it. The first version
is using the stock version of Combine, <code class="highlighter-rouge">1.0.0-beta.3</code>, which does not support zero-copy parsing.
The second version is using the unfinished <code class="highlighter-rouge">1.0.0</code> version which has support for zero-copy
parsing as well as other improvements.</p>
<p>This time I ordered them from fastest to slowest on the 204 MB file:</p>
<table>
<thead>
<tr>
<th>Parser</th>
<th style="text-align: right">Time, 21 kB</th>
<th style="text-align: right">Time, 2 MB</th>
<th style="text-align: right">Time, 204 MB</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/bos/attoparsec/blob/4f137347be02106765f6897059b88219c79bb86c/examples/rfc2616.c">C http-parser</a></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.009 s</td>
<td style="text-align: right">0.62 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/572c09d5b7b3698dbb08">Manual</a></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.015 s</td>
<td style="text-align: right">1.19 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/0dd154d232abd0f3d4cf">Nom</a></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.018 s</td>
<td style="text-align: right">1.42 s</td>
</tr>
<tr>
<td><a href="https://github.com/bos/attoparsec/blob/4f137347be02106765f6897059b88219c79bb86c/examples/RFC2616.hs">Attoparsec</a></td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.021 s</td>
<td style="text-align: right">1.45 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/fcf84cb3d3df1d555177">Combine-pre</a></td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.024 s</td>
<td style="text-align: right">2.15 s</td>
</tr>
<tr>
<td><a href="https://github.com/m4rw3r/rust_parser_experiments/blob/fourth/examples/http_parser.rs">Boxed</a></td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.041 s</td>
<td style="text-align: right">3.75 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/6370f617199af2d6ca78">Combine-beta</a></td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.077 s</td>
<td style="text-align: right">7.04 s</td>
</tr>
<tr>
<td>Parsec<sup>1</sup></td>
<td style="text-align: right">0.009 s</td>
<td style="text-align: right">0.490 s</td>
<td style="text-align: right">47.75 s</td>
</tr>
</tbody>
</table>
<p>1: Uses the exact same parser-code as the Attoparsec benchmark.</p>
<p>Not having to allocate any <code class="highlighter-rouge">Vec<u8></code> instances makes a huge difference, which can clearly
be seen in the performance of Combine-pre vs Combine-beta. In addition to this, Combine-pre has
much better support for parsing <code class="highlighter-rouge">&[u8]</code>, which reduces the noise in the parser greatly.</p>
<p>The above test is still using the <a href="https://github.com/m4rw3r/rust_parser_experiments/tree/f64d0cc317c5d850987b83f206191eeed1e9bb68">code from the previous blog-post</a>.
I will list the performance of the improved/rewritten parser further down.</p>
<h1 id="tradeoffs">Tradeoffs</h1>
<p>Now that we have looked at the numbers again I will have to go through the limitations my
parser will have. Making a completely generic parser and then getting it to perform well is pretty
difficult, and making one which both performs well and is easy to use is even harder, so I
have to limit it a bit.</p>
<p>Currently I have settled for the following tradeoffs:</p>
<ul>
<li>Only works on slices, does not accept iterators as a data-source.</li>
<li>The data in the source-slice must be <code class="highlighter-rouge">Copy</code>, this allows for simpler, smaller parsers and cuts
out a lot of the noise from pointer-dereferencing when matching data.</li>
<li>Need to restart parsing whenever incomplete inputs are encountered, this might also mean that
data needs to be moved/copied in the source-buffer if all data is not immediately available.</li>
<li>Errors are bound to the current input-source, forcing users to deal with them before restarting
parse (or saving some kind of copy before resuming).</li>
</ul>
<p>I am not 100% certain about any of these, but it seems to be allowing for a pretty fast and
flexible parser. If larger inputs need to be consumed the parser-combinator could be demoted to a
lexer, to make it cheaper and more convenient to just restart the parser on incomplete input and
keep some kind of datastructure persistent between invocations of it.</p>
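<p>The restart-on-incomplete tradeoff can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the "parser" is a hypothetical <code class="highlighter-rouge">line</code> function that yields one newline-terminated line, and the driver <code class="highlighter-rouge">parse_chunks</code> is a made-up name. The point is that unparsed bytes are moved to the front of the buffer before more data is appended and parsing starts over.</p>

```rust
/// Returns the next line (without the `\n`) and the number of bytes
/// consumed, or `None` if the input does not yet contain a full line.
fn line(i: &[u8]) -> Option<(&[u8], usize)> {
    let n = i.iter().position(|&b| b == b'\n')?;
    Some((&i[..n], n + 1))
}

/// Feeds chunks into a buffer and restarts the parser after each refill.
fn parse_chunks<'c>(chunks: impl IntoIterator<Item = &'c [u8]>) -> Vec<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::new();
    let mut lines = Vec::new();
    for chunk in chunks {
        buf.extend_from_slice(chunk);
        let mut pos = 0;
        // Parse as much as possible from the data currently available.
        while let Some((l, used)) = line(&buf[pos..]) {
            lines.push(l.to_vec());
            pos += used;
        }
        // Incomplete input: drop what was parsed and move the unparsed
        // tail to the start of the buffer before the next refill.
        buf.drain(..pos);
    }
    lines
}
```

<p>Anything the parser returned from the old buffer contents has to be copied or otherwise dealt with before the buffer is modified, which is the same constraint that applies to errors in the last bullet above.</p>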
<h1 id="error-handling">Error handling</h1>
<p>Error handling is something people also seemed to be interested in, what different approaches can
be made and how it impacts both performance and readability of error messages. Seeing as I have
already made the tradeoff which requires an immutable underlying buffer for every invocation into
the parsers I do not have to consider the situation where I have to duplicate parts of the input
stream to produce a readable error message.</p>
<p>Another simplification I have decided on is to not let <code class="highlighter-rouge">bind</code> <a href="https://github.com/m4rw3r/rust_parser_experiments/blob/f64d0cc317c5d850987b83f206191eeed1e9bb68/src/lib.rs#L226">automatically backtrack on error</a>
for the next version. Instead it is up to the appropriate combinators to hold a reference to the
original slice and backtrack if needed. This should simplify some logic and hopefully enable LLVM
to optimize it a bit better. For most of the parsers and combinators the original input position is
completely useless seeing as there is nothing to be done once an error has occurred.</p>
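<p>As a sketch of what this means in practice, consider a non-backtracking <code class="highlighter-rouge">bind</code> next to an <code class="highlighter-rouge">or</code> combinator which keeps the original slice around. The types are simplified to <code class="highlighter-rouge">Option</code> and the helpers <code class="highlighter-rouge">token</code>, <code class="highlighter-rouge">ab</code> and <code class="highlighter-rouge">ac</code> are made up for this example, so this shows the idea rather than the actual library code.</p>

```rust
type PResult<'a, T> = Option<(T, &'a [u8])>;

/// `bind` only passes a failure along; it never rewinds the input.
fn bind<'a, T, U>(
    m: PResult<'a, T>,
    f: impl FnOnce(&'a [u8], T) -> PResult<'a, U>,
) -> PResult<'a, U> {
    match m {
        Some((t, rest)) => f(rest, t),
        None => None,
    }
}

/// `or` is the combinator that needs backtracking, so it is the one
/// holding on to the original slice `i` and retrying from it.
fn or<'a, T>(
    i: &'a [u8],
    p: impl FnOnce(&'a [u8]) -> PResult<'a, T>,
    q: impl FnOnce(&'a [u8]) -> PResult<'a, T>,
) -> PResult<'a, T> {
    match p(i) {
        Some(r) => Some(r),
        None => q(i),
    }
}

/// Matches the literal bytes `t` at the start of `i`.
fn token<'a>(i: &'a [u8], t: &[u8]) -> PResult<'a, ()> {
    if i.starts_with(t) {
        Some(((), &i[t.len()..]))
    } else {
        None
    }
}

fn ab(i: &[u8]) -> PResult<'_, ()> {
    bind(token(i, b"a"), |i, _| token(i, b"b"))
}

fn ac(i: &[u8]) -> PResult<'_, ()> {
    bind(token(i, b"a"), |i, _| token(i, b"c"))
}
```

<p>With the input <code class="highlighter-rouge">b"ac"</code>, <code class="highlighter-rouge">ab</code> fails after having matched the <code class="highlighter-rouge">a</code>, and <code class="highlighter-rouge">or(input, ab, ac)</code> then runs <code class="highlighter-rouge">ac</code> from the start of the original slice.</p>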
<p>One additional optimization regarding errors is that we can omit the actual unexpected token or
state since we always point at the error with the slice in <code class="highlighter-rouge">Parser</code>. In the previous version we
did not guarantee that the slice actually survived when yielding the error. This means that we can
now have an even simpler <code class="highlighter-rouge">Error</code> without any data at all!</p>
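<p>A small sketch of why the data-less error is sufficient: as long as the error is returned together with the slice pointing at the failure, both the position and the unexpected byte can be recovered afterwards. The functions <code class="highlighter-rouge">digits</code> and <code class="highlighter-rouge">describe</code> are hypothetical helpers for this illustration.</p>

```rust
#[derive(Debug, PartialEq)]
struct Error; // carries no data at all

/// Parses exactly `n` ASCII digits. On failure the returned slice
/// starts at the offending position.
fn digits(i: &[u8], n: usize) -> Result<(&[u8], &[u8]), (&[u8], Error)> {
    for k in 0..n {
        match i.get(k) {
            Some(b) if b.is_ascii_digit() => {}
            _ => return Err((&i[k..], Error)),
        }
    }
    Ok(i.split_at(n))
}

/// Recovers the error offset and the unexpected byte (if any) from
/// the two slices alone.
fn describe(original: &[u8], at: &[u8]) -> (usize, Option<u8>) {
    (original.len() - at.len(), at.first().copied())
}
```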
<p>Further, we can optimize <code class="highlighter-rouge">many1</code> to actually not need to stack or wrap errors — instead it just
needs to pass through the existing error — because of the following equivalence:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>// Haskell
many1 p = p >>= \x -> many p >>= \xs -> return (x:xs)
// Rust parser
many1(p) = bind(p(m), |m, x|
bind(many(m, p), |m, xs|
ret(m, xs.insert(0, x))))
</code></pre>
</div>
<p>This is of course a simplification since the results of <code class="highlighter-rouge">many</code> and <code class="highlighter-rouge">many1</code> are not always
vectors. And it still requires some modifications to the internal <code class="highlighter-rouge">Iterator</code> used by <code class="highlighter-rouge">many</code>
and <code class="highlighter-rouge">many1</code> so that it will actually propagate the error instead of throwing it away, as it
did in the previous version.</p>
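<p>Written out with plain functions and vectors, the equivalence could look something like the sketch below. It assumes a simplified result type <code class="highlighter-rouge">PResult</code> where an error carries the slice it occurred at, and it collects into a <code class="highlighter-rouge">Vec</code> only. The names and signatures are chosen for this illustration.</p>

```rust
type PResult<'a, T, E> = Result<(T, &'a [u8]), (E, &'a [u8])>;

/// Applies `p` zero or more times. The first failure ends the loop
/// and is discarded, so `many` itself always succeeds.
fn many<'a, T, E>(
    mut i: &'a [u8],
    p: impl Fn(&'a [u8]) -> PResult<'a, T, E>,
) -> (Vec<T>, &'a [u8]) {
    let mut out = Vec::new();
    while let Ok((t, rest)) = p(i) {
        if rest.len() == i.len() {
            break; // `p` made no progress, stop instead of looping forever
        }
        out.push(t);
        i = rest;
    }
    (out, i)
}

/// `many1 p = p >>= \x -> many p >>= \xs -> return (x:xs)`
fn many1<'a, T, E>(
    i: &'a [u8],
    p: impl Fn(&'a [u8]) -> PResult<'a, T, E>,
) -> PResult<'a, Vec<T>, E> {
    // The error of the first `p` is passed through untouched; there is
    // nothing to wrap or stack.
    let (x, rest) = p(i)?;
    let (mut xs, rest) = many(rest, &p);
    xs.insert(0, x);
    Ok((xs, rest))
}

/// Example parser: a single ASCII digit.
fn digit(i: &[u8]) -> PResult<'_, u8, &'static str> {
    match i.split_first() {
        Some((&b, rest)) if b.is_ascii_digit() => Ok((b, rest)),
        _ => Err(("expected digit", i)),
    }
}
```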
<p>So we will have the following two simplified <code class="highlighter-rouge">Error</code> types to compare:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Error</span><span class="p">;</span>
</code></pre>
</div>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">enum</span> <span class="n">Error</span><span class="o"><</span><span class="n">I</span><span class="o">></span>
<span class="n">where</span> <span class="n">I</span><span class="p">:</span> <span class="nb">Copy</span> <span class="p">{</span>
<span class="nf">Expected</span><span class="p">(</span><span class="n">I</span><span class="p">),</span>
<span class="n">Unexpected</span><span class="p">,</span>
<span class="nf">String</span><span class="p">(</span><span class="nb">Vec</span><span class="o"><</span><span class="n">I</span><span class="o">></span><span class="p">)</span>
<span class="p">}</span>
</code></pre>
</div>
<p>And then the common <code class="highlighter-rouge">Parser</code> implementation:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">enum</span> <span class="n">State</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span>
<span class="n">where</span> <span class="n">I</span><span class="p">:</span> <span class="err">'</span><span class="n">a</span> <span class="p">{</span>
<span class="nf">Item</span><span class="p">(</span><span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="n">I</span><span class="p">],</span> <span class="n">T</span><span class="p">),</span>
<span class="nf">Error</span><span class="p">(</span><span class="o">&</span><span class="err">'</span><span class="n">a</span> <span class="p">[</span><span class="n">I</span><span class="p">],</span> <span class="n">E</span><span class="p">),</span>
<span class="c">// Reporting incomplete input does not need to actually return the slice</span>
<span class="nf">Incomplete</span><span class="p">(</span><span class="nb">usize</span><span class="p">),</span>
<span class="p">}</span>
<span class="c">// Newtype wrapper to avoid exposing the internal state</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">(</span><span class="n">State</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">)</span>
<span class="n">where</span> <span class="n">I</span><span class="p">:</span> <span class="err">'</span><span class="n">a</span><span class="p">;</span>
</code></pre>
</div>
<h2 id="custom-errors">Custom errors</h2>
<p>Of course, the default errors are still not good enough for a real application if the output ever
reaches the end-user. Just mentioning something like “Expected ‘foo’ at ‘baz bar’” is not good
enough and application- (or library-) specific error types are required to properly describe
parse-errors.</p>
<p>Currently <code class="highlighter-rouge">bind</code> is generic over the error passed, automatically converting the error from the
first parameter to the error-type of the second parameter using <code class="highlighter-rouge">From::from</code>. This should enable
some flexibility in handling errors, but I am not certain this is enough.</p>
<p>This works well when bind is used like this since the first error will be converted to the error
type of <code class="highlighter-rouge">my_parser</code>:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nd">mdo!</span><span class="p">{</span><span class="n">input</span><span class="p">,</span>
<span class="nf">char</span><span class="p">(</span><span class="n">b</span><span class="sc">':'</span><span class="p">);</span>
<span class="n">data</span> <span class="o">=</span> <span class="nf">my_parser</span><span class="p">();</span>
<span class="n">ret</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>
</code></pre>
</div>
<p>It does not work so well when it is done in reverse however, as that will attempt to convert from
the potentially user-defined error-type from <code class="highlighter-rouge">my_parser</code> into the <code class="highlighter-rouge">Error</code> type of the parser
library:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nd">mdo!</span><span class="p">{</span><span class="n">input</span><span class="p">,</span>
<span class="n">data</span> <span class="o">=</span> <span class="nf">my_parser</span><span class="p">();</span>
<span class="nf">char</span><span class="p">(</span><span class="n">b</span><span class="sc">'\n'</span><span class="p">);</span>
<span class="n">ret</span> <span class="n">data</span>
<span class="p">};</span>
</code></pre>
</div>
<p>Another problem with this is inference, since the second parameter does not always have an error
AND a value specified for its type (eg. it is common to just end with a <code class="highlighter-rouge">ret</code> or <code class="highlighter-rouge">err</code>). This
will force the use of a few type-annotations when parsers are added and used inline in a function.
It will not be a problem if the parser is wrapped in a function however.</p>
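<p>The <code class="highlighter-rouge">From</code>-based conversion in <code class="highlighter-rouge">bind</code> can be sketched like this. The result type is simplified to a plain <code class="highlighter-rouge">Result</code>, and <code class="highlighter-rouge">LibError</code>, <code class="highlighter-rouge">MyError</code>, <code class="highlighter-rouge">byte</code> and <code class="highlighter-rouge">word</code> are hypothetical stand-ins for the library error, a user-defined error, <code class="highlighter-rouge">char</code> and <code class="highlighter-rouge">my_parser</code> respectively.</p>

```rust
type PResult<'a, T, E> = Result<(T, &'a [u8]), E>;

#[derive(Debug, PartialEq)]
struct LibError; // stand-in for the library's own error type

#[derive(Debug, PartialEq)]
enum MyError {
    Parse(LibError),
    Empty,
}

impl From<LibError> for MyError {
    fn from(e: LibError) -> Self {
        MyError::Parse(e)
    }
}

/// The error `E` of the first parser is converted into the error `V`
/// of the second one.
fn bind<'a, T, E, U, V, F>(m: PResult<'a, T, E>, f: F) -> PResult<'a, U, V>
where
    V: From<E>,
    F: FnOnce(&'a [u8], T) -> PResult<'a, U, V>,
{
    match m {
        Ok((t, rest)) => f(rest, t),
        Err(e) => Err(V::from(e)),
    }
}

/// Library parser: fails with the library error.
fn byte(i: &[u8], c: u8) -> PResult<'_, u8, LibError> {
    match i.split_first() {
        Some((&b, rest)) if b == c => Ok((b, rest)),
        _ => Err(LibError),
    }
}

/// User parser: fails with the user-defined error.
fn word(i: &[u8]) -> PResult<'_, &[u8], MyError> {
    let n = i.iter().take_while(|b| b.is_ascii_alphabetic()).count();
    if n == 0 {
        Err(MyError::Empty)
    } else {
        Ok(i.split_at(n))
    }
}
```

<p>Here <code class="highlighter-rouge">bind(byte(i, b':'), |i, _| word(i))</code> type-checks because <code class="highlighter-rouge">MyError: From<LibError></code>. The reverse order would require <code class="highlighter-rouge">LibError: From<MyError></code>, which the library cannot provide for a user-defined type.</p>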
<p>The <code class="highlighter-rouge">map_err</code> approach which <a href="http://doc.rust-lang.org/std/result/enum.Result.html#method.map_err">std::result::Result</a>
uses is something which could solve this problem. Allowing errors to be converted to the desired
type right after the parse-operation will solve the problem illustrated in the second example
above: it enables the user to convert the error of <code class="highlighter-rouge">char</code> to the appropriate error before
the <code class="highlighter-rouge">ret</code>.</p>
<p>Enabling constructions like this will solve the issue:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="nd">mdo!</span><span class="p">{</span><span class="n">input</span><span class="p">,</span>
<span class="n">data</span> <span class="o">=</span> <span class="nf">my_parser</span><span class="p">();</span>
<span class="nf">char</span><span class="p">(</span><span class="n">b</span><span class="sc">'\n'</span><span class="p">)</span><span class="nf">.map_err</span><span class="p">(</span><span class="nn">MyError</span><span class="p">::</span><span class="n">ParseError</span><span class="p">);</span>
<span class="n">ret</span> <span class="n">data</span>
<span class="p">};</span>
</code></pre>
</div>
<p>This exact syntax does not play nice with the <code class="highlighter-rouge">mdo!</code> macro at the moment, but it is something I
am working on.</p>
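<p>A <code class="highlighter-rouge">map_err</code> in the style of <code class="highlighter-rouge">Result::map_err</code> could look roughly like the following sketch. It is written directly on a cut-down <code class="highlighter-rouge">State</code> enum (without the <code class="highlighter-rouge">Incomplete</code> variant and fixed to <code class="highlighter-rouge">u8</code> input) rather than on the real <code class="highlighter-rouge">Parser</code> newtype, and <code class="highlighter-rouge">byte</code> again stands in for <code class="highlighter-rouge">char</code>.</p>

```rust
#[derive(Debug, PartialEq)]
pub struct Error; // library error

#[derive(Debug, PartialEq)]
pub enum MyError {
    ParseError(Error),
}

#[derive(Debug, PartialEq)]
pub enum State<'a, T, E> {
    Item(&'a [u8], T),
    Error(&'a [u8], E),
}

impl<'a, T, E> State<'a, T, E> {
    /// Converts the error, leaving a successful result untouched.
    pub fn map_err<V, F>(self, f: F) -> State<'a, T, V>
    where
        F: FnOnce(E) -> V,
    {
        match self {
            State::Item(i, t) => State::Item(i, t),
            State::Error(i, e) => State::Error(i, f(e)),
        }
    }
}

/// Matches exactly the byte `c`, failing with the library error.
pub fn byte(i: &[u8], c: u8) -> State<'_, u8, Error> {
    match i.split_first() {
        Some((&b, rest)) if b == c => State::Item(rest, b),
        _ => State::Error(i, Error),
    }
}
```

<p>Since a tuple variant such as <code class="highlighter-rouge">MyError::ParseError</code> is itself a function from <code class="highlighter-rouge">Error</code> to <code class="highlighter-rouge">MyError</code>, it can be passed to <code class="highlighter-rouge">map_err</code> directly.</p>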
<h2 id="type-inference">Type inference</h2>
<p>There is still a minor annoyance with this construction for error handling: the type inference for
the error type is not as simple as it could be. In current stable Rust (1.2), if <code class="highlighter-rouge">f</code>
does not have any specified error type then <code class="highlighter-rouge">bind(m, f)</code> will cause <code class="highlighter-rouge">rustc</code> to fail to infer
the type of <code class="highlighter-rouge">bind</code>, since the returned error type is the error type from <code class="highlighter-rouge">f</code>.</p>
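<p>The inference problem and the two usual ways around it (an explicit annotation, or wrapping the parser in a function with a declared return type) can be sketched as follows. The types are simplified to a plain <code class="highlighter-rouge">Result</code>, and <code class="highlighter-rouge">any</code>, <code class="highlighter-rouge">first</code> and <code class="highlighter-rouge">first_byte</code> are names made up for the example.</p>

```rust
type PResult<'a, T, E> = Result<(T, &'a [u8]), E>;

#[derive(Debug, PartialEq)]
pub struct Error;

/// `ret` is generic over the error type since it can never fail.
fn ret<'a, T, E>(i: &'a [u8], t: T) -> PResult<'a, T, E> {
    Ok((t, i))
}

fn bind<'a, T, E, U, V, F>(m: PResult<'a, T, E>, f: F) -> PResult<'a, U, V>
where
    V: From<E>,
    F: FnOnce(&'a [u8], T) -> PResult<'a, U, V>,
{
    match m {
        Ok((t, rest)) => f(rest, t),
        Err(e) => Err(V::from(e)),
    }
}

/// Matches any single byte.
fn any(i: &[u8]) -> PResult<'_, u8, Error> {
    match i.split_first() {
        Some((&b, rest)) => Ok((b, rest)),
        None => Err(Error),
    }
}

// Used inline, nothing pins down the error type of the closure, so
//     bind(any(i), |i, b| ret(i, b)).is_ok()
// is rejected with a "type annotations needed" error.

/// Inline use: the error type of `ret` has to be spelled out.
fn first(i: &[u8]) -> bool {
    bind(any(i), |i, b| ret::<_, Error>(i, b)).is_ok()
}

/// Wrapped in a function: the declared return type fixes the error type.
fn first_byte(i: &[u8]) -> PResult<'_, u8, Error> {
    bind(any(i), |i, b| ret(i, b))
}
```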
<p>This can be solved by adding a <a href="https://github.com/rust-lang/rfcs/blob/master/text/0213-defaulted-type-params.md">default type parameter</a>
for the error of <code class="highlighter-rouge">f</code> so that it will default to the existing error type. This will cause
<code class="highlighter-rouge">rustc</code> to use that type if it cannot infer a type for the error type of <code class="highlighter-rouge">f</code>. Sadly it is only
partially usable at the moment in stable Rust: stable Rust accepts the syntax of the default type
parameter but does not use it as a fallback in case the inference fails.</p>
<p>To be able to use the default type as a fallback, nightly Rust must be used, and the crates which
want to enable the fallback in the code they use must enable the feature gate
<code class="highlighter-rouge">default_type_parameter_fallback</code>. This will cause <code class="highlighter-rouge">rustc</code> to use the fallback for all code
in the crate, this includes code from other libraries used in the crate (eg. if <code class="highlighter-rouge">bind</code> is used
and inference is wanted for it, it is the crate which performs the call to <code class="highlighter-rouge">bind</code> which needs the
feature enabled, not the library where <code class="highlighter-rouge">bind</code> is defined). There is a
<a href="https://github.com/rust-lang/rust/issues/27336">tracking issue</a> for the fallback, and it seems
like it is just waiting for some more testing in the wild before it is enabled by default.</p>
<p>Since the default type parameters do not actually affect stable at all it is a harmless addition
to the generic types:</p>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">bind</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">F</span><span class="p">,</span> <span class="n">U</span><span class="p">,</span> <span class="n">V</span> <span class="o">=</span> <span class="n">E</span><span class="o">></span><span class="p">(</span><span class="n">m</span><span class="p">:</span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">></span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span>
<span class="k">-></span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="p">,</span> <span class="n">U</span><span class="p">,</span> <span class="n">V</span><span class="o">></span>
<span class="n">where</span> <span class="n">V</span><span class="p">:</span> <span class="n">From</span><span class="o"><</span><span class="n">E</span><span class="o">></span><span class="p">,</span>
<span class="n">F</span><span class="p">:</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="n">Input</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="o">></span><span class="p">,</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="n">I</span><span class="p">,</span> <span class="n">U</span><span class="p">,</span> <span class="n">V</span><span class="o">></span><span class="p">;</span>
</code></pre>
</div>
<p>Note <code class="highlighter-rouge">V = E</code> above.</p>
<p>In the future default type parameter fallback can be used to simplify error handling around parsers
which do not return any error (eg. <code class="highlighter-rouge">any</code>). Currently they “return” <code class="highlighter-rouge">Error<I></code> to be consistent
with the other parsers. This forces the user to handle the error case even though it is not at all
necessary, since the type could just be free. The alternative would be to force the users on stable
Rust to annotate the error-type every time any of these functions are used. Once default type
parameter fallback has landed, using the generic <code class="highlighter-rouge">E = Error<I></code> instead will make these parsers
easier to use without breaking any existing code.</p>
<h1 id="comparing-the-two">Comparing the two</h1>
<p>To avoid having to copy-paste almost every parser I conditionally compile a private internal
module which contains the <code class="highlighter-rouge">Error</code> type definition as well as constructors for the errors.
This is done using the <code class="highlighter-rouge">--features</code> flag of Cargo. By default the parser will include the
verbose errors, but <code class="highlighter-rouge">--no-default-features</code> can be used to make errors carry no data (could
be useful for some applications if the extra data in the error messages is not needed and the
performance gain is justified).</p>
<p>The simple version of the module only contains noops for managing the data, returning just the
error values. The more complex version contains some minor logic to populate the <code class="highlighter-rouge">Error</code> data.</p>
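<p>The conditional compilation could be arranged roughly like the sketch below. The feature name <code class="highlighter-rouge">verbose_error</code>, the module name <code class="highlighter-rouge">err</code> and the constructor <code class="highlighter-rouge">expected</code> are assumptions made for this example, not necessarily what the crate uses. In <code class="highlighter-rouge">Cargo.toml</code> the feature would be listed under <code class="highlighter-rouge">default</code> so that <code class="highlighter-rouge">--no-default-features</code> selects the no-op variant.</p>

```rust
// Verbose variant: the constructors populate the error with data.
#[cfg(feature = "verbose_error")]
mod err {
    #[derive(Debug, PartialEq)]
    pub enum Error<I> {
        Expected(I),
        Unexpected,
    }

    pub fn expected<I>(i: I) -> Error<I> {
        Error::Expected(i)
    }

    pub fn unexpected<I>() -> Error<I> {
        Error::Unexpected
    }
}

// No-op variant: same constructor names, but the data is thrown away
// and the error is a zero-sized type.
#[cfg(not(feature = "verbose_error"))]
mod err {
    use std::marker::PhantomData;

    #[derive(Debug, PartialEq)]
    pub struct Error<I>(PhantomData<I>);

    pub fn expected<I>(_: I) -> Error<I> {
        Error(PhantomData)
    }

    pub fn unexpected<I>() -> Error<I> {
        Error(PhantomData)
    }
}
```

<p>The parsers only ever call <code class="highlighter-rouge">err::expected(...)</code> and <code class="highlighter-rouge">err::unexpected()</code>, so they do not need to be duplicated for the two error representations.</p>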
<h2 id="inlining">Inlining</h2>
<p><img src="/public/parser_combinator_errors_different_inlining.png" alt="Comparison of CPU samples" /></p>
<p>Every time I changed something I <code class="highlighter-rouge">time</code>d the run on the 200 MB file to see if the run-time
changed, and if the different error handling approaches had much of an impact on performance.
Pretty consistently the noop error was about 10% faster for most of the changes I did, but when
I tried to simplify the private <code class="highlighter-rouge">iter::Iter</code> I ran into a problem; suddenly the larger error
enum performed better than the empty one.</p>
<p>It turns out that LLVM is suddenly inlining different functions in the parser code; for the
faster version it inlines <code class="highlighter-rouge">request</code> and all its callees except for <code class="highlighter-rouge">message_header</code> (which
in turn has everything inlined except for <code class="highlighter-rouge">message_header_line</code>). In the slower version
<code class="highlighter-rouge">request</code> is NOT inlined nor is <code class="highlighter-rouge">message_header</code> but it inlines <code class="highlighter-rouge">message_header_line</code>
instead.</p>
<p>Clearly I need some way to make the inlining equivalent for both tests.</p>
<p>Using <code class="highlighter-rouge">#[inline]</code> on the relevant functions in the <code class="highlighter-rouge">http_parser</code> example did not solve the
problem. LLVM/<code class="highlighter-rouge">rustc</code> does not seem to interpret that version of the attribute as a strong
indication that it should be inlined. Thankfully there is a stronger indication to the compiler
that it should inline: <code class="highlighter-rouge">#[inline(always)]</code>. Of course I immediately tried to liberally sprinkle
that attribute on all functions in <code class="highlighter-rouge">examples/http_parser.rs</code> and it turned out that this
actually just makes everything perform worse (kind of expected, CPU-cache and all that).</p>
<p>Instead using <code class="highlighter-rouge">#[inline(always)]</code> on the exact same functions which were originally
automatically inlined by LLVM (without even needing the <code class="highlighter-rouge">#[inline]</code> attribute) produced the
best result for both approaches. Force-inlining <code class="highlighter-rouge">request</code> and <code class="highlighter-rouge">request_line</code> makes LLVM inline
in the same way for both variants. Now the performance gap is down to 5-10% only, and both are
faster than before by a small margin.</p>
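<p>For reference, a small sketch of the inlining attributes discussed above. The function bodies are
invented for illustration and are not the ones from <code class="highlighter-rouge">examples/http_parser.rs</code>; only the attributes
themselves are the point:</p>

```rust
/// No attribute: the optimizer decides on its own whether to inline.
fn is_token(c: u8) -> bool {
    c.is_ascii_alphanumeric()
}

/// `#[inline]` is only a hint, which the optimizer is free to ignore.
#[inline]
pub fn take_token(input: &[u8]) -> (&[u8], &[u8]) {
    let n = input.iter().take_while(|&&c| is_token(c)).count();
    input.split_at(n)
}

/// `#[inline(always)]` is a much stronger request to inline at call sites.
#[inline(always)]
pub fn method(input: &[u8]) -> (&[u8], &[u8]) {
    take_token(input)
}

/// `#[inline(never)]` asks for the opposite: keep this as a separate function.
#[inline(never)]
pub fn request_line(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (m, rest) = method(input);
    if m.is_empty() {
        None
    } else {
        Some((m, rest))
    }
}
```

<p>The attributes never change what the code computes, only how it is laid out, which is why the
effect has to be measured rather than assumed.</p>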
<h1 id="the-code">The code</h1>
<p>The code can be found in the branch <code class="highlighter-rouge">fifth</code> <a href="https://github.com/m4rw3r/rust_parser_experiments/tree/fifth">here</a>
and the specific commit tested was <a href="https://github.com/m4rw3r/rust_parser_experiments/tree/128d7c70deff74fb4b7fef4a3dae7a2356343478">128d7c70</a>.</p>
<h1 id="results">Results</h1>
<p>Using the current beta (<code class="highlighter-rouge">rustc 1.3.0-beta.3 (2a89bb6ba 2015-08-11)</code>, same as was used for the
previous performance test) and the same machine as in the previous tests (MacBook Pro retina
15-inch, late 2013 with 2.3 GHz Intel Core i7 and 16 GB RAM) we get the following numbers:</p>
<table>
<thead>
<tr>
<th>Parser</th>
<th style="text-align: right">Time, 21 kB</th>
<th style="text-align: right">Time, 2 MB</th>
<th style="text-align: right">Time, 204 MB</th>
</tr>
</thead>
<tbody>
<tr>
<td>Simple</td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.012 s</td>
<td style="text-align: right">0.948 s</td>
</tr>
<tr>
<td>Verbose</td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.012 s</td>
<td style="text-align: right">0.974 s</td>
</tr>
</tbody>
</table>
<p>The major speedups seem to come from the following changes:</p>
<ul>
<li><code class="highlighter-rouge">bind</code> no longer backtracks on error; this avoids one extra branch-instruction in the most used
construction in the whole library.</li>
<li>The internal <code class="highlighter-rouge">iter::Iter</code> was simplified a bit and there are 2 possible states instead of 3;
this affected some of the combinators, making their matching code require fewer cases which
hopefully optimizes better.</li>
</ul>
<p>Overall I am pretty happy with the results; the parser is really starting to close in on the
hand-written C-code listed above in the benchmark. I hope that the tradeoffs I made are acceptable
for the gain in parsing speed.</p>
<p>Now on to actually attempting to use this code in an application!</p>
<p><strong>EDIT:</strong> Posted on Reddit here: <a href="https://www.reddit.com/r/rust/comments/3iwqlo/parser_combinator_experiments_part_2_error/">/r/rust</a></p>
Parser Combinator Experiments in Rust2015-08-18T00:00:00+00:00http://m4rw3r.github.io//parser-combinator-experiments<p>For the last week I have been working a bit on parser-combinator experiments using the
programming-language Rust. I have tried stacking structs, manually threading state and
boxed closures, with the last two seeming to be the most promising.</p>
<p>I am writing this as I would like feedback on my approach, as well as to announce that I have
something in the works which looks pretty promising.</p>
<h2 id="the-code">The code</h2>
<p>The code can be found in my <a href="https://github.com/m4rw3r/rust_parser_experiments">rust_parser_experiments</a> repo, the <code class="highlighter-rouge">master</code> branch currently
containing the manual state-threading and the <code class="highlighter-rouge">fourth</code> branch containing the boxed closure
version.</p>
<ul>
<li><a href="https://github.com/m4rw3r/rust_parser_experiments/tree/f64d0cc317c5d850987b83f206191eeed1e9bb68">Manually threading state</a></li>
<li><a href="https://github.com/m4rw3r/rust_parser_experiments/tree/b36a60a79cf38bb9e1c39a2d382b737b0f6aeb22">Boxed closures</a></li>
</ul>
<h2 id="first-some-numbers">First, some numbers</h2>
<p>I used attoparsec’s http-header <a href="https://github.com/bos/attoparsec/tree/master/examples">examples</a> as a simple benchmark to
compare the two approaches, both with each other and with the other provided examples, to see
how well they hold up. I also included the parser combinator library <a href="https://github.com/Geal/nom">Nom</a>
in this comparison.</p>
<p>The data used is the file <code class="highlighter-rouge">http-requests.txt</code> as well as two larger files which contain
the same data repeated 100 times and 10,000 times, resulting in files of 2 MB and 204 MB
in size.</p>
<p>The tests were run on a MacBook Pro (Retina, 15-inch, Late 2013) with a 2.3 GHz Intel Core i7 and
16 GB RAM. All optimizations were turned on (<code class="highlighter-rouge">-O2</code> for Haskell, <code class="highlighter-rouge">-O3</code> for C and <code class="highlighter-rouge">--release</code>
for Rust).</p>
<table>
<thead>
<tr>
<th>Parser</th>
<th style="text-align: right">Time, 21 kB</th>
<th style="text-align: right">Time, 2 MB</th>
<th style="text-align: right">Time, 204 MB</th>
</tr>
</thead>
<tbody>
<tr>
<td>C http-parser</td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.009 s</td>
<td style="text-align: right">0.62 s</td>
</tr>
<tr>
<td>Attoparsec</td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.021 s</td>
<td style="text-align: right">1.45 s</td>
</tr>
<tr>
<td>Parsec</td>
<td style="text-align: right">0.009 s</td>
<td style="text-align: right">0.490 s</td>
<td style="text-align: right">47.75 s</td>
</tr>
<tr>
<td><a href="https://gist.github.com/m4rw3r/0dd154d232abd0f3d4cf">Nom</a></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.018 s</td>
<td style="text-align: right">1.42 s</td>
</tr>
<tr>
<td><a href="https://github.com/m4rw3r/rust_parser_experiments/blob/master/examples/http_parser.rs">Manual</a></td>
<td style="text-align: right">0.003 s</td>
<td style="text-align: right">0.015 s</td>
<td style="text-align: right">1.19 s</td>
</tr>
<tr>
<td><a href="https://github.com/m4rw3r/rust_parser_experiments/blob/fourth/examples/http_parser.rs">Boxed</a></td>
<td style="text-align: right">0.004 s</td>
<td style="text-align: right">0.041 s</td>
<td style="text-align: right">3.75 s</td>
</tr>
</tbody>
</table>
<p>Some quick profiling using Instruments (bundled with Xcode) suggests that the Boxed version
spends about a second in <code class="highlighter-rouge">je_malloc</code> allocating all the boxed closures which are used. In
comparison the Manual version spends most of its time (~600 ms) in the <code class="highlighter-rouge">message_header</code>
function.</p>
<h2 id="usage">Usage</h2>
<p>The basic monad laws apply to both of the approaches, the difference being that manual
state threading requires the first parameter to always be the monad-state.</p>
<table>
<thead>
<tr>
<th>Haskell</th>
<th>Manual</th>
<th>Boxed</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="highlighter-rouge">m :: Parser a</code></td>
<td><code class="highlighter-rouge">m: Parser<A></code></td>
<td><code class="highlighter-rouge">m: Parser<A></code></td>
</tr>
<tr>
<td><code class="highlighter-rouge">f :: a -> Parser b</code></td>
<td><code class="highlighter-rouge">f: Fn(Empty, A) -> Parser<B></code></td>
<td><code class="highlighter-rouge">f: Fn(A) -> Parser<B></code></td>
</tr>
<tr>
<td><code class="highlighter-rouge">g :: Parser b</code></td>
<td><code class="highlighter-rouge">g: Fn(Empty) -> Parser<B></code></td>
<td><code class="highlighter-rouge">g: Fn() -> Parser<B></code></td>
</tr>
<tr>
<td><code class="highlighter-rouge">m >>= f</code></td>
<td><code class="highlighter-rouge">bind(m, f)</code></td>
<td><code class="highlighter-rouge">bind(m, f)</code></td>
</tr>
<tr>
<td><code class="highlighter-rouge">m >> g</code></td>
<td><code>bind(m, |m, _| g(m))</code></td>
<td><code>bind(m, |_| g())</code></td>
</tr>
<tr>
<td><code class="highlighter-rouge">return a</code></td>
<td><code class="highlighter-rouge">ret(m, a)</code></td>
<td><code class="highlighter-rouge">ret(a)</code></td>
</tr>
<tr>
<td><code class="highlighter-rouge">fail a</code></td>
<td><code class="highlighter-rouge">err(m, a)</code></td>
<td><code class="highlighter-rouge">err(a)</code></td>
</tr>
</tbody>
</table>
<p>For the manual version above, <code class="highlighter-rouge">Empty</code> is an alias for <code class="highlighter-rouge">Parser<()></code> to indicate state but no
wrapped value (all the parsers require an <code class="highlighter-rouge">Empty</code> instance to act upon, to prevent accidental
loss of data).</p>
<p>The usage itself does not differ much; all parsers just need to accommodate that <code class="highlighter-rouge">Empty</code>
parameter.</p>
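<p>To make the manual column of the table concrete, here is a heavily simplified sketch of manual
state threading. The types are assumptions for illustration: the state is fixed to byte slices,
errors are plain strings, and <code class="highlighter-rouge">bind</code> takes an <code class="highlighter-rouge">FnOnce</code> for brevity. The helper names
<code class="highlighter-rouge">start</code> and <code class="highlighter-rouge">two_bytes</code> are hypothetical and the real definitions in the repository differ:</p>

```rust
/// Parser state: the remaining input plus either a value or an error.
#[derive(Debug, PartialEq)]
pub enum Parser<'a, T> {
    Value(&'a [u8], T),
    Error(&'a [u8], &'static str),
}

/// State without a wrapped value; every parser consumes one of these.
pub type Empty<'a> = Parser<'a, ()>;

/// Hypothetical helper: wraps the input in an initial state.
pub fn start<'a>(input: &'a [u8]) -> Empty<'a> {
    Parser::Value(input, ())
}

/// `return a`: puts a value into the threaded state.
pub fn ret<'a, T>(m: Empty<'a>, value: T) -> Parser<'a, T> {
    match m {
        Parser::Value(rest, ()) => Parser::Value(rest, value),
        Parser::Error(rest, e) => Parser::Error(rest, e),
    }
}

/// `m >>= f`: runs `f` eagerly on success, propagates the error otherwise.
pub fn bind<'a, T, U, F>(m: Parser<'a, T>, f: F) -> Parser<'a, U>
where
    F: FnOnce(Empty<'a>, T) -> Parser<'a, U>,
{
    match m {
        Parser::Value(rest, t) => f(Parser::Value(rest, ()), t),
        Parser::Error(rest, e) => Parser::Error(rest, e),
    }
}

/// Matches any single byte.
pub fn any<'a>(m: Empty<'a>) -> Parser<'a, u8> {
    match m {
        Parser::Value(input, ()) => match input.split_first() {
            Some((&b, rest)) => Parser::Value(rest, b),
            None => Parser::Error(input, "unexpected end of input"),
        },
        Parser::Error(rest, e) => Parser::Error(rest, e),
    }
}

/// Two parsers in sequence; the state `m` is passed along by hand.
pub fn two_bytes<'a>(m: Empty<'a>) -> Parser<'a, (u8, u8)> {
    bind(any(m), |m, a| bind(any(m), move |m, b| ret(m, (a, b))))
}
```

<p>Calling <code class="highlighter-rouge">two_bytes(start(input))</code> parses immediately and hands back the finished value together
with the remaining input, without allocating anything on the heap.</p>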
<p>Expanded version of <code class="highlighter-rouge">do</code>-syntax:</p>
<h3 id="manually-threading-state">Manually threading state</h3>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="n">request_line</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="o">></span><span class="p">(</span><span class="n">p</span><span class="p">:</span> <span class="n">Empty</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="nb">u8</span><span class="o">></span><span class="p">)</span> <span class="k">-></span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">Request</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="o">></span><span class="p">,</span> <span class="n">Error</span><span class="o"><</span><span class="nb">u8</span><span class="o">>></span> <span class="p">{</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">is_token</span><span class="p">),</span> <span class="p">|</span><span class="n">p</span><span class="p">,</span> <span class="n">method</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">is_space</span><span class="p">),</span> <span class="p">|</span><span class="n">p</span><span class="p">,</span> <span class="n">_</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">is_not_space</span><span class="p">),</span> <span class="p">|</span><span class="n">p</span><span class="p">,</span> <span class="n">uri</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">is_space</span><span class="p">),</span> <span class="p">|</span><span class="n">p</span><span class="p">,</span> <span class="n">_</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">http_version</span><span class="p">(</span><span class="n">p</span><span class="p">),</span> <span class="p">|</span><span class="n">p</span><span class="p">,</span> <span class="n">version</span><span class="p">|</span>
<span class="nf">ret</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">Request</span><span class="p">{</span><span class="n">method</span><span class="p">:</span> <span class="n">method</span><span class="p">,</span> <span class="n">uri</span><span class="p">:</span> <span class="n">uri</span><span class="p">,</span> <span class="n">version</span><span class="p">:</span> <span class="n">version</span><span class="p">,}))))))</span>
<span class="p">}</span>
</code></pre>
</div>
<h3 id="boxed">Boxed</h3>
<div class="language-rust highlighter-rouge"><pre class="highlight"><code><span class="k">fn</span> <span class="n">request_line</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="o">></span><span class="p">()</span> <span class="k">-></span> <span class="n">Parser</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="err">'</span><span class="n">a</span><span class="p">,</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">Request</span><span class="o"><</span><span class="err">'</span><span class="n">a</span><span class="o">></span><span class="p">,</span> <span class="n">Error</span><span class="o"><</span><span class="nb">u8</span><span class="o">>></span> <span class="p">{</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">is_token</span><span class="p">),</span> <span class="k">move</span> <span class="p">|</span><span class="n">method</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">is_space</span><span class="p">),</span> <span class="k">move</span> <span class="p">|</span><span class="n">_</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">is_not_space</span><span class="p">),</span> <span class="k">move</span> <span class="p">|</span><span class="n">uri</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">take_while1</span><span class="p">(</span><span class="n">is_space</span><span class="p">),</span> <span class="k">move</span> <span class="p">|</span><span class="n">_</span><span class="p">|</span>
<span class="nf">bind</span><span class="p">(</span><span class="nf">http_version</span><span class="p">(),</span> <span class="k">move</span> <span class="p">|</span><span class="n">version</span><span class="p">|</span>
<span class="nf">ret</span><span class="p">(</span><span class="n">Request</span><span class="p">{</span><span class="n">method</span><span class="p">:</span> <span class="n">method</span><span class="p">,</span> <span class="n">uri</span><span class="p">:</span> <span class="n">uri</span><span class="p">,</span> <span class="n">version</span><span class="p">:</span> <span class="n">version</span><span class="p">,}))))))</span>
<span class="p">}</span>
</code></pre>
</div>
<p>And here is the difference in the http-example once the <code class="highlighter-rouge">mdo!</code> macro is used:
<a href="https://gist.github.com/m4rw3r/6d1ca498f8e4abc24dbd">Diff</a></p>
<h2 id="operation">Operation</h2>
<p>The difference really becomes apparent once you look at how the parser is run. The manual
threading of state eagerly parses the data as it moves through the defined parser functions,
eg. in <code class="highlighter-rouge">request_line</code> above the resulting data contains the actual <code class="highlighter-rouge">Request</code> struct, fully
populated and all, as well as the remaining data to be parsed. This is pretty straightforward and
inlining makes the resulting code appear similar to how a hand-written parser would look.</p>
<p>The boxed-closure version on the other hand does nothing when a defined parser like <code class="highlighter-rouge">request_line</code>
is run. Instead this allocates a bunch of structures on the heap (boxed <code class="highlighter-rouge">FnOnce</code>s inside of each
other) which are then executed once you want to parse something. Because <code class="highlighter-rouge">ret</code> has to move its
value out when it runs, the boxed closures cannot be <code class="highlighter-rouge">Fn</code> or <code class="highlighter-rouge">FnMut</code>; that would enforce <code class="highlighter-rouge">Clone</code> on the
value passed to <code class="highlighter-rouge">ret</code>.</p>
<p>The <code class="highlighter-rouge">FnOnce</code> requirement also poses a problem when combinators like <code class="highlighter-rouge">many</code> and <code class="highlighter-rouge">manyTill</code>
are used, as the parser can only be used once. This means that <code class="highlighter-rouge">many</code> and the like require
a newly allocated parser for every iteration, taking a <code class="highlighter-rouge">Fn() -> Parser<A></code> as a parameter, which
causes a lot of heap-allocation during parsing.</p>
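<p>The following sketch illustrates the boxed approach and the <code class="highlighter-rouge">FnOnce</code> problem described above.
It is a simplified assumption of how such a parser can be built, with byte-slice input, string
errors and a hypothetical <code class="highlighter-rouge">ParseResult</code> alias; it is not the code from the <code class="highlighter-rouge">fourth</code> branch:</p>

```rust
/// Remaining input plus value on success, a message on failure.
pub type ParseResult<'a, T> = Result<(&'a [u8], T), &'static str>;

/// A parser is a heap-allocated closure which can be run exactly once.
pub struct Parser<'a, T>(Box<dyn FnOnce(&'a [u8]) -> ParseResult<'a, T> + 'a>);

impl<'a, T> Parser<'a, T> {
    /// Nothing is parsed until this is called.
    pub fn parse(self, input: &'a [u8]) -> ParseResult<'a, T> {
        (self.0)(input)
    }
}

/// `return a`: the value is moved out when the closure runs, hence `FnOnce`.
pub fn ret<'a, T: 'a>(value: T) -> Parser<'a, T> {
    Parser(Box::new(move |input: &'a [u8]| -> ParseResult<'a, T> {
        Ok((input, value))
    }))
}

/// Matches any single byte.
pub fn any<'a>() -> Parser<'a, u8> {
    Parser(Box::new(|input: &'a [u8]| -> ParseResult<'a, u8> {
        match input.split_first() {
            Some((&b, rest)) => Ok((rest, b)),
            None => Err("unexpected end of input"),
        }
    }))
}

/// `m >>= f`: allocates a new box which runs `m`, then the parser from `f`.
pub fn bind<'a, T: 'a, U: 'a, F>(m: Parser<'a, T>, f: F) -> Parser<'a, U>
where
    F: FnOnce(T) -> Parser<'a, U> + 'a,
{
    Parser(Box::new(move |input: &'a [u8]| -> ParseResult<'a, U> {
        let (rest, t) = m.parse(input)?;
        f(t).parse(rest)
    }))
}

/// A parser can only run once, so `many` needs a factory which builds
/// (and heap-allocates) a fresh parser for every iteration.
pub fn many<'a, T: 'a, F>(factory: F) -> Parser<'a, Vec<T>>
where
    F: Fn() -> Parser<'a, T> + 'a,
{
    Parser(Box::new(move |mut input: &'a [u8]| -> ParseResult<'a, Vec<T>> {
        let mut out = Vec::new();
        while let Ok((rest, t)) = factory().parse(input) {
            input = rest;
            out.push(t);
        }
        Ok((input, out))
    }))
}
```

<p>Every call to <code class="highlighter-rouge">any</code>, <code class="highlighter-rouge">ret</code>, <code class="highlighter-rouge">bind</code> or <code class="highlighter-rouge">many</code> allocates a box here, and <code class="highlighter-rouge">many</code> allocates again
for each item it parses, which matches the time spent in the allocator in the profile.</p>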
<h2 id="my-opinion">My opinion</h2>
<p>I am currently on the fence here, but leaning towards using the manual threading-of-state for the
time being since it optimizes much better and seems to correspond better to Rust’s current
feature-set. Once <a href="https://github.com/rust-lang/rfcs/issues/518">abstract return types</a> land, and
work with functions, they should hopefully improve the performance of the closure-returning
version and make that a clear winner.</p>
<p><strong>EDIT 2015-08-19:</strong> Updated performance numbers for Nom. The <code class="highlighter-rouge">nom::Stepper</code> introduced lots of memory
operations and caused cycles to be spent on moving data which should not have needed to be moved.</p>
<p>The earlier measurements, taken with <code class="highlighter-rouge">nom::Stepper</code>, were 0.004 s and 6.902 s for the first two
benchmarks, and the last one did not complete within one hour. The <code class="highlighter-rouge">nom::MemProducer</code>
was also tried along with <code class="highlighter-rouge">nom::Stepper</code> but that did not make any noticeable difference.</p>
<p><strong>EDIT:</strong> Posted on Reddit here: <a href="https://www.reddit.com/r/rust/comments/3hinis/parser_combinator_experiments/">/r/rust</a></p>