Dreaming of a Parser Generator for Language Design


To a lesser extent, I also discuss DSLs. This means I’m not concerned with any of the following:

Before getting into what I see as the requirements for a parser generator aimed at language designers, let’s set aside the features that are useful but not necessary.

Many authors focus on the issue of composing grammars to form new composite languages. Scannerless parser generators often tout their ability to combine grammars, but they still suffer from the first issue. While a tool that supports combining grammars would be handy, I don’t see it as a must-have. However, if it is possible to design tools with distinct lexing and parsing phases that enable or ease combining grammars, that would be wonderful.

As compilers and source code have grown in length and complexity, one response has been to adopt more incremental compilation. Incremental techniques enable advanced use cases such as re-lexing and re-parsing as the developer types, in order to provide real-time compiler errors. If it were possible to easily control the resolution of errors in the parser, it might be possible to use type information to make the correct decision.

“Some Strategies For Fast Lexical Analysis when Parsing Programming Languages” discusses optimizing a lexer by generating token values during lexing.

Besides, most tools don’t offer any form of compile-time ambiguity detection anyway. Since we can’t detect arbitrary ambiguity (whether a context-free grammar is ambiguous is undecidable in general), what we need is a new class of grammars that are unambiguous, yet flexible enough to include the kinds of grammars we would naturally want to write for programming languages. Then we could implement the parser with an algorithm like Marpa, which accepts ambiguous grammars but claims to parse all reasonable unambiguous grammars in linear time.

For a fully known and static language, adapting a grammar to the limitations of LL or LR parsing is painful but doable.
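To see concretely why unambiguous grammars matter, consider the deliberately ambiguous toy grammar `E → E "-" E | NUM` (a hypothetical example, not one taken from the text). The brute-force Python sketch below enumerates every parse tree of a token list under that grammar. It is exponential and only meant for tiny inputs, but it shows that `1 - 2 - 3` has two trees that evaluate to different results. The names `parses`, `evaluate`, and `show` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical toy grammar, deliberately ambiguous:
#   E -> E "-" E | NUM


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Sub:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Sub]


def parses(tokens: List[str]) -> List[Expr]:
    """Return every parse tree of `tokens` under E -> E "-" E | NUM."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return [Num(int(tokens[0]))]
    trees: List[Expr] = []
    # Try each "-" as the top-level operator; every split that parses on
    # both sides yields a distinct tree, which is exactly the ambiguity.
    for i, tok in enumerate(tokens):
        if tok == "-" and 0 < i < len(tokens) - 1:
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append(Sub(left, right))
    return trees


def evaluate(e: Expr) -> int:
    """Evaluate a tree; different trees can give different values."""
    if isinstance(e, Num):
        return e.value
    return evaluate(e.left) - evaluate(e.right)


def show(e: Expr) -> str:
    """Render a tree with explicit parentheses."""
    if isinstance(e, Num):
        return str(e.value)
    return f"({show(e.left)} - {show(e.right)})"


if __name__ == "__main__":
    for tree in parses("1 - 2 - 3".split()):
        print(show(tree), "=", evaluate(tree))
```

Declaring `-` left-associative, or restricting the grammar so that only one split is legal, would leave exactly one tree, `((1 - 2) - 3)`, for this input.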
What a language designer needs is support for a relatively flexible set of grammars that lets them focus on their language rather than on satisfying the parser generator. The parser generator should provide simple ways of specifying operator precedence and associativity as disambiguation rules on top of the grammar.

I’ve written before about how languages need to adopt intransitive operator precedence.

For example, one might add a data type property to all expression nodes, which the type checker later sets to the expression’s data type.

Increasingly, the compiler is not the only tool that needs to lex and parse source code in a given language. Parser generators should support reusing a single grammar in both the compiler and these other tools.

As is too often the case with open-source tools, parser generators tend to lack usability. Their error messages are frequently incomprehensible without detailed knowledge of the parsing algorithm being used. Ideally, the tool would provide an example input string that triggers the parsing problem and, if there is an ambiguity, show the different possible parse trees.

Performance still matters for generated parsers, even though today’s computers are multiple orders of magnitude faster than those available when parsing algorithms were first being developed. A parser generator can feed the single grammar into two different algorithms to offer this functionality with little to no impact on the compiler writer.

All of the requirements I’ve laid out here can be summed up by one goal: enabling language growth. Initially, a new language needs a quick and dirty way to get lexing and parsing working for a small grammar. Additionally, a separate lexer and unambiguous grammars guide language development toward good designs, while support for intransitive operator precedence provides design freedom.
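The requirements above ask for precedence and associativity to be declared as disambiguation rules and for intransitive operator precedence to be supported. As a rough illustration only, here is a Python sketch in which precedence is stored as explicit pairwise "binds tighter than" facts rather than numeric levels, and any pair of operators with no declared relation must be parenthesized. The operator table, the names `TIGHTER`, `LEFT_ASSOC_GROUPS`, `reduce_first`, and `PrecedenceError`, and the choice of a shunting-yard parser are all assumptions made for this example, not a design taken from the text.

```python
from typing import List, Set, Tuple, Union

# Hypothetical operator table. A pair (a, b) means "a binds tighter than b".
# Pairs are looked up one at a time and never transitively closed, so
# ('*', '==') must be listed even though '*' beats '+' and '+' beats '=='.
# '&' is related only to '==', so mixing '&' with '+' or '*' is an error.
TIGHTER: Set[Tuple[str, str]] = {
    ("*", "+"), ("*", "-"), ("/", "+"), ("/", "-"),
    ("+", "=="), ("-", "=="), ("*", "=="), ("/", "=="),
    ("&", "=="),
}
# Operators that may be chained without parentheses, grouping to the left.
# '==' is in no group, so "a == b == c" is rejected (non-associative).
LEFT_ASSOC_GROUPS = [{"+", "-"}, {"*", "/"}, {"&"}]
OPERATORS = {"+", "-", "*", "/", "&", "=="}

Tree = Union[str, Tuple[str, "Tree", "Tree"]]


class PrecedenceError(Exception):
    pass


def reduce_first(left_op: str, right_op: str) -> bool:
    """For `a left_op b right_op c`, True means group (a left_op b) first."""
    if (left_op, right_op) in TIGHTER:
        return True
    if (right_op, left_op) in TIGHTER:
        return False
    for group in LEFT_ASSOC_GROUPS:
        if left_op in group and right_op in group:
            return True
    raise PrecedenceError(
        f"'{left_op}' and '{right_op}' have no declared relative "
        f"precedence; add parentheses")


def parse(tokens: List[str]) -> Tree:
    """Shunting-yard parse into nested (op, left, right) tuples.

    Assumes well-formed input; there is no recovery for bad parentheses.
    """
    operands: List[Tree] = []
    ops: List[str] = []  # pending operators; "(" marks an open group

    def apply() -> None:
        op = ops.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                apply()
            ops.pop()  # discard the matching "("
        elif tok in OPERATORS:
            # Compare the incoming operator with each pending one.
            while ops and ops[-1] != "(" and reduce_first(ops[-1], tok):
                apply()
            ops.append(tok)
        else:
            operands.append(tok)  # anything else is treated as an operand
    while ops:
        apply()
    return operands[0]


if __name__ == "__main__":
    print(parse("a + b * c == d".split()))
    try:
        parse("a & b + c".split())
    except PrecedenceError as err:
        print("error:", err)
```

Shunting-yard is a convenient fit here because every grouping decision is a comparison between two specific operators, which is exactly where a relation that is not a total order can be consulted; a parser built on numeric precedence levels would force every pair of operators to be ordered.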
