Homework 9: More Parser/Grammer
CIS 194: Homework 9
Due Tuesday, November 1
This is only half a homework assignment (10 points). Your other task this week is to write your final project proposal, also due on November 1st.
Exercise 1
Start with this code, which roughly corresponds to the result of the code developed during this week’s lecture.
Your task is to write a combined parser/grammar generator for EBNF
parseBNF :: Descr f => f BNF
BNF in BNF
Here is a BNF for BNF:
identifier = letter, {letter | digit | '-'};
spaces = {' ' | newline};
quoted-char = non-quote-or-backslash | '\\', '\\' | '\\', '\'';
terminal = '\'', {quoted-char}, '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = '{', spaces, rhs, spaces, '}', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal | non-terminal | option | repetition | group;
sequence = atom, {spaces, ',', spaces, atom}, spaces;
choice = sequence, {spaces, '|', spaces, sequence}, spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, {production};
This grammar is set up so that the precedence of ,
and |
is correctly implemented: a , b | c
will parse as (a, b) | c
.
In this syntax for BNF, terminal characters are quoted, i.e. inside '…'
, a '
is replaced by \'
and a \
is replaced by \\
. Also see the function quote
in ppRHS
.
If something is unclear about the grammar, please ask!
Recursion
You will notice that, in contrast to the grammars in class, this is recursive: Many productions refer to rhs
, and rhs
itself refers to these productions. With plain recursion on the Haskell level, and the implementation of Grammar
in class, this will not work.
Therefore, you will find a new function
recNonTerminal :: Descr f => String -> (f a -> f a) -> f a
in the provided file that makes the recursion explicit, and hence allows the grammar generator to inspect and it and make sense of it.
You use recNonTerminal "foo"
to define a non-terminal symbol foo
. The second argument of recNonTerminal
is a function that returns the right-hand side of the production. The argument given to that function is a description to be used whenever foo
is referenced.
So instead of the naively recursive (and non-functional) variant
aList :: Descr f => f [Item]
aList = nonTerminal "list" $
((:) <$> anItem <*> aList) `orElse` pure []
you would encode that as
aList :: Descr f => f [Item]
aList = recNonTerminal "list" $ \list ->
((:) <$> anItem <*> list) `orElse` pure []
Larger example
The following is an example that shows how to use recNonTerminal
and is otherwise also similar to your exercise. You can use it as an template.
In this example, we have a parser and a grammar generator for simple arithmetic expressions with addition, multiplication and constant integers.
data Expr = Plus Expr Expr | Mult Expr Expr | Const Integer
deriving Show
mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus
mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult
parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
ePlus exp
ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
mkPlus <$> eMult exp
<*> many (spaces *> char '+' *> spaces *> eMult exp)
<* spaces
eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
mkPlus <$> eAtom exp
<*> many (spaces *> char '*' *> spaces *> eAtom exp)
<* spaces
eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
aConst `orElse` eParens exp
aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit
eParens :: Descr f => f a -> f a
eParens inner =
id <$ char '('
<* spaces
<*> inner
<* spaces
<* char ')'
<* spaces
BNF helper functions
Note the functions
mkChoices :: RHS -> [RHS] -> RHS
mkSequences :: RHS -> [RHS] -> RHS
in the provided code. They might be useful in when parsing the sequence
and choice
productions (just as mkPlus
and mkMult
in the example).
Testing and debugging your code
If you write many small functions, maybe one for each production, on the top level, you can easily test them in GHCi:
*Main> parse aProduction "identifier = letter, {letter | digit | '-'};" Just ("identifier",Sequence (NonTerminal "letter") (Repetition (Choice (Choice (NonTerminal "letter") (NonTerminal "digit")) (Terminal "-"))))
(linebrake added manually by me)
If you run
*Main> putStr $ ppGrammar "bnf" parseBNF
you should get the grammar above
You should be able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr *Main> let bnf2 = runGrammer "expr" parseBNF *Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF *Main> f bnf1 == bnf1 True *Main> f bnf2 == bnf2 True
The last line is quite meta: You are unsing
parseBNF
as a parser on the pretty-printed grammar produced from interpretingparseBNF
as a grammar.
Optional exercise: A better recursion
This is not part of the homework, but if find this brain food yummy, feel free to implement it.
There is a way to get around defining recNonTerminal
and explicitly passing around the recursive call (i.e. the exp
in the example), if we adjust our Grammar
type as follows:
newtype Grammar a = G ([String] -> (BNF, RHS))
The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal
, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
if name `elem` seen
then ([], NonTerminal name)
else let (prods, rhs) = g (name : seen)
in (prods ++ [(name, rhs)], NonTerminal name)
Adjust the other primitives on Grammar
to type-check again, and observe that this parser/grammar generator for expressions, with genuine recursion, works now:
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
mkPlus <$> eMult
<*> many (spaces *> char '+' *> spaces *> eMult)
<* spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
mkPlus <$> eAtom
<*> many (spaces *> char '*' *> spaces *> eAtom)
<* spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
aConst `orElse` eParens parseExp