Copyright © 2008-2014 Orlando Hill

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

1. Introduction

As programmers, we are required to make use of data that is presented in a variety of formats. In order to extract and manipulate the desired information, we need the ability to navigate the structure of the language the data is written in. Unless the language is very simple, we must use a parser that understands the language and gives us the data in a form we can more readily use.

Manually creating parsers can be boring and time consuming. It is, therefore, common to use a use parser generator to do the grunt work of constructing the parser. This is where Waxeye comes in handy.

2. Getting Started

2.1. Downloading

You can download the current official release of Waxeye from Sourceforge. If you want to get the very latest version of Waxeye’s source, you can download it from GitHub using the Git version control system.

2.2. Requirements

There are no external dependencies needed to run a pre-built version of Waxeye. If you build from source, you’ll need Racket.

To use a generated parser, you need a supported programming language to run it from.

2.3. Installation

2.3.1. Unix and MacOSX

  1. Extract the files of the distribution.

  2. Copy the waxeye directory to where you wish to install it.

  3. Add the bin/waxeye binary to your search path. e.g. If you have ~/bin in your PATH and installed waxeye to /usr/local/waxeye then you might do the following.

ln -s /usr/local/waxeye/bin/waxeye ~/bin/

2.3.2. Windows

  1. Extract the files of the distribution.

  2. Copy the waxeye directory to where you wish to install it.

2.4. Running

Currently, Waxeye is used from a command-line interface. You can use it as a command-line tool or, as part of a script or build-system. There are plans to develop a graphical tool at a later stage.

2.4.1. Unix and MacOSX

Run Waxeye by executing the waxeye binary.

2.4.2. Windows

Use a command prompt to run waxeye.exe.

3. Basic Concepts

3.1. What is a parser?

When we want to understand data that has been written in a language of interest (L), we need to break our data into units of the language. This process of breaking our input into different parts, based on the structure of L, is called parsing. A program used for parsing is called a parser.

3.2. What is the result of a parser?

Once your input has been parsed, you need the result to be presented in a from that is easy to understand and manipulate. Since the input was organized based on the hierarchical structure of the language, it makes sense that the output of the parser mimic this structure. The most effective form to do this with is a tree.

Such a tree is known as an Abstract Syntax Tree (AST). A Waxeye parser will automatically give you an AST that represents your input. The structure of this AST is based on the structure of your language’s grammar.

3.3. What is a parser generator?

If L is simple, it is easy for us to use our programming lanugage of choice to, manually, write a parser for L. However, as the structural complexity of L increases, so too, does the size and complexity of the parser program. Writing and maintaining a large parser, by hand, can quickly become a tedious and laborious job. Thankfully, we can use a parser generator to automate the work of creating a parser so we can focus on other problems.

A parser generator is a tool designed to help software developers automate the process of creating a parser. Just like compilers and assemblers, a parser generator takes a description of a program, automatically does the boring work for you and gives you a transformed program as output. Each tool accepts input in one language (L1), performs various transformations and creates output in another language (L2).

L1 --> Compiler         --> L2
L1 --> Assembler        --> L2
L1 --> Parser Generator --> L2

The key difference between the three tools is the level of abstraction held by the input and output languages. The assembler works at the lowest level by taking assembly files and producing machine code. The compiler works above the assembler by taking a more abstract programming language and generating assembly files or machine code directly. Finally, the parser generator has the highest level of abstraction and transforms a grammar file into programming language source code for a compiler to process.

3.4. What is a grammar file?

We can define a language as the set of strings it contains. While it is sometimes possible to specify a language simply by enumerating all of its strings, such an approach has significant drawbacks. Trying to write each string in our language could be very time consuming and, potentially, take forever.

Suppose we need to read time information as part of a larger program. In a trivial case, the time information may be presented as two digits for the hours, a colon :, and then two digits for the minutes.

00:00, 00:01, 00:02, ... 14:23, 14:24, 14:25, ... 23:57, 23:58, 23:59

We could describe our time language this way but, writing all 1,440 possible hour/minute combinations wouldn’t be much fun. Not to mention how bad things would be if we extended our language to include date information.

As another example, consider the language that consists of all strings of one or more alphabet character.

a, b, c, ... z, aa, ab, ac, ... az, aaa, aab, aac, ...

Even worse than our time example, this language is infinite. It would be impossible for us to explicitly list every string in the language.

If we want to describe such languages, we need a notation that is more abstract than simply writing out strings. We call this notation a grammar and the file that contains it a grammar file.

4. Waxeye Grammars

To generate a parser for a language, you must supply the parser generator with a grammar file that describes the language. Waxeye grammar files are written as text documents and are, by convention, given the .waxeye file extension.

A Waxeye grammar consists of a set of rule definitions, called non-terminals. Together, the non-terminals succinctly describe the syntax of the language. By default, the first non-terminal is considered the starting point of the language definition.

4.1. Non-terminals

Non-terminals are defined in three parts; a name, a rule type and one or more grammar expressions.

The most common non-terminal type is the tree constructing non-terminal. A tree constructing non-terminal has the following form:

Name <- +expressions

Where Name matches [a-zA-Z] *[a-zA-Z0-9_-].

A tree constructing non-terminal
Example <- A | B

The other common non-terminal type is the void non-terminal. The result of a void non-terminal is not included in the AST that is constructed by the parser. To define a void non-terminal, use this form:

Name <: +expressions

A void non-terminal
Example <: A | B

4.2. Expressions

The most important part of each non-terminal definition is the set of expressions it contains. Grammar expressions come in different forms and have their own meanings. Places where an expression can be contained within another expression are denoted with an e.

4.2.1. Atomic Expressions



Matches any character from the input.



Matches text in the input.

Case-insensitive Literal


Matches text in the input while ignores case. This equivalent to the expression [tT][eE][xX][tT] but, is much more readable.

Character Class


Character-class that matches either a lower-case English character, _ or -.



References the non-terminal named NT.



Raises the precedence of the expression e.

4.2.2. Prefix Expressions



Doesn’t include the result of e when building the AST.



Puts e within a closure.



Puts e within a plus-closure.



Puts e within an optional.

Negative Check


Checks that e fails.

Positive Check


Checks that e succeeds.

4.2.3. Sequence Expressions

e1 e2

Matches e1 and e2 in sequence.

4.2.4. Alternation Expressions


Tries to match e1 and, if that fails, tries to match e2.

4.3. Precedence

In Waxeye grammars, some expressions can have other expressions nested within them. When we use parentheses, we are explicitly denoting the nesting structure of the expressions.

((?A) B) | C

At times, this can seem needlessly verbose. In many cases, we are able to omit the parentheses in favor of a shorter notation. We do this by exploiting the precedence of each expression type.

?A B | C

The precedence of an expression determines the priority it has when resolving implicitly nested expressions. Each expression type has a level of precedence relative to all other types. There are four different precedence levels in Waxeye grammars.

4.3.1. Level 4

The highest precedence is held by the atomic expressions. Because these expressions cannot, themselves, contain expressions, there is no need to consider which expressions are nested within them.

4.3.2. Level 3

The prefix expressions hold the next precedence level. Their nesting is resolved directly after the atomic expressions.

4.3.3. Level 2

Sequences of expressions are formed once the atomic and prefix expressions have been resolved.

4.3.4. Level 1

Finally, once all other expressions have been resolved, the different choices of the alternation expression are resolved.

4.4. Pruning Non-terminals

Sometimes, creating a new AST node will give us more information than we need. We might want to create a new AST node, only if doing so will tell us something interesting about our input. If the additional node gives us nothing of interest, our tree could be said to contain vertical noise.

To make it easier to process the AST, we can remove this vertical noise by using the pruning non-terminal type. This non-terminal type has the following form:

Name <= +expressions

When Name has successfully parsed a string, one of three things will happen, depending on the number of results to be included from Name's expressions.

  • If there are no expression results to be included, nothing new will be added to the AST.

  • If there is one expression result to be included, that result will take the place of the Name AST node.

  • Otherwise, a new Name AST node will be created, just like a tree constructing non-terminal.

To help understand how this works, consider an example from a simple arithmetic grammar.

Product <- Number *([*/] Number)

Number  <- +[0-9]

If we use the Product rule to parse the string 3*7, we get a tree with Product at the root and, below that, a Number, a * character and then another Number.

->  Number
    |   3
|   *
->  Number
    |   7

However, if the Product rule parses a string with just one Number in it, we will get a tree that is slightly bigger than we need. Parsing the string 5 produces the following tree.

->  Number
    |   5

In this case, having a Product node at the root of the AST isn’t necessary. If we want to, we can rewrite the original grammar to use a pruning non-terminal.

Product <= Number *([*/] Number)

Number  <- +[0-9]

Now, when we use Product to parse 3*7, we will get the same result as before but, when parsing 5, we get an AST with Number as the root.

|   5

As a second example, let’s look at a grammar for nested parentheses.

A <- :'(' A :')' | B

B <- 'b'

Here are some example inputs and their resulting ASTs:

Input: b

->  B
    |   b

Input: (b)

->  A
    ->  B
        |   b

Input: (((b)))

->  A
    ->  A
        ->  A
            ->  B
                |   b

Unless we want to know the number of parentheses matched, trees like these contain more information than we need. Again, we are able to solve this by rewriting the grammar using a pruning non-terminal.

A <= :'(' A :')' | B

B <- 'b'

This time, parsing the input (((b))) gives us a much shorter tree.

|   b


There are two types of comments in Waxeye grammars; single-line and multi-line.

4.5.1. Single-line

Single-line comments start at the first # outside of an atomic expression and extend until the end of the line.

# This is a single-line comment.

4.5.2. Multi-line

Multi-line comments are opened at the first /* outside of an atomic expression and closed with a */.

/* This is a multi-line comment. */
/* This is, also,
   a multi-line comment. */

As an added convenience for when editing a grammar, multi-line comments can be nested within each other. This is handy when you want to comment out a section of the grammar that already contains a comment.


This is the outer comment.

A <- 'a'

 * This is the inner comment.
B <- 'b'


5. Using Waxeye

This chapter will show you how to setup Waxeye for your programming language. It covers language specific installation requirements and presents some basic boilerplate code to get you started. You can find copies of this boilerplate code in src/example/. I use $WAXEYE_HOME to refer to the location where you have installed the files of the Waxeye distribution.

The example grammar we’ll be using can be found in grammars/num.waxeye. You may wish to copy it to the directory you’re working in so you can experiment with extending and modifying the grammar.

Num <- '0' | [1-9] *[0-9]

Once setup and run, the boilerplate example will use the parser you generated to parse the string 42 and print the AST it creates.

|   4
|   2

5.1. Using Waxeye from C

Waxeye’s C runtime is currently supported on unix platforms and MacOSX.

5.1.1. Install

To install the C runtime, you need to compile it and, optionally, install the header files and library in your system search paths.

To compile the runtime, perform the command make lib from within the src/c directory of your waxeye installation.

cd $WAXEYE_HOME/src/c
make lib
make clean

To install the header files and library in your search paths, you could copy the files directly but, creating symbolic links to them will make upgrading easier.

sudo ln -s $WAXEYE_HOME/lib/libwaxeye.a /usr/local/lib/
sudo ln -s $WAXEYE_HOME/src/c/include/waxeye.h /usr/local/include/
sudo ln -s $WAXEYE_HOME/src/c/include/waxeye /usr/local/include/

5.1.2. Generate Parser

waxeye -g c . num.waxeye

5.1.3. Use Parser

#include <string.h>
#include "parser.h"

int main() {
    // Create our parser
    struct parser_t *parser = parser_new();

    // Setup our input
    char data[] = "42";
    struct input_t *input = input_new(data, strlen(data));

    // Parse our input
    struct ast_t *ast = parse(parser, input);

    // Print our ast
    display_ast(ast, type_strings);


    return 0;

5.1.4. Run

If you installed the headers and library in your system path:

gcc example.c parser.c -lwaxeye -o example


FLAGS="-I $WAXEYE_HOME/src/c/include/ -L $WAXEYE_HOME/lib/"
gcc $FLAGS example.c parser.c -lwaxeye -o example



5.2. Using Waxeye from Java

Waxeye’s Java runtime is compatible with version 1.5 and 1.6 of the JRE. It should also be possible to use Waxeye with JRE versions 1.3 and 1.4 of by retrofitting the classes with Retroweaver or Retrotranslator.

5.2.1. Install

To use a Waxeye parser from Java, you need Waxeye’s Java runtime in your classpath. The required classes are in the Jar file lib/waxeye.jar.

5.2.2. Generate Parser

waxeye -g java . num.waxeye

5.2.3. Use Parser

import org.waxeye.input.InputBuffer;
import org.waxeye.parser.ParseResult;

public class Example {
    public static void main(final String[] args) {
        // Create our parser
        final Parser parser = new Parser();

        // Setup our input
        final InputBuffer input = new InputBuffer("42".toCharArray());

        // Parse our input
        final ParseResult<Type> result = parser.parse(input);

        // Print our ast

5.2.4. Run

javac -cp .:$WAXEYE_HOME/lib/waxeye.jar
java -cp .:$WAXEYE_HOME/lib/waxeye.jar Example

5.3. Using Waxeye from Javascript

Waxeye parsers generated in Javascript should be able to run from any modern Javascript environment. The examples given here will use Node.js.

5.3.1. Install

To install Waxeye’s Javascript runtime library with Node.js, use the following commands on Unix and MacOS systems.

mkdir -p ~/.node_libraries
ln -s /usr/local/waxeye/src/javascript/waxeye.js ~/.node_libraries/

5.3.2. Generate Parser

waxeye -g javascript . num.waxeye

5.3.3. Use Parser

var parser = require('./parser');

// Create our parser
var p = new parser.Parser();

// Parse our input
var ast = p.parse("42");

// Print our AST

5.3.4. Run

node nodejs_example.js

5.4. Using Waxeye from Python

Waxeye’s Python runtime has been tested with Python version 2.5.1 and is intended to work with 2.x.x versions of Python.

5.4.1. Install

To install Waxeye’s Python runtime, you need to run the script from the src/python directory.

cd $WAXEYE_HOME/src/python
python build
sudo python install
rm -rf build/

5.4.2. Generate Parser

waxeye -g python . num.waxeye

5.4.3. Use Parser

import parser

# Create our parser
p = parser.Parser()

# Parse our input
ast = p.parse("42")

# Print our AST
print ast

5.4.4. Run


5.5. Using Waxeye from Ruby

Waxeye’s Ruby runtime is compatible with Ruby version 1.8.6.

5.5.1. Install

Install the Waxeye gem; either from Rubyforge or, from the gem file in lib.

# Install the Waxeye gem from Rubyforge
sudo gem install waxeye

5.5.2. Generate Parser

waxeye -g ruby . num.waxeye

5.5.3. Use Parser

require './parser'

# Create our parser
p =

# Parse our input
ast = p.parse("42")

# Print our AST
puts ast

5.5.4. Run

ruby example.rb

5.6. Using Waxeye from Scheme

Waxeye’s Scheme runtime is compatible with Racket.

5.6.1. Install

Install the waxeye collection where Racket can find it.

# Install the Waxeye collection; change to your install paths as needed
sudo ln -s /usr/local/waxeye/src/scheme/waxeye /usr/local/racket/lib/racket/collects/

5.6.2. Generate Parser

waxeye -g scheme . num.waxeye

5.6.3. Use Parser


(require "parser.scm")

(define (run)
  ;; Parse our input
  (let ((ast (parser "42")))
    ;; Print the ast
    (display-ast ast)))


5.6.4. Run from Racket

racket -t example.scm

6. Using ASTs and Parse Errors

Since just printing an Abstract Syntax Tree isn’t very interesting, let’s have a look at how to access the information the ASTs contain.

When you use a Waxeye parser, the result will be one of two things. If the parser successfully parsed the input, the result will be an AST. If the input doesn’t match the syntax of the language, the result will be a parse error.

6.1. ASTs

ASTs come in three different forms; tree, char and empty.

  • A tree AST contains a type, a list of children and, the start and end position in the input.

  • A char AST contains a single character and has no children.

  • An empty AST simply signifies that parsing was successful. If your starting non-terminal is voided or is pruning and had no children, you will get an empty AST.

6.1.1. Using an AST node as string

If a given AST node will only ever have char children, you may wish to treat that node as a single string.

From C
char *str = ast_children_as_string(ast);
printf("%s\n", str);
From Java
From Javascript
From Python
print ''.join(ast.children)
From Ruby
puts ast.children.join('')
From Scheme
(display (list->string (ast-c ast)))

6.2. Parse Errors

A parse error contains information about where the input is invalid and hints about what is wrong with it.

6.3. Determining the result type

6.3.1. From C

switch (result->type) {
  case AST_TREE:
    return "tree ast";
  case AST_EMPTY:
    return "empty ast";
  case AST_ERROR:
    return "error";

6.3.2. From Java

ParseResult<Type> result = parser.parse(input);
if (result.getAST() != null) {
    if (result instanceof IEmpty) {
        return "empty ast";
    else {
        return "tree ast";
else {
    return "error";

6.3.3. From Javascript

if (result instanceof waxeye.AST) {
    return "tree ast";
else {
    if (result instanceof waxeye.ParseError) {
        return "error";
    else {
        return "empty ast";

6.3.4. From Python

if isinstance(result, waxeye.AST):
    return "tree ast"
elif isinstance(result, waxeye.ParseError):
    return "error"
    return "empty ast"

6.3.5. From Ruby

case result.class
  when Waxeye::AST
    "tree ast"
  when Waxeye::ParseError
    "empty ast"

6.3.6. From Scheme

  ((ast? result) "tree ast")
  ((parse-error? result) "error")
  (else "empty ast"))

7. Example: A Calculator

Now that we know how to write grammars, generate parsers and manipulate AST, we can put these skills together to build a small language interpreter. In this chapter, we create a command-line calculator.

Our calculator reads a line of input, parses it as an arithmetic expression and computes the result. The arithmetic language supports the following constructs.

  • floating point numbers

  • binary operators +,-,*,/

  • unary negation

  • parentheses

calc  <- ws sum

sum   <- prod *([+-] ws prod)

prod  <- unary *([*/] ws unary)

unary <= '-' ws unary
       | :'(' ws sum :')' ws
       | num

num   <- +[0-9] ?('.' +[0-9]) ws

ws    <: *[ \t\n\r]

7.1. Calculator in C

#include <stdio.h>
#include "parser.h"

double sum(struct ast_t *ast);

// GNU libc extension
extern ssize_t getline(char **lineptr, size_t *n, FILE *stream);

double num(struct ast_t *ast) {
    char *buf = ast_children_as_string(ast);
    double val = atof(buf);
    return val;

double unary(struct ast_t *ast) {
    struct ast_tree_t *tree = ast->data.tree;
    switch (tree->type) {
        case TYPE_UNARY:
            return - unary(vector_get(tree->children, 1));
        case TYPE_SUM:
            return sum(ast);
            return num(ast);

double prod(struct ast_t *ast) {
    struct vector_t *chil = ast->data.tree->children;
    size_t num_chil = chil->size;
    double val = unary(vector_get(chil, 0));

    size_t i;
    for (i = 1; i < num_chil; i +=2) {
        char operator = ((struct ast_t*) vector_get(chil, i))->data.c;
        double operand = unary(vector_get(chil, i + 1));

        if (operator == '*') {
            val *= operand;
        else {
            val /= operand;

    return val;

double sum(struct ast_t *ast) {
    struct vector_t *chil = ast->data.tree->children;
    size_t num_chil = chil->size;
    double val = prod(vector_get(chil, 0));

    size_t i;
    for (i = 1; i < num_chil; i +=2) {
        char operator = ((struct ast_t*) vector_get(chil, i))->data.c;
        double operand = prod(vector_get(chil, i + 1));

        if (operator == '+') {
            val += operand;
        else {
            val -= operand;

    return val;

void calc(struct parser_t *parser, struct input_t *input) {
    struct ast_t *ast = parse(parser, input);

    if (ast->type == AST_ERROR) {
        display_ast(ast, type_strings);
    else {
        printf("%f\n", sum(vector_get(ast->data.tree->children, 0)));


struct input_t *rl() {
    size_t data_size = 0;
    char* data = NULL;
    ssize_t line_size = getline(&data, &data_size, stdin);

    if (line_size == -1) {
        return NULL;

    return input_new(data, line_size);

int main() {
    struct input_t *input;
    struct parser_t *parser = parser_new();

    printf("calc> ");
    input = rl();

    while (input != NULL) {
        calc(parser, input);
        printf("calc> ");
        input = rl();


    return 0;

7.2. Calculator in Java

import java.util.List;

import org.waxeye.ast.IAST;
import org.waxeye.ast.IChar;
import org.waxeye.input.InputBuffer;
import org.waxeye.parser.ParseResult;

 * A commandline arithmetic calculator.
public final class Calculator {
    private static final Parser p = new Parser();

    private Calculator() {

    private static Object calc(final String input) {
        final ParseResult<Type> result =
            p.parse(new InputBuffer(input.toCharArray()));
        final IAST<Type> ast = result.getAST();

        if (ast == null) {
            return result.getError();
        else {
            return sum(ast.getChildren().get(0));

    private static double sum(final IAST<Type> sum) {
        final List<IAST<Type>> chil = sum.getChildren();
        double val = prod(chil.get(0));

        for (int i = 1; i < chil.size(); i += 2) {
            final char operator = ((IChar)chil.get(i)).getValue();

            if (operator == '+') {
                val += prod(chil.get(i + 1));
            else {
                val -= prod(chil.get(i + 1));

        return val;

    private static double prod(final IAST<Type> prod) {
        final List<IAST<Type>> chil = prod.getChildren();
        double val = unary(chil.get(0));

        for (int i = 1; i < chil.size(); i += 2) {
            final char operator = ((IChar)chil.get(i)).getValue();

            if (operator == '*') {
                val *= unary(chil.get(i + 1));
            else {
                val /= unary(chil.get(i + 1));

        return val;

    private static double unary(final IAST<Type> unary) {
        switch (unary.getType()) {
            case Unary:
                return - unary(unary.getChildren().get(1));
            case Sum:
                return sum(unary);
                return num(unary);

    private static double num(final IAST<Type> num) {
        return Double.parseDouble(num.childrenAsString());

    public static void main(String[] args) {
        try {
            final BufferedReader in = new BufferedReader(
                new InputStreamReader(;
            System.out.print("calc> ");
            String line = in.readLine();

            while (line != null) {
                System.out.print("calc> ");
                line = in.readLine();

        catch (IOException e) {

7.3. Calculator in Javascript

var util = require('util');
var waxeye = require('waxeye');
var parser = require('./parser');

var p = new parser.Parser();

// a commandline arithmetic calculator.
var calc = function(input) {
    var ast = p.parse(input);
    if (ast instanceof waxeye.ParseError) {
        return ast;
    else {
        return sum(ast.children[0]);

var binOp = function(ast, fn, ch, op1, op2) {
    var chil = ast.children;
    // apply the visitor function to our first sub-tree
    var val = fn(chil[0]);
    var i = 1;
    while (i != chil.length) {
        // choose our operator function
        var operator = chil[i] == ch ? op1 : op2;
        // apply the visitor function to our second sub-tree
        var operand = fn(chil[i + 1]);
        // apply the operator to our current value and the second sub-tree
        val = operator(val, operand);
        // move on to the next operator and sub-tree
        i += 2;
    return val

var sum = function(ast) {
    var add = function(a, b){return a + b;};
    var sub = function(a, b){return a - b;};
    return binOp(ast, prod, '+', add, sub);

var prod = function(ast) {
    var mult = function(a, b){return a * b;};
    var div = function(a, b){return a / b;};
    return binOp(ast, unary, '*', mult, div);

var unary = function(ast) {
    if (ast.type === 'unary') {
        // the unary rule is a pruning non-terminal
        // the only case we will see it is if we have negation
        return - unary(ast.children[1]);
    else {
        if (ast.type === 'sum') {
            return sum(ast);
        else {
            return num(ast);

var num = function(ast) {
    return parseFloat(ast.children.join(''));

var input = "";
var stdin = process.openStdin();


util.print('calc> ');

// Read our input
stdin.on('data', function (line) {
    util.print('calc> ');

stdin.on('end', function () {

7.4. Calculator in Python

import sys
import waxeye
import parser

p = parser.Parser()

# A commandline arithmetic calculator.
def calc(input):
    ast = p.parse(input)
    if isinstance(ast, waxeye.ParseError):
        return ast
        return sum(ast.children[0])

def bin_op(ast, fn, ch, op1, op2):
    chil = ast.children
    val = fn(*(chil[0],))
    i = 1
    while i != len(chil):
        if chil[i] == ch:
            operator = op1
            operator = op2
        # Increment val by the operator applied to val and the operand
        operand = fn(*(chil[i + 1],))
        val = operator(*(val, operand))
        i += 2
    return val

sum = lambda ast: bin_op(ast, prod, '+', lambda a, b: a + b, lambda a, b: a - b)

prod = lambda ast: bin_op(ast, unary, '*', lambda a, b: a * b, lambda a, b: a / b)

def unary(ast):
    if ast.type == 'unary':
        return - unary(ast.children[1])
    elif ast.type == 'sum':
        return sum(ast)
        return num(ast)

def num(ast):
    return float(''.join(ast.children))

sys.stdout.write('calc> ')
line = sys.stdin.readline()
while line:
    print calc(line)
    sys.stdout.write('calc> ')
    line = sys.stdin.readline()

7.5. Calculator in Ruby

require 'rubygems'
require 'waxeye'
require './parser'

# A commandline arithmetic calculator.
class Calculator
  @@p =

  def self.calc(input)
    ast = @@p.parse(input)

    if ast.is_a?(Waxeye::ParseError)

  def self.bin_op(ast, fn, ch, op1, op2)
    chil = ast.children
    val = self.send(fn, (chil[0]))
    i = 1
    until i == chil.size
      # Increment val by the operator applied to val and the operand
      val = val.send(chil[i] == ch ? op1 : op2,
                     self.send(fn, chil[i + 1]))
      i += 2

  def self.sum(ast)
    bin_op(ast, :prod, '+', :'+', :'-')

    bin_op(ast, :unary, '*', :'*', :'/')

  def self.unary(unary)
    case unary.type
      when :unary
        - unary(unary.children[1])
      when :sum

  def self.num(num)

print 'calc> '
STDIN.each {|input| puts Calculator.calc(input); print 'calc> ' }

7.6. Calculator in Scheme


(require "parser.scm")

;; A commandline arithmetic calculator.

(define (calc input)
  (let ((ast (parser input)))
    (if (ast? ast)
        (begin (display (sum (car (ast-c ast))))
        (display-parse-error ast))))

(define (bin-op ast fn ch op1 op2)
  (let* ((chil (list->vector (ast-c ast)))
         (val (fn (vector-ref chil 0))))
    (let loop ((i 1))
      (unless (= i (vector-length chil))
              ;; Increment val by the operator applied to val and the operand
              (set! val ((if (equal? (vector-ref chil i) ch) op1 op2)
                         val (fn (vector-ref chil (+ i 1)))))
              (loop (+ i 2))))

(define (sum ast)
  (bin-op ast prod #\+ + -))

(define (prod ast)
  (bin-op ast unary #\* * /))

(define (unary ast)
  (case (ast-t ast)
    ((unary) (- (unary (cadr (ast-c ast)))))
    ((sum) (sum ast))
    (else (num ast))))

(define (num ast)
  (string->number (list->string (ast-c ast))))

(define (rl)
  (display "calc> ")
  (read-line (current-input-port)))

(let loop ((input (rl)))
  (if (eof-object? input)
      (begin (calc input)
             (loop (rl)))))


8. Grammar Testing

;; These are tests for the 'Grammar' non-terminal
(Grammar ; <- This is the non-terminal's name

 ;; Following the name are pairs of input string and expected output. The
 ;; output is either the keyword 'pass', the keyword 'fail' or an AST. The AST
 ;; specifies the structure of the expected tree, the names of the nodes and
 ;; the individual characters. If you don't want to specify the whole tree,
 ;; just use the wild-card symbol '*' for the portion of the tree you want to
 ;; skip.

 "" ; <- This is the input
 (Grammar) ; <- This is the expected output

 "A <- 'a'"
 pass ; <- The keyword 'pass'

 fail ; <- The keyword 'fail'

 "A <- 'a' B <- 'b'"
 (Grammar (Definition (Identifier #\A) *)  ; <- Here we skip some of
          (Definition (Identifier #\B) *)) ;    Definition's children

 "A <- 'a'"
 (Grammar (*)) ; <- Here we specify a child tree of any type

 "A <- [a-z] *[a-z0-9]"
 (Grammar (Definition (Identifier #\A) (LeftArrow) (Alternation *)))

 "A <- 'a'"
 (Grammar (Definition (Identifier #\A)
            (LeftArrow) (Alternation (Sequence (Unit (Literal (LChar #\a)))))))

9. Modular Grammars

It is sometimes desirable to have a grammar split across multiple files and to have a final grammar built from those files. We can do this by using a modular grammar.

Having our grammar split in this way provides us with the opportunity to manipulate the definition of the non-terminals and, in the process, create new languages. Depending on how we compose our final grammar, we can create vastly different languages from the same base grammars and only need to change the one modular grammar.

One of the biggest advantages of modular grammars is that they make it very easy to embed one language within another. Many languages can be thought of in this way. Prime examples are when a programming language is embedded within XML or HTML. Or, going the other way, you could embed a data language like SQL within a programming language.

There are also cases when a language’s syntax changes subtly over time. We want to have parsers for each version of the language but without duplicating large parts of our grammars.

9.1. Grammar Composition

A modular grammar is made up of expressions that pull together non-modular grammars. Some modular expressions can have other expressions nested within them. An expression is one of the following:

  • "grammar.waxeye"
    A path to a .waxeye file. This path should either be relative to the modular grammar or be an absolute path.

  • (rename modular-exp (old-name . new-name) …)
    Renames the specified non-terminals with their new names.

  • (only modular-exp non-term …)
    Includes only the listed non-terminals.

  • (all-except modular-exp non-term …)
    Includes all non-terminals except those listed.

  • (prefix prefix modular-exp)
    Prefixes the names of non-terminals from modular-exp.

  • (prefix-only prefix modular-exp non-term …)
    Prefixes only the listed non-terminals.

  • (prefix-all-except prefix modular-exp non-term …)
    Prefixes all non-terminals except those listed.

  • (join modular-exp …)
    Combines the results of multiple modular expressions into a single expression. Not needed at the top-level.

;; A contrived example where we replace the definition of Number in Json with a
;; much simpler one that only supports integers.

(all-except "../json.waxeye" Number)

(rename (only "../num.waxeye" Num) (Num . Number))

10. Waxeye Options

waxeye [ <option> ... ] <grammar>
 where <option> is one of
 Waxeye modes:
/ -g <language> <dir> : Generate
| -i : Interpret
\ -t <test> : Test
 Grammar options:
  -m : Modular Grammar - default: false
  -s <start> : Starting non-terminal - default: first non-terminal
 Parser options:
  -c <comment> : Header comment for generated files - default: none
  -e <eof> : Check parser consumes all input - default: true
  -n <namespace> : Module or package namespace - default: none
  -p <prefix> : Name prefix for generated files - default: none
 Misc options:
  --debug : Activates debug information
  --version : Prints version number and copyright notice
  --help, -h : Show this help
  -- : Do not treat any remaining argument as a switch (at this level)
 /|\ Brackets indicate mutually exclusive options.
 Multiple single-letter switches can be combined after one `-'; for
  example: `-h-' is the same as `-h --'

10.1. Waxeye Modes


The grammar file describing the language you want to parse. It is the last argument given to Waxeye and is required by all of Waxeye’s operating modes.

10.1.1. Generate

-g <language> <dir>

Creates a parser written in the specified programming language. Writes the parser’s files to the specified directory.

Currently supported programming languages:

  • c

  • java

  • javascript

  • python

  • ruby

  • scheme

waxeye -g scheme . grammar.waxeye

10.1.2. Interpret


Parses input as a string from the language defined by the grammar. Displays the resulting AST or parse error.

waxeye -i grammar.waxeye < input.txt

10.1.3. Test

-t <test>

Runs the tests in the specified test file for the language defined by the grammar. Displays any test errors.

waxeye -t tests.scm grammar.waxeye

10.2. Grammar Options


Indicates that the grammar is a modular grammar.

-s <start>

Specifies the non-terminal that starts the language. Default - The first non-terminal in the grammar.

10.3. Parser Options

-c <comment>

The file to be used as the header comment of generated files. Default - none.

-e <eof>

Whether to check that the parser consumes all input. Default - true.

-n <namespace>

The module or package namespace. Default - none.

-p <prefix>

The name prefix for any generated files. Default - none.

10.4. Misc Options


Activates debug information.


Prints the version number and copyright notice.

--help, -h

Prints a message describing the available command-line options.

11. GNU Free Documentation License

                GNU Free Documentation License
                  Version 1.2, November 2002

 Copyright (C) 2000,2001,2002  Free Software Foundation, Inc.
     51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.


The purpose of this License is to make a manual, textbook, or other
functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.


This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The "Document", below,
refers to any such manual or work.  Any member of the public is a
licensee, and is addressed as "you".  You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall subject
(or to related matters) and contains nothing that could fall directly
within that overall subject.  (Thus, if the Document is in part a
textbook of mathematics, a Secondary Section may not explain any
mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding

The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.  If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant.  The Document may contain zero
Invariant Sections.  If the Document does not identify any Invariant
Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.  A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text.  A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML, PostScript or PDF designed for human modification.  Examples of
transparent image formats include PNG, XCF and JPG.  Opaque formats
include proprietary formats that can be read and edited only by
proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as "Acknowledgements",
"Dedications", "Endorsements", or "History".)  To "Preserve the Title"
of such a section when you modify the Document means that it remains a
section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document.  These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.


You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.


If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.


You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct
   from that of the Document, and from those of previous versions
   (which should, if there were any, be listed in the History section
   of the Document).  You may use the same title as a previous version
   if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
   responsible for authorship of the modifications in the Modified
   Version, together with at least five of the principal authors of the
   Document (all of its principal authors, if it has fewer than five),
   unless they release you from this requirement.
C. State on the Title page the name of the publisher of the
   Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
   adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
   giving the public permission to use the Modified Version under the
   terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
   and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add
   to it an item stating at least the title, year, new authors, and
   publisher of the Modified Version as given on the Title Page.  If
   there is no section Entitled "History" in the Document, create one
   stating the title, year, authors, and publisher of the Document as
   given on its Title Page, then add an item describing the Modified
   Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
   public access to a Transparent copy of the Document, and likewise
   the network locations given in the Document for previous versions
   it was based on.  These may be placed in the "History" section.
   You may omit a network location for a work that was published at
   least four years before the Document itself, or if the original
   publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications",
   Preserve the Title of the section, and preserve in the section all
   the substance and tone of each of the contributor acknowledgements
   and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
   unaltered in their text and in their titles.  Section numbers
   or the equivalent are not considered part of the section titles.
M. Delete any section Entitled "Endorsements".  Such a section
   may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements"
   or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.


You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History"
in the various original documents, forming one section Entitled
"History"; likewise combine any sections Entitled "Acknowledgements",
and any sections Entitled "Dedications".  You must delete all sections
Entitled "Endorsements".


You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.


A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an "aggregate" if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole


Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers.  In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual


You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.


The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License "or any later version" applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

    Copyright (c)  YEAR  YOUR NAME.
    Permission is granted to copy, distribute and/or modify this document
    under the terms of the GNU Free Documentation License, Version 1.2
    or any later version published by the Free Software Foundation;
    with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
    A copy of the license is included in the section entitled "GNU
    Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the "with...Texts." line with this:

    with the Invariant Sections being LIST THEIR TITLES, with the
    Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.