# Expression Parsing (was: parsing equations)

## Keith.Clarke@dcs.qmw.ac.uk4 Apr 1996 00:23:21 -0500

From comp.compilers

Related articles
Expression Parsing (was: parsing equations) Keith.Clarke@dcs.qmw.ac.uk (1996-04-04)
| List of all articles for this month |

 From: Keith.Clarke@dcs.qmw.ac.uk Newsgroups: comp.compilers Date: 4 Apr 1996 00:23:21 -0500 Organization: Compilers Central Keywords: parse

Anything to help a Mac programmer, and our esteemed moderator...

>I am looking for tools or examples to help implement a spreadsheet-like
>equation parser for an internal application we need to write. We need to
>handle arbitrary nesting and complexity... everything from
>
>A1 + B1 / C1
>
>to
>
>((0.75 + RAND() * 0.5) * AVERAGE(\$A3:\$A19)) / (1 + LN( MAX(C3:C19) ) )
...
>Patrick Parker
>[It's certainly been done a thousand times, but it'd be nice to have a
>documented expression parser packaged up and ready to go. -John]

I have a demo expression parser written in C++, about 200 lines, but
essentially just C code. It uses the recursive descent precedence
method I posted in 1992 (sorry, I don't have a reference, but someone
reposted it in March this year).

It started as an example for teaching, & shows how to achieve various
effects that are occasionally asked for, such as juxtaposition, C-like
precedences, postfix ops, etc. It could easily handle constructs like
expr?expr:expr
too, but I haven't bothered.

Keith Clarke
---
Code for the archive:

// This program reads an integer expression terminated by an equals sign,
// calculates its value and prints it.
// It allows spaces before each operator, number and '='.

// It uses a simple recursive precedence parser
// featuring prefix, postfix, infix operators; juxtaposition/function calls;
// left-associative infix ops; right-assoc infix ops.

// BNF
// expr = primary
// | expr ^ expr exponentiation
// | expr expr juxtaposition (subtraction, here!)
// | expr / expr division
// | expr * expr mult
// | expr + expr addition
// | expr - expr subtraction
// | expr & expr logical and
// | expr | expr logical or
// primary
// = ( expr )
// | digits
// | - expr integer negation
// | ! expr logical negation
// | expr ( expr ) division (bizarrely!)
//
// The ambiguities in the above BNF have been resolved to make the
// behaviour of this example C-like. Postfix beats prefix
// beats infix; usual C precedences except juxtaposition (same as subtraction);
// the usual function-call syntax is made to mean division, thus 100(5)(2)=10.
// I've chosen non-associative operations so it's easy to see that this code
// gives left-associative behaviour.
//
//
// There are comments in the code to show how other behaviours might be
// realised.
//
// Permission is given to use this source provided an acknowledgement is given.
// I'd also like to know if you've found it useful.

// The following Research Report describes the idea, and shows how the
// parsing method may be understood as an encoding of the usual family-of-
// parsing-procedures technique as used e.g. in Pascal compilers.
// @techreport{QMW-DCS-383-1986a,
// author ="Clarke, Keith",
// title ="The Top-Down Parsing of Expressions",
// institution ="Department of Computer Science, Queen Mary College,
University of London, England",
// year ="1986",
// month ="June",
// number ="QMW-DCS-1986-383",
// scope ="theory",
// abstractURL
="http://www.dcs.qmw.ac.uk/publications/report_abstracts/1986/383",
// keywords ="Recursive-descent parsing, expression parsing,
operator precedence parsing."
// }
// A formal proof of the algorithm was made, as part of his PhD thesis work,
// by A.M. Abbas of QMC, London, in the framework of Constructive Set Theory.
// copyright Keith Clarke, Dept of Computer Science, QMW, University of London,
// England. email keithc@dcs.qmw.ac.uk

#include <stream.h>
#include <assert.h>
#include <ctype.h>

void printnum(int);
int fact(int);

// C-like highest-precedence postfix ops are handled in readprimary().
// Lower precedence postfix ops should be given a precedence in prio()
int postfix(char ch) {
return ch=='(' || ch=='!';
}

// This function is used only if you want to use juxtaposition of two
// expressions with a precedence like a ordinary operator, such as using
// a+bc^d to mean a+(b*(c^d)) as in ordinary arithmetic.
// When an expr can be formed from one expression immediately followed
// by a phrase P, then any of the symbols in the first set of P should
// be given a precedence suitable for the binding power of the construct.
// For true juxtaposition this would be FIRST(expr); but I've made it only
// digits so I can illustrate C-like behaviour of "function calls" e.g. f(x)(y)
int firstjuxta(char ch) {
switch(ch) {
// case '(' : return 1;

// return true for any digit, variable etc
// to allow e.g. (f 3) for a function application,
// or, in this example calculator (4 3) to mean the same as (4-3).
default: return isdigit(ch);
}
}

// Infix operators and maybe juxtaposition & postfix operators.
// This function tabulates the required behaviour of infix operators
// larger numbers bind more tightly.
// 0 means the symbol isn't an infix operator.
// It can be useful to give postfix ops a non-zero precedence - see readexpr()
int prio(char ch) {
switch(ch) {
case '|' : return 1;
case '&' : return 2;
case '+' : return 3;
case '-' : return 3;
case '*' : return 4;
case '/' : return 4;
case '^' : return 5;
default:
if(firstjuxta(ch)) return 3;
else return 0;
}
}

// Infix operators.
// True if it is, False if it isn't; doesn't matter if the symbol isn't
// even an infix op.
int rightassoc(char ch) {
switch(ch) {
case '^' : return 1;
default: return 0;
}
}

// END OF PRECEDENCE INFO

void skipspaces() {char ch; while(cin.get(ch),isspace(ch)) ; cin.putback(ch); }

int power(int n,int m) {
int k=1;
while(m>0) {k*=n; m--;}
return k;
}

// Read an infix expression containing top-level operators that bind at least
// as tightly as the given precedence.
// Don't consume the first non-digit character after the last number.
// Complain if you can't even find the first number,
// or if there is an operator with no following number.
int number, thisprec;
char ch;

skipspaces();
while ( cin >> ch, thisprec=prio(ch), thisprec>=prec ) {

// We're here because prio(ch) is large enough,
// but if firstjuxta(ch) is true, then ch is the first
// character of the following expression, not an infix operator.
if(firstjuxta(ch)) cin.putback(ch);

// if ch is a symbol which is postfix not infix,
// don't read the following expression here
switch(ch) {
case '+' : ans+=number; break;
case '*' : ans*=number; break;
case '-' : ans-=number; break;
case '/' : ans/=number; break;
case '^' : ans=power(ans,number); break;
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
// for illustration, I've made juxtaposition mean subtraction
// when the second one starts with a digit, in fn. firstjuxta()
ans-=number; break;
default: cout << "Panic: don't know the meaning of the operator "
<< ch << endl;
};
}
// We just read a character that wasn't an operator, and we don't
// have any use for it - but the next input function might - put it back
cin.putback(ch);

// no return value
}

// read a single number; a whole expression in parentheses; prefix op
char ch;
if(cin>>ch,ch=='(') {
// must be an expression; we've consumed the opening parenthesis
// and since readsum didn't care what followed the sum, the
// closing parenthesis is still there waiting to be consumed
if(cin>>ch,ch!=')') {
cout << "Missing right parenthesis\n";
// maybe we just consumed the = sign - put it back, just in case
cin.putback(ch);
}
} else if (isdigit(ch)) {
cin.putback(ch);

// illustrative prefix operators
} else if (ch=='-') {
} else if (ch=='!') {
} else
cout << "Number, left bracket or prefix operator expected\n";

// Parsing postfix ops here gives them HIGHER precedence than any
// infix operation; HIGHER precedence than prefix operators.
// You can give a postfix op a lower prec
// by handling it as though it were infix - see readexpr()

// illustrative postfix operators
while(cin>>ch,postfix(ch)) {
if(ch=='!') {
ans=fact(ans);
} else if(ch=='(') {
// make this "function call" syntax mean division
int number;
ans/=number;
if(cin.get(ch),ch!=')') {
cout << "Missing right parenthesis in function call\n";
}
} else
cout << "Panic: don't know the meaning of postfix op "
<< ch << endl;
}
cin.putback(ch);
}

// print an integer in decimal, one char at a time
// For pedagogic purposes only!
void printnum(const int N) {
if(N<10) { cout << N%10; }
else {
printnum(N/10);
cout << N%10;
}
}

int digitvalue(char ch) {
return (ch - '0');
}

// Each function checks all of its phrase, even if the caller has already
// made sure the first symbol is OK
char ch;
N=0;
while(cin.get(ch),isspace(ch)) ;
if(!isdigit(ch)) {
cout << "Number expected\n";
return;
}
while(isdigit(ch)) {
N = N*10+digitvalue(ch);
cin.get(ch);
}
cin.putback(ch);
}

int fact(int n) {
return n==0?1:n*fact(n-1);
}

main() {
int total; char ch;
cout << "Type an integer expression terminated by an '=' character";

// No doubt there's a better way to do this, but who cares...
while(cout << "\n? ",cin.get(ch),cin.putback(ch),!cin.eof()) {
skipspaces();
cin >> ch;
if(ch=='=') {
printnum(total);
} else {
cout << "Expected an '=' character after the last number";
}
while(cin.get(ch),ch!='\n') ;
}
}
--

Post a followup to this message