Re: ML-style pattern matching in C-like languages

"Ben Chambers" <gtg983q@mail.gatech.edu>
15 Dec 2005 17:52:31 -0500

          From comp.compilers

Related articles
ML-style pattern matching in C-like languages rsc@swtch.com (Russ Cox) (2005-12-15)
Re: ML-style pattern matching in C-like languages just-for-news-frido@q-software-solutions.de (Friedrich Dominicus) (2005-12-15)
Re: ML-style pattern matching in C-like languages gtg983q@mail.gatech.edu (Ben Chambers) (2005-12-15)
Re: ML-style pattern matching in C-like languages torbenm@app-2.diku.dk (2005-12-15)
Re: ML-style pattern matching in C-like languages RLake@oxfam.org.uk (2005-12-15)
Re: ML-style pattern matching in C-like languages rvclayton@acm.org (2005-12-15)
Re: ML-style pattern matching in C-like languages haberg@math.su.se (2005-12-19)
Re: ML-style pattern matching in C-like languages Juergen.Kahrs@vr-web.de (=?ISO-8859-1?Q?J=FCrgen_Kahrs?=) (2005-12-19)
Re: ML-style pattern matching in C-like languages rsc@swtch.com (Russ Cox) (2005-12-23)
[2 later articles]
| List of all articles for this month |

From: "Ben Chambers" <gtg983q@mail.gatech.edu>
Newsgroups: comp.compilers
Date: 15 Dec 2005 17:52:31 -0500
Organization: Georgia Institute of Technology
References: 05-12-036
Keywords: C, design
Posted-Date: 15 Dec 2005 17:52:31 EST

"Russ Cox" <rsc@swtch.com> wrote
> Have there been any proposals or prototypes built
> for introducing some form of ML-style pattern matching
> in C-like languages?
>
> Another issue is that my structures have much more
> information in them than I care to match on. For example,
> the generic abstract syntax node might look something like:
>
> struct Node
> {
> int op; /* OADD, OSUB, OMUL, ONEG, etc. */
> Node *left; /* left child in a.s. tree */
> Node *right; /* right child in a.s. tree */
> Line line; /* source line information */
> Type *type; /* C type assigned to expression */
> };
>
> If I want to pattern match on this, sometimes I might care
> about the type and sometimes not. I never care about the
> source line number. Sometimes I don't even care about the
> right child. In the real struct Node there are many more fields,
> any of which I only rarely care about.
>
> How do ML programmers who write real compilers deal
> with this issue (auxiliary data that never or rarely is
> interesting in the match)?


In ML, there are two ways that I am aware of to deal with this problem.


Say that you have the declaration (assuming line and ctype are already
defined):


    datatype node = NODE of int * node * node * line * ctype


You can use a match like this to only bind the values left and right (both
other nodes) and only if the type is OADD:


      case somenode
          of (NODE(OADD, left, right, _, _)) => ...


You can also use a record, so that you can omit the _ and also name
each of the fields.


This analogy, however, doesn't really approximate the full usefulness
of MLs typesystem. More often, you will find that the type being
matched upon can take one of several different structures, each one
only containing the information that is relevant. As a simple
example, consider the AST supporting binary operations and unary
negation:


      datatype binop = ADD | SUB | DIV | MUL
      datatype treenode = BINOP of binop * treenode * treenode * line * ctype
                                          | UMINUS of treenode * line * ctype


Then you could write:


      if(n ~ OADD(OMUL(?x, ?y), ?z))


as something like:


      case n
          of *BINOP(OADD, BINOP(OMUL, x, y, _, _), _, _)) =>


The corresponding record version would be a little cleaner:


      datatype binop = ADD | SUB | DIV | MUL
      datatype treenode = BINOP of {op: binop,
                                                                    left: treenode,
                                                                    right: treenode,
                                                                    linenum: line,
                                                                    exptype: etype}
                                          | UMINUS of {exp: treenode,
                                                                    linenum: line,
                                                                    exptype: etype}


The use of records would let us write the following:


      case n
          of (BINOP{op=OADD, left=BINOP{op=OMUL, left=x, right=y, ...},
                                              right, ...}) =>


Doesn't seem like a very big win, but that is mainly because of the
bepth to which you are picking apart the expression. With records, if
you only wanted the left and right, you could write:


      case n
          of (BINOP{op=OADD, left, right, ...}) =>


Which is pretty simple to write.


I'm not sure if that is really any help, but it explains a little bit
about it could be done in ML. One big thing, is that the use of
enumerated structures causes the information you are matching on to
only contain data relevant to the operation. The other win that you
can get is with records it is simple to ignore fields you don't care
about.


Hope that helps,
Ben Chambers


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.