Thu, 21 Jan 2010 19:44:36 +0000 (UTC)

Related articles:

Prefix, infix and function-call and their implications in embedded lan pengyu.ut@gmail.com (Peng Yu) (2010-01-20)

Re: Prefix, infix and function-call and their implications in embedded gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-01-21)

Re: Prefix, infix and function-call and their implications in embedded herron.philip@googlemail.com (Philip Herron) (2010-01-21)

Re: Prefix, infix and function-call and their implications in embedded kkylheku@gmail.com (Kaz Kylheku) (2010-01-21)

Re: Prefix, infix and function-call and their implications in embedded bartc@freeuk.com (bartc) (2010-01-21)

Re: Prefix, infix and function-call and their implications in embedded monnier@iro.umontreal.ca (Stefan Monnier) (2010-01-25)

From: Kaz Kylheku <kkylheku@gmail.com>

Newsgroups: comp.compilers

Date: Thu, 21 Jan 2010 19:44:36 +0000 (UTC)

Organization: A noiseless patient Spider

References: 10-01-069

Keywords: design, comment

Posted-Date: 21 Jan 2010 14:57:59 EST

On 2010-01-21, Peng Yu <pengyu.ut@gmail.com> wrote:

*> Consider the following three expressions, which are valid C, mit-*

*> scheme and Mathematica expressions. There are of course many other*

*> expressions that express the same thing in other languages, or in the*

*> same language but other different ways.*

*>*

*> 3+2*5>7*

*> (> (+ 3 (* 2 5)) 7)*

*> Greater[Plus[3,Times[2,5]],7]*

*>*

*> Apparently, at least to me, the first expression is the most readable.*

Really? What if we replace 2 3 5 7 by a b c d, and then change

the meaning of the operators, or give them a precedence you aren't

accustomed to?

What if 3+2*5>7 is actually a Smalltalk expression, such that it just

means ((3+2)*5)>7?
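
To make that concrete, here is a minimal Python sketch (not part of the original exchange) of a precedence-climbing parser that takes the precedence table as a parameter. The names `parse`, `show`, `c_like` and `smalltalk` are illustrative assumptions, and the `smalltalk` table only models the rule that all binary messages share one level and group left to right. The same tokens produce two different trees:

```python
def parse(tokens, prec):
    """Precedence climbing over a flat list of tokens.

    `prec` maps each binary operator to a precedence level; every
    operator is treated as left-associative. Returns nested tuples
    (op, left, right) with operand tokens as leaves.
    """
    pos = 0

    def climb(min_prec):
        nonlocal pos
        left = tokens[pos]                  # an operand
        pos += 1
        while pos < len(tokens) and prec[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = climb(prec[op] + 1)     # +1 makes the operator left-associative
            left = (op, left, right)
        return left

    return climb(0)


def show(tree):
    """Render a tree with every grouping written out explicitly."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return "(" + show(left) + op + show(right) + ")"
    return tree


tokens = ["3", "+", "2", "*", "5", ">", "7"]
c_like = {">": 1, "+": 2, "*": 3}       # conventional algebraic levels
smalltalk = {">": 1, "+": 1, "*": 1}    # one level, strictly left to right

print(show(parse(tokens, c_like)))      # ((3+(2*5))>7)
print(show(parse(tokens, smalltalk)))   # (((3+2)*5)>7)
```

The token list is identical in both calls; the only thing that changes is the table the reader is assumed to carry around in their head.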

*> One possible reason is that we learn this algebraic notation much*

*> earlier than the other two, which is in analogy to that we can respond*

*> to the native language (say, English) much faster than to a second*

*> language (say, French).*

Another possible reason is that the algebraic notation has only a few

operators, whose precedence you have memorized (and are assuming to

hold true of the expression above).

Would it still be readable if the grammar had 500 operators, arranged into 200

precedence levels?

Another reason is that because you have a few operators, you can use special

glyphs for them, which are distinct from numbers and variables.

That second Lisp notation is unambiguous. So we can replace all of the

non-punctuation symbols, and still recognize the tree shape as being the same,

provided we keep the parentheses in the printed notation as they are:

(> (+ 3 (* 2 5)) 7)

-> substitute generated symbols for all non-nil atoms ->

(G0001 (G0002 G0003 (G0004 G0005 G0006)) G0007)

I still know that the major constituent of the expression is G0001,

whose arguments are (G0002 ....) and G0007.
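
As a rough illustration (a Python sketch with hypothetical helper names `read_sexp` and `shape`, not any particular Lisp reader), a reader for this notation needs to know only about parentheses and whitespace. It recovers the same tree shape before and after the symbols are replaced:

```python
def read_sexp(text):
    """Parse one S-expression into nested Python lists of atom strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok              # an atom; its meaning is irrelevant here
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1                    # consume the closing ")"
        return items

    return read()


def shape(tree):
    """Replace every atom with '_' so that only the grouping remains."""
    if isinstance(tree, list):
        return [shape(item) for item in tree]
    return "_"


original = read_sexp("(> (+ 3 (* 2 5)) 7)")
renamed = read_sexp("(G0001 (G0002 G0003 (G0004 G0005 G0006)) G0007)")

print(original)                            # ['>', ['+', '3', ['*', '2', '5']], '7']
print(shape(original) == shape(renamed))   # True
```

Note that the reader contains no operator table at all: nothing in it depends on what `>`, `+` or `G0004` mean.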

If we substitute the non-punctuation symbols of the infix expression, we are

lost; there is no explicit grouping there to retain:

G0001 G0002 G0003 G0004 G0005 G0006 G0007

Can you remember that G0002 and G0004 are binary operators,

and that G0004 has a higher precedence than G0002?

See, the amount of assumption is much smaller for the S-exp notation.

We assume that the parentheses group and that space separates elements.

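That's it. The infix notation makes us assume much more: which symbols are operators, how many operands each takes, and how their precedences compare.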

So what happens with operator precedence is that when the number of meanings we

want to use exceeds the number of operators, we can't invent new operators, so

we start overloading the meanings of the existing ones: a + b adds

strings together, or performs set union, etc. In math it is not so bad because in

math you can invent new glyphs, make use of different typefaces and alphabets,

and make use of two dimensions, etc. If you want some different kind of plus,

you can put a circle or box around the plus symbol and there you go: new glyph.

When prefix notations get long, we can easily break them into multiple lines

using a few simple guidelines, e.g.:

(G0001 (G0002 G0003
              (G0004 G0005 G0006))
       G0007)

This way we can easily visualize the structure as a tree printed sideways.
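
One such guideline can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real Lisp pretty-printer is specified: `flat`, `pretty` and the `width` parameter are made-up names, and the rule is simply "keep a form on one line if it fits, otherwise keep the head and first argument together and align the remaining arguments under the first one":

```python
def flat(tree):
    """One-line rendering of nested lists of atom strings."""
    if isinstance(tree, list):
        return "(" + " ".join(flat(item) for item in tree) + ")"
    return tree


def pretty(tree, indent=0, width=36):
    """Render `tree` as if it started at column `indent`.

    Forms that fit within `width` stay on one line. Longer forms are
    broken after the first argument, and the remaining arguments are
    aligned in the column where the first argument began.
    """
    one_line = flat(tree)
    if not isinstance(tree, list) or len(tree) < 2 or indent + len(one_line) <= width:
        return one_line
    head = "(" + flat(tree[0]) + " "
    col = indent + len(head)            # column where the arguments start
    parts = [pretty(arg, col, width) for arg in tree[1:]]
    return head + ("\n" + " " * col).join(parts) + ")"


tree = ["G0001", ["G0002", "G0003", ["G0004", "G0005", "G0006"]], "G0007"]
print(pretty(tree, width=36))
```

With `width=36` the expression is printed over three lines, with the `(G0004 ...)` form aligned under `G0003` and `G0007` aligned under `(G0002`. With a generous width such as 80 it stays on a single line.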

Consider that all Scheme code is written in that notation, not just small

expressions. The notation scales to express everything in the program.

The infix notation like a+b*c>7 is only /locally/ readable: small,

simple instances that fit onto less than about half a line of text.

It does not scale to large expressions, and it's not suitable for expressing

entire programs, which is why languages which have expressions

typically provide other constructs like statements and declarations for

structuring the rest of the program.

*> Readability affects the programmer productivity.*

That's only one kind of readability, which we can call ``micro-readability'':

the readability of a small expression that occupies about a third of a line of

text in your editor.

Micro-readability is significant, but not as much as you think.

A large program is not readable no matter what notation it is written in. You

can't just sit down and read 500,000 lines of code, and grasp it all as a unit.

So being able to pick out a readable 15 character subsequence of that program

doesn't actually buy you as much as you think.

Suppose that small subexpressions found in a 500,000 line program are all

beautifully micro-readable. Suppose you need to make a small change to one of

them. What if it turns out that the program has 10,000 other expressions

similar to that one (but not exactly the same), and they /all/ have to be found

and changed in an analogous way in order for your proposed change to work

properly? Oops.

Large-scale program structure and semantics are what affect productivity.

It's not how micro-readable it is, but how little of it you have to read,

understand and rewrite to implement a new requirement, or fix a bug.

*> Since embedded language can be embedded in a computer language, such*

*> scheme and C++, the choice of prefix, infix and function-call can*

*> profound affect the readability of the embedded language. I haven't*

*> found any previous references on this issue. Could somebody*

*> recommend me some if there are?*

If you don't have any references, how can you be sure that the effect of infix

versus prefix is ``profound''?

People working in, say, Java don't struggle any more or less than people

working in Scheme, in terms of just cranking out raw code and understanding

what they have written.

They struggle differently on a higher semantic level.

*> 3+2*5>7*

[ASSIGN 3 TO CONSTANT-THREE. ASSIGN 2 TO CONSTANT-TWO. ASSIGN 5 TO CONSTANT-FIVE.

MULTIPLY CONSTANT-TWO BY CONSTANT-FIVE GIVING INTERMEDIATE-FACTOR. ADD CONSTANT-THREE

TO INTERMEDIATE-FACTOR GIVING SUM-VALUE. IF SUM-VALUE IS GREATER THAN 7 GOTO ANOTHER-PLACE.

Now THAT's readable. -John]
