Re: Grammar for roman numerals

Martin Ward <>
29 Mar 2007 00:59:10 -0400

          From comp.compilers

Related articles
Grammar for roman numerals (2007-03-27)
Re: Grammar for roman numerals (Martin Ward) (2007-03-29)
Re: Grammar for roman numerals (Ivan Boldyrev) (2007-03-29)
Re: Grammar for roman numerals (Dmitry A. Kazakov) (2007-03-30)
Re: Grammar for roman numerals (Martin Ward) (2007-03-30)
Re: Grammar for roman numerals (Dmitry A. Kazakov) (2007-04-01)
Re: Grammar for roman numerals (Hans-Peter Diettrich) (2007-04-01)
Re: Grammar for roman numerals (whiskey) (2007-04-06)

From: Martin Ward <>
Newsgroups: comp.compilers
Date: 29 Mar 2007 00:59:10 -0400
Organization: Compilers Central
References: 07-03-095
Keywords: parse
Posted-Date: 29 Mar 2007 00:59:10 EDT

On Tuesday 27 Mar 2007 14:27, wrote:
> Here is my grammar (I allow an arbitrary number of Ms)
> numeral -> thousands
> thousands -> thous_part hundreds | thous_part | hundreds
> thous_part -> thous_part M | M
> hundreds -> hun_part tens | hun_part | tens
> hun_part -> hun_rep | CD | D | D hun_rep | CM
> hun_rep -> C | CC | CCC
> tens -> tens_part ones | tens_part | ones
> tens_part -> tens_rep | XL | L | L tens_rep | XC
> tens_rep -> X | XX | XXX
> ones -> ones_rep | IV | V | V ones_rep | IX
> ones_rep -> I | II | III
> Comments?

This grammar doesn't accept IIII for 4 (as found on many clocks with
Roman numeral faces, for example), nor does it accept the "shorthand"
forms: IC for 99, IIC for 98, MVM for 1995 and so on.
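To check those claims mechanically, here is a minimal Python sketch that transcribes the quoted grammar into a single regular expression. This works because every nonterminal except thous_part derives a finite set of strings, and thous_part derives one or more Ms, so the language is regular. The names `STRICT_NUMERAL` and `accepts` are hypothetical, chosen for illustration only.

```python
import re

# Hypothetical transcription of the quoted grammar into one regex.
HUN_PART = r"(?:CM|CD|DC{0,3}|C{1,3})"   # hun_part: C..CCC, CD, D, DC..DCCC, CM
TENS_PART = r"(?:XC|XL|LX{0,3}|X{1,3})"  # tens_part: same shape, one decade down
ONES = r"(?:IX|IV|VI{0,3}|I{1,3})"       # ones: same shape again

# numeral = any number of Ms, then each optional part in order.
# The grammar has no empty production, so "" is rejected in accepts().
STRICT_NUMERAL = re.compile(rf"M*{HUN_PART}?{TENS_PART}?{ONES}?")


def accepts(numeral: str) -> bool:
    """Return True if the quoted grammar derives `numeral`."""
    return numeral != "" and STRICT_NUMERAL.fullmatch(numeral) is not None


for s in ["MCMXCV", "MMVII", "IIII", "IC", "IIC", "MVM"]:
    print(s, accepts(s))
```

With this transcription, MCMXCV and MMVII are accepted, while IIII, IC, IIC and MVM are all rejected.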
The rule is that any smaller number placed before a larger
number is subtracted from the larger number.
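A forgiving evaluator for that rule can be sketched in a few lines of Python. This assumes that "placed before a larger number" means "anywhere to the left of a larger numeral". The code scans right to left and subtracts any numeral smaller than the largest one seen so far. The function name `roman_value` is hypothetical.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_value(numeral: str) -> int:
    """Subtract a numeral if a larger one appears to its right, else add it."""
    if not numeral:
        raise ValueError("empty numeral")
    total = 0
    largest = 0  # largest value seen so far while scanning right to left
    for ch in reversed(numeral.upper()):
        try:
            value = VALUES[ch]
        except KeyError:
            raise ValueError(f"not a Roman numeral character: {ch!r}") from None
        if value < largest:
            total -= value   # smaller numeral before a larger one: subtract
        else:
            total += value
            largest = value
    return total


print(roman_value("IIC"), roman_value("MVM"), roman_value("IIII"))
```

Comparing against the largest numeral seen so far, rather than only the immediate right neighbour, is what makes IIC come out as 98 instead of 100. It also gives 4 for IIII, 99 for IC and 1995 for MVM.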
I know of no examples where the "smaller number" consists of anything
other than a single numeral or one of the identical pairs II, XX or CC.
However, constructions such as IIIII for "five", IIX for "eight"
or VV for "ten" have been discovered in manuscripts.
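If a parser wanted to encode that observation as a restriction, one possible Python sketch follows. It treats a run of identical numerals immediately before a larger numeral as the subtracted "smaller number". It accepts that run only if it is a single numeral or one of II, XX and CC. Both the run-based interpretation and the function name `prefixes_look_attested` are assumptions for illustration. The restriction reflects only the absence of known counterexamples and is not an established rule.

```python
from itertools import groupby

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
DOUBLE_PREFIXES = {"I", "X", "C"}  # II, XX and CC: the attested two-numeral prefixes


def prefixes_look_attested(numeral: str) -> bool:
    """Hypothetical check: every subtractive prefix is one numeral or II/XX/CC."""
    if not numeral or any(ch not in VALUES for ch in numeral):
        return False
    # Group into runs of identical numerals, e.g. "IIX" -> [("I", 2), ("X", 1)].
    runs = [(ch, len(list(group))) for ch, group in groupby(numeral)]
    for (ch, count), (next_ch, _) in zip(runs, runs[1:]):
        if VALUES[ch] >= VALUES[next_ch]:
            continue  # run is added, not subtracted
        if count == 1 or (count == 2 and ch in DOUBLE_PREFIXES):
            continue  # single numeral, or II / XX / CC
        return False
    return True
```

Under this check IIX, IIC and MVM pass, and so does the purely additive IIIII. IIIX and VVX are rejected.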

A bar placed over a number multiplies it by one thousand,
and a double bar multiplies it by one million.
This could be implemented in your system by using parentheses
to denote the bar: thus (I) would represent 1,000.
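A minimal Python sketch of the parenthesis convention follows. Each parenthesized group is valued recursively and multiplied by 1,000, so nested parentheses act as a double bar (multiply by one million). The resulting sequence of values is then combined with the same "smaller before larger is subtracted" rule. The function names are hypothetical, and applying the subtraction rule across barred groups is an assumption.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def _combine(values: list[int]) -> int:
    """Subtract a value if a larger one appears to its right, else add it."""
    total, largest = 0, 0
    for v in reversed(values):
        if v < largest:
            total -= v
        else:
            total += v
            largest = v
    return total


def _parse(text: str, pos: int, depth: int) -> tuple[list[int], int]:
    """Collect symbol values from text[pos:] up to the ')' closing this depth."""
    values: list[int] = []
    while pos < len(text):
        ch = text[pos]
        if ch == "(":
            inner, pos = _parse(text, pos + 1, depth + 1)
            if not inner:
                raise ValueError("empty bar group")
            values.append(_combine(inner) * 1000)  # one bar multiplies by 1,000
        elif ch == ")":
            if depth == 0:
                raise ValueError("unmatched ')'")
            return values, pos + 1
        elif ch in VALUES:
            values.append(VALUES[ch])
            pos += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if depth > 0:
        raise ValueError("unclosed '('")
    return values, pos


def parse_barred(text: str) -> int:
    """Value a numeral in which (...) stands for a bar over its contents."""
    values, _ = _parse(text, 0, 0)
    if not values:
        raise ValueError("empty numeral")
    return _combine(values)
```

For example, (I) gives 1,000, ((I)) gives 1,000,000 and (V)CC gives 5,200.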
(In the Middle Ages, 500, usually D, was sometimes written as
I followed by an apostrophus, a sign resembling a backwards C, while
1,000 was written as CI followed by an apostrophus.)

The more general question raised by this discussion (and more relevant
to comp.compilers) is: how "forgiving" should a parser be when the
language being parsed has no formal definition, or when there are
several conflicting formal definitions?
Do you accept anything that can possibly be interpreted,
or do you place "arbitrary" restrictions in order to simplify
the grammar, at the expense of rejecting existing files?

Martin

Erdos number: 4
G.K.Chesterton web site:
