Multibyte lexers in flex?
10 May 2000 02:51:56 -0400

          From comp.compilers


Newsgroups: comp.compilers
Date: 10 May 2000 02:51:56 -0400
Organization: - Before you buy.
Keywords: lex, i18n, comment

    Does anyone have experience with, or tricks for, developing scanners
in flex (or a variant) that support multibyte characters? I am
interested in developing a lexer (actually extending an existing one)
that will have to support different code pages at runtime. So, for
example, I would like to recognize patterns such as:


where VALUE can contain multibyte characters in the current codepage
(Japanese Shift-JIS, EUC-JP, etc. depending on where the executable is

    I realize there are some ways around this by writing patterns such
as:


<MB_MODE>.+     { /* punt to some external function to
                     handle the multibyte string */ }

but this is somewhat ugly and requires me to be very careful about how
I write my patterns. Anyone have any ideas?

[This has come up before. In its usual 8-bit transparent mode, lex handles
multibyte characters just fine as multi-character sequences. -John]
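
As a rough illustration of the note above, a multibyte character is
just a short run of bytes with known ranges, so a two-byte EUC-JP
character could be written in flex as something like
[\xA1-\xFE][\xA1-\xFE]. The same ranges can drive the kind of external
function the post mentions. The C sketch below is only an assumption
about what such a helper might look like: the names is_gr,
eucjp_char_len and eucjp_count_chars are hypothetical, and it covers
EUC-JP only (ASCII, the 0x8E and 0x8F prefixed forms, and plain
two-byte characters), not Shift-JIS.

```c
#include <stddef.h>

/* Hypothetical helper: true for bytes in the EUC-JP high range 0xA1-0xFE. */
int is_gr(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/* Hypothetical helper: byte length (1, 2 or 3) of the EUC-JP character
   starting at s, or 0 if the bytes do not form a valid character.
   n is the number of bytes available at s. */
size_t eucjp_char_len(const unsigned char *s, size_t n)
{
    if (n == 0)
        return 0;
    if (s[0] <= 0x7F)                 /* single-byte ASCII */
        return 1;
    if (s[0] == 0x8E)                 /* 0x8E prefix: half-width katakana */
        return (n >= 2 && s[1] >= 0xA1 && s[1] <= 0xDF) ? 2 : 0;
    if (s[0] == 0x8F)                 /* 0x8F prefix: three-byte form */
        return (n >= 3 && is_gr(s[1]) && is_gr(s[2])) ? 3 : 0;
    if (is_gr(s[0]))                  /* ordinary two-byte character */
        return (n >= 2 && is_gr(s[1])) ? 2 : 0;
    return 0;                         /* stray byte outside all ranges */
}

/* Counts the characters in an n-byte string, or returns -1 if any
   byte sequence is not valid EUC-JP. */
long eucjp_count_chars(const unsigned char *s, size_t n)
{
    long count = 0;

    while (n > 0) {
        size_t len = eucjp_char_len(s, n);
        if (len == 0)
            return -1;
        s += len;
        n -= len;
        count++;
    }
    return count;
}
```

An action for a rule like the <MB_MODE> one could then call
eucjp_count_chars((const unsigned char *)yytext, yyleng), casting
yyleng as needed, and reject the token when the result is -1. A
Shift-JIS variant would need its own lead-byte and trail-byte ranges,
chosen at runtime alongside the code page.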
