Tokens across two input buffers

cherico@bonbon.net (cherico)
21 Sep 2004 22:21:30 -0400

          From comp.compilers


From: cherico@bonbon.net (cherico)
Newsgroups: comp.compilers
Date: 21 Sep 2004 22:21:30 -0400
Organization: http://groups.google.com
Keywords: lex, question
Posted-Date: 21 Sep 2004 22:21:30 EDT

I am using flex to detect UTF-8 encoded letters. Because the input
comes from a socket, I call yy_switch_to_buffer() every time new data
arrives on the socket descriptor.


But sometimes a UTF-8 token is split into two pieces across two
consecutive buffers, because a socket read can end at any byte
boundary. The scanner then produces incorrect results.
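
To make the split concrete, here is a minimal sketch in C of how one
might measure how many trailing bytes of a received chunk belong to a
character that has not fully arrived. The name utf8_complete_prefix is
hypothetical and not part of flex, and the sketch assumes the data is
otherwise well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not a flex API): returns how many leading bytes of
 * buf[0..len) form whole UTF-8 sequences. The remaining len - result bytes
 * are the start of a character whose continuation bytes have not arrived.
 * Assumes the data is otherwise well-formed UTF-8. */
static size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t i = len;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && len - i < 3 && (buf[i - 1] & 0xC0) == 0x80)
        i--;
    if (i == 0)
        return len;              /* no lead byte in view; hold nothing back */

    unsigned char lead = buf[i - 1];
    size_t need;                 /* bytes required by the sequence at lead */

    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else                            return len;  /* malformed; pass it on */

    size_t have = len - (i - 1); /* bytes of that sequence received so far */
    return have < need ? i - 1 : len;
}
```

For a chunk ending in the single byte 0xC3 (the first half of a
two-byte character), the function reports that the last byte is
incomplete, which is exactly the case that goes wrong when each chunk
is scanned as a separate buffer.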


I tried to put the "incomplete" characters back into the input stream
in the <<EOF>> rule (using yyless). But these characters had already
been output before the <<EOF>> rule ran.


Is there any way to solve this problem?
[Of course. Rather than using yy_switch_to_buffer, define a version
of YY_INPUT to get the data from the socket. -John]
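
The suggestion above could be sketched roughly as follows. This is
only an illustration, not the moderator's code: scanner_fd and
socket_input are hypothetical names, and it assumes a blocking socket
(on a non-blocking one, a read that fails with EAGAIN would be
reported to flex as end of input). Because flex calls YY_INPUT
whenever it needs more characters, even in the middle of a match, a
multi-byte sequence that spans two reads should still be scanned as
one token.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical global: the connected socket descriptor the scanner reads. */
static int scanner_fd = -1;

/* Hypothetical helper: blocks until at least one byte arrives on scanner_fd,
 * retrying if the call is interrupted by a signal. Returns the number of
 * bytes stored in buf, or 0 on end of stream or error (flex treats a result
 * of 0, i.e. YY_NULL, as end of input). */
static size_t socket_input(char *buf, size_t max_size)
{
    ssize_t n;

    do {
        n = read(scanner_fd, buf, max_size);
    } while (n < 0 && errno == EINTR);

    return n > 0 ? (size_t)n : 0;
}

/* This part belongs in the definitions section of the .l file,
 * inside %{ ... %}, so it replaces flex's default YY_INPUT. */
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    ((result) = socket_input((buf), (size_t)(max_size)))
```

With this in place the scanner keeps a single input buffer for the
lifetime of the connection, so there is no need to call
yy_switch_to_buffer() per chunk or to push bytes back in <<EOF>>.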


