parsing bibtex file with flex/bison

bnrj.rudra@gmail.com
Mon, 4 Mar 2013 15:48:56 -0800 (PST)

          From comp.compilers


From: bnrj.rudra@gmail.com
Newsgroups: comp.compilers
Date: Mon, 4 Mar 2013 15:48:56 -0800 (PST)
Organization: Compilers Central
Injection-Date: Mon, 04 Mar 2013 23:48:56 +0000
Keywords: lex, parse, design, comment
Posted-Date: 05 Mar 2013 00:13:53 EST

I want to parse a BibTeX file using flex/bison. A sample BibTeX file is:
@Book{a1,
author="amook",
Title="ASR",
Publisher="oxf",
Year="2010",
Add="UK",
Edition="1",
}
@Article{a2,
Author="Rudra Banerjee",
Title={FeNiMo},
Publisher={P{\"R}B},
Issue="12",
Page="36690",
Year="2011",
Add="UK",
Edition="1",
}
(A new key may start on the same line.)
So far I have written this flex code:


%{
#include <stdio.h>
#include <stdlib.h>
%}


%{
char yylval;
int YEAR,i;
//char array_author[1000];
%}
%x author
%x title
%x pub
%x year
%%
@                        printf("\nNEWENTRY\n");
[a-zA-Z][a-zA-Z0-9]*     { printf("%s", yytext); BEGIN(INITIAL); }
author=                  { BEGIN(author); }
<author>\"[a-zA-Z\/.]+\" { printf("%s", yytext); BEGIN(INITIAL); }
title=                   { BEGIN(title); }
<title>\"[a-zA-Z\/.]+\"  { printf("%s", yytext); BEGIN(INITIAL); }
publisher=               { BEGIN(pub); }
<pub>\"[a-zA-Z\/.]+\"    { printf("%s", yytext); BEGIN(INITIAL); }
[a-zA-Z0-9\/.-]+=        printf("ENTRY TYPE ");
\"                       printf("QUOTE ");
\{                       printf("LCB ");
\}                       printf(" RCB");
;                        printf("SEMICOLON ");
\n                       printf("\n");
%%


int main(){
    yylex();
    //char array_author[1000];
    //printf("%d%s",&i,array_author[i]);
    i++;
    return 0;
}


This picks up a few things, but not all of them.
Can anyone kindly help me with this?
[My suggestion would be to do less in the lexer and more in the parser. In the
lexer, reasonable tokens might be '@' '{' '}' '=' ',' word and
qstring (quoted string).
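
As a rough illustration of that token set, here is a hand-written scanner in
plain C rather than flex. The names next_token, TOK_WORD and TOK_QSTRING are
made up for this sketch, and treating a word as a run of letters and digits is
an assumption. It does not handle TeX escapes such as \" inside braces.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Single-character tokens ('@' '{' '}' '=' ',') are returned as themselves.
   TOK_WORD and TOK_QSTRING are hypothetical names for the other two. */
enum { TOK_EOF = 0, TOK_WORD = 256, TOK_QSTRING = 257 };

/* Scan one token starting at *p, copy its text into buf (truncating if
   needed), and advance *p past it. Returns the token kind. */
int next_token(const char **p, char *buf, size_t bufsize)
{
    const char *s = *p;
    size_t n = 0;
    int c;

    assert(bufsize >= 2);

    while (*s && isspace((unsigned char)*s))    /* skip whitespace */
        s++;

    if (*s == '\0') {                           /* end of input */
        buf[0] = '\0';
        *p = s;
        return TOK_EOF;
    }

    if (*s == '"') {                            /* qstring: text up to the next quote */
        s++;
        while (*s && *s != '"') {
            if (n + 1 < bufsize)
                buf[n++] = *s;
            s++;
        }
        if (*s == '"')
            s++;
        buf[n] = '\0';
        *p = s;
        return TOK_QSTRING;
    }

    if (isalnum((unsigned char)*s)) {           /* word: run of letters and digits */
        while (isalnum((unsigned char)*s)) {
            if (n + 1 < bufsize)
                buf[n++] = *s;
            s++;
        }
        buf[n] = '\0';
        *p = s;
        return TOK_WORD;
    }

    c = (unsigned char)*s;                      /* any other character stands for itself */
    buf[0] = *s;
    buf[1] = '\0';
    *p = s + 1;
    return c;
}
```

Calling next_token repeatedly on the text of an entry until it returns TOK_EOF
yields the token stream that a parser would consume.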


Then you could write bison rules like this:


clause: '@' word '{' word ',' attrlist '}' ;

attrlist: attr | attr ',' attrlist ;

attr: name '=' value ;

value: word | qstring | nestlist ;

nestlist: '{' list '}' ;

list: listitem | list listitem ;

listitem: word | qstring | nestlist ;


And so forth. This isn't exactly right, but it should get you going
in the right direction. The parser will recognize some invalid bibtex,
e.g., words that aren't attribute names, which it's easier to check in
semantic code rather than trying to stick laundry lists of keywords
into the parser. -John]
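
The semantic check mentioned in the note could look roughly like the C sketch
below: a table lookup on the attribute name instead of one keyword per field in
the grammar. The function names is_known_field and equal_ignore_case and the
contents of the table are assumptions made for illustration. The comparison
ignores case because BibTeX field names are case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical list of accepted field names. A real checker would use
   whatever fields the application cares about. */
static const char *const known_fields[] = {
    "author", "title", "publisher", "year", "edition", "pages", "journal"
};

/* Compare two strings, ignoring letter case. Returns 1 if equal. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns 1 if name appears in known_fields, 0 otherwise. */
int is_known_field(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof known_fields / sizeof known_fields[0]; i++)
        if (equal_ignore_case(name, known_fields[i]))
            return 1;
    return 0;
}
```

A parser action for the attr rule could call is_known_field on the name and
print a warning when it returns 0. With this table, "Author" and "Year" from
the sample are accepted, while "Add" is reported as unknown.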

