Re: What does it mean to "move characters" in the lexer?

Thomas Koenig <tkoenig@netcologne.de>
Wed, 22 Jun 2022 11:45:22 -0000 (UTC)

          From comp.compilers

Related articles
What does it mean to "move characters" in the lexer? costello@mitre.org (Roger L Costello) (2022-06-21)
Re: What does it mean to "move characters" in the lexer? gah4@u.washington.edu (gah4) (2022-06-21)
Re: What does it mean to "move characters" in the lexer? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-06-22)
Re: What does it mean to "move characters" in the lexer? 480-992-1380@kylheku.com (Kaz Kylheku) (2022-06-22)
Re: What does it mean to "move characters" in the lexer? 480-992-1380@kylheku.com (Kaz Kylheku) (2022-06-22)
Re: What does it mean to "move characters" in the lexer? tkoenig@netcologne.de (Thomas Koenig) (2022-06-22)

From: Thomas Koenig <tkoenig@netcologne.de>
Newsgroups: comp.compilers
Date: Wed, 22 Jun 2022 11:45:22 -0000 (UTC)
Organization: news.netcologne.de
References: 22-06-057 22-06-058 22-06-064 22-06-066
Keywords: parse, performance, parallel
Posted-Date: 22 Jun 2022 10:01:48 EDT

Kaz Kylheku <480-992-1380@kylheku.com> wrote:


> I remember reading some article some years ago whereby some Javascript
> programmer discovered it was faster to read JSON from a file using
> dedicated JSON routines available in Javascript, than to declare the
> same syntax in the Javascript program as a literal and let it be
> scanned along with the program and available to it that way.


This came up on comp.arch recently.


There is an insanely fast JSON parser and UTF-8 validator based
on SIMD at https://github.com/simdjson/simdjson .
It selects a SIMD vector width according to the CPU features
it detects at runtime. The algorithm is described at
https://arxiv.org/pdf/1902.08318.pdf. It relies heavily on
special-casing, both for JSON and for the SIMD instructions
that are available.
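To make the idea a bit more concrete, here is a minimal sketch of the first stage the paper describes: finding the structural characters ({ } [ ] : ,) that lie outside strings, one 64-byte block at a time. It is only a scalar emulation, assuming Python integers stand in for 64-bit SIMD masks; the function names are hypothetical, and escaped quotes are deliberately not handled (simdjson has a separate step for runs of backslashes). The key trick is the prefix XOR that turns quote positions into an "inside a string" mask. simdjson computes that with a carry-less multiplication, while this sketch uses log-step shifts.

```python
# Scalar emulation of a simdjson-style "stage 1" structural index.
# Assumption: Python ints play the role of 64-bit SIMD bitmasks.
# Hypothetical helper names; escaped quotes (\") are NOT handled here.

BLOCK = 64
FULL = (1 << BLOCK) - 1
STRUCTURAL = set(b'{}[]:,')
QUOTE = {ord('"')}


def block_mask(block, chars):
    """Bit i is set when block[i] is in chars (one SIMD compare per class)."""
    mask = 0
    for i, c in enumerate(block):
        if c in chars:
            mask |= 1 << i
    return mask


def prefix_xor(mask, width=BLOCK):
    """Bit i of the result is the XOR of bits 0..i of mask."""
    shift = 1
    while shift < width:
        mask ^= (mask << shift) & ((1 << width) - 1)
        shift <<= 1
    return mask


def structural_positions(data):
    """Offsets of structural characters that are not inside strings."""
    positions = []
    in_string = 0  # 1 if the previous block ended inside a string
    for base in range(0, len(data), BLOCK):
        block = data[base:base + BLOCK]
        # Bits from an opening quote up to (not including) its closing quote.
        string_mask = prefix_xor(block_mask(block, QUOTE))
        if in_string:
            string_mask ^= FULL  # carry the string state across blocks
        in_string = (string_mask >> (BLOCK - 1)) & 1
        structural = block_mask(block, STRUCTURAL) & ~string_mask & FULL
        while structural:  # extract set bits, lowest first
            low = structural & -structural
            positions.append(base + low.bit_length() - 1)
            structural ^= low
    return positions
```

Working on whole masks instead of branching per byte is where the speed comes from; in real SIMD code, the per-byte loop in `block_mask` collapses into a few vector compares.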


A general SIMD-based parser generator is likely to be even harder
to write and will probably not outperform the package above (nor,
for that matter, a traditional character-at-a-time approach).
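For comparison, here is a minimal sketch of what a traditional character-at-a-time approach looks like for the same task of locating structural characters outside strings. It walks the input one byte at a time and keeps explicit state for "inside a string" and "just saw a backslash". The function name is hypothetical; this is an illustration, not code taken from simdjson or the paper.

```python
# Character-at-a-time scan for JSON structural characters.
# Hypothetical function name; handles backslash escapes inside strings.

def structural_positions_scalar(data):
    """Offsets of { } [ ] : , that are not inside JSON strings."""
    positions = []
    in_string = False
    escaped = False
    for i, c in enumerate(data):
        if in_string:
            if escaped:
                escaped = False      # this byte is escaped, skip it
            elif c == 0x5C:          # backslash starts an escape
                escaped = True
            elif c == 0x22:          # closing quote
                in_string = False
        elif c == 0x22:              # opening quote
            in_string = True
        elif c in b'{}[]:,':
            positions.append(i)
    return positions
```

Each byte costs at least one data-dependent branch, which is exactly the overhead the SIMD approach tries to avoid.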


Is there research on this?


