Re: Spell checking identifiers

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Wed, 24 Jun 2020 03:56:56 +0800

          From comp.compilers


From: Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Newsgroups: comp.compilers
Date: Wed, 24 Jun 2020 03:56:56 +0800
Organization: Easynews - www.easynews.com
References: 20-06-010
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="42091"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, errors
Posted-Date: 23 Jun 2020 15:59:33 EDT
In-Reply-To: 20-06-010
Content-Language: en-GB

> [There's a vast amount of work on edit distance.  My guess is they
> use something like Levenshtein, but rather than use a constant
> distance of 1 between different letters, the distance varies depending
> on how different the letters look. -John]


This Clang blog post specifically mentions Levenshtein,

http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker


and it looks like what people do is go through the entire symbol
table and compute the edit distance between each entry and the
erroneous identifier.
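
As a rough illustration of that brute-force scan, here is a minimal
Python sketch. It is not Clang's actual code: the names levenshtein
and suggest, the max_distance cutoff, and the length-difference filter
are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def suggest(typo: str, symbols, max_distance: int = 2):
    """Scan every symbol and return the closest one within max_distance.

    Returns None when nothing is close enough. The cutoff is a
    hypothetical choice for this sketch.
    """
    best, best_dist = None, max_distance + 1
    for name in symbols:
        # The length difference is a lower bound on the edit distance,
        # so candidates that cannot beat the current best are skipped.
        if abs(len(name) - len(typo)) >= best_dist:
            continue
        d = levenshtein(typo, name)
        if d < best_dist:
            best, best_dist = name, d
    return best


if __name__ == "__main__":
    table = ["printf", "puts", "fprintf"]
    print(suggest("pritnf", table))
```

The scan costs one distance computation per surviving symbol, which is
why the size of the symbol table matters.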


I thought that'd be a bit on the expensive side, because C++ files
can have 100k+ (or millions?) of lines after preprocessing, so one
translation unit really can reach a million identifiers in practice.
[I don't know if that actually happens, but I don't think it's safe
to assume it doesn't.]


In the 10 years since that post, people may have moved away from
standard Levenshtein, as you suggest.
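
To make the variant from the quoted note concrete, here is a small
Python sketch of Levenshtein with a substitution cost that depends on
how alike two characters look. The LOOKALIKES table and the 0.5 cost
are invented for illustration; I don't know what any real compiler
uses.

```python
# Hypothetical set of visually similar character pairs. Both the pairs
# and the reduced cost below are made up for this sketch.
LOOKALIKES = {
    frozenset("l1"),
    frozenset("O0"),
    frozenset("mn"),
    frozenset("uv"),
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in LOOKALIKES:
        return 0.5   # look-alike characters are a cheaper mistake
    return 1.0


def weighted_distance(a: str, b: str) -> float:
    """Levenshtein distance with a variable substitution cost."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [float(i)]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1.0,                            # delete ca
                cur[j - 1] + 1.0,                         # insert cb
                prev[j - 1] + substitution_cost(ca, cb),  # substitute
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(weighted_distance("val1", "vall"))
    print(weighted_distance("val1", "valx"))
```

With such a table, "val1" ranks closer to "vall" than to "valx", which
plain Levenshtein cannot distinguish.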


But then, maybe compilation speed for erroneous input isn't really
important. rustc is slow on a short input file whether or not the
input has errors [which could just be the startup cost].


--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

