Re: Spell checking identifiers

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Wed, 24 Jun 2020 03:56:56 +0800

From comp.compilers

From:	Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Newsgroups:	comp.compilers
Date:	Wed, 24 Jun 2020 03:56:56 +0800
Organization:	Easynews - www.easynews.com
References:	20-06-010
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="42091"; mail-complaints-to="abuse@iecc.com"
Keywords:	lex, errors
Posted-Date:	23 Jun 2020 15:59:33 EDT
In-Reply-To:	20-06-010
Content-Language:	en-GB

> [There's a vast amount of work on edit distance. My guess is they
> use something like Levenshtein, but rather than use a constant
> distance of 1 between different letters, the distance varies depending
> on how different the letters look. -John]
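To make the moderator's guess concrete, here is a minimal sketch of a Levenshtein variant in which the substitution cost depends on the letter pair. The SIMILAR table and its costs are invented purely for illustration; nothing here is taken from clang or any other compiler.

```python
# Hypothetical table of visually similar character pairs. The pairs and
# their costs are illustrative assumptions, not data from any compiler.
SIMILAR = {
    frozenset("l1"): 0.25,
    frozenset("O0"): 0.25,
    frozenset("mn"): 0.5,
}

def sub_cost(a, b):
    """Cost of replacing a with b: 0 if equal, reduced if they look alike."""
    if a == b:
        return 0.0
    return SIMILAR.get(frozenset((a, b)), 1.0)

def weighted_distance(s, t):
    """Edit distance with unit insert/delete cost and variable substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,                  # delete a
                           cur[j - 1] + 1.0,               # insert b
                           prev[j - 1] + sub_cost(a, b)))  # substitute
        prev = cur
    return prev[-1]
```

With this table, "va1ue" is closer to "value" than "valve" is, because swapping "1" for "l" is cheaper than an arbitrary substitution.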

This clang blog post specifically mentions Levenshtein,

http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker

and it looks like what people do is go through the entire symbol
table and compute the edit distance between each entry and the
erroneous identifier.
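A minimal sketch of that brute-force scan, assuming the symbol table is just an iterable of names. The function names and the max_distance cutoff are hypothetical; this is not clang's actual code.

```python
def levenshtein(s, t):
    """Standard Levenshtein distance, keeping only two rows of the table."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute
        prev = cur
    return prev[-1]

def suggest(unknown, symbols, max_distance=2):
    """Return the closest name within max_distance edits, or None."""
    best, best_d = None, max_distance + 1
    for name in symbols:  # one distance computation per symbol
        d = levenshtein(unknown, name)
        if d < best_d:
            best, best_d = name, d
    return best
```

Each call to levenshtein costs time proportional to the product of the two identifier lengths, and suggest makes one such call per symbol, so the total work grows linearly with the size of the symbol table for every misspelled identifier.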

I thought that'd be a bit on the expensive side, because C++ files
can have 100k+ (or millions?) of lines after preprocessing, so one
translation unit really could contain up to a million identifiers in
practice. [I don't know if that actually happens but I don't think
it's safe to assume it doesn't.]

In the 10 years since, people may have moved away from standard
Levenshtein, as you suggest.

But then, maybe compilation speed for erroneous input isn't really
important. rustc is slow on a short input file whether or not the
input has errors [which could be the startup cost].

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
