Spellchecker Impact on Typing-Error Biometrics

The Problem

Typing-error biometric systems (see the Behavioural Biometric System project) rely on recording the characteristic mistakes a user makes while typing. But modern operating systems and applications apply automatic spell correction before the keystrokes ever reach the application layer. This silently erases part, or all, of the biometric signal before it can be captured.

How much damage do spellcheckers actually do? And which checkers are most harmful? This project answers both questions.

Methodology

We built a controlled benchmark using context-free grammar sentences that allow programmatic manipulation of the text. We generated two classes of typos:

  • QWERTY-proximity typos — swaps between physically adjacent keys, reflecting genuine motor-error patterns.
  • Random typos — uniformly sampled substitutions, serving as a noise baseline.
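
To make the two typo classes concrete, here is a minimal sketch of how they could be generated. It is not the benchmark's actual generator: the three-row letter layout, the staggered-row adjacency rule, and the names build_neighbours, proximity_typo, and random_typo are all assumptions made for illustration, and only lowercase letters are perturbed.

```python
import random
import string

# Assumption: a plain three-row QWERTY letter layout; digits, punctuation,
# and other keyboard layouts are ignored in this sketch.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbours(rows):
    """Map each letter to the letters on physically adjacent keys."""
    neighbours = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            adjacent = set()
            # Same row: the keys immediately left and right.
            for col in (c - 1, c + 1):
                if 0 <= col < len(row):
                    adjacent.add(row[col])
            # Rows are staggered: a key touches columns c and c + 1 in the
            # row above, and columns c - 1 and c in the row below.
            for other_r, cols in ((r - 1, (c, c + 1)), (r + 1, (c - 1, c))):
                if 0 <= other_r < len(rows):
                    for col in cols:
                        if 0 <= col < len(rows[other_r]):
                            adjacent.add(rows[other_r][col])
            neighbours[key] = sorted(adjacent)
    return neighbours


NEIGHBOURS = build_neighbours(QWERTY_ROWS)


def _substitute(word, rng, candidates_for):
    """Replace one randomly chosen lowercase letter via candidates_for(letter)."""
    positions = [i for i, ch in enumerate(word) if ch in NEIGHBOURS]
    if not positions:
        return word  # nothing to perturb (e.g. digits only)
    i = rng.choice(positions)
    return word[:i] + rng.choice(candidates_for(word[i])) + word[i + 1:]


def proximity_typo(word, rng):
    """Substitute one letter with a physically adjacent key."""
    return _substitute(word, rng, lambda ch: NEIGHBOURS[ch])


def random_typo(word, rng):
    """Substitute one letter with any other letter, sampled uniformly."""
    return _substitute(
        word, rng, lambda ch: [x for x in string.ascii_lowercase if x != ch]
    )


rng = random.Random(42)  # seeded so runs are repeatable
print(proximity_typo("keyboard", rng))
print(random_typo("keyboard", rng))
```

Both generators change exactly one character, so the only difference between the two classes is where the replacement letter is drawn from.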

We then ran each sentence through a suite of popular Python spellcheckers — including pyspellchecker, autocorrect, textblob, jamspell, and others — and measured how many biometrically informative errors each checker removed versus preserved.
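
The removed-versus-preserved measurement can be sketched as follows. This is an illustration under stated assumptions, not the paper's scoring code: the three outcome labels, the functions classify and score_checker, and the stand-in toy_checker with its two-word vocabulary are all hypothetical. A real spellchecker would be wrapped in the same single-argument callable form.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

OUTCOMES = ("preserved", "removed", "altered")


def classify(original: str, typo: str, corrected: str) -> str:
    """Label what a checker did to one injected typo (assumed taxonomy)."""
    if corrected == typo:
        return "preserved"  # the error survives untouched
    if corrected == original:
        return "removed"    # the checker restored the intended word
    return "altered"        # the checker changed it into something else


def score_checker(
    correct: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the share of typos falling into each outcome."""
    counts = Counter(classify(o, t, correct(t)) for o, t in pairs)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in OUTCOMES}
    return {name: counts[name] / total for name in OUTCOMES}


def toy_checker(word: str, vocabulary=frozenset({"hello", "world"})) -> str:
    """Hypothetical dictionary checker: fixes single-letter substitutions only."""
    if word in vocabulary:
        return word
    for candidate in sorted(vocabulary):
        same_length = len(candidate) == len(word)
        if same_length and sum(a != b for a, b in zip(candidate, word)) == 1:
            return candidate
    return word  # no close match: leave the word as typed


# (intended word, word with injected typo)
PAIRS = [("hello", "hwllo"), ("world", "worle"), ("typing", "typibg")]
print(score_checker(toy_checker, PAIRS))
```

Here the toy checker restores the two words it knows and leaves the third untouched, so two of the three injected errors count as removed and one as preserved. Running the same scoring separately over proximity typos and random typos gives a per-class comparison for each checker.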

Key Findings

  • Aggressive context-aware checkers removed the vast majority of QWERTY-proximity errors, the most biometrically valuable class.
  • Simple dictionary-based checkers were far less destructive and preserved more of the signal.
  • Random typos were more resistant to correction, but they carry less individual-specific information.

Implications for System Design

Any real-world deployment of typing-error biometrics must either (a) operate at the raw keylogger level, below the spellchecker, or (b) account for spellchecker interference in feature extraction. Ignoring this can lead to false confidence in laboratory results that do not transfer to production.

The full benchmark is described in "Spellchecker Analysis for Behavioural Biometric of Typing Errors Scenario" (SECRYPT 2024).

Bartłomiej Marek
PhD Candidate

PhD Candidate at CISPA, SprintML lab. I research privacy and security of large language models and multimodal systems.