Skip to content

Misspellings Class Documentation

Overview

The Misspellings class is designed to introduce typographical errors (misspellings) into input strings by applying predefined error rates. This functionality is useful for generating augmented text data that simulates common misspelling errors, making it applicable for testing text classifiers and spell-check systems.

Inheritance

The Misspellings class inherits from the AugmentationWithRandomSelection base class, which provides a framework for implementing random selection in augmentations.

Parameters

  • selection_size: Proportion of misspellings to apply.
  • error_rates: List of error rates to introduce variations. Defaults to [0.1, 0.2] if not provided.

Functionality

The class generates multiple augmented text variants from an input string, returning a list that contains the original string along with its misspelled versions generated using different error rates.

Usage

To use the Misspellings class, instantiate it with the desired selection_size and error_rates, then call its transformation method to obtain a list that includes the original and augmented strings with misspellings.

Example

misspelling = Misspellings(0.8, [0.05, 0.1])
augmented_texts = misspelling._raw_transform("sample text")

For an input string like "example", a possible output could be:

[
  "example",
  "exampel",
  "examlpe"
]

This methodology effectively introduces realistic typing errors that can enhance model robustness by simulating natural misspellings, inspired by common keyboard mistakes.