Misspellings Class Documentation¶
Overview¶
The Misspellings
class is designed to introduce typographical errors (misspellings) into input strings by applying predefined error rates. This functionality is useful for generating augmented text data that simulates common misspelling errors, making it applicable for testing text classifiers and spell-check systems.
Inheritance¶
The Misspellings
class inherits from the AugmentationWithRandomSelection
base class, which provides a framework for implementing random selection in augmentations.
Parameters¶
selection_size
: Proportion of misspellings to apply.error_rates
: List of error rates to introduce variations. Defaults to[0.1, 0.2]
if not provided.
Functionality¶
The class generates multiple augmented text variants from an input string, returning a list that contains the original string along with its misspelled versions generated using different error rates.
Usage¶
To use the Misspellings
class, instantiate it with the desired selection_size
and error_rates
, then call its transformation method to obtain a list that includes the original and augmented strings with misspellings.
Example¶
misspelling = Misspellings(0.8, [0.05, 0.1])
augmented_texts = misspelling._raw_transform("sample text")
For an input string like "example", a possible output could be:
[
"example",
"exampel",
"examlpe"
]
This methodology effectively introduces realistic typing errors that can enhance model robustness by simulating natural misspellings, inspired by common keyboard mistakes.