string_utils

Documentation for `generate_fuzzy_regex` and `combine_chunks`¶

`generate_fuzzy_regex`¶

Functionality¶

Generates a fuzzy regex pattern that matches variations of an input text. The pattern is case-insensitive and allows for one to two character substitutions at any position in the text.

Parameters¶

text: The string for which a fuzzy regex pattern is generated.

Usage¶

Purpose: To build a regex pattern capable of matching input strings with minor character variations, useful for fuzzy text matching.

Example¶

pattern = generate_fuzzy_regex("Hello")
# pattern may look like:
# ^([Hh][a-zA-Z]{1,2}[Ee][Ll][Ll][Oo]|[Hh][Ee][a-zA-Z]{1,2}[Ll][Ll][Oo]|...)
import re
re.match(pattern, "HEllo")

`combine_chunks`¶

Functionality¶

This function takes a list of tokens and merges them into a single string, ensuring that punctuation is attached correctly to tokens. Tokens in punctuation_attach_left (like '.', ',', '!', '?', ':', ';', ')', ']', or '}') are appended to the previous token with no space, while tokens in punctuation_attach_both (like '-' or '/') are merged with both the previous and next token. All other tokens are separated by spaces.

Parameters¶

chunks: List[str]
A list of tokens (words and punctuation) that will be combined.

Returns¶

A string:
The combined string with correct punctuation formatting.

Usage¶

Purpose: Used to combine tokenized text into a coherent sentence string while preserving punctuation rules.

Example¶

tokens = ['a', 'b', '-', 'c', 'd', '.']
result = combine_chunks(tokens)
# result is "a b-c d."

string_utils

Documentation for generate_fuzzy_regex and combine_chunks¶

generate_fuzzy_regex¶

Functionality¶

Parameters¶

Usage¶

Example¶

combine_chunks¶

Functionality¶

Parameters¶

Returns¶

Usage¶

Example¶

Documentation for `generate_fuzzy_regex` and `combine_chunks`¶

`generate_fuzzy_regex`¶

`combine_chunks`¶