Text
Usage
This filter simply retrieves the buffer's text and returns it as Unicode. It takes a file or file buffer and returns a single SourceText object containing all the text in the file. It is the default filter when no filter is specified, though it can be included manually via pyspelling.filters.text. When it is first in the chain, the file's assumed default encoding is utf-8 unless the user overrides it.
The Text filter can also be used to convert from one encoding to another.
matrix:
- name: text
  default_encoding: cp1252
  pipeline:
  - pyspelling.filters.text:
      convert_encoding: utf-8
  source:
  - "**/*.txt"
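To make the effect of the configuration above concrete, here is a minimal sketch of a cp1252 to utf-8 conversion in plain Python. It only illustrates the idea; it is not PySpelling's implementation, and the helper name `convert_bytes` is hypothetical.

```python
# Illustration only: `convert_bytes` is a hypothetical helper, not part of PySpelling.
def convert_bytes(raw: bytes, source_encoding: str = "cp1252",
                  target_encoding: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target encoding."""
    text = raw.decode(source_encoding)    # bytes -> Unicode string
    return text.encode(target_encoding)   # Unicode string -> bytes in the new encoding


# "caf\u00e9" is one byte per character in cp1252; the accented e takes two bytes in utf-8.
original = "caf\u00e9".encode("cp1252")
converted = convert_bytes(original)
print(original)   # b'caf\xe9'
print(converted)  # b'caf\xc3\xa9'
```

The text itself is unchanged by the round trip; only its byte representation differs.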
Options
| Options | Type | Default | Description |
|---|---|---|---|
| normalize | string | '' | Performs Unicode normalization. Valid values are NFC, NFD, NFKC, and NFKD. |
| convert_encoding | string | '' | Assuming a valid encoding, the text will be converted to the specified encoding. |
| errors | string | 'strict' | Specifies what to do when a character can't be converted to the target encoding. Valid values are strict, ignore, replace, xmlcharrefreplace, backslashreplace, and namereplace. |
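The values accepted by normalize and errors share their names with Python's Unicode normalization forms and standard codec error handlers. The sketch below uses only the standard library to show what each value does to a sample string. It assumes the filter behaves like these standard mechanisms, and the helper `normalize_and_encode` is hypothetical rather than part of PySpelling.

```python
import unicodedata


# Illustration only: `normalize_and_encode` is a hypothetical helper, not PySpelling's API.
def normalize_and_encode(text: str, normalize: str = "",
                         encoding: str = "ascii", errors: str = "strict") -> bytes:
    """Optionally normalize the text, then encode it with the given error handler."""
    if normalize:
        text = unicodedata.normalize(normalize, text)
    return text.encode(encoding, errors)


# NFC composes "e" + combining acute accent (two code points) into a single code point.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# Each handler treats the unencodable character differently when targeting ASCII.
sample = "caf\u00e9"
for handler in ("ignore", "replace", "xmlcharrefreplace",
                "backslashreplace", "namereplace"):
    print(handler, normalize_and_encode(sample, errors=handler))

# With the default "strict" handler, the same input raises UnicodeEncodeError.
try:
    normalize_and_encode(sample)
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
```

With strict, a single unconvertible character stops the conversion, while the other handlers trade fidelity for robustness by dropping or substituting the character.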
Categories
Text returns text with the following categories.
| Category | Description |
|---|---|
| text | The extracted text. |