Text

Usage

The Text filter retrieves a buffer's text and returns it as Unicode. It takes a file or file buffer and returns a single SourceText object containing all of the file's text. It is the default filter when no filter is specified, though it can be included explicitly via pyspelling.filters.text. When it is first in the chain, the file's encoding is assumed to be utf-8 unless the user overrides it.

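The Text filter can also be used to convert from one encoding to another. For example, the following configuration reads files as cp1252 and converts their text to utf-8: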

matrix:
- name: text
  default_encoding: cp1252
  pipeline:
  - pyspelling.filters.text:
      convert_encoding: utf-8
  source:
  - "**/*.txt"
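
To make the conversion concrete, here is a minimal Python sketch of the equivalent operation using only the standard library: bytes assumed to be cp1252 are decoded to Unicode and then re-encoded as utf-8. It illustrates the idea only and is not PySpelling's actual implementation; the function name `convert_encoding` and the sample bytes are hypothetical.

```python
def convert_encoding(raw: bytes, source: str = "cp1252",
                     target: str = "utf-8", errors: str = "strict") -> bytes:
    """Decode `raw` from the `source` encoding and re-encode it as `target`."""
    text = raw.decode(source)           # bytes -> Unicode string
    return text.encode(target, errors)  # Unicode string -> bytes in target encoding


# Hypothetical sample: "café" stored as cp1252 uses one byte (0xE9) for "é".
sample = "café".encode("cp1252")
converted = convert_encoding(sample)

# In utf-8 the same character occupies two bytes (0xC3 0xA9).
print(converted)
```

The text is unchanged by the round trip; only its byte representation differs.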

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| normalize | string | '' | Performs Unicode normalization. Valid values are NFC, NFD, NFKC, and NFKD. |
| convert_encoding | string | '' | Converts the text to the specified encoding, provided the encoding name is valid. |
| errors | string | 'strict' | Specifies how to handle a character that cannot be converted to the target encoding. Valid values are strict, ignore, replace, xmlcharrefreplace, backslashreplace, and namereplace. |
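
The values accepted by normalize and errors match the normalization forms and codec error handlers in Python's standard library. The sketch below shows their effect using `unicodedata.normalize` and `str.encode` directly. It is an illustration under that assumption, not PySpelling's own code, and the sample strings are hypothetical.

```python
import unicodedata

# normalize: NFC composes 'e' + combining acute accent into one code point.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# NFKC additionally folds compatibility characters, such as the 'fi' ligature.
expanded = unicodedata.normalize("NFKC", "\ufb01")

# errors: encode a character that ASCII cannot represent with each handler.
text = "caf\u00e9"
results = {}
for handler in ("ignore", "replace", "xmlcharrefreplace",
                "backslashreplace", "namereplace"):
    results[handler] = text.encode("ascii", errors=handler)

# 'strict' raises an exception instead of substituting anything.
try:
    text.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

for handler, encoded in results.items():
    print(handler, encoded)
```

With strict, the default, an unconvertible character stops processing with an error. The other handlers trade fidelity for robustness by dropping or substituting the character.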

Categories

Text returns text with the following categories.

| Category | Description |
| --- | --- |
| text | The extracted text. |