note_splitter.tokens module

The class definitions for all the tokens.

See the hierarchy of all the token types here: https://note-splitter.readthedocs.io/en/latest/token-hierarchy.html

Each token class has a content property. For a token that is not a combination of other tokens, content is a string holding the original raw line of text. Otherwise, content is the list of subtokens. Each token class also has a boolean class variable (not an instance variable) named HAS_PATTERN. If HAS_PATTERN is True, the class has a corresponding regular expression in patterns.py.
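The relationship between content and HAS_PATTERN can be illustrated with a minimal sketch. The classes below are simplified stand-ins written for this example, not the library's actual code; only the names Token, Line, Block, Header, content, and HAS_PATTERN come from this page, and the private _content attribute is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class Token(ABC):
    """Stand-in base class (assumption: not the library's real code)."""

    HAS_PATTERN = False  # class variable shared by every instance

    @property
    @abstractmethod
    def content(self) -> Any:
        ...


class Line(Token):
    """A single-line token: content is the raw line as a string."""

    def __init__(self, line: str = "") -> None:
        self._content = line

    @property
    def content(self) -> str:
        return self._content


class Header(Line):
    HAS_PATTERN = True  # overrides the default at class level


class Block(Token):
    """A combination token: content is the list of subtokens."""

    def __init__(self, tokens_: Optional[list[Any]] = None) -> None:
        self._content = tokens_ if tokens_ is not None else []

    @property
    def content(self) -> list[Any]:
        return self._content


header = Header("# Title")
block = Block([header])
print(type(header.content).__name__)  # str
print(type(block.content).__name__)   # list
print(Header.HAS_PATTERN, Block.HAS_PATTERN)  # True False
```

Because HAS_PATTERN lives on the class, it can be checked without creating an instance, for example when deciding which classes to pair with a regular expression.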

class note_splitter.tokens.Block

Bases: note_splitter.tokens.Token

The ABC for tokens that are each a combination of tokens.

__bool__(): Returns whether the token’s content is empty.

__contains__(item: note_splitter.tokens.Token) → bool: Returns whether the token’s content contains an item.

__delitem__(index: int) → None: Deletes the token at the given index.

__getitem__(index: int) → note_splitter.tokens.Token: Returns the token at the given index.

__iter__(): Returns an iterator for the token’s content.

__len__(): Returns the length of the token’s content.

__setitem__(index: int, token: note_splitter.tokens.Token) → None: Sets the token at the given index to the given token.

__str__(): Returns the original content of the token’s raw text.

append(token: note_splitter.tokens.Token) → None: Appends the given token to the block’s content.

property content: list[typing.Any]

insert(index: int, token: note_splitter.tokens.Token) → None: Inserts the given token at the given index.

remove(token: note_splitter.tokens.Token) → None: Removes the given token from the block’s content.
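The container methods listed for Block mirror those of a Python list. The following is a rough sketch of how such a class could delegate to an internal list; SketchBlock and its _content attribute are hypothetical names for this example, and plain strings stand in for tokens.

```python
from typing import Any, Iterator, Optional


class SketchBlock:
    """Hypothetical stand-in for Block; strings play the role of tokens."""

    def __init__(self, tokens_: Optional[list[Any]] = None) -> None:
        self._content: list[Any] = list(tokens_) if tokens_ else []

    @property
    def content(self) -> list[Any]:
        return self._content

    def __contains__(self, item: Any) -> bool:
        return item in self._content

    def __delitem__(self, index: int) -> None:
        del self._content[index]

    def __getitem__(self, index: int) -> Any:
        return self._content[index]

    def __iter__(self) -> Iterator[Any]:
        return iter(self._content)

    def __len__(self) -> int:
        return len(self._content)

    def __setitem__(self, index: int, token: Any) -> None:
        self._content[index] = token

    def append(self, token: Any) -> None:
        self._content.append(token)

    def insert(self, index: int, token: Any) -> None:
        self._content.insert(index, token)

    def remove(self, token: Any) -> None:
        self._content.remove(token)


block = SketchBlock(["# Title", "first line"])
block.append("second line")   # add to the end
block.insert(1, "")           # insert before "first line"
block[1] = "intro"            # replace the inserted item
del block[2]                  # delete "first line"
block.remove("second line")   # remove by value
print(list(block))
```

With these methods in place, a block can be used in for loops, membership tests, and index expressions in the same way as the list it wraps.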

class note_splitter.tokens.Blockquote(line: str = '')

Bases: note_splitter.tokens.CanHaveInlineElements

A single-line quote.

content

The content of the line of text.

Type: str

level

The number of spaces of indentation.

Type: int

HAS_PATTERN = True

class note_splitter.tokens.BlockquoteBlock(tokens_: Optional[list[typing.Any]] = None)

Bases: note_splitter.tokens.Block

Multiple lines of blockquotes.

content

The consecutive blockquote tokens.

Type: list[Blockquote]

class note_splitter.tokens.CanHaveInlineElements(line: str = '')

Bases: note_splitter.tokens.Line

The ABC for single-line tokens that can have inline elements.

class note_splitter.tokens.Code(line: str = '')

Bases: note_splitter.tokens.Fenced

A line of code inside a code block.

content

The content of the line of text.

Type: str

class note_splitter.tokens.CodeBlock(tokens_: Optional[list[typing.Any]] = None)

Bases: note_splitter.tokens.Block

A multi-line code block.

content

The code block’s code fence tokens surrounding code token(s).

Type: list[Union[CodeFence, Code]]

language

Any text that follows the triple backticks (or tildes) on the line of the opening code fence. Surrounding whitespace characters are removed.

Type: str

class note_splitter.tokens.CodeFence(line: str = '')

Bases: note_splitter.tokens.Fence

The delimiter of a multi-line code block.

content

The content of the line of text.

Type: str

language

Any text that follows the triple backticks (or triple tildes). Surrounding whitespace characters are removed. This will be an empty string if there are no non-whitespace characters after the triple backticks/tildes.

Type: str

HAS_PATTERN = True
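As an illustration of the language attribute, the sketch below extracts the text after the fence characters and strips surrounding whitespace. The regular expression and the fence_language function are assumptions made for this example; the actual pattern in patterns.py may differ.

```python
import re

# Hypothetical pattern: three backticks or three tildes, then any text.
# The real pattern in patterns.py may be stricter or looser.
FENCE_PATTERN = re.compile(r"^\s*(?:`{3}|~{3})(.*)$")


def fence_language(line: str) -> str:
    """Return the text after the fence characters, stripped."""
    match = FENCE_PATTERN.match(line)
    if match is None:
        raise ValueError("not a code fence line")
    return match.group(1).strip()


backticks = "`" * 3
print(repr(fence_language(backticks + "python")))  # 'python'
print(repr(fence_language("~~~  rust  ")))         # 'rust'
print(repr(fence_language(backticks + "   ")))     # ''
```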

class note_splitter.tokens.EmptyLine(line: str = '')

Bases: note_splitter.tokens.Line

A line with either whitespace characters or nothing.

content

The content of the line of text.

Type: str

HAS_PATTERN = True

class note_splitter.tokens.Fence

Bases: note_splitter.tokens.Line

The ABC for tokens that block fences are made out of.

class note_splitter.tokens.Fenced

Bases: note_splitter.tokens.Line

The ABC for tokens that are between Fence tokens.

class note_splitter.tokens.Footnote(line: str = '')

Bases: note_splitter.tokens.CanHaveInlineElements

A footnote (not a footnote reference).

content

The content of the line of text.

Type: str

reference

The footnote’s reference that may appear in other parts of the document.

Type: str

HAS_PATTERN = True

class note_splitter.tokens.Header(line: str = '')

Bases: note_splitter.tokens.CanHaveInlineElements

A header (i.e. a title).

content

The content of the line of text.

Type: str

body

The content of the line of text not including the header symbol(s) and their following whitespace character(s).

Type: str

level

The header level. A header level of 1 is the largest possible header.

Type: int

HAS_PATTERN = True
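The level and body attributes can be pictured as the two halves of a header line. This sketch assumes Markdown-style headers that start with one or more # symbols followed by whitespace; the pattern and the parse_header function are hypothetical and may not match the library's own handling (for example, of leading indentation).

```python
import re

# Hypothetical pattern: header symbols, whitespace, then the body.
HEADER_PATTERN = re.compile(r"^(#+)\s+(.*)$")


def parse_header(line: str) -> tuple[int, str]:
    """Return (level, body) for a header line."""
    match = HEADER_PATTERN.match(line)
    if match is None:
        raise ValueError("not a header line")
    level = len(match.group(1))  # number of header symbols
    body = match.group(2)        # text after the symbols and whitespace
    return level, body


print(parse_header("# Title"))         # (1, 'Title')
print(parse_header("## Setup notes"))  # (2, 'Setup notes')
```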

class note_splitter.tokens.HorizontalRule(line: str = '')

Bases: note_splitter.tokens.Line

A horizontal rule.

content

The content of the line of text.

Type: str

HAS_PATTERN = True

class note_splitter.tokens.Line(line: str = '')

Bases: note_splitter.tokens.Token

The ABC for tokens that take up one line of a file.

property content: str

class note_splitter.tokens.Math(line: str = '')

Bases: note_splitter.tokens.Fenced

A line of math inside a math block.

content

The content of the line of text.

Type: str

class note_splitter.tokens.MathBlock(tokens_: Optional[list[typing.Any]] = None)

Bases: note_splitter.tokens.Block

A multi-line math block.

Inline math blocks are not supported (the opening and closing math fences must be on different lines).

content

The math block’s math fence tokens surrounding math token(s).

Type: list[Union[MathFence, Math]]

class note_splitter.tokens.MathFence(line: str = '')

Bases: note_splitter.tokens.Fence

The delimiter of a multi-line math block.

content

The content of the line of text.

Type: str

HAS_PATTERN = True

class note_splitter.tokens.OrderedListItem(line: str = '')

Bases: note_splitter.tokens.TextListItem, note_splitter.tokens.CanHaveInlineElements

An item in an ordered list.

content

The content of the line of text.

Type: str

level

The number of spaces of indentation.

Type: int

HAS_PATTERN = True

class note_splitter.tokens.Section(tokens_: Optional[list[typing.Any]] = None)

Bases: note_splitter.tokens.Block

A file section starting with a token of the chosen split type.

The Splitter returns a list of Sections. Section tokens never contain section tokens, but may contain tokens of any and all other types.

content

The tokens in this section, starting with a token of the chosen split type.

Type: list[Token]
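The idea that each section starts with a token of the chosen split type can be sketched as a simple grouping loop. This is only an illustration under stated assumptions: split_into_sections is a hypothetical function, not the Splitter's actual algorithm, plain lists stand in for Section objects, and tokens that appear before the first split-type token are dropped here, which may differ from the real behavior.

```python
from typing import Any


class Header(str):
    """Stand-in token type used as the split type in this example."""


class Text(str):
    """Stand-in token type for ordinary lines."""


def split_into_sections(tokens: list[Any], split_type: type) -> list[list[Any]]:
    """Group tokens so each group starts with a split_type token."""
    sections: list[list[Any]] = []
    for token in tokens:
        if isinstance(token, split_type):
            sections.append([token])     # start a new section
        elif sections:
            sections[-1].append(token)   # extend the current section
        # Tokens before the first split token are ignored (assumption).
    return sections


tokens = [Text("intro"), Header("# A"), Text("a"), Header("# B")]
sections = split_into_sections(tokens, Header)
print(len(sections))
```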

class note_splitter.tokens.Table(tokens_: Optional[list[typing.Any]] = None)

Bases: note_splitter.tokens.Block

A table.

content

The table’s row token(s) and possibly divider token(s).

Type: list[Union[TableRow, TableDivider]]

class note_splitter.tokens.TableDivider(line: str = '')

Bases: note_splitter.tokens.TablePart

The part of a table that divides the table’s header from its body.

content

The content of the line of text.

Type: str

HAS_PATTERN = True

class note_splitter.tokens.TablePart

Bases: note_splitter.tokens.Line

The ABC for tokens that tables are made out of.

class note_splitter.tokens.TableRow(line: str = '')

Bases: note_splitter.tokens.TablePart

A row of a table.

content

The content of the line of text.

Type: str

HAS_PATTERN = True

class note_splitter.tokens.Task(line: str = '')

Bases: note_splitter.tokens.TextListItem, note_splitter.tokens.CanHaveInlineElements

A to-do list item that is either checked or unchecked.

content

The content of the line of text.

Type: str

level

The number of spaces of indentation.

Type: int

is_done

Whether the task is done (whether the box is checked).

Type: bool

HAS_PATTERN = True
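To show how is_done relates to the raw line, here is a minimal sketch that assumes Markdown-style task syntax such as "- [x] item". The pattern and the task_is_done function are hypothetical; the library's own pattern in patterns.py may accept a different set of bullets or check marks.

```python
import re

# Hypothetical pattern: optional indentation, a bullet, then a checkbox.
TASK_PATTERN = re.compile(r"^\s*[-*+]\s\[([ xX])\]\s")


def task_is_done(line: str) -> bool:
    """Return True if the task's box is checked."""
    match = TASK_PATTERN.match(line)
    if match is None:
        raise ValueError("not a task line")
    return match.group(1) != " "


print(task_is_done("- [x] write docs"))  # True
print(task_is_done("  - [ ] review"))    # False
```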

class note_splitter.tokens.Text(line: str = '')

Bases: note_splitter.tokens.CanHaveInlineElements

Normal text.

This class is the catch-all for individual lines of text that don’t fall into any other category.

content

The content of the line of text.

Type: str

level

The number of spaces of indentation.

Type: int

class note_splitter.tokens.TextList(tokens_: Optional[list[typing.Any]] = None)

Bases: note_splitter.tokens.Block

A list that is numbered, bullet-pointed, and/or checkboxed.

A single text list may have any combination of ordered list items, unordered list items, tasks, and other text lists with more indentation.

content

The tokens that make up the list. Lists may have sublists.

Type: list[Union[TextListItem, "TextList"]]

level

The number of spaces of indentation of the first item in the list.

Type: int

class note_splitter.tokens.TextListItem

Bases: note_splitter.tokens.Line

The ABC for text list item tokens.

class note_splitter.tokens.Token

Bases: abc.ABC

The abstract base class (ABC) for all tokens.

HAS_PATTERN = False

__str__(): Returns the original content of the token’s raw text.

property content: Any

class note_splitter.tokens.UnorderedListItem(line: str = '')

Bases: note_splitter.tokens.TextListItem, note_splitter.tokens.CanHaveInlineElements

An item in a bullet point list.

The bullet points can be asterisks, minus signs, and/or plus signs.

content

The content of the line of text.

Type: str

level

The number of spaces of indentation.

Type: int

HAS_PATTERN = True

note_splitter.tokens.__is_token_type(obj: Any) → bool

Returns True if obj is a Token type.

Parameters: obj (Any) – The object to test.

note_splitter.tokens._get_indentation_level(line: str) → int

Counts the spaces at the start of the line.

If there are tabs instead, each tab is counted as 4 spaces. This function assumes tabs and spaces are not mixed.
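The counting rule described above could look roughly like the following. This is a sketch, not the library's implementation: indentation_level is a hypothetical name, and for a line that mixes tabs and spaces it simply adds the two counts, a case the documented function does not cover.

```python
def indentation_level(line: str) -> int:
    """Count leading spaces, treating each leading tab as 4 spaces."""
    stripped = line.lstrip(" \t")
    indent = line[: len(line) - len(stripped)]  # leading whitespace only
    return indent.count(" ") + 4 * indent.count("\t")


print(indentation_level("    item"))  # 4
print(indentation_level("\t\titem"))  # 8
print(indentation_level("item"))      # 0
```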

note_splitter.tokens.get_all_token_types(tokens_module: module) → list[type[note_splitter.tokens.Token]]

Gets the list of all token types.

Call the function like this: tokens.get_all_token_types(tokens).

Parameters: tokens_module (ModuleType) – The module containing the token types. There is only one correct argument: the tokens module itself. The argument is required only because there doesn’t seem to be any other way to automatically get the list of token types from within the file that defines them.
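One plausible way to collect token types from a module is the standard library's inspect.getmembers with a class filter. The sketch below is an assumption about the approach, not the library's code: it uses stand-in classes, builds a throwaway module so the example is self-contained, and includes the Token base class in the result, which the real function may or may not do.

```python
import inspect
from abc import ABC
from types import ModuleType


class Token(ABC):
    """Stand-in base class for this example."""


class Line(Token):
    pass


class Header(Line):
    pass


def get_all_token_types(tokens_module: ModuleType) -> list[type]:
    """Return every class in the module that derives from Token."""
    return [
        obj
        for _, obj in inspect.getmembers(tokens_module, inspect.isclass)
        if issubclass(obj, Token)
    ]


# Build a small module object to play the role of the tokens module.
demo_module = ModuleType("demo_tokens")
demo_module.Token = Token
demo_module.Line = Line
demo_module.Header = Header
demo_module.helper = len  # not a class, so it is filtered out

token_types = get_all_token_types(demo_module)
print([t.__name__ for t in token_types])  # ['Header', 'Line', 'Token']
```

inspect.getmembers returns members sorted by name, which is why the result is in alphabetical order rather than definition order.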