note_splitter.lexer module

For splitting raw text into a list of tokens.

The lexer categorizes lines of text first without looking at their context. For example, a markdown codeblock will become two code fence tokens surrounding one or more tokens of any type, possibly “incorrect” types such as header. Then the lexer makes a quick pass over the token list while looking at each token’s context to ensure they have the correct type.

class note_splitter.lexer.Lexer

Bases: object

Creates a Callable that converts raw text to a list of tokens.

__call__(text: str) list[note_splitter.tokens.Token]

Converts raw text to a list of tokens.

Parameters

text (str) – The raw text to convert to a list of tokens.