athena.data.text_featurizer

Text featurizer

Module Contents

Classes

Vocabulary Vocabulary
EnglishVocabulary English vocabulary separated by space
SentencePieceFeaturizer Text featurizer backed by a SentencePiece model
TextTokenizer Text Tokenizer
TextFeaturizer The main text featurizer interface
class athena.data.text_featurizer.Vocabulary(vocab_file)

Vocabulary

Interface:
    decode: Convert a list of ids to a sentence, with spaces inserted.
    encode: Convert a sentence to a list of ids, with special tokens added.
load_model(self, vocab_file)

Load the vocabulary model from vocab_file.

_default_unk_index(self)
_default_unk_symbol(self)
__len__(self)
decode(self, ids)

Convert a list of ids to a sentence.

encode(self, sentence)

Convert a sentence to a list of ids, with special tokens added.

__call__(self, inputs)
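
The encode/decode contract described above can be illustrated with a minimal sketch. This is not the Athena implementation: the class name ToyVocabulary, the character-level lookup, and the choice of index 0 for the unknown symbol are all assumptions made for illustration. The special tokens that the real encode adds are omitted, because this page does not say which ones they are.

```python
class ToyVocabulary:
    """Hypothetical stand-in for a symbol <-> id table (not Athena's class)."""

    def __init__(self, symbols, unk="<unk>"):
        self.unk = unk
        # Assumption: index 0 is reserved for the unknown symbol.
        self.itos = [unk] + [s for s in symbols if s != unk]
        self.stoi = {s: i for i, s in enumerate(self.itos)}

    def __len__(self):
        return len(self.itos)

    def encode(self, sentence):
        # Character-level lookup; unseen characters map to the unk index.
        return [self.stoi.get(ch, 0) for ch in sentence]

    def decode(self, ids):
        # Out-of-range ids are rendered as the unk symbol.
        return "".join(
            self.itos[i] if 0 <= i < len(self.itos) else self.unk for i in ids
        )


vocab = ToyVocabulary(list("abc "))
ids = vocab.encode("ab z")   # "z" is not in the table
text = vocab.decode(ids)
```

An unknown character does not raise an error here; it round-trips as the unk symbol, which is the behaviour the _default_unk_index and _default_unk_symbol helpers suggest.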
class athena.data.text_featurizer.EnglishVocabulary(vocab_file)

Bases: athena.data.text_featurizer.Vocabulary

English vocabulary separated by space

decode(self, ids)

Convert a list of ids to a sentence.

encode(self, sentence)

Convert a sentence to a list of ids, with special tokens added.
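
As a rough illustration of a space-separated vocabulary, the sketch below splits on whitespace and looks up whole words. The helper names (build_word_index, encode_words, decode_words) are hypothetical and are not part of Athena. Special tokens are left out because this page does not name them.

```python
def build_word_index(words, unk="<unk>"):
    # Assumption: the unknown word takes index 0; the rest are sorted for determinism.
    itos = [unk] + sorted(set(words) - {unk})
    stoi = {word: index for index, word in enumerate(itos)}
    return stoi, itos


def encode_words(sentence, stoi, unk_index=0):
    # Split on whitespace and map each word to its id, falling back to unk.
    return [stoi.get(word, unk_index) for word in sentence.split()]


def decode_words(ids, itos, unk="<unk>"):
    # Join the words back with single spaces.
    return " ".join(itos[i] if 0 <= i < len(itos) else unk for i in ids)


stoi, itos = build_word_index(["hello", "world"])
word_ids = encode_words("hello there world", stoi)
sentence = decode_words(word_ids, itos)
```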

class athena.data.text_featurizer.SentencePieceFeaturizer(spm_file)

Text featurizer backed by a SentencePiece model, loaded from spm_file.

load_model(self, model_file)

Load the SentencePiece model from model_file.

__len__(self)
encode(self, sentence)

Convert a sentence to a list of ids using the SentencePiece model.

decode(self, ids)

Convert a list of ids to a sentence.

class athena.data.text_featurizer.TextTokenizer(text=None)

Text Tokenizer

load_model(self, text)

Load the model from text.

__len__(self)
encode(self, texts)

Convert a sentence to a list of ids, with special tokens added.

decode(self, sequences)

Convert a list of ids to a sentence.

class athena.data.text_featurizer.TextFeaturizer(config=None)

The main text featurizer interface

supported_model
default_config
model_type

The type of the underlying text model.

unk_index

Return the index of the unknown (unk) token.

load_model(self, model_file)

Load the model from model_file.

delete_punct(self, tokens)

Delete punctuation tokens from tokens.
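
One possible way to filter punctuation tokens is sketched below. It assumes the tokens are strings and treats a token as punctuation when every character falls in a Unicode punctuation category. Whether Athena works on strings or ids, and which characters it counts as punctuation, is not stated on this page. The function names are hypothetical.

```python
import unicodedata


def is_punct(token):
    # True when the token is non-empty and every character is in a Unicode "P*" category.
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def drop_punct_tokens(tokens):
    # Keep only tokens that are not made up entirely of punctuation.
    return [token for token in tokens if not is_punct(token)]


kept = drop_punct_tokens(["hello", ",", "world", "!", "don't", "..."])
```

Tokens that merely contain punctuation, such as "don't", are kept. Only tokens consisting entirely of punctuation are dropped.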

__len__(self)
encode(self, texts)

Convert a sentence to a list of ids, with special tokens added.

decode(self, sequences)

Convert a list of ids to a sentence.
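
The supported_model, default_config and model_type members suggest that TextFeaturizer selects a backend from its config and delegates encode and decode to it. The sketch below shows that dispatch pattern in general terms. The backend classes, the "type" config key and the ToyFeaturizer name are assumptions made for illustration and do not reflect Athena's actual configuration schema.

```python
class CharBackend:
    """Hypothetical backend: one id per character (its Unicode code point)."""

    def encode(self, text):
        return [ord(ch) for ch in text]

    def decode(self, ids):
        return "".join(chr(i) for i in ids)


class WordBackend:
    """Hypothetical backend: assigns word ids in order of first appearance."""

    def __init__(self):
        self.itos = []
        self.stoi = {}

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self.stoi:
                self.stoi[word] = len(self.itos)
                self.itos.append(word)
            ids.append(self.stoi[word])
        return ids

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)


class ToyFeaturizer:
    # Registry of backend names to classes (assumed layout).
    supported_model = {"char": CharBackend, "word": WordBackend}
    default_config = {"type": "char"}

    def __init__(self, config=None):
        merged = {**self.default_config, **(config or {})}
        self._type = merged["type"]
        if self._type not in self.supported_model:
            raise ValueError(f"unsupported model type: {self._type}")
        self.model = self.supported_model[self._type]()

    @property
    def model_type(self):
        return self._type

    def encode(self, text):
        return self.model.encode(text)

    def decode(self, ids):
        return self.model.decode(ids)


featurizer = ToyFeaturizer({"type": "word"})
word_ids = featurizer.encode("a b a")
```

Keeping the registry on the class makes it possible to add a backend without touching the encode and decode call sites.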