athena.data.text_featurizer
Text featurizer
Module Contents
Classes
Vocabulary | Vocabulary
EnglishVocabulary | English vocabulary separated by space
SentencePieceFeaturizer | TODO: docstring
TextTokenizer | Text Tokenizer
TextFeaturizer | The main text featurizer interface
class athena.data.text_featurizer.Vocabulary(vocab_file)
Vocabulary
- Interface:
  - decode: Convert a list of ids to a sentence, with space inserted
  - encode: Convert a sentence to a list of ids, with special tokens added
- load_model(self, vocab_file): load model
- _default_unk_index(self)
- _default_unk_symbol(self)
- __len__(self)
- decode(self, ids): Convert a list of ids to a sentence.
- encode(self, sentence): Convert a sentence to a list of ids, with special tokens added.
- __call__(self, inputs)
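The encode/decode contract listed for this class can be illustrated with a minimal sketch. `ToyVocabulary` is a hypothetical stand-in, not the Athena class: it assumes a character-level symbol table with `<unk>` reserved at index 0, treats `__call__` as shorthand for `encode`, and neither reads a vocab file nor adds special tokens.

```python
class ToyVocabulary:
    """Hypothetical stand-in for illustration; not athena's Vocabulary."""

    def __init__(self, symbols, unk="<unk>"):
        # Assumption: index 0 is reserved for the unknown symbol.
        self.unk = unk
        self.unk_index = 0
        self.itos = {self.unk_index: unk}
        self.stoi = {unk: self.unk_index}
        for index, symbol in enumerate(symbols, start=1):
            self.stoi[symbol] = index
            self.itos[index] = symbol

    def __len__(self):
        return len(self.itos)

    def encode(self, sentence):
        # Symbols missing from the table map to the unk index.
        return [self.stoi.get(ch, self.unk_index) for ch in sentence]

    def decode(self, ids):
        # Ids missing from the table map to the unk symbol.
        return "".join(self.itos.get(i, self.unk) for i in ids)

    def __call__(self, inputs):
        # Assumption: calling the object is shorthand for encode().
        return self.encode(inputs)


vocab = ToyVocabulary("abc ")
ids = vocab("cab z")        # "z" is not in the table
text = vocab.decode(ids)
```

Plain dictionaries with `.get` are used here so that looking up an unknown symbol does not grow the table, which keeps `len(vocab)` stable.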
class athena.data.text_featurizer.EnglishVocabulary(vocab_file)
Bases: athena.data.text_featurizer.Vocabulary
English vocabulary separated by space
- decode(self, ids): Convert a list of ids to a sentence.
- encode(self, sentence): Convert a sentence to a list of ids, with special tokens added.
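A word-level vocabulary differs from a character-level one mainly in how the sentence is split and rejoined. The functions below are a hedged sketch of that idea only: `encode_words`, `decode_words`, and the lookup tables are hypothetical names, and the sketch adds no special tokens.

```python
def encode_words(sentence, stoi, unk_index=0):
    """Map each whitespace-separated word to its id (unknown words -> unk_index)."""
    return [stoi.get(word, unk_index) for word in sentence.split()]


def decode_words(ids, itos, unk="<unk>"):
    """Map ids back to words and join them with single spaces."""
    return " ".join(itos.get(i, unk) for i in ids)


# Hypothetical lookup tables; a real vocabulary would be loaded from vocab_file.
stoi = {"<unk>": 0, "hello": 1, "world": 2}
itos = {index: word for word, index in stoi.items()}

ids = encode_words("hello  brave world", stoi)   # repeated spaces are collapsed
text = decode_words(ids, itos)
```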
class athena.data.text_featurizer.SentencePieceFeaturizer(spm_file)
TODO: docstring
- load_model(self, model_file): load model
- __len__(self)
- encode(self, sentence): Convert a sentence to a list of ids using the SentencePiece model.
- decode(self, ids): Convert a list of ids to a sentence.
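The method list suggests a thin wrapper around a SentencePiece processor. The sketch below shows one way such delegation could look; it is an assumption, not the Athena implementation. `SentencePieceWrapper` and `FakeProcessor` are hypothetical names, and the fake exists only so the example runs without the `sentencepiece` package or a trained model. The method names `GetPieceSize`, `EncodeAsIds`, and `DecodeIds` match those of `sentencepiece.SentencePieceProcessor`.

```python
class SentencePieceWrapper:
    """Hypothetical wrapper; expects an object exposing the processor methods."""

    def __init__(self, processor):
        self.sp = processor

    def __len__(self):
        return self.sp.GetPieceSize()

    def encode(self, sentence):
        return self.sp.EncodeAsIds(sentence)

    def decode(self, ids):
        return self.sp.DecodeIds(list(ids))


class FakeProcessor:
    """Stand-in for a trained model: one id per Latin-1 character."""

    def GetPieceSize(self):
        return 256

    def EncodeAsIds(self, text):
        return list(text.encode("latin-1"))

    def DecodeIds(self, ids):
        return bytes(ids).decode("latin-1")


featurizer = SentencePieceWrapper(FakeProcessor())
ids = featurizer.encode("hi there")
text = featurizer.decode(ids)

# With the real library, the processor would be built roughly like this
# ("model.spm" is a placeholder path):
#   import sentencepiece as spm
#   processor = spm.SentencePieceProcessor()
#   processor.Load("model.spm")
```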
class athena.data.text_featurizer.TextTokenizer(text=None)
Text Tokenizer
- load_model(self, text): load model
- __len__(self)
- encode(self, texts): Convert a sentence to a list of ids, with special tokens added.
- decode(self, sequences): Convert a list of ids to a sentence.
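Because `load_model` here takes text rather than a file path, one plausible reading is that the word table is fitted from a corpus. The sketch below illustrates that reading only. `ToyTextTokenizer` is hypothetical, and its id assignment (ids from 1, ordered by descending frequency) and its handling of unseen words (dropped) are assumptions rather than documented behavior.

```python
from collections import Counter


class ToyTextTokenizer:
    """Hypothetical sketch; builds word ids from text rather than loading a file."""

    def __init__(self, text=None):
        self.word_index = {}
        self.index_word = {}
        if text is not None:
            self.load_model(text)

    def load_model(self, text):
        # Assumption: ids start at 1 and follow descending word frequency,
        # with ties broken by first appearance.
        counts = Counter(text.lower().split())
        for index, (word, _) in enumerate(counts.most_common(), start=1):
            self.word_index[word] = index
            self.index_word[index] = word

    def __len__(self):
        return len(self.word_index)

    def encode(self, texts):
        # Words never seen by load_model are dropped in this sketch.
        words = texts.lower().split()
        return [self.word_index[w] for w in words if w in self.word_index]

    def decode(self, sequences):
        return " ".join(self.index_word[i] for i in sequences if i in self.index_word)


tokenizer = ToyTextTokenizer("the cat saw the dog")
ids = tokenizer.encode("the dog saw a cat")   # "a" was never seen
text = tokenizer.decode(ids)
```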
class athena.data.text_featurizer.TextFeaturizer(config=None)
The main text featurizer interface
- supported_model
- default_config
- model_type: the model type
- unk_index: return the unk index
- load_model(self, model_file): load model
- delete_punct(self, tokens): delete punctuation tokens
- __len__(self)
- encode(self, texts): Convert a sentence to a list of ids, with special tokens added.
- decode(self, sequences): Convert a list of ids to a sentence.
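The attributes `supported_model`, `default_config`, and `model_type` point to a config-driven front end that selects one backend per model type. The sketch below shows that pattern under stated assumptions: the config keys, the `"char"` and `"word"` type names, and the function names are all hypothetical, and the actual values used by the library are not shown in this reference. The `delete_punct` helper assumes "punctuation token" means a token consisting only of ASCII punctuation.

```python
import string

# Hypothetical defaults; the real default_config is not shown in this reference.
DEFAULT_CONFIG = {"type": "char", "model": None}


def char_encode(text):
    return [ord(ch) for ch in text]


def word_encode(text):
    return text.split()


# Hypothetical registry playing the role of supported_model.
SUPPORTED = {"char": char_encode, "word": word_encode}


def build_encoder(config=None):
    """Merge user config over the defaults and pick the backend by type."""
    merged = {**DEFAULT_CONFIG, **(config or {})}
    model_type = merged["type"]
    if model_type not in SUPPORTED:
        raise ValueError(f"unsupported model type: {model_type!r}")
    return model_type, SUPPORTED[model_type]


def delete_punct(tokens):
    """Drop tokens made up entirely of ASCII punctuation."""
    return [t for t in tokens if not (t and all(ch in string.punctuation for ch in t))]


model_type, encode = build_encoder({"type": "word"})
tokens = encode("hello , world !")
cleaned = delete_punct(tokens)
default_type, _ = build_encoder()
```

Keeping the backends in a dictionary means an unsupported type fails early with a clear error instead of surfacing later during encoding.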