athena.models.tacotron2
Tacotron2 implementation.
Module Contents
Classes
Tacotron2 | An implementation of Tacotron2
-
class athena.models.tacotron2.Tacotron2(data_descriptions, config=None)
Bases: athena.models.base.BaseModel
An implementation of Tacotron2.
Reference: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
-
default_config
-
_pad_and_reshape(self, outputs, ori_lens, reverse=False)
Parameters:
- outputs – true labels, shape: [batch, y_steps, feat_dim]
- ori_lens – original length, scalar
Returns:
- reshaped_outputs – the outputs padded and reshaped to match reduction_factor, shape: [batch, y_steps / reduction_factor, feat_dim * reduction_factor]
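The reshaping described above can be illustrated with a minimal NumPy sketch. It is not the Athena implementation (which operates on TensorFlow tensors): the function names, the zero padding to a multiple of the reduction factor, and the inverse helper are assumptions made for illustration, and whether the inverse corresponds to `reverse=True` is not confirmed by the source.

```python
import numpy as np

def pad_and_reshape(outputs, reduction_factor):
    """Group every `reduction_factor` frames into one decoder step.

    outputs: [batch, y_steps, feat_dim]
    returns: [batch, ceil(y_steps / r), feat_dim * r]
    """
    batch, y_steps, feat_dim = outputs.shape
    # Assumption: zero-pad the time axis up to the next multiple of r.
    pad = (-y_steps) % reduction_factor
    padded = np.pad(outputs, ((0, 0), (0, pad), (0, 0)))
    # Row-major reshape concatenates r consecutive frames into one step.
    return padded.reshape(
        batch, (y_steps + pad) // reduction_factor, feat_dim * reduction_factor
    )

def unreshape(reshaped, ori_len, feat_dim):
    """Hypothetical inverse: restore [batch, ori_len, feat_dim]."""
    batch = reshaped.shape[0]
    # Split the grouped frames again, then drop the padded tail.
    return reshaped.reshape(batch, -1, feat_dim)[:, :ori_len, :]
```

With `reduction_factor=2`, an input of shape [2, 5, 3] is padded to 6 steps and becomes [2, 3, 6]; `unreshape(..., ori_len=5, feat_dim=3)` recovers the original array.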
-
call(self, samples, training: bool = None)
Call the model.
-
initialize_input_y(self, y)
Parameters:
- y – the true label, shape: [batch, y_steps, feat_dim]
Returns:
- y0 – y with one step of zeros padded before the first step, shape: [batch, y_steps+1, feat_dim]
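As a rough illustration of the padding described above, the following NumPy sketch prepends one all-zero frame along the time axis. It is a stand-in written for this page, not the Athena code, and the teacher-forcing slice at the end is an assumption about how such a padded tensor is typically used.

```python
import numpy as np

def initialize_input_y(y):
    """Prepend one all-zero frame along the time axis.

    y:  [batch, y_steps, feat_dim]
    y0: [batch, y_steps + 1, feat_dim]
    """
    # Pad one step before the first frame only; np.pad fills with zeros.
    return np.pad(y, ((0, 0), (1, 0), (0, 0)))

y = np.ones((2, 4, 3), dtype=np.float32)
y0 = initialize_input_y(y)
print(y0.shape)  # (2, 5, 3)

# Assumed usage: under teacher forcing, the input at step t is y0[:, t]
# and the target is y[:, t], so the last frame of y0 is never fed in.
decoder_inputs = y0[:, :-1, :]
```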
-
initialize_states(self, encoder_output, input_length)
Parameters:
- encoder_output – encoder outputs, shape: [batch, x_steps, eunits]
- input_length – shape: [batch]
Returns:
- prev_rnn_states – initial states of the RNNs in the decoder, shape: [rnn_layers, 2, batch, dunits]
- prev_attn_weight – initial attention weights, shape: [batch, x_steps]
- prev_context – initial context, shape: [batch, eunits]
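The shapes listed above can be made concrete with a small NumPy sketch. The initial values are assumptions, since the source documents only the shapes: zeros are used for the RNN states and the context, and the attention weights are spread uniformly over the valid (unpadded) input positions. The defaults for `rnn_layers` and `dunits` are hypothetical.

```python
import numpy as np

def initialize_states(encoder_output, input_length, rnn_layers=2, dunits=4):
    """Build initial decoder states with the documented shapes."""
    batch, x_steps, eunits = encoder_output.shape
    # Axis of size 2 holds the LSTM hidden and cell states (assumed zeros).
    prev_rnn_states = np.zeros((rnn_layers, 2, batch, dunits), dtype=np.float32)
    # 1.0 for valid positions, 0.0 for padding.
    mask = (np.arange(x_steps)[None, :] < input_length[:, None]).astype(np.float32)
    # Assumption: uniform attention over the valid positions of each input.
    prev_attn_weight = mask / input_length[:, None]
    # Assumption: the context vector starts at zero.
    prev_context = np.zeros((batch, eunits), dtype=np.float32)
    return prev_rnn_states, prev_attn_weight, prev_context

enc = np.random.rand(2, 5, 8).astype(np.float32)
lengths = np.array([5, 3])
states, attn, context = initialize_states(enc, lengths)
print(states.shape, attn.shape, context.shape)  # (2, 2, 2, 4) (2, 5) (2, 8)
```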
-
time_propagate(self, encoder_output, input_length, prev_y, prev_rnn_states, accum_attn_weight, prev_attn_weight, prev_context, training=False)
Parameters:
- encoder_output – encoder output, shape: [batch, x_steps, eunits]
- input_length – shape: [batch]
- prev_y – one step of true labels or predicted labels, shape: [batch, feat_dim]
- prev_rnn_states – previous RNN states, [layers, 2, states] for LSTM
- prev_attn_weight – previous attention weights, shape: [batch, x_steps]
- prev_context – previous context vector, shape: [batch, attn_dim]
- training – whether the model is in training mode
Returns:
- out – shape: [batch, feat_dim]
- logit – shape: [batch, reduction_factor]
- current_rnn_states – shape: [rnn_layers, 2, batch, dunits]
- attn_weight – shape: [batch, x_steps]
-
get_loss(self, outputs, samples, training=None)
Get the loss.
-
synthesize(self, samples)
Synthesize acoustic features from the input texts.
Parameters:
- samples – the data source to be synthesized
Returns:
- after_outs – the corresponding synthesized acoustic features
- attn_weights_stack – the corresponding attention weights
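To show the general shape of autoregressive synthesis, here is a toy NumPy sketch of a decoding loop. Everything in it is hypothetical: `toy_step` stands in for a real decoder step such as `time_propagate`, and the stop rule (sigmoid of the logit above a threshold) and the `max_steps` cap are assumptions based on how Tacotron2-style decoders are commonly run, not a description of Athena's code.

```python
import numpy as np

def toy_step(prev_y, step):
    """Hypothetical stand-in for one decoder step."""
    batch = prev_y.shape[0]
    out = prev_y + 1.0                          # next "acoustic frame"
    logit = np.full((batch, 1), step - 3.0)     # stop logit grows with time
    attn = np.full((batch, 5), 1.0 / 5.0)       # dummy attention weights
    return out, logit, attn

def synthesize_loop(step_fn, feat_dim, max_steps=100, threshold=0.5):
    """Feed each predicted frame back in until the stop signal fires."""
    prev_y = np.zeros((1, feat_dim))            # start from a zero frame
    outs, attn_weights = [], []
    for step in range(max_steps):
        out, logit, attn = step_fn(prev_y, step)
        outs.append(out)
        attn_weights.append(attn)
        stop_prob = 1.0 / (1.0 + np.exp(-logit))
        if np.any(stop_prob > threshold):       # assumed stop criterion
            break
        prev_y = out                            # prediction becomes next input
    # Stack along the time axis: [batch, steps, ...]
    return np.stack(outs, axis=1), np.stack(attn_weights, axis=1)

outs, attn_stack = synthesize_loop(toy_step, feat_dim=3)
print(outs.shape, attn_stack.shape)  # (1, 5, 3) (1, 5, 5)
```

The `max_steps` cap guards against a stop signal that never fires, at the cost of truncating unusually long utterances.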
-
_synthesize_post_net(self, before_outs, logits_stack)