`athena.models.tacotron2`

Tacotron2 implementation.

Module Contents

Classes

Tacotron2 – An implementation of Tacotron2

class athena.models.tacotron2.Tacotron2(data_descriptions, config=None)

Bases: athena.models.base.BaseModel

An implementation of Tacotron2.

Reference: "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", https://arxiv.org/pdf/1712.05884.pdf

default_config

_pad_and_reshape(self, outputs, ori_lens, reverse=False)

Parameters:

outputs – true labels, shape: [batch, y_steps, feat_dim]
ori_lens – scalar

Returns:

reshaped_outputs – the outputs reshaped to match reduction_factor, shape: [batch, y_steps / reduction_factor, feat_dim * reduction_factor]
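
The shape change described above can be illustrated with a minimal sketch in plain Python, using nested lists in place of tensors. This is not the library's implementation: the free function `pad_and_reshape`, the zero padding of a trailing incomplete group, and the omission of `ori_lens` and `reverse` are all assumptions made for illustration.

```python
def pad_and_reshape(outputs, reduction_factor):
    """Group every `reduction_factor` consecutive frames into one step.

    outputs: non-empty nested list of shape [batch, y_steps, feat_dim].
    Returns a nested list of shape
    [batch, ceil(y_steps / reduction_factor), feat_dim * reduction_factor].
    """
    reshaped = []
    for frames in outputs:
        feat_dim = len(frames[0])
        # Assumption: zero-pad so the frame count is a multiple of
        # reduction_factor (the real padding behaviour is not documented here).
        remainder = len(frames) % reduction_factor
        if remainder:
            padding = [[0.0] * feat_dim for _ in range(reduction_factor - remainder)]
            frames = frames + padding
        # Concatenate each group of consecutive frames along the feature axis.
        grouped = []
        for start in range(0, len(frames), reduction_factor):
            step = []
            for frame in frames[start:start + reduction_factor]:
                step.extend(frame)
            grouped.append(step)
        reshaped.append(grouped)
    return reshaped


# One utterance with 3 frames of 2 features, grouped two frames at a time.
batch = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
result = pad_and_reshape(batch, reduction_factor=2)
print(result)  # [[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 0.0, 0.0]]]
```

With reduction_factor set to 2, the three input frames become two steps of four features each, the second one half-filled with padding.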

call(self, samples, training: bool = None)

Call the model.

initialize_input_y(self, y)

Parameters:

y – the true label, shape: [batch, y_steps, feat_dim]

Returns:

y0 – y with one step of zeros padded at the start, shape: [batch, y_steps+1, feat_dim]
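
As a rough illustration of the padding described above, the following sketch prepends one all-zero frame to every utterance in a batch. It uses nested lists rather than tensors, and the function name `prepend_zero_frame` is hypothetical, not part of the library.

```python
def prepend_zero_frame(y):
    """Prepend one all-zero frame to each utterance.

    y: non-empty nested list of shape [batch, y_steps, feat_dim].
    Returns a nested list of shape [batch, y_steps + 1, feat_dim].
    """
    padded = []
    for frames in y:
        feat_dim = len(frames[0])
        # The zero frame acts as the decoder input for the first step.
        padded.append([[0.0] * feat_dim] + frames)
    return padded


y = [[[1.0, 2.0], [3.0, 4.0]]]
y0 = prepend_zero_frame(y)
print(y0)  # [[[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]]
```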

initialize_states(self, encoder_output, input_length)

Parameters:

encoder_output – encoder outputs, shape: [batch, x_steps, eunits]
input_length – shape: [batch]

Returns:

prev_rnn_states – initial states of the RNNs in the decoder, shape: [rnn_layers, 2, batch, dunits]
prev_attn_weight – initial attention weights, shape: [batch, x_steps]
prev_context – initial context, shape: [batch, eunits]
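
To make the three return shapes concrete, the sketch below builds placeholder containers with those dimensions and checks them. The sizes are hypothetical, and the zero values are placeholders only: the actual initial values used by the model are not documented here.

```python
def zeros(*dims):
    """Return a nested list of zeros with the given dimensions."""
    if len(dims) == 1:
        return [0.0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]


def shape(nested):
    """Return the dimensions of a regular, non-empty nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# Hypothetical sizes chosen only for illustration.
batch, x_steps, eunits, dunits, rnn_layers = 2, 5, 4, 3, 2

# Placeholder values; only the shapes follow the documentation above.
prev_rnn_states = zeros(rnn_layers, 2, batch, dunits)
prev_attn_weight = zeros(batch, x_steps)
prev_context = zeros(batch, eunits)

print(shape(prev_rnn_states))   # [2, 2, 2, 3]
print(shape(prev_attn_weight))  # [2, 5]
print(shape(prev_context))      # [2, 4]
```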

time_propagate(self, encoder_output, input_length, prev_y, prev_rnn_states, accum_attn_weight, prev_attn_weight, prev_context, training=False)

Parameters:

encoder_output – encoder output, shape: [batch, x_steps, eunits]
input_length – shape: [batch]
prev_y – one step of true labels or predicted labels, shape: [batch, feat_dim]
prev_rnn_states – previous RNN states, [layers, 2, states] for LSTM
prev_attn_weight – previous attention weights, shape: [batch, x_steps]
prev_context – previous context vector, shape: [batch, attn_dim]
training – whether the model is in training mode

Returns:

out – shape: [batch, feat_dim]
logit – shape: [batch, reduction_factor]
current_rnn_states – shape: [rnn_layers, 2, batch, dunits]
attn_weight – shape: [batch, x_steps]

get_loss(self, outputs, samples, training=None)

Get the loss.

synthesize(self, samples)

Synthesize acoustic features from the input texts.

Parameters:

samples – the data source to be synthesized

Returns:

after_outs – the corresponding synthesized acoustic features
attn_weights_stack – the corresponding attention weights
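
Synthesis in an autoregressive model of this kind is typically a loop that feeds each predicted frame back as the next input until a stop signal fires. The sketch below shows that general pattern only. It is not the library's `synthesize`: `greedy_decode`, `step_fn` (a stand-in for one decoder step such as `time_propagate`), `max_steps`, and `threshold` are all hypothetical names, and attention weights and RNN states are left out.

```python
import math


def greedy_decode(step_fn, first_frame, max_steps=100, threshold=0.5):
    """Run a decoder step repeatedly, feeding each output back as input.

    step_fn: hypothetical callable mapping prev_y -> (out, logit).
    Stops once sigmoid(logit) exceeds `threshold` or after `max_steps`.
    """
    prev_y = first_frame
    outs, logits = [], []
    for _ in range(max_steps):
        out, logit = step_fn(prev_y)
        outs.append(out)
        logits.append(logit)
        # Interpret the logit as a stop probability.
        if 1.0 / (1.0 + math.exp(-logit)) > threshold:
            break
        prev_y = out
    return outs, logits


def toy_step(prev_y):
    """Toy decoder step: add 1 to each feature; stop logit grows over time."""
    out = [value + 1.0 for value in prev_y]
    return out, out[0] - 3.5


outs, logits = greedy_decode(toy_step, [0.0, 0.0])
print(len(outs), outs[-1])  # 4 [4.0, 4.0]
```

The collected outputs and logits play the role of the stacks that a post-processing step would then consume.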

_synthesize_post_net(self, before_outs, logits_stack)
