athena.loss

Loss functions provided by Athena.

Module Contents

Classes

CTCLoss                                 CTC loss
Seq2SeqSparseCategoricalCrossentropy    Seq2SeqSparseCategoricalCrossentropy loss
MPCLoss                                 MPC loss
Tacotron2Loss                           Tacotron2 loss
GuidedAttentionLoss
GuidedMultiHeadAttentionLoss            Guided multi-head attention loss function module
FastSpeechLoss                          Loss used for training FastSpeech
SoftmaxLoss                             Softmax loss
AMSoftmaxLoss                           Additive Margin Softmax loss
AAMSoftmaxLoss                          Additive Angular Margin Softmax loss
ProtoLoss                               Prototypical loss
AngleProtoLoss                          Angular Prototypical loss
GE2ELoss                                Generalized End-to-end loss
class athena.loss.CTCLoss(logits_time_major=False, blank_index=-1, name='CTCLoss')

Bases: tensorflow.keras.losses.Loss

CTC loss, implemented with TensorFlow.

__call__(self, logits, samples, logit_length=None)
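As a rough illustration of the quantity a CTC loss computes, the sketch below runs the CTC forward recursion in plain Python. It is not the TensorFlow code this class relies on; the function name ctc_neg_log_likelihood and its list-based inputs are assumptions made for the example. The blank is placed at the last class index, which is how a blank_index of -1 is usually interpreted.

```python
import math

def ctc_neg_log_likelihood(probs, labels, blank):
    """Negative log-likelihood of `labels` under per-frame class probabilities.

    probs:  T rows, each a list of class probabilities for one frame.
    labels: target label ids, without blanks.
    blank:  index of the blank class.
    """
    # Extended target with blanks around every label: [b, l1, b, l2, ..., b].
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    size = len(ext)

    # alpha[s] = probability of all alignments ending at ext[s] in the current frame.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)

# Two frames, two classes (label 0 and blank 1), uniform probabilities.
# The alignments "00", "0b" and "b0" each have probability 0.25.
probs = [[0.5, 0.5], [0.5, 0.5]]
print(round(ctc_neg_log_likelihood(probs, labels=[0], blank=1), 4))  # 0.2877
```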
class athena.loss.Seq2SeqSparseCategoricalCrossentropy(num_classes, eos=-1, by_token=False, by_sequence=True, from_logits=True, label_smoothing=0.0)

Bases: tensorflow.keras.losses.CategoricalCrossentropy

Seq2SeqSparseCategoricalCrossentropy loss: categorical cross-entropy calculated at each character of each sequence in a batch.

__call__(self, logits, samples, logit_length=None)
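The per-character cross-entropy can be sketched in plain Python as follows. This is an illustration under assumptions, not Athena's code: the helper names are hypothetical, padding is handled through explicit lengths, by_token is taken to mean "average over non-padded tokens", and the default is taken to mean "average the per-sequence sums over the batch". Label smoothing follows the usual Keras rule, target * (1 - label_smoothing) + label_smoothing / num_classes.

```python
import math

def smoothed_token_ce(logits, label, label_smoothing=0.0):
    """Cross-entropy of one token, computed from raw logits."""
    num_classes = len(logits)
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    loss = 0.0
    for k, x in enumerate(logits):
        one_hot = 1.0 if k == label else 0.0
        target = one_hot * (1.0 - label_smoothing) + label_smoothing / num_classes
        loss -= target * (x - log_z)  # x - log_z is the log-probability of class k
    return loss

def seq2seq_ce(batch_logits, batch_labels, lengths, by_token=False, label_smoothing=0.0):
    """Sum token losses over valid positions, then normalize (assumed conventions)."""
    total, tokens = 0.0, 0
    for logits, labels, length in zip(batch_logits, batch_labels, lengths):
        for t in range(length):  # positions >= length are padding and are skipped
            total += smoothed_token_ce(logits[t], labels[t], label_smoothing)
            tokens += 1
    return total / tokens if by_token else total / len(batch_logits)

# Two sequences of two steps over two classes; the second has one padded step.
logits = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
labels = [[0, 1], [1, 0]]
print(seq2seq_ce(logits, labels, lengths=[2, 1], by_token=True))
```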
class athena.loss.MPCLoss(name='MPCLoss')

Bases: tensorflow.keras.losses.Loss

MPC loss: L1 loss over the masked acoustic features in a batch.

__call__(self, logits, samples, logit_length=None)
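A minimal sketch of an L1 loss restricted to masked frames, assuming a per-frame 0/1 mask that marks which frames were masked out and must be reconstructed. The function name, the mask layout, and the absence of any normalization are assumptions for illustration only.

```python
def masked_l1(pred, target, frame_mask):
    """Sum of absolute errors over the frames flagged in frame_mask.

    pred, target: [frames][feat_dim] nested lists.
    frame_mask:   one 0/1 flag per frame; 1 means the frame was masked.
    """
    total = 0.0
    for pred_row, target_row, flag in zip(pred, target, frame_mask):
        if flag:
            total += sum(abs(p - t) for p, t in zip(pred_row, target_row))
    return total

pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[0.0, 0.0], [3.0, 5.0]]
print(masked_l1(pred, target, frame_mask=[1, 0]))  # 3.0
```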
class athena.loss.Tacotron2Loss(model, guided_attn_loss_function, regularization_weight=0.0, l1_loss_weight=0.0, mask_decoder=False, pos_weight=1.0, name='Tacotron2Loss')

Bases: tensorflow.keras.losses.Loss

Tacotron2 Loss

__call__(self, outputs, samples, logit_length=None)
Parameters:
  • outputs – contains the elements below:
      att_ws_stack: shape: [batch, y_steps, x_steps]

class athena.loss.GuidedAttentionLoss(guided_attn_weight, reduction_factor, attn_sigma=0.4, name='GuidedAttentionLoss')

Bases: tensorflow.keras.losses.Loss

__call__(self, att_ws_stack, samples)
_create_attention_masks(self, input_length, output_length)

masks created by attention location

Parameters:
  • input_length – shape: [batch_size]
  • output_length – shape: [batch_size]
Returns:

masks, shape: [batch_size, 1, y_steps, x_steps]
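For orientation, guided attention masks are commonly built with the formula 1 - exp(-(y/T_out - x/T_in)^2 / (2 * sigma^2)), which penalizes attention weights far from the diagonal. The sketch below builds one such matrix for a single utterance in plain Python. Whether this class uses exactly this formula is an assumption; the function name is hypothetical, and batching, padding, and the singleton axis of the documented shape are left out.

```python
import math

def guided_attention_mask(input_length, output_length, sigma=0.4):
    """Return an [output_length][input_length] penalty matrix.

    Cells near the diagonal are close to 0; cells far from it approach 1.
    """
    return [
        [
            1.0 - math.exp(-((y / output_length - x / input_length) ** 2) / (2 * sigma ** 2))
            for x in range(input_length)
        ]
        for y in range(output_length)
    ]

mask = guided_attention_mask(input_length=3, output_length=6)
print(mask[0][0])                 # 0.0 on the diagonal
print(mask[0][2] > mask[0][1])    # True: the penalty grows away from the diagonal
```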

_create_length_masks(self, input_length, output_length)

masks created by input and output length

Parameters:
  • input_length – shape: [batch_size]
  • output_length – shape: [batch_size]
Returns:

masks, shape: [batch_size, 1, output_length, input_length]

Examples

output_length: [6, 8]
input_length: [3, 5]
masks:

[[[1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]],
 [[1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1]]]
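The masks in the example can be reproduced with a short plain-Python sketch. The function name length_masks is hypothetical and this is not Athena's TensorFlow code; like the example, it omits the singleton axis of the documented return shape.

```python
def length_masks(input_length, output_length):
    """Build one [max_output][max_input] 0/1 mask per batch entry.

    A cell is 1 only when both its output step and its input step fall
    inside the true lengths of that entry.
    """
    max_in, max_out = max(input_length), max(output_length)
    return [
        [[1 if (y < olen and x < ilen) else 0 for x in range(max_in)]
         for y in range(max_out)]
        for ilen, olen in zip(input_length, output_length)
    ]

masks = length_masks(input_length=[3, 5], output_length=[6, 8])
for row in masks[0]:
    print(row)
```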
class athena.loss.GuidedMultiHeadAttentionLoss(guided_attn_weight, reduction_factor, attn_sigma=0.4, num_heads=2, num_layers=2, name='GuidedMultiHeadAttentionLoss')

Bases: athena.loss.GuidedAttentionLoss

Guided multi-head attention loss function module.

__call__(self, att_ws_stack, samples)
class athena.loss.FastSpeechLoss(duration_predictor_loss_weight, eps=1.0, use_mask=True, teacher_guide=False)

Bases: tensorflow.keras.losses.Loss

Loss used for training FastSpeech.

__call__(self, outputs, samples)

The duration loss is calculated in the log domain to make it Gaussian.

Parameters:
  • outputs – contains five elements:
      before_outs: outputs before postnet, shape: [batch, y_steps, feat_dim]
      teacher_outs: teacher outputs, shape: [batch, y_steps, feat_dim]
      after_outs: outputs after postnet, shape: [batch, y_steps, feat_dim]
      duration_sequences: duration predictions from teacher model, shape: [batch, x_steps]
      pred_duration_sequences: duration predictions from trained predictor, shape: [batch, x_steps]
  • samples – samples from dataset
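One reading of the log-value remark is that target durations are compared with the predictor output as log(duration + eps). The sketch below follows that reading; the function name, the use of a mean squared error, and the assumption that the predictor already outputs log-durations are all illustrative choices, not confirmed details of this class.

```python
import math

def log_duration_loss(pred_log_durations, durations, eps=1.0):
    """Mean squared error between predicted log-durations and log(duration + eps)."""
    targets = [math.log(d + eps) for d in durations]
    squared = [(p - t) ** 2 for p, t in zip(pred_log_durations, targets)]
    return sum(squared) / len(squared)

# With eps=1.0 a zero-length duration maps to log(1) = 0 instead of -infinity.
print(log_duration_loss([0.0, math.log(2.0)], durations=[0, 1]))  # 0.0
```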
class athena.loss.SoftmaxLoss(embedding_size, num_classes, name='SoftmaxLoss')

Bases: tensorflow.keras.losses.Loss

Softmax loss. Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(self, outputs, samples, logit_length=None)
class athena.loss.AMSoftmaxLoss(embedding_size, num_classes, m=0.3, s=15, name='AMSoftmaxLoss')

Bases: tensorflow.keras.losses.Loss

Additive Margin Softmax loss. See the papers “CosFace: Large Margin Cosine Loss for Deep Face Recognition” and “In defence of metric learning for speaker recognition”.

Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(self, outputs, samples, logit_length=None)
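The additive margin can be sketched for a single embedding as follows: cosine similarities to normalized class weights are scaled by s, and the margin m is subtracted from the target class only before the softmax cross-entropy. The function names and list-based inputs are assumptions for illustration; the defaults mirror the constructor signature above.

```python
import math

def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def am_softmax_loss(embedding, class_weights, label, m=0.3, s=15):
    """Cross-entropy over s * (cos - m) for the target and s * cos otherwise."""
    emb = _normalize(embedding)
    logits = []
    for k, weight in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(emb, _normalize(weight)))
        logits.append(s * (cos - m if k == label else cos))
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[label]

weights = [[1.0, 0.0], [0.0, 1.0]]
with_margin = am_softmax_loss([1.0, 0.0], weights, label=0)
without_margin = am_softmax_loss([1.0, 0.0], weights, label=0, m=0.0)
print(with_margin > without_margin)  # True: the margin makes the task harder
```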
class athena.loss.AAMSoftmaxLoss(embedding_size, num_classes, m=0.3, s=15, easy_margin=False, name='AAMSoftmaxLoss')

Bases: tensorflow.keras.losses.Loss

Additive Angular Margin Softmax loss. See the papers “ArcFace: Additive Angular Margin Loss for Deep Face Recognition” and “In defence of metric learning for speaker recognition”.

Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(self, outputs, samples, logit_length=None)
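The angular margin changes only the target-class logit, replacing s * cos(theta) with s * cos(theta + m). A minimal sketch of that step is given below. The function name is hypothetical, and the fallback branches (including the reading of easy_margin) follow a common ArcFace-style convention that is assumed here rather than taken from this class.

```python
import math

def aam_target_logit(cos_theta, m=0.3, s=15, easy_margin=False):
    """Scaled target logit s * cos(theta + m), with a fallback for large angles."""
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta ** 2))
    # cos(theta + m) expanded with the angle-addition formula.
    phi = cos_theta * math.cos(m) - sin_theta * math.sin(m)
    if easy_margin:
        # Apply the margin only when the angle is below 90 degrees.
        phi = phi if cos_theta > 0 else cos_theta
    else:
        # Beyond theta = pi - m, cos(theta + m) stops decreasing, so switch
        # to a linear penalty that keeps the logit monotonic in theta.
        threshold = math.cos(math.pi - m)
        phi = phi if cos_theta > threshold else cos_theta - math.sin(math.pi - m) * m
    return s * phi

print(aam_target_logit(1.0) < 15.0)  # True: even a perfect match is penalized
```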
class athena.loss.ProtoLoss(name='ProtoLoss')

Bases: tensorflow.keras.losses.Loss

Prototypical loss. See the papers “Prototypical Networks for Few-shot Learning” and “In defence of metric learning for speaker recognition”.

Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(self, outputs, samples=None, logit_length=None)
Parameters:
  • outputs – shape: [batch_size, num_speaker_utts, embedding_size]
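A prototypical loss over an input of this shape can be sketched in plain Python. The sketch assumes that the first utterance of each speaker is the query, that the remaining utterances are averaged into that speaker's prototype, and that logits are negative squared Euclidean distances; these choices and the function name are illustrative assumptions. At least two utterances per speaker are required.

```python
import math

def proto_loss(outputs):
    """outputs: [num_speakers][num_utts][dim] nested lists of embeddings."""
    queries = [spk[0] for spk in outputs]
    # Prototype = mean of the non-query utterances of each speaker.
    protos = [[sum(col) / len(col) for col in zip(*spk[1:])] for spk in outputs]
    total = 0.0
    for i, query in enumerate(queries):
        logits = [-sum((a - b) ** 2 for a, b in zip(query, proto)) for proto in protos]
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]  # cross-entropy with the query's own speaker
    return total / len(queries)

separated = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[10.0, 10.0], [10.0, 10.0], [10.0, 10.0]],
]
print(proto_loss(separated))  # 0.0
```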
class athena.loss.AngleProtoLoss(init_w=10.0, init_b=-5.0, name='AngleProtoLoss')

Bases: tensorflow.keras.losses.Loss

Angular Prototypical loss. See the paper “In defence of metric learning for speaker recognition”.

Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(self, outputs, samples=None, logit_length=None)
Parameters:
  • outputs – shape: [batch_size, num_speaker_utts, embedding_size]
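The angular variant can be sketched by swapping the distance for a cosine similarity with an affine transform w * cos + b, where w and b start at init_w and init_b. The sketch treats them as fixed numbers, although they would normally be learned; the query and prototype split and the function names are assumptions for illustration.

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def angle_proto_loss(outputs, w=10.0, b=-5.0):
    """outputs: [num_speakers][num_utts][dim]; utterance 0 is the query."""
    queries = [spk[0] for spk in outputs]
    protos = [[sum(col) / len(col) for col in zip(*spk[1:])] for spk in outputs]
    total = 0.0
    for i, query in enumerate(queries):
        logits = [w * _cosine(query, proto) + b for proto in protos]
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)

orthogonal = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
print(angle_proto_loss(orthogonal))
```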
class athena.loss.GE2ELoss(init_w=10.0, init_b=-5.0, name='GE2ELoss')

Bases: tensorflow.keras.losses.Loss

Generalized End-to-end loss. See the papers “Generalized End-to-end Loss for Speaker Verification” and “In defence of metric learning for speaker recognition”.

Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(self, outputs, samples=None, logit_length=None)
Parameters:
  • outputs – shape: [batch_size, num_speaker_utts, embedding_size]
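A softmax-style GE2E loss can be sketched in plain Python: every utterance is scored against all speaker centroids with w * cos + b, and the centroid of its own speaker is computed without the utterance itself. The function names, the fixed w and b, and the averaging over all utterances are assumptions for illustration. At least two utterances per speaker are required.

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ge2e_loss(outputs, w=10.0, b=-5.0):
    """outputs: [num_speakers][num_utts][dim] nested lists of embeddings."""
    num_spk, num_utt = len(outputs), len(outputs[0])
    centroids = [[sum(col) / num_utt for col in zip(*spk)] for spk in outputs]
    total = 0.0
    for j, spk in enumerate(outputs):
        for i, emb in enumerate(spk):
            # Own-speaker centroid excludes the utterance being scored.
            others = [u for k, u in enumerate(spk) if k != i]
            own = [sum(col) / (num_utt - 1) for col in zip(*others)]
            logits = [
                w * _cosine(emb, own if k == j else centroids[k]) + b
                for k in range(num_spk)
            ]
            peak = max(logits)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
            total += log_z - logits[j]
    return total / (num_spk * num_utt)

orthogonal = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
print(ge2e_loss(orthogonal))
```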