athena.utils.learning_rate
learning rate
Module Contents
Classes
WarmUpLearningSchedule | WarmUp learning rate schedule for Adam |
WarmUpAdam | WarmUpAdam implementation |
ExponentialDecayLearningRateSchedule | Exponential decay learning rate schedule |
ExponentialDecayAdam | ExponentialDecayAdam implementation |
class athena.utils.learning_rate.WarmUpLearningSchedule(model_dim=512, warmup_steps=4000, k=1.0, decay_steps=99999999, decay_rate=1.0)
Bases: tensorflow.keras.optimizers.schedules.LearningRateSchedule
WarmUp learning rate schedule for Adam
Used as:
    optimizer = tf.keras.optimizers.Adam(learning_rate=WarmUpLearningSchedule(512),
                                         beta_1=0.9, beta_2=0.98, epsilon=1e-9)
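Args:
    model_dim: the model's hidden dimension (d_model in the paper); the learning rate scales with model_dim ** -0.5
    warmup_steps: the number of steps over which the learning rate ramps up; the highest learning rate is reached at this step
Returns: the learning rate for the given step
Idea from the paper: Attention Is All You Need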
__call__(self, step)
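The schedule from Attention Is All You Need can be sketched in plain Python as below. This is a minimal illustration of the paper's formula, not necessarily what `__call__` computes in Athena: the function name `transformer_warmup_lr` is hypothetical, `k` is assumed to be a plain multiplicative scale, and the `decay_steps` and `decay_rate` arguments from the class signature are left out because their effect is not documented here.

```python
def transformer_warmup_lr(step, model_dim=512, warmup_steps=4000, k=1.0):
    """Paper formula: k * model_dim^-0.5 * min(step^-0.5, step * warmup_steps^-1.5).

    Hypothetical helper for illustration; `k` as a simple scale is an assumption.
    """
    # Clamp to 1 so that step 0 does not raise a ZeroDivisionError.
    step = max(float(step), 1.0)
    # Linear ramp while step < warmup_steps, inverse square root decay afterwards.
    ramp = step * warmup_steps ** -1.5
    decay = step ** -0.5
    return k * model_dim ** -0.5 * min(ramp, decay)


# The two branches meet at step == warmup_steps, which is where the rate peaks.
peak = transformer_warmup_lr(4000)
assert transformer_warmup_lr(1000) < peak
assert transformer_warmup_lr(16000) < peak
```

Under these assumptions, a larger `model_dim` lowers the whole curve, and a larger `warmup_steps` both delays and lowers the peak.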
class athena.utils.learning_rate.WarmUpAdam(config=None, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, name='WarmUpAdam', **kwargs)
Bases: tensorflow.keras.optimizers.Adam
WarmUpAdam implementation
default_config
class athena.utils.learning_rate.ExponentialDecayLearningRateSchedule(initial_lr=0.005, decay_steps=10000, decay_rate=0.5, start_decay_steps=30000, final_lr=1e-05)
Bases: tensorflow.keras.optimizers.schedules.LearningRateSchedule
Exponential decay learning rate schedule
Used as:
    optimizer = tf.keras.optimizers.Adam(learning_rate=ExponentialDecayLearningRateSchedule(0.01, 100))
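Args:
    initial_lr: the starting learning rate
    decay_steps: the number of steps between successive decays
Returns: initial_lr * (0.5 ** (step // decay_steps)), where 0.5 is the default decay_rate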
__call__(self, step)
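The documented return value, initial_lr * (0.5 ** (step // decay_steps)), is a staircase decay and can be sketched in plain Python as follows. The function name `staircase_decay_lr` is hypothetical. The sketch generalizes the hard-coded 0.5 to the `decay_rate` argument, and it deliberately ignores `start_decay_steps` and `final_lr` from the class signature, since the documented formula does not say how they are applied.

```python
def staircase_decay_lr(step, initial_lr=0.005, decay_steps=10000, decay_rate=0.5):
    """Documented formula: initial_lr * decay_rate ** (step // decay_steps).

    Hypothetical helper for illustration; start_decay_steps and final_lr
    from the class signature are intentionally not modeled.
    """
    # Integer division keeps the rate constant within each decay_steps window.
    num_decays = step // decay_steps
    return initial_lr * decay_rate ** num_decays


# The rate is flat inside a window and is multiplied by decay_rate at each boundary.
assert staircase_decay_lr(0) == staircase_decay_lr(9999)
assert staircase_decay_lr(10000) < staircase_decay_lr(9999)
```

With the arguments from the usage line above (initial_lr=0.01, decay_steps=100), this sketch would halve the rate every 100 steps.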