Optimizers


Usage of optimizers

An optimizer is one of the two arguments required for compiling a model.

Scala:

model.compile(loss = "mean_squared_error", optimizer = "sgd")

Python:

model.compile(loss='mean_squared_error', optimizer='sgd')

Scala:

model.compile(loss = "mean_squared_error", optimizer = Adam())

Python:

model.compile(loss='mean_squared_error', optimizer=Adam())
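The string identifier creates the optimizer with its default configuration; pass an optimizer instance when you want to override hyperparameters. A minimal Scala sketch (reusing the model from the examples above, assuming the Adam class is already imported as in the constructor snippets below; the learning rate value is only illustrative):

// Sketch: configure the optimizer explicitly instead of using a string.
// Assumes `model` and the Adam import from the surrounding examples.
val adam = new Adam(learningRate = 5e-4)   // override the default 1e-3
model.compile(loss = "mean_squared_error", optimizer = adam)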

Available optimizers

SGD

A plain implementation of SGD that provides an optimize method. Once this optimization method is set when the Optimizer is created, the Optimizer calls it at the end of each iteration.
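
To make the update concrete, here is a minimal, self-contained Scala sketch of SGD with heavy-ball momentum on a single parameter (illustrative only, not the library implementation):

// Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
var x = 0.0
var velocity = 0.0
val lr = 0.1
val momentum = 0.9
for (_ <- 1 to 100) {
  val grad = 2.0 * (x - 3.0)
  velocity = momentum * velocity + grad     // heavy-ball momentum
  x -= lr * velocity                        // parameter update
}
println(f"x after 100 SGD steps: $x%.4f")   // moves toward the minimum at 3.0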

Scala:

val optimMethod = SGD(learningRate = 1e-3, learningRateDecay = 0.0, 
                      weightDecay = 0.0, momentum = 0.0, dampening = Double.MaxValue, 
                      nesterov = false, learningRateSchedule = Default(), 
                      learningRates = null, weightDecays = null)

Parameters:

learningRate : learning rate
learningRateDecay : learning rate decay
weightDecay : weight decay (L2 penalty)
momentum : momentum
dampening : dampening for momentum
nesterov : enables Nesterov momentum
learningRateSchedule : learning rate scheduling method, e.g. Default()
learningRates : individual learning rates for each parameter (optional)
weightDecays : individual weight decays for each parameter (optional)

Python:

optim_method = SGD(learningrate=1e-3, learningrate_decay=0.0, weightdecay=0.0, 
                   momentum=0.0, dampening=DOUBLEMAX, nesterov=False, 
                   leaningrate_schedule=None, learningrates=None, 
                   weightdecays=None)

Parameters:

Same as the Scala constructor above, using the keyword names shown in the Python example.

Adam

An implementation of Adam, a first-order gradient-based optimization method for stochastic objective functions: http://arxiv.org/pdf/1412.6980.pdf
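
Adam keeps exponentially decaying averages of past gradients and past squared gradients and uses their bias-corrected values to scale each step. A minimal, self-contained Scala sketch of the scalar update (illustrative only, not the library implementation):

// Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
var x = 0.0
var m = 0.0                                  // first moment estimate
var v = 0.0                                  // second moment estimate
val (lr, beta1, beta2, eps) = (0.1, 0.9, 0.999, 1e-8)
for (t <- 1 to 200) {
  val g = 2.0 * (x - 3.0)
  m = beta1 * m + (1 - beta1) * g
  v = beta2 * v + (1 - beta2) * g * g
  val mHat = m / (1 - math.pow(beta1, t))    // bias correction
  val vHat = v / (1 - math.pow(beta2, t))
  x -= lr * mHat / (math.sqrt(vHat) + eps)
}
println(f"x after 200 Adam steps: $x%.4f")   // ends up close to the minimum at 3.0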

Scala:

val optimMethod = new Adam(learningRate = 1e-3, learningRateDecay = 0.0, beta1 = 0.9, beta2 = 0.999, Epsilon = 1e-8)

Parameters:

learningRate : learning rate
learningRateDecay : learning rate decay
beta1 : exponential decay rate of the first moment estimates
beta2 : exponential decay rate of the second moment estimates
Epsilon : small constant for numerical stability

Python:

optim_method = Adam(learningrate=1e-3, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-8)

Parameters:

Same as the Scala constructor above, using the keyword names shown in the Python example.

Adamax

An implementation of Adamax, a variant of Adam based on the infinity norm: http://arxiv.org/pdf/1412.6980.pdf
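
Adamax replaces Adam's second-moment estimate with an exponentially weighted infinity norm of the gradients. A minimal, self-contained Scala sketch of the scalar update (illustrative only, not the library implementation):

// Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
var x = 0.0
var m = 0.0                                    // first moment estimate
var u = 0.0                                    // exponentially weighted infinity norm
val (lr, beta1, beta2) = (0.1, 0.9, 0.999)
for (t <- 1 to 200) {
  val g = 2.0 * (x - 3.0)
  m = beta1 * m + (1 - beta1) * g
  u = math.max(beta2 * u, math.abs(g))
  x -= (lr / (1 - math.pow(beta1, t))) * m / u
}
println(f"x after 200 Adamax steps: $x%.4f")   // moves close to the minimum at 3.0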

Scala:

val optimMethod = new Adamax(learningRate = 0.002, beta1 = 0.9, beta2 = 0.999, Epsilon = 1e-8)

Parameters:

learningRate : learning rate
beta1 : exponential decay rate of the first moment estimates
beta2 : exponential decay rate used for the infinity-norm estimate
Epsilon : small constant for numerical stability

Python:

optim_method = Adamax(learningrate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-8)

Parameters:

Same as the Scala constructor above, using the keyword names shown in the Python example.

Adadelta

An AdaDelta implementation for SGD, proposed in "ADADELTA: An Adaptive Learning Rate Method": http://arxiv.org/abs/1212.5701
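
Adadelta scales each update by the ratio of running averages of past squared updates and past squared gradients, so no global learning rate is required. A minimal, self-contained Scala sketch of the scalar update (illustrative only, not the library implementation):

// Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
var x = 0.0
var accumGrad = 0.0            // running average of squared gradients
var accumDelta = 0.0           // running average of squared updates
val (rho, eps) = (0.9, 1e-6)
for (_ <- 1 to 1000) {
  val g = 2.0 * (x - 3.0)
  accumGrad = rho * accumGrad + (1 - rho) * g * g
  val delta = -math.sqrt(accumDelta + eps) / math.sqrt(accumGrad + eps) * g
  accumDelta = rho * accumDelta + (1 - rho) * delta * delta
  x += delta
}
println(f"x after 1000 Adadelta steps: $x%.4f")   // moves toward the minimum at 3.0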

Scala:

val optimMethod = Adadelta(decayRate = 0.9, Epsilon = 1e-10)

Parameters:

decayRate : decay rate (rho) of the running averages of squared gradients and squared updates
Epsilon : small constant for numerical stability

Python:

optim_method = AdaDelta(decayrate=0.9, epsilon=1e-10)

Parameters:

Same as the Scala constructor above, using the keyword names shown in the Python example.

Adagrad

An implementation of Adagrad. See the original paper: http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
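
Adagrad adapts the step size for each parameter by dividing by the square root of the sum of all past squared gradients. A minimal, self-contained Scala sketch of the scalar update (illustrative only, not the library implementation):

// Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
var x = 0.0
var sumSq = 0.0                // accumulated squared gradients
val (lr, eps) = (1.0, 1e-8)
for (_ <- 1 to 200) {
  val g = 2.0 * (x - 3.0)
  sumSq += g * g
  x -= lr * g / (math.sqrt(sumSq) + eps)
}
println(f"x after 200 Adagrad steps: $x%.4f")   // converges toward the minimum at 3.0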

Scala:

val optimMethod = new Adagrad(learningRate = 1e-3, learningRateDecay = 0.0, weightDecay = 0.0)

Python:

optim_method = Adagrad(learningrate=1e-3, learningrate_decay=0.0, weightdecay=0.0)

Parameters:

learningrate : learning rate
learningrate_decay : learning rate decay
weightdecay : weight decay (L2 penalty)

RMSprop

An implementation of RMSprop (Reference: http://arxiv.org/pdf/1308.0850v5.pdf, Sec 4.2)
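
RMSprop divides the gradient by a moving average of its recent magnitude. A minimal, self-contained Scala sketch of the scalar update (illustrative only, not the library implementation):

// Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
var x = 0.0
var meanSq = 0.0               // moving average of squared gradients
val (lr, decay, eps) = (0.01, 0.99, 1e-8)
for (_ <- 1 to 500) {
  val g = 2.0 * (x - 3.0)
  meanSq = decay * meanSq + (1 - decay) * g * g
  x -= lr * g / (math.sqrt(meanSq) + eps)
}
println(f"x after 500 RMSprop steps: $x%.4f")   // ends up near the minimum at 3.0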

Scala:

val optimMethod = new RMSprop(learningRate = 0.002, learningRateDecay = 0.0, decayRate = 0.99, Epsilon = 1e-8)

Parameters:

learningRate : learning rate
learningRateDecay : learning rate decay
decayRate : decay rate of the moving average of squared gradients
Epsilon : small constant for numerical stability

Python:

optim_method = RMSprop(learningrate=0.002, learningrate_decay=0.0, decayrate=0.99, epsilon=1e-8)

Parameters:

Same as the Scala constructor above, using the keyword names shown in the Python example.