alpha:adam: the learning rate for the optimiser
beta1:adam: the exponential moving average weight for the first moment estimate
beta2:adam: the exponential moving average weight for the second moment estimate
eps:adam: a small constant added to the denominator to avoid division by zero
init: how to initialise the weights at the first iteration
L2: how much L2 regularisation to apply to the weights (0 means none)
gradclip: the threshold for gradient norm clipping; gradients with a larger norm are rescaled to it
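
To make the role of each setting concrete, here is a minimal sketch of a single Adam update step in Python/NumPy. The function and variable names are illustrative (they are not the tool's actual implementation), but they mirror the parameters listed above: L2 regularisation is assumed to be applied by adding L2 * w to the gradient, and gradient norm clipping is assumed to rescale the gradient before the moment updates.

    import numpy as np

    def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, L2=0.0, gradclip=0.0):
        """One Adam update step (illustrative sketch, not the tool's code)."""
        # L2 regularisation: penalise large weights via the gradient
        if L2 > 0.0:
            grad = grad + L2 * w
        # Gradient norm clipping: rescale if the norm exceeds the threshold
        if gradclip > 0.0:
            norm = np.linalg.norm(grad)
            if norm > gradclip:
                grad = grad * (gradclip / norm)
        # Exponential moving averages of the first and second moments
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # Bias correction for the zero-initialised moment estimates
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # Parameter update; eps keeps the denominator away from zero
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    # Usage: zero initialisation stands in for the init setting here
    w = np.zeros(4)
    m, v = np.zeros(4), np.zeros(4)
    grad = np.array([0.5, -1.2, 0.3, 2.0])
    w, m, v = adam_step(w, grad, m, v, t=1, L2=0.01, gradclip=1.0)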
