Exponential penalty on sequence length, used with beam-based generation.
The sequence length is raised to the power length_penalty, and the sequence
score is divided by the result. Since the score is the log likelihood of the
sequence (i.e. negative), dividing by a larger value moves it closer to zero:
length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0
encourages shorter sequences.
length_penalty: float | None = None
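
The effect of the sign can be checked with a minimal sketch of the scoring rule described above. The function name length_normalized_score and the two example hypotheses are hypothetical and chosen only for illustration; a real beam search implementation may measure length differently (for example, counting only generated tokens).

```python
def length_normalized_score(sum_logprobs: float, length: int,
                            length_penalty: float) -> float:
    """Divide a summed log likelihood by length ** length_penalty.

    Hypothetical helper for illustration, not part of any library API.
    """
    return sum_logprobs / (length ** length_penalty)


# Two made-up beam hypotheses: (summed log likelihood, length in tokens).
short_hyp = (-4.0, 4)
long_hyp = (-6.0, 8)


def preferred(length_penalty: float) -> str:
    """Return which hypothesis scores higher under the given penalty."""
    short_score = length_normalized_score(*short_hyp, length_penalty)
    long_score = length_normalized_score(*long_hyp, length_penalty)
    return "long" if long_score > short_score else "short"


for penalty in (-1.0, 0.0, 1.0):
    print(penalty, preferred(penalty))
```

With length_penalty = 0.0 the divisor is 1, so the raw log likelihoods are compared and the shorter hypothesis wins. With length_penalty = 1.0 the scores become per-token averages (-1.0 versus -0.75), which favors the longer hypothesis. A negative value multiplies each score by a power of its length, which penalizes the longer hypothesis further.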