Causal RNNs in NLP: no looking into the future
This type of RNN makes sense when forecasting time series,
or in the decoder of a sequence-to-sequence (seq2seq) model. But for
tasks like text classification, or in the encoder of a seq2seq model,
it is often preferable to look ahead at the next words before encoding a
given word.
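The distinction can be sketched with a toy numpy RNN (hypothetical helper names, not from the original): a causal pass, where the output at step t depends only on inputs up to t, versus a bidirectional pass that also runs over the reversed sequence, so every output can "see" future words. One standard way to get this lookahead in an encoder is exactly such a bidirectional RNN.

```python
import numpy as np

def causal_rnn(inputs, Wx, Wh):
    """Simple tanh RNN: the output at step t depends only on inputs[:t+1]."""
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x in inputs:
        h = np.tanh(Wx @ x + Wh @ h)
        outputs.append(h)
    return np.stack(outputs)

def bidirectional_rnn(inputs, Wx_f, Wh_f, Wx_b, Wh_b):
    """Run one RNN forward and another backward, concatenating the states,
    so each output can see both past and future inputs."""
    fwd = causal_rnn(inputs, Wx_f, Wh_f)
    bwd = causal_rnn(inputs[::-1], Wx_b, Wh_b)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(0)
Wx_f, Wh_f = rng.normal(size=(4, 3)) * 0.5, rng.normal(size=(4, 4)) * 0.5
Wx_b, Wh_b = rng.normal(size=(4, 3)) * 0.5, rng.normal(size=(4, 4)) * 0.5
seq = rng.normal(size=(5, 3))

out = causal_rnn(seq, Wx_f, Wh_f)          # shape (5, 4)
bi = bidirectional_rnn(seq, Wx_f, Wh_f, Wx_b, Wh_b)  # shape (5, 8)

# Perturb only the LAST time step:
seq2 = seq.copy()
seq2[-1] += 1.0
out2 = causal_rnn(seq2, Wx_f, Wh_f)
bi2 = bidirectional_rnn(seq2, Wx_f, Wh_f, Wx_b, Wh_b)

# Causal outputs before the perturbed step are unchanged...
assert np.allclose(out[:-1], out2[:-1])
# ...while even the FIRST bidirectional output changes, because its
# backward pass has already seen the perturbed future input.
assert not np.allclose(bi[0], bi2[0])
```

In practice this is what deep-learning frameworks' bidirectional wrappers do for encoder layers, whereas decoders and time-series forecasters must stay causal.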