A pipeline is a set of tasks executed in sequence, where the output
of one task becomes the input of the next, until the final product
is produced at the end. The nice thing about pipelines is that they
streamline our data preparation.
The purpose of a pipeline is to chain multiple steps together, where
each step in the pipeline typically applies a specific transformation
to the data.
By using a pipeline, we can ensure that the same preprocessing steps are
applied to both the training and test sets in the same way, avoiding data
leakage or inconsistencies.
Once we have defined our pipeline, we can fit it to our training data and
use it to make predictions on new, unseen data.
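The workflow above can be sketched with scikit-learn's `Pipeline`. This is a minimal example under assumed choices: the Iris dataset, a `StandardScaler` preprocessing step, and a `LogisticRegression` estimator are illustrative, not prescribed by the text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Illustrative dataset; any feature matrix X and target y would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Each named step applies a transformation; the last step is the estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),            # preprocessing step
    ("model", LogisticRegression(max_iter=1000)),
])

# fit() fits the scaler on the training data only, then the model on the
# scaled output; predict() reuses that same fitted scaler on the test set,
# so train and test data are preprocessed identically and leakage is avoided.
pipe.fit(X_train, y_train)
preds = pipe.predict(X_test)
print(pipe.score(X_test, y_test))
```

Because the scaler's statistics are learned inside `fit()` from the training split alone, there is no need to transform the test set by hand, which is exactly the consistency the text describes.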