Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation.
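To make the splitting concrete, here is a small example using scikit-learn's KFold on ten toy samples (the data is invented purely to show the index groups):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten toy samples
for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X)):
    print(f"fold {fold}: train={train_idx} test={test_idx}")
# fold 0: train=[2 3 4 5 6 7 8 9] test=[0 1]
# fold 1: train=[0 1 4 5 6 7 8 9] test=[2 3]
# ... and so on, so every sample is held out for testing exactly once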
A model never matches the real world perfectly. On top of that, each run produces a slightly different generalization error: no two fitted models are identical or give exactly the same result. We can counter this by averaging over several splits instead of trusting a single one.
The idea is:
1) divide the whole dataset into n equal subsets,
2) train on (n-1) subsets and test on the left-over subset,
3) repeat step 2 n times, using a different subset for testing each time,
4) at the end, average the n held-out estimates.
The averaged score is a more stable estimate of generalization error than any single train/test split gives.
This is n-fold (or k-fold) cross-validation.
It is similar in spirit to bootstrapping, except that the folds are disjoint rather than drawn with replacement. A hand-rolled sketch of the loop follows below.
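Here is a minimal hand-rolled sketch of those four steps, using scikit-learn's LinearRegression as the model and synthetic data as a stand-in (the dataset, fold count, and model are assumptions for illustration):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic data purely for illustration; any (X, y) pair works
X, y = make_regression(n_samples=120, n_features=5, noise=10, random_state=0)
n = 6  # number of folds

# 1) split the sample indices into n equal subsets (after shuffling)
folds = np.array_split(np.random.default_rng(0).permutation(len(X)), n)

scores = []
for i in range(n):
    # 2) hold out fold i for testing, train on the other n-1 folds
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(n) if j != i])
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    # 3) score on the held-out fold (R^2 for a regressor)
    scores.append(model.score(X[test_idx], y[test_idx]))

# 4) the average of the n held-out scores is the cross-validation estimate
print(np.mean(scores))

In practice you rarely write this loop by hand; scikit-learn's KFold and cross_val_score do the same thing in a few lines: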
# Import the necessary modules
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# X, y: feature matrix and targets (e.g. the synthetic data from the sketch above)

# Create a KFold object
kf = KFold(n_splits=6, shuffle=True, random_state=5)
reg = LinearRegression()

# Compute 6-fold cross-validation scores
cv_scores = cross_val_score(reg, X, y, cv=kf)

# Print scores
print(cv_scores)
# Print the mean
print(np.mean(cv_scores))
# Print the standard deviation
print(np.std(cv_scores))
# Print an approximate 95% interval from the fold-score quantiles
print(np.quantile(cv_scores, [0.025, 0.975]))
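By default cross_val_score asks the estimator to score itself, which for LinearRegression means R². Any other metric can be requested through the scoring parameter; for example (metric chosen just for illustration):

# scikit-learn negates error metrics so that greater is always better
mse_scores = cross_val_score(reg, X, y, cv=kf, scoring="neg_mean_squared_error")
print(-np.mean(mse_scores))  # mean squared error across the 6 folds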
The same one-liner works for classification; for example, a linear SVM on the iris dataset:

>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import cross_val_score
>>> iris = datasets.load_iris()
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
>>> scores
array([0.96, 1.  , 0.96, 0.96, 1.  ])
>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.03)
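Note that when the estimator is a classifier and cv is an integer, cross_val_score stratifies the folds (via StratifiedKFold), keeping the class proportions roughly constant in each fold. To control the splitting yourself, pass a CV splitter object instead of an integer:

>>> from sklearn.model_selection import StratifiedKFold
>>> skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=skf)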