Choosing the Best Machine Learning Algorithm: A Performance-Based Guide
Roll Out the Red Carpet for Machine Learning Models
Step right up, folks!
In this data-driven circus, we’ve got a thrilling lineup of Machine Learning models waiting to show off their impressive skills. Yes, you heard it right — from logistic regression to random forest, from support vector machines to K-nearest neighbors, Gaussian Naive Bayes, and even the decision tree classifier, we’ve got them all!
So buckle up and keep your popcorn ready; we’re about to witness a high-octane race for accuracy!
Imagine this as a scene from “The Fast and the Furious,” but instead of cars, our contenders are algorithms.
Our star-studded cast of models is as follows:
models = [LogisticRegression(), RandomForestClassifier(), SVC(), KNeighborsClassifier(), GaussianNB(), DecisionTreeClassifier()]
These heavy hitters of the machine learning world are eager to prove their mettle on our training dataset, X_train.
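A quick pit stop before the green flag: the post jumps straight to the starting line, so here is a minimal setup sketch. It assumes the Seaborn Titanic dataset mentioned later on; the choice of features, the encoding, and the exact train/test split are my own assumptions and may differ from the original run.

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Titanic dataset; the feature list below is an assumption
titanic = sns.load_dataset("titanic")
df = titanic[["survived", "pclass", "sex", "age", "fare"]].dropna()
df["sex"] = (df["sex"] == "male").astype(int)  # encode sex as 0/1

X = df[["pclass", "sex", "age", "fare"]]
y = df["survived"]

# Hold out a test set for the big race (split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)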
In the immortal words of Dominic Toretto, "I live my life a quarter mile at a time." Our models echo his sentiment as they gear up for their moment in the spotlight:
# Train the models
for model in models:
    model.fit(X_train, y_train)
Just like how Dominic’s crew fine-tunes their cars for the big race, our models fine-tune themselves based on the training data. But the real test comes when they’re pushed to their limits with unseen data — the test set, X_test.
# Score each model on the held-out test set
scores = []
for model in models:
    score = accuracy_score(y_test, model.predict(X_test))
    scores.append(score)
This part of the code is where the rubber meets the road. Each model makes predictions on the test set, and these predictions are compared with the actual values, y_test.
The accuracy score is this race’s scoreboard: the higher the score, the better the model’s performance.
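In plain terms, accuracy is just the fraction of predictions that match reality. A tiny illustration, not from the original post, to make the arithmetic concrete:

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(accuracy_score(y_true, y_pred))  # 3 correct out of 4 -> 0.75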
However, as the movie “Highlander” reminds us, “There can be only one!” We need to find the top performer. But before that, we line them all up for the big reveal:
# Create a table
table = pd.DataFrame({
    "Model": [model.__class__.__name__ for model in models],
    "Accuracy": scores
})

# Print the table
print(table.to_string())
Think of it as the nominations announcement at the Oscars. Each model, with its corresponding accuracy score, is presented in a neat DataFrame for everyone to see. But the big question is — who will take home the trophy?
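If you’d rather see the nominees in podium order, a one-liner (my addition, not part of the original code) sorts the leaderboard:

# Sort the leaderboard from best to worst
print(table.sort_values("Accuracy", ascending=False).to_string(index=False))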
# Print the best model
best_model = models[np.argmax(scores)]
print("\n The best model is:", best_model.__class__.__name__, "with an accuracy of", scores[np.argmax(scores)])
As I used the Titanic dataset from Seaborn, my output was:
                    Model  Accuracy
0      LogisticRegression  0.802239
1  RandomForestClassifier  0.839552
2                     SVC  0.679104
3    KNeighborsClassifier  0.720149
4              GaussianNB  0.809701
5  DecisionTreeClassifier  0.772388
The best model is: RandomForestClassifier with an accuracy of 0.8395522388059702
And the Oscar goes to… RandomForestClassifier, the model with the highest accuracy score!
It’s just like that nerve-wracking moment when the envelope is opened, and the winner’s name is announced. Our code does exactly that by identifying the model with the maximum accuracy score using np.argmax(scores). The winner basks in the glory of its triumph, ready to be deployed to tackle real-world problems!
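How does a champion actually leave the track and enter production? One common route, sketched here as an assumption rather than as part of the original post, is to persist the fitted model with joblib and load it back wherever predictions are needed (the file name best_model.joblib is hypothetical):

import joblib

# Save the champion to disk
joblib.dump(best_model, "best_model.joblib")

# Later, in production: reload and predict on fresh passengers
champion = joblib.load("best_model.joblib")
print(champion.predict(X_test[:5]))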
Remember, every race, every dataset, every prediction can have a different winner. So keep fine-tuning those engines, keep pushing those boundaries, and keep making those predictions.
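One way to make the podium less dependent on a single lucky split, a technique the original post doesn’t use but which fits the spirit of re-running the race, is k-fold cross-validation: each model races five times on different slices of the training data, and we average the results:

from sklearn.model_selection import cross_val_score

# Five races per model instead of one
for model in models:
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f"{model.__class__.__name__}: mean accuracy {cv_scores.mean():.3f}")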
After all, in the grand scheme of machine learning, we’re all winners in the race for accuracy.