I recently learned about a new feature in Azure Machine Learning: Automated Machine Learning (AutoML). According to Microsoft’s documentation:
“Traditional machine learning model development is resource-intensive, requiring significant domain knowledge and time to produce and compare dozens of models. Apply automated ML when you want Azure Machine Learning to train and tune a model for you using the target metric you specify. The service then iterates through ML algorithms paired with feature selections, where each iteration produces a model with a training score. The higher the score, the better the model is considered to "fit" your data.”
Sounds good! I don’t have much data science experience, so this kind of automated experimentation is made for people like me. It saves me from time-consuming manual experimentation, which I could hardly do anyway because I don’t even know most of the algorithms.
I have created two machine learning models to predict Finnish ice hockey league results. One uses a neural network and the other uses Poisson regression. The latter does not work well: it almost always gives the same result. Maybe that is because I use only current-season data, which is too little. This is an excellent opportunity to find out if there is a better way to predict these results.
First I needed to create a new Azure Machine Learning workspace. Note that AutoML is only available in the Enterprise edition. This Azure Machine Learning is a different service from the one I have used before; the older one now seems to be called Machine Learning Studio.
Next I uploaded the dataset I had already created and used. It basically contains the date, team, opponent team, a home-or-away boolean and goals (don’t mind the "V" letter). Note that the model can predict only one column, in my case goals. That’s why I generated two rows from every game: one for the home team’s result and one for the away team’s. The goals are not the exact goals scored; I have adjusted them with a couple of other factors to get an "expected" goals value.
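To make the two-rows-per-game idea concrete, here is a minimal sketch in plain Python. The game records, the scores and the field names (`home_goals`, `is_home` and so on) are made up for illustration; they are not my real dataset or its column names, and the adjustment towards "expected" goals is left out.

```python
from datetime import date

# Hypothetical raw results, one record per game (illustrative values only).
games = [
    {"date": date(2019, 11, 1), "home": "HIFK", "away": "Tappara",
     "home_goals": 3, "away_goals": 2},
    {"date": date(2019, 11, 2), "home": "Kärpät", "away": "HIFK",
     "home_goals": 1, "away_goals": 4},
]


def to_team_rows(game):
    """Split one game into two rows so that each row has a single 'goals' target."""
    home_row = {
        "date": game["date"],
        "team": game["home"],
        "opponent": game["away"],
        "is_home": True,
        "goals": game["home_goals"],
    }
    away_row = {
        "date": game["date"],
        "team": game["away"],
        "opponent": game["home"],
        "is_home": False,
        "goals": game["away_goals"],
    }
    return [home_row, away_row]


# Flatten: every game contributes a home row followed by an away row.
rows = [row for game in games for row in to_team_rows(game)]

for row in rows:
    print(row)
```

With this layout the model only ever has to predict the one `goals` column, and a game prediction is simply the pair of predictions for its two rows.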
The user interface is quite similar to the old one, so it was easy to get started. After uploading the dataset, I selected Automated ML.
I selected my dataset.
I gave my experiment a name and selected the column to be predicted. Then I had to create a new compute node for the experiment. I didn’t check prices; I just selected the smallest one, because my dataset is really small.
As the task type I selected Regression, which is used to predict numeric values. Below you can see the other options.
After that everything was ready to run. Quite easy, I would say!
The whole AutoML run took about half an hour. Here are the results, and the winner is VotingEnsemble! I need to read up a little on the theory behind it.
Altogether 58 different calculations! The best model’s Spearman correlation was 0.3419 and the worst had 0.0094. Honestly I don’t know what that means, but the higher the better :). And here are some metrics for my best model. Basically the model predicts with an accuracy of about 0.5 goals. Root mean squared error is also quite a meaningful value: my best model had 0.65056 and the 29th had 0.67139. Hmm, no big difference, but that’s why it’s good that the machine does the model evaluation, because I don’t have enough understanding yet.
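For my own understanding, here is a small sketch of what the two metrics measure, written in plain Python. Spearman correlation is the ordinary (Pearson) correlation computed on ranks, so it tells how well the predictions put the rows in the right order, while root mean squared error tells how far off the predicted values are. The `actual` and `predicted` numbers are invented for illustration and are not from my experiment; I am also assuming Azure computes these metrics in the standard way.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(actual, predicted):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(actual), ranks(predicted))


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Invented example values (adjusted goals vs. model output).
actual = [2.1, 3.4, 1.0, 2.8, 4.2]
predicted = [2.5, 2.9, 1.8, 3.1, 3.0]

print("Spearman:", spearman(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

A Spearman value of 1 would mean the predictions rank every row in exactly the right order and 0 would mean no relationship at all, so 0.3419 is a weak but real signal. RMSE is in the same unit as the target, so 0.65 means errors of roughly two thirds of an (adjusted) goal.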
About voting: “You can train your model using diverse algorithms and then ensemble them to predict the final output. Say, you use a Random Forest Classifier, SVM Classifier, Linear Regression etc.; models are pitted against each other and selected upon best performance by voting using the VotingClassifier Class from sklearn.ensemble.” https://towardsdatascience.com/ensemble-learning-in-machine-learning-getting-started-4ed85eb38e00
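The quote talks about classifiers, but my task is regression. For numeric predictions the usual counterpart of voting is to average the predictions of the individual models, possibly with weights. The sketch below shows that idea in plain Python with two deliberately simple models that I made up for illustration (`mean_model` and `home_away_model`); I have not verified how Azure's VotingEnsemble works internally, so treat this as the general technique rather than a description of the service.

```python
def mean_model(train_rows):
    """Baseline model: always predict the average goals of the training rows."""
    avg = sum(r["goals"] for r in train_rows) / len(train_rows)
    return lambda row: avg


def home_away_model(train_rows):
    """Predict the average goals separately for home rows and away rows."""
    home = [r["goals"] for r in train_rows if r["is_home"]]
    away = [r["goals"] for r in train_rows if not r["is_home"]]
    home_avg = sum(home) / len(home)
    away_avg = sum(away) / len(away)
    return lambda row: home_avg if row["is_home"] else away_avg


def voting_predict(models, weights, row):
    """Weighted average of the individual model predictions for one row."""
    total_weight = sum(weights)
    weighted_sum = sum(w * model(row) for model, w in zip(models, weights))
    return weighted_sum / total_weight


# Invented training rows (illustrative values only).
train_rows = [
    {"is_home": True, "goals": 3.0},
    {"is_home": False, "goals": 2.0},
    {"is_home": True, "goals": 4.0},
    {"is_home": False, "goals": 1.0},
]

models = [mean_model(train_rows), home_away_model(train_rows)]
weights = [1.0, 1.0]  # equal weights, i.e. a plain average

print(voting_predict(models, weights, {"is_home": True}))
print(voting_predict(models, weights, {"is_home": False}))
```

In a real ensemble the members would be properly trained models and the weights would favour the ones that scored best in validation, but the combining step stays this simple.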
Finally I deployed my best model just by pressing a button. My next task is to learn how to take the best model into production use. That still involves quite a few steps, because, for example, I want to retrain the model after every game day. Maybe I will write a separate blog post about that.
I tried to generate predictions for the next round, but it seems there is no manual interface for this. Something to do next time.