Machine Learning is an incredibly useful technology that is being integrated into systems at a rapid rate, especially by large organizations such as Facebook, Amazon and Google. A Machine Learning system orders your Facebook feed, recommends products to you on Amazon, filters spam in Gmail and decides which songs go into your Spotify Daily Mix.
In this blog I am going to discuss how a Machine Learning algorithm works in theory, and then walk through an example of a full stack application with a Machine Learning component, hosted on Microsoft Azure.
How does Machine Learning work?
The first step is gathering some training data (a dataset). Before a Machine Learning algorithm can start making predictions, it must first be trained on a dataset of prior examples. For example, an algorithm designed to predict house prices must be trained on a list of real-world house prices and the corresponding house features. In general, the more examples the algorithm can be trained on, the more accurate its predictions become.
Data is the most important part of a machine learning system; a commonly quoted rule of thumb is that around 80% of the work in setting up such a system goes into data processing and preparation. The questions that must be answered are:
- Where is the data I need?
- How do I get it?
- How do I deal with missing features?
- How do I deal with extreme exceptions (outliers)?
The answers to the above questions depend on your specific scenario. Once they are answered and the dataset has been prepared, training can begin.
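As a rough illustration of the last two questions, here is a minimal Python sketch. It assumes missing values are filled with the median of the known values and that extreme values are clamped to a plausible range. Both choices, the function names, the sample numbers and the 150 to 220 cm range are assumptions made for this example, not the approach used in the project described later.

```python
from statistics import median


def fill_missing(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    fallback = median(known)
    return [fallback if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Clamp every value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical column of reach measurements in cm.
# None marks a missing value and 999 is an obvious data-entry error.
reach_cm = [180, None, 175, 999, 185]

filled = fill_missing(reach_cm)
cleaned = clip_outliers(filled, lower=150, upper=220)
print(cleaned)
```

Whether to fill, clamp or simply drop a bad row depends on the scenario. Dropping is often the safer option when plenty of data is available.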
Each feature in the Dataset is assigned a coefficient (or weight) and an equation is constructed.
(a × feature1) + (b × feature2) + (c × feature3) + … = result
The feature values for each example in the dataset are inserted into the equation. If the equation's result doesn't match the example's known result, the coefficients are adjusted. The size of the adjustment depends on the Machine Learning model selected.
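To make the adjustment step concrete, here is a minimal Python sketch of one common way to do it: a linear model trained with stochastic gradient descent. This is only one possible update rule, chosen for illustration. The learning rate, the number of passes and the toy data (generated from result = 2 × feature1 + 3 × feature2) are all assumptions.

```python
def predict(weights, features):
    """Weighted sum: (w1 * feature1) + (w2 * feature2) + ..."""
    return sum(w * x for w, x in zip(weights, features))


def train(examples, learning_rate=0.01, epochs=2000):
    """Nudge the weights after every example whose prediction is off."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, target in examples:
            error = predict(weights, features) - target
            # Each weight moves in proportion to the error and to its own feature value.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights


# Toy dataset (assumed): result = 2 * feature1 + 3 * feature2
examples = [
    ([1.0, 2.0], 8.0),
    ([2.0, 1.0], 7.0),
    ([3.0, 3.0], 15.0),
    ([0.0, 1.0], 3.0),
]

weights = train(examples)
print([round(w, 3) for w in weights])
```

On this toy data the learned weights settle very close to 2 and 3, the coefficients used to generate the examples.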
Model selection involves training several different machine learning models on a subset of the dataset and measuring the accuracy of each one. The model with the highest accuracy is selected and is then trained on the entire dataset.
Once training is complete, the model is ready to start making predictions.
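The selection loop can be sketched in a few lines of Python. The candidate models here are deliberately tiny: a majority-class baseline and two one-feature threshold rules. They are stand-ins invented for this example, as are the toy data and the choice to measure accuracy on the rows held back from training.

```python
def fit_majority(rows):
    """Baseline: always predict the most common label in the training rows."""
    labels = [label for _, label in rows]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority


def make_stump_fitter(index):
    """Return a fitter for a threshold rule on features[index].

    Assumes the training rows contain both labels 0 and 1.
    """
    def fit(rows):
        ones = [f[index] for f, label in rows if label == 1]
        zeros = [f[index] for f, label in rows if label == 0]
        mean_one = sum(ones) / len(ones)
        mean_zero = sum(zeros) / len(zeros)
        threshold = (mean_one + mean_zero) / 2
        high_label = 1 if mean_one > mean_zero else 0
        return lambda features: (
            high_label if features[index] > threshold else 1 - high_label
        )
    return fit


def accuracy(model, rows):
    """Fraction of rows for which the model predicts the known label."""
    return sum(model(f) == label for f, label in rows) / len(rows)


# Toy rows (assumed): ([strikes_per_min, age], label)
rows = [
    ([5.0, 30], 1), ([1.0, 25], 0), ([4.5, 35], 1), ([1.5, 28], 0),
    ([4.0, 27], 1), ([2.0, 33], 0), ([5.5, 26], 1), ([0.5, 31], 0),
    ([4.2, 34], 1), ([1.8, 24], 0),
]
train_rows, held_out = rows[:6], rows[6:]

candidates = {
    "majority": fit_majority,
    "strikes_stump": make_stump_fitter(0),
    "age_stump": make_stump_fitter(1),
}

# Train every candidate on the subset and score it on the held-out rows.
scores = {name: accuracy(fit(train_rows), held_out)
          for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)

# Retrain the winning model type on the entire dataset.
final_model = candidates[best_name](rows)
print(best_name, scores[best_name])
```

Scoring on held-out rows rather than on the training subset is a design choice: it rewards models that generalise instead of ones that merely memorise the examples they were trained on.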
Now that the theory is out of the way, I’ll explain the structure of a full stack application with a Machine Learning Component hosted on Microsoft’s Azure Platform.
The problem that I set out to solve using Machine Learning was:
How do I predict the outcome of UFC fights?
The first step of the project was data gathering. I created a web scraper to gather the required data. The scraper is executed weekly by an Azure Function and collects fight results and fighter statistics. The data is stored in an Azure Database.
I then used Azure's Machine Learning Studio to create the Machine Learning model. I linked the database to ML Studio using a dropdown menu, then provided an SQL query to create my dataset.
With the data ready, the next step was model selection. Azure can select a model for you, or you can upload your own. I asked Azure to decide which model to use; the only thing I had to do was specify how long Azure should search for the best model. After 3 hours, Azure presented me with 30 models and their respective accuracies. I selected the most accurate model and published it. Azure then created a REST API through which I could access the model.
I then created a Node application, also hosted on Azure, which displays an HTML form. The form asks for two fighter names. Once the names are entered, the Node app queries the Azure Database for the relevant fighter statistics and then sends a POST request, with the stats, to the REST API. The response from the API contains the predicted winner of the fight, whose name is then displayed on the HTML page.
I gained significant insight into the data, especially into the features that Azure deemed most significant in predicting a victory. Conventional fighting wisdom says that age, reach and height are the most important features; however, these were determined to be the least important. The most important features were Strikes Landed per Minute and Takedown Defence %.
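The request step can be sketched as follows. The real app was written in Node; this is an illustrative Python version, and the endpoint URL, the payload field names and the stat values are all hypothetical. The actual request schema depends on the published model, and any authentication the endpoint requires is left out here.

```python
import json
from urllib.request import Request


def build_prediction_request(endpoint_url, fighter_a_stats, fighter_b_stats):
    """Build (but do not send) a JSON POST request carrying both fighters' stats."""
    # Field names are assumptions; a real endpoint defines its own schema.
    payload = {"fighter_a": fighter_a_stats, "fighter_b": fighter_b_stats}
    body = json.dumps(payload).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical stats, as they might come back from the database query.
request = build_prediction_request(
    "https://example.invalid/score",  # placeholder URL, not a real endpoint
    {"strikes_landed_per_min": 4.1, "takedown_defence_pct": 78},
    {"strikes_landed_per_min": 3.2, "takedown_defence_pct": 65},
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and parsing the JSON response would complete the round trip. That part is omitted because the response format is specific to the deployed model.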
In conclusion, Machine Learning is an incredibly valuable tool which, if implemented correctly, can:
- Allow systems to make complex decisions.
- Categorize complex data (including images).
- Provide significant data insights.