Introduction

What is Machine Learning

 

Machine learning is an application of Artificial Intelligence in which a machine learns patterns from data by itself, rather than being explicitly programmed with rules, in order to make accurate predictions. Human intervention is required mainly to define the problem statement, decide on the dependent variable, and provide data to the machine. Machine learning uses algorithms to learn from historical data and predict unknown outcomes. Let's understand this with an example. It is a little hypothetical, but it will give you a clear idea of what ML does.

Let's say you have a lovely kitten, and you are always worried about its safety from dogs. It's not possible for you to be around your kitten all the time, and machine learning can be of great help here. You can build a small AI machine with a camera and a speaker. You feed hundreds of labeled pictures of dogs and other animals to this machine so that it can learn from these pictures what is a dog and what is not. Once the machine has learned, you keep it at the entrance of your home. Whenever it sees a real dog, it predicts, from what it learned from those hundreds of pictures, that it's a dog, and the speaker automatically starts making a scary sound to shoo the dog away.

 

Whether we realize it or not, machine learning (ML) has become an integral part of nearly every facet of life. Some of its everyday applications are:

  1. Friend suggestions on Facebook - ML predicts who could be close to us and suggests them as friends
  2. Movie suggestions on Netflix - ML predicts what movies we may like to watch, on the basis of our likes and preferences
  3. Product recommendations on e-commerce websites - ML predicts what products we are likely to buy on the basis of our other purchases
  4. Customized credit card offers - ML predicts which credit card offers are most likely to persuade us to get the card

These are just a few examples. ML has a profound impact on our lives, and that impact is growing day by day.

 

Types of Machine Learning Algorithms

 

1. On the basis of the type of target variable

 

Broadly, there are two types of outputs we try to predict:

1. Classification – In classification, we predict a class/category.

Example – predicting whether a picture shows a dog or a cat.

Algorithms - Logistic Regression, Decision Tree Classifier, Random Forest, SVM, etc.
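To see what "predicting a class" means in code, the sketch below shows only the prediction step of a logistic regression classifier. The two features (weight and height) and the coefficient values are hypothetical, as if training had already produced them; a real model would learn them from labeled data. The model turns its inputs into a probability with the logistic (sigmoid) function and then into a category using a 0.5 threshold.

```python
import math

# Hypothetical coefficients, as if a logistic regression model had already been
# trained on two features: (weight in kg, height in cm)
WEIGHTS = (0.3, 0.1)
BIAS = -6.0


def predict_class(weight_kg, height_cm):
    """Return a category ("dog" or "cat") rather than a real number."""
    z = WEIGHTS[0] * weight_kg + WEIGHTS[1] * height_cm + BIAS
    probability_dog = 1 / (1 + math.exp(-z))  # logistic (sigmoid) function
    return "dog" if probability_dog >= 0.5 else "cat"


print(predict_class(25, 58))  # dog
print(predict_class(4, 24))   # cat
```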

2. Regression – In regression, we predict a real number.

Example – predicting a car's price.

Algorithms - Linear Regression, Ridge Regression, Lasso Regression, Decision Tree Regressor, etc.
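For regression, the output is a real number instead of a category. A minimal sketch of simple linear regression (one feature), fitted with ordinary least squares on made-up car ages and prices, could look like this; `fit_line` is an illustrative helper name, not a library function.

```python
# Hypothetical data: car age in years and its price
ages = [1, 2, 3, 4, 5]
prices = [18500, 17000, 15500, 14000, 12500]


def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Sum of products of deviations from the means
    numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Sum of squared deviations of x from its mean
    denominator = sum((xi - x_mean) ** 2 for xi in x)
    slope = numerator / denominator
    return slope, y_mean - slope * x_mean


slope, intercept = fit_line(ages, prices)
print(slope, intercept)       # -1500.0 20000.0
print(slope * 6 + intercept)  # 11000.0, the predicted price of a 6-year-old car
```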

 

The variable we try to predict is called the target (or dependent) variable.

 

2. On the basis of the presence/absence of the target variable

 

There are two broad types of problem statements that we solve with machine learning algorithms. A separate chapter covers them in detail; here, we just introduce them.

1. Supervised

The dog example above is a supervised ML problem. In this case, you provide labeled data (pictures of animals) to the machine, and the labels tell the machine which pictures show a dog and which do not. Then you provide new, unlabeled data to the machine and ask it to label that data.

Algorithms - Regression algorithms, Classification algorithms, etc.
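The supervised workflow described above (learn from labeled data, then label new data) can be sketched with a 1-nearest-neighbour rule: each unlabeled observation receives the label of the most similar labeled one. This rule is chosen only for brevity; two numeric features stand in for the pictures, and all values and the `label_point` helper are hypothetical.

```python
import math

# Hypothetical labeled data: (weight in kg, height in cm) paired with a known label
labeled_data = [
    ((4.0, 25.0), "not dog"),
    ((3.5, 23.0), "not dog"),
    ((5.0, 28.0), "not dog"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
    ((8.0, 35.0), "dog"),
]


def label_point(unlabeled_point):
    """Copy the label of the closest labeled example (1-nearest neighbour)."""
    _, nearest_label = min(
        labeled_data, key=lambda example: math.dist(example[0], unlabeled_point)
    )
    return nearest_label


# New observations without labels; the "machine" fills the labels in
unlabeled_data = [(25.0, 58.0), (4.2, 24.0)]
print([label_point(point) for point in unlabeled_data])  # ['dog', 'not dog']
```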

2. Unsupervised

If your problem statement requires you to categorize data into groups on the basis of some characteristics, unsupervised algorithms are used. Here, you provide only these characteristics to the machine; there is no target variable to predict. For example, suppose you are starting a credit card company and want to launch 3 types of credit cards. You provide people's demographic, economic, and social information to the machine and ask it to divide people into 3 groups. The machine does this for you, and you can then research these groups further and add customized features to each credit card so that more people opt for them.

Algorithms - Clustering algorithms such as K-means clustering and Hierarchical clustering
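K-means can be sketched in plain Python for the credit card example. The customer data (annual income and age) is made up, and `k_means` is an illustrative helper, not a library function. For simplicity, the starting centroids are passed in by hand; library implementations such as scikit-learn's `KMeans` choose them automatically (by default with the k-means++ scheme).

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Group points around k centroids, where k = len(initial_centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    groups = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the group of its nearest centroid
        groups = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            groups[nearest].append(point)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its old centroid)
        new_centroids = [
            tuple(sum(values) / len(group) for values in zip(*group)) if group else centroids[i]
            for i, group in enumerate(groups)
        ]
        # Stop once the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return groups, centroids


# Hypothetical customers: (annual income in thousands, age); no target variable
customers = [(20, 22), (22, 25), (25, 23),
             (60, 40), (65, 42), (62, 38),
             (150, 55), (160, 50), (155, 52)]

# Simplification: start from three hand-picked customers instead of random initialization
groups, centroids = k_means(customers, [customers[0], customers[3], customers[6]])
for group in groups:
    print(group)
```

With these starting points, the sketch settles on three groups: younger low-income customers, middle-income customers, and older high-income customers. No labels were provided; the groups come only from the characteristics.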

 

What is a Model?

 

Machine learning algorithms build a model by learning from data. A model is like an input-output box: if you provide an observation to the model, it predicts some characteristic of that observation. Take house price prediction as an example. You first provide the descriptions of thousands of houses along with their prices; the algorithm learns from this data and creates a model. Creating a model from available data is called training the model, much like giving someone training to perform a task.

Then you give some information about a house to this model, and the model predicts a price for this house.
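The "input-output box" idea maps naturally onto a function: training takes data and returns a model, and the model takes an input and returns a prediction. The sketch below uses a deliberately simple model, price = rate * area, with the rate fitted by least squares; the `train` helper and the house data are hypothetical.

```python
def train(areas, prices):
    """Learn a price-per-square-metre rate from data and return it as a model."""
    # Least-squares fit of price = rate * area (a straight line through the origin)
    rate = sum(a * p for a, p in zip(areas, prices)) / sum(a * a for a in areas)

    def model(area):
        # The trained model: an input-output box mapping an area to a price
        return rate * area

    return model


# Hypothetical training data: house areas (square metres) and their prices
house_model = train([50, 80, 100], [150000, 240000, 300000])
print(house_model(120))  # 360000.0
```

Real house price models use many more inputs than area alone, but the shape stays the same: train once, then call the model on new houses.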

 

Independent and Target Variable

 

Independent variables are the input variables that a model uses to predict the target; they are treated as given rather than predicted. In the house pricing example above, the number of rooms, number of bathrooms, area of the flat, surrounding locality, etc. are independent variables. These are also called features or explanatory variables.

The variable whose value depends on the independent variables is called the dependent or target variable. The price of a house is the target variable in the example above.
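In code, the independent and target variables are usually separated before training. Assuming the house data is stored as a list of records (the column names and values below are made up), the features `X` and the target `y` can be pulled apart like this:

```python
# Hypothetical house records; column names are illustrative, not from a real dataset
houses = [
    {"rooms": 3, "bathrooms": 2, "area_sqm": 95, "locality": "north", "price": 310000},
    {"rooms": 2, "bathrooms": 1, "area_sqm": 60, "locality": "south", "price": 185000},
    {"rooms": 4, "bathrooms": 3, "area_sqm": 140, "locality": "north", "price": 455000},
]

target_column = "price"
# Every other column is treated as an independent variable (feature)
feature_columns = [column for column in houses[0] if column != target_column]

# X holds the independent variables, y holds the target variable
X = [[house[column] for column in feature_columns] for house in houses]
y = [house[target_column] for house in houses]

print(feature_columns)  # ['rooms', 'bathrooms', 'area_sqm', 'locality']
print(X[0])             # [3, 2, 95, 'north']
print(y)                # [310000, 185000, 455000]
```

Naming the features `X` and the target `y` is a common convention, including in scikit-learn's documentation. A text column such as `locality` would typically be converted to numbers before most algorithms can use it.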

 

Performance Metrics

 

It's crucial to measure the performance of a model. Many metrics are available for this, and which one to use depends on the problem statement and the type of target variable. Primarily, all these metrics are based on the difference between the predicted value and the actual value of an observation. This difference is called the error.

\[ \text{Error} = \frac{1}{N}\sum_{i=1}^{N}\left(PV_{i} - AV_{i}\right) \]

where

  • N = The total number of observations
  • PV_i = Predicted value of the i-th observation
  • AV_i = Actual value of the i-th observation
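The formula above can be checked with a few lines of plain Python. The predicted and actual prices below are made-up numbers, and `mean_error` / `mean_absolute_error` are illustrative helper names. The example also shows one caveat of this definition: positive and negative differences can cancel, so the error can be 0 even when every single prediction is off. Averaging the absolute differences (the mean absolute error) avoids that.

```python
def mean_error(predicted, actual):
    """Error as defined above: the average of (PV_i - AV_i) over all N observations."""
    n = len(actual)
    return sum(pv - av for pv, av in zip(predicted, actual)) / n


def mean_absolute_error(predicted, actual):
    """Average of |PV_i - AV_i|; positive and negative errors cannot cancel out."""
    n = len(actual)
    return sum(abs(pv - av) for pv, av in zip(predicted, actual)) / n


# Hypothetical house prices in thousands
predicted = [310, 190, 455, 445]
actual = [300, 200, 450, 450]

print(mean_error(predicted, actual))           # 0.0
print(mean_absolute_error(predicted, actual))  # 7.5
```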

If the error is beyond an acceptable level, the model needs to be revised, for example by adding more variables or by using more data for training.

We will primarily use the Python library scikit-learn for machine learning algorithms, because of its rich set of tools (modules) that help in applying machine learning to real-life problems.
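As a rough sketch of what working with scikit-learn looks like (assuming scikit-learn is installed, e.g. via `pip install scikit-learn`), the snippet below trains a `LinearRegression` model on a tiny, made-up house dataset and predicts a price for a new house. Most scikit-learn estimators follow this same `fit` / `predict` pattern.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [number of rooms, area in square metres] per house
X_train = [[2, 60], [3, 85], [3, 95], [4, 120]]
# Hypothetical prices in thousands (generated as 20 + 15 * rooms + 2 * area)
y_train = [170, 235, 255, 320]

model = LinearRegression()   # choose an algorithm
model.fit(X_train, y_train)  # train the model on the labeled data

# Predict the price of a new, unseen house with 3 rooms and 100 square metres
print(model.predict([[3, 100]]))  # approximately [265.]
```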
