## Descriptive Statistics

Statistics is the science of working with data using mathematical models. It covers the following operations on data:

- Collection
- Analysis
- Transformation
- Interpretation
- Presentation

There are two types of Statistics:

**Descriptive** - This branch of statistics is used to explore the data. It uses simple functions such as mean and mode to give a better understanding of the data.

**Inferential** - This branch of statistics uses mathematical algorithms/models to infer hidden information in the data.

### Descriptive Statistics

Descriptive statistics can be broadly divided into two categories:

- Measures of Central Tendency - To calculate the center points of data
- Measures of Dispersion - To determine how data is distributed

#### Measures of Central Tendency

**Mean**

- Average of the data points (sum of data points / number of data points)

**Median**

- Middle value of the data when it is sorted
- The \(\frac{N + 1}{2}\)th value if N is odd
- Average of the \(\frac{N}{2}\)th value and the \(\left(\frac{N}{2} + 1\right)\)th value if N is even

**Mode**

- Value that occurs the maximum number of times in the data
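As a minimal sketch of these three measures, the snippet below uses Python's standard `statistics` module. The two small lists are illustrative values chosen here (they are not part of the article's data set); one has an odd and one an even number of points, so that both median rules are exercised.

```python
import statistics

# Illustrative values only (not from the article's data set)
odd_sample = [7, 3, 5, 3, 2]   # N = 5 (odd)
even_sample = [7, 3, 5, 2]     # N = 4 (even)

# Mean: sum of data points / number of data points
mean_value = statistics.mean(odd_sample)

# Median, odd N: sorted data is 2 3 3 5 7, so the (5 + 1) / 2 = 3rd value
median_odd = statistics.median(odd_sample)

# Median, even N: sorted data is 2 3 5 7, so the average of the 2nd and 3rd values
median_even = statistics.median(even_sample)

# Mode: the value that occurs most often (3 occurs twice)
mode_value = statistics.mode(odd_sample)

print(mean_value, median_odd, median_even, mode_value)
```

Note that `statistics.median` sorts the data internally, so the input does not need to be ordered beforehand.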

#### Measures of Dispersion

**Range**

- The difference between the maximum value and the minimum value in the data
- Range = Max - Min

**Quartiles**

- Quartiles are the values that divide the data into quarters after arranging it in increasing order.
- 1st Quartile (Q1) = 25% cut
- 2nd Quartile (Q2) = 50% cut (also called the **median**)
- 3rd Quartile (Q3) = 75% cut
- Interquartile Range (IQR) = Q3 - Q1
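The range and quartiles can be sketched with the standard library as follows. The list is an illustrative example chosen here, and `method="inclusive"` is only one of several interpolation conventions for quartiles; other conventions can give slightly different cut points on the same data.

```python
import statistics

# Illustrative values only, arranged in increasing order
values = [1, 2, 3, 4, 5, 6, 7]

# Range = Max - Min
value_range = max(values) - min(values)

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" is an assumption made for this sketch.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Interquartile range = Q3 - Q1
iqr = q3 - q1

print(value_range, q1, q2, q3, iqr)
```

Under this convention Q2 coincides with the median of the data.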

**Standard Deviation**

This shows how the data is spread around the mean value. It is calculated in 5 steps:

- Calculate the mean
- Calculate the difference between each value and the mean
- Square each difference and add the squares together
- Divide this sum of squared differences by the number of data points
- Take the square root of the result

**Skewness**

It shows how asymmetric the distribution is when a graph is drawn between the values (x-axis) and their frequencies (y-axis).

- **Positive skewness** - most of the data is towards the left, with a long tail in the right direction
- **Normal curve** - a normal curve is symmetric around the mean and has the same value for mean, median and mode
- **Negative skewness** - most of the data is towards the right, with a long tail in the left direction

**Kurtosis**

A measure of the sharpness of the peak when a graph is drawn between the values (x-axis) and their frequencies (y-axis). The values below refer to excess kurtosis (kurtosis minus 3), for which the normal curve scores zero.

- **Leptokurtic** - Lepto means 'thin'. A leptokurtic curve has a positive kurtosis value.
- **Mesokurtic/Normal** - Meso means 'middle'. A normal curve has a kurtosis value of zero.
- **Platykurtic** - Platy means 'broad'. A platykurtic curve has a negative kurtosis value.
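In symbols, the five standard deviation steps correspond to the first line below, with \(\mu\) denoting the mean and \(\sigma\) the standard deviation. The second line gives the commonly used moment-based definitions of skewness and excess kurtosis; these are an assumption made here, since other conventions (for example sample-size corrections) also exist.

\[
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
\]

\[
\text{Skewness} = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^3}{\sigma^3}, \qquad \text{Excess kurtosis} = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^4}{\sigma^4} - 3
\]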

where:

\(x_i\) : the \(i\)th data point

N : number of data points
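A minimal sketch of skewness and kurtosis in plain Python is shown below. It assumes the population (divide by N) moment-based definitions, where skewness is the third central moment divided by the cube of the standard deviation and excess kurtosis is the fourth central moment divided by the fourth power of the standard deviation, minus 3. The function name `moments_summary` and both lists are hypothetical, chosen only for illustration.

```python
import math

def moments_summary(values):
    """Return (std_dev, skewness, excess_kurtosis) from population moments.

    Assumes the values are not all identical (otherwise the spread is zero).
    """
    n = len(values)
    mean = sum(values) / n
    # k-th central moment: the average of (x_i - mean) ** k
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    std_dev = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return std_dev, skewness, excess_kurtosis

# Illustrative values only
symmetric = [1, 2, 3, 4, 5]          # mirror image around its mean
right_tailed = [1, 1, 2, 2, 3, 10]   # one large value forms a long right tail

sym_sd, sym_skew, sym_kurt = moments_summary(symmetric)
tail_sd, tail_skew, tail_kurt = moments_summary(right_tailed)

print(sym_sd, sym_skew, sym_kurt)
print(tail_sd, tail_skew, tail_kurt)
```

The symmetric list yields a skewness of zero, while the list with a long right tail yields a positive skewness, matching the descriptions above.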

Let's work through an example to understand descriptive statistics better. The table below has the heights (in cm) of 25 students in a class. Let's calculate various statistical measures for this data.

**Mean**

Sum of all 25 heights / 25 = 3439 / 25 = 137.56 cm

**Median**

To calculate the median, we need to arrange the values in increasing order:

135 135 135 135 136 136 136 136 136 136 137 137 137 137 138 138 138 139 139 139 140 140 141 141 142

The number of data points N is 25 (odd) here, so the median is the \(\frac{25 + 1}{2}\) = 13th value: median = 137

**Mode**

136 appears the maximum number of times (6), so mode = 136

**Range**

142 (max value) - 135 (min value) = 7

**Standard Deviation**

Grouping equal heights together, the sum of squared differences divided by 25 gives:

\[
\sqrt{\frac{4(135-137.56)^2 + 6(136-137.56)^2 + 4(137-137.56)^2 + 3(138-137.56)^2 + 3(139-137.56)^2 + 2(140-137.56)^2 + 2(141-137.56)^2 + (142-137.56)^2}{25}} = \sqrt{\frac{104.16}{25}} \approx 2.04
\]

Dividing by the number of data points (25), as in the five steps described earlier, gives the population standard deviation of about 2.04. Dividing by N - 1 = 24 instead gives the sample standard deviation of about 2.08, which is the value pandas reports by default.
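The five standard deviation steps can be traced on the 25 heights with a short plain-Python sketch. It computes both the divide-by-N form and the divide-by-(N - 1) form, because the two give slightly different results on this data; the variable names are illustrative.

```python
import math

# Heights (cm) from the worked example, in increasing order
heights = ([135] * 4 + [136] * 6 + [137] * 4 + [138] * 3
           + [139] * 3 + [140] * 2 + [141] * 2 + [142])

n = len(heights)                                    # 25 data points
mean = sum(heights) / n                             # step 1: mean
differences = [h - mean for h in heights]           # step 2: value - mean
sum_of_squares = sum(d ** 2 for d in differences)   # step 3: square and add

# Steps 4 and 5: divide, then take the square root
population_sd = math.sqrt(sum_of_squares / n)       # divides by N
sample_sd = math.sqrt(sum_of_squares / (n - 1))     # divides by N - 1

print(round(population_sd, 2), round(sample_sd, 2))  # prints 2.04 2.08
```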

Fortunately, pandas provides us with inbuilt functions to calculate descriptive statistics. Let's solve the same example using pandas functions.
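One way to do this, sketched below, is to load the 25 heights into a `pandas.Series` and call its built-in methods. The Series name `height_cm` and the variable names are hypothetical. Note that `std()` divides by N - 1 by default (`ddof=1`); passing `ddof=0` divides by N instead.

```python
import pandas as pd

# Heights (cm) of the 25 students from the worked example
heights = pd.Series(
    [135] * 4 + [136] * 6 + [137] * 4 + [138] * 3
    + [139] * 3 + [140] * 2 + [141] * 2 + [142],
    name="height_cm",  # hypothetical column name
)

# Measures of central tendency
mean_height = heights.mean()
median_height = heights.median()
mode_height = heights.mode().iloc[0]   # mode() returns a Series, since ties are possible

# Measures of dispersion
height_range = heights.max() - heights.min()
quartiles = heights.quantile([0.25, 0.5, 0.75])
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
sample_sd = heights.std()              # default ddof=1: divides by N - 1
population_sd = heights.std(ddof=0)    # divides by N
skewness = heights.skew()
kurtosis = heights.kurt()              # excess kurtosis (normal curve = 0)

# describe() gives count, mean, std, min, quartiles and max in one call
print(heights.describe())
```

The skewness and kurtosis returned by pandas include sample-size corrections, so they can differ slightly from the plain moment-based values.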

- Create a dataframe *olympicsMedalTally* of the top 10 countries of the 2016 Olympics medal table, as below.
- Calculate all Measures of Central Tendency.
- Calculate all Measures of Dispersion.