Statistics with Python

The statistics module is built into Python's standard library, and we can use it to calculate statistics for numeric data. In this tutorial, we will learn more about this module.

What is Statistics?

Statistics is a branch of mathematics that deals with collecting, organizing and interpreting numerical data in order to draw conclusions from it. Based on these conclusions, we can, for example, assess the impact of a business decision on a company.

Understanding descriptive statistics

Descriptive statistics summarize a dataset. The dataset can represent the entire population or just a sample of it. The mean, median and mode are known as the measures of central tendency. Measures of variability include the range, variance and standard deviation.

Measures of central tendency describe the central values in the dataset, and measures of variability describe how the data is spread.

Descriptive statistics are broadly classified into two types:

  • Measures of central tendency
  • Measures of variability
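
As a quick orientation, both groups of measures can be computed with the standard library alone. This is a minimal sketch; the scores list is made-up sample data chosen only for illustration.

```python
import statistics

# Hypothetical sample data, chosen only for illustration
scores = [2, 4, 4, 5, 7, 8]

# Measures of central tendency
central = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
}

# Measures of variability
spread = {
    "range": max(scores) - min(scores),
    "variance": statistics.variance(scores),  # sample variance
    "stdev": statistics.stdev(scores),        # sample standard deviation
}

print(central)
print(spread)
```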

Measures of central tendency

These measures primarily focus on the middle or central values in your data. They are often presented alongside graphs and other visual representations that help users understand the data.

We start by calculating the frequency of each point in the distribution and describe it with the help of mean, mode and median.
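
Counting the frequency of each value can be sketched with collections.Counter from the standard library. The values list below is hypothetical data used only to show the idea.

```python
from collections import Counter

# Hypothetical observations, used only for illustration
values = [3, 5, 3, 8, 5, 3]

# Count how often each distinct value occurs
frequency = Counter(values)

# Print each value with its count, in ascending order of value
for value, count in sorted(frequency.items()):
    print(value, count)

# The most frequent value and its count
most_common_value, most_common_count = frequency.most_common(1)[0]
print(most_common_value, most_common_count)
```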

Calculating mean and median using Python Pandas

We can calculate the mean and median with the help of the pandas library using the following piece of code:

import pandas as pd

# Create a dictionary of Series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','Chanchal','Gasper','Naviya','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

print("Mean Values in the Distribution")
# numeric_only=True skips the non-numeric Name column
print(df.mean(numeric_only=True))
print("*******************************")
print("Median Values in the Distribution")
print(df.median(numeric_only=True))

Output

Mean Values in the Distribution
Age       31.833333
Rating     3.743333
dtype: float64
*******************************
Median Values in the Distribution
Age       29.50
Rating     3.79
dtype: float64
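
With an even number of values there is no single middle value, so the median is the average of the two middle values after sorting. The following is a minimal sketch using the Age values from the example above; the variable names are illustrative.

```python
import statistics

# Age values from the DataFrame example
ages = [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]

# Sort the values and average the two middle ones (even-length case)
ordered = sorted(ages)
middle = len(ordered) // 2
manual_median = (ordered[middle - 1] + ordered[middle]) / 2

print(manual_median)
# The statistics module gives the same result
print(statistics.median(ages))
```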

Calculating mode

The mode is the value that appears most often in the given data. The mode() function in Python's built-in statistics module returns the most commonly occurring value within the dataset. Consider the following example:

import statistics
set1 = [6, 6, 6, 3, 6, 4, 6, 5, 5, 6]
print(statistics.mode(set1))

Output

6
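
A dataset can have more than one mode. As a hedged sketch, assuming Python 3.8 or later: statistics.mode() returns the first mode it encounters when several values tie, while statistics.multimode() returns all of them. The tied list is made-up data.

```python
import statistics

# Hypothetical data in which 1 and 2 both occur twice
tied = [1, 1, 2, 2, 3]

# multimode() returns every value that shares the highest count
all_modes = statistics.multimode(tied)

# mode() returns only the first such value (Python 3.8+)
first_mode = statistics.mode(tied)

print(all_modes)
print(first_mode)
```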

Consider another example:

import pandas as pd

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','Chanchal','Gasper','Naviya','Andres']),
   'Age':pd.Series([25,26,25,23,30,25,23,34,40,30,25,46])}
#Create a DataFrame
df = pd.DataFrame(d)

print(df.mode())

Output

        Name   Age
0     Andres  25.0
1   Chanchal   NaN
2     Gasper   NaN
3       Jack   NaN
4      James   NaN
5        Lee   NaN
6     Naviya   NaN
7      Ricky   NaN
8      Smith   NaN
9      Steve   NaN
10       Tom   NaN
11       Vin   NaN

Measures of variability

These measures help us understand the distribution and dispersion of the given data.

The most used measures of variability are:

  • Range
  • Variance
  • Standard deviation

For example, two datasets can both have an average between 55 and 60, even though the values in one lie close to the average and the values in the other range from 1 to 100. Measures of variability help us understand how the data is spread.
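
To illustrate, here is a small sketch with two made-up datasets that share the same mean but are spread very differently. The narrow and wide lists are hypothetical.

```python
import statistics

# Two hypothetical datasets with the same mean
narrow = [55, 56, 57, 58, 59]
wide = [1, 30, 57, 97, 100]

for name, data in (("narrow", narrow), ("wide", wide)):
    mean = statistics.mean(data)
    data_range = max(data) - min(data)
    print(name, "mean =", mean, "range =", data_range)
```

The mean alone cannot tell these datasets apart; the range already shows that the second one is far more spread out.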

1. Variance

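The variance is calculated by subtracting the mean from each data point in the dataset and squaring each result. Summing these squared differences and dividing by the number of data points gives the population variance; dividing by one less than the number of data points gives the sample variance.

The variance() function in the statistics module calculates the sample variance, so we use it when our dataset is a sample of a larger population. For data that represents the entire population, use pvariance().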

Example:

import statistics as st
nums = [1, 2, 3, 5, 7, 9]
print(st.variance(nums))

Output:

9.5
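
The same result can be reproduced step by step. In this sketch, statistics.variance() is treated as the sample variance (dividing by n - 1) and statistics.pvariance() as the population variance (dividing by n); the intermediate variable names are illustrative.

```python
import statistics

nums = [1, 2, 3, 5, 7, 9]

# Step 1: the mean of the data
mean = sum(nums) / len(nums)

# Step 2: squared difference of each point from the mean
squared_deviations = [(x - mean) ** 2 for x in nums]

# Step 3: divide the sum by n - 1 (sample) or by n (population)
sample_variance = sum(squared_deviations) / (len(nums) - 1)
population_variance = sum(squared_deviations) / len(nums)

print(sample_variance, statistics.variance(nums))
print(population_variance, statistics.pvariance(nums))
```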

2. Standard deviation

The standard deviation is the square root of the variance. We saw how to calculate the variance in the above code.

In the statistics library in Python, the stdev() function calculates the sample standard deviation of the given dataset.

Example:

import statistics as st
nums = [1, 2, 3, 5, 7, 9]
print(st.stdev(nums))

Output:

3.082207001484488
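
The relationship between the two measures can be checked directly: taking the square root of the sample variance should match stdev(). This is a minimal sketch; the variable names are illustrative.

```python
import math
import statistics

nums = [1, 2, 3, 5, 7, 9]

# Standard deviation computed by the statistics module
sample_sd = statistics.stdev(nums)

# Square root of the sample variance, computed separately
root_of_variance = math.sqrt(statistics.variance(nums))

# Compare with a tolerance, since both are floating-point values
print(math.isclose(sample_sd, root_of_variance))
```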

3. Range

The range is the difference between the largest and smallest values in the data. The larger the range, the more widely the data is spread.

range = largest value in the dataset - smallest value in the dataset

You can find the largest and smallest values using the max() and min() functions in Python.

Example:

arr = [1, 2, 3, 4, 5]

# Largest and smallest values in the data
maximum = max(arr)
minimum = min(arr)

# Range is the difference between them
data_range = maximum - minimum
print("Maximum = {}, Minimum = {} and Range = {}".format(
    maximum, minimum, data_range))

Output:

Maximum = 5, Minimum = 1 and Range = 4
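
If the range is needed in several places, the calculation can be wrapped in a small helper. The value_range function below is a hypothetical name introduced for this sketch, not part of the statistics module.

```python
def value_range(data):
    """Return the difference between the largest and smallest value."""
    if not data:
        # max() and min() fail on empty input, so report it clearly
        raise ValueError("value_range() needs at least one value")
    return max(data) - min(data)


print(value_range([1, 2, 3, 4, 5]))
# Age values from the earlier pandas example
print(value_range([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]))
```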

Summary

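In this tutorial, we covered descriptive statistics with Python: the mean, median and mode as measures of central tendency, and the variance, standard deviation and range as measures of variability.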

PythonGeeks Team

The PythonGeeks Team offers industry-relevant Python programming tutorials, from web development to AI, ML and Data Science. With a focus on simplicity, we help learners of all backgrounds build their coding skills.
