# How to determine Pearson's correlation in Python?

This recipe helps you determine Pearson's correlation in Python.

Pearson's correlation coefficient is an important statistical measure that we need often. It describes the strength of the linear relationship between two variables and ranges from -1 to 1. We can calculate it manually, but that takes time.

So this is the recipe on how we can determine Pearson's correlation in Python.

```
import matplotlib.pyplot as plt
import statistics as stats
import pandas as pd
import random
import seaborn as sns
```

We have imported the libraries we need: statistics (as stats), pandas, random, seaborn and matplotlib.

We have created an empty dataframe and then added two columns of random numbers to it.
```
df = pd.DataFrame()
df["x"] = random.sample(range(1, 100), 75)
df["y"] = random.sample(range(1, 100), 75)
print(); print(df.head())
```
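Because the columns are filled with `random.sample`, every run produces different numbers and therefore a different coefficient. If reproducible output is wanted, one option is to seed the generator first. A minimal sketch, assuming plain lists instead of dataframe columns; the helper name `make_sample` and the seed value 42 are illustrative choices, not part of the recipe:

```python
import random

def make_sample(seed):
    # Seed the generator so the same numbers come back on every run
    random.seed(seed)
    # 75 distinct integers drawn from 1..99, as in the recipe
    return random.sample(range(1, 100), 75)

x_first = make_sample(42)   # 42 is an arbitrary, illustrative seed
x_second = make_sample(42)
print(x_first == x_second)  # True
```

Note that `random.sample` draws without replacement, so each column holds 75 distinct values.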

We have defined a function that works in the following steps.

- We calculate the length of x and the mean and standard deviation of x
- We calculate the mean and standard deviation of y
- We calculate the standard score of each observation by dividing the difference between the observation and the mean by the standard deviation. We do this for both x and y
- We sum the products of the paired standard scores and divide by n - 1
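The standard scores are then combined into the coefficient. As a sketch in standard notation (the symbols are introduced here for illustration: $\bar{x}$ and $s_x$ correspond to `mean_x` and `standard_deviation_x` in the code, and $z_{x_i}$ to the entries of `standard_score_x`):

```latex
z_{x_i} = \frac{x_i - \bar{x}}{s_x}, \qquad
z_{y_i} = \frac{y_i - \bar{y}}{s_y}, \qquad
r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i}\, z_{y_i}
```

Here $s_x$ and $s_y$ are sample standard deviations (with $n-1$ in the denominator), which is what `statistics.stdev` computes.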

```
def pearson(x, y):
    # Number of paired observations
    n = len(x)
    standard_score_x = []
    standard_score_y = []

    # Mean and sample standard deviation of x
    mean_x = stats.mean(x)
    standard_deviation_x = stats.stdev(x)

    # Mean and sample standard deviation of y
    mean_y = stats.mean(y)
    standard_deviation_y = stats.stdev(y)

    # Standard score of every observation
    for observation in x:
        standard_score_x.append((observation - mean_x) / standard_deviation_x)
    for observation in y:
        standard_score_y.append((observation - mean_y) / standard_deviation_y)

    # Sum of the products of paired standard scores, divided by n - 1
    return sum(i * j for i, j in zip(standard_score_x, standard_score_y)) / (n - 1)
```
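One way to gain confidence in a hand-written function like this is to run it on small inputs whose correlation is known, and to compare it with an equivalent form of the formula (sample covariance divided by the product of the sample standard deviations). The following is a minimal, self-contained sketch using only the standard library; the function names and test data are illustrative and not part of the recipe:

```python
import statistics as stats

def pearson_zscore(x, y):
    """Pearson's r as the sum of products of standard scores over n - 1."""
    n = len(x)
    mean_x, mean_y = stats.mean(x), stats.mean(y)
    sd_x, sd_y = stats.stdev(x), stats.stdev(y)
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

def pearson_covariance(x, y):
    """Equivalent form: sample covariance over the product of sample stdevs."""
    n = len(x)
    mean_x, mean_y = stats.mean(x), stats.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return cov / (stats.stdev(x) * stats.stdev(y))

xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]     # perfectly increasing with xs
ys_down = [10, 8, 6, 4, 2]   # perfectly decreasing with xs
ys_mixed = [2, 1, 4, 3, 5]   # mostly increasing

r_up = pearson_zscore(xs, ys_up)        # approximately 1.0
r_down = pearson_zscore(xs, ys_down)    # approximately -1.0
r_mixed = pearson_zscore(xs, ys_mixed)  # approximately 0.8
print(r_up, r_down, r_mixed)
```

In practice, `pandas.Series.corr` and `numpy.corrcoef` compute the same coefficient without hand-written code, so the manual version is mainly useful for understanding the calculation.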

```
result = pearson(df.x, df.y)
print()
print("Pearson's correlation coefficient is: ", result)
# Scatter plot of x against y with a fitted regression line
sns.lmplot(x="x", y="y", data=df, fit_reg=True)
plt.show()
```

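The output looks like this (the values differ from run to run because the data is sampled randomly):

        x   y
    0  96  62
    1   1  81
    2  27  73
    3  55  26
    4  83  93

    Pearson's correlation coefficient is:  -0.006387074440361877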
