Analyzing Correlations with Python: Correlation Grid

Ed
Jan 9, 2020

I’m going to build a correlation grid for the most actively traded futures in the US. Full code here.

1. First steps

Setting up a Python development environment

Pipenv automatically creates and manages a virtualenv for your projects, and adds or removes packages from your Pipfile as you install or uninstall them. More info about pipenv here.

$ pip install pipenv

Once it’s installed we run:

$ pipenv shell

This activates the project’s virtualenv and, if one doesn’t exist yet, creates a file called Pipfile that records the packages and versions of the local environment.

Installing packages

$ pipenv install pandas matplotlib quandl

Once everything is installed, create a new Python file and import pandas, matplotlib and quandl.

Creating a Quandl account

Go to Quandl.com and create a free account. You can follow this tutorial without an account, but signing up is worth it for the higher rate limits:

“Anonymous users have a limit of 20 calls per 10 minutes and 50 calls per day while Authenticated users have a limit of 300 calls per 10 seconds, 2,000 calls per 10 minutes and a limit of 50,000 calls per day.”- Source.

Once you have created your account, copy the API key and paste it right below the imports. I have also added the Quandl codes for the futures that I’m going to use.
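As a rough sketch of what that configuration could look like: the key below is a placeholder, and the three codes are only an illustrative subset that assumes Quandl's continuous-futures (CHRIS) naming, not the full list of 19 used in this post. The helper name configure_quandl is hypothetical.

```python
# Placeholder: paste your own Quandl API key here.
API_KEY = "YOUR_API_KEY"

# Illustrative subset only (assumed codes, not the full list used in the post).
# Keys double as table names later on.
FUTURES = {
    "ES": "CHRIS/CME_ES1",  # E-mini S&P 500
    "CL": "CHRIS/CME_CL1",  # Crude oil
    "GC": "CHRIS/CME_GC1",  # Gold
}


def configure_quandl(api_key=API_KEY):
    """Register the API key with the quandl package and return the module."""
    import quandl  # imported here so the dictionary above is usable without it

    quandl.ApiConfig.api_key = api_key
    return quandl


print("n futures:", len(FUTURES))
```

Keeping the codes in a dictionary means the short symbol can be reused as the table name and as the column label in the final grid.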

2. Creating the database to store our data

Creating the database

The following function creates one table per future. To check that everything is correct, uncomment the check lines as indicated in the code; the number of tables should match the number of futures:

n tables:  [(19,)]
n futures: 19
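The post does not say which database engine is used, so the sketch below assumes SQLite through the standard-library sqlite3 module, with an in-memory database and three hypothetical table names. It shows one way to create a table per future and produce a check in the same shape as the output above.

```python
import sqlite3

# Hypothetical table names; the post uses 19 futures.
FUTURES = ["ES", "CL", "GC"]


def create_tables(conn, names):
    """Create one table per future with the columns stored in this tutorial."""
    for name in names:
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{name}" ('
            "Date TEXT PRIMARY KEY, Open REAL, High REAL, Low REAL, Close REAL)"
        )
    conn.commit()


def count_tables(conn):
    """Return the number of tables, as a list holding one single-value row."""
    return conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # assumption: use a file path to persist data
create_tables(conn, FUTURES)
print("n tables: ", count_tables(conn))
print("n futures:", len(FUTURES))
```

Because of IF NOT EXISTS, running the function a second time leaves existing tables untouched.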

Inserting the data into each table

Now it’s time to insert the data for each instrument into the database. I have downloaded 20 years of data. For each future, I’m storing Date, Open, High, Low and Close (Settle).
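A minimal sketch of this step, again assuming SQLite. It also assumes that quandl.get returns a date-indexed DataFrame with Open, High, Low and Settle columns, and that the history starts in 2000. The function names download and store are hypothetical, and the two sample rows are made up so the example runs offline.

```python
import sqlite3

import pandas as pd


def download(code, start_date="2000-01-01"):
    """Fetch one dataset from Quandl (needs the quandl package and network access)."""
    import quandl  # imported here so the rest of the sketch runs offline

    return quandl.get(code, start_date=start_date)


def store(conn, table, prices):
    """Append Date, Open, High, Low and Close (taken from Settle) to `table`."""
    rows = prices[["Open", "High", "Low", "Settle"]].rename(columns={"Settle": "Close"})
    rows.index = pd.to_datetime(rows.index).strftime("%Y-%m-%d")
    rows.index.name = "Date"
    rows.to_sql(table, conn, if_exists="append")


# Offline demonstration: two made-up rows standing in for download(...).
sample = pd.DataFrame(
    {
        "Open": [1.0, 2.0],
        "High": [2.0, 3.0],
        "Low": [0.5, 1.5],
        "Settle": [1.5, 2.5],
        "Volume": [10, 20],
    },
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)
conn = sqlite3.connect(":memory:")
store(conn, "ES", sample)
print(conn.execute('SELECT * FROM "ES"').fetchall())
```

With a real key configured, the call would look like store(conn, "ES", download("CHRIS/CME_ES1")), where the code is an assumed example.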

3. Compiling DataFrames

Read the data from the database into a pandas DataFrame. The correlations are computed on the Close of each instrument, so discard the remaining columns and add each close to the compiled DataFrame that will be used to visualize the correlations.

The result should be a DataFrame with one column per SQL table, that is, one per future.
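One way to sketch this compilation step, assuming SQLite tables with Date and Close columns. The function name compile_closes is hypothetical, and the prices in the demonstration are made-up placeholders.

```python
import sqlite3

import pandas as pd


def compile_closes(conn, tables):
    """Read the Close of every table and line them up by date, one column per table."""
    closes = []
    for table in tables:
        frame = pd.read_sql(
            f'SELECT Date, Close FROM "{table}"',
            conn,
            index_col="Date",
            parse_dates=["Date"],
        )
        closes.append(frame["Close"].rename(table))
    # Outer join on the dates; a date missing for one future becomes NaN.
    return pd.concat(closes, axis=1).sort_index()


# Offline demonstration with made-up prices.
conn = sqlite3.connect(":memory:")
placeholder_rows = {
    "ES": [("2020-01-02", 3250.0), ("2020-01-03", 3230.0)],
    "CL": [("2020-01-02", 61.0), ("2020-01-03", 63.0)],
}
for table, rows in placeholder_rows.items():
    conn.execute(f'CREATE TABLE "{table}" (Date TEXT PRIMARY KEY, Close REAL)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)

compiled = compile_closes(conn, ["ES", "CL"])
print(compiled)
```

Renaming each Close series to its table name is what makes the final grid readable, since those names become the axis labels.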

4. Visualizing Data

Take the compiled DataFrame and compute the correlation matrix of the percentage changes of the closes. Then format the title and the labels.

I chose Matplotlib for the visualization and the Red-Yellow-Green colormap for the heatmap; feel free to choose any other combination here.
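The plotting step could be sketched roughly as follows. This is not necessarily the layout used for the image in this post: it assumes imshow with Matplotlib's RdYlGn colormap, a hypothetical function name correlation_grid, and random-walk series in place of real closes.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def correlation_grid(closes):
    """Plot the correlation matrix of the percentage changes as a heatmap."""
    corr = closes.pct_change().corr()
    fig, ax = plt.subplots(figsize=(8, 8))
    # Fix the color scale to [-1, 1] so red and green mean the same on every run.
    image = ax.imshow(corr.to_numpy(), cmap="RdYlGn", vmin=-1, vmax=1)
    fig.colorbar(image, ax=ax)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_yticks(range(len(corr.index)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticklabels(corr.index)
    ax.set_title("Correlations of % change")
    fig.tight_layout()
    return corr, fig


# Synthetic random-walk prices, only to make the sketch runnable.
rng = np.random.default_rng(0)
closes = pd.DataFrame(
    100 + rng.normal(size=(50, 3)).cumsum(axis=0),
    columns=["ES", "CL", "GC"],
)
corr, fig = correlation_grid(closes)
print(corr.round(2))
```

Call plt.show() or fig.savefig(...) afterwards to display or save the grid.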

Run the code

If it’s the first time you execute the code, uncomment the first two lines to create the database and insert data.

The result looks like this:
