Friday, 30 January 2015

How to estimate probability density function from sample data with Python

Suppose you have a sample of your data, maybe even a large sample, and you want to draw some conclusions based on its probability density function. Well, assuming the data is normally distributed, a basic thing to do is to estimate the mean and standard deviation, since those are the only two parameters you need to fit a normal distribution.
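For reference, here is a minimal sketch of that parametric fit (using a synthetic sample in place of real data; for a normal distribution the fitted parameters are simply the sample mean and standard deviation):

import numpy as np
from scipy.stats import norm

# Synthetic sample standing in for your data (any 1-D array of observations works)
sample = np.random.normal(loc=0.0005, scale=0.02, size=1000)

# Maximum-likelihood fit of a normal distribution: returns (mean, standard deviation)
mu, sigma = norm.fit(sample)
print("Estimated mean:", mu)
print("Estimated standard deviation:", sigma)

# The fitted parametric pdf can then be evaluated anywhere
print("Density at zero:", norm.pdf(0, mu, sigma))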

However, sometimes you might not be that happy with the fitting results. That might be because your sample does not look exactly bell shaped and you are wondering what would happen if the simulation you ran took this fact into account.

For instance, take stock returns: we know they are not normally distributed, and furthermore there is the “fat tails” problem to take into account. At the very least it would be interesting to estimate a probability density function and then compare it to the parametric pdf used before.
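As a quick sanity check before committing to the normal assumption, one can look at the excess kurtosis of the sample (a minimal sketch with heavy-tailed toy data, not the data used in this post; a normal sample has excess kurtosis close to 0, while fat-tailed returns show clearly positive values):

import numpy as np
from scipy.stats import kurtosis, skew

# Toy heavy-tailed "returns" drawn from a Student-t distribution
returns = 0.01 * np.random.standard_t(df=3, size=1000)

# Fisher's definition (the default): the normal distribution has excess kurtosis 0
print("Excess kurtosis:", kurtosis(returns))
print("Skewness:", skew(returns))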

Below is a simple example of how I estimated the pdf of a random variable using gaussian_kde from scipy:

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde, norm

plt.style.use("ggplot")

# Read the data file and pick one line of comma-separated values
file = open("data.txt", "r")
data = file.read().split("\n")
dataSplitted = data[6].split(",")
file.close()

d1 = []
for i in dataSplitted[5:2000]:
    d1.append(float(i))

tick = dataSplitted[0]
mu = float(dataSplitted[1])
stdv = float(dataSplitted[2])
d1_np = np.array(d1)

# Estimating the pdf and plotting
KDEpdf = gaussian_kde(d1_np)
x = np.linspace(-1.5, 1.5, 1500)
plt.plot(x, KDEpdf(x), label="KDE estimation", color="blue")
plt.hist(d1_np, normed=1, color="cyan", alpha=.8)  # on newer matplotlib use density=True
plt.plot(x, norm.pdf(x, mu, stdv), label="parametric distribution", color="red")
plt.legend()
plt.title("Returns: parametric and estimated pdf")
plt.show()

And here is the plot that we get:


[Figure 1: histogram of the returns with the KDE estimate and the fitted normal pdf]


What we can see is that the estimated pdf is denser around the mean and also carries some more density in the tails.


Let’s use this data to simulate a sample

# Calculate cumulative probability by using the integrate_box method
# Returns .50
print("Area below the curve from 0 to 10:", KDEpdf.integrate_box(0, 10))
# Returns 1
print("Area below the curve from -10 to 10:", KDEpdf.integrate_box(-10, 10))
# Finally this method randomly samples a dataset from the estimated pdf
print("Random simulated dataset from the estimated pdf", KDEpdf.resample(size=10)[0])
# Let's plot the actual sample and the random sample
plt.plot(KDEpdf.resample(size=100)[0], label="KDE returns")
plt.plot(d1_np[:100], label="Actual returns")
plt.title("Simulated returns from KDE estimation compared to actual return")
plt.legend()
plt.show()

This randomly generated sample does contain some extreme values, and overall it looks close to the actual sample.


[Figure 2: simulated returns from the KDE estimate compared to the actual returns]


Hope this was interesting and useful.


 




Disclaimer
This article is for educational purpose only. The author is not responsible for any consequence or loss due to inappropriate use. The article may well contain mistakes and errors. The numbers used might not be accurate. You should never use this article for purposes different from the educational one.

Thursday, 29 January 2015

Image scaling

This is a script that has been resting on my desktop for months. Basically I would like to take my knowledge of image recognition with machine learning a step further, and this is a simple step in that direction.

The basic idea behind the script is to convert the image into a 0/1 format, which is easier to process with machine learning algorithms. After I wrote the script I found out that sklearn provides a function that does something similar; anyway, I thought I could share the piece of code I wrote, since it might be useful to someone.
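I am not certain which sklearn function the author had in mind, but, as an assumption, sklearn.preprocessing offers minmax_scale and binarize, which achieve a very similar result to the script below:

import numpy as np
from sklearn.preprocessing import minmax_scale, binarize

# Hypothetical grayscale "image": any 2-D array of pixel intensities
img = np.array([[12., 200., 34.],
                [90., 255., 0.],
                [45., 60., 180.]])

# Rescale every pixel into [0, 1] (analogous to normalizeArray below)
scaled = minmax_scale(img.ravel()).reshape(img.shape)

# Threshold at 0.5 to obtain a pure 0/1 image (analogous to normalizeArrayInt below)
bw = binarize(scaled, threshold=0.5)
print(bw)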

This is the image I started with:

[Figure: original black and white image]

By using the following script we can convert the image into a more manageable format:

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# Load image
x = Image.open("image.jpg", "r")
# Convert black and white jpg image to array
y = np.asarray(x.getdata(), dtype=np.float64).reshape(x.size[1], x.size[0])

# Plot image
# plt.imshow(y, cmap=plt.cm.gray_r, interpolation="nearest")
# plt.show()

# Normalize the array within the range 0 and 1 for machine learning purposes
# Float allowed
def normalize(a):
    m = min(a)
    M = max(a)
    i = 0
    while i < len(a):
        a[i] = (a[i] - m) / (M - m)
        i += 1
    return a

def normalizeArray(b):
    for k in range(len(b)):
        b[k] = normalize(b[k])
    return b

# Normalize the array within the range 0 and 1 for machine learning purposes
# Float not allowed, only integers allowed
def normalizeInt(a):
    m = min(a)
    M = max(a)
    i = 0
    while i < len(a):
        temp = (a[i] - m) / (M - m)
        if temp < 0.5:
            a[i] = 0
        else:
            a[i] = 1
        i += 1
    return a

def normalizeArrayInt(b):
    for k in range(len(b)):
        b[k] = normalizeInt(b[k])
    return b

A = normalizeArray(y)
plt.imshow(A, cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()

B = normalizeArrayInt(y)
plt.imshow(B, cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()

This is the result we obtain (A and B respectively):


[Figures: normalized image A and binarized image B]


Only numbers between 0 and 1 are used in picture A above, while picture B below uses only 0s and 1s.



For sure we lose some information; however, this is just a raw script and perhaps I can do better in the future. This is also a rather complicated image (not to mention the fact that it’s the only sample I have got): the script should yield better results with simpler images and shapes.

Portfolio VaR with Python

After some posts on correlation (How to build a correlation matrix in Python) and variance (How to build a variance-covariance matrix in Python), today I’m posting an example of their application: portfolio VaR. Please, before you continue reading the article, make sure to read and understand the disclaimer at the bottom of the page. Thank you.

Portfolio VaR is the natural extension of the VaR risk indicator to a portfolio of stocks. The implementation is a little harder than the one- or two-stock version since it involves calculations with matrices. If you are interested in getting a first grasp of VaR you can check my first implementation in R here.

In this script I try to estimate parametric VaR of a portfolio of stocks using historical data.

[Figure: stock price simulation]

A brief outline of the process I followed:

1) First I downloaded data from Quandl (a great source of free data, by the way), then I reshaped the data for each stock into a .csv file following this pattern:

ticker,average return, standard deviation,initial price, R1,R2,R3,…,Rn

where Ri stands for the i-th return and the initial price is the most recent price.
For my analysis I used simple returns; however, it would be more appropriate to use logarithmic returns if possible. Next time I’ll probably use log returns.
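For reference, the two definitions differ as follows (standard formulas, not taken from the original data file): the simple return is

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}},$$

while the log return is

$$r_t = \ln\!\left(\frac{P_t}{P_{t-1}}\right) = \ln(1 + R_t),$$

and the two are close for small daily moves.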

2) The script contains 9 functions:

2 functions are the “builder functions” (fillDicts and runSimulation) and they are used to load the data into the program

2 functions (correlationMatrix and varCovarMatrix) are auxiliary functions since they are used in the VaR function to estimate VaR.

A plot function has been added to plot (if needed) the simulation of future returns by the runSimulation function. The model used to simulate future prices is fairly simple and can be found here.

A function (portfolioExpectedReturn) to calculate portfolio expected returns based on historical data

And finally, two functions (simple_optimise_return and optimise_risk_return) to optimise the portfolio for high returns and the risk/return ratio, respectively.

3) Some preliminary notes and assumptions.
The model is based on parametric VaR, therefore it assumes that returns are normally distributed like a smooth bell curve. Furthermore, the VaR is calculated for a holding period of 1 day.
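Concretely, given the vector of exposures in currency $x$, the variance-covariance matrix of daily returns $\Sigma$ and the standard normal quantile $z_{\alpha}$ (about 2.33 at 99% confidence), the 1-day parametric portfolio VaR computed by the VaR function in the script below is

$$\mathrm{VaR}_{\alpha} = z_{\alpha}\,\sqrt{x^{\top}\,\Sigma\,x}.$$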

For the sake of simplicity I calculated 99% VaR for each example and ran the optimisation functions for a portfolio of 10 stocks. The optimisation functions operate as follows:

simple_optimise_return yields the n stocks with the highest average daily return in the sample, while optimise_risk_return yields the 10 stocks with the highest average return to standard deviation ratio. The idea behind the second optimisation is that you may want to get the highest possible return per unit of risk you are bearing. Note that I assume the total invested amount is equally distributed among the 10 stocks (10% of the total amount is invested in each stock).

While keeping in mind that the past usually does not predict the future with great accuracy, this script gave me some insight into what could (possibly) have been a good portfolio of stocks in the last few years.

A sidenote: since this is just for fun, some data is actually missing and some stocks have longer time series than others because they have been on the market much longer.

Another sidenote: a great improvement to this script could be the calculation of the advantages (in terms of lower risk) of diversification, and optimisation for correlation (maximum and minimum correlation, for instance).

# This simple script gathers data of stocks and uses it to simulate future price for
# 1 year (255 days). Then it optimises the portfolio for risk and returns using risk
# measures such as VaR
#
# Note: tickers are all lower case, such as "aapl"
from scipy.special import ndtri
import matplotlib.pyplot as plt
import numpy as np
import itertools
import time
import math
import os

# Set chdir where the tickers lie (mypath is a placeholder defined by the user)
os.chdir(mypath)
# Get tickers as a list
stocks = os.listdir()

# Get data on stocks (mystockData is a placeholder defined by the user)
file = open(mystockData, "r")
data = file.read()
file.close()
lines = data.split("\n")

# A dictionary containing raw data as a string for each stock
stocks_dictionary = dict()
# Previsions for 255 days for each stock
prevision = dict()
# Correlation data
tickersCorr = []  # tickers
returnsCorr = []  # for correlation matrix
# Standard deviation dictionary
stdv = dict()
# Stocks average returns
meanReturns = dict()
# Stocks mu/stdv ratio
muStdvRatio = dict()

# Store data in dictionaries and run prevision of future prices
def runSimulation(ticker):
    mu = float(stocks_dictionary[ticker][0])
    sd = float(stocks_dictionary[ticker][1])
    try:
        # Storing initial price
        initialp = float(stocks_dictionary[ticker][2])
        # Calculating and storing standard deviation
        returnsData = np.array([float(i) for i in stocks_dictionary[ticker][3:-1]])
        sd = returnsData.std()
        stdv[ticker] = sd
        # Storing average returns
        meanReturns[ticker] = mu
        # Storing mu/stdv ratio
        muStdvRatio[ticker] = mu/sd
    except Exception as e:
        # In case initial price is not available, 100 is fixed as default
        initialp = 100
        print(initialp, ticker.upper(), "Exception: ", e)
    try:
        # Calculate correlation only for stocks with datapoints > 509 days
        if (len(stocks_dictionary[ticker])-2) >= 509:
            tickersCorr.append(ticker)
            returnsCorr.append(stocks_dictionary[ticker][3:505])
        else:
            pass
    except Exception as e:
        print("Exception triggered: ", e)
    # Run stock simulation for a year (255 days)
    prevision_price = []
    for t in range(255):
        try:
            n = np.random.normal(0, 1, 1)
            r = mu + n*sd
            price = initialp * math.pow(math.e, r)
            prevision_price.append(price)
            initialp = price
        except Exception as e:
            print("Exception triggered__2", e)
    prevision[ticker] = np.array(prevision_price)

def fillDicts():
    global tickersCorr, returnsCorr
    # Fill dictionaries with values
    for line in lines[1:]:
        try:
            lines_data = line.split(",")
            stocks_dictionary[lines_data[0]] = lines_data[1:]
        except Exception as e:
            print("Exception triggered", e)
    # Run simulation for each stock
    for stock in stocks_dictionary.keys():
        try:
            runSimulation(stock)
        except Exception as e:
            print("Exception triggered", e)
    # Convert returnsCorr into a numpy array
    returnsCorr = np.array(returnsCorr)

# This function builds a correlation matrix and if plott = True plots the heat-map
def correlationMatrix(plott=False):
    cmatrix = np.corrcoef(returnsCorr)
    if plott:
        plt.imshow(cmatrix, interpolation='nearest')
        plt.colorbar()
        plt.show()
        return cmatrix
    else:
        return cmatrix

# This function plots price prediction for the last 20 stocks in the sample
def plotStocks(index=1, ticker=0, single=False):
    plt.style.use("ggplot")
    if single:
        ticker = ticker.lower()
        returns = [float(i) for i in stocks_dictionary[ticker][3:-1]]
        prev = prevision[ticker]
        plt.plot(returns, label=ticker)
        plt.plot(prev, label=ticker)
    else:
        for i in stocks[-index:]:
            try:
                plt.plot(prevision[i], label=i)
            except:
                pass
        plt.xlim([0, 255])
    if index < 8 or single:
        plt.legend()
    plt.title("Stock price simulation")
    plt.show()

# This function builds the variance covariance matrix given a set of tickers (stocks)
def varCovarMatrix(stocksInPortfolio):
    cm = correlationMatrix()
    vcv = []
    for eachStock in stocksInPortfolio:
        row = []
        for ticker in stocksInPortfolio:
            if eachStock == ticker:
                variance = math.pow(stdv[ticker], 2)
                row.append(variance)
            else:
                cov = stdv[ticker]*stdv[eachStock] * cm[tickersCorr.index(ticker)][tickersCorr.index(eachStock)]
                row.append(cov)
        vcv.append(row)
    vcvmat = np.mat(vcv)
    return vcvmat

# This function calculates Value at Risk for the given portfolio
def VaR(stocksInPortfolio, stocksExposure, confidenceAlpha, Print=False):
    alpha = ndtri(confidenceAlpha)
    # Stock weights in portfolio
    weight = (np.array(stocksExposure)/sum(stocksExposure))*100
    # Variance-covariance matrix and exposure matrix
    vcvm = varCovarMatrix(stocksInPortfolio)
    vmat = np.mat(stocksExposure)
    # Variance of portfolio in euro/usd
    varianceRR = vmat * vcvm * vmat.T
    # Value at Risk (portfolio)
    var = alpha * np.sqrt(varianceRR)
    if Print:
        print("\nPortfolio total value: ", sum(stocksExposure))
        for s, v, w in zip(stocksInPortfolio, stocksExposure, weight):
            print(s.upper(), v, "usd/euro", round(w, 2), "% of portfolio")
        print("VaR: @ "+str(confidenceAlpha*100)+"% confidence:", var, "euro/usd")
        print("VaR: "+str(var[0][0]/sum(stocksExposure)*100)+"% of portfolio value.")
    return var

# Calculates expected return for the portfolio
def portfolioExpectedReturn(stocksInPortfolio, stocksExposure, alpha=0.99, weightRisk=False, Print=False):
    weight = (np.array(stocksExposure)/sum(stocksExposure))
    returnsPortfolio = []
    for stock in stocksInPortfolio:
        returnsPortfolio.append(meanReturns[stock])
    returnsPortfolio = np.array(returnsPortfolio)
    # Dot product: elementwise multiplication and then sum
    weightedReturn = weight.dot(returnsPortfolio)
    if weightRisk:
        varPortfolio = VaR(stocksInPortfolio, stocksExposure, alpha, Print=False)
        portfolioPercentage = varPortfolio/sum(stocksExposure)*100
        weightedRiskReturn = weightedReturn/portfolioPercentage
    if Print:
        print("\nPortfolio composition and expected returns (daily):")
        for s, r, w in zip(stocksInPortfolio, returnsPortfolio, weight):
            print(s.upper(), "expected return:", r*100, "%", "weight", w*100, "%")
        print("Portfolio percentage weighted return:", weightedReturn*100, "%")
    if weightRisk and Print:
        print("Portfolio return risk weighted:", weightedRiskReturn*100, "%")
        return weightedRiskReturn
    return weightedReturn

# Optimises portfolio for the most performing (return-wise) stocks
def simple_optimise_return(n, portfolio=False, Print=False):
    avgReturns = meanReturns.copy()
    ticks = []
    returns = []
    for i in range(n):
        bigReturn = max(avgReturns.values())
        # >5% average daily return is usually an outlier or some kind of error in the data
        while (bigReturn*100) > 5:
            ticker = [key for key in avgReturns.keys() if avgReturns[key] == bigReturn][0]
            del avgReturns[ticker]
            bigReturn = max(avgReturns.values())
        ticker = [key for key in avgReturns.keys() if avgReturns[key] == bigReturn][0]
        ticks.append(ticker)
        returns.append(avgReturns[ticker])
        del avgReturns[ticker]
    if Print:
        print("\n")
        print("##########################################################")
        print("OPTIMISING FOR RETURN")
        print("Daily returns")
        for t, r in zip(ticks, returns):
            print(t.upper(), r*100, "%")
    if portfolio:
        amounts = [1000 for i in range(n)]
        VaR(ticks, amounts, 0.99, Print=True)
        portfolioExpectedReturn(ticks, amounts, weightRisk=True, Print=True)

# Optimises portfolio with n stocks for mu/stdv ratio.
# The greater the ratio, the better
def optimise_risk_return(n, portfolio, Print):
    muStdv = muStdvRatio.copy()
    ticks = []
    ratios = []
    k = 0
    while k < n:
        bigRatio = max(muStdv.values())
        ticker = [key for key in muStdv.keys() if muStdv[key] == bigRatio][0]
        # The data for these stocks seems to be corrupted so I delete it
        if ticker in ["spls", "unm", "pgr"]:
            del muStdv[ticker]
        else:
            ticks.append(ticker)
            ratios.append(muStdv[ticker])
            del muStdv[ticker]
            k += 1
    if Print:
        print("\n")
        print("##########################################################")
        print("OPTIMISING FOR RETURN/RISK")
        print("Avg return to stdv ratios (return for unit of risk):")
        for t, r in zip(ticks, ratios):
            print(t.upper(), r)
    if portfolio:
        amounts = [1000 for i in range(n)]
        VaR(ticks, amounts, 0.99, Print=True)
        portfolioExpectedReturn(ticks, amounts, weightRisk=True, Print=True)

Now let's run the program:

# --------------------------------- RUN THE PROGRAM ------------------------------------
fillDicts()
# plotStocks(index=20)
# Testing the VaR function with a random portfolio of stocks
VaR(["aapl", "aes", "bks"], [10000, 20000, 25000], 0.99, Print=True)
# Optimising
simple_optimise_return(10, portfolio=True, Print=True)
optimise_risk_return(10, portfolio=True, Print=True)

And after some computations here is the result that we get:

# >>> ================================ RESTART ================================
# >>>
#
# Portfolio total value: 55000
# AAPL 10000 usd/euro 18.18 % of portfolio
# AES 20000 usd/euro 36.36 % of portfolio
# BKS 25000 usd/euro 45.45 % of portfolio
# VaR: @ 99.0% confidence: [[ 2928.10001036]] euro/usd
# VaR: [[ 5.3238182]]% of portfolio value.
#
# Portfolio composition and expected returns (daily):
# AAPL expected return: 0.163040987239 % weight 18.1818181818 %
# AES expected return: 0.0338781514838 % weight 36.3636363636 %
# BKS expected return: 0.0612855150617 % weight 45.4545454545 %
# Portfolio percentage weighted return: 0.0698201959747 %
# Portfolio return risk weighted: [[ 0.01311468]] %
#
#
# ##########################################################
# OPTIMISING FOR RETURN
# Daily returns
# ABBV 0.29866450622999996 %
# TSLA 0.263525046606 %
# NTRI 0.236777417156 %
# MNST 0.22788141423299998 %
# PCLN 0.20846570118 %
# ECYT 0.20556515452000002 %
# PETS 0.199334672597 %
# LIFE 0.197835842242 %
# NFLX 0.19741444376199999 %
# GMCR 0.189683647114 %
#
# Portfolio total value: 10000
# ABBV 1000 usd/euro 10.0 % of portfolio
# TSLA 1000 usd/euro 10.0 % of portfolio
# NTRI 1000 usd/euro 10.0 % of portfolio
# MNST 1000 usd/euro 10.0 % of portfolio
# PCLN 1000 usd/euro 10.0 % of portfolio
# ECYT 1000 usd/euro 10.0 % of portfolio
# PETS 1000 usd/euro 10.0 % of portfolio
# LIFE 1000 usd/euro 10.0 % of portfolio
# NFLX 1000 usd/euro 10.0 % of portfolio
# GMCR 1000 usd/euro 10.0 % of portfolio
# VaR: @ 99.0% confidence: [[ 366.11836467]] euro/usd
# VaR: [[ 3.66118365]]% of portfolio value.
#
# Portfolio composition and expected returns (daily):
# ABBV expected return: 0.29866450623 % weight 10.0 %
# TSLA expected return: 0.263525046606 % weight 10.0 %
# NTRI expected return: 0.236777417156 % weight 10.0 %
# MNST expected return: 0.227881414233 % weight 10.0 %
# PCLN expected return: 0.20846570118 % weight 10.0 %
# ECYT expected return: 0.20556515452 % weight 10.0 %
# PETS expected return: 0.199334672597 % weight 10.0 %
# LIFE expected return: 0.197835842242 % weight 10.0 %
# NFLX expected return: 0.197414443762 % weight 10.0 %
# GMCR expected return: 0.189683647114 % weight 10.0 %
# Portfolio percentage weighted return: 0.222514784564 %
# Portfolio return risk weighted: [[ 0.06077673]] %
#
#
# ##########################################################
# OPTIMISING FOR RETURN/RISK
# Avg return to stdv ratios (return for unit of risk):
# DLPH 0.11303676191
# PSX 0.0731996638171
# KRFT 0.0731795847888
# TSLA 0.0721904812827
# LYB 0.0704532407386
# MNST 0.0691306364895
# MA 0.0676123468044
# MJN 0.0673849545213
# AAPL 0.0655578125275
# DG 0.0626210193236
#
# Portfolio total value: 10000
# DLPH 1000 usd/euro 10.0 % of portfolio
# PSX 1000 usd/euro 10.0 % of portfolio
# KRFT 1000 usd/euro 10.0 % of portfolio
# TSLA 1000 usd/euro 10.0 % of portfolio
# LYB 1000 usd/euro 10.0 % of portfolio
# MNST 1000 usd/euro 10.0 % of portfolio
# MA 1000 usd/euro 10.0 % of portfolio
# MJN 1000 usd/euro 10.0 % of portfolio
# AAPL 1000 usd/euro 10.0 % of portfolio
# DG 1000 usd/euro 10.0 % of portfolio
# VaR: @ 99.0% confidence: [[ 186.59653898]] euro/usd
# VaR: [[ 1.86596539]]% of portfolio value.
#
# Portfolio composition and expected returns (daily):
# DLPH expected return: 0.173393787383 % weight 10.0 %
# PSX expected return: 0.134631079096 % weight 10.0 %
# KRFT expected return: 0.0798929650121 % weight 10.0 %
# TSLA expected return: 0.263525046606 % weight 10.0 %
# LYB expected return: 0.162308154081 % weight 10.0 %
# MNST expected return: 0.227881414233 % weight 10.0 %
# MA expected return: 0.165804553103 % weight 10.0 %
# MJN expected return: 0.110886753605 % weight 10.0 %
# AAPL expected return: 0.163040987239 % weight 10.0 %
# DG expected return: 0.101437112162 % weight 10.0 %
# Portfolio percentage weighted return: 0.158280185252 %
# Portfolio return risk weighted: [[ 0.08482482]] %
# >>>

The results look interesting to me: the VaR of the last portfolio is really low compared with the first portfolio I used to test the VaR function, and its non-weighted return seems pretty good too.





Disclaimer
This article is for educational purpose only. The author is not responsible for any consequence or loss due to inappropriate use. The article may well contain mistakes and errors. The numbers used might not be accurate. You should never use this article for purposes different from the educational one.

Tuesday, 27 January 2015

How to build a variance-covariance matrix in Python

Recently I wrote a script to calculate the VaR of a portfolio of stocks given historical prices and returns and, in order to do that, I had to study the basics of a variance-covariance matrix.

First of all, what is it?
The formal definition should sound something like this: a variance-covariance matrix is a matrix whose element in the i,j position is the covariance between the ith and jth elements of a vector of random variables. Informally, we may say that a variance-covariance matrix is the matrix of the covariances and since the covariance of a random variable with itself is its variance, the main diagonal of the matrix is filled with the variances of the random variables (hence the fancy name).

$$\Sigma = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n) \end{pmatrix}$$

What is it useful for?
When calculating VaR, say for a single stock, you need to gather the standard deviation of the returns of your stock. However, when calculating the VaR of a portfolio, things get messy pretty quickly, since you cannot simply add or subtract variances. In a more intuitive way, we may say that the variance-covariance matrix generalizes the notion of variance to multiple dimensions.
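As a reminder of why variances cannot simply be added (standard results, not specific to this post): for two stocks,

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y),$$

and more generally, for a vector of weights (or exposures) $w$, the portfolio variance is $w^{\top}\Sigma w$, where $\Sigma$ is the variance-covariance matrix.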

So how can we build it in Python?
Here is a simple template of how I built mine. I used only two stocks, but in the script I talked about earlier I used 500 stocks; you can easily imagine what a mess it can be if you miss some numbers.

Before showing the code, let’s take a quick look at relationships between variance, standard deviation and covariance:

Standard deviation is the square root of the variance

$$\sigma_X = \sqrt{\mathrm{Var}(X)}$$

Covariance can be obtained given correlation (check how to build a correlation matrix) and standard deviations.

$$\mathrm{Cov}(X, Y) = \rho_{X,Y}\,\sigma_X\,\sigma_Y$$

Now we can look at the script:

import numpy as np
import math

stdv = {"ABC": 0.3, "XYZ": 0.2}
tickersCorr = ["ABC", "XYZ"]
# Assuming a 0.5 correlation, here is the correlation matrix
c = [[1, 0.5], [0.5, 1]]

def varCovarMatrix(stocksInPortfolio):
    cm = np.array(c)
    vcv = []
    for eachStock in stocksInPortfolio:
        row = []
        for ticker in stocksInPortfolio:
            if eachStock == ticker:
                variance = math.pow(stdv[ticker], 2)
                row.append(variance)
            else:
                cov = stdv[ticker]*stdv[eachStock] * cm[tickersCorr.index(ticker)][tickersCorr.index(eachStock)]
                row.append(cov)
        vcv.append(row)
    vcvmat = np.mat(vcv)
    return vcvmat

print(varCovarMatrix(["ABC", "XYZ"]))

And here is the output:

# Output
# >>> ================================ RESTART ================================
# >>>
# [[ 0.09 0.03]
# [ 0.03 0.04]]
# >>>

Hope this was interesting.

Monday, 26 January 2015

How to build a correlation matrix in Python

Suppose you have an array of data, stock returns for instance, and you are wondering whether there is some degree of correlation. If you are using Python then it is easy to plug in your data and do some simple calculations.

Here is a simple example of a correlation matrix using Python. This matrix shows the correlation between 10 variables (10x10 matrix): as you can see, the correlation of a variable with itself is 1 as expected.
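Recall that the correlation coefficient is just the covariance rescaled by the standard deviations, which is also why every diagonal entry equals 1:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y}, \qquad \rho_{X,X} = \frac{\mathrm{Var}(X)}{\sigma_X^2} = 1.$$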

[Figure: 10x10 correlation matrix heat map]

By entering more variables into the calculation the matrix becomes more difficult to interpret, although it remains quite nice to visualize. Here below is an example of a 50x50 matrix.

[Figure: 50x50 correlation matrix heat map]

The code I used is really simple; you can find it below. Basically I built a random array with 10/50 data points and then used the numpy function “corrcoef”, which handles all the trouble of building the matrix in an elegant way.

import numpy as np
import matplotlib.pyplot as plt

b = []
for i in range(50):
    a = np.random.normal(5, i + 1, 10)
    b.append(a)

c = np.array(b)
cm = np.corrcoef(c)
plt.imshow(cm, interpolation='nearest')
plt.colorbar()
plt.show()

Thursday, 8 January 2015

Some plots with matplotlib

I recently updated matplotlib and could finally use the style attribute from the updated module. I thought it might be a good time to plot some data and try it out.

In order to get a grasp on how some emerging markets have performed so far, I thought I could plot some data on the real GDP (GDP adjusted for inflation) and perhaps compare it to a big part of the EU’s economy (Germany).

The data I used is from Quandl, a great source of data; you should check it out: https://www.quandl.com/

[Figure: real GDP]

[Figure: GDP growth]

[Figure: performance relative to Germany]

Note: dotted lines indicate that the data is prospective (in this case obtained through linear regression).
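The post does not show how the prospective part was produced; as an assumption, a dotted extrapolation of that kind could be obtained with a simple linear fit along these lines (hypothetical data and variable names):

import numpy as np
import matplotlib.pyplot as plt

plt.style.use("ggplot")

# Hypothetical real GDP index observed over past years
years = np.arange(2000, 2014)
gdp = 100 * 1.03 ** (years - 2000) + np.random.normal(0, 2, len(years))

# Fit a straight line to the history and extrapolate a few years ahead
slope, intercept = np.polyfit(years, gdp, 1)
future = np.arange(2013, 2018)
forecast = slope * future + intercept

plt.plot(years, gdp, label="observed")
plt.plot(future, forecast, "--", label="prospective (linear fit)")
plt.legend()
plt.show()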

Hope this was interesting.