Here is the animation on you tube:
Shared thoughts, experiments, simulations and simple ideas with Python, R and other languages
Sunday, 22 February 2015
A plus: 2 pistons engine, get the timing right!
Here is the animation on you tube:
Crankshaft connecting rod and piston mechanism simulation with Python
It turns out the physics behind this mechanism is pretty interesting and easy to code, therefore I thought I could give it a go and try to make a simulation.
First of all we need to define the problem and solve it:
This is the basics from which we start: the crankshaft (rod ‘a’), the connecting rod (rod ‘b’) and the piston whose position is denoted with c. The motion of rod ‘a’ is a pure rotational motion while the motion of rod ‘b’ is somewhat more complex. The piston is performing a linear motion since it is constrained onto the real axis.
Now you might ask yourself why I am using the imaginary plane… It turns out that you can represent each vector (a,b and c) with a complex number, and this semplifies our problem into a more manageable system of two equations.
By using vector properties we can easily write:
The vector equation above states that the position of c is the sum of a and b as it can easily be seen on the picture. Now, we can get our real and imaginary coordinates in the plane by using Euler’s formula:
Note that every one of these parameters is a function of time and assuming that alpha(t), the length of the rods a and b are known, the system can be solved for c and beta. Furthermore by deriving the original equation we can obtain velocity and acceleration for each time t, since assuming alpha(t) is known then the derivatives of alpha(t) are known too (assuming alpha is derivable two times with respect to t).
Here are velocity and acceleration (vectors), respectively:
Now, for the sake of this example, I am assuming
Given my assumption,
Ok, so far so good, now that we have the position at each time we just need to translate everything in a language that Python can understand: I used a class, however you can easily avoid using it since it is not really necessary
import matplotlib.animation as animation | |
import matplotlib.pyplot as plt | |
import visual as v | |
import numpy as np | |
import time | |
plt.style.use('ggplot') | |
class My_mechanism(object): | |
# The init function | |
def __init__(self,a,b,alpha_dot,alpha0=0): | |
self.a = a # Rod a | |
self.b = b # Rod b | |
self.alpha_dot = alpha_dot # Angular speed of rod a in rad/s | |
self.alpha0 = alpha0 # Initial angular position of rod a | |
self.k = np.array([0.01]) # Initial time for animation | |
self.c_position = [] # Piston position for animation | |
self.conn_rod_angular_speed = [] # Classes for the graphs animation | |
self.c_speed = [] | |
# Angular position of rod a as a function of time | |
def alpha(self,t): | |
if not all(t): | |
raise ValueError('Each time t must be greater than 0') | |
alpha = self.alpha0 + self.alpha_dot*t | |
return alpha | |
# Angular position of rod b as a function of time | |
def beta(self,t): | |
beta = np.arcsin((self.a/self.b)*np.sin(self.alpha(t))) | |
return beta | |
# Piston position of rod a as a function of time | |
def piston_position(self,t): | |
c_y = np.zeros(t.shape[0]) | |
c_x = self.a*np.cos(self.alpha(t)) + self.b*np.cos(-np.arcsin((self.a/self.b)*np.sin(self.alpha(t)))) | |
return c_x,c_y | |
# Rod a position in the complex plane | |
def rod_a_position(self,t): | |
a_y = self.a*np.sin(self.alpha(t)) | |
a_x = self.a*np.cos(self.alpha(t)) | |
return a_x,a_y | |
# Angular speed of rod b | |
def beta_dot(self,t): | |
beta_dot = -(self.a*self.alpha_dot*np.cos(self.alpha(t)))/(self.b*np.cos(self.beta(t))) | |
return beta_dot | |
# Piston speed | |
def c_dot(self,t): | |
c_dot = self.a*self.alpha_dot*(np.tan(self.beta(t))*np.cos(self.alpha(t))-np.sin(self.alpha(t))) | |
return c_dot | |
# Animation using vpython | |
def animation_vp(self,t0,tn,points): | |
if (tn-t0) < 0: | |
raise ValueError('tn must be greater than t0') | |
if (tn < 0) or (tn < 0) or (points < 0): | |
raise ValueError('tn, t0, and points must be greater than 0') | |
t = np.linspace(t0,tn,num=points) | |
beta = self.beta(t) | |
a_x,a_y = self.rod_a_position(t) | |
c_x,c_y = self.piston_position(t) | |
crank = v.arrow(pos=(0,0,0),axis=(1,0,0),color=(1,1,1),length=m.a,make_trail=False) | |
connectingRod = v.arrow(pos=(0,0,0),axis=(-1,0,0),color=(0,0,1),length=m.b,make_trail=False) | |
piston = v.cylinder(pos=(c_x[0],c_y[0],0),radius=5,color=(1,0,0),length=10,make_trail=True) | |
ball = v.sphere(pos=(0,0,0),radius=3,color=(1,1,1)) | |
theta0 = beta[1] | |
for x_a,y_a,x_c,y_c,b in zip(a_x,a_y,c_x,c_y,beta): | |
crank.axis = (x_a,y_a,0) | |
connectingRod.pos = (x_a,y_a,0) | |
ball.pos = (x_a,y_a,0) | |
# Change in angular position | |
dtheta = theta0-b | |
theta0 = b | |
connectingRod.rotate(angle=dtheta,axis=(0,0,1)) | |
connectingRod.pos = (x_c,y_c,0) | |
piston.pos = (x_c,y_c,0) | |
time.sleep(0.1) | |
# Animation using matplotlib | |
def animation_m(self): | |
fig = plt.figure() | |
ax1 = fig.add_subplot(1,1,1) | |
def animate(i): | |
a_x,a_y = self.rod_a_position(self.k) | |
c_x,c_y = self.piston_position(self.k) | |
ax1.clear() | |
# Connecting rod (b) | |
plt.xlim(-15,35) | |
plt.ylim(-15,15) | |
ax1.plot([0,a_x[0]],[0,a_y[0]],linewidth=3,color='blue') | |
# Crankshaft rod (a) | |
ax1.plot([a_x[0],c_x[0]],[a_y[0],c_y[0]],linewidth=3,color='green') | |
# Piston (c) | |
ax1.plot([0,c_x[0]],[0,c_y[0]],'o',markersize=20,color='red') | |
self.k += 0.1 | |
ani = animation.FuncAnimation(fig,animate,interval=50) | |
plt.show() | |
# Matplotlib animation with graphs | |
def animation_m_plus(self): | |
fig = plt.figure() | |
ax1 = fig.add_subplot(3,1,1) | |
ax2 = fig.add_subplot(3,1,2) | |
ax3 = fig.add_subplot(3,1,3) | |
def animate(i): | |
a_x,a_y = self.rod_a_position(self.k) | |
c_x,c_y = self.piston_position(self.k) | |
ax1.clear() | |
ax2.clear() | |
ax3.clear() | |
# Crankshaft rod (a) | |
ax1.plot([0,a_x[0]],[0,a_y[0]],linewidth=3,color='blue') | |
# Connecting rod (b) | |
ax1.plot([c_x[0],a_x[0]],[c_y[0],a_y[0]],linewidth=3,color='green') | |
# Piston | |
ax1.plot([0,c_x[0]],[0,c_y[0]],'o',markersize=20,color='red') | |
ax1.set_xlim(-15,35) | |
ax1.set_ylim(-11,11) | |
ax1.set_title('Crankshaft, connecting rod and piston mechanism') | |
# Piston position | |
self.c_position.append(c_x) | |
ax2.plot(self.c_position,color='green') | |
ax2.set_xlim(0,300) | |
ax2.set_ylim(8,32) | |
ax2.set_title('Piston position') | |
# Piston speed | |
self.c_speed.append(self.c_dot(self.k)) | |
ax3.plot(self.c_speed,color='green') | |
ax3.plot([0,300],[0,0],linewidth=1,color='black') | |
ax3.set_xlim(0,300) | |
ax3.set_ylim(-15,15) | |
ax3.set_title('Piston speed') | |
self.k += 0.1 | |
ani = animation.FuncAnimation(fig,animate,interval=50) | |
plt.show() |
Once we have coded all what is above, we can create a My_mechanism instance and call the methods, be sure not to call all the methods at once since it will not run them all. Call a method at each time:
m = My_mechanism(10,20,1) | |
# m.animation_vp(0.01,100,400) | |
# m.animation_m() | |
m.animation_m_plus() |
Here below are the videos I made using the animations of the mechanism:
Another Excel spreadsheet: Savings with LED lights
I recently uploaded a new Excel spreadsheet where I made a calculation of how much can be saved by replacing neon tubes with LED tubes.
As you might have noticed, I do like using Excel for calculations too, and I set up a page where some of my little projects and calculations with Excel are shared. Here it is: Excel page.
Tuesday, 17 February 2015
Some exercises with plots and matplotlib on currencies
EDIT: Apparently, some of the prices in the .csv files I used were missing, and this caused some problems with pandas dataframe since it replaces missing values with ‘na’. Should you encounter the same problem you could check every line with an if statement or use a method to replace the na. You can check the pandas’ documentation here.
Yesterday I was really bored and had some time which could be put to good use, therefore I decided to write a quick script to plot percentage change of currencies of some countries and compare them. The code I ended up with is a bit sloppy I guess, but that’s fine since I was primarily interested in improving my (limited, as of now) use of pandas and having fun.
First of all I gathered data from Quandl, namely the prices of the selected currencies in terms of dollars, that’s to say the value per day of every selected currency with respect to the dollar:
XYZ/USD (daily)
Gathering data from Quandl is really easy and fast using the Quandl API for Python. By the way, an API for R is available too.
I then computed the percentage change for 2 years and defined some plotting functions. Here is the main result the plotting function produces given a reference currency: it plots the percentage change for every currency (with respect to the dollar) against the percentage change of the reference currency. Some of the plotted data looks definitely weird, I wonder if I did something wrong or lost some information during the process.
Here is the code I used:
import pandas as pd | |
import numpy as np | |
import matplotlib.pyplot as plt | |
plt.style.use('ggplot') | |
currency_needed = ['EUR','CNY','THB','VND','MYR','KHR','XOF'] | |
country = ['Europe','China','Thailand','Vietnam','Malesia','Cambodia','Senegal'] | |
print('Available currencies:') | |
for i,k in zip(country,currency_needed): | |
print(i,k+"/USD") | |
data_against_USD = dict() | |
data_against_USD_pchange = dict() | |
USD_against_data = dict() | |
countries = dict() | |
def loadDB_against_USD(currency): | |
d = pd.DataFrame() | |
db = d.from_csv(currency+'_USD'+'.csv') | |
return db | |
def loadDB_USD_against_other(currency): | |
d = pd.DataFrame() | |
db = d.from_csv('USD_'+currency+'.csv') | |
return db | |
for i in currency_needed: | |
data_against_USD[i+'_USD'] = loadDB_against_USD(i) | |
countries[i+'_USD'] = country[currency_needed.index(i)] | |
USD_against_data['USD_'+i] = loadDB_USD_against_other(i) | |
for key in data_against_USD.keys(): | |
data_against_USD_pchange[key] = data_against_USD[key].pct_change() | |
# Plot everything in a single graph | |
def plt_pchange_all(): | |
for key in data_against_USD_pchange.keys(): | |
lab = key + ' ' + countries[key] + ' percentage change' | |
plt.plot(data_against_USD_pchange['EUR_USD'].index,data_against_USD_pchange[key]*100,label=lab) | |
plt.legend() | |
plt.show() | |
# Plot 2 currency percentage change | |
def plt_two_data(c,d,typ="scatter"): | |
c1 = c.upper() + '_USD' | |
c2 = d.upper() + '_USD' | |
if typ == "scatter": | |
plt.scatter(data_against_USD_pchange[c1]['rate']*100 ,data_against_USD_pchange[c2]['rate']*100) | |
elif typ == "line": | |
plt.plot(data_against_USD_pchange['EUR_USD'].index,data_against_USD_pchange[c1]['rate']*100,color="blue",label=c1) | |
plt.plot(data_against_USD_pchange['EUR_USD'].index,data_against_USD_pchange[c2]['rate']*100,color="red",label=c2) | |
plt.title(c1 + " and " + c2 + " percentage change") | |
plt.legend() | |
plt.show() | |
# Plot every currency percentage change against | |
# the percentage change of a reference currency | |
def plot_reference(reference): | |
fig = plt.figure() | |
ref_key = reference + '_USD' | |
ref_lab = reference + ' ' + country[currency_needed.index(reference)] | |
i = 1 | |
for key in data_against_USD_pchange.keys(): | |
fig.add_subplot(3,3,i) | |
lab = key + ' ' + countries[key] | |
plt.plot(data_against_USD_pchange['EUR_USD'].index,data_against_USD_pchange[key]*100,label=lab,color='blue') | |
plt.plot(data_against_USD_pchange['EUR_USD'].index,data_against_USD_pchange[ref_key]*100,label=ref_lab,color='red') | |
i += 1 | |
plt.legend() | |
plt.savefig('image.png') | |
plt.show() |
Disclaimer
This article is for educational purpose only. The author is not responsible for any consequence or loss due to inappropriate use. The article may well contain mistakes and errors. The data used might not be accurate. You should never use this article for purposes different from the educational one.
Saturday, 14 February 2015
CopulaClass a Python class for using copulas: a fitting example
As I have already said in my previous post Copulalib is really user-friendly, it is difficult to write something easier, however I thought I might give it a try.
This class is built around Copulalib and since it is to be used with 2-dimensional copulas, it implements plots for data visualization and some other functions such as:
-showAvailableCopulas() a method to show visually what copulas are included in the package
-generateCopula() a method to generate the copula
-printCorrelation() this method prints out Spearman’s rho, Kendall’s tau and the fitted parameter
-getSimulatedData() this method retrieves your simulated data from the copula assuming your original data is normally distributed. It would be nice to implement some tool which could figure out the most likely distribution of your data and then use it to get the simulated observations. Perhaps in the future I’ll do it.
Furthermore, the class does not mind if you feed in python lists of numpy arrays as it turns x and y in numpy arrays. Be careful however that it does not check if your lists/arrays are of the same length.
Anyway, as for the result of the testing script below, the fitting of the Frank copula to the data seems to have been successful, our simulated data seems to fit the real quite nicely:
Originally our data was (very) approximately normally distributed, with some sort of positive correlation as you can clearly see from the plots below
Here below you can see 1000 simulated pseudo-observations from the Frank copula
Below you can find the code I used to generate this simple model:
# Copula class | |
import numpy as np | |
import matplotlib.pyplot as plt | |
from copulalib.copulalib import Copula | |
from scipy.stats import norm | |
plt.style.use('ggplot') | |
class copulaClass(object): | |
# Available copulas | |
families = ['frank','clayton','gumbel'] | |
def __init__(self,x,y): | |
# Information about the data | |
self.x = x | |
self.y = y | |
self.mu_x = np.array(x).mean() | |
self.mu_y = np.array(y).mean() | |
self.std_x = np.array(x).std() | |
self.std_y = np.array(y).std() | |
# Information about the copula | |
self.cop = 0 | |
self.famil = 0 | |
self.tau_ = 0 | |
self.sr_ = 0 | |
self.theta_ = 0 | |
def showAvailableCopulas(self): | |
"""This function plots available copulas | |
to give you a visual insight """ | |
# Random simulated data | |
x = np.random.normal(size=250) | |
y = 2.5*x + np.random.normal(size=250) | |
fig = plt.figure() | |
# Frank | |
frank = Copula(x,y,family='frank') | |
uf,vf = frank.generate_uv(1000) | |
fig.add_subplot(2,2,1) | |
plt.scatter(uf,vf,marker='.',color='blue') | |
plt.ylim(0,1) | |
plt.xlim(0,1) | |
plt.title('Frank copula') | |
# Clayton | |
clayton = Copula(x,y,family='clayton') | |
uc,vc = clayton.generate_uv(1000) | |
fig.add_subplot(2,2,2) | |
plt.scatter(uc,vc,marker='.',color='red') | |
plt.ylim(0,1) | |
plt.xlim(0,1) | |
plt.title('Clayton copula') | |
# Gumbel | |
gumbel = Copula(x,y,family='gumbel') | |
ug,vg = gumbel.generate_uv(1000) | |
fig.add_subplot(2,2,3) | |
plt.scatter(ug,vg,marker='.',color='green') | |
plt.ylim(0,1) | |
plt.xlim(0,1) | |
plt.title('Gumbel copula') | |
plt.show() | |
def plotData(self): | |
"""This function plots the data you've fed in | |
to give you a visual insight of the correlation | |
structure and the marginal distributions """ | |
x = self.x | |
y = self.y | |
fig = plt.figure() | |
fig.add_subplot(2,2,1) | |
plt.hist(x,bins=20,color='green',alpha=0.8,align='mid') | |
plt.title('X variable distribution') | |
fig.add_subplot(2,2,3) | |
plt.scatter(x,y,marker="o",alpha=0.8) | |
fig.add_subplot(2,2,4) | |
plt.title('Joint X,Y') | |
plt.hist(y,bins=20,orientation='horizontal',color='red',alpha=0.8,align='mid') | |
plt.title('Y variable distribution') | |
plt.show() | |
def generateCopula(self,fam,plot=False): | |
"""Generate the copula and optionally plot it""" | |
if fam.lower() not in self.families: | |
raise ValueError('Please select a valid family name') | |
# Copula generation | |
self.famil = fam.lower() | |
c = Copula(self.x,self.y,family=fam.lower()) | |
self.cop = c | |
# Parameters are estimated and set | |
self.tau_ = c.tau | |
self.sr_ = c.sr | |
self.theta_ = c.theta | |
if plot: | |
u,v = c.generate_uv(1000) | |
plt.scatter(u,v,marker='.',color='red') | |
plt.xlim(0,1) | |
plt.ylim(0,1) | |
plt.title(fam.lower().capitalize()+' Copula 1000 pseudo observations') | |
plt.show() | |
def printCorrelation(self): | |
# Print details about correlations and parameters | |
print("#################################################") | |
print("Correlation details:") | |
print("Correlation index range: [-1,1] [negative,positive]") | |
print("Kendall's tau:",self.tau_) | |
print("Spearman's rho:",self.sr_) | |
print("Parameter of the copula (theta):",self.theta_) | |
print("#################################################") | |
def generatePseudoObs(self,n=1000,plot=False): | |
"""This function generates and returns simulated pseudo observations """ | |
if self.famil == 0: | |
raise ValueError('Generate copula first') | |
u,v = self.cop.generate_uv(n) | |
if plot: | |
plt.scatter(u,v,marker='.',color='red') | |
plt.xlim(0,1) | |
plt.ylim(0,1) | |
plt.title(self.famil.capitalize()+' Copula 1000 pseudo observations') | |
plt.show() | |
return u,v | |
def getSimulatedData(self,dist='normal',n=1000): | |
"""This function simulates real observations assuming that your data | |
is normally distributed. Optionally you can edit this function and | |
choose the distribution that fits your data best""" | |
if dist.lower() == 'normal': | |
u,v = self.generatePseudoObs(n=n) | |
x = norm.ppf(u,loc=self.mu_x,scale=self.std_x) | |
y = norm.ppf(v,loc=self.mu_y,scale=self.std_y) | |
return x,y | |
#------------------------------Run the program--------------------------------- | |
# Simulate some data | |
x = np.random.normal(loc=0,scale=0.4,size=250) | |
y = 2.5*x + np.random.normal(size=250) | |
# Genereate a copulaClass instance | |
a = copulaClass(x,y) | |
# Visualize your data | |
a.plotData() | |
# Should you want to check available copulas run the line below | |
# a.showAvailableCopulas() | |
# Fit the copula (say a Frank copula) | |
a.generateCopula('frank',plot=False) | |
# Plot pseudo observations | |
a.generatePseudoObs(plot=True) | |
# Print parameters | |
a.printCorrelation() | |
# Simulate data and plot real observations versus simulated obs. | |
c,d = a.getSimulatedData() | |
plt.scatter(c,d,color="red",label="Simulated data",marker='.') | |
plt.scatter(x,y,color="blue",label="Real data",marker='.') | |
plt.legend() | |
plt.title("Fitted Frank copula: simulated data versus real data") | |
plt.show() |
And here are the correlation details
##################################################### | |
# Correlation details: | |
# Correlation index range: [-1,1] [negative,positive] | |
# Kendall's tau: 0.4859437751 | |
# Spearman's rho: 0.665549080785 | |
# Parameter of the copula (theta): 5.48651123047 | |
##################################################### |
Copulalib: How to use copulas in Python
When dealing with copulas, R is a better option in my opinion, however, what could you do if you wish to use Python instead? There’s a good starting package called Copulalib which you can easily download here.
The package is really simple to use and very user-friendly I would say, it basically handles everything (pseudo-observations etc…) once you fed in the raw data. There is a simple example of implementation in the download page. Once the data has been fed into the function, the fitting is done automatically and the following parameters are generated:
-Spearman’s rho
-Kendall’s tau
-Theta (the parameter of the copula)
As of my understanding of the package, only Frank, Gumbel and Clayton copulas are available, and this of course could be a limitation, however it is for sure a good start. Another point which is problematic is that multidimensional copulas seem not to be supported.
Now for the real complaints: for some reason once the sample size is larger than 300 observations per variable (say 300 x and 300 y) the script raises an error saying that x and y must be of the same dimensions which is strange since they are already of the same size. Anyway maybe I did something incorrect.
Here below is the short piece of code which generated the plots of the data and of the available copulas
import numpy as np | |
import matplotlib.pyplot as plt | |
from copulalib.copulalib import Copula | |
plt.style.use('ggplot') | |
def generateData(): | |
global x,y | |
x = np.random.normal(size=250) | |
y = 2.5*x + np.random.normal(size=250) | |
# Data and histograms | |
def plotData(): | |
global x,y | |
fig = plt.figure() | |
fig.add_subplot(2,2,1) | |
plt.hist(x,bins=20,color='green',alpha=0.8,align='mid') | |
plt.title('X variable distribution') | |
fig.add_subplot(2,2,3) | |
plt.scatter(x,y,marker="o",alpha=0.8) | |
fig.add_subplot(2,2,4) | |
plt.title('Joint X,Y') | |
plt.hist(y,bins=20,orientation='horizontal',color='red',alpha=0.8,align='mid') | |
plt.title('Y variable distribution') | |
plt.show() | |
def generateCopulas(): | |
global x,y | |
fig = plt.figure() | |
frank = Copula(x,y,family='frank') | |
uf,vf = frank.generate_uv(1000) | |
fig.add_subplot(2,2,1) | |
plt.scatter(uf,vf,marker='.',color='blue') | |
plt.ylim(0,1) | |
plt.xlim(0,1) | |
plt.title('Frank copula') | |
clayton = Copula(x,y,family='clayton') | |
uc,vc = clayton.generate_uv(1000) | |
fig.add_subplot(2,2,2) | |
plt.scatter(uc,vc,marker='.',color='red') | |
plt.ylim(0,1) | |
plt.xlim(0,1) | |
plt.title('Clayton copula') | |
gumbel = Copula(x,y,family='gumbel') | |
ug,vg = gumbel.generate_uv(1000) | |
fig.add_subplot(2,2,3) | |
plt.scatter(ug,vg,marker='.',color='green') | |
plt.ylim(0,1) | |
plt.xlim(0,1) | |
plt.title('Gumbel copula') | |
plt.show() | |
#------------------------------------------------------------------------------- | |
# Run the functions | |
generateData() | |
plotData() | |
generateCopulas() |
Next I’m going to post a class for copulas.
Tuesday, 10 February 2015
How to fit a copula model in R
I have been working on this topic for a great amount of time and to be honest I find R documentation not that user-friendly as the documentation for most Python modules. Anyway the fact that copulas are not the easiest model to grasp has contributed to further delays too. But mainly the lack of examples and users of these models was the biggest obstacle. Then again, I might have looked in the wrong places, if you have any good resource to suggest please feel free to leave a comment. At the bottom of this page I’ll post some links that I found very useful.
If you are new to copulas, perhaps you’d like to start with an introduction to the Gumbel copula in R here.
The package I am going to be using is the copula package, a great tool for using copulas in R. You can easily install it through R-Studio.
The dataset
For the purpose of this example I used a simple dataset of returns for stock x and y (x.txt and y.txt). You can download the dataset by clicking here. The dataset is given merely for the purpose of this example.
First of all we need to load the data and convert it into a matrix format. Optionally one can plot the data. Remember to load the copula package with library(copula)
x <- read.table("x.txt") | |
y <- read.table("y.txt") | |
mat <- matrix(nrow=100,ncol=2) | |
for(i in 1:100) | |
{ | |
mat[i,1] <- x[,1][i] | |
mat[i,2] <- y[,1][i] | |
} | |
# Actual observations | |
plot(mat[,1],mat[,2],main="Returns",xlab="x",ylab="y",col="blue") |
The plot of the data
Now we have our data loaded, we can clearly see that there is some kind of positive correlation.
The next step is the fitting. In order to fit the data we need to choose a copula model. The model should be chose based on the structure of data and other factors. As a first approximation, we may say that our data shows a mild positive correlation therefore a copula which can replicate such mild correlation should be fine. Be aware that you can easily mess up with copula models and this visual approach is not always the best option. Anyway I choose to use a normal copula from the package. The fitting process anyway is identical for the other types of copula.
Let’s fit the data
# Normal copula | |
normal.cop <- normalCopula(dim=2) | |
fit.cop<- fitCopula(normal.cop,pobs(mat),method="ml") | |
# Coefficients | |
rho <- coef(fit.cop) | |
print(rho) |
Note that the data must be fed through the function pobs() which converts the real observations into pseudo observations into the unit square [0,1].
Note also that we are using the “ml” method (maximum likelihood method) however other methods are available such as “itau”.
The parameter of the fitted copula, rho, in our case is equal to 0.7387409. Let’s simulate some pseudo observations
By plotting the pseudo and simulated observations we can see how the simulation with the copula matches the pseudo observations
# Pseudo observations | |
p_obs <- pobs(mat) | |
plot(p_obs[,1],p_obs[,2],main="Pseudo/simulated observations: BLUE/RED",xlab="u",ylab="v",col="blue") | |
# Simulate data | |
set.seed(100) | |
u1 = rCopula(500,normalCopula(coef(fit.cop),dim=2)) | |
points(u1[,1],u1[,2],col="red") |
This particular copula might not be the best since it shows a heavy tail correlation which is not that strong in our data, however it’s a start.
Optionally at the beginning we could have plot the data with the distribution for each random variable as below
# Plot data with histograms | |
xhist <- hist(mat[,1], breaks=30, plot=FALSE) | |
yhist <- hist(mat[,2], breaks=30, plot=FALSE) | |
top <- max(c(xhistcounts, yhistcounts)) | |
xrange <- c(-4,4) | |
yrange <- c(-6,6) | |
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE) | |
par(mar=c(3,3,1,1)) | |
plot(mat[,1], mat[,2], xlim=xrange, ylim=yrange, xlab="", ylab="") | |
par(mar=c(0,3,1,1)) | |
barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0) | |
par(mar=c(3,0,1,1)) | |
barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE) |
And get this beautiful representation of our original dataset
Now for the useful documentation:
Copula package official documentation:
http://cran.r-project.org/web/packages/copula/copula.pdf
R blogger article on copulas
http://www.r-bloggers.com/copulas-made-easy/
An interesting question on CrossValidated
http://stats.stackexchange.com/questions/90729/generating-values-from-copula-using-copula-package-in-r
A paper on copulas and the copula package
http://www.jstatsoft.org/v21/i04/paper
That’s all for now.