The Beginner Programmer

Sunday, 22 February 2015

A plus: 2 pistons engine, get the timing right!

This is a nice add-on to my previous project: here below is a video where I simulate two pistons working together (moving a common crankshaft for instance). In order to do this effectively, one should calculate how to provide the right timing to each piston. In this example it’s quite easily done just by shifting the position of piston 2 by 180 degrees (or pi radians). A nice exercise would be trying to add more pistons.

Here is the animation on you tube:

Crankshaft connecting rod and piston mechanism simulation with Python

What is a crankshaft connecting rod and piston mechanism? It basically is a mechanical part which converts rotational motion into reciprocating motion. The applications are pretty vast, car engines are one of the most obvious I can recall right now.
It turns out the physics behind this mechanism is pretty interesting and easy to code, therefore I thought I could give it a go and try to make a simulation.
First of all we need to define the problem and solve it:

This is the basics from which we start: the crankshaft (rod ‘a’), the connecting rod (rod ‘b’) and the piston whose position is denoted with c. The motion of rod ‘a’ is a pure rotational motion while the motion of rod ‘b’ is somewhat more complex. The piston is performing a linear motion since it is constrained onto the real axis.
Now you might ask yourself why I am using the imaginary plane… It turns out that you can represent each vector (a,b and c) with a complex number, and this semplifies our problem into a more manageable system of two equations.
By using vector properties we can easily write:

The vector equation above states that the position of c is the sum of a and b as it can easily be seen on the picture. Now, we can get our real and imaginary coordinates in the plane by using Euler’s formula:

Note that every one of these parameters is a function of time and assuming that alpha(t), the length of the rods a and b are known, the system can be solved for c and beta. Furthermore by deriving the original equation we can obtain velocity and acceleration for each time t, since assuming alpha(t) is known then the derivatives of alpha(t) are known too (assuming alpha is derivable two times with respect to t).
Here are velocity and acceleration (vectors), respectively:

Now, for the sake of this example, I am assuming

however that’s not necessary. For instance one could try angular acceleration constant and so on.
Given my assumption,

is known and the initial angle can be assumed later. Now we can solve our first system for

and

. The solution looks something like this:

Ok, so far so good, now that we have the position at each time we just need to translate everything in a language that Python can understand: I used a class, however you can easily avoid using it since it is not really necessary

Once we have coded all what is above, we can create a My_mechanism instance and call the methods, be sure not to call all the methods at once since it will not run them all. Call a method at each time:

Here below are the videos I made using the animations of the mechanism:

Hope this was interesting.

Another Excel spreadsheet: Savings with LED lights

I recently uploaded a new Excel spreadsheet where I made a calculation of how much can be saved by replacing neon tubes with LED tubes.

As you might have noticed, I do like using Excel for calculations too, and I set up a page where some of my little projects and calculations with Excel are shared. Here it is: Excel page.

Tuesday, 17 February 2015

Some exercises with plots and matplotlib on currencies

EDIT: Apparently, some of the prices in the .csv files I used were missing, and this caused some problems with pandas dataframe since it replaces missing values with ‘na’. Should you encounter the same problem you could check every line with an if statement or use a method to replace the na. You can check the pandas’ documentation here.

Yesterday I was really bored and had some time which could be put to good use, therefore I decided to write a quick script to plot percentage change of currencies of some countries and compare them. The code I ended up with is a bit sloppy I guess, but that’s fine since I was primarily interested in improving my (limited, as of now) use of pandas and having fun.

First of all I gathered data from Quandl, namely the prices of the selected currencies in terms of dollars, that’s to say the value per day of every selected currency with respect to the dollar:

XYZ/USD (daily)

Gathering data from Quandl is really easy and fast using the Quandl API for Python. By the way, an API for R is available too.

I then computed the percentage change for 2 years and defined some plotting functions. Here is the main result the plotting function produces given a reference currency: it plots the percentage change for every currency (with respect to the dollar) against the percentage change of the reference currency. Some of the plotted data looks definitely weird, I wonder if I did something wrong or lost some information during the process.

Here is the code I used:

Disclaimer
This article is for educational purpose only. The author is not responsible for any consequence or loss due to inappropriate use. The article may well contain mistakes and errors. The data used might not be accurate. You should never use this article for purposes different from the educational one.

Saturday, 14 February 2015

CopulaClass a Python class for using copulas: a fitting example

As I have already said in my previous post Copulalib is really user-friendly, it is difficult to write something easier, however I thought I might give it a try.

This class is built around Copulalib and since it is to be used with 2-dimensional copulas, it implements plots for data visualization and some other functions such as:
-showAvailableCopulas() a method to show visually what copulas are included in the package
-generateCopula() a method to generate the copula
-printCorrelation() this method prints out Spearman’s rho, Kendall’s tau and the fitted parameter
-getSimulatedData() this method retrieves your simulated data from the copula assuming your original data is normally distributed. It would be nice to implement some tool which could figure out the most likely distribution of your data and then use it to get the simulated observations. Perhaps in the future I’ll do it.

Furthermore, the class does not mind if you feed in python lists of numpy arrays as it turns x and y in numpy arrays. Be careful however that it does not check if your lists/arrays are of the same length.

Anyway, as for the result of the testing script below, the fitting of the Frank copula to the data seems to have been successful, our simulated data seems to fit the real quite nicely:

Originally our data was (very) approximately normally distributed, with some sort of positive correlation as you can clearly see from the plots below

Here below you can see 1000 simulated pseudo-observations from the Frank copula

Below you can find the code I used to generate this simple model:

And here are the correlation details

Copulalib: How to use copulas in Python

When dealing with copulas, R is a better option in my opinion, however, what could you do if you wish to use Python instead? There’s a good starting package called Copulalib which you can easily download here.

The package is really simple to use and very user-friendly I would say, it basically handles everything (pseudo-observations etc…) once you fed in the raw data. There is a simple example of implementation in the download page. Once the data has been fed into the function, the fitting is done automatically and the following parameters are generated:
-Spearman’s rho
-Kendall’s tau
-Theta (the parameter of the copula)

As of my understanding of the package, only Frank, Gumbel and Clayton copulas are available, and this of course could be a limitation, however it is for sure a good start. Another point which is problematic is that multidimensional copulas seem not to be supported.

Now for the real complaints: for some reason once the sample size is larger than 300 observations per variable (say 300 x and 300 y) the script raises an error saying that x and y must be of the same dimensions which is strange since they are already of the same size. Anyway maybe I did something incorrect.

Here below is the short piece of code which generated the plots of the data and of the available copulas

Next I’m going to post a class for copulas.

Tuesday, 10 February 2015

How to fit a copula model in R

I have been working on this topic for a great amount of time and to be honest I find R documentation not that user-friendly as the documentation for most Python modules. Anyway the fact that copulas are not the easiest model to grasp has contributed to further delays too. But mainly the lack of examples and users of these models was the biggest obstacle. Then again, I might have looked in the wrong places, if you have any good resource to suggest please feel free to leave a comment. At the bottom of this page I’ll post some links that I found very useful.

If you are new to copulas, perhaps you’d like to start with an introduction to the Gumbel copula in R here.

The package I am going to be using is the copula package, a great tool for using copulas in R. You can easily install it through R-Studio.

The dataset
For the purpose of this example I used a simple dataset of returns for stock x and y (x.txt and y.txt). You can download the dataset by clicking here. The dataset is given merely for the purpose of this example.

First of all we need to load the data and convert it into a matrix format. Optionally one can plot the data. Remember to load the copula package with library(copula)

The plot of the data

Now we have our data loaded, we can clearly see that there is some kind of positive correlation.

The next step is the fitting. In order to fit the data we need to choose a copula model. The model should be chose based on the structure of data and other factors. As a first approximation, we may say that our data shows a mild positive correlation therefore a copula which can replicate such mild correlation should be fine. Be aware that you can easily mess up with copula models and this visual approach is not always the best option. Anyway I choose to use a normal copula from the package. The fitting process anyway is identical for the other types of copula.

Let’s fit the data

Note that the data must be fed through the function pobs() which converts the real observations into pseudo observations into the unit square [0,1].
Note also that we are using the “ml” method (maximum likelihood method) however other methods are available such as “itau”.

The parameter of the fitted copula, rho, in our case is equal to 0.7387409. Let’s simulate some pseudo observations

By plotting the pseudo and simulated observations we can see how the simulation with the copula matches the pseudo observations

This particular copula might not be the best since it shows a heavy tail correlation which is not that strong in our data, however it’s a start.

Optionally at the beginning we could have plot the data with the distribution for each random variable as below

And get this beautiful representation of our original dataset

Now for the useful documentation:

Copula package official documentation:
http://cran.r-project.org/web/packages/copula/copula.pdf

R blogger article on copulas
http://www.r-bloggers.com/copulas-made-easy/

An interesting question on CrossValidated
http://stats.stackexchange.com/questions/90729/generating-values-from-copula-using-copula-package-in-r

A paper on copulas and the copula package
http://www.jstatsoft.org/v21/i04/paper

That’s all for now.

Friday, 30 January 2015

How to estimate probability density function from sample data with Python

Suppose you have a sample of your data, maybe even a large sample, and you want to draw some conclusions based on its probability density function. Well, assuming the data is normally distributed, a basic thing to do is to estimate mean and standard deviation, since to fit a normal distribution those two are the only parameters you need.

However, sometimes you might not be that happy with the fitting results. That might be because your sample does not looks exactly bell shaped and you are wondering what would happened if the simulation you ran had taken this fact into account.

For instance, take stock returns, we know they are not normally distributed furthermore there is the “fat tails” problem to take into account. At least it would be interesting estimate a probability density function and then compare it to the parametric pdf you used before.

Here below is a simple example of how I estimated the pdf of a random variable using gaussian_kde from scipy

and here is the plot that we get

What we can see is that the estimated pdf is “more dense” around the mean and has some more density on the tails.

Let’s use this data to simulate a sample

For sure in this randomly generated sample there are some extreme values and they look close to the actual sample.

Hope this was interesting and useful.

Disclaimer
This article is for educational purpose only. The author is not responsible for any consequence or loss due to inappropriate use. The article may well contain mistakes and errors. The numbers used might not be accurate. You should never use this article for purposes different from the educational one.

Thursday, 29 January 2015

Image scaling

This is a script that has been resting on my desktop for months. Basically I would like to take my knowledge of image recognition with machine learning a step further and this is a simple step towards that direction.

The basic idea behind the script is to convert the image in a 0 and 1 format which is easier to process with machine learning algorithms. After I wrote the script, I found out there is a function in sklearn that does something similar, anyway I thought I could share the piece of code I wrote, perhaps it could be useful to someone.

This the image I started with

By using the following script we can convert the image in a more manageable format

This is the result we obtain, A and B respectively

Only numbers between 0 and 1 are used in the picture A above while the picture below uses only 0s and 1.

For sure we loose some information, however this is just a raw script, perhaps I can do better in the future. However, this is a rather complicated image (not to mention the fact that it’s the only sample I have got) the script should yield better results with simpler images and shapes.

Portfolio VaR with Python

After some posts on correlation (How to build a correlation matrix in Python) and variance (How to build a variance-covariance matrix in Python) today I’m posting an example of application: portfolio VaR. Please before you continue reading the article, make sure to read and understand the disclaimer at the bottom of the page. Thank you.

Portfolio VaR is the natural extension of the VaR risk indicator to a portfolio of stocks. The implementation is a little bit harder than the one or the two stock version since it involves calculations with matrices. If you are interested to get a first grasp on VaR you can check my first implementation in R here.

In this script I try to estimate parametric VaR of a portfolio of stocks using historical data.

A brief outline of the process I followed:

1)First I downloaded data from Quandl (they are a great source of free data by the way), then I reshaped the data for each stock into a .csv file to make it resemble the following pattern:

ticker,average return, standard deviation,initial price, R1,R2,R3,…,Rn

where Ri stands for Rth return and initial price is the most recent price.
For my analysis I used simple returns calculation, however it should be more appropriate to use logarithmic returns if possible. Next time I’ll probably be using log returns.

2) The script contains 9 functions:

2 functions are the “builder functions” (fillDicts and runSimulation) and they are used to load the data into the program

2 functions (correlationMatrix and varCovarMatrix) are auxiliary functions since they are used in the VaR function to estimate VaR.

A plot function has been added to plot (if needed) the simulation of future returns by the runSimulation function. The model used to simulate future prices is fairly simple and can be found here.

A function (portfolioExpectedReturn) to calculate portfolio expected returns based on historical data

And finally, two functions (simple_optimise_return and optimise_risk_return) to optimise the portfolio for high returns and the risk/return ratio, respectively.

3)Some preliminary notes and some assumptions.
The model is based on parametric VaR, therefore it is assuming that returns are normally distributed like a smooth Bell curve. Furthermore, the VaR is calculated for an holding period of 1 day.

For the sake of simplicity I calculated 99% VaR for each example and ran the optimisation functions for a portfolio of 10 stocks. The optimisation functions operate as follow:

simple_optimise_return yields a portfolio of n stocks with the highest average daily return in the sample while optimise_risk_return yields the 10 stocks with higher average return to standard deviation ratio. The idea behind the second optimisation is that you may want to get the highest return possible for unit of risk that you are bearing. Note that I am assuming that the total invested amount is equally distributed among the 10 stocks (10% of the total amount is invested in each stock).

While keeping in mind that the past usually does not predict with great accuracy the future, this script let me grasp an insight of what could have (possibly) been a good portfolio of stocks in the last years.

A sidenote: since this is just for fun, some data is actually missing and some stocks have longer time series since they were on the market long before than others.

Another sidenote: a great improvement to this script could be the calculation of the advantages (in terms of less risk) of diversification and optimisation for correlation (maximum and minimum correlation for instance)

Now I run the program

And after some computations here is the result that we get:

The results look interesting to me, especially since the VaR of the last portfolio is really low if compared with the first portfolio I used to test the VaR function and the non-weighted return seems pretty good too.

Disclaimer
This article is for educational purpose only. The author is not responsible for any consequence or loss due to inappropriate use. The article may well contain mistakes and errors. The numbers used might not be accurate. You should never use this article for purposes different from the educational one.

Tuesday, 27 January 2015

How to build a variance-covariance matrix in Python

Recently I wrote a script to calculate the VaR of a portfolio of stocks given historical prices and returns and, in order to do that, I had to study the basics of a variance-covariance matrix.

First of all, what is it?
The formal definition should sound something like this: a variance-covariance matrix is a matrix whose element in the i,j position is the covariance between the ith and jth elements of a vector of random variables. Informally, we may say that a variance-covariance matrix is the matrix of the covariances and since the covariance of a random variable with itself is its variance, the main diagonal of the matrix is filled with the variances of the random variables (hence the fancy name).

What is it useful for?
When calculating VaR, say for a single stock, you need to gather the standard deviation of the returns of your stock. However, when calculating the VaR of a portfolio, things get pretty messy pretty quick, since you cannot simply add or subtract variances. In a more intuitive way, we may say that the variance-covariance matrix generalizes the notion of variance to multiple dimensions.

So how can we build it in Python?
Here is a simple template of how I built mine. I used only two stocks, but in the script I talked about earlier I used 500 stocks, you can easily imagine what a mess it can be if you miss some numbers.

Before showing the code, let’s take a quick look at relationships between variance, standard deviation and covariance:

Standard deviation is the square root of the variance

Covariance can be obtained given correlation (check how to build a correlation matrix) and standard deviations.

Now we can look at the script:

And here is the output:

Hope this was interesting.

Monday, 26 January 2015

How to build a correlation matrix in Python

Suppose you have an array of data, stocks returns for instance, and you are wondering whether there is some degree of correlation. If you are using Python then it is easy to plug in your data and do some simple calculations.

Here is a simple example of a correlation matrix using Python. This matrix shows the correlation between 10 variables (10x10 matrix): as you can see, the correlation of a variable with itself is 1 as expected.

By entering more variables into the calculation the matrix becomes more difficult to interpret although it remains quite nice to visualize. Here below is an example of a 50x50 matrix.

The code I used is really simple, you can find it here below. Basically I built a random array with 10/50 data points and then used the numpy function “corrcoef” which handles all the troubles of building a matrix in an elegant way.

Thursday, 8 January 2015

Some plots with matplotlib

I recently updated matplotlib and could finally use the style attribute from the updated module. I thought it might be a good time to plot some data and try it out.

In order to get a grasp on how some emerging markets have performed so far, I thought I could plot some data on the real GDP (GDP adjusted for inflation) and perhaps compare it to a big part of the EU’s economy (Germany).

The data I used is from Quandl a great source of data you should check it out https://www.quandl.com/

Note: dotted lines indicates that the data is prospective (in this case obtained through linear regression).

Hope this was interesting

Sunday, 28 December 2014

Handwritten number recognition with Python (Machine Learning)

Here I am again with Machine Learning! This time I’ve achieved a great result though (for me at least!). By using another great dataset from UCI I was able to write a decent ML script which scored 95% in the testing part! I am really satisfied with the result.

Here is a sample of what the script should be able to read (in the example the number 9):

Some numbers, as the one above, were clear, others not so clear, since they were handwritten and then somehow (I do not know how) converted into digital images.

I had a hard time figuring out how the attributes in the dataset were coded but in the end I managed to figure it out! Sorriso I guess making up such a dataset was a really long and boring work.

Anyway here is my script and below you can find the result of the test on the last 50 numbers or so.

This time I got 89% success rate! Pretty good I guess! I wonder whether I could train Python to recognize other things, maybe faces or other! Well first of all I have to figure out how to convert a picture into readable numpy arrays. Readable for Python of course!! If you have any suggestion please do leave a comment! Sorriso

Here below is the citation of the source where I found the dataset “Semeion Handwritten Digits Data Set”:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

and

Semeion Research Center of Sciences of Communication, via Sersale 117, 00128 Rome, Italy
Tattile Via Gaetano Donizetti, 1-3-5,25030 Mairano (Brescia), Italy.

Hope this was interesting!

Poker hands recognition (Machine Learning)

A few months ago I downloaded the scikit-learn package for Python, for those of you who might not be aware, scikit-learn is a powerful yet very simple package useful to apply machine learning. Basically they give you “all” the algorithms you may need and you “only” have to get the data and make it ready to feed into the scikit-learn algorithm.

Anyway, I only recently had time to check it out and write some code, furthermore only recently I found a great site full of sample datasets dedicated to machine learning (check it out here). Since the data from this great site is more or less in the right shape for being ready to import (.txt files or similar) the only task left to the programmer is to put it into numpy arrays.

This is one of my first attempt at generating a “real” machine learning program.

I won’t post the pre-settings script since it is pretty boring, instead I’ll briefly describe the pre-setting: basically, using a simple script I’ve splitted the training samples from the database in two .txt files trainingP.txt and target.txt respectively. TrainingP.txt contains the questions (the hands of poker) and target.txt contains the answers (=the score, that’s to say poker, full house etc… coded into numbers according to the description given in the database description file).

Below is the ml script: it is composed of 3 parts: setting, training and testing

setting: import the training sets and fit them into numpy arrays of the correct shape

training: fit the data into the model using scikit-learn

testing: test the algorithm and check what it has learned so far! Some statistics are printed, for reference. Check the results below!

So far the best score is 56.4%, I wonder if I did everything correctly! Anyway soon I will post a script with better results and achievements!

Below is the citation of the source of the database, according to their citation policy.

The name of the dataset is Poker Hand and it is from:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Pages