Sunday, 28 December 2014

Handwritten number recognition with Python (Machine Learning)

Here I am again with Machine Learning! This time I’ve achieved a great result though (for me at least!). By using another great dataset from UCI I was able to write a decent ML script which scored 95% in the testing part! I am really satisfied with the result.

Here is a sample of what the script should be able to read (in the example the number 9):


Some numbers, like the one above, were clear; others were less so, since they were handwritten and then somehow (I do not know how) converted into digital images.

I had a hard time figuring out how the attributes in the dataset were coded but in the end I managed to figure it out! I guess putting together such a dataset was really long and boring work.

Anyway here is my script and below you can find the result of the test on the last 50 numbers or so.

This time I got 89% success rate! Pretty good I guess! I wonder whether I could train Python to recognize other things, maybe faces or something else! Well, first of all I have to figure out how to convert a picture into readable numpy arrays. Readable for Python of course!! If you have any suggestions please do leave a comment!

Here below is the citation of the source where I found the dataset “Semeion Handwritten Digits Data Set”:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository []. Irvine, CA: University of California, School of Information and Computer Science.


Semeion Research Center of Sciences of Communication, via Sersale 117, 00128 Rome, Italy
Tattile Via Gaetano Donizetti, 1-3-5,25030 Mairano (Brescia), Italy.


Hope this was interesting!

Poker hands recognition (Machine Learning)

A few months ago I downloaded the scikit-learn package for Python. For those of you who might not be aware, scikit-learn is a powerful yet very simple package for applying machine learning. Basically it gives you “all” the algorithms you may need and you “only” have to get the data and make it ready to feed into the scikit-learn algorithm.

Anyway, I only recently had time to check it out and write some code. Furthermore, only recently I found a great site full of sample datasets dedicated to machine learning (check it out here). Since the data from this great site is more or less ready to import (.txt files or similar), the only task left to the programmer is to put it into numpy arrays.

This is one of my first attempts at writing a “real” machine learning program.

I won’t post the pre-settings script since it is pretty boring. Instead I’ll briefly describe the pre-setting: using a simple script, I split the training samples from the database into two .txt files, trainingP.txt and target.txt. trainingP.txt contains the questions (the poker hands) and target.txt contains the answers (the score, that is to say poker, full house etc., coded into numbers according to the database description file).

Below is the ml script: it is composed of 3 parts: setting, training and testing

setting: import the training sets and fit them into numpy arrays of the correct shape

training: fit the data into the model using scikit-learn

testing: test the algorithm and check what it has learned so far! Some statistics are printed, for reference. Check the results below!

So far the best score is 56.4%. I wonder if I did everything correctly! Anyway, soon I will post a script with better results!

Below is the citation of the source of the database, according to their citation policy.

The name of the dataset is Poker Hand and it is from:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository[]. Irvine, CA: University of California, School of Information and Computer Science.

Friday, 26 December 2014

Fluid dynamics: pressure drop modelling

Likely the last post of the year, on a rather intriguing subject: fluid dynamics.

Recently I’ve been asked to model a series of pipes through which water flows out of a reservoir. The task seems quite easy at first, however many factors must be taken into account. Pressure drop due to friction is one of the things that need to be addressed. Therefore, I read up a little on the subject and developed a simple model to take friction into account. Pressure drop depends on both the speed of the water and the pipe’s diameter.

Here is a short piece of code to calculate pressure drop. Note that we are using the average speed of the water. For each average speed we can draw a line that shows the pressure drop versus the pipe’s diameter.
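To give an idea of the kind of calculation involved, here is a minimal sketch based on the Darcy-Weisbach equation, Δp = f · (L/D) · ρ · v² / 2. It is only an illustration and not necessarily the formula behind the plot: the function name, the constant friction factor (0.02), the 1 m pipe length and the water density are all assumptions made for the example.

```python
def pressure_drop(speed, diameter, length=1.0, friction=0.02, density=1000.0):
    """Pressure drop in Pa along a straight pipe (Darcy-Weisbach).

    speed    -- average water speed in m/s
    diameter -- inner pipe diameter in m
    length   -- pipe length in m (assumed 1 m)
    friction -- Darcy friction factor (assumed constant, hypothetical value)
    density  -- fluid density in kg/m^3 (water)
    """
    return friction * (length / diameter) * density * speed ** 2 / 2.0


# One line of pressure drop versus diameter for each average speed
diameters = [0.05, 0.10, 0.20]
for speed in (0.5, 1.0, 2.0):
    drops = [pressure_drop(speed, d) for d in diameters]
    print(speed, [round(dp, 1) for dp in drops])
```

In a real pipe the friction factor itself depends on the speed, the diameter and the roughness of the pipe, so a constant value is the crudest possible choice.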


Considering the fact that this was my first attempt I think the results are pretty good when compared with an actual diagram such as this (note that on this graph you’ll need to multiply pressure drop by 10 to compare pears to pears and apples to apples):


Here is the code I used. I should mention, I had a hard time finding out how to do this and furthermore I couldn’t find a comprehensive lecture where fluid dynamics is analysed in depth. Online there is a lot of material on Bernoulli’s equation and Torricelli’s special case but it is definitely harder to find out how to model real fluids, friction and fluid dynamics in general. Hope someone will upload some good material.

Saturday, 20 December 2014

And here comes the whole Solar System! ;)

Modelling Sun, Earth and Moon only? Naah that’s too naive. Let’s add some other planets!
The code in my previous article was rather ugly. It did not follow an OOP style and it was mostly not reusable. This time, leaving aside the content of the code, I’m quite satisfied with how I structured it: a class for planets and a class to handle time. Sure, perhaps this is not the smartest way of using a class, however for someone like me, who programs mainly as a hobby, that’s fine I guess.
This piece of code models and simulates the whole Solar System (without the Sun, you can add it though!) and gives you a hint of how slowly or quickly the planets orbit the Sun. Note that I use large time steps (10000 seconds or similar). This model can surely be improved. For instance the colours could be improved and, jokes aside, perhaps elliptical orbits could be implemented instead of circular ones. To do that I still need to work on my knowledge of gravity and its implementation in Python.

Anyway I’ll keep you posted. That’s it for now, here’s the code of the model and the YouTube video I made (but couldn’t load in the post, not sure why)

[EDIT]: I've finally managed to load the video here too! Enjoy

Earth Moon system orbiting around the Sun and VPython

Hello everyone! A few days ago my friend, as a joke, dared me to reproduce the Moon orbiting around the Earth. Well, I was a little afraid the model could be a little slow using matplotlib and its animation functions. Fortunately I discovered VPython, a great 3D package: click here to get to their homepage. VPython, in my opinion, is one of the simplest yet most brilliant packages for 3D graphics. It’s shockingly easy to learn: it took me less than an hour to learn the basics and start doing something interesting. Furthermore, if you’re used to working with vectors, VPython provides simple functions for vector calculations.
Check the model below (Earth and Moon orbiting around the Sun). I also made a video:

Hope this was entertaining! Next comes the whole Solar System!

Wednesday, 17 December 2014

Gini coefficient, concentration measurement: an implementation in R

Another subject we took in the statistics class was the Gini index.

The Gini index (or ratio, or coefficient) measures how concentrated a transferable quantity, such as income or stocks, is.

For example, say you are evaluating a company and you’d like to know more about how the shares are divided among the shareholders. You could use Gini index for that!
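To make the idea concrete, here is a small Python sketch of one common way of writing the Gini coefficient for n sorted amounts: G = 2·Σ(rank·x) / (n·Σx) − (n + 1)/n. The function name and the shareholdings are invented for the example, and the R version in this post may use a slightly different (for instance normalised) definition.

```python
def gini(values):
    """Gini coefficient of a list of non-negative amounts (0 = equal shares)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive amount")
    # Rank-weighted sum over the sorted amounts (ranks start at 1)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


print(gini([100, 100, 100, 100]))   # hypothetical equal shareholdings
print(gini([0, 0, 0, 400]))         # one shareholder owns everything
```

With this definition an equal split gives 0, while a single owner among n shareholders gives (n − 1)/n, which is 0.75 for the four shareholders above.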

I’ve calculated the index using R and random data you can download here. In case you’d like to know more about Gini index check here.


Here is my simple R implementation of the index.

Here below are the results



It looks like the data I used shows a 24% concentration. Cool!

Pearson’s chi-squared test: a simple implementation in R (test of independence)

Hi everyone! Today I found my old statistics workbooks and started wondering what I could get out of them.

Statistics can look pretty boring when using only pen and paper, since many times you’re just making a lot of repetitive calculations. However, the results of those calculations might of course be interesting.

Pearson’s chi-squared test is a simple test and, as my professor put it, one of the first tests you should perform when analysing a two-way (double entry) table. You might be asking yourself why. Well, the answer is that this test looks for a connection between the two variables in the table. As you might know, connection is different from dependence. Dependence is a stronger bond and kind of a “one-way bond”, whilst connection is sort of a “two-way bond”. If there is little or no connection, you might want to change the variables in play, since there is little interest in performing further tests on the same set of data.

Check on Wikipedia for more information on the theory.
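For readers who prefer Python, the statistic itself can be sketched in a few lines: for each cell, compare the observed count with the count expected under independence (row total × column total / grand total) and add up (observed − expected)² / expected. The function names and the counts below are made up for the illustration and are not the data used in this post.

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = float(sum(row_totals))
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


observed = [[10, 20], [20, 10]]    # made-up counts
print(chi_squared(observed), degrees_of_freedom(observed))
```

The statistic is then compared with the chi-squared distribution with the given degrees of freedom; the larger the value, the stronger the evidence of a connection.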

Here is the R code I used to implement the test on the raw data at the bottom of the page.

Below is the output, it seems there’s a feeble connection between the two phenomena we studied.


The data I used for the simulation are available to be downloaded here.

Saturday, 13 December 2014

Here again Java

Java, Java and Java again! Here is my latest script in Java, which I wrote today.

The script is quite simple, it is composed of three classes and lets you handle some basic calculations with lines and parabolas on the real plane and calculate the area under the curve (definite integrals).

Optionally you can plot the data.


Here is the main class with a simple example:

And the integral class which handles all the background operations

Optionally, should you want to output something to a .txt file, here is the class that can help in doing that:

Bye until the next project!

Thursday, 11 December 2014

Java again, in English this time though! ;)

As I said earlier in my article in German, I studied Java only once in a while, since I had so many other things to do and study. Anyway here is another bit of code I’ve written in order to practice what I’ve learnt.

This script is a simple program to keep track of your inventory. Frankly speaking, an Excel spreadsheet would have been faster and easier but where’s the fun in it?

There are three classes: the main class, the products class and the inventory class. In the products class lies all the code related to the object “product” while in the inventory class you can find all the code needed to run the inventory. Finally the main class executes the main program (as it should, I guess).

For the sole purpose of practising Java, I created a food inventory with three items. However you can easily add as many inventories as you like and figure out a way to speed up the process of adding items to each inventory.

Perhaps in the future I’ll add the option to print out a bar chart or something similar, as soon as I have time to study more Java. For now, I find it quite interesting and demanding, since it asks every time for the type of the data you are going to work with and its arrays are different from Python’s lists. That can be a small issue for someone who is used to Python, like me. Furthermore, the Java libraries are vast and that can be overwhelming at the beginning. Nonetheless I hope to get better at it soon!

Here is the products class

public class Products
{
    private String name;
    private int id;
    private int quantity;
    private double price;

    //Constructor of the class
    public Products(String name, int id, int quantity, double d)
    {
        this.name = name;
        this.id = id;
        this.quantity = quantity;
        this.price = d;
    }

    //Quantity setter and update method: s is "add" or "subtract"
    //(the "add" keyword is an assumption, that condition was lost from the listing)
    public void changeQuantity(int q, String s)
    {
        if(s.equals("add"))
        {
            this.quantity += q;
        }else if(s.equals("subtract"))
        {
            this.quantity -= q;
        }
        System.out.println("Warehouse stock updated succesfully!");
    }

    //Price setter
    public void changePrice(double p)
    {
        this.price = p;
        System.out.println("Price changed succesfully!");
    }

    //Get all the info on the product
    public void getInfoProduct()
    {
        System.out.println("As for product " + this.name + " with id: " + this.id);
        System.out.println("Quantity available: "+this.quantity);
        System.out.println("Price "+this.price);
        System.out.println("Total value in stock: " + this.getTotalValue());
    }

    //This function returns the total value of the stock for the product
    public double getTotalValue()
    {
        double totalValue = this.price*this.quantity;
        return totalValue;
    }
}


The main class

public class mainClass
{
    public static void main(String[] args)
    {
        //we create the products
        Products bread = new Products("Bread",0,10,0.5);
        Products ooil = new Products("Olive oil",1,20,4.00);
        Products oranges = new Products("Oranges",2,5,2.50);

        //we create the inventory and add products to it
        Inventory inventory1 = new Inventory("Inventory 1");
        inventory1.addProduct(bread);
        inventory1.addProduct(ooil);
        inventory1.addProduct(oranges);

        //We print info on the products
        bread.getInfoProduct();
        ooil.getInfoProduct();
        oranges.getInfoProduct();

        //And the total value of the inventory
        System.out.println("\nTotal value of the inventory is "+inventory1.getInventoryValue());
    }
}

Finally, the inventory class

//Dynamic arrays. I found them similar to those in Python
import java.util.ArrayList;
import java.util.List;

public class Inventory
{
    private String name;
    //Products[] productsInStock = new Products[]{};
    List<Products> productsInStock = new ArrayList<Products>();
    private int totalItems = 0;

    //Constructor of the class
    public Inventory(String name)
    {
        this.name = name;
    }

    //Add a product object to the inventory array
    public void addProduct(Products p)
    {
        this.productsInStock.add(p);
        this.totalItems += 1;
        System.out.println("Product added to inventory " + this.name + " succesfully!");
    }

    //Get the total value of the inventory
    public double getInventoryValue()
    {
        double valueToReturn=0;
        int lenAr = this.totalItems;
        for(int i=0;i<lenAr;i++)
        {
            valueToReturn += this.productsInStock.get(i).getTotalValue();
        }
        return valueToReturn;
    }
}

Here below is the output

Product added to inventory Inventory 1 succesfully!
Product added to inventory Inventory 1 succesfully!
Product added to inventory Inventory 1 succesfully!
As for product Bread with id: 0
Quantity available: 10
Price 0.5
Total value in stock: 5.0
As for product Olive oil with id: 1
Quantity available: 20
Price 4.0
Total value in stock: 80.0
As for product Oranges with id: 2
Quantity available: 5
Price 2.5
Total value in stock: 12.5

Total value of the inventory is 97.5

Hope this was interesting.

It’s finally time to play with Java (originally written in German)

Today I felt like writing a bit about Java, and writing the article in German in order to improve my knowledge of the German language.

First I have to apologise, because there could be mistakes in the article.
I have been studying German for two years as a hobby, but I don’t yet know whether my German is good enough to write a short article. In any case, I wanted to give it a try, because you only improve with practice.

Although a short article like mine does not need that many words, it can be hard to write good German. But I think the complicated grammar of the German language can be a very attractive aspect.

I have talked enough about the language, now let’s talk about Java.

I am no Java expert: I have been studying Java since last summer, not every day but only now and then, when there was time. Anyway, here is one of my first projects. It is a very short script that can show you a graph.

To write the script I used Eclipse. You can find Eclipse on this website.

Here is the script

The script uses the class “UniformlyAcceleratedMotion”, which is below:

Here is the graph:


I hope this article was fun. Goodbye, until next time.
Thank you very much!

Friday, 5 December 2014

A floating ball dropped in a water current

While writing the previous script on simple harmonic motion I came up with a fun simulation.
Here it is. Do you remember what happens when you drop a ball in flowing water? (Actually the simulated fluid looks somewhat denser than water.)

Basic physics and Python: simple harmonic motion

Here is a simple harmonic motion simulation with a spring and a bouncing ball.
Springs are a classic example of harmonic motion; on Wikipedia you can get a grasp of the basics. Among other assumptions, in my simulation I’ve assumed an ideal spring and no friction (therefore the motion will not stop by itself). However, if you like, you can implement friction easily.
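Leaving the graphics aside, the physics of an ideal frictionless spring fits in a few lines. The sketch below is not the code of the animation: it only steps Hooke’s law F = −k·x forward in time, and the function name, the parameter values and the choice of a semi-implicit Euler step are my assumptions for the example.

```python
import math


def simulate_spring(x0=1.0, v0=0.0, k=1.0, m=1.0, dt=1e-3, steps=1000):
    """Ideal frictionless spring: returns the list of positions over time."""
    x, v = x0, v0
    positions = [x]
    for _ in range(steps):
        a = -k * x / m      # Hooke's law: F = -k x
        v += a * dt         # update the speed first (semi-implicit Euler)
        x += v * dt         # then the position, using the new speed
        positions.append(x)
    return positions


period = 2 * math.pi * math.sqrt(1.0 / 1.0)   # T = 2*pi*sqrt(m/k)
steps = int(round(period / 1e-3))
path = simulate_spring(steps=steps)
print(path[0], path[-1])    # start and end of one full oscillation
```

Updating the speed before the position keeps the amplitude from slowly growing, which a plain Euler step would cause. Friction could be added as an extra term in the acceleration, for example one proportional to −v.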

Here is the spring simulation

Below is the bouncing ball:

A simpler example of simple harmonic motion with a spring (video):

Hope this was interesting.

Thursday, 4 December 2014

Animated graphs with matplotlib

Recently I have not had much time to dedicate to my blog. However, today I had some spare time and decided to learn how to code animated graphs with Python and matplotlib.
Apparently it’s really simple, although a little bit of practice is needed. Below are three pieces of code where I plotted (many) branches of parabolas, a random walk, and a simulated stock (a random walk again!).
Here are the half parabolas

Below is the random walk and the video:

video of the random walk:

and finally the stock simulation:

Hope this was interesting.

Saturday, 18 October 2014

New version and bug fixes for LED lights savings calculator application

Hi everyone, I found some silly bugs in my application and some improvements I could add to it, so here they are:

Bug fixed:
-A bug which did not let the user calculate savings in case the plant was fully financed and yearly net savings were negative
-Other minor bugs

Functionalities added:
-Now you can select the number of yearly payments should you decide to ask for a full financing for your LED lights. You can leave this field blank, in that case a default number of payments is calculated according to costs and annual energy savings.

Click here to download the improved version.

Hope this was useful.

Monday, 13 October 2014

A simple program to calculate LED lights savings


EDIT: For the most updated application, please check the following page:
New version and bug fixes for LED lights savings calculator application

A week ago, a friend asked for a simple program which could help him quickly calculate energy savings and get a glance at the overall picture.

If you did not know, by replacing your old bulbs and lamps with LED lights you can save up to 90% depending on the replaced lamp. For instance, if you replace neon with LED, savings are at least 45%, however they might be higher due to neon maintenance and hidden consumptions. If you replace incandescent bulbs with LED you save at least 80%.

Although the initial price may be higher, LED lamps will surely pay off in the long run, since they are expected to last 50,000 hours!

This simple program lets you enter the data on your current lights and the LED ones, then it calculates:
-Savings at each year, for 10 years
-Amortization time if the lamps are paid in one shot
-Financing of the LED lamps with a basic constant pay-out mortgage assuming a 5% interest rate
-Immediate savings if the LED lamps are financed

Click here for the python source code. Again, the program is pretty simple since only a sketch is needed. It surely can be improved. If you have any suggestion please let me know.

Here are some screenshots and a calculation example


The result of a simple replacement: 200 neon tubes with 200 LED tubes (assuming a cost of 26 euro/pc for each LED tube).

It really looks like LEDs are the lights of the future.

Note: some data may be incorrect in your country (for example energy cost, tube cost, etc.). If you ever use this program, please check each value you enter.

Sunday, 5 October 2014

A simple approximating algorithm for Financial Mathematics

Today, while I was applying some of my knowledge of Financial Mathematics, I came across a weird problem. OK, I guess it’s not that weird after all, but at first I could not find a formula or some trick to get to my goal, so I decided to use a simple approximating algorithm.

Say you have some data on a fixed-rate mortgage, a really basic mortgage where both the interest rate and the annual payment are fixed. By the way, if you’d like to know more about these mortgages, check the Wikipedia page here.

Apparently, the expression used to determine the annual payment, given the initial conditions, should be the following:




Now, suppose that you have everything, the annual constant payment (R), the initial capital (C0), the number of years (n) and you want to find the interest rate applied (i).

At first, it may appear difficult to deal with this problem analytically, so my first idea to get around this was the following: first, define k as the ratio of the initial capital to the annual payment C0/R, then rearrange the equation in terms of k as follows


Now the problem boils down to this: find the i that satisfies the following system of equations


Finally, here is an algorithm in Python to solve the system above
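As a rough sketch of the idea, and assuming the standard constant-payment formula R = C0 · i / (1 − (1 + i)^(−n)), the ratio k = C0/R equals (1 − (1 + i)^(−n)) / i. This expression decreases as i grows, so a simple bisection can home in on the rate. The function names, the search interval and the example capital and duration are my own choices for the illustration (only the 0.0724 rate comes from this post), and the algorithm I actually used may differ.

```python
def annuity_factor(i, n):
    """Present value of n annual payments of 1 at rate i: (1 - (1+i)^-n) / i."""
    return (1.0 - (1.0 + i) ** (-n)) / i


def find_rate(c0, payment, n, low=1e-9, high=1.0, tol=1e-10):
    """Bisection on i so that annuity_factor(i, n) equals k = c0 / payment."""
    k = c0 / payment
    while high - low > tol:
        mid = (low + high) / 2.0
        if annuity_factor(mid, n) > k:
            low = mid       # factor too large: the rate must be higher
        else:
            high = mid      # factor too small: the rate must be lower
    return (low + high) / 2.0


# Invented example: build the payment from a known rate, then recover the rate
c0, n, true_rate = 100000.0, 20, 0.0724
payment = c0 / annuity_factor(true_rate, n)
print(find_rate(c0, payment, n))
```

Building the payment from a known rate and then recovering that rate is a cheap way to check that the search works before using it on real numbers.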

And here is the final plot


Hope this was interesting. Here is Wolfram Alpha’s answer for your reference. For some reason it outputs 0.079 while I get 0.0724 which was the random rate I used to build this simple example. Perhaps some mistake occurred. If you find out please let me know.

This article is for educational purposes only. The numbers are invented. The article may well contain mistakes and errors. You should never use this article for purposes other than educational ones. The author is not responsible for any consequence or loss due to inappropriate use.

Saturday, 27 September 2014

Approximating differential equation with Euler’s method

It looks like it’s time for differential equations.

So, let’s say that you have a differential equation like the one below and you are struggling to find an analytical solution. Chill out, maybe take a walk outside or ride your bike and calm down: some differential equations simply don’t have an analytical solution and might drive you mad trying to find one.

Fortunately, there is a method to find the answer you may need, or rather, an approximation of it. This approximation sometimes can be enough of a good answer.

While there are different methods to approximate a solution, I find Euler’s method quite easy to use and implement in Python. Furthermore, this method has just been explained by Sal from Khan Academy in a video on YouTube which is a great explanation in my opinion. You can find it here.

Here below is the differential equation we will be using:


and the solution (in this case it exists)


Here is the algorithm in Python. Note that the smaller the step (h), the better the approximation (although it may take longer).
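A minimal sketch of the method looks like the following. Since the equation above does not survive as text here, the sketch uses a stand-in problem of my own choosing, dy/dx = y with y(0) = 1, whose exact solution is e^x; the function name and the step sizes are also just assumptions for the example.

```python
import math


def euler(f, x0, y0, x_end, h):
    """Approximate y(x_end) for dy/dx = f(x, y), y(x0) = y0, with step h."""
    steps = int(round((x_end - x0) / h))
    x, y = x0, y0
    for _ in range(steps):
        y += h * f(x, y)    # follow the tangent line for one step
        x += h
    return y


# Stand-in equation (an assumption): dy/dx = y, y(0) = 1, exact solution e^x
for h in (0.5, 0.1, 0.01):
    approx = euler(lambda x, y: y, 0.0, 1.0, 1.0, h)
    print(h, approx, abs(approx - math.e))
```

The last column is the error at x = 1, which shrinks roughly in proportion to h.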

Here are some approximations using different values of h:




Hope this was interesting.

Sunday, 21 September 2014

Generate slope fields in R and Python

Here is a short post on how to generate a quick slope field in R and Python.

If you do not know what a slope field is, well, I am not the best person to explain it to you since it is a relatively new concept for me as well. From what I’ve understood, a slope field is a graphical way to get a feel for the solutions of a first-order differential equation without having to solve it analytically.

As the Wikipedia page says, “it may be used to qualitatively visualize solutions, or to numerically approximate them.”

In general, I feel safe saying that a slope field is some sort of a graphical approach to a differential equation.
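In practice a slope field is just a grid of short segments, each drawn with the slope that the equation dy/dx = f(x, y) prescribes at that point. Here is a small sketch of that construction; the function names, the segment length and the stand-in equation dy/dx = x − y are assumptions for the illustration, and the plotting helper needs matplotlib.

```python
import math


def slope_field(f, xs, ys, length=0.4):
    """Return short line segments ((x1, y1), (x2, y2)) with slope f(x, y)."""
    segments = []
    for x in xs:
        for y in ys:
            angle = math.atan(f(x, y))          # direction of the tangent
            dx = math.cos(angle) * length / 2.0
            dy = math.sin(angle) * length / 2.0
            segments.append(((x - dx, y - dy), (x + dx, y + dy)))
    return segments


def plot_segments(segments):
    """Optional drawing step; needs matplotlib."""
    import matplotlib.pyplot as plt
    for (x1, y1), (x2, y2) in segments:
        plt.plot([x1, x2], [y1, y2], color="blue")
    plt.show()


grid = [i * 0.5 for i in range(-4, 5)]
# Stand-in equation (an assumption): dy/dx = x - y
segments = slope_field(lambda x, y: x - y, grid, grid)
print(len(segments))
```

Calling plot_segments(segments) draws the field; keeping all segments the same length makes steep and flat regions equally readable.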

Say you have the following differential equation:


drawing the slope field would look something like this:
In Python (without arrows)

and in R (with arrows, x=f and y=h)

Of course these plots are just very quick and can be improved.

Here is the Python code I used to draw them.

And the R code

Here is a beautiful slope field for the following differential equation:


In Python

If you need a quick tool for drawing slope fields, this online resource is good, click here.

Hope this was interesting.

Tuesday, 9 September 2014

Multivariable gradient descent

This article is a follow up of the following:
Gradient descent algorithm

Below you can find the multivariable (2-variable) version of the gradient descent algorithm. You could easily add more variables. For the sake of simplicity, and to make it more intuitive, I decided to post the 2-variable case. In fact, it would be quite challenging to plot functions with more than 2 arguments.

Say you have the function f(x,y) = x**2 + y**2 - 2*x*y plotted below (check the bottom of the page for the code to plot the function in R):


Well in this case, we need to calculate two thetas in order to find the point (theta,theta1) such that f(theta,theta1) = minimum.
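As a minimal sketch of the two-parameter update, the partial derivatives of f can be worked out by hand (2x − 2y and 2y − 2x) and both thetas updated together at every iteration. The starting point, the learning rate alpha and the number of iterations below are arbitrary choices for the example, and this version does not use sympy.

```python
def f(x, y):
    return x ** 2 + y ** 2 - 2 * x * y


def gradient(x, y):
    """Partial derivatives of f, worked out by hand."""
    return 2 * x - 2 * y, 2 * y - 2 * x


def gradient_descent(theta, theta1, alpha=0.1, iterations=200):
    for _ in range(iterations):
        dx, dy = gradient(theta, theta1)
        # Update both parameters at the same time
        theta, theta1 = theta - alpha * dx, theta1 - alpha * dy
    return theta, theta1


theta, theta1 = gradient_descent(3.0, 1.0)
print(theta, theta1, f(theta, theta1))
```

Since f(x, y) = (x − y)², the two values end up equal to each other; which point of the line x = y you land on depends on where you start.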

Here is the simple algorithm in Python to do this:

This function though is really well behaved: it reaches its minimum whenever x = y. Furthermore, it does not have many different local minima, which could have been a problem. For instance, the function below would have been harder to deal with.


Finally, note that the function I used in my example is again, convex.
For more information on gradient descent check out the wikipedia page here.
Hope this was useful and interesting.

R code to plot the function

Gradient descent algorithm

Today I’m going to post a simple Python implementation of gradient descent, a first-order optimization algorithm. In Machine Learning this technique is pretty useful to find the values of the parameters you need. To do this I will use the module sympy, but you can also do it manually, if you do not have it.

The idea behind it is pretty simple. Imagine you have a function f(x) = x^2 and you want to find the value of x, let’s call it theta, such that f(theta) is the minimum value that f can assume. By iterating the following process a sufficient number of times, you can obtain the desired value of theta:




Now, for this method to work, and theta to converge to a value, some conditions must be met, namely:
-The function f must be convex
-The value of alpha must not be too large or too small, since in the first case you’ll end up with the value of theta diverging and in the latter you’ll approach the desired value really slowly
-Depending on the function f, the value of alpha can change significantly.
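The iteration itself is the update theta := theta − alpha · f'(theta), repeated many times. Here is a tiny sketch of it for f(x) = x², with the derivative 2x written by hand rather than computed with sympy; the function names, the starting point and the two values of alpha are assumptions chosen to show the behaviour described in the list above.

```python
def gradient_descent(df, theta, alpha=0.1, iterations=100):
    """Repeat theta := theta - alpha * f'(theta); df is the derivative of f."""
    for _ in range(iterations):
        theta = theta - alpha * df(theta)
    return theta


def df(x):
    return 2 * x    # derivative of f(x) = x^2, worked out by hand


print(gradient_descent(df, 5.0))              # reasonable alpha: approaches 0
print(gradient_descent(df, 5.0, alpha=1.1))   # alpha too large: theta diverges
```

For this particular function each step multiplies theta by (1 − 2·alpha), so any alpha between 0 and 1 converges, while anything above 1 makes theta grow without bound.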

Note that, if the function has local minima and not just a global minimum, this optimization algorithm may well end up “trapped” in a local minimum and not find your desired global minimum. For the sake of argument, suppose the function below goes to infinity as x gets bigger and that the global minimum is somewhere near 1. With the algorithm presented here, you may well end up with x = -0.68 or something like that as an answer when you are looking roughly for x = 0.95.


In this case of course, it is trivial to find out the value and you don’t even need derivatives. However, for a different function it may not be that easy, furthermore for multivariable functions it may be even harder (in the next article I will cover multivariable functions).

Here is the Python code of my implementation of the algorithm

Hope this was interesting and useful.

Thursday, 4 September 2014

Generate words through the use of Markov chains

In general, computer programs are believed to be bad at being creative and at tasks which require “a human mind”. Dealing with the meaning of text, words and sentences is one of these tasks. That’s not always the case. For instance, sentiment analysis is a branch of Machine Learning where computer programs try to capture the overall feeling coming from tons of articles.

But here we are talking about something much simpler: by using Markov chains and some statistics, I developed a simple computer program which generates words. The model works as follows:
As a first step, the program is fed with a long text in the selected language. The longer the text, the better. Next, the program analyses each word and gets the probability distributions for the following:
-Length of the word
-First character (or letter)
-Character (or letter) next to a given one
-Last character (or letter)

Then, once all this data has been gathered, the following function is run in order to generate words:


Each part is then glued together and returned by the function.
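The steps above can be sketched as follows. This is a simplified illustration rather than my actual program: the function names, the tiny training sentence and the exact way the last letter is attached are assumptions, and it needs Python 3.6 or later for random.choices.

```python
import random
from collections import Counter, defaultdict


def learn(text):
    """Collect the four frequency tables from a training text."""
    words = [w for w in text.lower().split() if w.isalpha()]
    lengths = Counter(len(w) for w in words)
    first = Counter(w[0] for w in words)
    last = Counter(w[-1] for w in words)
    following = defaultdict(Counter)
    for w in words:
        for a, b in zip(w, w[1:]):
            following[a][b] += 1
    return lengths, first, last, following


def draw(counter, rng):
    """Pick a key with probability proportional to its count."""
    keys = sorted(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys])[0]


def generate_word(model, rng=random):
    lengths, first, last, following = model
    length = draw(lengths, rng)
    word = draw(first, rng)
    while len(word) < length - 1:
        options = following.get(word[-1])
        if not options:          # no known successor: stop early
            break
        word += draw(options, rng)
    if len(word) < length:
        word += draw(last, rng)  # close the word with a typical last letter
    return word


# Tiny invented training text; a real run needs a much longer one
model = learn("the quick brown fox jumps over the lazy dog")
print([generate_word(model) for _ in range(5)])
```

The Markov chain is the table of following letters: the next letter depends only on the current one, not on the rest of the word.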

On average, 10% of generated words are English (or French, German or Italian) real words or prepositions or some kind of real sequence of characters.

The most interesting aspect, in my opinion, is that by changing the language of the text fed into the program, one can clearly see how the sequences of characters change dramatically. For instance, when fed English the program is more likely to write “th” in words, with German the words often contain “ge”, and with Italian the words often end with a vowel.

Here is the code. Note that in order to speed up the word checking process, I had to use the NLTK package which is available only in Python 2. To avoid the use of Python 2 you could check each word using urllib and an online dictionary by parsing the web page but this way is tediously slow. It would take about 15 minutes to check 1000 words. By using NLTK you can speed up the checking process.

Hope this was interesting.

Saturday, 30 August 2014

Markets, stocks simulations and Markov chains

This article is some sort of continuation from this one.

Our previous model for stock simulations did not take into account the following idea:
when a stock (or the market) is going up, it should be (intuitively, at least) more likely that it will continue to go up. Or at the very least, as is the case for a football game, it does not feel right to believe that the probability of either of the two possible outcomes is exactly 50%.

The idea behind Markov chains is really versatile, we can apply it also to the markets.
With a “bit” of study (I’m being sarcastic here), you can come up with something pretty complicated like this, however, the model I’m going to show here is much more naive and easier.

Consider a Markov chain with two states, market up and market down. Once you have found the probabilities of each state (that is, of moving up or down given the previous move), you can easily simulate a random walk (based on a Markov chain of course).
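A bare-bones version of such a walk might look like the sketch below. The transition probabilities (0.6 and 0.4), the starting price and the fixed size of each move are invented for the example, as are the function and parameter names; they are not the values estimated in my model.

```python
import random


def simulate_path(steps, p_up_after_up=0.6, p_up_after_down=0.4,
                  start=100.0, move=1.0, rng=random):
    """Random walk whose direction depends on the previous move."""
    price, state = start, "up"
    path = [price]
    for _ in range(steps):
        p_up = p_up_after_up if state == "up" else p_up_after_down
        state = "up" if rng.random() < p_up else "down"
        price += move if state == "up" else -move
        path.append(price)
    return path


paths = [simulate_path(250) for _ in range(2)]
print([p[-1] for p in paths])
```

With p_up_after_up above 0.5 and p_up_after_down below 0.5, up moves tend to follow up moves and down moves tend to follow down moves, which is the kind of persistence an ordinary 50/50 random walk lacks.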

Here is the code for this model:

The graphs below represent respectively, 2, 200 and 500 random paths.

2 random walks

200 random walks


500 random walks


Hope this was interesting.

This article is for educational purposes only. The numbers are invented. The author is not responsible for any consequence or loss due to inappropriate use. It may contain mistakes and errors. You should never use this article for purposes other than educational ones.

Monday, 25 August 2014

A first really shy approach to Machine Learning using Python

The day before yesterday I came across Machine Learning: WOW… I got stuck at my pc for an hour wondering about and watching the real applications of this great subject.

I was getting really excited! Then, after an inspiring video on YouTube, I decided it was time to act. My fingers wanted desperately to type some “smart” code so I decided to write a program which could recognize the language in which a given text is written.

I do not know if this is actually a very primitive kind of Machine Learning program (I somehow doubt it) therefore I apologize to all those who know more on the subject but let me dream for now.

Remember the article on letter frequency distribution across different languages?? Back then I knew it would be useful again (although I did not know for what)!! If you would like to check it out or refresh your memory, here it is.

Name of the program: Match text to language

This simple program aims to distinguish written texts by recognizing what language each text was written in.

The underlying hypotheses of this model are the following:
1. Each language has a given character distribution which is different from the others. Character distributions are generated by randomly choosing Wikipedia pages in each language.
2. Shorter sentences are more likely to contain common words than uncommon ones.

The first approach to building a program able to do such a task was to build a character distribution for each of the languages, using the code in the frequency article. Next, given a string (a sentence), the program should be able to guess the language by comparing the character distribution in the sentence with the actual distributions of the languages.

This approach seems to work fine for sentences longer than 400 characters. However, if the sentence is shorter than 400 characters, a mismatch might occur. In order to avoid this, I have devised a naive approach: the shorter the sentence, the more likely it is that the words in it are among the most common ones. Therefore, for each language, a list of the 50 most common words has been loaded; it is used to double check the first guess (which is based on character frequency only) whenever the length of the sentence is less than a given number of characters (usually 400).
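
The two-step idea described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the distance measure (sum of absolute differences), the function names, and the tiny reference texts and word lists are all made up here, while the real program loads distributions built from Wikipedia pages and lists of 50 common words.

```python
from collections import Counter


def char_distribution(text):
    """Relative frequency of each alphabetic character in the text."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {c: n / total for c, n in counts.items()} if total else {}


def distance(dist_a, dist_b):
    """Sum of absolute differences (an assumed, simple distance measure)."""
    keys = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)


def guess_language(sentence, distributions, common_words, threshold=400):
    """First guess by character frequency, then double check short sentences."""
    sample = char_distribution(sentence)
    guess = min(distributions, key=lambda lang: distance(sample, distributions[lang]))
    if len(sentence) < threshold:
        # Short sentence: count how many of its words are common in each language.
        words = sentence.lower().split()
        hits = {lang: sum(w in common_words[lang] for w in words)
                for lang in common_words}
        best = max(hits, key=hits.get)
        if hits[best] > 0:
            guess = best  # the common-word check overrides the first guess
    return guess


# Toy data, invented for illustration only.
english_ref = "the quick brown fox jumps over the lazy dog and this is a sentence in english"
german_ref = "der schnelle braune fuchs springt ueber den faulen hund und das ist ein satz auf deutsch"
distributions = {
    "english": char_distribution(english_ref),
    "german": char_distribution(german_ref),
}
common_words = {
    "english": {"the", "and", "is", "a", "in", "of"},
    "german": {"der", "und", "ist", "ein", "auf", "das"},
}

print(guess_language("das ist ein hund", distributions, common_words))  # german
```

In this sketch the common-word check simply overrides the first guess whenever at least one common word is found; other ways of combining the two signals are of course possible.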

Note that this version of the program assumes that each language distribution has already been generated and stored in .txt format, and simply loads it from a folder. You can find and download the distributions here.

So far the program seems to work on texts of different lengths. Here below are some results:

In these first two examples I used longer sample sentences:

Immagine 001

Immagine 002

In this last example the sentence was really short, just 37 characters, something like: “Diese ist eine schoene Satze auf Deutsch”. In this case it was hard to draw a distribution which could match the German one. In fact, the program first guessed French, which was really far from the right answer. The double-check algorithm then kicked in and found the right answer (Lang checked).

Immagine 004

Hope this was interesting.

Weather forecast through Markov chains and Python

A Markov chain is a mathematical system that undergoes transitions from one state to another on a state space. It is essentially a kind of random process without any memory. This last statement emphasizes the idea behind the process: “The future is independent of the past given the present”. In short, we could say that the next step of our random process depends only on the most recent step. (Note that we are operating in discrete time in this case.)

Let’s say that we would like to build a statistical model to forecast the weather. In this case, our state space, for the sake of simplicity, will contain only 2 states: bad weather (cloudy) and good weather (sunny). Let’s suppose that we have made some calculations and found out that tomorrow’s weather depends on today’s weather, according to the matrix below. Note that P(A|B) is the probability of A given B.


Markov chain weather visual

Therefore, if today’s weather is sunny, there is a P(Su|Su) chance that tomorrow will also be sunny, and a P(C|Su) chance that it will be cloudy. Note that the two probabilities must add up to 1.

Let’s code this system in Python:
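
A minimal sketch of such a system could look like this. Note that the transition probabilities below are invented for illustration, since any values would do as long as each row adds up to 1, and the function names are hypothetical.

```python
import random

# Hypothetical transition probabilities P(tomorrow | today), invented for
# illustration. Each inner dictionary must add up to 1.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "cloudy": 0.2},
    "cloudy": {"sunny": 0.4, "cloudy": 0.6},
}


def next_state(today, rng):
    """Draw tomorrow's weather given only today's weather."""
    states = list(TRANSITIONS[today])
    weights = [TRANSITIONS[today][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def forecast(start, days, seed=None):
    """Simulate the weather for a number of days, starting from a known state."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(days):
        path.append(next_state(path[-1], rng))
    return path


print(forecast("sunny", days=7, seed=1))
```

The function `next_state` looks only at the current state, which is exactly the “no memory” property described above.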

Obviously, real weather forecast models are much more complicated than this one. However, Markov chains are used in a very large variety of areas, and weather forecasting is one of them. Other real world applications include:
-Machine learning (in general)
-Speech recognition and completion
-Algorithmic music composition
-Stock market and Economics and Finance in general

For more information on Markov chains, check out the Wikipedia page.

If you are interested in Markov chains, I suggest checking out these two video series on YouTube, which are (in my opinion) good explanations of the subject.
-Brandon Foltz’s Finite Math playlist: a very clear explanation with real world examples, and the math used is fairly simple. You just need to know a bit about matrices, operations on matrices and probability (but if you are here I guess you have no problems with this).
-Mathematicalmonk’s playlist on Machine Learning, where a more technical (formal) explanation is given in the videos on Markov chains, starting from here.

Hope this was interesting and useful.