Loading [MathJax]/extensions/MathZoom.js

Sunday, 28 December 2014

Handwritten number recognition with Python (Machine Learning)

Here I am again with Machine Learning! This time I’ve achieved a great result though (for me at least!). By using another great dataset from UCI I was able to write a decent ML script which scored 95% in the testing part! I am really satisfied with the result.

Here is a sample of what the script should be able to read (in the example the number 9):

figure_1

Some numbers, as the one above, were clear, others not so clear, since they were handwritten and then somehow (I do not know how) converted into digital images.

I had a hard time figuring out how the attributes in the dataset were coded but in the end I managed to figure it out! SorrisoI guess making up such a dataset was a really long and boring work.

Anyway here is my script and below you can find the result of the test on the last 50 numbers or so.

import matplotlib.pyplot as plt
import numpy as np
import os
# Import of support vector machine (svm)
from sklearn import svm
"""--------------------------------SETTINGS---------------------------------"""
# Load data from .txt file
os.chdir("C:\\")
file = open("semeion.txt","r")
data = file.read()
file.close()
# Get index max (number of rows in the data)
def getIndexMax(data):
dataSplitted = data.split("\n")
return len(dataSplitted)
# Prepare data for fitting function
# Answer: if True, returns answers (labels)
# Training: if True, returns training samples (with no answers, only samples)
# Last: if True, returns only last training sample/answer. Useful for testing
def returnDataToUse(data,index,answers=False,training=False,last=False):
dataSplitted = data.split("\n")
# Check that the index is not bigger than our dataset
if index > len(dataSplitted):
print("Index out of bounds, index max:",len(dataSplitted))
return 0
# This bit of code returns answers
if answers and not training:
firstLine = dataSplitted[0][:-1].split(" ")
firstAnsw = firstLine[256:]
firstAnsw2 = [int(i) for i in firstAnsw]
firstAnswInt = firstAnsw2.index(1)
correctAnswers = [firstAnswInt]
i = 1
while i < index:
temp = dataSplitted[i][:-1].split(" ")
temp2 = temp[256:]
temp3 = [int(k) for k in temp2]
temp4 = temp3.index(1)
correctAnswers.append(temp4)
i += 1
completeAnswers = np.array(correctAnswers)
if last:
return completeAnswers[-1]
else:
return completeAnswers
# This bit of code returns pure samples
if training and not answers:
firstLine = dataSplitted[0][:-1].split(" ")
firstTraining = firstLine[:256]
trainingArray = np.array([float(i) for i in firstTraining])
i = 1
while i < index:
temp = dataSplitted[i][:-1].split(" ")
temp2 = temp[:256]
temp3 = np.array([float(k) for k in temp2])
trainingArray = np.vstack((trainingArray,temp3))
i += 1
if last:
return trainingArray[-1]
else:
return trainingArray
# This function displays the image of the number (sample at row x)
# and prints the answer the predictor should give us back
def displayImage(data,row):
# Split each row
dataSplitted = data.split("\n")
# Get the 'rowth' row
strings = dataSplitted[row]
# Split row into numbers(string), and avoid blank at the end
stringsSplitted = (strings[:-1]).split(" ")
# Get target and convert it into numbers, then in a numpy array
risp = stringsSplitted[256:]
rispInt = [int(i) for i in risp]
rispNp = np.array(rispInt)
# Print original data and number to guess in readable format
print(rispInt)
print("Number to guess:",rispInt.index(1),"\n")
# Training array converted into float numbers
training = stringsSplitted[:256]
trainingFloat = [float(i) for i in training]
# Building 16x16 (image) array
#.
i = 16
k = 0
img = np.array(trainingFloat[:16])
while i <= len(trainingFloat):
#print(i)
#print(k)
temp = np.array(trainingFloat[k:i])
img = np.vstack((img,temp))
k = i
i += 16
# Plot image
plt.imshow(img,cmap=plt.cm.gray_r,interpolation="nearest")
plt.show()
"""------------------------------TRAINING------------------------------------"""
# FIX THE NUMBER OF TRAINING SAMPLES
trainingSamples = 1500
# Gamma: gradient descent parameter
clf = svm.SVC(gamma=0.01, C=100)
# Index max
print("Maximum index:",getIndexMax(data),"\n")
answerArray = returnDataToUse(data,trainingSamples,answers=True)
trainingAr = returnDataToUse(data,trainingSamples,training=True)
x,y = trainingAr,answerArray
#Fit the data
print("Training...")
clf.fit(x,y)
"""------------------------------Sample prediction--------------------------"""
# CHOOSE AN EXAMPLE TO PREDICT
example = 1555
predictQ = returnDataToUse(data,example,training=True,last=True)
predictA = returnDataToUse(data,example,answers=True,last=True)
print("Prediction:",clf.predict(predictQ))
print("Actual answer",predictA,"\n")
# Display the actual image
displayImage(data,example)
"""------------------------------Testing Function----------------------------"""
# Actual testing on residual samples (=samples not used for training)
print("Testing...")
correct = 0
wrong = 0
j = example+1
while j < getIndexMax(data):
q = returnDataToUse(data,j,training=True,last=True)
a = returnDataToUse(data,j,answers=True,last=True)
p = clf.predict(q)
if a == p[0]:
correct += 1
else:
wrong += 1
j += 1
print("Statistics, correct answers:",correct/(correct+wrong))

This time I got 89% success rate! Pretty good I guess! I wonder whether I could train Python to recognize other things, maybe faces or other! Well first of all I have to figure out how to convert a picture into readable numpy arrays. Readable for Python of course!! If you have any suggestion please do leave a comment! Sorriso

# >>> ================================ RESTART ================================
# >>>
# Maximum index: 1593
#
# Training...
# Prediction: [9]
# Actual answer 9
#
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# Number to guess: 9
#
# Testing...
# Statistics, correct answers: 0.8918918918918919
# >>>

Here below is the citation of the source where I found the dataset “Semeion Handwritten Digits Data Set”:


Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.


and


Semeion Research Center of Sciences of Communication, via Sersale 117, 00128 Rome, Italy
Tattile Via Gaetano Donizetti, 1-3-5,25030 Mairano (Brescia), Italy.


 


Hope this was interesting!

Poker hands recognition (Machine Learning)

A few months ago I downloaded the scikit-learn package for Python, for those of you who might not be aware, scikit-learn is a powerful yet very simple package useful to apply machine learning. Basically they give you “all” the algorithms you may need and you “only” have to get the data and make it ready to feed into the scikit-learn algorithm.

Anyway, I only recently had time to check it out and write some code, furthermore only recently I found a great site full of sample datasets dedicated to machine learning (check it out here). Since the data from this great site is more or less in the right shape for being ready to import (.txt files or similar) the only task left to the programmer is to put it into numpy arrays.

This is one of my first attempt at generating a “real” machine learning program.

I won’t post the pre-settings script since it is pretty boring, instead I’ll briefly describe the pre-setting: basically, using a simple script I’ve splitted the training samples from the database in two .txt files trainingP.txt and target.txt respectively. TrainingP.txt contains the questions (the hands of poker) and target.txt contains the answers (=the score, that’s to say poker, full house etc… coded into numbers according to the description given in the database description file).

Below is the ml script: it is composed of 3 parts: setting, training and testing

setting: import the training sets and fit them into numpy arrays of the correct shape

training: fit the data into the model using scikit-learn

testing: test the algorithm and check what it has learned so far! Some statistics are printed, for reference. Check the results below!

import numpy as np
import sklearn
import os
# Import of support vector machine (svm)
from sklearn import svm
"""---------------------------- SETTING -----------------------------------"""
# Changing current working directory
os.chdir("C:\\")
# Set up ML and scikit-learn
clf = svm.SVC(gamma=0.01, C=100)
# Training data (digits: questions, target: answers)
digits = open("trainingP.txt").read().split("\n")
target = open("target.txt").read().split("\n")
# Initializing numpy arrays
digitsD = np.array(np.fromstring(digits[0],sep=","))
# Completing numpy array for digits
for line in digits[1:]:
try:
sample = np.fromstring(line,sep=",")
digitsD = np.vstack((digitsD,sample))
except:
pass
# Converts strings into integers (not necessary if you
# choose option 1 below)
t = [int(i) for i in target]
# 1 : All strings
# 2 : All integers
t = np.array(target) # 1
#t = np.array(t) # 2
"""---------------------------- TRAINING -----------------------------------"""
# Model fitting (training)
x,y = digitsD,t
print("Training...")
clf.fit(x,y)
# Answer: 1
print("Prediction:",clf.predict(np.array([3,11,1,9,2,5,3,5,4,1])))
"""---------------------------- TESTING -----------------------------------"""
# Load questions and answers
rawQuestions = open("testQuestions.txt").read().split("\n")
rawAnswers = open("Answers.txt").read().split("\n")
# Initialize numpy arrays
questions = np.array(np.fromstring(rawQuestions[0],sep=","))
answers = np.array(rawAnswers)
for q in rawQuestions[1:]:
try:
q1 = np.fromstring(q,sep=",")
questions = np.vstack((questions,q1))
except:
pass
# Testing function
def tryAnswer(qu,an):
machineAnswer = clf.predict(qu)
if machineAnswer == an:
#print("Correct!")
return True
else:
#print("Wrong!")
return False
print("Testing...")
# Start the test
j = 0
right = 0
wrong = 0
while j < 10000:
result = tryAnswer(questions[j],answers[j])
if result:
right += 1
else:
wrong += 1
j += 1
# Print some statistics
print("Statistics, correct rate:",right/(right+wrong))

So far the best score is 56.4%, I wonder if I did everything correctly! Anyway soon I will post a script with better results and achievements!

# >>> ================================ RESTART ================================
# >>>
# Training...
# Prediction: ['0']
# Testing...
# Statistics, correct rate: 0.564
# >>>

Below is the citation of the source of the database, according to their citation policy.


The name of the dataset is Poker Hand and it is from:


Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Friday, 26 December 2014

Fluid dynamics: pressure drop modelling

Likely the last post of the year, on a rather intriguing subject: fluid dynamics.

Recently I’ve been asked to model a simulation for a series of pipes through which water flows out of a reservoir. The task seems quite easy at first, however many factors must be taken into account. Pressure drop due to friction is one of the things that need to be addressed. Therefore, I documented myself a little on the subject and developed a simple model to take friction into account. Pressure drop depends both by the speed of the water and the pipes diameter.

Here is a short piece of code to calculate pressure drop. Note that we are using the average speed of the water. For each average speed we can draw a line that shows the pressure drop versus the pipe’s diameter.

figure_1

Considering the fact that this was my first attempt I think the results are pretty good when compared with an actual diagram such as this (note that on this graph you’ll need to multiply pressure drop by 10 to compare pears to pears and apples to apples):

1

Here is the code I used. I should mention, I had a hard time finding out how to do this and furthermore I couldn’t find a comprehensive piece lecture where fluid dynamics is analysed in depth. Online there is a lot of material on Bernoulli’s equation and Torricelli’s special case but it is definitely harder to find out how to model real fluids, friction and fluid dynamics in general. Hope someone will upload some good material.

from matplotlib import pyplot as plt
import math
# Density of the water
rho = 1000
# Kinematic viscosity of water
nu = 1.01 * math.pow(10,-6)
# Returns pressure drop Pa/m
# Reynolds' number tells us which
# expression to use in the two different scenarios
# v velocity of the fluid
# D pipes' diameter
def pressureDrop(v,D):
Re = (v*D)/nu
if Re < 2000:
print("Laminar flow",v,D)
Fa = 64/Re
else:
print("Turbolent flow")
Fa = 0.316 * math.pow(Re,-0.25)
r = Fa*(1/D)*rho*math.pow(v,2)/2
return r
pressDropv = {}
diaPipe = []
veloc = []
v = 0.5
d = 0.05
while v < 2.5:
veloc.append(v)
v += 0.5
while d < 0.100:
diaPipe.append(d)
d += 0.01
pressDropd = {}
pressDropv = {}
for i in veloc:
pressDropv[i] = list()
for j in diaPipe:
pressDropv[i].append(pressureDrop(i,j))
def plotAllD():
try:
x = diaPipe
plt.title("Pressure drops given a fixed fluid velocity")
plt.ylabel("Pressure drop Pa/m")
plt.xlabel("Pipes' diameter m")
for v in veloc:
y = pressDropv[v]
plt.plot(x,y,label="Fluid velocity "+str(v)+" m/s")
plt.legend(loc="best", fontsize="small")
plt.grid()
plt.show()
except Exception as e:
print(e)
plotAllD()

Saturday, 20 December 2014

And here comes the whole Solar System! ;)

Modelling Sun, Earth and Moon only? Naah that’s too naive. Let’s add some other planets! Occhiolino
The code on my previous article was rather ugly. It did not follow an OOP style and it was mostly non re-usable. Aside from the content of the piece of code I posted in this article, I’m quite satisfied with how I set it: a class for planets and a class to handle time. Sure, perhaps this is not the smartest way of using a class, however for someone like me, who programs mainly as a hobby, that’s fine I guess.
This piece of code models and simulates the whole Solar System (without the Sun, you can add it though!) and gives you, aside from the code, a hint on how slow/fast planets are orbiting around the Sun. note that I use large intervals of time (10000 seconds or similar). This model for sure can be improved. For instance the colours could be improved and, jokes aside, perhaps elliptical orbits could be implemented instead of circular. To do that I still need to work on my knowledge of gravity and its implementation in Python.
solar sys

Anyway I’ll keep you posted. That’s it for now, here’s the code of the model and the YouTube video I made (but couldn’t load in the post, not sure why)

[EDIT]: I've finally made it and load the video here too! Enjoy



import math
from bigfloat import *
import matplotlib.pyplot as plt
from visual import *
# A class to handle the time ranges
class timeHoursSeconds(object):
def __init__(self,s,h,d,y):
self.s = s
self.h = h
self.d = d
self.y = y
def fromStoHours(self):
h = self.s/60/60
return h
def fromStoDays(self):
d = self.s/60/60/24
return d
def fromStoYears(self):
y = self.s/60/60/24/365
return y
def fromDaysToS(self):
s = self.d*24*60*60
return s
def fromDaysToH(self):
h = self.d * 24
return h
def fromDaysToY(self):
y = self.d/365
return y
class planet(object):
G = 6.67 * math.pow(10,-11)
sunM = 1.989 * math.pow(10,30)
eaM = 5.973 * math.pow(10,24)
RTL = 384400000
def __init__(self,name,mass,RS,theta0,radius):
self.name = name
self.mass = mass
self.RS = RS
self.theta0 = theta0
self.radius = radius
def gravitationalForce(self,m2=1):
if m2 ==1:
f = self.G * (self.mass*self.sunM)/math.pow(self.RS,2)
else:
f = self.G * (self.mass*self.eaM)/math.pow(self.RTL,2)
return f
def angularVelocity(self,m2=1):
w = math.sqrt(self.gravitationalForce(m2=m2)/(self.mass*self.RS))
return w
def velocity(self,m2=1):
v = self.angularVelocity(m2=1) * self.RS
return v
def angularPosition(self,t,m2=1):
theta = self.theta0 + self.angularVelocity(m2=m2) * t
return theta
def varAngularPosition(self,t,dt,m2=1):
dtheta = self.angularPosition(t+dt,m2=m2)-self.angularPosition(t,m2=m2)
return dtheta
def periodAroundSun(self,m2=1):
p = timeHoursSeconds(2*math.pi/self.angularVelocity(m2=m2),0,0,0)
return p
def picture(self,x,y,z,col,trail):
if col == 1:
return sphere(pos=vector(x,y,z),color=color.red,radius=self.radius,make_trail=trail)
elif col == 2:
return sphere(pos=vector(x,y,z),color=color.blue,radius=self.radius,make_trail=trail)
elif col == 3:
return sphere(pos=vector(x,y,z),color=color.green,radius=self.radius,make_trail=trail)
elif col == 4:
return sphere(pos=vector(x,y,z),color=color.cyan,radius=self.radius,make_trail=trail)
elif col == 5:
return sphere(pos=vector(x,y,z),color=color.yellow,radius=self.radius,make_trail=trail)
else:
return sphere(pos=vector(x,y,z),color=color.white,radius=self.radius,make_trail=trail)
mercury = planet("Mercury",3.302 * math.pow(10,23),57910000000,0,0.3)
venus = planet("Venus",4.8685 * math.pow(10,24),108200000000,0,0.4)
earth = planet("Earth",5.973 * math.pow(10,24),149600000000,0,0.5)
# As for the Moon, input Earth-Moon distance
moon = planet("Moon",7.347 * math.pow(10,22),384400000,0,0.2)
mars = planet("Mars",6.4185 * math.pow(10,23),227900000000,0,0.45)
jupiter = planet("Jupiter",1.8986 * math.pow(10,27),778500000000,0,.8)
saturn = planet("Saturn",5.6846 * math.pow(10,26),1433000000000,0,0.7)
uranus = planet("Uranus",8.6832 * math.pow(10,25),2877000000000,0,0.6)
neptune = planet("Neptune",1.0243 * math.pow(10,26),4503000000000,0,0.6)
# Simulation data
years = timeHoursSeconds(0,0,3655,0)
seconds = years.fromDaysToS()
print("Years: ",years.y)
print("Days: ",years.d)
print("Seconds: ",seconds)
t = 0
dt = timeHoursSeconds(10000,0,0,0)
# Planets
merc = mercury.picture(1.5,0,0,1,True)
ven = venus.picture(3,0,0,3,True)
ea = earth.picture(5,0,0,2,True)
mar = mars.picture(7,0,0,3,True)
jup = jupiter.picture(9,0,0,5,True)
sat = saturn.picture(11,0,0,6,True)
ur = uranus.picture(13,0,0,3,True)
nep = neptune.picture(15,0,0,2,True)
planetsf = [merc,ven,ea,mar,jup,sat,ur,nep]
planets = [mercury,venus,earth,mars,jupiter,saturn,uranus,neptune]
# The Moon
v = vector(0.9,0,0)
mo = moon.picture(5+0.9,0,0,10,True)
for k in planets:
revp = k.periodAroundSun()
print("Planet name: ",k.name)
print(k.name," mass: ",k.mass," kg")
print(k.name," distance from the sun: ",k.RS/1000," Km")
print(k.name," angular velocity: ",k.angularVelocity()," rad/s")
print(k.name," period around the sun: ",revp.fromStoYears()," terrestrial year/s")
print("\n")
# Our program
while t < seconds:
rate(50)
for plan in range(len(planets)):
planetsf[plan].pos = rotate(planetsf[plan].pos,angle=planets[plan].varAngularPosition(t,dt.s),axis=(0,0,1))
v = rotate(v,angle=moon.varAngularPosition(t,dt.s,m2=2),axis=(0,0,1))
mo.pos = ea.pos + v
t += dt.s
view raw solarSystem.py hosted with ❤ by GitHub

Earth Moon system orbiting around the Sun and VPython

Hello everyone! A few days ago my friend, as a joke, bet me to reproduce the Moon orbiting around the Earth. Well, I was a little afraid the model could be a little slow using matplotlib and its animation functions. Fortunately I found out VPython, a great 3D package: click here to get to their homepage. VPython, in my opinion, provides one of the simplest yet most brilliant package for 3D graphics. It’s shockingly easy to learn: it took less than an hour for me to learn the basics and start doing something interesting. Furthermore if you’re used to vectors and working with them, VPython provides simple functions for vectors calculations.
im
Check the model below (Earth and Moon orbiting around the Sun). I also made a video:



import math
from visual import *
# Data in units according to the International System of Units
G = 6.67 * math.pow(10,-11)
# Mass of the Earth
ME = 5.973 * math.pow(10,24)
# Mass of the Moon
MM = 7.347 * math.pow(10,22)
# Mass of the Sun
MS = 1.989 * math.pow(10,30)
# Radius Earth-Moon
REM = 384400000
# Radius Sun-Earth
RSE = 149600000000
# Force Earth-Moon
FEM = G*(ME*MM)/math.pow(REM,2)
# Force Earth-Sun
FES = G*(MS*ME)/math.pow(RSE,2)
# Angular velocity of the Moon with respect to the Earth (rad/s)
wM = math.sqrt(FEM/(MM * REM))
# Velocity v of the Moon (m/s)
vM = wM*REM
print("Angular velocity of the Moon with respect to the Earth: ",wM," rad/s")
print("Velocity v of the Moon: ",vM/1000," km/s")
# Angular velocity of the Earth with respect to the Sun(rad/s)
wE = math.sqrt(FES/(ME * RSE))
# Velocity v of the Earth (m/s)
vE = wE*RSE
print("Angular velocity of the Earth with respect to the Sun: ",wE," rad/s")
print("Velocity v of the Earth: ",vE/1000," km/s")
# Initial angular position
theta0 = 0
# Position at each time
def positionMoon(t):
theta = theta0 + wM * t
return theta
def positionEarth(t):
theta = theta0 + wE * t
return theta
def fromDaysToS(d):
s = d*24*60*60
return s
def fromStoDays(s):
d = s/60/60/24
return d
def fromDaysToh(d):
h = d * 24
return h
# Graphical parameters
print("\nSimulation Earth-Moon-Sun motion\n")
days = 365
seconds = fromDaysToS(days)
print("Days: ",days)
print("Seconds: ",seconds)
v = vector(0.5,0,0)
E = sphere(pos=vector(3,0,0),color=color.cyan,radius=.3,make_trail=True)
M = sphere(pos=E.pos+v,color=color.white,radius=0.1,make_trail=True)
S = sphere(pos=vector(0,0,0),color=color.yellow,radius=1)
t = 0
thetaTerra1 = 0
dt = 5000
dthetaE = positionEarth(t+dt)- positionEarth(t)
dthetaM = positionMoon(t+dt) - positionMoon(t)
print("delta t:",dt,"seconds. Days:",fromStoDays(dt),"hours:",fromDaysToh(fromStoDays(dt)),sep=" ")
print("Variation angular position of the Earth:",dthetaE,"rad/s that's to say",degrees(dthetaE),"degrees",sep=" ")
print("Variation angular position of the Moon:",dthetaM,"rad/s that's to say",degrees(dthetaM),"degrees",sep=" ")
while t < seconds:
rate(50)
thetaEarth = positionEarth(t+dt)- positionEarth(t)
thetaMoon = positionMoon(t+dt) - positionMoon(t)
# Rotation only around z axis (0,0,1)
E.pos = rotate(E.pos,angle=thetaEarth,axis=(0,0,1))
v = rotate(v,angle=thetaMoon,axis=(0,0,1))
M.pos = E.pos + v
t += dt
view raw earthMoon.py hosted with ❤ by GitHub

Hope this was entertaining! Next comes the whole Solar System!

Wednesday, 17 December 2014

Gini coefficient, concentration measurement: an implementation in R

Another subject we took in the statistics class was the Gini index.

Gini index or ratio or coefficient is used to calculate how much a certain transferable phenomenon such as income or stocks for instance, is concentrated.

For example, say you are evaluating a company and you’d like to know more about how the shares are divided among the shareholders. You could use Gini index for that!

I’ve calculated the index using R and random data you can download here. In case you’d like to know more about Gini index check here.

Rplot

Here my simple R implementation of the index.

# Load data
tb <- read.table("C:\\b.txt",header=TRUE,sep=",")
# Add 5 new columns for analysis purposes
for(i in 1:5)
{
cbind(tb,0)
}
# Storing the number of rows and columns
nRow <- nrow(tb)
nCol <- ncol(tb)
# Cumulative frequencies
i <- 1
totalF = sum(tb[,2])
while(i <= nRow)
{
if(i==1)
{
tb[1,3] <- tb[1,2]
tb[1,4] <- tb[1,2]/1000
}else{
tb[i,3] <- tb[i-1,3]+tb[i,2]
tb[i,4] <- tb[i-1,3]/totalF + tb[i,2]/1000
}
i <- i + 1
}
i <- 1
while(i<=nRow)
{
tb[i,5] <- tb[i,1]*tb[i,2]
if(i==1)
{
tb[i,6] <- tb[i,5]
}else{
tb[i,6] <- tb[i-1,6]+tb[i,5]
}
i <- i + 1
}
i <- 1
while(i <= nRow)
{
tb[i,7] <- tb[i,6]/sum(tb[,5])
i = i +1
}
# Show and plot the data
tb
a <- c(0,1)
b <- c(0,1)
c <- c(0,tb[,4])
d <- c(0,tb[,7])
plot(a,b,main="Concentration",type="l",col="green",lwd=2)
lines(c,d,type="b",col="red",ylab="Relative freq",xlab="Relative freq",lwd=2)
# Calculate Gini's R concentration index
getR <- function(mat)
{
R <- 0.5
area = 0.5*tb[1,4]*tb[1,7]
i <- 2
while(i <= nRow)
{
area = area + 0.5*(tb[i,4]-tb[i-1,4])*(tb[i-1,7]+tb[i,7])
i = i + 1
}
acmax <- (sum(tb[,2])-1)/(2*sum(tb[,2]))
R <- (R - area)/acmax
return(R)
}
# Print data
paste("Concentration index R is: ",getR(tb)*100,"%")
view raw concMeas.R hosted with ❤ by GitHub

Here below are the results


im2


im1



It looks like the data I used shows a 24% concentration. Cool!

Pearson’s chi-squared test: a simple implementation in R (test of independence)

Hi everyone! Today I found my old statistics workbooks and start wondering what I could get out of them.

Statistics can look pretty boring when using only pen and paper, since many times you’re just making a lot of repetitive calculations. However, the results of those calculations might of course be interesting.

Person’s chi-squared test is a simple test, as my professor put it, one of the first tests you should be performing when analysing a double entry table. You might be asking yourself why. Well, the answer is that this test looks for connection between the two variables in the table. As you might know connection is different from dependence. Dependence is of course a stronger bond and kind of a “one way bond” whilst connection is sort of a “double way bond”. If there’s no or little connection, then you might want to change variables in play since there is nothing but little interest in performing further tests on the same set of data.

Check on Wikipedia for more information on the theory.

Here is the R code I used to implement the test on the raw data at the bottom of the page.

# Load data
tb <- read.table("C:\\a.txt",header=TRUE,sep=",",row.names=1)
# Add a column and a row
tb <- cbind(tb,0)
tb <- rbind(tb,0)
# Store data on the table
nRow <- nrow(tb)
nCol <- ncol(tb)
# Add row totals
rownames(tb)[nRow] <- "Row sum"
i <- 1
while(i <= nRow)
{
partialRowsum <- sum(tb[i,])
tb[i,nCol] <- partialRowsum
i <- i + 1
}
# Add column totals
colnames(tb)[nCol] <- "Columns sum"
i <- 1
while(i <= nCol)
{
partialColsum <- sum(tb[i])
tb[nRow,i] <- partialColsum
i <- i + 1
}
# This function returns a stocastic matrix
# conditioned on the given one.
stoc <- function(mat)
{
stocIndipTable <- matrix(0,nRow,nCol)
k<-1
while(k<=nRow)
{
j <- 1
while(j<=nCol)
{
if((j==nCol) || (k==nRow))
{
stocIndipTable[k,j] <- tb[k,j]
j = j + 1
}else{
stocIndipTable[k,j] <- tb[k,nCol]*tb[nRow,j]/tb[nRow,nCol]
j = j + 1
}
}
k = k + 1
}
return(stocIndipTable)
}
# Contingency table
ctgTable <- tb-stoc(tb)
# This function calculates the normalized
# Pearson's chi squared (not constrained)
calculatePearson <- function(cgt)
{
rawP <- cgt^2/stoc(tb)
pearson <- sum(rawP)
maxP <- min((nRow-2),(nCol-2)) * tb[nRow,nCol]
return(pearson/maxP)
}
# Print information on the table
paste("This is a ",dim(tb)[1]," by ",dim(tb)[2]," matrix.")
print(colnames(tb))
print(rownames(tb))
# Pearson's Chi squared
p <- calculatePearson(ctgTable) * 100
if(p>0)
{
print("There is connection between the two phenomena in the table")
print(paste("Chi squared connection index: ",p," %"))
}else{
print("Check process! Chi squared negative!")
}
view raw chisqrd.R hosted with ❤ by GitHub

Below is the output, it seems there’s a feeble connection between the two phenomena we studied.


im


The data I used for the simulation are available to be downloaded here.

Saturday, 13 December 2014

Here again Java

Java, Java and Java again! Here is my last script in Java, which I wrote today.

The script is quite simple, it is composed of three classes and lets you handle some basic calculations with lines and parabolas on the real plane and calculate the area under the curve (definite integrals).

Optionally you can plot the data.

file

Here is the main class with a simple example:

public class Principal
{
public static void main(String[] args)
{
// Let's generate our line and parabola
Integral f1 = new Integral(1,2,3);
System.out.println(f1.description());
System.out.println("\nIntersections between line and parabola");
f1.findIntersections();
System.out.println("\nIntersections of line with x axis");
f1.findIntersectionsWithXaxis("line");
System.out.println("\nIntersections of parabola with x axis");
f1.findIntersectionsWithXaxis("parabola");
// Integral of the parabola
System.out.println("\nDefinite integral of the parabola from 0 to 10:");
System.out.println(f1.approxIntegral(0,10,100000,"parabola"));
// Plot of the function
f1.plotFunction("both",-10,10,20);
}
}

And the integral class which handles all the background operations

import javax.swing.JFrame;
import org.math.plot.*;
public class Integral
{
private double a;
private double b;
private double c;
// Constructor of the class
public Integral(double a, double b, double c)
{
this.a = a;
this.b = b;
this.c = c;
}
// Returs a string with a simple description of what this class can do
public String description()
{
String content = "This class lets you generate lines and parabolas"
+ "\nFurthermore you can:"
+ "\nCalculate definite integrals"
+ "\nEvaluate functions"
+ "\nPlot functions"
+ "\nOutput data (txt format)";
return content;
}
// Lets you evaluate the line at x and optionally print the result (print=true)
public double line(double x,boolean print)
{
double value = a*x + b;
if(print)
{
System.out.println("Line: the value of y at x="+x+" is equal to: "+value);
}
return value;
}
// Same as above but for the parabola
public double parabola(double x, boolean print)
{
double value = a*Math.pow(x,2) + b*x + c;
if(print)
{
System.out.println("Parabola: the value of y at x="+x+" is equal to: "+value);
}
return value;
}
// Evaluate definite integral with a finite number of steps
public double approxIntegral(double initialp, double finalp, double steps, String function)
{
double sum = 0;
double interval = (finalp-initialp)/steps;
if(function.equals("line"))
{
while(initialp < finalp)
{
sum = sum + interval*this.line(initialp,false);
initialp += interval;
}
return sum;
}else if(function.equals("parabola"))
{
while(initialp < finalp)
{
sum = sum + interval*this.parabola(initialp,false);
initialp += interval;
}
return sum;
}else
{
String error = "Error, please insert either of the following:\nparabola\nline";
System.out.println(error);
return 0;
}
}
// Plot the functions
public void plotFunction(String function,double from, double to,int npoints)
{
if((from >= to) || (from==to) || (npoints==0))
{
System.out.println("Check input please! Something is wrong.");
return;
}
double step = Math.abs((to-from)/npoints);
Plot2DPanel plot = new Plot2DPanel();
double x[] = new double[npoints];
double y[] = new double[npoints];
if(function.equals("line"))
{
for(int i=0; i < npoints; i++ )
{
x[i] = from;
y[i] = this.line(from,false);
from += step;
}
plot.addLinePlot("Line 1",x,y);
}else if(function.equals("parabola"))
{
for(int i=0; i < npoints; i++ )
{
x[i] = from;
y[i] = this.parabola(from,false);
from += step;
}
plot.addLinePlot("Parabola 1",x,y);
}else if(function.equals("both"))
{
double y1[] = new double[npoints];
for(int i=0; i < npoints; i++)
{
x[i] = from;
y[i] = this.parabola(from, false);
y1[i] = this.line(from, false);
from += step;
}
plot.addLinePlot("Parabola 1",x,y);
plot.addLinePlot("Line1",x,y1);
}else
{
System.out.println("Please check your input, function's name seems not ok.");
return;
}
// Plot name and frame settings
plot.setAxisLabel(0,"X");
plot.setAxisLabel(1,"Y");
JFrame frame = new JFrame("Graph");
frame.setContentPane(plot);
frame.setVisible(true);
return;
}
// Find intersections between line and parabola
void findIntersections()
{
double delta = Math.pow(this.b-this.a, 2)-4*this.a*(this.c-this.b);
if(delta < 0)
{
System.out.println("Delta is negative, no solutions!");
}else
{
double x1 = (this.a - this.b + Math.sqrt(delta))/(2*this.a);
double x2 = (this.a - this.b - Math.sqrt(delta))/(2*this.a);
System.out.println("Intersections at:"
+ "\nx1: "+x1 + "y1: "+this.parabola(x1,false)
+ "\nx2: "+x2 + "y2: "+this.parabola(x2,false));
}
}
// Self explanatory ;)
void findIntersectionsWithXaxis(String function)
{
if(function.equals("parabola"))
{
if(this.c==0)
{
double x2 = -this.b/this.a;
System.out.println("x1: 0 y1: 0");
System.out.println("x2: "+x2+" y2: "+this.parabola(x2,false));
}
else
{
double delta = Math.pow(this.b, 2)-4*this.a*this.c;
if(delta < 0)
{
System.out.println("Negative delta, no solutions!");
return;
}else
{
double x1 = (-this.b + Math.sqrt(delta))/(2*this.a);
double x2 = (-this.b - Math.sqrt(delta))/(2*this.a);
System.out.println("x1: "+x2+"y1: "+this.parabola(x1,false));
System.out.println("x2: "+x2+"y2: "+this.parabola(x2,false));
}
}
}else if(function.equals("line"))
{
double x1 = -this.b/this.a;
System.out.println("x1: "+x1+"y1: "+this.line(x1,false));
}
}
}

Optionally, should you want to output something to a .txt file, here is the class that can help in doing that:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
public class GetOutData
{
private String content;
// if you need to print a string
public GetOutData(String output)
{
this.content = output;
}
public void writeString(String path)
{
try
{
File file = new File(path);
if(!file.exists())
{
file.createNewFile();
}
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
bw.write(this.content);
bw.close();
}catch(IOException e)
{
e.printStackTrace();
}
}
}

Bye until the next project! Occhiolino

Thursday, 11 December 2014

Java again, in English this time though! ;)

As i said earlier in my article in German, I studied Java only once in a while, since I had so many other things to do and study. Anyway here is another bit of code I’ve written in order to practice what I’ve learnt.

This script is a simple program to keep track of your inventory. Frankly speaking, an excel spreadsheet would have been faster and easier but where’s the fun in it? Occhiolino

There are three classes: the main class, the products class and the inventory class. In the products class lies all the code related to the object “product” while in the inventory class you can find all the code needed to run the inventory. Finally the main class executes the main program and (as it should be I guess).

For the sole purpose of exercising myself with Java, I tried to generate a inventory of food with three items. However you can easily add as many inventories as you like and figure out a way to speed up the process of adding items to each inventory.

Perhaps in the future I’ll add the option to print out a bar chart or something similar as soon as I have time to study Java. As for now, I find it quite interesting and demanding since it asks every time for the type of the data you are going to work with and the arrays are different from python’s lists. That can be a small issue for someone who got used to Python as myself. Furthermore Java libraries are hugely vast and that can be overwhelming at the beginning. Nonetheless I hope to get better at it soon! Sorriso

Here is the products class

public class Products 
{
private String name;
private int id;
private int quantity;
private double price;

//Constructor of the class
public Products(String name,int id, int quantity, double d)
{
this.name = name;
this.id = id;
this.quantity = quantity;
this.price = d;
}

//Quantity setter and update method
public void changeQuantity(int q,String s)
{
if(s.equals("add"))
{
this.quantity += q;
}else if(s.equals("subtract"))
{
this.quantity -= q;
}
System.out.println("Warehouse stock updated succesfully!");
}

//Price setter
public void changePrice(double p)
{
this.price = p;
System.out.println("Price changed succesfully!");
}

//Get all the info on the product
public void getInfoProduct()
{
System.out.println("As for product "+this.name+" with id: "+this.id);
System.out.println("Quantity available: "+this.quantity);
System.out.println("Price "+this.price);
System.out.println("Total value in stock: " + this.getTotalValue());
}

//This function returns the total value of the stock for the product
public double getTotalValue()
{
double totalValue = this.price*this.quantity;
return totalValue;
}

}

The main class


public class mainClass 
{

public static void main(String[] args)
{
//we create the products
Products bread = new Products("Bread",0,10,0.5);
Products ooil = new Products("Olive oil",1,20,4.00);
Products oranges = new Products("Oranges",2,5,2.50);

//we create the inventory and add products to it
Inventory inventory1 = new Inventory("Inventory 1");
inventory1.addProduct(bread);
inventory1.addProduct(ooil);
inventory1.addProduct(oranges);

//We print info on the products
bread.getInfoProduct();
ooil.getInfoProduct();
oranges.getInfoProduct();

//And the total value of the inventory
System.out.println("\nTotal value of the inventory is "+inventory1.getInventoryValue());
}
}

Finally, the inventory class


//Dynamics arrays. I found them similar to those in Python
import java.util.ArrayList;
import java.util.List;

public class Inventory
{
private String name;
//Products[] productsInStock = new Products[]{};
List<Products> productsInStock = new ArrayList<Products>();
private int totalItems = 0;

//Constructor of the class
public Inventory(String name)
{
this.name = name;
}

//Add a product object to the inventory array
public void addProduct(Products p)
{
productsInStock.add(p);
this.totalItems += 1;
System.out.println("Product added to inventory "+this.name+" succesfully!");
}

//Get the total value of the inventory
public double getInventoryValue()
{
double valueToReturn=0;
int lenAr = this.totalItems;
for(int i=0;i<lenAr;i++)
{
valueToReturn += this.productsInStock.get(i).getTotalValue();
}
return valueToReturn;
}
}

Here below is the output


Product added to inventory Inventory 1 succesfully!
Product added to inventory Inventory 1 succesfully!
Product added to inventory Inventory 1 succesfully!
As for product Bread with id: 0
Quantity available: 10
Price 0.5
Total value in stock: 5.0
As for product Olive oil with id: 1
Quantity available: 20
Price 4.0
Total value in stock: 80.0
As for product Oranges with id: 2
Quantity available: 5
Price 2.5
Total value in stock: 12.5

Total value of the inventory is 97.5

Hope this was interesting.

Es ist endlich Zeit mit Java zu spielen

Heute hatte ich Lust ein bisschen zum Thema Java zu schreiben, und der Artikel auf Deutsch zu schreiben um meine Kenntnisse der deutschen Sprache zu verbessern.

Erst muss ich mich entschuldigen, weil es im Artikel Fehler seien könnten.
Ich studiere Deutsch seit zwei Jahren als Hobby, aber ich weiss noch nicht, wenn mein Deutsch gut genug ist, um einen kleinen Artikel zu Schreiben. Auf jeden Fall, wollte ich es ausprobieren, weil man sich nur mit Praxis verbessert.

Obwohl man in einem kleinen Artikel wie meiner nicht so viele Wörter braucht, kann es schwierig sein gut Deutsch zu schreiben. Aber ich denke, dass die komplizierte Grammatik der deutsche Sprache ein sehr attraktiver Aspekt sein kann.

Ich habe genug über die Sprache geredet, jetzt reden wir über Java.

Ich bin kein Experte von Java, ich habe seit dem letzten Sommer Java studiert aber nicht jeden Tag sondern nur ab und zu, wenn Zeit da war. Jedenfalls ist hier eines der ersten Projekte. Es ist ein sehr kurzes Skript, das euch eine Grafik zeigen kann.

Um das Skript zu machen, habe ich Eclipse benutzt. Auf dieser website kann man Eclipse finden.

Hier ist das Skript

import javax.swing.JFrame;
import org.math.plot.*;
import java.util.ArrayList;
import java.util.List;
public class Principale
{
public static void main(String[] args)
{
UniformlyAcceleratedMotion a = new UniformlyAcceleratedMotion(0,10,2);
System.out.println(a.getPosition(2));
System.out.println(a.getSpeed(2));
a.printGeneralSettings();
// List position = new ArrayList();
double space[];
space = new double[10];
double speedd[];
speedd = new double[10];
double time[];
time = new double[10];
for(int i=0;i<10;i++)
{
space[i] = a.getPosition(i);
speedd[i] = a.getSpeed(i);
time[i]=i;
}
// plot2DPanel object
Plot2DPanel plot = new Plot2DPanel();
// add line
plot.addLinePlot("Space",time,space);
plot.addLinePlot("and Speed",time,speedd);
// Plot name and frame settings
JFrame frame = new JFrame("Uniformly accelerated motion");
frame.setContentPane(plot);
frame.setVisible(true);
}
}

Das Skript benutzt das Class “UniformlyAcceleratedMotion”, das hier unter ist:

public class UniformlyAcceleratedMotion
{
private double v0;
private double x0;
private double a;
public UniformlyAcceleratedMotion(double x0, double v0, double a)
{
this.x0 = x0;
this.v0 = v0;
this.a = a;
}
public double getPosition(double t)
{
if(t <= 0)
{
return this.x0;
}else
{
double position = this.x0 + this.v0 * t + 0.5*this.a * Math.pow(t,2);
return position;
}
}
public double getSpeed(double t)
{
if(t<=0)
{
return this.v0;
}
else
{
double speed = this.v0 + this.a * t;
return speed;
}
}
public void printGeneralSettings()
{
System.out.println("Initial position: "+this.x0 + " m");
System.out.println("Initial speed: "+this.v0 + " m/s");
System.out.println("Acceleration (constant): "+this.a + " m/s^2");
}
}

Hier ist die Graphik:


Script


Ich hoffe, dass dieser Artikel lustig war. Auf wiedersehen, bis zum nächsten Mal.
Vielen Dank!

Friday, 5 December 2014

A floating ball dropped in a water current

While writing the previous script on simple harmonic motion I found a funny simulation.
Here it is, do you remember what happens when you drop a ball in flowing water? (Actually the fluid simulated looks something more dense than water).


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import random
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
def fun(x):
if x!=0:
return 2*np.sin(4/3*x+2)/x
else:
return 1
counter = 1
def animate(i):
global x,y,counter
counter += 0.2
ax1.clear()
plt.xlim(0,50)
plt.ylim(-2,2)
plt.plot([x for x in range(50)],[-0 for x in range(50)],lw=2,color="cyan")
plt.plot(counter,fun(counter),color="red",marker="o",ms=20)
ani = animation.FuncAnimation(fig,animate,interval=1)
plt.show()
view raw ball_water.py hosted with ❤ by GitHub

Basic physics and Python: simple harmonic motion

Here is simple harmonic motion simulation with a spring and a bouncing ball.
Springs are a classic example of harmonic motion, on Wikipedia you can get a grasp of the basics. Among other assumption, in my simulation I’ve assumed an ideal spring and that there is no friction (and therefore the motion will not stop by itself) however, if you like, you can implement friction easily.



Here is the spring simulation
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import random
fig = plt.figure(figsize=(7,7))
ax1 = fig.add_subplot(1,1,1)
ax2 = fig.add_subplot(2,1,1)
ax3 = fig.add_subplot(3,1,1)
fig.subplots_adjust(hspace=.45)
# Spring data
k = 100
m = 20
w = np.sqrt(k/m)
phi = 2
A = 2
period = 2*np.pi/w
frequency = 1/period
print("Period:",period,"seconds",sep=" ")
print("Frequency:",frequency,"Hz",sep=" ")
print("k: ",k,"N/m",sep=" ")
def fun(t):
global w,phi,A
return A*np.sin(w*t+phi)
def vel(t):
global w,phi,A
return A*w*np.cos(w*t+phi)
def acceleration(t):
global w,phi,A
return -A*w**2*np.sin(w*t+phi)
def position(x):
x1 = x-1
x2 = x+1
y1 = 1
y2 = -1
p1 = [x1,y2]
p2 = [x2,y2]
p3 = [x2,y1]
p4 = [x1,y1]
return [p1,p2,p3,p4]
counter = 0
xt = [0]
yt = [0]
vy = [0]
acy = [0]
def animate(i):
global counter, xt,yt,A,vy,acy
ax3.clear()
plt.subplot(311)
# configure X axes
plt.xlim(-3.5,3.5)
# configure Y axes
plt.ylim(-2,2)
# labels
plt.xlabel("x")
plt.ylabel("y")
plt.title("Motion of an ideal spring")
p1 = [position(fun(counter))[0][0],position(fun(counter))[0][1]]
p2 = [position(fun(counter))[1][0],position(fun(counter))[1][1]]
p3 = [position(fun(counter))[2][0],position(fun(counter))[2][1]]
p4 = [position(fun(counter))[3][0],position(fun(counter))[3][1]]
x = [p1[0],p2[0],p3[0],p4[0],p1[0]]
y = [p1[1],p2[1],p3[1],p4[1],p1[1]]
linex = [-4,p1[0]]
liney = [0,0]
plt.plot(x,y,lw=5,color="blue")
plt.plot(linex,liney,color="red",ls=":",lw=5)
plt.subplot(312)
xt.append(counter)
vy.append(vel(counter))
plt.title("Velocity")
plt.xlim(0,15)
plt.ylim(-A*w-0.5,A*w+0.5)
plt.plot(xt,vy,lw=1,color="green")
plt.plot([0,15],[0,0],lw=0.5,color="black")
plt.subplot(313)
acy.append(acceleration(counter))
plt.title("Acceleration")
plt.xlim(0,15)
plt.ylim(-A*w**2-0.5,A*w**2+0.5)
plt.plot(xt,acy,lw=1,color="green")
plt.plot([0,15],[0,0],lw=0.5,color="black")
counter += 0.1
ani = animation.FuncAnimation(fig,animate,interval=2)
plt.show()

Below is the bouncing ball:


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import random
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
def fun(x):
return 2*np.sin(2*x+2)
counter = 1
def animate(i):
global x,y,counter
counter += 0.2
ax1.clear()
plt.xlim(0,50)
plt.ylim(-3,3)
plt.plot([x for x in range(50)],[-2.15 for x in range(50)],lw=2,color="black")
plt.plot(counter,fun(counter),marker="o",ms=20)
ani = animation.FuncAnimation(fig,animate,interval=1)
plt.show()

A simpler example of simple harmonic motion with a spring (video):


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import random
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
def fun(t):
return 2*np.sin(4/3*t+2)
x = [0]
y = [fun(x[0])]
def position(x):
x1 = x-1
x2 = x+1
y1 = 1
y2 = -1
p1 = [x1,y2]
p2 = [x2,y2]
p3 = [x2,y1]
p4 = [x1,y1]
return [p1,p2,p3,p4]
counter = 0
def animate(i):
global x,y,counter
ax1.clear()
# configure X axes
plt.xlim(-5,5)
plt.xticks([-4,-3,-2,-1,0,1,2,3,4])
# configure Y axes
plt.ylim(-5,5)
# labels
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simple harmonic motion")
# plt.gca().add_line(line)
p1 = [position(fun(counter))[0][0],position(fun(counter))[0][1]]
p2 = [position(fun(counter))[1][0],position(fun(counter))[1][1]]
p3 = [position(fun(counter))[2][0],position(fun(counter))[2][1]]
p4 = [position(fun(counter))[3][0],position(fun(counter))[3][1]]
x = [p1[0],p2[0],p3[0],p4[0],p1[0]]
y = [p1[1],p2[1],p3[1],p4[1],p1[1]]
linex = [-4,p1[0]]
liney = [0,0]
plt.plot(x,y,lw=5,color="blue")
plt.plot(linex,liney,color="red",ls=":",lw=5)
counter += 0.2
ani = animation.FuncAnimation(fig,animate,interval=1)
plt.show()

Hope this was interesting.

Thursday, 4 December 2014

Animated graphs with matplotlib

Recently I have had not so much time to dedicate to my blog, however, today I had some spare time and decided to learn how to code animated graphs with Python and matplotlib.
Apparently it’s really simple, however a little bit of practice is needed, here below are three pieces of code where I coded and plotted the graph of (many) branch of parabolas, a random walk, and a simulated stock (a random walk again!).
Here is the half parabolas



import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
x = [i for i in range(20)]
y = [np.sqrt(i**3) for i in x]
def animate(i):
global x,y
r = np.random.normal(0,1,1)
xar = x
yar = y + r*y
#ax1.clear()
ax1.plot(xar,yar,marker='o')
ani = animation.FuncAnimation(fig,animate,interval=1000)
plt.show()


Below is the random walk and the video:


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
def randomWalk():
r = np.random.normal(0,1,1)
if r < 0:
return -1
elif r > 0:
return 1
else:
return 0
counter = 0
start = [0]
t = [0]
def animate(i):
global t,u,sd,counter
x = t
y = start
counter += 1
x.append(counter)
y.append(start[counter-1] + randomWalk())
plt.plot(x[counter-1],y[counter-1],marker='o',color="r")
ani = animation.FuncAnimation(fig,animate,interval=1)
plt.show()

video of the random walk: https://www.youtube.com/watch?v=3pAHwt1ioz4

and finally the stock simulation:


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import random
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
def getNewPrice(s,mean,stdev):
r = np.random.normal(0,1,1)
priceToday = s + s*(mean/255+stdev/np.sqrt(225)*r)
return priceToday
z = np.random.normal(0,1,255)
u = 0.1
sd = 0.3
counter = 0
price = [100]
t = [0]
def animate(i):
global t,u,sd,counter
x = t
y = price
counter += 1
x.append(counter)
y.append(getNewPrice(price[counter-1],u,sd))
ax1.clear()
plt.plot(x,y,color="blue")
ani = animation.FuncAnimation(fig,animate,interval=50)
plt.show()


Hope this was interesting Sorriso.

Saturday, 18 October 2014

New version and bug fixes for LED lights savings calculator application

Hi everyone, I found some naive bugs and some improvements which I could add to my application, therefore here they are:

Bug fixed:
-A bug which did not let the user calculate savings in case the plant was fully financed and yearly net savings were negative
-Other minor bugs

Functionalities added:
-Now you can select the number of yearly payments should you decide to ask for a full financing for your LED lights. You can leave this field blank, in that case a default number of payments is calculated according to costs and annual energy savings.

Click here to download the improved version.

Hope this was useful.

Monday, 13 October 2014

A simple program to calculate LED lights savings

 

EDIT: For the most updated application, please check the following page:
New version and bug fixes for LED lights savings calculator application


A week ago, a friend asked for a simple program which could help him calculate savings consumptions fast and give a glance at the overall picture.

If you did not know, by replacing your old bulbs and lamps with LED lights you can save up to 90% depending on the replaced lamp. For instance, if you replace neon with LED, savings are at least 45%, however they might be higher due to neon maintenance and hidden consumptions. If you replace incandescent bulbs with LED you save at least 80%.

Although initial price may be higher, LED lamps will surely pay off in the long run, since they are expected to last 50.000 hours!

This simple program lets you enter the data on your actual lights and the LED, then it calculates:
-Savings
-Savings at each year, for 10 years
-Amortization time if the lamps are paid in one shot
-Financing of the LED lamps with a basic constant pay-out mortgage assuming a 5% interest rate
-Immediate savings if the LED lamps are financed

Click here for the python source code. Again, the program is pretty simple since only a sketch is needed. It surely can be improved. If you have any suggestion please let me know.

Here are some screenshots and a calculation example

Immagine 2

The result of a simple replacement: 200 neon tubes with 200 LED tubes (assuming a cost of 26 euro/pc for LED tube.

# Name: MyOffice1. Plant type: Private. Energy cost: 0.18euro/kw
#
# LED lights data:
# 200 LED Tube 1200mm 18.0 watt. Working hours: 10.0. Days a year: 320
#
# Old lights data:
# 200 Neon Tube 1200mm 36.0 watt. Working hours: 10.0. Days a year: 320
#
#
# Total old lights consumption: 23040.0 kw/year
# Total LED lights consumption: 11520.0 kw/year
# Savings with LED lights (annual, excluding maintenance): 50.0%
# Equivalent to an annual money savings of: 2073.6 euro/year
#
#
# Total savings at year 10 (euro/year): 2073.6
# Total savings at year 9 (euro/year): 2473.6
# Total savings at year 8 (euro/year): 2073.6
# Total savings at year 7 (euro/year): 2073.6
# Total savings at year 6 (euro/year): 2473.6
# Total savings at year 5 (euro/year): 2073.6
# Total savings at year 4 (euro/year): 2073.6
# Total savings at year 3 (euro/year): 2473.6
# Total savings at year 2 (euro/year): 2073.6
# Energy savings at year 1 (euro): 2073.6
#
# LED lights cost: 5200.0 euro.
# Installation costs: 0 euro.
# Amortization in 2.51 years. (Considering consumption savings only).
#
###################################################################
# Annual payment due: 1466.46 euro
# Annual immediate savings: 607.14 euro
# Interest rate applied 5.0%
# Number of payments (years): 4
view raw LED_savings.py hosted with ❤ by GitHub

Looks really like LEDs are the lights of the future.


Note: some data may be incorrect in your country (for example energy cost, tube cost, etc..), if you should ever use this program please check each data you input.

Sunday, 5 October 2014

A simple approximating algorithm for Financial Mathematics

Today while I was applying some of my knowledge of Financial Mathematics, I came across a weird problem. Ok I guess that’s not that weird after all, however I did not find at first, a formula or some trick to get to my goal and therefore I decided to use a simple approximating algorithm.

Say you have some data on a fixed-rate mortgage, a really basic mortgage where both the interest rate and the annual payment are fixed. By the way, if you’d like to know more on these mortgage check the wikipedia page here.

Apparently, the expression used to determine the annual payment, given the initial conditions, should be the following:

clip_image002

clip_image002[6]

clip_image002[12]

Now, suppose that you have everything, the annual constant payment (R), the initial capital (C0), the number of years (n) and you want to find the interest rate applied (i).

At first, it may appear difficult to deal with this problem analytically, so my first idea to get around this was the following: first, define k as the ratio of the initial capital to the annual payment C0/R, then rearrange the equation in terms of k as follows

clip_image002[14]

Now the problem sums up to this: find the i that satisfies the following system of equations

clip_image002[16]

Eventually, here is an algorithm in python to solve the system above

# This simple script lets you find i given initial capital,
# number of payments and the amount of the yearly payment.
import numpy as np
from matplotlib import pyplot as plt
import time
z = np.linspace(0.00001,1,100)
# Function f(i,n=5)
def interestRate(i,n=5):
if i == 0:
return 0
else:
return (1-(1+i)**(-n))/i
f = []
flat = []
# Initial capital to yearly payment ratio
ratio = 9520/2336.78
for k in z:
f.append(interestRate(k))
flat.append(ratio)
iToFind = 0
derivedRatio = 0
while abs(derivedRatio-ratio) > 0.001:
iToFind += 0.00001
derivedRatio = interestRate(iToFind)
#print(iToFind)
print("i found! It is roughly equal to: ",iToFind)
plt.plot(z,f)
plt.plot(z,flat)
plt.plot(iToFind,interestRate(iToFind),marker="o")
plt.title("i found!")
plt.show()
view raw fin_math.py hosted with ❤ by GitHub

And here is the final plot


plot


Hope this was interesting. Here is Wolfram Alpha’s answer for your reference. For some reason it outputs 0.079 while I get 0.0724 which was the random rate I used to build this simple example. Perhaps some mistake occurred. If you find out please let me know.


Disclaimer
This article is for educational purpose only. The numbers are invented. The article may well contain mistakes and errors. You should never use this article for purposes different from the educational one. The author is not responsible for any consequence or loss due to inappropriate use.

Saturday, 27 September 2014

Approximating differential equation with Euler’s method

It looks like it’s time for differential equations.

So, let’s say that you have a differential equation like the one below and you are striving trying to find an analytical solution. Chill out, maybe take a walk outside or ride your bike and calm down: sometimes an analytical solution does not exist. Some differential equations simply don’t have an analytical solution and might drive you mad trying to find it.

Fortunately, there is a method to find the answer you may need, or rather, an approximation of it. This approximation sometimes can be enough of a good answer.

While there are different methods to approximate a solution, I find Euler’s method quite easy to use and implement in Python. Furthermore, this method has just been explained by Sal from Khan Academy in a video on YouTube which is a great explanation in my opinion. You can find it here.

Here below is the differential equation we will be using:

clip_image002

and the solution (in this case it exists)

clip_image002[5]

Here is the Python code algorithm. Note that the smaller the step (h), the better the approximation (although it may take longer).

# Euler's method for approximating differential equations
import math
import matplotlib.pyplot as plt
# Your differential equation. It should be in the form: dy/dx= f(t,y)
# Remember that f returns the value of the slope in that point
def f(t,y):
return 2-math.e**(-4*t)-2*y
def solution(t):
return 1+0.5*math.e**(-4*t)-0.5*math.e**(-2*t)
t0 = 0
y0 = 1
# Step size, the smaller the better, however it may take longer to compute
h = 0.005
x = [t0]
y = [y0]
y2 = [y0]
errors = []
while t0 < 5:
m = f(t0,y0)
y1 = y0 + h*m
t1 = t0+h
x.append(t1)
y.append(y1)
solut = solution(t1)
y2.append(solut)
error = abs(y1-solut)/y1*100
errors.append(error)
print("Actual t","predicted y","Percentage error %",sep=" ")
print(t1,y1,error)
t0=t1
y0=y1
print("Mean of error:",sum(errors)/len(errors),"%",sep=" ")
plt.plot(x,y2,label="Function")
plt.plot(x,y,label="Approximation")
plt.legend(loc='upper right')
plt.title("Euler's method approximation, h="+str(h))
plt.show()

Here are some approximations using different values of h:


5


05


005



Hope this was interesting.

Sunday, 21 September 2014

Generate slope fields in R and Python

Here is a short post on how to generate a quick slope field in R and Python.

If you do not know what a slope field is, well I am not the best person to explain it to you since it is a relative new concept for me as well. From what I’ve understood on the subject, a slope field is a graphical way to sort out a first-order differential equation and get a grasp at the solution without having to solve it analytically.

As the Wikipedia’s page says, “it may be used to qualitatively visualize solutions, or to numerically approximate them.”

In general, I feel safe saying that a slope field is some sort of a graphical approach to a differential equation.

Say you have the following differential equation:

clip_image002[8]

drawing the slope field would look something like this:
In Python (without arrows)
figure_1_3figure_2_1

and in R (with arrows, x=f and y=h)
Rplot

Of course these plots are just very quick and can be improved.

Here is the Python code I used to draw them.

import numpy as np
from matplotlib import pyplot as plt
# Differential equation
# diff = y'= y/x (or say x+y)
def diff(x,y):
return y/x # try also x+y
x = np.linspace(-10,10,50)
y = np.linspace(-10,10,50)
# use x,y
for j in x:
for k in y:
slope = diff(j,k)
domain = np.linspace(j-0.07,j+0.07,2)
def fun(x1,y1):
z = slope*(domain-x1)+y1
return z
plt.plot(domain,fun(j,k),solid_capstyle='projecting',solid_joinstyle='bevel')
plt.title("Slope field y'")
plt.grid(True)
plt.show()
print("End of the program")

And the R code

# Our differential equation
diff <- function(x,y)
{
return(x/y) #Try also x+y
}
# Line function
TheLine <- function(x1,y1,slp,d)
{
z = slope*(d-x1)+y1
return(z)
}
# Domains
x = seq(-20,20,0.5)
y = seq(-20,20,0.5)
# Points to draw our graph
f = c(-5,5)
h = c(-5,5)
plot(f,h,main="Slope field")
# Let's generate the slope field
for(j in x)
{
for(k in y)
{
slope = diff(j,k)
domain = seq(j-0.07,j+0.07,0.14)
z = TheLine(j,k,slope,domain)
arrows(domain[1],z[1],domain[2],z[2],length=0.08)
}
}
view raw slope_field_R.R hosted with ❤ by GitHub

Here is a beautiful slope field for the following differential equation:

clip_image002[10]

In Python
x y


If you need a quick tool for drawing slope fields, this online resource is good, click here.


Hope this was interesting.

Tuesday, 9 September 2014

Multivariable gradient descent

This article is a follow up of the following:
Gradient descent algorithm

Here below you can find the multivariable, (2 variables version) of the gradient descent algorithm. You could easily add more variables. For sake of simplicity and for making it more intuitive I decided to post the 2 variables case. In fact, it would be quite challenging to plot functions with more than 2 arguments.

Say you have the function f(x,y) = x**2 + y**2 –2*x*y plotted below (check the bottom of the page for the code to plot the function in R):

im

Well in this case, we need to calculate two thetas in order to find the point (theta,theta1) such that f(theta,theta1) = minimum.

Here is the simple algorithm in Python to do this:

from sympy import *
x = Symbol('x')
y = Symbol('y')
z = Symbol('z')
f = x**2 + y**2 - 2*x*y
# First partial derivative with respect to x
fpx = f.diff(x)
# First partial derivative with respect to y
fpy = f.diff(y)
# Gradient
grad = [fpx,fpy]
# Data
theta = 830 #x
theta1 = 220 #y
alpha = .01
iterations = 0
check = 0
precision = 1/1000000
printData = True
maxIterations = 1000
while True:
temptheta = theta - alpha*N(fpx.subs(x,theta).subs(y,theta1)).evalf()
temptheta1 = theta1 - alpha*N(fpy.subs(y,theta1)).subs(x,theta).evalf()
#If the number of iterations goes up too much, maybe theta (and/or theta1)
#is diverging! Let's stop the loop and try to understand.
iterations += 1
if iterations > maxIterations:
print("Too many iterations. Adjust alpha and make sure that the function is convex!")
printData = False
break
#If the value of theta changes less of a certain amount, our goal is met.
if abs(temptheta-theta) < precision and abs(temptheta1-theta1) < precision:
break
#Simultaneous update
theta = temptheta
theta1 = temptheta1
if printData:
print("The function "+str(f)+" converges to a minimum")
print("Number of iterations:",iterations,sep=" ")
print("theta (x0) =",temptheta,sep=" ")
print("theta1 (y0) =",temptheta1,sep=" ")
# Output
#
# The function x**2 - 2*x*y + y**2 converges to a minimum
# Number of iterations: 401
# theta (x0) = 525.000023717248
# theta1 (y0) = 524.999976282752

This function though is really well behaved, in fact, it has a minimum each time x = y. Furthermore, it has not got many different local minimum which could have been a problem. For instance, the function here below would have been harder to deal with.


im2


Finally, note that the function I used in my example is again, convex.
For more information on gradient descent check out the wikipedia page here.
Hope this was useful and interesting.


R code to plot the function

x <- seq(-5,5,length=100)
y <- x
fun <- function(x,y)
{
return(x**2+y**2-2*x*y)
}
z <- outer(x,y,fun)
persp(x,y,z,theta=30,phi=30,expand=0.5,col="lightgreen",ltheta=100,xlab="x",ticktype="detailed",ylab="y",zlab="z")

Gradient descent algorithm

Today I’m going to post a simple Python implementation of gradient descent, a first-order optimization algorithm. In Machine Learning this technique is pretty useful to find the values of the parameters you need. To do this I will use the module sympy, but you can also do it manually, if you do not have it.

The idea behind it is pretty simple. Imagine you have a function f(x) = x^2 and you want to find the value of x, let’s call it theta, such that f(theta) is the minimum value that f can assume. By iterating the following process a sufficient number of times, you can obtain the desired value of theta:

im

image

 

Now, for this method to work, and theta to converge to a value, some conditions must be met, namely:
-The function f must be convex
-The value of alpha must not be too large or too small, since in the first case you’ll end up with the value of theta diverging and in the latter you’ll approach the desired value really slowly
-Depending on the function f, the value of alpha can change significantly.

Note that, if the function has local minimums and not just an absolute minimum, this optimization algorithm may well end “trapped” in a local minimum and not find your desired global minimum. For the sake of argument, suppose the function below goes to infinity as x gets bigger and that the global minimum is somewhat near 1. With the algorithm presented here, you may well end up with x = –0.68 or something like that as an answer when you are looking roughly for x = 0.95.

im2

In this case of course, it is trivial to find out the value and you don’t even need derivatives. However, for a different function it may not be that easy, furthermore for multivariable functions it may be even harder (in the next article I will cover multivariable functions).

Here is the Python code of my implementation of the algorithm

from sympy import *
import numpy as np
from matplotlib import pyplot as plt
x = Symbol('x')
# Function
y = x**2
# First derivative with respect to x
yprime = y.diff(x)
# Second derivative with respect to x
ypp = yprime.diff(x)
def plotFun():
space = np.linspace(-5,5,100)
data = np.array([N(y.subs(x,value)) for value in space])
plt.plot(space, data)
plt.show()
theta = 84
theta2 = 0
alpha = .01
iterations = 0
check = 0
precision = 1/1000000
plot = True
iterationsMax = 10000
maxDivergence = 50
while True:
theta2 = theta - alpha*N(yprime.subs(x,theta)).evalf()
iterations += 1
# If we make too much iterations our program
# stops and we need to check it to be sure the
# parameters are correct and it is working properly
if iterations > iterationsMax:
print("Too much iterations")
break
# Check if theta converges to a value or not
# We allow a max of 50 divergences
if theta < theta2:
print("The value of theta is diverging")
check += 1
if check > maxDivergence:
print("Too much iterations (%s), the value of theta is diverging"%maxDivergence)
print("Please choose a smaller alpha and, or check that the function is indeed convex")
plot = False
break
# If the value of theta changes less that a certain
# tolerance, we stop the program since theta has
# converged to a value.
if abs(theta - theta2) < precision:
break
theta = theta2
if plot:
print("Number of iterations:",iterations,"value of theta:",theta2,sep=" ")
plt.plot(theta,N(y.subs(x,theta)).evalf(),marker='o',color='r')
plotFun()

Hope this was interesting and useful.

Thursday, 4 September 2014

Generate words through the use of Markov chains

In general, computer programs are believed to be bad at being creative and doing tasks which require “a human mind”. Dealing with the meaning of text, words and sentences is one of these tasks. That’s not always the case. For instance sentiment analysis is a branch of Machine Learning where computer programs try to convey the overall feeling coming from tons of articles.

But here we are talking a lot more simpler: by using Markov chains and some statistics, I developed a simple computer program which generates words. The model works as follow:
As a first step, the program is fed with a long text in the selected language. The longer the text, the better. Next, the program analyses each word and gets the probability distributions for the following:
-Length of the word
-First character (or letter)
-Character (or letter) next to a given one
-Last character (or letter)

Then, once gathered all this data, the following function is run in order to generate words:

theModel

Each part is then glued together and returned by the function.

On average, 10% of generated words are English (or French, German or Italian) real words or prepositions or some kind of real sequence of characters.

The most interesting aspect, in my opinion, is the fact that by shifting the language of the text fed into the program, one can clearly see how the sequences of characters change dramatically, for instance, by feeding English, the program will be more likely to write “th” in words, by using German words will often contain “ge” and by using Italian words will often end by a vowel.

Here is the code. Note that in order to speed up the word checking process, I had to use the NLTK package which is availably only in Python 2. To avoid the use of Python 2 you could check each word using urllib and an online dictionary by parsing the web page but this way is tediously slow. It would take about 15 minutes to check 1000 words. By using NLTK you can speed up the checking process.

import numpy as np
import random
class ListOperation(object):
def __init__(self,list1):
self.list1 = list1
# This function returns absolute frequencies of
# occurrence for the characters in listToCheck
def absFreq(self,listToCheck):
absFr = []
for k in listToCheck:
freq = 0
for j in self.list1:
if j == k:
freq += 1
absFr.append(freq)
return absFr
# This function returns relative frequencies of
# occurrence for the characters in listToCheck
def relFreq(self,listToCheck):
absFreq = self.absFreq(listToCheck)
relFr = []
for i in absFreq:
relFr.append(i/sum(absFreq))
return relFr
# This function returns a list of the letters
# next to the one entered
def getNext(self,char):
postChar = []
for i in self.list1:
for k in range(len(i)):
if i[k] == char:
try:
postChar.append(i[k+1])
except:
postChar.append(" ")
return postChar
# This function returns an object of this class
# containing a list of the first letters in the text
def getFirstLetters(self):
firstList = []
for i in self.list1:
firstList.append(i[0])
listF = ListOperation(firstList)
return listF
# This function returns an object of this class
# containing a list of the last letters in the text
def getLastLetters(self):
lastLetters = []
for i in self.list1:
length = len(i)
lastLetters.append(i[length-1])
listU = ListOperation(lastLetters)
return listU
# This function returns all the elemets of the list in
# lower case
def lowerCase(self):
listLower = []
for i in self.list1:
listLower.append(i.lower())
return listLower
# Characters used to build a distribution
alphabet = ["a","b","c","d","e","f","g","h","i","k","l",\
"m","n","o","p","q","r","s","t","u","v","w","x","z"]
# Languages supported
languages = ["english","italian","french","german"]
# This function reads data and returns a string
# you need to load a .txt document containing a
# sample of the language you selected. The longer
# the sample, the better word composition.
def readData(name):
file = open("C:\\text.txt","r")
data = file.read()
file.close()
return data
# Text uploaded
textToCheck = readData("english")
# The text is splitted
textToCheckList = textToCheck.split()
# Create an instance of the class
textObjectList = ListOperation(textToCheckList)
#------------------------------------------
# Let's find all the characters after each one
# of those in the alphabet list
postSequences = []
for letter in alphabet:
seqPost = textObjectList.getNext(letter)
postSequences.append(seqPost)
# For each postSequence we find the absolute frequency
distAbs = []
for seq in postSequences:
distAObject = ListOperation(seq)
distA = distAObject.absFreq(alphabet)
distAbs.append(distA)
# And the relative frequency. However, we
# must be caareful and delete those which
# sum up to 0 to avoid exceptions with np
distRel = []
for seq in postSequences:
distRObject = ListOperation(seq)
if sum(distRObject.absFreq(alphabet)) == 0:
pass
print("SOMETHING NOT ADDING TO 1")
else:
distR = distRObject.relFreq(alphabet)
distRel.append(distR)
#------------------------------------------
# List of letters at the beginning of a word
listFirstLetters = ListOperation(textObjectList.getFirstLetters().lowerCase())
# Distribution of the letter at the beginning of a word
absFreqFirstLetters = listFirstLetters.absFreq(alphabet)
relFreqFirstLetters = listFirstLetters.relFreq(alphabet)
#------------------------------------------
# List of letters at the end of a word
listLastLetters = ListOperation(textObjectList.getLastLetters().list1)
# Distribution of the letter at the end of a word
absFreqLastLetters = listLastLetters.absFreq(alphabet)
relFreqLastLetters = listLastLetters.relFreq(alphabet)
#------------------------------------------
# This function returns a list containing
# the length of each word in the string
def findWordLengDist(string):
wordList = string.split()
seqLength = []
for word in wordList:
if len(word) <= 18:
seqLength.append(len(word))
return seqLength
# This function returns the probability
# distribution of the characters in the
# string used as input
def generateDist(list1):
lengthsList = []
for number in list1:
if number not in lengthsList:
lengthsList.append(number)
lengthsList.sort()
nLengthsList = []
for number in lengthsList:
frequency = 0
for occurringNumber in list1:
if number == occurringNumber:
frequency += 1
nLengthsList.append(frequency)
# get relative frequency
relFreq = []
for i in nLengthsList:
relFreq.append(i/sum(nLengthsList))
# thislist = [lunghezza parole, frequenza relativa]
dataToReturn = [lengthsList,relFreq]
# These two should be equal
if len(lengthsList) != len(relFreq):
raise ValueError("""The two final lists have diff
erent length, please check the algorithm""")
return dataToReturn
##########################################
# Word composer function
##########################################
# This function outputs strings whose length
# is similar to the one of the real ones.
# Also, the structure should be at least similar
def spitText():
try:
word = ""
# Initial character is chosen based on the
# distribution (oldChar) of the first
# characters observed in the text given
oldChar = np.random.choice(alphabet,replace=True,p=relFreqFirstLetters)
word = word + oldChar
for k in range(np.random.choice(distributions[0],replace=True,p=distributions[1])-1):
newChar = np.random.choice(alphabet,replace=True,p=distRel[alphabet.index(oldChar)])
word = word + newChar
oldChar = newChar
# Same thing for the last character
lastChar = np.random.choice(alphabet,replace=True,p=relFreqLastLetters)
word += lastChar
print(word)
return word
except Exception as e:
print("Exception",e)
return "0"
distributions = generateDist(findWordLengDist(textToCheck))
generatedWords = []
# Generate 100 words
for i in range(100):
word = spitText()
generatedWords.append(word)
# And print them out to the screen
print(generatedWords)
# Checking existance of words with nltk.corpus
# and python 2.7. This works only with Python 2
from nltk.corpus import words
# Example of word existance checking:
# True if the word exists, False otherwise
"word" in words.words()
# Load words from file
file = open("C:\\words.txt","r")
wordsLoaded = file.read().split("\n")
realWords = []
for word in wordsLoaded:
if word in words.words():
realWords.append(word)
print(realWords)

Hope this was interesting.

Saturday, 30 August 2014

Markets, stocks simulations and Markov chains

This article is some sort of continuation from this one.

Our previous model for stock simulations did not take in account the following idea:
when a stock (or the market) is going up, then it should be (intuitively) at least, more likely that it will continue to go up. Or at the very least, as it is the case for a football game, it does not feel right to believe that the probability of either of the two possible outcomes is exactly 50%.

The idea behind Markov chains is really versatile, we can apply it also to the markets.
With a “bit” of study (I’m being sarcastic here), you can come up with something pretty complicated like this, however, the model I’m going to show here is much more naive and easier.

Suppose a Markov chain with two states, market up and market down. Once you found the probabilities of each state, you can easily simulate a random walk (based on a Markov chain of course).

Here is the code for this model:

import numpy as np
from matplotlib import pyplot as plt
path = "C:\\"
#data is stored in a .txt file in format DOHLCV
def readData(stock):
file = open(path+stock+'.txt',"r")
data = file.read()
file.close()
return data
#We read data and get a list with each line in string format
data = readData('stock').split('\n')
#This function returns a list with float mean values
#from two columns (1st and 4th in this case)
def getColumn(list1):
dataAr = []
for i in range(len(list1)-1):
dataToGet = (float(list1[i].split(',')[1]) + float(list1[i].split(',')[4]))/2
dataAr.append(dataToGet)
return dataAr
#Here is our data (type(data[1]) = float)
data = getColumn(data)
#Calculate percentage change
def getPercentageChange(v1,v2):
pc = (v2-v1)/abs(v1)
return pc
#Given an array of numbers, this function returns an array
#with percentage changes between each pair of values
def getPcArray(list1):
pcArray = []
for i in range(len(list1)-2):
pcArray.append(getPercentageChange(list1[i],list1[i+1]))
return pcArray
dataPerc = getPcArray(data)
#Now we need to calculate conditional probabilities
negativesP = 0
negativesN = 0
for i in range(1,len(dataPerc)):
if dataPerc[i-1] <= 0 and dataPerc[i] > 0:
negativesP += 1
elif dataPerc[i-1] <= 0 and dataPerc[i] < 0:
negativesN += 1
positivesP = 0
positivesN = 0
for i in range(1,len(dataPerc)):
if dataPerc[i-1] > 0 and dataPerc[i] > 0:
positivesP += 1
elif dataPerc[i-1] > 0 and dataPerc[i] < 0:
positivesN += 1
#pUp_givenUp = probability of going up, given that
#the previous step was upwards
pUp_givenUp = positivesP/(positivesP+positivesN)
pDown_givenUp = positivesN/(positivesP+positivesN)
pUp_givenDown = negativesP/(negativesP+negativesN)
pDown_givenDown = negativesN/(negativesP+negativesN)
#Transition matrices for our model
stateUp = [pUp_givenUp,pDown_givenUp]
stateDown = [pUp_givenDown,pDown_givenDown]
#This function simulates n random walks according
#to the parameters specified and the Markov model.
def simulate(n):
k = 0
for i in range(n):
#Initial value
initial = 100
#Possible percentage change [up,down]
variat = [0.02,-0.02]
#Initial percentage change
invar = np.random.choice(variat,replace=True)
#List of simulated data
simulatedData = []
#Simulation process
for i in range(255):
if invar >= 0:
invar = np.random.choice(variat,replace=True,p=stateUp)
else:
invar = np.random.choice(variat,replace=True,p=stateDown)
initial = initial*(1+invar)
simulatedData.append(initial)
#x axis
index = list(range(len(simulatedData)))
#Plot the data
print(k)
k += 1
plt.plot(index,simulatedData)
plt.title(str(n)+" simulated random walks")
#Let's simulate 2 random processes
simulate(2)
plt.show()
# Output (conditional probabilities)
#
# pUp_givenUp pDown_givenUp pUp_givenDown pDown_givenDown
# 0.60360 0.39639 0.44806 0.55193

The graphs below represent respectively, 2, 200 and 500 random paths.


2 random walks
2_random_waks


200 random walks


200_random_waks


500 random walks


500_random_waks


Hope this was interesting.





Disclaimer
This article is for educational purpose only. The numbers are invented. The author is not responsible for any consequence or loss due to inappropriate use. It may contain mistakes and errors. You should never use this article for purposes different from the educational one.

Monday, 25 August 2014

A first really shy approach to Machine Learning using Python

The day before yesterday I came across Machine Learning: WOW… I got stuck at my pc for an hour wondering about and watching the real applications of this great subject.

I was getting really excited! Then, after an inspiring vide on YouTube, I decided it was time to act. My fingers wanted desperately to type some “smart” code so I decided to write a program which could  recognize the language into which a given text is written.

I do not know if this is actually a very primitive kind of Machine Learning program (I somehow doubt it) therefore I apologize to all those who know more on the subject but let me dream for nowSorriso.

Remember the article on letter frequency distribution across different languages?? Back then I knew it would be useful again (although I did not know for what)!! If you would like to check it out or refresh your memory, here it is.

Name of the program: Match text to language

This simple program aims to be an algorithm able to distinguish
written text by recognizing what language a text
was written in.

The underlying hypothesis of this model are the following:
1. Each language has a given characters distribution which is different from the others. Characters distributions are generated by choosing randomly Wikipedia pages in each language.
2. Shorter sentences are more likely to contain common words that uncommon one.

The first approach to build a program able to do such a task was to build a character distribution for each of the languages used using the code in the frequency article. Next, given a string, (sentence) the program should be able to guess the language by comparing the characters distribution in the sentence with the actual distributions of the languages.

This approach, for sentences longer than 400 characters seems to work fine. However, if the sentence were to be shorter than 400 characters, a mismatch might occur. In order to avoid this, I have devised a naive approach: the shorter the sentence, the more likely the words in it are the most common. Therefore, for each language,a list of 50 most common words has been loaded and is used to double check the first guess based on the character frequency only in case the length of the sentence is less than a given number of characters (usually 400).

Note that this version of the program assumes that each language distribution has already been generated, stored in .txt format and it simply loads it from a folder. You can find and download the distributions here.

#Characters used to build a distribution
alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",",",";","-"]
#Languages supported
languages = ["english","italian","french","german"]
#A useful dictionary
distribDict = dict()
#The following functon takes a list and a string of characters,
#it calculates how often a certain character appears and then
#it outputs a list with character and frequency
def frequencies(string,letters):
list_frequencies = []
for letter in letters:
freq = 0
for i in string:
if i == letter:
freq += 1
list_frequencies.append(letter)
list_frequencies.append(freq)
return list_frequencies
#This function returns a list containing 2 lists with letter
#and frequencies
def fix_lists_letter(list_1):
list_letters = []
list_letters.append(list_1[0])
list_freq = []
for i in range(1,len(list_1)):
if i % 2 == 0:
list_letters.append(list_1[i])
else:
list_freq.append(list_1[i])
if len(list_letters) != len(list_freq):
return "Some error occurred"
else:
final_list = [list_letters,list_freq]
return final_list
#This function returns the relative frequencies
def get_rel_freq(list_1):
list_to_ret = []
for i in list_1:
list_to_ret.append(i/sum(list_1))
return list_to_ret
#This function should return the distribution of the characters
#in a given text by putting together most of the functions above
def returnDistribution(strings,alphaBet):
firstC = frequencies(strings,alphaBet)
finalC = fix_lists_letter(firstC)
letters = finalC[0]
frequenc = get_rel_freq(finalC[1])
distribution = [letters,frequenc]
nChar = sum(finalC[1])
#Note: Spaces " " are NOT considered as characters
print("Number of character used:", nChar, sep=" ")
return distribution
#This function loads each distribution into the dictionary distribDict
def loadDistribDict():
try:
for lang in languages:
fileToRead = open("C:\\Users\\desktop\\lproject\\"+lang+"Dist.txt","r")
data = fileToRead.read()
dist = data.split("\n")[1].split(" ")
distList = []
for number in dist:
if number == '':
number = 0
distList.append(float(number))
distribDict[lang] = distList
fileToRead.close()
print("Loaded",lang,"character frequency distribution!",sep=" ")
except Exception as e:
print(e)
#String to test
stringToCheck = "Hallo diese ist eine schoene Satze auf deutsch"
commonEnglishWords = [" is "," the "," of "," and "," to "," that "," for "," it "," as "," with "," be "," by "," this "," are "," or "," his "," from "," at "," which "," but "," they "," you "," we "," she "," there "," have "," had "," has "," yes "]
commonGermanWords = [" ein "," das "," ist "," der "," ich "," nicht "," es "," und "," Sie "," wir "," zu "," er "," sie "," mir "," ja "," wie "," den "," auf "," mich "," dass "," hier "," wenn "," sind "," eine "," von "," dich "," dir "," noch "," bin "," uns "," kann "," dem "]
commonItalianWords = [" di "," che ", " il "," per "," gli "," una "," sono ", " ho "," lo "," ha "," le "," ti "," con "," cosa "," come "," ci "," questo "," hai "," sei "," del "," bene "," era "," mio "," solo ", " gli "," tutto "," della "," mia "," fatto "]
commonFrenchWords = [" avoir "," est "," je "," pas "," et "," aller "," les "," en "," faire "," tout "," que "," pour "," une "," mes "," vouloir "," pouvoir "," nous "," dans "," savoir "," bien "," mon ", " au "," avec "," moi "," quoi "," devoir "," oui "," comme "," ils "]
commonWordsDict = {"english":commonEnglishWords,"german":commonGermanWords,"italian":commonItalianWords,"french":commonFrenchWords}
def checkLang(string):
distToCheck = returnDistribution(string,alphabet)
distToCheckFreq = distToCheck[1]
diffDict = dict()
#For each language we calculate the difference between the
#observed distribution and the given one.
for lang in languages:
diffList =[]
for i in range(len(languages)-1):
diff = abs(distToCheckFreq[i]-distribDict[lang][i])
diffList.append(diff)
diffDict[lang]=sum(diffList)
#verifica
for lang in languages:
print(lang,diffDict[lang])
langFound = min(diffDict, key=diffDict.get)
#If the sample sentence is shorter than 420 characters then
#we may have some recognition issues which will be dealt
#here below..
langChecked = ""
correct = False
if len(string) &lt; 420:
for langKey in commonWordsDict.keys():
for word in commonWordsDict[langKey]:
if word in string:
langChecked = langKey
correct = True
break
if correct:
break
if correct:
print("Lang found: ",langFound)
print("Lang checked: ",langChecked)
langFound = langChecked
#The language found is returned here
print("\n")
return langFound
loadDistribDict()
print("\n")
print("Language found by the program: ",checkLang(stringToCheck))

So far the program seem to work on text of different length. Here below are some results:


In these first two examples I used bigger sample sentences


Immagine 001


Immagine 002


In this last example, the sentence was really short, it was just 37 characters, something like: “Diese ist eine schoene Satze auf Deutsch”. In this case it was hard to draw a distribution which could match the German one. In fact the program found French and was really far away from the right answer indeed. The double-check algorithm kicked in the right answer (Lang checked).


Immagine 004


Hope this was interesting.

Weather forecast through Markov chains and Python

A Markov chain is a mathematical system that undergoes transitions from one state to another on a state space. It is essentially a kind of random process without any memory. This last statement, emphasizes the idea behind this process: “The future is independent from the past given the present”. In short, we could say that, the next step of our random process depends only on the very last step occurred. (Note that we are operating in discrete time in this case).

Let’s say that we would like to build a statistical model to forecast the weather. In this case, our state space, for the sake of simplicity, will contain only 2 states: bad weather (cloudy) and good weather (sunny). Let’s suppose that we have made some calculations and found out that tomorrow’s weather somehow relies on today’s weather, according  to the matrix below. Note that P(A|B) is the probability of A given B.

 

Markov chain weather visual

Therefore, if today’s weather is sunny, there is a P(Su|Su) chance that tomorrow will also be sunny, and a P(C|Su) chance that it will be Cloudy. Note that the two probabilities must add to 1.

Let’s code this system in Python:

# A simple Markov chain model for the weather in Python
import numpy as np
import random as rm
import time
# Let's define the statespace
states = ["Sunny","Cloudy"]
# Possible sequences of events
transitionName = [["SuSu","SuCl"],["ClCl","ClSu"]]
# Probabilities matrix (transition matrix)
transitionMatrix = [[0.8,0.2],[0.4,0.6]]
# Check that probabilities add to 1. If not, raise ValueError
if sum(transitionMatrix[0])+sum(transitionMatrix[1]) != 2:
print("Error!!!! Probabilities MUST ADD TO 1. Check transition matrix!!")
raise ValueError("Probabilities MUST ADD TO 1")
# A functions which implements the Markov model to forecast the weather
def weatherForecast(days):
# There is no reason to start from one state or another, let's just
# pick one randomly
weatherToday = rm.choice(states)
i = 0
print("Starting weather: ",weatherToday)
while i &lt; days:
if weatherToday == "Sunny":
#numpy.random.choice(a, size=None, replace=True, p=None)
change = np.random.choice(transitionName[0],replace=True,p=transitionMatrix[0])
if change == "SuSu":
pass
else:
weatherToday = "Cloudy"
elif weatherToday == "Cloudy":
change = np.random.choice(transitionName[1],replace=True,p=transitionMatrix[1])
if change == "ClCl":
pass
else:
weatherToday = "Sunny"
print(weatherToday)
i += 1
time.sleep(0.2)
# We forecast the weather for 100 days
weatherForecast(100)

Obviously the real weather forecast models are much more complicated than this one, however Markov chains are used in a very large variety of areas and weather forecast is one on them. Other real world applications include:
-Machine learning (in general)
-Speech recognition and completion
-Algorithmic music composition
-Stock market and Economics and Finance in general


For more information on Markov chains, check out the Wikipedia page.


If you are interested in Markov chains, I suggest you to check these two video series on YouTube which are (in my opinion) good explanations of the subject.
-Brandon Foltz’s Finite Math playlist, very clear explanation with real world examples and the math used is fairly simple. You just need to know a bit of matrices, operations on matrices and probability (but if you are here I guess you have no problems on this)
-Mathematicalmonk’s playlist on Machine Learning, where a more technical (formal) explanation is given in the videos on Markov chains, starting from here.


Hope this was interesting and useful.
-