
Saturday, 30 August 2014

Markets, stocks simulations and Markov chains

This article is a sort of continuation of this one.

Our previous model for stock simulations did not take into account the following idea:
when a stock (or the market) is going up, it should intuitively be more likely that it will continue to go up. Or, at the very least, as is the case for a football game, it does not feel right to believe that the probability of each of the two possible outcomes is exactly 50%.

The idea behind Markov chains is very versatile, and it can be applied to markets as well.
With a “bit” of study (I’m being sarcastic here), you can come up with something pretty complicated like this; however, the model I’m going to show here is much more naive and simpler.

Consider a Markov chain with two states: market up and market down. Once you have estimated the transition probabilities between the two states, you can easily simulate a random walk driven by the chain.
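A minimal sketch of how those transition probabilities can be estimated by counting, assuming the input is a plain list of percentage changes. The function name `estimate_transitions` is hypothetical, and for simplicity any change that is not strictly positive counts as "down" (the script below instead skips a current change of exactly zero).

```python
def estimate_transitions(changes):
    """Estimate P(next state | previous state) for a two-state chain.

    Assumption: a change > 0 is "up", anything else (including 0) is "down".
    Returns probs with probs[prev][curr] = P(curr | prev).
    """
    states = ["up" if c > 0 else "down" for c in changes]
    # Count every observed transition prev -> curr
    counts = {prev: {"up": 0, "down": 0} for prev in ("up", "down")}
    for prev, curr in zip(states, states[1:]):
        counts[prev][curr] += 1
    # Normalise each row so that it sums to 1
    probs = {}
    for prev, row in counts.items():
        total = row["up"] + row["down"]
        if total == 0:
            raise ValueError("no transitions observed from state " + repr(prev))
        probs[prev] = {curr: n / total for curr, n in row.items()}
    return probs


p = estimate_transitions([0.01, 0.02, 0.03, -0.01, -0.02, 0.01])
print(p["up"]["up"], p["down"]["down"])
```

Fed with the `dataPerc` list from the script below, it should give numbers close to pUp_givenUp, pDown_givenUp, pUp_givenDown and pDown_givenDown.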

Here is the code for this model:

import numpy as np
from matplotlib import pyplot as plt

path = "C:\\"

# Data is stored in a .txt file in DOHLCV format
# (Date, Open, High, Low, Close, Volume), one comma-separated row per line
def readData(stock):
    with open(path + stock + '.txt', "r") as file:
        return file.read()

# We read the data and get a list with each line in string format
data = readData('stock').split('\n')

# This function returns a list with the mean of two columns for each row:
# index 1 (Open) and index 4 (Close)
def getColumn(list1):
    dataAr = []
    for line in list1:
        # Skip empty lines, such as the one left by a trailing newline
        if not line.strip():
            continue
        fields = line.split(',')
        dataAr.append((float(fields[1]) + float(fields[4])) / 2)
    return dataAr

# Here is our data (type(data[1]) = float)
data = getColumn(data)

# Calculate percentage change
def getPercentageChange(v1, v2):
    return (v2 - v1) / abs(v1)

# Given an array of numbers, this function returns an array
# with the percentage change between each pair of consecutive values
def getPcArray(list1):
    pcArray = []
    for i in range(len(list1) - 1):
        pcArray.append(getPercentageChange(list1[i], list1[i + 1]))
    return pcArray

dataPerc = getPcArray(data)

# Now we need to calculate the conditional probabilities.
# A previous change <= 0 counts as "down"; a current change of exactly 0 is skipped.
negativesP = 0
negativesN = 0
positivesP = 0
positivesN = 0
for i in range(1, len(dataPerc)):
    if dataPerc[i-1] <= 0 and dataPerc[i] > 0:
        negativesP += 1
    elif dataPerc[i-1] <= 0 and dataPerc[i] < 0:
        negativesN += 1
    elif dataPerc[i-1] > 0 and dataPerc[i] > 0:
        positivesP += 1
    elif dataPerc[i-1] > 0 and dataPerc[i] < 0:
        positivesN += 1

# pUp_givenUp = probability of going up, given that
# the previous step was upwards
pUp_givenUp = positivesP / (positivesP + positivesN)
pDown_givenUp = positivesN / (positivesP + positivesN)
pUp_givenDown = negativesP / (negativesP + negativesN)
pDown_givenDown = negativesN / (negativesP + negativesN)

# Rows of the transition matrix for our model: [P(up), P(down)]
stateUp = [pUp_givenUp, pDown_givenUp]
stateDown = [pUp_givenDown, pDown_givenDown]

# This function simulates n random walks according
# to the parameters specified and the Markov model.
def simulate(n):
    for k in range(n):
        # Initial value
        initial = 100
        # Possible percentage change [up, down]
        variat = [0.02, -0.02]
        # Initial percentage change
        invar = np.random.choice(variat, replace=True)
        # List of simulated data
        simulatedData = []
        # Simulation process: the next move depends only on the last one
        for step in range(255):
            if invar >= 0:
                invar = np.random.choice(variat, replace=True, p=stateUp)
            else:
                invar = np.random.choice(variat, replace=True, p=stateDown)
            initial = initial * (1 + invar)
            simulatedData.append(initial)
        # x axis
        index = list(range(len(simulatedData)))
        # Print progress and plot the data
        print(k)
        plt.plot(index, simulatedData)
    plt.title(str(n) + " simulated random walks")

# Let's simulate 2 random walks
simulate(2)
plt.show()

# Output (conditional probabilities)
#
# pUp_givenUp pDown_givenUp pUp_givenDown pDown_givenDown
# 0.60360 0.39639 0.44806 0.55193
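To simulate hundreds of paths, as in the 500-walk plot, the inner loop can be vectorized with NumPy so that all paths advance one step at a time. This is a sketch, not the code used for the plots: `simulate_paths` and its parameters are hypothetical names, and it uses NumPy's `default_rng` generator with the same ±2% moves, starting value of 100 and 255 steps as `simulate()`.

```python
import numpy as np


def simulate_paths(n_paths, p_up_given_up, p_up_given_down,
                   steps=255, start=100.0, move=0.02, seed=None):
    """Simulate n_paths two-state Markov random walks at once.

    Returns an array of shape (n_paths, steps) with the simulated values.
    """
    rng = np.random.default_rng(seed)
    paths = np.empty((n_paths, steps))
    # Initial state (not applied to the price), up or down with equal probability
    ups = rng.random(n_paths) < 0.5
    prices = np.full(n_paths, start, dtype=float)
    for t in range(steps):
        # Probability of an up move depends only on the previous move
        p_up = np.where(ups, p_up_given_up, p_up_given_down)
        ups = rng.random(n_paths) < p_up
        prices = prices * np.where(ups, 1 + move, 1 - move)
        paths[:, t] = prices
    return paths
```

For example, `plt.plot(simulate_paths(500, pUp_givenUp, pUp_givenDown).T)` draws 500 paths in one call, since Matplotlib plots each column of a 2D array as a separate line.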

The graphs below show 2, 200 and 500 random paths respectively.


2 random walks

200 random walks

500 random walks


Hope this was interesting.





Disclaimer
This article is for educational purposes only. The numbers are invented. The author is not responsible for any consequence or loss due to inappropriate use. It may contain mistakes and errors. You should never use this article for any purpose other than education.

Monday, 25 August 2014

A first really shy approach to Machine Learning using Python

The day before yesterday I came across Machine Learning: WOW… I sat glued to my PC for an hour, wondering about and watching real applications of this great subject.

I was getting really excited! Then, after an inspiring video on YouTube, I decided it was time to act. My fingers desperately wanted to type some “smart” code, so I decided to write a program that could recognize the language in which a given text is written.

I do not know if this actually counts as a very primitive kind of Machine Learning program (I somehow doubt it), so I apologize to all those who know more on the subject, but let me dream for now :)

Remember the article on letter frequency distribution across different languages? Back then I knew it would be useful again (although I did not know for what)! If you would like to check it out or refresh your memory, here it is.

Name of the program: Match text to language

This simple program aims to be an algorithm able to tell written texts apart by recognizing the language they were written in.

The underlying hypotheses of this model are the following:
1. Each language has a characteristic character distribution which differs from those of the other languages. The character distributions are generated from randomly chosen Wikipedia pages in each language.
2. Shorter sentences are more likely to contain common words than uncommon ones.

The first approach to building such a program was to compute a character distribution for each of the languages, using the code from the frequency article. Then, given a string (a sentence), the program guesses the language by comparing the character distribution of the sentence with the reference distributions of the languages.

For sentences longer than 400 characters this approach seems to work fine. However, if the sentence is shorter than 400 characters, a mismatch might occur. To avoid this, I have devised a naive approach: the shorter the sentence, the more likely it is that its words are among the most common ones. Therefore, for each language, a list of the most common words (about 30 per language in the code below) has been loaded. It is used to double-check the first guess, which is based on character frequency alone, whenever the sentence is shorter than a given number of characters (about 400; the code below uses 420).
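The two steps just described can be sketched in a self-contained way as follows. This is not the program below: all names are hypothetical, the distance is the sum of absolute differences (L1 distance) that the program also uses, and the short-sentence check counts matches per language (a voting variant) instead of stopping at the first language with a match. The text is lowercased, on the assumption that the reference distributions were built from lowercase letters.

```python
def char_distribution(text, alphabet):
    """Relative frequency of each character of `alphabet` in `text`."""
    text = text.lower()  # assumption: the alphabet is lowercase
    counts = [text.count(ch) for ch in alphabet]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(alphabet)
    return [c / total for c in counts]


def l1_distance(p, q):
    """Sum of absolute differences between two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


def guess_language(text, alphabet, references, common_words, max_len=420):
    """references: {language: distribution}; common_words: {language: [words]}."""
    sample = char_distribution(text, alphabet)
    # First guess: the closest reference distribution
    guess = min(references, key=lambda lang: l1_distance(sample, references[lang]))
    if len(text) >= max_len:
        return guess
    # Double check for short texts. Padding with spaces lets words at either
    # end match; punctuation attached to a word still prevents a match.
    padded = " " + text.lower() + " "
    votes = {lang: sum(1 for w in words if " " + w.strip().lower() + " " in padded)
             for lang, words in common_words.items()}
    best = max(votes, key=votes.get, default=None)
    return best if best is not None and votes[best] > 0 else guess
```

With a toy two-letter alphabet, `guess_language("aaaa bb aaaa", ["a", "b"], {"x": [0.9, 0.1], "y": [0.2, 0.8]}, {"x": [" aa "], "y": [" bb "]})` first guesses "x" from the letters, then switches to "y" because " bb " is a common word of "y".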

Note that this version of the program assumes that each language distribution has already been generated and stored in .txt format; it simply loads the distributions from a folder. You can find and download the distributions here.

# Characters used to build a distribution
alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",",",";","-"]
# Languages supported
languages = ["english","italian","french","german"]
# A useful dictionary
distribDict = dict()

# The following function takes a string and a list of characters,
# counts how often each character appears and outputs a flat list
# [character, frequency, character, frequency, ...]
def frequencies(string, letters):
    list_frequencies = []
    for letter in letters:
        list_frequencies.append(letter)
        list_frequencies.append(string.count(letter))
    return list_frequencies

# This function returns a list containing 2 lists: letters and frequencies
def fix_lists_letter(list_1):
    list_letters = list_1[0::2]
    list_freq = list_1[1::2]
    if len(list_letters) != len(list_freq):
        raise ValueError("Some error occurred: unmatched letter/frequency pairs")
    return [list_letters, list_freq]

# This function returns the relative frequencies
def get_rel_freq(list_1):
    total = sum(list_1)
    if total == 0:
        # No known characters in the text: avoid dividing by zero
        return [0.0 for _ in list_1]
    return [i / total for i in list_1]

# This function returns the distribution of the characters
# in a given text by putting together the functions above
def returnDistribution(strings, alphaBet):
    firstC = frequencies(strings, alphaBet)
    finalC = fix_lists_letter(firstC)
    letters = finalC[0]
    frequenc = get_rel_freq(finalC[1])
    distribution = [letters, frequenc]
    nChar = sum(finalC[1])
    # Note: spaces " " are NOT considered as characters
    print("Number of characters used:", nChar, sep=" ")
    return distribution

# This function loads each distribution into the dictionary distribDict
def loadDistribDict():
    try:
        for lang in languages:
            with open("C:\\Users\\desktop\\lproject\\" + lang + "Dist.txt", "r") as fileToRead:
                data = fileToRead.read()
            # The second line of the file holds the space-separated frequencies
            dist = data.split("\n")[1].split(" ")
            distList = []
            for number in dist:
                if number == '':
                    number = 0
                distList.append(float(number))
            distribDict[lang] = distList
            print("Loaded", lang, "character frequency distribution!", sep=" ")
    except Exception as e:
        print(e)

# String to test
stringToCheck = "Hallo diese ist eine schoene Satze auf deutsch"
commonEnglishWords = [" is "," the "," of "," and "," to "," that "," for "," it "," as "," with "," be "," by "," this "," are "," or "," his "," from "," at "," which "," but "," they "," you "," we "," she "," there "," have "," had "," has "," yes "]
commonGermanWords = [" ein "," das "," ist "," der "," ich "," nicht "," es "," und "," Sie "," wir "," zu "," er "," sie "," mir "," ja "," wie "," den "," auf "," mich "," dass "," hier "," wenn "," sind "," eine "," von "," dich "," dir "," noch "," bin "," uns "," kann "," dem "]
commonItalianWords = [" di "," che "," il "," per "," gli "," una "," sono "," ho "," lo "," ha "," le "," ti "," con "," cosa "," come "," ci "," questo "," hai "," sei "," del "," bene "," era "," mio "," solo "," tutto "," della "," mia "," fatto "]
commonFrenchWords = [" avoir "," est "," je "," pas "," et "," aller "," les "," en "," faire "," tout "," que "," pour "," une "," mes "," vouloir "," pouvoir "," nous "," dans "," savoir "," bien "," mon "," au "," avec "," moi "," quoi "," devoir "," oui "," comme "," ils "]
commonWordsDict = {"english":commonEnglishWords,"german":commonGermanWords,"italian":commonItalianWords,"french":commonFrenchWords}

def checkLang(string):
    # Lowercase first: the alphabet above contains only lowercase letters
    distToCheck = returnDistribution(string.lower(), alphabet)
    distToCheckFreq = distToCheck[1]
    diffDict = dict()
    # For each language we calculate the difference between the
    # observed distribution and the given one, over every character
    for lang in languages:
        diffList = []
        for i in range(len(distToCheckFreq)):
            diff = abs(distToCheckFreq[i] - distribDict[lang][i])
            diffList.append(diff)
        diffDict[lang] = sum(diffList)
    # Check: print the difference for each language
    for lang in languages:
        print(lang, diffDict[lang])
    langFound = min(diffDict, key=diffDict.get)
    # If the sample sentence is shorter than 420 characters then
    # we may have some recognition issues, which are dealt with below
    langChecked = ""
    correct = False
    if len(string) < 420:
        # Pad with spaces so that words at the start or end can match too
        padded = " " + string + " "
        for langKey in commonWordsDict.keys():
            for word in commonWordsDict[langKey]:
                if word in padded:
                    langChecked = langKey
                    correct = True
                    break
            if correct:
                break
    if correct:
        print("Lang found: ", langFound)
        print("Lang checked: ", langChecked)
        langFound = langChecked
    # The language found is returned here
    print("\n")
    return langFound

loadDistribDict()
print("\n")
print("Language found by the program: ", checkLang(stringToCheck))

So far the program seems to work on texts of different lengths. Below are some results:


In these first two examples I used longer sample sentences.


Image 001


Image 002


In this last example the sentence was really short, just 37 characters, something like: “Diese ist eine schoene Satze auf Deutsch”. In this case it was hard to get a distribution that could match the German one. In fact, the program found French, which was really far from the right answer. The double-check algorithm then kicked in and returned the right answer (Lang checked).


Image 004


Hope this was interesting.

Weather forecast through Markov chains and Python

A Markov chain is a mathematical system that undergoes transitions from one state to another within a state space. It is essentially a kind of random process with no memory beyond its current state. This last statement emphasizes the idea behind the process: “The future is independent of the past given the present”. In short, the next step of our random process depends only on the current state, the one reached at the very last step. (Note that we are working in discrete time in this case.)
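In symbols, writing X_n for the state at step n, the memoryless (Markov) property stated above reads:

```latex
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0) = P(X_{n+1} = x \mid X_n = x_n)
```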

Let’s say that we would like to build a statistical model to forecast the weather. For the sake of simplicity, our state space will contain only 2 states: bad weather (cloudy) and good weather (sunny). Let’s suppose that we have made some calculations and found out that tomorrow’s weather depends on today’s weather according to the matrix below. Note that P(A|B) is the probability of A given B.

 

Markov chain weather visual
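Written out, with rows indexed by today’s weather and columns by tomorrow’s (Su = sunny, C = cloudy), the transition matrix has the form below, and each row must sum to 1. The code below uses P(Su|Su) = 0.8, P(C|Su) = 0.2, P(C|C) = 0.4 and P(Su|C) = 0.6.

```latex
P = \begin{pmatrix}
P(Su \mid Su) & P(C \mid Su) \\
P(Su \mid C) & P(C \mid C)
\end{pmatrix},
\qquad
P(Su \mid Su) + P(C \mid Su) = 1,
\qquad
P(Su \mid C) + P(C \mid C) = 1
```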

Therefore, if today’s weather is sunny, there is a P(Su|Su) chance that tomorrow will also be sunny, and a P(C|Su) chance that it will be cloudy. Note that the two probabilities must add to 1.

Let’s code this system in Python:

# A simple Markov chain model for the weather in Python
import numpy as np
import random as rm
import time

# Let's define the state space
states = ["Sunny", "Cloudy"]

# Possible sequences of events
transitionName = [["SuSu", "SuCl"], ["ClCl", "ClSu"]]

# Probabilities matrix (transition matrix)
transitionMatrix = [[0.8, 0.2], [0.4, 0.6]]

# Check that each row of probabilities adds to 1 (with a small tolerance
# for floating-point rounding). If not, raise ValueError
for row in transitionMatrix:
    if abs(sum(row) - 1) > 1e-9:
        print("Error!!!! Probabilities MUST ADD TO 1. Check transition matrix!!")
        raise ValueError("Probabilities MUST ADD TO 1")

# A function that implements the Markov model to forecast the weather
def weatherForecast(days):
    # There is no reason to start from one state or another, let's just
    # pick one randomly
    weatherToday = rm.choice(states)
    i = 0
    print("Starting weather: ", weatherToday)
    while i < days:
        if weatherToday == "Sunny":
            # numpy.random.choice(a, size=None, replace=True, p=None)
            change = np.random.choice(transitionName[0], replace=True, p=transitionMatrix[0])
            if change == "SuSu":
                pass
            else:
                weatherToday = "Cloudy"
        elif weatherToday == "Cloudy":
            change = np.random.choice(transitionName[1], replace=True, p=transitionMatrix[1])
            if change == "ClCl":
                pass
            else:
                weatherToday = "Sunny"
        print(weatherToday)
        i += 1
        time.sleep(0.2)

# We forecast the weather for 100 days
weatherForecast(100)
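The simulation draws one possible sequence of days; the transition matrix can also tell us the probabilities directly. Entry (i, j) of the matrix raised to the n-th power is the probability of going from state i to state j in n days. A minimal NumPy sketch, assuming the states are ordered [Sunny, Cloudy] in both rows and columns (so the Cloudy row of `transitionMatrix` is reordered, because `transitionName[1]` lists "ClCl" before "ClSu"); `n_day_probabilities` is a hypothetical helper.

```python
import numpy as np

# Rows = today, columns = tomorrow, both ordered [Sunny, Cloudy].
# Same numbers as transitionMatrix, with the Cloudy row reordered.
P = np.array([[0.8, 0.2],
              [0.6, 0.4]])


def n_day_probabilities(P, today, days):
    """Probability of each state `days` steps after starting in state `today`."""
    start = np.zeros(P.shape[0])
    start[today] = 1.0  # we know today's weather for sure
    return start @ np.linalg.matrix_power(P, days)


print(n_day_probabilities(P, 0, 2))  # starting from Sunny
```

For example, if today is sunny, the probability of sun in two days is 0.8 · 0.8 + 0.2 · 0.6 = 0.76.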

Obviously, real weather forecast models are much more complicated than this one; however, Markov chains are used in a very large variety of areas, and weather forecasting is one of them. Other real-world applications include:
-Machine learning (in general)
-Speech recognition and completion
-Algorithmic music composition
-Stock market and Economics and Finance in general


For more information on Markov chains, check out the Wikipedia page.


If you are interested in Markov chains, I suggest checking out these two video series on YouTube, which are (in my opinion) good explanations of the subject.
-Brandon Foltz’s Finite Math playlist: a very clear explanation with real-world examples, and the math used is fairly simple. You just need to know a bit about matrices, operations on matrices and probability (but if you are here, I guess you have no problems with this).
-Mathematicalmonk’s playlist on Machine Learning, where a more technical (formal) explanation is given in the videos on Markov chains, starting from here.


Hope this was interesting and useful.

Friday, 15 August 2014

Arduino module GUI (Beta version)

Hi everyone! I have just completed a first, very basic GUI to make the Arduino module (which, by the way, you can find here) more user-friendly.

The GUI is still a “Beta” version, since I created it with PyQt4, which I started learning only 3 days ago. I bet there are plenty of features that could be improved. I am probably going to revise and update this GUI; meanwhile, here is a first “raw” version which, as far as the connection and communication with Arduino Uno are concerned, works fine.

This application is for educational purposes only; any commercial use is excluded.

You can download the executable for Windows here. For Mac and Linux users, the source code is included in the zip folder; however, I am not sure the program will work, since I have no experience with those operating systems and their USB settings. Feedback is much appreciated.

Here are some screenshots and some useful information:

This is the main screen. The three buttons at the bottom essentially sum up everything that this program should help you do.

Image 001

First of all, you need to connect Arduino Uno to the USB port and load your program onto it.
Then click on Connection in the GUI and choose “Set up connection”. The default port is com3 at 9600 baud; you can easily change these default settings in the connection menu of the GUI.

Image 002

Once the connection has been established, you can start interacting with Arduino Uno.

By clicking on the button “Read from Arduino”, the program asks you how many lines to read. There is no default value yet, but I suggest reading no more than 100 lines, since the program might slow down or crash.

Image 003

By clicking on the button “Send to Arduino”, the program asks you to enter the data to send. You can enter one of the three data types shown below, in the following form:
1. Integer: 2
2. Character: b
3. List: [2,3,4,5,6,7,8]

The list can be as long as you want it to be.

Image 004
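How the GUI turns the typed text into bytes is not shown in this post; as an illustration only, one possible way to handle the three input forms is sketched below. `parse_user_input` is a hypothetical function, and it assumes integers are sent as single unsigned bytes (0 to 255) with `struct.pack('>B', ...)`, and characters as their ASCII byte.

```python
import ast
import struct


def parse_user_input(text):
    """Hypothetical parser: turn the text typed into the 'Send to Arduino'
    dialog into a list of byte strings, one per value to send."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        # A list such as [2,3,4]; literal_eval only evaluates Python literals
        values = ast.literal_eval(text)
        if not isinstance(values, list) or not all(isinstance(v, int) for v in values):
            raise ValueError("a list must contain only integers")
        return [struct.pack('>B', v) for v in values]
    if text.isdecimal():
        # A single integer such as 2, sent as one unsigned byte (0-255)
        return [struct.pack('>B', int(text))]
    if len(text) == 1:
        # A single character such as b, sent as its ASCII byte
        return [text.encode('ascii')]
    raise ValueError("expected an integer, a single character or a list of integers")


print(parse_user_input("[2,3,4]"))
```

Returning one byte string per value keeps the door open for a pause between values, as the test script in the Arduino post does with `time.sleep()`.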

Hope this is useful.

Wednesday, 13 August 2014

Controlling your Arduino Uno board with Python

Today I am going to talk about a topic where Arduino and computer programming overlap.

Last month I was watching videos about electronics and thinking about the solar controllers used by many stand-alone solar kits. For some reason they do not behave as they should (i.e. cut the power to the lights when the sun rises and give full power in the evening). However, a far more important point is that no centralized control system is available for this kind of controller. Therefore I had a great idea: let’s try to program a board such as Arduino to do the job.

I bought an Arduino Uno, the basic model, which I guess is suitable for beginners like myself. You can check more details here.

Arduino is an open source project and can be programmed in a language similar to C. This is fine, since I have some knowledge of C and can get by quite easily with beginner projects. Once the main code has been loaded onto Arduino, you can interact with the board through the USB port and the “shell” and, for instance, give instructions that trigger control flow structures such as if / else if / else. However, in some cases this does not work (I still do not know why) or it is impractical, since you can enter only one value at a time. It would be nice to control the board with some external tool that lets you write a script to execute. It turns out that there is a module that does exactly this: pySerial. It can be downloaded here and is available for Python 3 as well!

The module is great! It works really smoothly and predictably, as it should. However, the serial link carries only raw bytes, so some small adjustments are needed to communicate with Arduino through Python 3. Note that there are some differences with Python 2 which I will not cover.

First of all, Python 3 naturally works with text strings, so if you send an integer as text, our sweet board will understand anything but what we have sent (the character ‘2’, for instance, arrives as the byte 50, its ASCII code). For characters such as ‘a’ things are easier: you just need to send the character as a bytes literal, with a b in front of it: b’a’. In the opposite direction, when reading data, you need to convert the received bytes into a readable format. To solve all these “problems” I decided to build a simple class which essentially wraps some functions of Serial and can be used directly to send characters, lists and integers to Arduino. Perhaps I will also add the possibility to send floats and strings, although I am not sure the latter can be sent and understood by the board.
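A minimal sketch of the conversions just described, using only the standard library, so no board is needed to try it. The helper names are hypothetical; the calls themselves (`struct.pack`, `str.encode`, `bytes.decode`) are standard Python 3.

```python
import struct


def int_to_byte(value):
    # '>B' = one unsigned byte, so only 0..255 can be sent this way;
    # struct.pack raises struct.error for anything outside that range
    return struct.pack('>B', value)


def char_to_byte(char):
    # 'a' -> b'a', the same as writing the bytes literal directly
    return char.encode('ascii')


def decode_reply(raw_line):
    # A line read from the board, e.g. b'2\n', back to the text '2'
    return raw_line.decode('ascii').strip()


print(int_to_byte(2), char_to_byte('a'), decode_reply(b'2\n'))
```

Note the difference between `int_to_byte(2)`, which is the single byte with value 2, and the text '2' (b'2'), which the board would receive as the byte 50.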

Here is a basic example of communication with Arduino Uno through Python 3.

First of all, we need to load the following code onto Arduino Uno. This simple sketch lights an LED according to the value of readData, which is read from the USB port. Serial.print sends the value of readData back through the USB port.

int led = 13;
long readData;

// Setup
void setup()
{
  pinMode(led, OUTPUT);
  Serial.begin(9600);
}

// The main loop
void loop()
{
  readData = Serial.read();
  Serial.print(readData);
  Serial.print('\n');
  // Serial.read() returns -1 when no data is available, but
  // analogWrite() accepts only values from 0 (off) to 255 (fully on)
  if (readData == -1)
  {
    readData = 0;
  }
  else if (readData == 2)
  {
    readData = 255;
  }
  else if (readData == 3)
  {
    readData = 30;
  }
  else if (readData == 4)
  {
    readData = 0;
  }
  // Note: pin 13 is not a PWM pin on the Uno, so analogWrite() can only
  // switch it fully on (values >= 128) or off (values < 128); a PWM pin
  // such as 9 is needed to actually dim the LED with the value 30.
  analogWrite(led, readData);
  // delay means that the program will wait 1000 ms at each loop
  delay(1000);
}

For instance, to control the board in this case, you can use this Python code:

import serial
import time
# Struct is useful to convert integers to binary numbers
# understandable by Arduino.
# For instance, try print(struct.pack('>B',2))
import struct

# Setup, port and speed
port = 'com3'
speed = 9600

# Connect to Arduino
# Note: if you need to write a script giving instructions to Arduino,
# after establishing the connection Arduino resets itself, therefore,
# for the script to run smoothly and without exceptions,
# a time.sleep(2) is needed.
try:
    arduino = serial.Serial(port, speed)
    time.sleep(2)
    print("Connection to " + port + " established successfully!\n")
except Exception as e:
    print(e)
    # Without a connection there is nothing else to do
    raise SystemExit(1)

a = input("Press y to start test, otherwise press anything but y")
# Note that Arduino waits 1000 ms per loop, while here time.sleep() waits 2 seconds.
if a == "y":
    dataToSend = [2, 2, 2, 3, 3, 3, 4]
    for i in dataToSend:
        valueToWrite = struct.pack('>B', i)
        go = arduino.write(valueToWrite)
        time.sleep(2)
        reply = arduino.readline().decode('ascii').strip()
        print("Instruction number: %s\n"
              "Bytes written to port: %s\n"
              "Binary written to port: %s\n"
              "Arduino reply: %s\n" % (i, go, valueToWrite, reply))

# Close connection
a = input("To leave connection open enter 'openc'...press any key to esc...")
if a == "openc":
    pass
else:
    arduino.close()
    print("\nConnection closed!")

However, writing this code again and again is boring and can easily lead to mistakes. Therefore I decided to build a class called Arduino.


By creating an Arduino object using the ArduinoClass, you can call the following methods:

# Create the Arduino object
uno = Arduino()
# The object initializes itself with standard data such as port and speed,
# which you can optionally edit when creating the object. By printing uno
# you will see the port and speed
print(uno)
# Here are some methods
# self.sendChar sends characters; you just need to input the character string
uno.sendChar('a')
# self.sendInteger(2,printR=False) prints a report if printR is True.
uno.sendInteger(2,printR=False)
# self.sendIntArray(array,delay=2,printR=False) lets you send lists of
# integers. Prints a report if printR is True.
uno.sendIntArray([2,3,4])
# self.readData(nlines,printData=False,array=True,integers=False,Floaters=False)
uno.readData(20)
# Reads the first nlines sent from Arduino Uno. It returns a list of strings by
# default; however, it can optionally return a list of integers or floats by
# switching the default values.
# Finally, the method to close the connection
uno.closeConn()

I am also working on a GUI with PyQt4 for this script. I will keep you posted.


The source code of the ArduinoClass is available here.
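If that link is unavailable, here is a minimal, hypothetical sketch of such a wrapper around pySerial. It is not the author’s ArduinoClass: it only mirrors a few of the method names listed above (with simpler signatures), and it accepts any object with write(), readline() and close(), so it can be tried without a board. Only `open_arduino` actually needs pySerial installed.

```python
import struct
import time


class ArduinoLink:
    """Hypothetical minimal wrapper, NOT the author's ArduinoClass.

    `conn` is any object with write(), readline() and close(),
    e.g. a pySerial serial.Serial instance.
    """

    def __init__(self, conn, delay=0.0):
        self.conn = conn
        self.delay = delay  # seconds to wait after each integer sent

    def sendChar(self, char):
        # A single ASCII character becomes one byte, e.g. 'a' -> b'a'
        return self.conn.write(char.encode('ascii'))

    def sendInteger(self, value):
        # One unsigned byte (0-255), e.g. 2 -> b'\x02'
        written = self.conn.write(struct.pack('>B', value))
        time.sleep(self.delay)
        return written

    def sendIntArray(self, values):
        # Send each integer in turn; returns the bytes written for each
        return [self.sendInteger(v) for v in values]

    def readData(self, nlines, integers=False):
        # Each reply line is ASCII text such as b'2\n'
        lines = [self.conn.readline().decode('ascii').strip() for _ in range(nlines)]
        return [int(x) for x in lines] if integers else lines

    def closeConn(self):
        self.conn.close()


def open_arduino(port='com3', speed=9600):
    # Requires pySerial; the board resets when the port opens, hence the pause
    import serial
    conn = serial.Serial(port, speed)
    time.sleep(2)
    return ArduinoLink(conn, delay=2)
```

With a board attached, `uno = open_arduino('com3', 9600)` followed by `uno.sendIntArray([2, 3, 4])` would send the same kind of single bytes as the test script above.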


Hope this is useful!