The Beginner Programmer: July 2014

Wednesday, 30 July 2014

(Naive) RSA encryption with Python

Before you continue reading, please make sure to read the disclaimer at the bottom of this article.

RSA is a well-known cryptosystem used in many cases where secure data transmission is needed. It basically relies on the equally well-known difficulty of factoring big numbers. As you may recall from high school, every integer greater than 1 has a unique prime factorization. Finding that factorization, however, is hard when the number is large, and this is the very strength of RSA. In order to decrypt a message encrypted with RSA without the private key, you would need to factorize a REALLY BIG number, and this may take so long that it becomes impractical.

If you want to read more about RSA, click here.
Here is the algorithm carefully described.

I have always been fascinated by encryption and cryptosystems. For those who do not know, during the Second World War Alan Turing devised a machine which made it possible to decrypt the Enigma code in a relatively short time and therefore gain a significant advantage. This machine is often described as a forerunner of the modern computer. You can watch this incredible piece of history on YouTube. Here are some interesting videos:

Numberphile Enigma video
An interesting documentary

Now let’s get to the necessary premises and the code!
First of all, you need two big primes, p and q. I do not know how these primes are usually generated (I haven’t covered it yet), but they are almost all you need to start your “naive” RSA encryption system in Python. We will use small numbers for the sake of simplicity, which is one of the reasons I use the word “naive”: with small primes the system can be broken in a short time.

Here is the result which should be printed out:

Therefore, hypothetically, Bob, who needs to send a message (512) to Alice, would use Alice’s public key to encrypt 512 and then send her the number 477. Alice would then use her private key to decrypt 477 and get back 512, the original message.
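To make the key generation and the encrypt/decrypt round trip concrete, here is a minimal sketch of the same idea. It is not the original listing: the primes 61 and 53 and the public exponent 17 are values I assumed for illustration, so the ciphertext it prints will not match the numbers quoted above. The call `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd

def make_keys(p, q, e):
    """Build a toy RSA key pair from two primes p, q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)          # Euler's totient of n
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)              # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)        # message must be smaller than n

def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)

# Assumed toy values: far too small to be secure.
public_key, private_key = make_keys(61, 53, 17)
ciphertext = encrypt(512, public_key)
print(ciphertext, decrypt(ciphertext, private_key))
```

The three-argument form of `pow` does the modular exponentiation without ever building the huge intermediate power, which is what keeps even this toy version fast.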

Hope this was interesting.

Disclaimer: This article is for educational purposes ONLY. If you need a security system you should ask professionals who are competent in the field. The author of the article is by no means a professional or an expert in this field and might, therefore, make big mistakes. This code must NOT be used for anything other than educational purposes. The provider of this code does not guarantee the accuracy of the results and accepts no liability for any loss or damage that may occur as a result of the use of this code. Understanding and agreeing to the terms of this disclaimer is a condition of use of this code. By reading the article you confirm you have understood and will comply with this disclaimer.

Plain vanilla BlackJack simulation with R

Before you continue reading, please make sure to read the disclaimer at the bottom of this article.

Here is a simulation I ran with R around the same time I created the poker one.

I decided to call it plain vanilla since neither doubling down nor splitting pairs is allowed. The rules are as basic as they can be.

The code looks messy, I know. If I had to write it now, I guess I would do many things differently. However, the results look fine and the code runs fine as well, so I believe it is not entirely to be thrown away.

The simulation is divided into two parts. In the first one I looked for the probability of each possible event (that is to say: “player wins”, “tie”, “dealer wins”). Since the code looks pretty hairy, many explanations are provided.

A small but important note: the simulation is a bit demanding computationally, so in order not to crash RStudio you may want to run one line at a time rather than the entire code all at once, or use only some of the functions. Just be careful: the simulation runs a large number of times and the functions are not the fastest, so your computer might complain a little.

Here is the code:

Let’s have a look at the results that we have got:

By simply standing with a hand equal to 19, you can expect to win 59% of the time and lose 28% of the time. A tie is likely to occur 13% of the time.

Let’s now simulate 2000*100 games with a hand equal to 18 and 3 decks and then average the probabilities:

In this case the odds are a lot worse than before: by subtracting just 1 point from the player’s hand, we raised the dealer’s probability of winning from 28% to 43.4%.

Why not run the simulation for each hand from 16 to 21 and then plot the results? Warning: your computer may yell at you after this demanding step!

As we could expect, the probability of winning increases as the player’s hand gets higher. For some reason unknown to me, R did not print the dealer’s probabilities of winning explicitly; however, you can easily get them from the table above or by calculating, for each row, 1 - Prob of Win - Prob of a tie, since the three events must sum to one. It is interesting to note that the probability of a tie stays around 12-14% and that, as we expected, it is equal to 0 when the hand is equal to 16 (since the dealer must draw on 16, a tie is not possible).
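For readers who prefer Python, the first part of the simulation can be sketched roughly as follows. This is not a port of the R code: it assumes the dealer stands on every 17 (soft 17 included), it does not remove the player’s cards from the shoe, and the function names are my own. The frequencies it produces are therefore only comparable in spirit to the ones reported above.

```python
import random

def hand_value(cards):
    """Best blackjack total: aces count 11 unless that would bust."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10                  # demote one ace from 11 to 1
        aces -= 1
    return total

def simulate(player_total, games=20000, decks=3, seed=0):
    """Estimate win/tie/loss frequencies for a player standing on player_total."""
    rng = random.Random(seed)
    # 2-10 at face value, J/Q/K as 10, ace as 11; four suits per deck.
    one_deck = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11] * 4
    counts = {"win": 0, "tie": 0, "loss": 0}
    for _ in range(games):
        shoe = one_deck * decks
        rng.shuffle(shoe)
        dealer = [shoe.pop(), shoe.pop()]
        while hand_value(dealer) < 17:       # dealer draws on 16 or less
            dealer.append(shoe.pop())
        dealer_total = hand_value(dealer)
        if dealer_total > 21 or dealer_total < player_total:
            counts["win"] += 1
        elif dealer_total == player_total:
            counts["tie"] += 1
        else:
            counts["loss"] += 1
    return {event: n / games for event, n in counts.items()}

for total in range(16, 22):
    print(total, simulate(total, games=5000))
```

Because the dealer never stops below 17, a player standing on 16 can only win when the dealer busts, which is why the tie frequency for 16 comes out as exactly 0 here as well.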

Here are the plots:

This is the end of part 1 of the simulation.

And now the second part. Here we want to know the probability of not busting when we draw 1 or 2 cards, given a certain hand.

Let’s have a look at the new results we have got:
This is a hand we got by asking for a card twice from a single deck.

With a starting hand equal to (10,”A”), if we were to ask for 2 cards, here are the probabilities that we would face:

Finally, we can compute the probabilities of “survival” and bust for each hand from 16 to 17 and plot the results.

The results look pretty interesting, and it is nice to see that the probability of busting if we ask for 1 card with a starting hand of 11 is 0, as we could easily predict. However, if we were to ask for 2 cards, our chances of survival would suddenly drop to 25%. A big leap!
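The one-card case is simple enough to be computed exactly instead of simulated. The sketch below is my own and makes two simplifying assumptions: the card is drawn from a full 52-card deck (the cards already in the hand are not removed), and the hand is a hard total, so every ace counts as 1.

```python
from fractions import Fraction

# Card values in one 52-card deck, counting every ace as 1 (hard totals only).
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10] * 4

def bust_probability(hard_total):
    """Exact chance of exceeding 21 when one card is added to hard_total."""
    busting = sum(1 for card in DECK if hard_total + card > 21)
    return Fraction(busting, len(DECK))

for total in (11, 16, 20):
    print(total, bust_probability(total))
```

With a hard 11 the largest possible card is a 10, so the total can never exceed 21, which is consistent with the zero bust probability mentioned above.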

Hope this was interesting.

Disclaimer: This article is for educational purposes ONLY. Odds generated by this code are calculated by a random simulation. As such the odds will represent an approximation of the true odds. They might even be completely wrong or misleading. This code must NOT be used for anything other than educational purposes. The provider of this code does not guarantee the accuracy of the results and accepts no liability for any loss or damage that may occur as a result of the use of this code. Understanding and agreeing to the terms of this disclaimer is a condition of use of this code. By reading the article you confirm you have understood and will comply with this disclaimer.

Monday, 28 July 2014

A BlackJack game simulator with Python

Hi everyone! Here is another one of the first projects I developed. The project is quite simple, as the name tells: a BlackJack game simulator.

The modules used to build this game are the following:
-The random module, mainly the sample function to sample from a deck of 52 cards.
-EasyGui. Again, another module such as tkinter would probably have been better; however, I did not even know the name of tkinter at the time I built this.

Some notes on the game:
-Double down is allowed
-The scores are the same as in most real BlackJack games
-You can set your initial amount available and the maximum bet (you could even set a negative bet, I guess it should work even in that case, ah ah).
-There is still no insurance available and splitting is not allowed (perhaps I will implement these when I improve the GUI)

IMPORTANT NOTE: Please, before running the script with Python make sure to set the path to the folder where the images are located so that they can appear on the EasyGui window.

You can download the source code here.

Here are some screenshots

Hope you enjoy.

Letter frequency with Python

Let’s say that your favourite subject is languages and comparisons between different languages, or that you enjoy decrypting simple codes as a hobby. Well then, with Python you have found the right tool to use!

Letter frequency is a topic studied in cryptanalysis, and it has also been studied in information theory to reduce the size of the information to be sent and to prevent the loss of data. In fact, if the most frequent letter in a language is, say, “e”, then it is convenient to use the “least expensive” (in terms of amount of information) way to send that letter, reducing the number of bits sent. For instance, if you were to send binary code, you could use the single digit 0 to represent “e”.
This is a basic underlying idea in many famous codes. If you would like to get a short introduction to this topic, check this video.

Another example which uses techniques based on a similar concept is data compression. Check this great video for a general introduction to data compression.

Some encryption techniques, such as the Caesar cipher and other basic ciphers, can be easily broken by counting the frequency of occurrence of each character and then “guessing” what it should represent by comparing its frequency to the frequency of letters in the language the original message was written in. In fact, this technique can be used against any encryption method which does not use different symbols to represent different occurrences of the same character. What do I mean by this? Well, imagine that you need to encrypt this: “bbbb”. If you decide to use a Caesar cipher with, say, a shift of 23, your encrypted message will look like this: “yyyy”. Each additional “b” will be converted into a “y” no matter what. This is a soft spot of all those encryption techniques which follow similar schemes.
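The “bbbb” example can be checked with a few lines of Python. The sketch below is a minimal illustration written for this purpose (the function names are my own, and it only handles the 26 unaccented lowercase letters): it applies a Caesar shift and then counts relative letter frequencies, showing that the shape of the frequency profile survives the encryption.

```python
import string
from collections import Counter

def caesar(text, shift):
    """Shift each lowercase letter by `shift` places, wrapping around at 'z'."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            result.append(ch)          # leave spaces, digits, etc. untouched
    return "".join(result)

def letter_frequencies(text):
    """Relative frequency (in percent) of each letter a-z found in text."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    return {ch: 100 * n / len(letters) for ch, n in counts.items()}

print(caesar("bbbb", 23))                       # yyyy
print(letter_frequencies(caesar("bbbb", 23)))   # {'y': 100.0}
```

Shifting by 23 and then by 3 brings every letter back to where it started, since the two shifts add up to 26.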

By using Python, you can easily build a program to run through a long string of text and then calculate the relative frequency of occurrence of each character. Below is the code I used to build this simple program:

Once I had built the code, I ran it a couple of times on some Wikipedia pages written in English, French, Italian and German. Below you can find the results of this process. I should mention that my code missed a lot of characters, like è, é, à, ò, ù and the German umlauts. However, you can easily include these by adding them to the alphabet list. The y axis represents the relative frequency of occurrence (as a percentage).

And the same graphs sorted.

I do not know if there is a given distribution for each language (I doubt it); however, we can clearly see that some letters are much more frequent than others. The letter “e” seems to be pretty common in all four languages.

Hope this was interesting.

Calculating VaR with R

Simulations can be useful in an unimaginably large number of scenarios. Finance in particular is a field of study where maths and statistics have led to great advances (sometimes for the good, sometimes for the bad). Value at Risk is just another example of a subject where a simulation approach can be handy.

But what is VaR? VaR is an indicator used in risk management. It represents the maximum potential loss which can occur to an investor’s portfolio, say, 95% of the time. Actually, it would be better to say that 5% of the time the loss will be larger than what VaR predicted (and it could be way larger). In this case we say that we are calculating VaR at the 95% confidence level (that is, with a 5% tail probability).

There are at least three ways of calculating VaR:
-Parametric VaR
-Historical VaR
-Monte Carlo VaR

Let’s see each of them. For simplicity we will assume that our hypothetical investor has only one type of stock in their portfolio and that the holding period N is equal to 1.

Parametric VaR: Here is the formula

Where W0 is the value of the portfolio at the time of calculation, N is the holding period, sigma is the daily volatility and Z is the inverse of the standard normal distribution evaluated at 1 minus alpha, where alpha is the confidence level. (If alpha is 95%, then Z is approximately -1.645; note, however, that VaR is reported as a positive quantity.) The use of the normal distribution of course hides important assumptions on which the reliability of this method depends.

Historical VaR:

HR are the historical returns and Percentile is the quantile function in R applied to the historical returns. Note that there is no square root of N here, since the holding period is equal to 1. If the holding period is longer than 1 day, you should multiply this by the square root of N, as above.
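The two calculations described so far can be sketched in Python as follows. This is an illustration built from the verbal definitions above, not a translation of the R code: the function names, the sample returns and the portfolio value are made up, and the `percentile` helper uses linear interpolation between order statistics, which I believe matches the default behaviour of R’s quantile function.

```python
from statistics import NormalDist, stdev

def percentile(values, p):
    """p-quantile (0 <= p <= 1) with linear interpolation between order statistics."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p
    low = int(h)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (h - low) * (ordered[high] - ordered[low])

def parametric_var(w0, sigma, n_days=1, tail=0.05):
    """VaR under normally distributed returns with daily volatility sigma."""
    z = NormalDist().inv_cdf(tail)              # about -1.645 for a 5% tail
    return -w0 * z * sigma * n_days ** 0.5      # sign flipped: VaR is positive

def historical_var(w0, returns, n_days=1, tail=0.05):
    """VaR read directly off the empirical distribution of past returns."""
    return -w0 * percentile(returns, tail) * n_days ** 0.5

# Made-up daily returns, purely for illustration.
returns = [0.012, -0.021, 0.004, -0.008, 0.017, -0.030, 0.009, 0.001, -0.015, 0.006]
print(parametric_var(100_000, stdev(returns)))
print(historical_var(100_000, returns))
```

With only ten observations the historical figure depends almost entirely on the one or two worst returns, which is one reason why long time series are preferred for this method.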

Monte Carlo VaR:
With this approach you simulate a stochastic process which represents the path of the stock. Then, once you have calculated the logarithmic returns, you just take the 5% percentile return and multiply it by the value of the portfolio at time 0.
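A minimal Python sketch of the Monte Carlo idea is shown below. It assumes the stock follows a geometric Brownian motion with a known daily drift and volatility, which is my choice of process for the example and not necessarily the one used in the R code; the parameter values are invented.

```python
import random

def monte_carlo_var(w0, mu, sigma, n_sims=100_000, tail=0.05, seed=42):
    """One-day VaR from simulated log returns of a geometric Brownian motion.

    mu and sigma are the assumed daily drift and volatility.
    """
    rng = random.Random(seed)
    # One-step log return: (mu - sigma^2 / 2) + sigma * standard normal shock
    log_returns = sorted(
        (mu - 0.5 * sigma ** 2) + sigma * rng.gauss(0.0, 1.0)
        for _ in range(n_sims)
    )
    worst = log_returns[int(tail * n_sims)]     # empirical 5% quantile
    return -w0 * worst                          # reported as a positive loss

print(monte_carlo_var(100_000, mu=0.0, sigma=0.02))
```

Since the simulated returns are normal by construction, the result should land close to the parametric figure for the same volatility; the interest of the method lies in swapping in a process for which no closed formula exists.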

Let’s see how to implement all this in R. The data used has been invented, and is downloadable from here

Here are some results:

The results tell us that our investor should experience losses greater than 2835.56 (or 1843.85) only 5% of the time. Usually these two values should not differ that much; however, considering how they are constructed, and that the data I used is completely made up and too short for historical VaR, it is still fine that we got these results. Usually the time series from which data is gathered are very long: the longer the series, the more precise the result. In the last version of VaR, once you have simulated the behaviour of the stock, you just calculate the logarithmic returns and then take the 5% percentile.

For more information, check the Wikipedia page here.

Disclaimer
This article is for educational purposes only. The numbers are invented. The author is not responsible for any consequence or loss due to inappropriate use. It may contain mistakes and errors. You should never use this article for any purpose other than education.

Sunday, 27 July 2014

A simple roulette game simulator created with Python

Another project! This is one of the first game simulators I tried to develop with Python. It’s basically a roulette simulator and has all the features a roulette should have, I guess!

Some notes:
-The roulette has only one zero (no double zero)
-Payoffs are the same as in a real roulette (no real money is involved, of course!)
-Outcomes should be uniformly distributed
-The graphics are not the best because I used Easygui. Perhaps in the future I will implement the game with tkinter.

Here are the modules used:
-Random. The random module is necessary to simulate the random spin of the roulette.
-Easygui for the graphics. Easygui is a very simple module for user interaction. I believe it is built on tkinter but has far fewer features. If you are new to GUI programming in Python, I suggest starting with either tkinter or wxPython rather than Easygui. Easygui is easier but neither as useful nor as customizable. The other two modules, on the contrary, are harder to use for a new programmer but far more customizable and “complete”. Also, the playability is somewhat ruined by Easygui’s windows appearing and disappearing.
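Leaving the GUI aside, the core of such a simulator is very small. The sketch below is not the downloadable source code; it is an illustration that assumes the usual single-zero payoffs (35 to 1 on a single number, 1 to 1 on even/odd, with zero losing the even/odd bet) and uses function names of my own.

```python
import random

def spin(rng=random):
    """Single-zero wheel: 37 equally likely pockets, 0 to 36."""
    return rng.randint(0, 36)

def payout(bet_type, choice, stake, outcome):
    """Net result of a bet: winnings if positive, lost stake if negative."""
    if bet_type == "straight":                     # single number, pays 35 to 1
        return 35 * stake if outcome == choice else -stake
    if bet_type == "parity":                       # "even" or "odd", pays 1 to 1
        if outcome == 0:
            return -stake                          # zero loses parity bets
        is_even = outcome % 2 == 0
        return stake if (choice == "even") == is_even else -stake
    raise ValueError("unknown bet type")

outcome = spin()
print(outcome, payout("straight", 17, 10, outcome), payout("parity", "odd", 10, outcome))
```

Using `randint(0, 36)` gives each pocket the same probability, which is the “uniformly distributed” property listed in the notes.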

You can download the source code here. Only to be run with Python.

Here are some screenshots.

A self-built module to work with integers

This post is an extension of the one named “more maths with Python”.

Since I wrote the original post I have kept adding functions and improving the file containing the functions I posted. In the end I decided to use that file as a module in Python (you can check here how to build your own modules).
If you are interested in working with integers and managing big numbers in Python, you may want to try the module named gmpy2; you can easily google it. It handles big numbers very fast and has many of the functions typical of number theory. It also has those shown in this post and in the original one, but they are much faster than the ones I made.

Anyway, here are some improvements I have made.

This function checks if n is prime. It returns True if it is, False otherwise. As you can see from its structure, it can be slow with really big numbers.

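import math

def is_n_prime(n):
    # Numbers below 2 are not prime
    if n < 2:
        return False
    # Trial division by every i up to the integer square root of n
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True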

The following function uses Fermat’s little theorem to check if a number n is prime. It returns True if n is probably prime, False otherwise. It is fooled by Carmichael numbers such as 561 and by other base-3 pseudoprimes such as 91; however, its failure rate is low. It should be faster than the previous one.

def is_prime_fermat(n):
    # Fermat test in base 3: a prime n always satisfies 3**n % n == 3 % n.
    # Three-argument pow keeps the intermediate numbers small.
    return pow(3, n, n) == 3 % n

The function below uses the same strategy as the one above to look for prime numbers in the given range [n,q], where n < q, and returns a list of (probable) primes. It is probably faster than the function I posted in the old article.

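def range_primes_fermat(n, q):
    primes = []
    # Start at 2 at the earliest: 0 and 1 are not prime
    for i in range(max(n, 2), q + 1):
        # Fermat test in base 3, taken modulo the candidate i (not n)
        if pow(3, i, i) == 3 % i:
            primes.append(i)
    return primes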

Here’s a function to find the prime numbers up to and including a given number n.

def find_primes(n):
    # Collect every prime from 2 up to and including n
    primes = []
    for i in range(2, n + 1):
        if is_n_prime(i):
            primes.append(i)
    return primes

The function below returns the nth prime from the list of primes p, where p <= n. Note that nth is a zero-based index, so nth = 0 returns the first prime.

def find_nth_prime(n,nth):
    primes = find_primes(n)
    prime_to_return = primes[nth]
    return prime_to_return

I guess there is some way to speed them up. If that is the case, when I find it I will fix the functions and update them in a future post.
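One well-known way to speed up the search for all primes up to n is the sieve of Eratosthenes, which crosses out multiples instead of testing each number separately. The sketch below is a suggestion of mine, not part of the module described in this post, and the name `sieve_primes` is made up.

```python
def sieve_primes(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Every multiple of i from i*i upwards is composite.
            for multiple in range(i * i, n + 1, i):
                is_prime[multiple] = False
        i += 1
    return [k for k, flag in enumerate(is_prime) if flag]

print(sieve_primes(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The trade-off is memory: the sieve keeps one flag per number up to n, so it suits ranges of a few million rather than single very large candidates.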

Hope this was useful! Enjoy!

First project: a (very) simple database management software.

Hi everyone! Today I am going to upload a project I recently developed to improve my coding skills. The project is a simple database management program.

It is built using Python and the following modules:
-The sqlite3 module for the database. I wanted to use the mysql module; however, there is still no version of this module available for Python 3. Anyway, sqlite3 is a very light and fast module. It is pretty simple too.
-The GUI has been coded using tkinter. I do not know if tkinter is the best option, but it is the first GUI module that I started learning and so far so good. It works well!

As of now, the software lets you do the following:
-Create a database
-Create and delete as many tables as you want (though tables can have at most 3 fields for now).
-Add and delete records
-Display records, all records or records according to defined criteria.
-Display available tables and table structure
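The operations in this list map quite directly onto the sqlite3 module. The sketch below is a stand-alone illustration, not an excerpt from the program: the table name, field names and records are invented, and it uses an in-memory database so that nothing is written to disk.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table with three fields (names are made up for this example).
cur.execute("CREATE TABLE contacts (name TEXT, city TEXT, age INTEGER)")

# Add records; placeholders keep values separate from the SQL text.
cur.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [("Ann", "Rome", 31), ("Bob", "Paris", 45), ("Cleo", "Rome", 28)],
)

# Display records matching a criterion.
cur.execute("SELECT name, age FROM contacts WHERE city = ? ORDER BY name", ("Rome",))
rows = cur.fetchall()
print(rows)

# Delete a record, then list the available tables.
cur.execute("DELETE FROM contacts WHERE name = ?", ("Bob",))
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = cur.fetchall()
print(tables)

conn.commit()
conn.close()
```

Passing the values through `?` placeholders rather than building the SQL string by hand is worth doing even in a small tool, since it avoids quoting problems with user input.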

Check the help section for the allowed data types. You can download the program from here:
https://www.dropbox.com/s/llycwz0fnlkhw11/software%20database_.zip

The source code surely has some redundancies which can be deleted. If you have any suggestion that would make the code work better, look better or be more manageable, please leave a comment or let me know.

Here are some screenshots:

Saturday, 26 July 2014

Stochastic processes and stocks simulation

Before you continue reading, please make sure to read the disclaimer at the bottom of this article.

Sometimes the names of phenomena do not seem to suit the things they are attached to. In my opinion, that’s the case for stochastic processes.
Stochastic process is a fancy term for a collection of random variables which represents the path followed by a certain random quantity over a period of time.

Stochastic processes are an interesting area of study and can be applied pretty much everywhere a random variable is involved and needs to be studied. Say, for instance, that you would like to model how a certain stock should behave given some initial parameters which are assumed constant. A good idea in this case is to build a stochastic process.

A personal note: I do not believe these stochastic models actually perform well on stocks… At least not with the basic assumptions which I am going to list. I cannot see the reason why a stock should behave like these processes show. These processes are somehow “deterministic”, in the sense that you can reasonably get to know how a stock should behave. Financial markets, however, have always shown themselves to be irrational, non-deterministic, and “explainable” only ex post. In spite of all this, I still like how this kind of stochastic process works, and the graph which comes out at the end looks like a stock! Furthermore, I cannot hide that understanding financial markets is intriguing.

The basic assumptions are the following:
-Expected annual return and return volatility are known and constant (this is not realistic; furthermore, if volatility is calculated on historical returns, there is no reason to believe it actually captures the future behaviour of the stock)
-Returns are normally distributed (not realistic either: real returns have proved not to be normally distributed and to occur in larger magnitudes than forecast)

The model is pretty simple, here it is:

Let’s set our scenario in R and generate the process:

Here is the summary of our 256 generated observations:

and the plot, which looks realistic:

Let’s compare this to a pure deterministic model where we assume a constant positive daily return of 30%/255

We can clearly see how the stochastic process uses the deterministic model as a base and then adds random shocks on top of it.
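The “deterministic base plus random shocks” idea can be sketched in Python as below. Since the model formula is not reproduced here, the update rule is an assumption on my part: each daily return is the annual drift scaled to one day plus a normal shock scaled by the daily volatility. The starting price and the 25% volatility are invented; the 30% annual return and the 255 trading days come from the deterministic comparison above.

```python
import random

def simulate_path(s0, mu, sigma, n_days=255, seed=1):
    """Daily prices where each step is a drift term plus a normal random shock.

    mu and sigma are annual figures; they are scaled to one trading day.
    """
    rng = random.Random(seed)
    dt = 1 / n_days
    prices = [s0]
    for _ in range(n_days):
        shock = rng.gauss(0.0, 1.0)
        daily_return = mu * dt + sigma * dt ** 0.5 * shock
        prices.append(prices[-1] * (1 + daily_return))
    return prices

stochastic = simulate_path(100, mu=0.30, sigma=0.25)
deterministic = simulate_path(100, mu=0.30, sigma=0.0)   # no shocks: pure drift
print(len(stochastic), round(deterministic[-1], 2))
```

Setting sigma to 0 removes the shocks entirely and leaves the smooth compounding curve, which is the deterministic model used for the comparison.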

Now, let’s have a look at the distributions:

Not that interesting, at least not as much as the returns, which are plotted below:

As you can see, returns are approximately normally distributed, and that’s consistent with our assumptions and the methods we used to simulate the changes in prices. It is wise to note that drastic changes in prices are rare under these assumptions. However, the stock market has proved that “extreme events” occur much more frequently than these models suggest. So, are these models to be thrown away? No: the drawings are nice and look similar to the real ones and, aside from this, I believe these models are an interesting starting point worth future development.

Disclaimer
This article is for educational purposes only. The numbers are invented. The author is not responsible for any consequence or loss due to inappropriate use. It may contain mistakes and errors. You should never use this article for any purpose other than education.

Thursday, 24 July 2014

The maths of Texas Hold ’em with R

Before you continue reading, please make sure to read the disclaimer at the bottom of this article.

Every time I watch a game of Texas Hold’em on TV, I am curious about the small percentages which appear in the bottom corners of the screen and tell us each player’s chances of winning. There must be some kind of algorithm which computes and refreshes those numbers at each draw.

Today I am going to write about one of the first simulations I put down as code and wrapped my head around. Now that I am a little smarter at coding than I was when I wrote this, I believe the whole simulation could be done much better and with A LOT less code… Anyway, I guess that is a good sign, since finding “mistakes” in my code should mean I have improved at least on the very things I made mistakes on.

The code I am about to show estimates the probabilities of drawing a pair, a three of a kind (tris) and a poker (four of a kind).

Let’s proceed, first of all: we need to define the deck and the drawing function.

Let’s try our drawing function hand() and check the result:

Not the good draw we were looking for, huh? Anyway, that’s a complete game of Texas Hold’em (provided neither of the two players folded).

Why don’t we create 100 of these games? Here they are:

At this stage, if we run the code, R will generate three tables (or matrices) with the results of each one of the 100 simulated games. Something like this:

Now we need only to look for pairs, tris and pokers. We need to define 3 functions as follows:

The result should look like this:

In 100 games, we have got 47 pairs, 2 three of a kind and no pokers! Interesting data! However, this might be mere chance! We need to run this a LOT of times to be sure the odds we obtain are at least near the real ones. Let’s build a function that can do this. For instance, the function below runs 100 games n times and then collects the results. Note that it outputs the probabilities as the mean of the probabilities observed in each run. Much of the code here is replicated from the functions above. I guess this could have been done a lot better! If you have any ideas, please let me know or leave a comment!

Let’s try, for instance, to run this 10,000 times, with n = 100. Here are the results:

For debugging purposes our function prints each poker it finds. Since pokers are usually not that frequent, this should be fine. 10,000 times seems not to be enough…
Let’s do 100,000 times!!!

This looks better! By simulating 100,000 games of Texas Hold’em with 2 players in less than 2 minutes, we concluded that drawing a poker two times in a row is a very unlikely event, that drawing a pair is a relatively common event, and that three of a kind is rare, but not as rare as a poker!

I should mention that the probability of a poker looks overestimated compared with the formal calculation. I believe that is either because it is an “outlier” or because I did not run the simulation a large enough number of times. Anyway, the other two probabilities seem fine (you can find more information at http://en.wikipedia.org/wiki/Poker_probability).
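For comparison, the same kind of estimate can be sketched in a few lines of Python. This is a simplified illustration and not a port of the R code: it deals 7 cards to a single player (2 hole cards plus 5 community cards) and labels the hand only by its largest group of equal ranks, so “pair” also covers two pairs and “three of a kind” also covers a full house. The names and the labelling rule are my own assumptions, so the frequencies are not directly comparable with the ones above.

```python
import random
from collections import Counter

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

def classify(cards):
    """Label a set of cards by its largest group of equal ranks."""
    most = max(Counter(rank for rank, _ in cards).values())
    return {1: "nothing", 2: "pair", 3: "three of a kind", 4: "poker"}[most]

def estimate(games=100_000, seed=0):
    """Frequency of each label over 2 hole cards plus 5 community cards."""
    rng = random.Random(seed)
    counts = Counter(classify(rng.sample(DECK, 7)) for _ in range(games))
    return {label: n / games for label, n in counts.items()}

print(estimate(20_000))
```

Counting ranks with `Counter` replaces the three separate search functions with a single pass over the cards, which is the kind of shortening hinted at earlier in the post.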

Hope you enjoyed it!
