Thursday, 30 July 2015

Estimating arrival times of people in a shop using R

Since the first time I was taught the Poisson and Exponential distributions I have been wondering how to apply them to model a real world scenario such as arrivals of people in a shop. Last week I was in a big shopping center and I thought that was the perfect time to gather some data so I chose a shop (randomly) and started recording on a piece of paper how many people arrived and their inter-arrival time.

I went through this process for an hour, in the middle of the afternoon (around 4 to 5pm) during a week day where the influx of customers should be approximately uniform, or so I am told. Here you can download in .txt format, the data I gathered. Note that even though the file is a text file, it is arranged as a .csv so that R can easily detect data.

What is an inter-arrival time? It is the interval of time between each arrival. Assuming that the arrivals are independent, their distribution is exponential. This assumption is further confirmed below by the blue histogram of the observations.

Summary of gathered data:
- I observed the arrival of customer for 58.73 minutes (around an hour)
- During that time 74 “shopping groups” entered the shop. “A shopping group” here is a broad label for a group of 1, 2, 3 or 4 actual customers. The red histograms describes the phenomena in greater detail.


As you can see it seems reasonable to assume that inter-arrival times are exponentially distributed


The red histogram above clearly shows that the majority of “shopping groups” is composed by either a single customer or a couple of customers. Groups of 3 or 4 people are less likely to appear according to the observed data. A good estimate of the probability of the number of people in a shopping group is the relative frequency of the observations.

The following code produces the two plots above

Now that we have uploaded the data, it is time to simulate some observations and compare them to the real one. Below in green is the simulated data.



The code used to simulate the data is here below

Now, the graphs above are pretty, but they are not much use, we need to compare the two sets of graphs to get an idea if our model is fine for this simple task.

Rplot04 Rplot06


It looks like our model tends to overestimate the number of arrivals in the 0-0.5 minutes section, however this is just one simulation and the average arrival time is close to what we expected using this model. It seems reasonable to use this distribution to estimate the inter-arrival times and the number of people arrived in the shop.

The R-code

Finally, let's have a look at the ideal exponential distribution given the estimated parameters. Note that some adjustments are necessary because our model is set so that the simulated times are hours, but for practical purposes it is better to work with minutes. Furthermore, by using the exponential distribution formulas we can calculate some probabilities.


By running the code we find out that the probability of the waiting time being less than 3 minutes is 0.975 meaning that it is very likely that one customer group will show up in less than three minutes.


This model is very basic and simple, both to fit and to run, therefore is very fast to implement. It can be useful to optimize the number of personnel in the store and their function, according to the expected inflow of customers and the average time needed to help the current customers. The hypothesis seem reasonable, given the time of the day considered and provided that the shop does not make some promotion or discount or any other event that could suggest a different behaviour of customers (for instance when there is a discount, it is likely that customers might tell each other to visit the store). For different times of the day, different parameters could be estimated by gathering more data and fitting different exponential distributions.

Monday, 27 July 2015

Complex functions and flow around a cylinder

Last semester I attended a course in complex analysis and I was introduced to the basics tool used in this field. I was fascinated by almost anything that I was taught and although some topics were a little bit difficult to grasp, I am sure it was worth it all the way. Complex Analysis is great for many applications, it is a very powerful tool that enables you to solve problems in the real domain that you would have not otherwise been able to solve.

Among the many applications of complex functions, potential flow is one that took my attention, mainly because it is easygoing for a novice in this field (as I am) but also because of the nice graphs one can plot using it and their physical interpretation.

While looking up for some materials I stumbled upon this paper.

It is quite a nice piece of paper, since it gives a little bit of introduction to complex functions and how they can be used to model flow.

An ideal flow is a flow that is both incompressible and irrotational, that’s to say, if F represents the fluid velocity at each point (x,y), then the following conditions must hold in order for the flow to be ideal:


The conditions above require that the flow has no vorticity (curl) and that the volume of the fluid does not change as it flows (divergence equal to zero).

Theorem 4.5 in the paper connects the velocity vector field and complex (analytic) functions. By then integrating the velocity one can find out the streamlines of the fluid flow.

If the velocity F(z) admits a complex anti-derivative,

clip_image002[11] and


Therefore the gradient of phi is equal to F and the real part phi(x,y) defines a velocity potential for the fluid flow. The harmonic conjugate psi(x,y) to the velocity potential, in fluid mechanics, is known as the stream function.

Example (4.8). Consider the complex potential function


By making some calculations we can find out


with streamlines


asymptotically horizontal at large distances.

By using Matlab, I implemented this example and plotted some plots about our Gamma function:


Now we plot the velocity (gradient of the real part of Gamma) and the the streamlines (psi).


Last but not least, the contour plot of the absolute value of Gama and of its real and imaginary part.


As you can see, the stream function shows that this model can be a first raw step in the way of modelling flow around a steady cylinder. This model however is far from being perfect: among other assumptions it assumes no friction and laminar flow. Laminar flow  can be assumed when the velocity of the fluid is very low and in many practical application it is not often the case. Anyway, this simple model surely can draw some attention to complex analysis and its tools.

Sunday, 26 July 2015

2D heat and wave equations on 3D graphs


While writing the scripts for the past articles I thought it might be fun to implement the 2D version of the heat and wave equations and then plot the results on a 3D graph.
As for the wave equation, Wolfram has a great page which describes the problem and explains the solution carefully describing each parameter. Here is a little animation I made using the solution they described

The code for this example is available to be downloaded here.
As far as the heat equation is concerned, I opted for a classical example where you have a hot spot in the middle of a square membrane and given an initial temperature, the heat will flow away as expected. Note that in the heat equation animation the colorbar does not change as time goes on. This can be edited by moving the colorbar into the animation function.

download the code for the heat equation.

On my youtube channel you can find many other animations of the wave equation.

Sunday, 19 July 2015

The wave equation

Imagine you have an ideal string of length L and would like to find an equation that describes the oscillation of the string. Assuming the string is fixed at its ends and starts its motion in a known position f(x) the simplest assumption one can make is that the acceleration of each piece of the string is somewhat proportional to the curvature of the string as such:


We can express the considerations above in the following way:


The equation above is a partial differential equation (PDE) called the wave equation and can be used to model different phenomena such as vibrating strings and propagating waves.The constant term C has dimensions of m/s and can be interpreted as the wave speed.
It turns out that the problem above has the following general solution


The thing that strikes me about this equation is how powerful the solution is. To think about it, any function that has the argument x-ct or x+ct or a combination of both is a solution to the wave equation. This means that we can model a lot of different waves!  Furthermore, as you could probably spot, the general solution is a combination of a wave travelling to the left and one travelling to the right.
Assuming let’s try the following solution


By implementing the equation in Python for a string of length 2pi and of speed 1 m/s we obtain the following animation:

The range of animation you can run is infinite. I tried out a couple of solutions:

Shortening the string, L=pi

Click here to download the code for the above video.

Trying out f(x-ct) + g(x+ct)= cos(x-ct)**3 + cos(x+ct)**3

Click here to download the code for the above video.

The solution

yields the following:

Click here to download the code for the above video.

Note how the resulting wave (cyan) is the sum of two cosine waves travelling in opposite directions.

What happens if you input the f(x-ct) term only and set g = 0? Basically what you get is a single travelling wave. The same happens if f = 0 and g = g(x+ct).

Click here to download the code for the above video.

Hope this was entertaining and helpful!

Saturday, 18 July 2015

Electric circuits 101 RC and RL circuits

RC and RL are one of the most basics examples of electric circuits and yet they are very rich in content. Let’s examine each one carefully

Here is the schematics of an RL circuit


The components used in an RL circuit are a DC voltage source (V1) a resistor (R1) and an inductor (L1). The inductor in the schematics is used to represent physical characteristic of the circuit. If I had to describe what the inductor represents I would say it represents the ‘electrical inertia’ of the circuit. As currents flows into the circuit it generates a magnetic field, that change in the magnetic field causes a change in the flux of the field concatenated to the circuit, this in turn, by the Faraday-Neumann-Lenz law generates a voltage in the circuit that is opposite to the voltage that is generating the magnetic field (V1). This is the reason because the current in the circuit will not jump immediately to its full value of V/R given by Ohm’s law. The current will eventually reach the value V/R after an infinite amount of time, but for practical purposes we can consider it at that level after some time has passed (namely 4 times tau = L/R should be sufficient).

It so happens that, given the materials and the geometry, the inductance coefficients of the circuit I am modelling is 0.0229 H. Note that inductance depends only on these two parameters (materials and geometry). I calculated the coefficient using FEMM since the analytical calculation was not that easy to do. You can download FEMM here: . This software is useful for many electromagnetic simulations and fairly user-friendly.

What we would expect is that the current will obey the following differential equation given by Ohm’s law at each point in time:


The equation above yields the following solution


Now, the same phenomenon (reversed) happens when you open the circuit, the inductor opposes the change in flux by generating a voltage that tries to keep up the shrinking flux of the magnetic field through the circuit. That means that, if you were to switch on and off the current at particular intervals you would obtain this behaviour:


As you can see, at time 0 the circuit is closed and the current is reaching its theoretical value, after 4 times tau (time constant of the circuit) the circuit is opened again and the current exponentially decays to zero. By knowing tau, one could find the frequency at which the switch should click in order to maintain this behaviour. This process yields the on and off cycle (in green). Where the green function hits the x axis the switch clicks and opens/closes the circuit.

Here below you can find the script I used to produce this graph and model the circuit.

If you are not satisfied with this crude simulation, perhaps you could use LTspiceIV which is a great free simulation software for electrical and electronic circuits. I used it to produce the schematics above and to simulate the behaviour of the current in the first 40ms after startup. Below is the LTspice simulation graph


As you can see by plotting the current through the resistor in LTspice, it seems our calculation were right!

Now let’s move on to the RC circuit, here it is:


Aside from the voltage source, the RC circuit is composed of a capacitor and a resistor, we therefore assume that self-inductance is negligible. When we close the circuit, there is no inductance here so the current will just jump to the V/R value and since the capacitor is charging up and building a voltage, we can expect the current at the resistor to drop as times goes by. The differential equation we have now is the following:


By solving it, one finds:




The current through the resistor drops exponentially, depending on a time constant tau = 1/RC. This behaviour is indeed what we find by running the simulation in Python and LTspice



Python code used for the RC circuit:

Thursday, 16 July 2015

PDEs time again: the Transport equation

Apparently, it looks like I’m all focused on PDEs at this time of the year. There’s a good reason though! I find PDEs extremely fascinating in that they allow you to model complex behaviours of objects such as waves propagating and strings vibrating and so on. If not for their beautiful mathematical content you must love them for their applications in physics!
Take the transport equation, imagine you’d like to model the behaviour of an ocean wave. Well, in an ideal world, once generated, our wave would go on forever at a constant speed, say k. In order to find the function that represents our wave at each point at every time t, one should solve the following problem


It turns out solutions are all in the form of:


For those of you already familiar with the wave equation, this should look extremely familiar. It is indeed the equation of a wave travelling towards the positive x at speed k
If we choose a Gaussian function for instance, then we would obtain the following:


and by running the same code we used for the heat equation, here is what we get:

Since the code is pretty much the same as the heat equation I will not post it here. However, it is downloadable via dropbox by clicking here if you’d like to check it out.
Now a question should arise. Our wave is an ideal wave, it does not take into account friction and if you think of an ocean wave, it dissolves itself quickly as it meets the beach. Can we model this with the transport equation? Sure! We can use the transport equation with exponential decay. The problem we need to solve now looks a little different:


where gamma is a constant which relates to the decay rate. The general solution now looks something like this


Note that it decays and tends to zero as time tends to infinity and that it is still a wave travelling ‘to the right’. If we then use the same Gaussian as before, we obtain a much more realistic wave. Here is a short clip generated using the code at the bottom of the page (the animation is slower since I set dt=0.01 for a better resolution).

You can play with the parameters I set and see how the behaviour of the wave changes!

Heat Equation part 2 a slight modification

While writing the code for the previous post I slightly modified the code in order to add 2 ‘peaks of heat’. Now, by looking at what I wrote I am not quite sure that the solution is consistent (physically speaking). At first glance we can see that there are three ‘heat spots’. If you imagine of heating the rod in two spots and cooling it in between, the situation is the following:


By setting L = 3*pi and making some other small changes we obtain:

Physically we could interpret the result as follows:
the heat flows spontaneously towards areas where the temperature is lower, therefore the temperature should go up in the middle of the rod (since it has been cooled down and has a lower temperature compared its surroundings) and go down in the sides. Assuming that the left and right boundaries of the rod are kept at constant temperature through an ideal thermostat.

Why not  checking if this solution is mathematically and physically consistent?

Wednesday, 15 July 2015

The Heat Equation: a Python implementation

By making some assumptions, I am going to simulate the flow of heat through an ideal rod.
Suppose you have a cylindrical rod whose ends are maintained at a fixed temperature and is heated at a certain x for a certain interval of time. Suppose that the temperature in each section with infinitesimal width dx is uniform so that we can describe the temperature in the rod using a function of only x and t.

Mathematically speaking, problem we are now facing is the following:


where k is a constant called thermal diffusivity and is different according to the different materials. By using the method of separation of variables, we can find the solution we need and by applying the initial conditions we find a particular solution for f(x) = sin(x) and L = pi
Our solution looks something like this:
Now we only need to evaluate our function at each x and t. Remember that if u(x,y) is differentiable, then:


holds. We can throw out the last term and approximate our function using the above relation since partial derivatives of u must exists and we can easily get them. Thinking about it, the second term is useless too, since we are not moving along the x axis, therefore we are left with the following:


By using matplotlib and the animation function I generated this short clip:

Here is the python implementation of the solution and the code used to graph the solution evolving with time.

Hope this was useful!

Saturday, 11 July 2015

Estimating data parameters using R

Say we have some data and we are pretty confident that it comes from a random variable which follows a Normal distribution, now we would like to estimate the parameters of that distribution. Since the best estimator for the population mean is the sample mean and the best estimator for the variance is the corrected variance estimator, we could use those two estimators to compute a point estimate of the parameters we need. But, what if we would like to have a rough idea of what could be the range of those parameters within a certain level of confidence? Well, then we would have to find an interval that contains the parameters at a, say, 5% confidence level.

In order to do this, since the variance is unknown and needs to be estimated, we use the Student-t distribution and the following formula for the two sided interval:


In the same way, we could find unilateral boundaries for the value of the population mean, simply by adjusting the formula above:


Furthermore one could visualise the outcome by plotting the selected regions on a standard t-Student with n-1 degrees of freedom:

1) Confidence intervalRplot012) Upper boundRplot023)Lower boundRplot03

Here below is he implementation in R of the process described above:

By clicking here you can download the code of the plotting function I used to generate the plots above.