MLE (maximum likelihood estimation), distribution fitting and model calibration are for sure fascinating topics. From the outside, they might also appear to be rocket science. As far as I'm concerned, when I did not know what MLE was and what you actually do when trying to fit data to a distribution, all these techniques did look exactly like rocket science.
They are not that complicated though. MLE is a technique that lets you estimate the parameters of a random variable given only a sample: you pick the parameter values of an assumed distribution that make the observed sample the most likely to have occurred. Distribution fitting, as far as I know, is the process of actually calibrating those parameters so that the distribution fits a series of observed data.
Let's see an example of MLE and distribution fitting with Python. You need to have scipy, numpy and matplotlib installed in order to run it, although I believe this is not the only possible way. Note that the methods in scipy.stats related to the normal distribution use loc to indicate the mean and scale to indicate the standard deviation. These names are the generic location and scale parameters that scipy.stats uses for all of its continuous distributions; for the normal distribution, the location happens to be the mean and the scale happens to be the standard deviation.
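To make the idea concrete, here is a minimal sketch of MLE done "by hand", assuming the data come from a normal distribution. It is only an illustration, not how scipy does it internally: the seed, the true parameters (mean 2.0, standard deviation 1.5), the starting point and the helper name negative_log_likelihood are all arbitrary choices made for this example. The likelihood is maximized numerically by minimizing the negative log-likelihood, and the result is compared with the closed-form estimates that exist for the normal case.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Draw a reproducible sample (seed and true parameters are arbitrary choices).
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=500)


def negative_log_likelihood(params, data):
    """Negative log-likelihood of data under a normal distribution."""
    mean, log_std = params
    std = np.exp(log_std)  # optimizing log(std) keeps std strictly positive
    return -np.sum(norm.logpdf(data, loc=mean, scale=std))


# Maximizing the likelihood is the same as minimizing its negative logarithm.
result = minimize(
    negative_log_likelihood,
    x0=[0.0, 0.0],          # start from mean 0 and std exp(0) = 1
    args=(sample,),
    method="Nelder-Mead",
)
mle_mean = result.x[0]
mle_std = np.exp(result.x[1])

# For the normal distribution the MLE also has a closed form:
# the sample mean and the standard deviation computed with ddof=0.
closed_mean = sample.mean()
closed_std = sample.std()

print("numerical MLE:", mle_mean, mle_std)
print("closed form:  ", closed_mean, closed_std)
```

The two pairs of numbers should agree up to the tolerance of the optimizer. The numerical route is overkill for the normal distribution, but the same pattern carries over to distributions where no closed-form estimate is available.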
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

# Generate an array of 200 random samples from a normal distribution
# with mean 0 and standard deviation 1
random_sample = norm.rvs(loc=0, scale=1, size=200)

# Distribution fitting
# norm.fit(data) returns a tuple of two parameters estimated via MLE:
# the mean (parameters[0]) and the standard deviation (parameters[1]).
# data should be array-like.
parameters = norm.fit(random_sample)

# Now parameters[0] and parameters[1] are the mean and
# the standard deviation of the fitted distribution
x = np.linspace(-5, 5, 100)

# Evaluate the pdf of the fitted distribution
fitted_pdf = norm.pdf(x, loc=parameters[0], scale=parameters[1])
# Evaluate the pdf of the standard normal distribution (not fitted)
normal_pdf = norm.pdf(x)

# Type help(plt.plot) for a ton of information on pyplot
plt.plot(x, fitted_pdf, color="red", label="Fitted normal dist", linestyle="dashed", linewidth=2)
plt.plot(x, normal_pdf, color="blue", label="Normal dist", linewidth=2)
# density=True rescales the histogram to unit area so it is comparable
# with the pdfs (older matplotlib versions called this argument "normed");
# alpha goes from 0 (transparent) to 1 (opaque)
plt.hist(random_sample, density=True, color="cyan", alpha=.3)
plt.title("Normal distribution fitting")
# Insert a legend in the plot (using the labels above)
plt.legend()
# We finally show our work
plt.show()
The result should look somewhat like this:
Hope this was useful.
what does pdf stand for?
Probability density function