Saturday, 11 July 2015

Estimating data parameters using R

Say we have some data and we are pretty confident that it comes from a random variable which follows a Normal distribution, now we would like to estimate the parameters of that distribution. Since the best estimator for the population mean is the sample mean and the best estimator for the variance is the corrected variance estimator, we could use those two estimators to compute a point estimate of the parameters we need. But, what if we would like to have a rough idea of what could be the range of those parameters within a certain level of confidence? Well, then we would have to find an interval that contains the parameters at a, say, 5% confidence level.

In order to do this, since the variance is unknown and needs to be estimated, we use the Student-t distribution and the following formula for the two sided interval:


In the same way, we could find unilateral boundaries for the value of the population mean, simply by adjusting the formula above:


Furthermore one could visualise the outcome by plotting the selected regions on a standard t-Student with n-1 degrees of freedom:

1) Confidence intervalRplot012) Upper boundRplot023)Lower boundRplot03

Here below is he implementation in R of the process described above:

By clicking here you can download the code of the plotting function I used to generate the plots above.

No comments:

Post a Comment