The Beginner Programmer: Simulate data with R

Wednesday, 23 July 2014


Last semester I was attending a boring class: the professor was really clever, but he kept wandering around the main topic and never got straight to the point. While thinking about everything except the class, I had an idea. When you are given a set of data, say X and Y, you can easily fit a linear regression model, i.e. the regression line, and learn something about the data. You also get information on the errors (residuals) the linear model makes when predicting the data. If you estimate the distribution of these errors, you can simulate data similar to the original: take the values predicted by the regression line and add a random error drawn from that distribution.
Furthermore, the expected error is 0: the linear model assumes it, and the residuals of a least-squares line with an intercept always average to exactly 0.
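In symbols, the idea can be written as below. The normal distribution for the simulated errors is an added assumption. The text above only says that the error distribution is estimated from the data, and any other fitted distribution could take its place.

```latex
\begin{aligned}
\hat{y}_i &= \hat{\beta}_0 + \hat{\beta}_1 x_i, \qquad e_i = y_i - \hat{y}_i, \qquad \frac{1}{n}\sum_{i=1}^{n} e_i = 0,\\
\hat{\sigma}^2 &= \frac{1}{n-2}\sum_{i=1}^{n} e_i^2, \qquad
y_i^{*} = \hat{y}_i + \varepsilon_i^{*}, \quad \varepsilon_i^{*} \sim N\!\left(0, \hat{\sigma}^2\right).
\end{aligned}
```

Here $y_i^{*}$ is the simulated value for the observed $x_i$. The $n-2$ in the denominator accounts for the two estimated coefficients.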

Here is the code to implement this idea in R. You can get the data to work on at the bottom of the page.
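Below is a minimal sketch of the idea. It is not necessarily the original code. It assumes normally distributed errors and uses R's built-in `cars` dataset (`speed` as X, `dist` as Y) as a stand-in for the data linked at the bottom of the page. The variable names `y_hat`, `res`, `sigma_hat` and `y_sim` are illustrative.

```r
# Sketch: simulate data resembling (x, y) from a fitted regression line.
# Assumptions: `cars` stands in for the post's data, and the errors are
# taken to be normal with mean 0 and the model's residual standard error.
set.seed(42)                       # make the random draws reproducible

x <- cars$speed
y <- cars$dist

fit   <- lm(y ~ x)                 # least-squares regression line
y_hat <- fitted(fit)               # values predicted by the line
res   <- residuals(fit)            # errors made by the model on the data

# With an intercept in the model, the residuals average to (numerically) 0.
print(mean(res))

# Estimated error standard deviation (residual standard error, n - 2 df).
sigma_hat <- summary(fit)$sigma

# Simulated response = predicted value + random error ~ N(0, sigma_hat^2).
y_sim <- y_hat + rnorm(length(x), mean = 0, sd = sigma_hat)

# Actual data in blue, simulated data in red, regression line in black.
plot(x, y, col = "blue", pch = 16,
     ylim = range(y, y_sim), xlab = "x", ylab = "y")
points(x, y_sim, col = "red", pch = 16)
abline(fit)
```

Before relying on the normal assumption, it may be worth looking at the residuals, for example with `hist(res)` or `qqnorm(res)`. If they do not look normal, one alternative is to resample the observed residuals instead: `y_sim <- y_hat + sample(res, replace = TRUE)`.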

The result should look something like this, with the actual data in blue and the simulated data in red.