Creating A Model To Create A Simpression Model

Submitted By Cai-Karl

From n00b to Pro
Jack Cai

PURPOSE
Create a simulator from scratch that:
•Generates data from a variety of distributions
•Makes a response variable from a known function of the data (plus an error term)
•Constructs a linear model that estimates the coefficients of the function
•Repeats generation and modeling many times to compare the average estimates of the linear model to the known parameters.
•Packages the whole thing neatly into a function that we can call in a single line in later work.

•If you’re experienced, the commands themselves may seem trivial
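As a rough preview of where the steps above lead, here is a minimal sketch of such a simulator. The function name simulate_lm, its arguments, and the two-predictor model are assumptions made for illustration only; they are not the version built in the later sections.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# generate data, build y from known coefficients, fit lm(), repeat,
# and return the average of the estimated coefficients.
simulate_lm <- function(n_reps = 200, n = 50, beta = c(3, 20, 15)) {
  estimates <- matrix(NA_real_, nrow = n_reps, ncol = length(beta))
  for (i in seq_len(n_reps)) {
    x1  <- rnorm(n, mean = 10, sd = 4)      # normal predictor
    x2  <- rpois(n, lambda = 5)             # Poisson predictor
    err <- rnorm(n, mean = 0, sd = 20)      # error term
    y   <- beta[1] + beta[2] * x1 + beta[3] * x2 + err
    estimates[i, ] <- coef(lm(y ~ x1 + x2)) # store this run's estimates
  }
  colnames(estimates) <- c("(Intercept)", "x1", "x2")
  colMeans(estimates)                       # average estimate per coefficient
}

set.seed(42)          # arbitrary seed so the run can be repeated
avg <- simulate_lm()  # compare these averages with beta = c(3, 20, 15)
```

With enough repetitions, the averaged slope estimates should land close to the known values used to build y.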

Outline

1) Learning how to learn
2) Randomly Generating Data
3) Data Frames and Manipulation
4) Linear Models

BREAK – Quality of presenter improves

5) Running loops
6) Function Definition
7) More advanced function topics
8) Using functions
9) A short simulation study

Learning how to learn – Jack Cai


Google "CRAN Packages" to get the package list
•From here you can get a description of every command in a package.

?? searches for commands related to a keyword
•??plot will find commands related to plot

? calls up the help file for a specific command
•?abline gives the help file for the abline() command.
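The help shortcuts above have long forms that can be handy in scripts. The sketch below shows them, plus apropos(), which is not mentioned on the slide but is a base R function for finding objects by name.

```r
# Interactive shortcuts (type these at the R console):
#   ?abline    is shorthand for help("abline")
#   ??plot     is shorthand for help.search("plot")

# The long form can be stored instead of displayed right away
h <- help("abline")

# apropos() lists objects on the search path whose names match a pattern
hist_matches <- apropos("^hist")
hist_matches
```

Printing h at the console opens the same page that ?abline does.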

LEARNING HOW TO LEARN – JACK CAI


Exercises:



Name one function in the darts game package.



What is the e-mail of the author of the Texas Holdem simulation package?



(Bonus) Tell the author about your day via e-mail; s/he likes hearing from fans.



Find a function to make a histogram



Find some example code on the heatmap() command.

Randomly generating data – Jack Cai


The r* commands (function names starting with r, for "random") randomly generate data from a distribution

rnorm(n, mean, sd) Generates from the normal distribution (default N(0,1))

rexp(n, rate)

rbinom(n, size, prob)

rt(n, df) From Student’s t. (The mean is zero, so shifting it to another mean is up to you.)

set.seed() Allows you to generate the same data every time, so you or others can verify your work.
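A small sketch of these generators in use, assuming arbitrary seed and parameter values chosen only for illustration:

```r
set.seed(2024)               # arbitrary seed value
a <- rnorm(5)                # defaults: mean = 0, sd = 1
set.seed(2024)               # reset to the same seed...
b <- rnorm(5)                # ...and the same five values come back
identical(a, b)

x_norm  <- rnorm(50, mean = 10, sd = 4)
x_exp   <- rexp(50, rate = 1/7)               # mean is 1/rate = 7
x_binom <- rbinom(50, size = 10, prob = 0.3)  # counts between 0 and 10
x_t     <- rt(50, df = 5) + 5                 # shift the t draws to mean 5
```

Resetting the seed before each draw is what makes a and b match; without the second set.seed() call they would differ.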

RANDOMLY GENERATING DATA – JACK CAI


Set a random seed



Generate a vector of 50 values from the Normal (mean=10,sd=4) distribution, name the vector x1.



Do the same with
• Poisson ( lambda = 5), named x2,
• Exponential (rate = 1/7) named x3,
• Student’s t distribution (df =5), with a mean of 5, named x4,
• Normal (mean=0, sd=20), named err



Make a new variable y and set it to 3 + 20x1 + 15x2 – 12x3 – 10x4 + err
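One possible way to work through this exercise is sketched below. The seed value is arbitrary, and rpois() is assumed for the Poisson draw since it follows the same r* naming pattern as the commands listed earlier.

```r
set.seed(123)                          # arbitrary seed
n <- 50

x1  <- rnorm(n, mean = 10, sd = 4)     # Normal(10, 4)
x2  <- rpois(n, lambda = 5)            # Poisson(5)
x3  <- rexp(n, rate = 1/7)             # Exponential(rate = 1/7)
x4  <- rt(n, df = 5) + 5               # Student's t (df = 5), shifted to mean 5
err <- rnorm(n, mean = 0, sd = 20)     # error term

# Response built from the known coefficients plus error
y <- 3 + 20 * x1 + 15 * x2 - 12 * x3 - 10 * x4 + err
```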

Data frames – Jack Cai


data.frame() makes a dataframe object of the vectors listed in the ()



The advantage of having a data frame is that it can be treated as a single object.



Data frames, models, and even matrix decompositions can be objects in R.



You can call parts of objects by name using $



model$coef or model$coefficient will bring up the estimated coefficients (R partially matches the full name, coefficients)

If no such component exists, you’ll get NULL.



Example: Cai$height
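A short sketch of $ in action. The people data frame and its values are made up for illustration.

```r
# Made-up example data
people <- data.frame(name   = c("Ann", "Bob", "Cy"),
                     height = c(160, 175, 182))

people$height     # the height column, as a numeric vector
people$weight     # no such column, so this is NULL

# Model objects work the same way
mod <- lm(height ~ 1, data = people)
mod$coef          # partial match for mod$coefficients
```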

Data frames – Jack Cai


Exercises:



Make a data.frame() of x1, x2, x3, x4, and y. Name it dat.



(if you’re stuck from the last part, run “Q3-dataframethis.txt” first)



Use index notation like dat[4,3], dat[2:7,3], dat[4,], and dat[4,-1] to get
• The 3rd row, 5th entry of dat
• The 2nd – 7th values of the 5th column
• The entire 3rd row
• The 3rd row without the 1st entry
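The indexing patterns can be tried on a small stand-in data frame first. The values below are placeholders chosen so that each result is easy to check by eye; your own dat will hold the simulated values instead.

```r
# Stand-in data with easy-to-read values (not the simulated data)
dat <- data.frame(x1 = 1:8, x2 = 11:18, x3 = 21:28, x4 = 31:38, y = 41:48)

dat[3, 5]      # row 3, column 5: a single value
dat[2:7, 5]    # rows 2 to 7 of column 5: a vector
dat[3, ]       # all of row 3: a one-row data frame
dat[3, -1]     # row 3 with the first column dropped
```

The pattern is always dat[rows, columns]; leaving one side blank means "all", and a negative index means "everything except".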

Linear models – Jack Cai


The results of the lm() function are an object.



Example: mod = lm(y ~ x1 + I(x2^2) + x1:x2, data=dat)



Useful aspects
• mod$fitted
• mod$residuals



Useful functions
• summary(mod)
• predict(mod, newdata)
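Here is a minimal sketch of the example model and the pieces listed above. The data, seed, and true coefficients are made up so that the block runs on its own.

```r
set.seed(1)                                    # arbitrary seed
dat <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
dat$y <- 2 + 3 * dat$x1 + 0.5 * dat$x2^2 + dat$x1 * dat$x2 + rnorm(40)

# I() protects x2^2 so ^ means arithmetic; x1:x2 is the interaction term
mod <- lm(y ~ x1 + I(x2^2) + x1:x2, data = dat)

head(mod$fitted)       # partial match for mod$fitted.values
head(mod$residuals)    # observed y minus fitted y
summary(mod)           # coefficient table, R-squared, and more

# predict() needs a data frame with the same predictor names
new_obs <- data.frame(x1 = c(0, 1), x2 = c(1, -1))
predict(mod, newdata = new_obs)
```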

Linear models – Jack Cai


Use the lm() command to create a linear model of y as an additive function of x1, x2, x3, and x4 using the dat data frame. Name it mod. (No interactions or transformations.)



Get the summary of mod



Display the estimated coefficients with no other