Week 2 Day 2
Quan Sam
1/10/2020
Definitions:
An inferential situation is one in which we take data from samples and generalize to a population, which allows us to make predictions about that population.
We use 2 indicators for the estimation:
Examples:
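First, we generate 11 different samples of 20 values each from a normal distribution with mean 85 and standard deviation 2, plot each sample at its own height, and mark each sample mean with a triangle: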
means_simulated=vector(mode = "numeric", length = 11)
# One colour per sample, in the order the samples are drawn
sample_cols=c("dark red","pink","black","green","purple","tomato","plum","gold","dimgrey","aquamarine","firebrick1")
for (i in 1:11) {
  a=rnorm(20,85,2)              # sample of 20 values, mean 85, sd 2
  b=rep(1+0.025*(i-1),20)       # height at which this sample is drawn
  means_simulated[i]=mean(a)    # store the sample mean
  if (i==1) {
    plot(a,b,cex = .5, col = sample_cols[i])    # first sample opens the plot
  } else {
    points(a,b,cex = .5, col = sample_cols[i])  # later samples are added to it
  }
  points(x=mean(a),y=0.8,pch=24,cex=0.5,col = sample_cols[i])  # triangle at the sample mean
}
mean(means_simulated)
## [1] 85.15239
abline(v = mean(means_simulated), col = "red")
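To see how stable this estimate is, the same experiment can be repeated many more times. The following is a minimal sketch, not part of the original notes: the seed, the 1000 repetitions, and the name many_means are illustrative assumptions. It suggests that the sample means centre on the population mean of 85 and that their spread is close to the theoretical standard error, 2/sqrt(20).

```r
# Illustrative sketch: seed and number of repetitions are assumptions
set.seed(1)
n <- 20                                   # size of each sample, as above

# Draw 1000 samples of size n and keep only their means
many_means <- replicate(1000, mean(rnorm(n, mean = 85, sd = 2)))

mean(many_means)   # average of the sample means, expected to be near 85
sd(many_means)     # observed spread of the sample means
2 / sqrt(n)        # theoretical standard error of the mean, for comparison
```

The spread of the sample means shrinks as the sample size grows, which is why dividing by sqrt(n) appears in the confidence interval for a single sample.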
We can conclude that although we do not know the mean of the whole population, we can estimate it from the means of the 11 samples, because the sample means cluster around the population mean (here their average, 85.15, is close to the true value of 85). Next, we take a single sample of 30 values.
weight = c(56.6,54.8,59.0,60.4,61.8,62.6,65.0,58.1,61.4,60.8,59.2,58.1,57.5,55.2,54.6,61.6,56.9,61.3,67.2,53.9,54.1,62.0,63.5,58.1,56.0,51.5,63.8,58.1,58.2,61.3)
mean(weight)
## [1] 59.08667
sd(weight)
## [1] 3.655203
If we only have a single sample of 30 values, we can still estimate the population mean by building a confidence interval around the sample mean.
Taking two standard errors on either side of the sample mean gives an approximately 95% confidence interval, which spans from mean(weight) - 2*sd(weight)/sqrt(30) to mean(weight) + 2*sd(weight)/sqrt(30).
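The interval can be computed directly from the weight data. This is a short sketch using the same two-standard-error rule; the variable names n, se, ci_lower, and ci_upper are introduced here for illustration only.

```r
weight <- c(56.6,54.8,59.0,60.4,61.8,62.6,65.0,58.1,61.4,60.8,
            59.2,58.1,57.5,55.2,54.6,61.6,56.9,61.3,67.2,53.9,
            54.1,62.0,63.5,58.1,56.0,51.5,63.8,58.1,58.2,61.3)

n  <- length(weight)          # sample size (30)
se <- sd(weight) / sqrt(n)    # standard error of the mean

# Two standard errors on either side of the sample mean
ci_lower <- mean(weight) - 2 * se
ci_upper <- mean(weight) + 2 * se
c(lower = ci_lower, upper = ci_upper)
```

With these data the interval runs from roughly 57.75 to 60.42. The multiplier 2 is a rounded approximation; t.test(weight)$conf.int would give the exact t-based 95% interval, which is slightly wider for a sample of this size.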