

Date	17.12.2020

Day-2-Week-2


Week 2 Day 2

Quan Sam

1/10/2020

Definitions:

An inferential situation arises when we use data from a sample to make generalizations about the population it was drawn from, which allows us to make predictions.

We use two indicators for the estimation:

Indicator of position, or the average ($\bar{X}$): for a sample of $n$ values $X_1, \dots, X_n$ it is calculated as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
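The average is simply the sum of the values divided by their count. As a quick check, the sketch below computes it both ways for a small vector; the values in `x` are made up purely for illustration and are not from these notes.

```r
# Hypothetical example data (not from the notes)
x = c(82.1, 85.4, 86.0, 84.7, 87.3)

# Sample mean from the formula: sum of the values divided by n
n = length(x)
xbar_formula = sum(x) / n
xbar_formula                            # 85.1

# Same quantity from R's built-in function
xbar_builtin = mean(x)
all.equal(xbar_formula, xbar_builtin)   # TRUE
```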

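Indicator of dispersion, or the variance ($\sigma^{2}$): the expectation of the squared deviation of a random variable from its mean; in other words, it measures how far a set of numbers is spread out from their average value. It is calculated as

$$\sigma^{2} = E\left[(X - \mu)^{2}\right]$$

and is estimated from a sample of $n$ values by

$$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^{2}$$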
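When the variance is estimated from a sample, the usual estimator divides the sum of squared deviations by n - 1 rather than n; this is what R's `var()` and `sd()` compute, so any `sd()` value reported in these notes is of that form. The sketch below reuses a made-up vector `x` (an illustrative assumption) and contrasts the two divisors.

```r
# Hypothetical example data (not from the notes)
x = c(82.1, 85.4, 86.0, 84.7, 87.3)
n = length(x)
xbar = mean(x)

# Sum of squared deviations from the sample mean
ss = sum((x - xbar)^2)

s2     = ss / (n - 1)   # sample variance, divisor n - 1 (what var() uses)
s2_pop = ss / n         # divisor n, the "population" form applied to the data

s2                            # 3.725
s2_pop                        # 2.98
all.equal(s2, var(x))         # TRUE
all.equal(sqrt(s2), sd(x))    # TRUE: the standard deviation is the square root
```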

Examples:

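First, we generate 11 different samples, each of 20 values drawn from a normal distribution with mean 85 and standard deviation 2, and record the mean of each sample. Each sample is plotted on its own row, and its mean is marked with a triangle on a common row below.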

# 11 simulated samples of 20 values each from a normal distribution N(85, 2^2);
# replicate() returns a 20 x 11 matrix with one sample per column
samples = replicate(11, rnorm(20, mean = 85, sd = 2))
means_simulated = colMeans(samples)

# One colour per sample (the third sample uses the default black)
cols = c("darkred", "pink", "black", "green", "purple", "tomato",
         "plum", "gold", "dimgrey", "aquamarine", "firebrick1")

# Empty plot whose x-range covers every simulated value, so no point is clipped
plot(NULL, xlim = range(samples), ylim = c(0.75, 1.3),
     xlab = "simulated value", ylab = "")

for (i in 1:11) {
  # sample i on its own row: y = 1, 1.025, ..., 1.25
  points(samples[, i], rep(1 + 0.025 * (i - 1), 20), cex = 0.5, col = cols[i])
  # its mean as a triangle on the common row y = 0.8
  points(x = means_simulated[i], y = 0.8, pch = 24, cex = 0.5, col = cols[i])
}

# Average of the 11 sample means
mean(means_simulated)

## [1] 85.15239

abline(v = mean(means_simulated), col = "red")

Although we do not know the mean of the whole population, we can estimate it from the means of the 11 samples: they cluster around the population mean (85 here), and their average (85.15 in this run) is close to it. Now consider a single sample of 30 weights.
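The spread of the sample means can also be quantified. With the same setup as the 11 samples above (size 20, mean 85, standard deviation 2), theory says the sample means have standard deviation 2/sqrt(20), about 0.447, called the standard error of the mean. The sketch below repeats the experiment many times; the number of repetitions (10,000) and the seed are arbitrary choices made only so the result is stable and reproducible.

```r
set.seed(1)          # arbitrary seed, only for reproducibility

n_samples = 10000    # number of simulated samples (arbitrary, far more than 11)
n = 20               # size of each sample, as in the example
mu = 85
sigma = 2

# Mean of each simulated sample of 20 values from N(85, 2^2)
sample_means = replicate(n_samples, mean(rnorm(n, mean = mu, sd = sigma)))

mean(sample_means)   # close to mu = 85
sd(sample_means)     # close to the theoretical standard error below
sigma / sqrt(n)      # theoretical standard error, about 0.447
```

The sample means centre on 85 and are much less spread out than the individual values (0.447 versus 2), which is why a sample mean is a useful estimate of the population mean.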

weight = c(56.6,54.8,59.0,60.4,61.8,62.6,65.0,58.1,61.4,60.8,59.2,58.1,57.5,55.2,54.6,61.6,56.9,61.3,67.2,53.9,54.1,62.0,63.5,58.1,56.0,51.5,63.8,58.1,58.2,61.3)
mean(weight)

## [1] 59.08667

sd(weight)

## [1] 3.655203

If we only have a single sample of 30 values, we can still estimate the population mean: with approximately 95% confidence, it lies within a confidence interval built around the sample mean.

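The confidence interval spans from mean(weight) - 2*sd(weight)/sqrt(30) to mean(weight) + 2*sd(weight)/sqrt(30), i.e. $\bar{X} \pm 2\,s/\sqrt{n}$ with $n = 30$, where $s/\sqrt{n}$ is the standard error of the mean and the factor 2 approximates 1.96, the 97.5% quantile of the standard normal distribution. With the values above, this is roughly 57.75 to 60.42.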
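The interval can be computed directly in R. The sketch below reuses the 30 weights from above and computes the mean plus or minus 2 standard errors, then the same interval with the exact normal quantile `qnorm(0.975)` (about 1.96). For comparison it also calls the built-in `t.test()`, which builds a 95% interval from the t distribution with 29 degrees of freedom; its quantile (about 2.045) is slightly larger than 2, so that interval is a little wider.

```r
# The 30 weights from the notes
weight = c(56.6,54.8,59.0,60.4,61.8,62.6,65.0,58.1,61.4,60.8,59.2,58.1,57.5,55.2,54.6,
           61.6,56.9,61.3,67.2,53.9,54.1,62.0,63.5,58.1,56.0,51.5,63.8,58.1,58.2,61.3)

n = length(weight)              # 30
se = sd(weight) / sqrt(n)       # standard error of the mean

# Interval from the notes: mean +/- 2 standard errors (approximately 95%)
ci_2se = c(lower = mean(weight) - 2 * se,
           upper = mean(weight) + 2 * se)
ci_2se                          # roughly 57.75 to 60.42

# Same idea with the exact normal quantile instead of 2
z = qnorm(0.975)                # about 1.96
ci_z = mean(weight) + c(-1, 1) * z * se

# t-based 95% interval from the built-in t.test()
ci_t = t.test(weight, conf.level = 0.95)$conf.int
ci_t
```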
