# Statistics Assignment


SAMPLING DISTRIBUTIONS

READING: SECTIONS 5.1, 5.2

A POPULATION PARAMETER IS A SINGLE VALUE THAT DESCRIBES A POPULATION CHARACTERISTIC (SUCH AS CENTER, SPREAD, ETC.).

POPULATION PARAMETERS ARE USUALLY UNKNOWN. AN IMPORTANT OBJECTIVE OF STATISTICAL INFERENCE IS TO USE SAMPLE INFORMATION TO ESTIMATE PARAMETERS.

A STATISTIC IS A NUMBER COMPUTED FROM THE SAMPLE DATA ONLY: ITS COMPUTATION MUST NOT DEPEND ON ANY UNKNOWN POPULATION PARAMETERS.

THE SAMPLING DISTRIBUTION OF A STATISTIC IS THE DISTRIBUTION OF THE VALUES TAKEN BY THE STATISTIC OVER ALL POSSIBLE RANDOM SAMPLES OF THE SAME SIZE FROM A GIVEN POPULATION.
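THE PYTHON SKETCH BELOW MAKES THE DEFINITION CONCRETE. IT ENUMERATES EVERY POSSIBLE SAMPLE OF SIZE $n = 2$ FROM A HYPOTHETICAL TOY POPULATION $\{1, 2, 3, 4\}$ AND TABULATES THE RESULTING VALUES OF THE SAMPLE MEAN. IT ASSUMES SAMPLING WITH REPLACEMENT, SO ALL $4^2 = 16$ ORDERED SAMPLES ARE EQUALLY LIKELY. THE POPULATION AND SAMPLE SIZE ARE ILLUSTRATIVE CHOICES, NOT TAKEN FROM THE TEXT.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical toy population and sample size (assumptions for illustration).
population = [1, 2, 3, 4]
n = 2

# Every ordered sample of size n drawn WITH replacement; all are equally likely.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the sample mean: value -> exact probability.
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for value, prob in dist.items():
    print(f"xbar = {float(value):.1f}   P = {prob}")
```

THE MEAN OF THIS SAMPLING DISTRIBUTION WORKS OUT TO 2.5, THE SAME AS THE POPULATION MEAN.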

THE BINOMIAL DISTRIBUTION

A BERNOULLI TRIAL IS A RANDOM EXPERIMENT THAT CONSISTS OF TWO POSSIBLE OUTCOMES THAT ARE REFERRED TO AS SUCCESS (S) OR FAILURE (F).

THE PROBABILITY OF SUCCESS IN A SINGLE BERNOULLI TRIAL IS DENOTED BY $p$, AND THE PROBABILITY OF FAILURE IS EQUAL TO $1 - p$.

IN PRACTICAL SITUATIONS, A BERNOULLI TRIAL IS THE RANDOM SAMPLING OF A SINGLE INDIVIDUAL FROM A POPULATION IN WHICH EACH ELEMENT HAS ONE, AND ONLY ONE, OF TWO ATTRIBUTES, AGAIN REFERRED TO AS SUCCESS OR FAILURE.


WE WILL REFER TO A BINOMIAL SETTING IF THE FOLLOWING HOLDS: A BERNOULLI TRIAL IS REPEATED $n$ TIMES UNDER THE FOLLOWING ASSUMPTIONS:

1- THE OUTCOMES OF THE $n$ TRIALS ARE ALL INDEPENDENT.

2- THE PROBABILITY OF SUCCESS $p$ REMAINS THE SAME THROUGHOUT THE $n$ TRIALS.

WE DEFINE THE RANDOM VARIABLE $X$ AS FOLLOWS:

$X$ = COUNT OF SUCCESSES IN THE $n$ TRIALS
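HERE IS A MINIMAL PYTHON SKETCH OF THE BINOMIAL SETTING. EACH CALL RUNS $n$ INDEPENDENT BERNOULLI TRIALS WITH THE SAME SUCCESS PROBABILITY $p$ AND RETURNS $X$, THE COUNT OF SUCCESSES. THE FUNCTION NAME `count_successes`, THE SEED, AND THE VALUES $n = 10$, $p = 0.3$ ARE HYPOTHETICAL CHOICES FOR ILLUSTRATION.

```python
import random

def count_successes(n: int, p: float, rng: random.Random) -> int:
    """Run n independent Bernoulli(p) trials and return X = number of successes.

    rng.random() is uniform on [0, 1), so each trial succeeds with probability p.
    """
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(2024)  # fixed seed so the run is reproducible
draws = [count_successes(10, 0.3, rng) for _ in range(5)]
print(draws)  # five simulated values of X, each between 0 and 10
```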

NOTE THAT THE POSSIBLE VALUES OF $X$ ARE $0, 1, 2, \ldots, n$.

THE PMF OF $X$ IS DESCRIBED BY THE FOLLOWING FORMULA:

$$p(k) = {}_nC_k \, p^k (1-p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n,$$

WHERE

$${}_nC_k = \frac{n!}{k!\,(n-k)!}$$

IS THE NUMBER OF WAYS OF SELECTING $k$ OBJECTS OUT OF $n$, AND $m! = 1 \cdot 2 \cdot 3 \cdots m$ WITH $0! = 1$.

WE WILL SAY THAT $X$ HAS THE BINOMIAL DISTRIBUTION, OR THAT $X$ IS BINOMIAL$(n, p)$.
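THE PMF TRANSLATES DIRECTLY INTO CODE. THIS PYTHON SKETCH USES `math.comb` FOR ${}_nC_k$. THE HELPER NAME `binom_pmf` AND THE VALUES $n = 4$, $p = 0.3$ ARE ILLUSTRATIVE ASSUMPTIONS. THE PROBABILITIES OVER $k = 0, 1, \ldots, n$ SHOULD ADD UP TO 1.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): nCk * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # values outside 0..n are impossible
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.3
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(probs):
    print(f"P(X = {k}) = {prob:.4f}")
print("total:", round(sum(probs), 10))
```

FOR EXAMPLE, WITH $p = 0.3$ THE FIRST ENTRY IS $P(X = 0) = 0.7^4 = 0.2401$.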

LOOKING AT THE BINOMIAL FORMULA FOR SMALL VALUES OF $n$:

$n = 2$:

| $k$ | 0 | 1 | 2 |
|---|---|---|---|
| $p(k)$ | $(1-p)^2$ | $2p(1-p)$ | $p^2$ |

$n = 3$:

| $k$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $p(k)$ | $(1-p)^3$ | $3p(1-p)^2$ | $3p^2(1-p)$ | $p^3$ |

$n = 4$:

| $k$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $p(k)$ | $(1-p)^4$ | $4p(1-p)^3$ | $6p^2(1-p)^2$ | $4p^3(1-p)$ | $p^4$ |

THE BINOMIAL PROBABILITIES ARE ALSO GIVEN IN TABULATED FORM (TABLE C IN OUR TEXTBOOK).

THE BINOMIAL MODEL PROVIDES A NICE APPROXIMATION TO THE FOLLOWING SITUATION:

WE ARE DEALING WITH A VERY LARGE, OR INFINITE, POPULATION. EACH ELEMENT IN THE POPULATION HAS ONE, AND ONLY ONE, OF TWO ATTRIBUTES: SUCCESS (S) OR FAILURE (F). A PROPORTION $p$ OF THE INDIVIDUALS IN THE POPULATION HAVE THE SUCCESS ATTRIBUTE.

A SIMPLE RANDOM SAMPLE OF SIZE $n$ IS SELECTED FROM THE POPULATION. LET $X$ BE THE STATISTIC DEFINED AS THE COUNT OF SUCCESSES IN THE SAMPLE.

THE SAMPLING DISTRIBUTION OF $X$ IS WELL APPROXIMATED BY A BINOMIAL$(n, p)$ DISTRIBUTION.

REMARK: THE BINOMIAL DISTRIBUTION BECOMES THE EXACT DISTRIBUTION IF THE SAMPLING IS DONE WITH REPLACEMENT (REGARDLESS OF POPULATION SIZE).

PROBABILITY HISTOGRAMS OF SOME BINOMIAL DISTRIBUTIONS ARE GIVEN BELOW.

[FIGURE: PROBABILITY HISTOGRAMS (PERCENT VS. $X$) OF THE BINOMIAL DISTRIBUTION WITH $n = 10$ AND $p = 0.1$, $p = 0.5$, AND $p = 0.9$]
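THE HISTOGRAMS THEMSELVES ARE NOT REPRODUCED HERE, SO THIS PYTHON SKETCH PRINTS A TEXT VERSION OF THE SAME THREE BINOMIAL DISTRIBUTIONS ($n = 10$; $p = 0.1$, 0.5, 0.9), WITH ROUGHLY ONE `#` PER PERCENTAGE POINT. IT SHOWS THE RIGHT SKEW FOR $p = 0.1$, THE SYMMETRY FOR $p = 0.5$, AND THE MIRROR-IMAGE LEFT SKEW FOR $p = 0.9$. THE HELPER NAME `binom_pmf` IS HYPOTHETICAL.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10
for p in (0.1, 0.5, 0.9):
    print(f"Binomial(n={n}, p={p})")
    for k in range(n + 1):
        pct = 100 * binom_pmf(k, n, p)
        # Text bar: one '#' per (rounded) percentage point.
        print(f"{k:2d} {pct:5.1f}% {'#' * round(pct)}")
    print()
```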

THE MEAN AND STANDARD DEVIATION OF THE BINOMIAL$(n, p)$ DISTRIBUTION ARE

$$\mu = np \qquad \text{AND} \qquad \sigma = \sqrt{np(1-p)}.$$
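AS A NUMERICAL CHECK OF THESE FORMULAS, THIS PYTHON SKETCH COMPUTES THE MEAN AND STANDARD DEVIATION DIRECTLY FROM THE PMF, USING $\mu = \sum_k k\,p(k)$ AND $\sigma^2 = \sum_k (k-\mu)^2 p(k)$. IT THEN COMPARES THEM WITH $np$ AND $\sqrt{np(1-p)}$. THE VALUES $n = 25$, $p = 0.4$ AND THE HELPER NAME `binom_pmf` ARE ILLUSTRATIVE.

```python
from math import comb, sqrt

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 25, 0.4
ks = range(n + 1)

# Mean and variance straight from their definitions, weighting by the pmf.
mean = sum(k * binom_pmf(k, n, p) for k in ks)
var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in ks)

print("mean from pmf:", round(mean, 6), "  np:", n * p)
print("sd from pmf:  ", round(sqrt(var), 6),
      "  sqrt(np(1-p)):", round(sqrt(n * p * (1 - p)), 6))
```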

THEOREM 3 (AN APPROXIMATION RESULT FOR THE BINOMIAL DISTRIBUTION):

ASSUME THAT $X$ HAS A BINOMIAL$(n, p)$ DISTRIBUTION. IF $n$ IS LARGE ENOUGH, THEN THE DISTRIBUTION OF $X$ CAN BE WELL APPROXIMATED BY A NORMAL DISTRIBUTION WITH MEAN $\mu = np$ AND STANDARD DEVIATION $\sigma = \sqrt{np(1-p)}$.

AS A RULE OF THUMB, THE APPROXIMATION WORKS WELL IF $np \ge 10$ AND $n(1-p) \ge 10$.
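THE FOLLOWING PYTHON SKETCH CHECKS THE RULE OF THUMB. IT ALSO COMPARES AN EXACT BINOMIAL PROBABILITY $P(X \le k)$ WITH ITS NORMAL APPROXIMATION FOR $n = 25$ AND $n = 2500$, BOTH WITH $p = 0.4$. THE CUT-OFFS $k = 12$ AND $k = 1020$ ARE HYPOTHETICAL CHOICES, EACH ABOUT 0.82 STANDARD DEVIATIONS ABOVE THE MEAN. THE EXACT PMF IS SUMMED IN LOG SPACE (VIA `math.lgamma`) SO THAT $p^k$ DOES NOT UNDERFLOW FOR $n = 2500$. THE NORMAL CDF IS BUILT FROM `math.erf`. ALL HELPER NAMES ARE ASSUMPTIONS.

```python
from math import erf, exp, lgamma, log, sqrt

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p). Each pmf term is computed in
    log space so that p**k does not underflow when n is large (e.g. 2500)."""
    log_nfact = lgamma(n + 1)
    total = 0.0
    for j in range(k + 1):
        log_pmf = (log_nfact - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1 - p))
        total += exp(log_pmf)
    return total

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def rule_of_thumb(n: int, p: float) -> bool:
    """The text's rule of thumb: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

errors = {}
for n, p, k in [(25, 0.4, 12), (2500, 0.4, 1020)]:
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    exact = binom_cdf(k, n, p)
    approx = normal_cdf(k, mu, sigma)
    errors[n] = abs(exact - approx)
    print(f"n = {n}: rule of thumb holds = {rule_of_thumb(n, p)}, "
          f"exact = {exact:.4f}, normal = {approx:.4f}, "
          f"error = {errors[n]:.4f}")
```

FOR THESE VALUES, THE APPROXIMATION ERROR SHOULD BE NOTICEABLY SMALLER FOR $n = 2500$ THAN FOR $n = 25$.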

EXAMPLE: THE HISTOGRAM OF 10,000 OBSERVATIONS TAKEN FROM A BINOMIAL$(25, 0.4)$ DISTRIBUTION WAS OVERLAID ON THE SAME PAGE WITH A NORMAL DISTRIBUTION WITH $\mu = np = 10$ AND $\sigma = \sqrt{np(1-p)} = \sqrt{6} \approx 2.45$:

[FIGURE: HISTOGRAM OF BINOMIAL COUNTS (HORIZONTAL AXIS: OBSERVATIONS) WITH A FITTED NORMAL CURVE; MEAN 9.999, STDEV 2.450, N = 10000]

THE TWO DISTRIBUTIONS ARE VERY CLOSE.

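REPEATING THE SIMULATION WITH $n = 2500$, $p = 0.4$, $\mu = np = 1000$, AND $\sigma = \sqrt{np(1-p)} = \sqrt{600} \approx 24.49$ GIVES:

[FIGURE: HISTOGRAM OF BINOMIAL COUNTS WITH NORMAL CURVE (HORIZONTAL AXIS: OBSERVATIONS, ROUGHLY 900 TO 1100)]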

THE LARGER SAMPLE SIZE GIVES A CLOSER APPROXIMATION.

THE SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION:

AN EQUIVALENT, AND IMPORTANT, FORMULATION OF THEOREM 3 CAN BE GIVEN IN TERMS OF THE SAMPLE PROPORTION DEFINED BY

$$\hat{p} = \frac{X}{n}.$$

IF $n$ IS LARGE ENOUGH, THEN THE SAMPLING DISTRIBUTION OF $\hat{p}$ CAN BE WELL APPROXIMATED BY A NORMAL DISTRIBUTION WITH MEAN $\mu = p$ AND STANDARD DEVIATION $\sigma = \sqrt{p(1-p)/n}$. THESE FOLLOW FROM DIVIDING THE MEAN $np$ AND THE STANDARD DEVIATION $\sqrt{np(1-p)}$ OF $X$ BY $n$.

AS A RULE OF THUMB, THE APPROXIMATION WORKS WELL IF $np \ge 10$ AND $n(1-p) \ge 10$.
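THE FOLLOWING IS A QUICK SIMULATION SKETCH OF THIS RESULT IN PYTHON. IT DRAWS 10,000 SAMPLES OF SIZE $n = 100$ FROM A POPULATION WITH AN ASSUMED SUCCESS PROPORTION $p = 0.2$ AND COMPUTES $\hat{p}$ FOR EACH. IT THEN COMPARES THE MEAN AND STANDARD DEVIATION OF THE SIMULATED $\hat{p}$ VALUES WITH $p$ AND $\sqrt{p(1-p)/n} = 0.04$. THE SEED, THE SIZES, AND THE HELPER NAME `sample_proportion` ARE ILLUSTRATIVE.

```python
import random
import statistics
from math import sqrt

def sample_proportion(n: int, p: float, rng: random.Random) -> float:
    """Draw n Bernoulli(p) observations and return p_hat = X / n."""
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n

rng = random.Random(7)          # fixed seed for reproducibility
n, p, reps = 100, 0.2, 10_000   # illustrative values (assumptions)

p_hats = [sample_proportion(n, p, rng) for _ in range(reps)]
print("rule of thumb holds:", n * p >= 10 and n * (1 - p) >= 10)
print("mean of p_hat:", round(statistics.fmean(p_hats), 4), " theory:", p)
print("sd of p_hat:  ", round(statistics.stdev(p_hats), 4),
      " theory:", round(sqrt(p * (1 - p) / n), 4))
```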

AN AUDITOR SAMPLES 100 OF A FIRM'S TRAVEL VOUCHERS TO ASCERTAIN HOW MANY OF THE VOUCHERS ARE IMPROPERLY DOCUMENTED.

A) WHAT IS THE APPROXIMATE PROBABILITY THAT MORE THAN 30% OF THE 100 SAMPLED VOUCHERS ARE IMPROPERLY DOCUMENTED IF, IN FACT, 20% OF ALL THE VOUCHERS ARE IMPROPERLY DOCUMENTED?

B) IF YOU WERE THE AUDITOR AND OBSERVED MORE THAN 30% OF THE 100 VOUCHERS WITH IMPROPER DOCUMENTATION, WHAT WOULD YOU CONCLUDE ABOUT THE FIRM'S CLAIM THAT ONLY 20% SUFFERED FROM IMPROPER DOCUMENTATION?
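THIS IS A WORKED SKETCH OF PART A IN PYTHON, USING THE NORMAL APPROXIMATION FOR $\hat{p}$. THE RULE OF THUMB HOLDS SINCE $np = 20$ AND $n(1-p) = 80$. THE STANDARD DEVIATION IS $\sigma_{\hat{p}} = \sqrt{0.2 \cdot 0.8 / 100} = 0.04$, SO $z = (0.30 - 0.20)/0.04 = 2.5$ AND THE PROBABILITY IS $P(Z > 2.5)$. THE STANDARD NORMAL CDF IS BUILT FROM `math.erf`; THE HELPER NAME `normal_cdf` IS AN ASSUMPTION.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.20   # sample size and the firm's claimed proportion
cutoff = 0.30      # "more than 30%" of the sampled vouchers

# Rule of thumb from the text: np = 20 >= 10 and n(1 - p) = 80 >= 10
assert n * p >= 10 and n * (1 - p) >= 10

mu = p                          # mean of p_hat
sigma = sqrt(p * (1 - p) / n)   # standard deviation of p_hat (0.04)
z = (cutoff - mu) / sigma       # standardized cut-off (2.5)
prob = 1 - normal_cdf(z)        # approximate P(p_hat > 0.30)
print(f"sigma = {sigma:.3f}, z = {z:.2f}, P(p_hat > 0.30) = {prob:.4f}")
```

THE RESULTING PROBABILITY, ABOUT 0.0062, IS THE QUANTITY PART B ASKS YOU TO INTERPRET.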

THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN:

THEOREM 5: LET $X_1, X_2, \ldots, X_n$ BE A RANDOM SAMPLE FROM A VERY LARGE OR INFINITE POPULATION WITH MEAN $\mu$ AND STANDARD DEVIATION $\sigma$. THEN THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN $\bar{X}$ HAS MEAN $\mu_{\bar{X}} = \mu$ AND STANDARD DEVIATION $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

THEOREM 6: LET $X_1, X_2, \ldots, X_n$ BE A RANDOM SAMPLE FROM A NORMALLY DISTRIBUTED POPULATION WITH MEAN $\mu$ AND STANDARD DEVIATION $\sigma$. THEN THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN $\bar{X}$ IS ITSELF NORMALLY DISTRIBUTED WITH MEAN $\mu$ AND STANDARD DEVIATION $\sigma/\sqrt{n}$.

ILLUSTRATION OF THEOREMS 5 AND 6 BY SIMULATION:

USING MINITAB, WE SIMULATE THE GENERATION OF 500 RANDOM SAMPLES FROM A NORMAL POPULATION WITH MEAN $\mu = 20$ AND STANDARD DEVIATION $\sigma = 8$ FOR EACH OF $n = 5$, 20, AND 50. THE SAMPLE MEAN IS COMPUTED FOR EACH SAMPLE, AND THE 500 VALUES ARE GROUPED INTO A FREQUENCY HISTOGRAM.

[FIGURE: FREQUENCY HISTOGRAMS OF THE 500 SAMPLE MEANS (xbar) FOR $n = 5$ (MEAN = 20.046; STDEV = 3.701), $n = 20$ (MEAN = 20.023; STDEV = 1.792), AND $n = 50$ (MEAN = 20.110; STDEV = 1.129)]

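NOTE THAT THE SHAPE OF THE HISTOGRAM REMAINS NORMAL, THE MEAN REMAINS UNCHANGED (APPROXIMATELY EQUAL TO THE POPULATION MEAN OF 20) AND, IMPORTANTLY, THE STANDARD DEVIATION DECREASES AS THE SAMPLE SIZE $n$ INCREASES. COMPARE IT TO $\sigma/\sqrt{n}$: $8/\sqrt{5} \approx 3.58$, $8/\sqrt{20} \approx 1.79$, AND $8/\sqrt{50} \approx 1.13$.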
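THE MINITAB EXPERIMENT CAN BE APPROXIMATELY REPRODUCED WITH A FEW LINES OF PYTHON, USING ONLY THE STANDARD LIBRARY (`random.gauss`, `statistics.fmean`, `statistics.stdev`). THE SEED IS ARBITRARY, SO THE NUMBERS WILL NOT MATCH THE FIGURE EXACTLY. IN EACH CASE, HOWEVER, THE STANDARD DEVIATION OF THE 500 SAMPLE MEANS SHOULD BE CLOSE TO $\sigma/\sqrt{n}$.

```python
import random
import statistics
from math import sqrt

rng = random.Random(42)       # arbitrary seed, for reproducibility only
mu, sigma, reps = 20, 8, 500  # population mean/sd and number of samples, as in the text

summary = {}
for n in (5, 20, 50):
    # 500 sample means, each computed from n independent Normal(mu, sigma) draws
    xbars = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    summary[n] = (statistics.fmean(xbars), statistics.stdev(xbars))
    print(f"n = {n:2d}: mean of xbars = {summary[n][0]:.3f}, "
          f"sd of xbars = {summary[n][1]:.3f}, "
          f"sigma/sqrt(n) = {sigma / sqrt(n):.3f}")
```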

THE NICOTINE CONTENT IN A SINGLE CIGARETTE OF A PARTICULAR BRAND IS NORMALLY DISTRIBUTED WITH MEAN $\mu = 0.8$ MG AND STANDARD DEVIATION $\sigma = 0.04$ MG. A RANDOM SAMPLE OF 16 CIGARETTES IS SELECTED AND ANALYZED. FIND THE PROBABILITY THAT THE SAMPLE MEAN NICOTINE CONTENT

B) IS BETWEEN 0.79 MG AND 0.81 MG.
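THIS IS A SKETCH OF PART B IN PYTHON. BECAUSE THE POPULATION IS NORMAL, THEOREM 6 GIVES $\bar{X}$ AN EXACT NORMAL DISTRIBUTION WITH MEAN 0.8 MG AND STANDARD DEVIATION $0.04/\sqrt{16} = 0.01$ MG. THE INTERVAL FROM 0.79 TO 0.81 MG IS THEREFORE $\pm 1$ STANDARD DEVIATION AROUND THE MEAN. THE HELPER NAME `normal_cdf` (BUILT FROM `math.erf`) IS HYPOTHETICAL.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma, n = 0.8, 0.04, 16   # mg, mg, sample size (from the example)
sd_xbar = sigma / sqrt(n)      # 0.04 / 4 = 0.01 mg

prob = normal_cdf(0.81, mu, sd_xbar) - normal_cdf(0.79, mu, sd_xbar)
print(f"sd of xbar = {sd_xbar:.3f} mg, P(0.79 < xbar < 0.81) = {prob:.4f}")
```

THE RESULT IS ABOUT 0.6827, THE FAMILIAR PROBABILITY OF LANDING WITHIN ONE STANDARD DEVIATION OF THE MEAN.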

THEOREM 7 (THE CENTRAL LIMIT THEOREM (CLT)):

LET $X_1, X_2, \ldots, X_n$ BE A RANDOM SAMPLE FROM A VERY LARGE, OR INFINITE, POPULATION WITH MEAN $\mu$ AND STANDARD DEVIATION $\sigma$. THEN FOR $n$ LARGE ENOUGH, THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN $\bar{X}$ CAN BE WELL APPROXIMATED BY A NORMAL DISTRIBUTION WITH MEAN $\mu_{\bar{X}} = \mu$ AND STANDARD DEVIATION $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

ILLUSTRATION OF THE CLT BY SIMULATION:

TO ILLUSTRATE THE CENTRAL LIMIT THEOREM, WE USE MINITAB TO SIMULATE THE GENERATION OF 500 RANDOM SAMPLES FROM A KNOWN DISCRETE DISTRIBUTION, THE POISSON DISTRIBUTION WITH MEAN $\mu = 1$ AND STANDARD DEVIATION $\sigma = 1$, FOR EACH OF $n = 5$, 20, AND 50. THE SAMPLE MEAN IS COMPUTED FOR EACH SAMPLE, AND THE 500 VALUES ARE GROUPED INTO A FREQUENCY HISTOGRAM.

[FIGURE: POPULATION HISTOGRAM (MEAN = 1; STDEV = 1) AND FREQUENCY HISTOGRAMS OF THE 500 SAMPLE MEANS (xbar) FOR $n = 5$ (MEAN = .9788; STDEV = .4469), $n = 20$ (MEAN = .9888; STDEV = .2264), AND $n = 50$ (MEAN = .9978; STDEV = .1421)]

NOTE THAT THE MEAN REMAINS UNCHANGED AND THAT THE STANDARD DEVIATION DECREASES AS THE SAMPLE SIZE $n$ INCREASES. COMPARE IT TO $\sigma/\sqrt{n} = 1/\sqrt{n}$, WHICH IS ABOUT 0.447, 0.224, AND 0.141. NOTE ALSO THAT THE POPULATION DISTRIBUTION IS SKEWED TO THE RIGHT. FOR SMALL SAMPLE SIZES, THE SKEWNESS IS STILL PRESENT IN THE SAMPLING DISTRIBUTION OF $\bar{X}$, ALTHOUGH LESS PRONOUNCED THAN IN THE POPULATION. AS $n$ BECOMES LARGER, THE SAMPLING DISTRIBUTION OF $\bar{X}$ CENTERS NICELY AND CAN BE WELL APPROXIMATED BY A NORMAL DISTRIBUTION.
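THE CLT SIMULATION CAN ALSO BE SKETCHED IN PYTHON. TO STAY WITHIN THE STANDARD LIBRARY, POISSON(1) VALUES ARE GENERATED WITH KNUTH'S MULTIPLICATION METHOD. THIS IS A SWAPPED-IN TECHNIQUE; THE TEXT DOES NOT SAY HOW MINITAB GENERATES ITS VALUES. THE SKETCH USES 5,000 REPLICATIONS RATHER THAN 500 SO THAT THE SAMPLE SKEWNESS IS LESS NOISY. THE HELPER NAMES, SEED, AND REPLICATION COUNT ARE ASSUMPTIONS. FOR A POISSON(1) POPULATION, THE SKEWNESS OF $\bar{X}$ IS $1/\sqrt{n}$, SO THE PRINTED SKEWNESS SHOULD SHRINK AS $n$ GROWS.

```python
import math
import random
import statistics

def poisson_draw(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw by Knuth's multiplication method:
    multiply uniforms until the product drops to exp(-lam) or below."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def skewness(values: list) -> float:
    """Moment coefficient of skewness, m3 / m2**1.5."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(1)   # arbitrary seed
reps = 5000              # more replications than the text's 500 (assumption)
results = {}
for n in (5, 20, 50):
    xbars = [statistics.fmean(poisson_draw(1.0, rng) for _ in range(n))
             for _ in range(reps)]
    results[n] = (statistics.fmean(xbars), statistics.stdev(xbars), skewness(xbars))
    mean, sd, skew = results[n]
    print(f"n = {n:2d}: mean = {mean:.4f}, sd = {sd:.4f} "
          f"(1/sqrt(n) = {1 / math.sqrt(n):.4f}), skewness = {skew:.3f}")
```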