Statistics Questions to be solved using R software

Question 1:

One hundred twenty-two guinea pigs were randomly assigned either to a control group () or to a treatment group () that received a dose of tubercle bacilli. The survival times (in days) of the guinea pigs were recorded. The data set is provided separately.

  • Compute the descriptive statistics and create side-by-side boxplots and histograms to compare the two distributions.
  • Test the hypothesis that the average survival time for the treated population is smaller than its counterpart in the control population. Type out all four steps in testing: (a) state the null and alternative hypotheses; (b) report the sample value of the test statistic and the DF (if applicable); (c) report the P value; (d) state your conclusion in clear terms (do not just say '(do not) reject the null').
  • Report a 95% confidence interval.
  • Assess the validity of the t procedure used in this problem. Would departure from the normality assumption be a concern? Rely on the graphs you obtained in (i) and the information on robustness of the t test on page 17 of the Chapter 7 class notes (Statistical Inference 2).

 

Rcmdr instructions

To display and summarize:

Histograms-by-group

  • Graphs → Histogram
  • Pick the relevant variable
  • Click on the “Plot by groups” button and select the “Group” variable
  • Select “Percentages” from Options → Axis Scaling and insert titles, etc., under “Plot labels”
  • Click OK. The output appears in a separate graph window, from which you can copy and paste it

To make side-by-side boxplots

  • Graphs → Boxplot
  • Pick the relevant variable
  • Click on the “Plot by groups” button and select the “Group” variable
  • From Options, select the mouse option and insert title under “Plot labels”. Click OK.

To find the summary statistics

  • Statistics → Summaries → Numerical summaries
  • Pick the relevant variable
  • Click on the “Summarize by Groups” button and select the “group” variable
  • Click on Statistics and select the usual numerical summaries. Click OK
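
For readers working in the plain R console rather than Rcmdr, the display-and-summarize steps above correspond roughly to the sketch below. The data frame `pigs` and its columns `time` and `group` are hypothetical names, and the values are simulated placeholders with arbitrary group sizes, because the real data set is provided separately.

```r
# PLACEHOLDER data only: replace with the real guinea pig data set.
# 'pigs', 'time', 'group' and the group sizes are assumptions for illustration.
set.seed(1)
pigs <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 30)),
  time  = c(rexp(30, rate = 1 / 350), rexp(30, rate = 1 / 240))
)

# Numerical summaries by group (one column per group)
stats <- sapply(split(pigs$time, pigs$group), function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))
})
print(round(stats, 1))

# Side-by-side boxplots
boxplot(time ~ group, data = pigs,
        ylab = "Survival time (days)", main = "Survival time by group")

# Histograms by group, stacked on a common x-axis so the shapes can be compared
op <- par(mfrow = c(2, 1))
for (g in levels(pigs$group)) {
  hist(pigs$time[pigs$group == g], xlim = range(pigs$time),
       main = g, xlab = "Survival time (days)")
}
par(op)
```

With the real data, only the block that builds `pigs` needs to change; the summary and plotting calls stay the same.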

Confidence interval and statistical test

  • Statistics → Means → Independent samples t-test
  • Pick the group and response variables (should be obvious for this problem)
  • From the Options tab
  • Leave the alternative hypothesis at the two-sided state
  • Enter the confidence level (in decimal notation)
  • Choose No for “Assume equal variances” and click OK
  • Copy and paste the output
  • You will easily locate in the output all you need to answer (ii) and (iii). For the statistical test, read question (ii) carefully. If the test is two-sided, you have all you need to answer. If the test is one-sided, then the P value is equal to half the P value of the two-sided test, provided the difference between the sample means is in the direction of the alternative hypothesis (this must be clear to you; check the sign of the difference before halving).
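
The same test can be sketched in the R console with `t.test`. As before, `pigs`, `time`, and `group` are hypothetical names and the data are simulated placeholders, not the real survival times. The sketch runs the two-sided Welch test (which also gives the 95% confidence interval) and then the one-sided test, so the relationship between the two P values can be checked directly.

```r
# PLACEHOLDER data only: replace with the real guinea pig data set.
set.seed(1)
pigs <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 30)),
  time  = c(rexp(30, rate = 1 / 350), rexp(30, rate = 1 / 240))
)

# Two-sided Welch t test (unequal variances), with a 95% confidence interval
two_sided <- t.test(time ~ group, data = pigs,
                    alternative = "two.sided", var.equal = FALSE,
                    conf.level = 0.95)
print(two_sided)

# Factor levels are sorted alphabetically, so the difference is
# control - treatment. "Treatment mean is smaller" is therefore
# the alternative "greater" for this ordering.
one_sided <- t.test(time ~ group, data = pigs,
                    alternative = "greater", var.equal = FALSE)

# The one-sided P value is half the two-sided one only when the sample
# difference points in the direction of the alternative (t > 0 here).
half_rule <- if (two_sided$statistic > 0) {
  two_sided$p.value / 2
} else {
  1 - two_sided$p.value / 2
}
c(one_sided = one_sided$p.value, from_two_sided = half_rule)
```

If the group labels in the real data sort in a different order, the direction passed to `alternative` has to be flipped accordingly.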


Question 2: (R)

The candy company that makes M&M’s claims that 10% of the M&M’s it produces are green. Suppose that the candies are packaged at random in large bags of 200 M&M’s. We randomly pick a bag of M&M’s. Assume that this represents a simple random sample of size n = 200. The bag contains 12 green M&M's.

  • Is there sufficient statistical evidence to conclude that the proportion of green M&M's produced by the company differs from 10%? State the null and alternative hypotheses. Report the sample value of the test statistic as well as the P value of the test and state your conclusion in clear terms (do not just say '(do not) reject the null'). Use α = 0.05.
  • Construct a 95% confidence interval for the proportion of green M&M's produced by the company.

 

R instructions (no Rcmdr)

  • Type the following command in the R console:
  • prop.test(count, sample size, null value) #the count, sample size and the null value of the parameter are given in the set up.
  • Press Enter
  • Copy and paste the output. You have all the information you need to answer (i) and (ii).
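
As a minimal sketch, assuming the intended command is base R's `prop.test`, the call with the numbers from the question looks like this. Note that `prop.test` reports a chi-square statistic (`X-squared`) and applies a continuity correction by default; the second call turns the correction off to show that the statistic is then the square of the usual z statistic.

```r
# Values from the question: 12 green M&M's out of n = 200, null value 0.10
res <- prop.test(x = 12, n = 200, p = 0.10,
                 alternative = "two.sided", conf.level = 0.95)
print(res)   # X-squared, df, P value, and the 95% confidence interval

# Same test without the continuity correction
res_nc <- prop.test(x = 12, n = 200, p = 0.10, correct = FALSE)

# Large-sample z statistic computed by hand, for comparison
p_hat <- 12 / 200
z <- (p_hat - 0.10) / sqrt(0.10 * 0.90 / 200)
c(z = z, z_squared = z^2, X_squared = unname(res_nc$statistic))
```

The interval that `prop.test` prints for a single proportion is a score-type interval, so it may differ slightly from a hand-computed estimate plus or minus a margin of error.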

 

Question 3:

A consumer advocate agency is concerned about reported failures of two brands of MP3 players, which we will label Brand A and Brand B.  In a random sample of 197 Brand A players, 33 units failed within one  year of purchase.  Of the 290 Brand B players, 25 units were reported to have failed within the first year following purchase.

  • Is there sufficient statistical evidence to conclude that the proportion of Brand A MP3 players that fail within one year of purchase differs from its counterpart for Brand B? State the null and alternative hypotheses. Report the sample value of the test statistic as well as the P value of the test and state your conclusion in clear terms (do not just say '(do not) reject the null'). Use α = 0.05.
  • Construct a 95% confidence interval for the difference between the Brand A and Brand B proportions of players that fail within one year of purchase.

 

R instructions (no Rcmdr)

  • Type the following commands in the R console:
  • failures = c(count of Brand A failures, count of Brand B failures) #enter the counts given to you in the set up.
  • Press Enter
  • ssize = c(Brand A sample size, Brand B sample size) #enter the sample sizes given to you in the set up.
  • Press Enter
  • prop.test(failures, ssize)
  • Press Enter
  • Copy and paste the output. You have all the information you need to answer (i) and (ii).
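
Filled in with the numbers from the question, and again assuming the intended function is base R's `prop.test`, the commands can be sketched as follows. The variable names `failures` and `ssize` follow the instructions above.

```r
# Counts and sample sizes from the question (Brand A first, Brand B second)
failures <- c(33, 25)
ssize    <- c(197, 290)

# Two-sided test of equal failure proportions, with a 95% confidence
# interval for the difference (Brand A minus Brand B)
res <- prop.test(failures, ssize,
                 alternative = "two.sided", conf.level = 0.95)
print(res)

res$estimate   # the two sample proportions
res$conf.int   # interval for p_A - p_B
```

Because Brand A is entered first, a positive interval would indicate a higher failure proportion for Brand A.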


Question 4 :

The data classifies a sample of 865 college students according to their field of study and their score on a scale called PEOPLE designed to measure altruism defined as an interest in the welfare of others. For this scale, lower scores were designed to indicate lower levels of altruism. The table below summarizes the results.

 

School                      LOW   MEDIUM   HIGH
Agriculture                   5       27     35
Consumer/Family Sciences      1       32     54
Engineering                  12      129     94
Liberal Arts                  7       77    129
Management                    3       44     28
Science                       7       29     24
Technology                    2       62     64

 

  1. Summarize the distribution of the Altruism variable per School using side-by-side bar charts. Make sure the counts are transformed into percents relative to the row total (the table is provided separately in an Excel file). Which fields of study appear to be related to higher or lower levels of altruism?
  2. Is level of altruism independent of field of study? Do a chi-square test for independence. Type out all four steps in testing: state null and the alternative hypotheses; report the test statistic, its sample value, and its DF (if applicable); report the P value; conclude (insert the R output where appropriate).
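
The instructions suggest Excel for the bar charts; as an alternative sketch in R, the row percents and a grouped bar chart can be produced as below. The object name `altruism` is hypothetical, and the counts are read from the table in the question (they sum to the stated 865 students).

```r
# Counts from the question's table; rows are schools, columns are PEOPLE levels
altruism <- matrix(
  c( 5,  27,  35,
     1,  32,  54,
    12, 129,  94,
     7,  77, 129,
     3,  44,  28,
     7,  29,  24,
     2,  62,  64),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    School   = c("Agriculture", "Consumer/Family Sciences", "Engineering",
                 "Liberal Arts", "Management", "Science", "Technology"),
    Altruism = c("LOW", "MEDIUM", "HIGH")
  )
)

# Percent of each school's students in each altruism category (rows sum to 100)
row_pct <- 100 * prop.table(altruism, margin = 1)
print(round(row_pct, 1))

# One cluster of three bars per school
barplot(t(row_pct), beside = TRUE, legend.text = TRUE, las = 2,
        cex.names = 0.7, ylab = "Percent of school")
```

Working with row percents rather than raw counts keeps large schools such as Engineering from dominating the comparison.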

 

Rcmdr Instructions

  1. Use Excel. The table is provided as an Excel file.
  2. Choose Statistics → Contingency tables → Enter two-way table
  • Enter the number of rows (the number of schools)
  • Enter the number of columns (the number of categories in the Altruism variable)
  • Enter the counts in the 7 x 3 table above
  • Click on Statistics and select Chisquare test (it’s selected by default) and Print expected frequencies
  • The output will (re)produce the table of counts, the table of expected frequencies, as well as the chisquare value, its df, and the P-value.
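
Outside Rcmdr, the same test can be sketched with base R's `chisq.test`. The object name `altruism` is hypothetical; the counts are those of the 7 x 3 table in the question.

```r
# Counts from the question's table; rows are schools, columns are PEOPLE levels
altruism <- matrix(
  c( 5,  27,  35,
     1,  32,  54,
    12, 129,  94,
     7,  77, 129,
     3,  44,  28,
     7,  29,  24,
     2,  62,  64),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    School   = c("Agriculture", "Consumer/Family Sciences", "Engineering",
                 "Liberal Arts", "Management", "Science", "Technology"),
    Altruism = c("LOW", "MEDIUM", "HIGH")
  )
)

# Chi-square test of independence; df = (7 - 1) * (3 - 1) = 12
res <- chisq.test(altruism)
print(res)                     # chi-square value, df, and P value
print(round(res$expected, 2))  # table of expected frequencies
```

R may warn that the chi-square approximation could be inaccurate, because a few expected counts in the LOW column fall below 5; that is worth mentioning when stating the conclusion.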
