FINAL ASSIGNMENT
SUBMISSION DETAILS
Submit your solutions as a Word Document (.doc or .docx) to Canvas by 11:59pm,
Thursday, May 6.
Write your answers in paragraph form (on separate pages, not on the computer output)
and attach any Excel work as an Appendix.
Remember this is a statistics class – you should identify your methods and the
information you use to answer a question.
Bring relevant information from the output into your write-up. Report your findings in
the context of the question (i.e., discuss birth weights and altitude, not X’s and Y’s), and
provide both statistical explanations (e.g., discussion of slopes) and less-statistical
explanations.
Ideally, your write-up should be complete enough and communicate
effectively so we don’t have to refer to the computer output (but please turn
in the output as well). 20% of the grade for this project concerns the way in
QUESTION #1: 80 POINTS
QUESTION #2: 120 POINTS
QUESTION #1: SIMPLE LINEAR REGRESSION
Preliminary data suggest that women living at higher altitudes tend to have smaller
babies. To investigate, a sample of n=800 mother/infant pairs from across the southwest
(living at different altitudes) was enrolled. The following focuses on the analysis of
the association between altitude, measured in thousands of meters above sea level
(variable name altKmeters, so altKmeters=1.2 corresponds to an altitude of 1,200
meters) and birth weight (variable name bweight, measured in grams).
Some descriptive data from the study (n=800):
Variable      Mean    St. Dev.   Minimum   Maximum
altKmeters    0.62    0.53       0.00      1.99
bweight       3245    414        2026      4545
#A (20 Points)
The correlation coefficient between altKmeters and bweight was r = -0.15. Give an
interpretation of this correlation coefficient, and test whether there is an association
between altitude and birth weight (report the test statistic, degrees of freedom, and
p-value for this test, and summarize the result of the test).
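As one possible way to check the hand calculation for this test, the sketch below computes the usual t statistic for H0: rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The function name correlation_t_test is hypothetical, and the p-value uses a normal approximation to the t distribution (an assumption that is reasonable only because the degrees of freedom are large here); statistical software or a t table would give the exact value.

```python
import math
from statistics import NormalDist


def correlation_t_test(r, n):
    """Return (t, df, approx_p) for testing H0: rho = 0.

    approx_p is a two-sided p-value from a normal approximation to the
    t distribution (an assumption; adequate only when df is large).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    approx_p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, approx_p


# Values taken from the question: r = -0.15, n = 800.
t_stat, df, p_approx = correlation_t_test(r=-0.15, n=800)
print(f"t = {t_stat:.3f}, df = {df}, approximate two-sided p = {p_approx:.2g}")
```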
#B (10 Points).
For the correlation reported in 1A, find the 95% confidence interval for this correlation
coefficient.
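A common approach to this interval is Fisher's z transformation: transform r with atanh, build a normal-theory interval using standard error 1 / sqrt(n - 3), and transform back with tanh. The sketch below assumes that method is the one intended for the course; the function name fisher_ci is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via Fisher's z transformation."""
    z = math.atanh(r)                 # Fisher z of the sample correlation
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower, upper = z - z_crit * se, z + z_crit * se
    return math.tanh(lower), math.tanh(upper)   # back to the r scale


# Values taken from the question: r = -0.15, n = 800.
ci_low, ci_high = fisher_ci(r=-0.15, n=800)
print(f"95% CI for the correlation: ({ci_low:.3f}, {ci_high:.3f})")
```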
#C (20 Points).
The following is the ANOVA table from a simple regression predicting birth weight from
altitude:
ANOVA Table
Source        df     Sum of Squares   Mean Square   F      p-value
Regression    ____   3,195,584        ____          ____   ____
Error         ____   132,489,915      ____
Total         ____   135,685,499
From the regression, the estimated standard error of Y|X is s(yx) = 407.
Complete the above ANOVA table.
What can you conclude from the p-value from this ANOVA table?
Find and interpret the R² for this regression model.
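The arithmetic behind a simple-regression ANOVA table can be sketched as follows, assuming one predictor (so the regression df is 1 and the error df is n - 2). The function name complete_anova and the dictionary keys are hypothetical. The p-value is left to software or an F table, since the Python standard library has no F distribution.

```python
import math


def complete_anova(ss_reg, ss_err, n, k=1):
    """Fill in df, mean squares, F, and R-squared for a regression ANOVA table.

    k is the number of predictors (1 for simple linear regression).
    """
    df_reg, df_err = k, n - k - 1
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    return {
        "df_reg": df_reg,
        "df_err": df_err,
        "df_total": n - 1,
        "ms_reg": ms_reg,
        "ms_err": ms_err,
        "F": ms_reg / ms_err,
        "r_squared": ss_reg / (ss_reg + ss_err),
        "s_yx": math.sqrt(ms_err),   # should agree with the reported s(yx)
    }


# Sums of squares and n taken from the question.
anova = complete_anova(ss_reg=3_195_584, ss_err=132_489_915, n=800)
for name, value in anova.items():
    print(f"{name}: {value:,.4f}")
```

The square root of the error mean square gives a quick consistency check against the reported s(yx) = 407.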
#D (20 Points).
The following gives the parameter estimates (i.e., slopes) and standard errors for the
regression model:
Variable      Parameter Estimate   Standard Error   t-statistic   p-value   95% CI
Intercept     3326                 21.9             ____          ____      ____
altKmeters    -120                 27.4             ____          ____      ____
Complete the above table. What are the degrees of freedom associated with the
t-statistic in this table?
Give an interpretation for the slope for altKmeters in this regression model.
Based on the 95% confidence interval for the slope for altKmeters from this table, is
there a significant association between altitude and birth weight? Explain.
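The t-statistic and confidence interval columns follow from estimate / SE and estimate ± t* × SE. The sketch below uses a critical value of 1.963, which is an assumed approximation to the 97.5th percentile of the t distribution with 798 degrees of freedom; check it against a t table or software. The function name coef_summary is hypothetical.

```python
def coef_summary(estimate, se, t_crit):
    """Return (t statistic, CI lower, CI upper) for one regression coefficient."""
    t_stat = estimate / se
    margin = t_crit * se
    return t_stat, estimate - margin, estimate + margin


T_CRIT_798 = 1.963  # assumed approximate critical value, t(0.975, df = 798)

# Estimates and standard errors taken from the table in the question.
for name, est, se in [("Intercept", 3326, 21.9), ("altKmeters", -120, 27.4)]:
    t_stat, lo, hi = coef_summary(est, se, T_CRIT_798)
    print(f"{name}: t = {t_stat:.2f}, 95% CI = ({lo:.1f}, {hi:.1f})")

slope_t, slope_lo, slope_hi = coef_summary(-120, 27.4, T_CRIT_798)
```

An interval for the slope that excludes zero corresponds to rejecting a zero slope at the 5% level.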
#E (10 Points).
Based on the above regression equation, what is the predicted mean birth weight for
infants born to mothers living at sea level? For infants born to mothers living at
1,000 meters above sea level? Give an interval estimate for the mean birth weight for
infants born to mothers living at 1,000 meters above sea level.
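One way to sketch the interval for a mean response uses the standard error s(yx) × sqrt(1/n + (x0 − x̄)² / Sxx), with Sxx = (n − 1) × SD(x)². The sketch assumes Sxx can be recovered from the rounded descriptive statistics above and again uses 1.963 as an assumed approximate t critical value for 798 degrees of freedom; the function name mean_response_ci is hypothetical.

```python
import math


def mean_response_ci(b0, b1, x0, s_yx, n, x_mean, x_sd, t_crit):
    """Predicted mean at x0 with a confidence interval for that mean."""
    pred = b0 + b1 * x0
    sxx = (n - 1) * x_sd ** 2   # sum of squared deviations of x
    se_mean = s_yx * math.sqrt(1 / n + (x0 - x_mean) ** 2 / sxx)
    return pred, se_mean, pred - t_crit * se_mean, pred + t_crit * se_mean


# Inputs taken from the question; t_crit = 1.963 is an assumed approximation.
pred, se_mean, lo, hi = mean_response_ci(
    b0=3326, b1=-120, x0=1.0, s_yx=407, n=800,
    x_mean=0.62, x_sd=0.53, t_crit=1.963,
)
print(f"Predicted mean at 1,000 m: {pred:.0f} g (SE {se_mean:.1f})")
print(f"95% CI for the mean: ({lo:.0f}, {hi:.0f}) g")
```

At sea level (x0 = 0) the predicted mean is simply the intercept.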
QUESTION #2: MULTIPLE REGRESSION
‘As people age, the hippocampus, the brain’s memory center, loses 1% to 2% of its
volume annually (on average, volumes may increase or decrease over time), affecting
memory and possibly increasing the risk for dementia. A growing body of evidence has
pointed to aerobic exercise as a low-cost hedge against neurocognitive decline.’
In a study, 210 healthy elderly adults (ages 55 to 85) were recruited and randomized to
one of three groups (n=70 per group).
• The Control group agreed to be evaluated but were not assigned to any
intervention program.
• The Walking group (aerobic exercise) walked three days a week for 40 minutes.
• The Yoga group (yoga and toning exercises, which are non-aerobic exercise)
participated in group yoga sessions 3 days a week.
Magnetic resonance imaging (MRI) was used to measure the volume of the hippocampus
at study baseline and then again after 1 year. The dependent variable for this study is the
percent change in hippocampus volume, where positive change values indicate an
increase in hippocampus volume (e.g., 1.3 indicates a 1.3% increase in volume) and
negative change values indicate a reduction in hippocampus volume (e.g., -1.7 indicates a
1.7% decrease in volume). Our primary study question is whether those who exercised
had less decline in hippocampus volume than those in the control group.
Data for this study are saved in the attached ‘Elders.xlsx’ file.
Variables in the data set are:
1. subjid, an id number ranging from 1 to 210;
2. age, in years, restricted to adults between the ages of 55 and 85;
3. sexf, coded 1 for females and 0 for males;
4. IQ, measured at the start of the study as a general measure of cognitive ability
(the mean IQ is expected to be around 100);
5. exercise, coded 1 for those in the Control group, 2 for those in the Walking group,
and 3 for those in the Yoga group;
6. hippochange, the percent change in the hippocampus volume, which should
range roughly between -4 percent (indicating a 4% decrease in volume) and 4
percent (indicating a 4% increase in volume).
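Because exercise is a three-level categorical code rather than a quantity, a regression would normally use indicator (dummy) variables instead of the raw 1/2/3 values. A minimal sketch, assuming the Control group is the reference category; the function name exercise_dummies and the indicator names walking and yoga are hypothetical:

```python
def exercise_dummies(code):
    """Map the exercise code (1=Control, 2=Walking, 3=Yoga) to two indicators.

    Control is treated as the reference group, so it gets 0 on both.
    """
    if code not in (1, 2, 3):
        raise ValueError(f"unexpected exercise code: {code!r}")
    return {"walking": int(code == 2), "yoga": int(code == 3)}


for code in (1, 2, 3):
    print(code, exercise_dummies(code))
```

With this coding, each indicator's slope compares that group with the Control group, holding the other predictors fixed.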
#A (10 Points).
As a description of the study sample, complete the following tables:
Description of the study sample
Variable   Mean   Standard Deviation   Minimum   Maximum
Age        ____   ____                 ____      ____
IQ         ____   ____                 ____      ____

Description of the study sample
Variable    n      %
Sex
  Male      ____   ____
  Female    ____   ____
#B (10 Points).
Create a scatter plot showing the association between age (the independent variable)
and hippochange (the dependent variable).
#C (10 Points).
Find and interpret the correlation coefficient describing the association between age and
change in hippocampus volume.
For #D and #E, run a multiple regression predicting hippochange from age, sex,
and IQ (do not include exercise in this analysis). Use this regression model to answer
these questions.
#D (10 Points).
Complete the following table summarizing the results of this multiple regression:
Multiple regression predicting percent change in hippocampus volume
Variable      Slope   SE of Slope   p-value
Intercept     ____    ____          ____
Age           ____    ____          ____
Sex Female    ____    ____          ____
IQ            ____    ____          ____
Report and interpret the R² for this regression model. What can you conclude from the
p-value from the ANOVA table for this regression?
#E (10 Points).
What can you say about the associations between age, sex, and IQ and change in
hippocampus volume, based on this regression?
For Questions #F and #G, run a multiple regression predicting change in
hippocampus volume from age, sex, IQ, and exercise group. Use this regression analysis
to answer these questions.
#F (30 Points).
Provide a table, similar to that in Question #D, reporting slopes, standard errors, and
p-values from this regression. Report and interpret the R² from this regression.
#G (30 Points).
Our primary interest in this analysis is in whether or not either Walking exercise or Yoga
exercise has a positive benefit on change in hippocampus volume, compared to the No
Exercise group. Interpret the results of this multiple regression analysis, with a focus on
this question.
#H (10 Points).
Conduct a partial F test comparing the regression model used in #F to the regression
model used in #D (report the F statistic, degrees of freedom, and p-value from this test).
What is the null hypothesis being tested by this partial F test (give the hypothesis in the
context of the question, in terms of age, sex, IQ, and exercise)?
What do you conclude from this partial F test?
Report and interpret the partial R² associated with the comparison of the
regression models in #F and #D.
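The partial F statistic and partial R² can be computed from the error sums of squares of the two models. The sketch below assumes the full model has 5 predictors (age, sex, IQ, and two exercise indicators) and the reduced model has 3. The SSE values in the usage lines are made-up placeholders, not results from Elders.xlsx, and the function name partial_f is hypothetical. The p-value would come from an F table or software.

```python
def partial_f(sse_reduced, sse_full, n, k_reduced, k_full):
    """Partial F test comparing nested regression models.

    k_reduced and k_full are the numbers of predictors in each model.
    Returns (F, numerator df, denominator df, partial R-squared).
    """
    df_num = k_full - k_reduced   # number of predictors being tested
    df_den = n - k_full - 1       # error df of the full model
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    partial_r2 = (sse_reduced - sse_full) / sse_reduced
    return f_stat, df_num, df_den, partial_r2


# HYPOTHETICAL error sums of squares, for illustration only.
f_stat, df_num, df_den, partial_r2 = partial_f(
    sse_reduced=500.0, sse_full=450.0, n=210, k_reduced=3, k_full=5
)
print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) df, partial R2 = {partial_r2:.3f}")
```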