FINAL ASSIGNMENT

SUBMISSION DETAILS

Submit your solutions as a Word document (.doc or .docx) to Canvas by 11:59 pm, Thursday, May 6.

Plan your analyses and write up your results. Please write your responses in paragraph form (on separate pages, not on the computer output) and attach any Excel work as an appendix.

Remember that this is a statistics class: you should identify your methods and the information you use to answer each question.

Bring relevant information from the output into your write-up. Report your findings in the context of the question (i.e., discuss birth weights and altitude, not X’s and Y’s), and provide both statistical explanations (e.g., discussion of slopes) and less-statistical explanations (e.g., associations between birth weight and altitude) in your answers.

Ideally, your write-up should be complete and clear enough that we do not have to refer to the computer output (but please turn in the output as well). Twenty percent of the grade for this project concerns how the answers are communicated.

QUESTION #1: 80 POINTS

QUESTION #2: 120 POINTS

QUESTION #1: SIMPLE LINEAR REGRESSION

Preliminary data suggest that women living at higher altitudes tend to have smaller babies. To investigate, a sample of n = 800 mother/infant pairs from across the Southwest (living at different altitudes) was enrolled. The following questions focus on the association between altitude, measured in thousands of meters above sea level (variable name altKmeters, so altKmeters = 1.2 corresponds to an altitude of 1,200 meters), and birth weight (variable name bweight, measured in grams).

Some descriptive data from the study (n = 800):

Variable      Mean    St. Dev.   Minimum   Maximum
altKmeters    0.62    0.53       0.00      1.99
bweight       3245    414        2026      4545

#A (20 Points).

The correlation coefficient between altKmeters and bweight was r = -0.15. Give an interpretation of this correlation coefficient, and test whether there is an association between altitude and birth weight (report the test statistic, degrees of freedom, and p-value for this test, and summarize the result of this test).
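One way to check the hand calculation for this test is sketched below (an optional cross-check, not required output). It applies the standard t test of H0: ρ = 0, t = r√(n - 2) / √(1 - r²) on n - 2 degrees of freedom, to the r and n given above, using only the Python standard library. The two-sided p-value comes from the regularized incomplete beta function, evaluated with the usual continued-fraction (Lentz) method; the function names are illustrative only. In Excel, T.DIST.2T(ABS(t), n - 2) should give the same p-value.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v: float) -> float:
        # Avoid division by zero in Lentz's recurrences.
        return v if abs(v) > tiny else tiny

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges quickly below this point; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """Two-sided p-value P(|T| >= |t|) for a t distribution with df degrees of freedom."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))


def correlation_t_test(r: float, n: int) -> tuple[float, int, float]:
    """Test H0: rho = 0 with t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 df."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df, t_two_sided_p(t, df)


# Values stated in Question 1A.
t_stat, df, p_value = correlation_t_test(r=-0.15, n=800)
print(f"t = {t_stat:.2f}, df = {df}, two-sided p = {p_value:.2g}")
```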

#B (10 Points).

For the correlation reported in 1A, find the 95% confidence interval for this correlation coefficient.
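A minimal sketch of the Fisher z approach, the usual way to build this interval: transform r with z = atanh(r) = ½ ln[(1 + r)/(1 - r)], form z ± 1.96/√(n - 3), and back-transform both endpoints with tanh. The helper name fisher_z_ci is hypothetical, and the interval is approximate because it relies on the normal approximation for z.

```python
import math
from statistics import NormalDist


def fisher_z_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a correlation via Fisher's z transformation."""
    z = math.atanh(r)                 # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    lo_z, hi_z = z - z_crit * se, z + z_crit * se
    # Back-transform the endpoints to the correlation scale.
    return math.tanh(lo_z), math.tanh(hi_z)


lo, hi = fisher_z_ci(r=-0.15, n=800)
print(f"95% CI for rho: ({lo:.3f}, {hi:.3f})")
```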

#C (20 Points).

The following is the ANOVA table from a simple regression predicting birth weight from altitude:

ANOVA Table

Source        df    Sum of Squares    Mean Square    F    p-value
Regression          3,195,584
Error               132,489,915
Total               135,685,499

From the regression, the estimated standard error of Y|X is s(yx) = 407.

Complete the above ANOVA table.

What can you conclude from the p-value in this ANOVA table?

Find and interpret the R² for this regression model.
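As an optional check on the hand calculation, the sketch below fills in the ANOVA table using the standard identities for a simple linear regression: df(Regression) = 1, df(Error) = n - 2, df(Total) = n - 1, MS = SS/df, F = MS(Regression)/MS(Error), and R² = SS(Regression)/SS(Total). It does not compute the p-value; in Excel, F.DIST.RT(F, 1, n - 2) returns the upper-tail probability for the F statistic.

```python
import math

n = 800                 # sample size (Question 1)
k = 1                   # number of predictors in a simple linear regression
ss_reg = 3_195_584
ss_err = 132_489_915
ss_tot = 135_685_499

df_reg, df_err, df_tot = k, n - k - 1, n - 1
ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err
f_stat = ms_reg / ms_err
r_squared = ss_reg / ss_tot

# Sanity checks: the sums of squares add up, and sqrt(MSE) should be close to s(yx) = 407.
assert ss_reg + ss_err == ss_tot
print(f"df: {df_reg}, {df_err}, {df_tot}")
print(f"MS: {ms_reg:,.0f}, {ms_err:,.0f}; F = {f_stat:.2f}; R^2 = {r_squared:.4f}")
print(f"sqrt(MSE) = {math.sqrt(ms_err):.1f}")
```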

#D (20 Points).

The following gives the parameter estimates (i.e., the intercept and slope) and standard errors for the regression model:

Variable      Parameter Estimate    Standard Error    t-statistic    p-value    95% CI
Intercept     3326                  21.9
altKmeters    -120                  27.4

Complete the above table. What are the degrees of freedom associated with the t-statistics in this table?

Give an interpretation for the slope for altKmeters in this regression model.

Based on the 95% confidence interval for the slope for altKmeters from this table, is there a significant association between altitude and birth weight? Explain.
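A sketch of how the remaining columns can be filled in: t = estimate / SE, and the 95% CI is estimate ± t* × SE, where t* is the 97.5th percentile of a t distribution on n - 2 = 798 degrees of freedom. To stay within the standard library, the sketch approximates t* with a Cornish-Fisher expansion around the normal quantile (an assumption of this sketch, very accurate at this many degrees of freedom); Excel's T.INV.2T(0.05, 798) gives the exact value. Names are illustrative.

```python
from statistics import NormalDist


def t_crit_approx(df: int, level: float = 0.95) -> float:
    """Approximate two-sided t critical value via a Cornish-Fisher expansion (large df)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)


n = 800
df = n - 2                      # residual df for a simple linear regression
tc = t_crit_approx(df)

# (estimate, standard error) from the table above
rows = {"Intercept": (3326.0, 21.9), "altKmeters": (-120.0, 27.4)}
results = {}
for name, (est, se) in rows.items():
    t_stat = est / se
    ci = (est - tc * se, est + tc * se)
    results[name] = (t_stat, ci)
    print(f"{name:<11} t = {t_stat:8.2f}   95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```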

#E (10 Points).

Based on the above regression equation, what is the predicted mean birth weight for infants born to mothers living at sea level? For infants born to mothers living at 1,000 meters above sea level? Give an interval estimate for the mean birth weight for infants born to mothers living at 1,000 meters above sea level.
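A sketch of the prediction and the confidence interval for the mean response at altitude x0 (in thousands of meters): ŷ = b0 + b1·x0, with standard error s(yx)·√(1/n + (x0 - x̄)²/Sxx), where Sxx = (n - 1)·s_x². Because it plugs in the rounded summary values from the descriptive table (x̄ = 0.62, s_x = 0.53) rather than the raw data, the result will differ slightly from software output. The t quantile is approximated with a Cornish-Fisher expansion (an assumption of this sketch; Excel's T.INV.2T(0.05, 798) gives it exactly). Function names are hypothetical.

```python
import math
from statistics import NormalDist

# Rounded values reported in Question 1 (descriptive table and regression output).
B0, B1 = 3326.0, -120.0          # intercept (g) and slope (g per 1,000 m)
S_YX = 407.0                     # standard error of Y|X
N, X_BAR, S_X = 800, 0.62, 0.53  # sample size, mean and SD of altKmeters


def t_crit_approx(df: int, level: float = 0.95) -> float:
    """Two-sided t critical value via a Cornish-Fisher expansion (assumes large df)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)


def mean_response_ci(x0: float, level: float = 0.95):
    """Predicted mean birth weight at altKmeters = x0 and a CI for that mean."""
    y_hat = B0 + B1 * x0
    s_xx = (N - 1) * S_X**2                      # sum of squared deviations of altKmeters
    se_mean = S_YX * math.sqrt(1.0 / N + (x0 - X_BAR) ** 2 / s_xx)
    tc = t_crit_approx(N - 2, level)
    return y_hat, se_mean, (y_hat - tc * se_mean, y_hat + tc * se_mean)


for x0 in (0.0, 1.0):  # sea level and 1,000 meters above sea level
    y_hat, se_mean, (lo, hi) = mean_response_ci(x0)
    print(f"altKmeters = {x0}: predicted mean = {y_hat:.0f} g, "
          f"SE = {se_mean:.1f}, 95% CI = ({lo:.0f}, {hi:.0f})")
```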

QUESTION #2: MULTIPLE REGRESSION

‘As people age, the hippocampus, the brain’s memory center, loses 1% to 2% of its volume annually (on average, volumes may increase or decrease over time), affecting memory and possibly increasing the risk for dementia. A growing body of evidence has pointed to aerobic exercise as a low-cost hedge against neurocognitive decline.’

In a study, 210 healthy elderly adults (ages 55 to 85) were recruited and randomized to one of three groups (n = 70 per group):

• The Control group agreed to be evaluated but was not assigned to any intervention program.

• The Walking group (aerobic exercise) walked three days a week for 40 minutes.

• The Yoga group (yoga and toning exercises, which are non-aerobic) participated in group yoga sessions three days a week.

Magnetic resonance imaging (MRI) was used to measure the volume of the hippocampus at study baseline and again after 1 year. The dependent variable for this study is the percent change in hippocampus volume, where positive values indicate an increase in hippocampus volume (e.g., 1.3 indicates a 1.3% increase in volume) and negative values indicate a reduction in hippocampus volume (e.g., -1.7 indicates a 1.7% decrease in volume). Our primary study question is whether those who exercised had less decline in hippocampus volume than those in the Control group.

Data for this study are saved in the attached ‘Elders.xlsx’ file.

Variables in the data set are:

1. subjid, an ID number ranging from 1 to 210;

2. age, in years, restricted to adults between the ages of 55 and 85;

3. sexf, coded 1 for females and 0 for males;

4. IQ, measured at the start of the study as a general measure of cognitive ability (the mean IQ is expected to be around 100);

5. exercise, coded 1 for those in the Control group, 2 for those in the Walking group, and 3 for those in the Yoga group;

6. hippochange, the percent change in hippocampus volume, which should range roughly from -4 percent (indicating a 4% decrease in volume) to 4 percent (indicating a 4% increase in volume).

#A (10 Points).

As a description of the study sample, complete the following tables:

Description of the study sample

Variable    Mean    Standard Deviation    Minimum    Maximum
Age
IQ

Description of the study sample

Variable    n    %
Sex
  Male
  Female
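The table entries can be produced in Excel with AVERAGE, STDEV.S, MIN, MAX, and COUNTIF. As a hedged alternative, the sketch below computes the same quantities in Python, assuming the worksheet has been saved as a CSV file (the file name elders.csv and the helper names are hypothetical) that keeps the column names listed above. It uses the sample standard deviation (n - 1 denominator), which matches STDEV.S.

```python
import csv
import statistics


def load_elders_csv(path: str) -> list[dict[str, float]]:
    """Read a CSV export of Elders.xlsx (assumed to keep the column names listed above)."""
    with open(path, newline="") as f:
        return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]


def describe(values: list[float]) -> dict[str, float]:
    """Mean, sample standard deviation (n - 1 denominator), minimum, and maximum."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def sex_table(sexf: list[float]) -> dict[str, tuple[int, float]]:
    """Counts and percentages for sexf, coded 1 = female and 0 = male."""
    n = len(sexf)
    counts = {"Male": sum(1 for v in sexf if v == 0),
              "Female": sum(1 for v in sexf if v == 1)}
    return {label: (c, 100.0 * c / n) for label, c in counts.items()}


# Hypothetical usage (file name assumed):
# rows = load_elders_csv("elders.csv")
# print(describe([r["age"] for r in rows]), describe([r["IQ"] for r in rows]))
# print(sex_table([r["sexf"] for r in rows]))
```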

#B (10 Points).

Create a scatter plot showing the association between age (the independent variable) and hippochange (the dependent variable).

#C (10 Points).

Find and interpret the correlation coefficient describing the association between age and change in hippocampus volume.
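For reference, the Pearson correlation is r = Sxy / √(Sxx·Syy), where Sxy is the sum of cross-products of deviations from the means and Sxx, Syy are the sums of squared deviations. A minimal standard-library sketch follows (the function name is illustrative); Excel's CORREL(array1, array2) computes the same quantity.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                     # squared deviations of x
    syy = sum((b - my) ** 2 for b in y)                     # squared deviations of y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical usage: pearson_r(age, hippochange), with both lists taken from the Elders data.
```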

For #D and #E, run a multiple regression predicting hippochange from age, sex, and IQ (do not include exercise in this analysis). Use this regression model to answer these questions.

#D (10 Points).

Complete the following table summarizing the results of this multiple regression:

Multiple regression predicting percent change in hippocampus volume

Variable      Slope    SE of Slope    p-value
Intercept
Age
Sex Female
IQ

Report and interpret the R² for this regression model. What can you conclude from the p-value in the ANOVA table for this regression?
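Excel's Analysis ToolPak Regression tool reports the slopes, standard errors, p-values, R², and ANOVA table directly. As a cross-check, the sketch below fits the same kind of model by ordinary least squares with NumPy (an assumed dependency), returning coefficients, standard errors from the diagonal of MSE·(XᵀX)⁻¹, t statistics, R² = 1 - SSE/SST, and the overall F statistic. It stops at t statistics; the p-values follow from a t distribution on n - p residual degrees of freedom (for example, T.DIST.2T(ABS(t), df) in Excel). The function name and dictionary layout are hypothetical.

```python
import numpy as np


def fit_ols(y, predictors):
    """Ordinary least squares with an intercept.

    predictors: dict mapping a column name to a sequence of values (e.g. age, sexf, IQ).
    """
    y = np.asarray(y, dtype=float)
    names = ["Intercept"] + list(predictors)
    X = np.column_stack([np.ones_like(y)]
                        + [np.asarray(v, dtype=float) for v in predictors.values()])
    n, p = X.shape                          # p counts the intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    df_model, df_resid = p - 1, n - p
    mse = sse / df_resid
    cov = mse * np.linalg.inv(X.T @ X)      # estimated covariance matrix of the coefficients
    se = np.sqrt(np.diag(cov))
    return {
        "names": names, "coef": coef, "se": se, "t": coef / se,
        "r2": 1.0 - sse / sst,
        "f": ((sst - sse) / df_model) / mse,
        "df_model": df_model, "df_resid": df_resid, "sse": sse,
    }


# Hypothetical usage: fit_ols(hippochange, {"age": age, "sexf": sexf, "IQ": IQ})
```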

#E (10 Points).

What can you say about the associations of age, sex, and IQ with change in hippocampus volume, based on this regression?

For Questions #F and #G, run a multiple regression predicting change in hippocampus volume from age, sex, IQ, and exercise group. Use this regression analysis to answer these questions.

#F (30 Points).

Provide a table, similar to that in Question #D, reporting slopes, standard errors, and p-values from this regression. Report and interpret the R² from this regression.
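Because exercise is coded 1, 2, 3, entering it as a single numeric predictor would force the Walking and Yoga effects to be equally spaced steps. A common approach, assumed in this sketch, is two 0/1 indicator variables with the Control group (code 1) as the reference, so that each slope estimates that group's adjusted difference from Control. The function and variable names (walking, yoga) are hypothetical.

```python
def exercise_dummies(exercise: list[int]) -> tuple[list[int], list[int]]:
    """0/1 indicators for Walking (code 2) and Yoga (code 3); Control (code 1) is the reference."""
    unexpected = set(exercise) - {1, 2, 3}
    if unexpected:
        raise ValueError(f"unexpected exercise codes: {sorted(unexpected)}")
    walking = [1 if g == 2 else 0 for g in exercise]
    yoga = [1 if g == 3 else 0 for g in exercise]
    return walking, yoga


# Hypothetical usage: walking, yoga = exercise_dummies(exercise), then include both
# columns alongside age, sexf, and IQ in the regression.
```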

#G (30 Points).

Our primary interest in this analysis is whether either Walking exercise or Yoga exercise has a positive effect on change in hippocampus volume, compared with the Control (no exercise) group. Interpret the results of this multiple regression analysis, with a focus on this question.

#H (10 Points).

Conduct a partial F test comparing the regression model used in #F to the regression model used in #D (report the F statistic, degrees of freedom, and p-value from this test).

What is the null hypothesis being tested by this partial F test (state the hypothesis in the context of the question, in terms of age, sex, IQ, and exercise)?

What do you conclude from this partial F test?

Report and interpret the partial R² associated with the comparison of the regression models in #F and #D.
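The partial F statistic compares the error sums of squares of the two models: F = [(SSE_reduced - SSE_full)/q] / [SSE_full/df_full], where q is the number of added terms and df_full is the full model's error degrees of freedom; the partial R² is (SSE_reduced - SSE_full)/SSE_reduced. Assuming exercise enters as two indicators (Walking and Yoga vs. Control), q = 2 and df_full = 210 - 6 = 204. The sketch below computes the statistic, its upper-tail p-value from the F(q, df_full) distribution via the regularized incomplete beta function, and the partial R². Function names are illustrative, and Excel's F.DIST.RT(F, q, df_full) should give the same p-value.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v: float) -> float:
        # Avoid division by zero in Lentz's recurrences.
        return v if abs(v) > tiny else tiny

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))            # even step
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd step
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def partial_f_test(sse_reduced: float, sse_full: float, df_full: int, q: int):
    """Partial F statistic, upper-tail p-value, and partial R^2 for adding q terms."""
    f_stat = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
    # P(F > f) for F(q, df_full) equals I_x(df_full/2, q/2) with x = df_full / (df_full + q*f).
    p_value = betainc_reg(df_full / 2.0, q / 2.0, df_full / (df_full + q * f_stat))
    partial_r2 = (sse_reduced - sse_full) / sse_reduced
    return f_stat, p_value, partial_r2


# Hypothetical usage with the two fitted models' SSE values:
# partial_f_test(sse_reduced=SSE_from_D, sse_full=SSE_from_F, df_full=204, q=2)
```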