7 Aug 2002 DRAFT. Some issues that may need to be resolved are highlighted; please send comments and suggestions to Kevin Sullivan (cdckms@sph.emory.edu). Copyright 2001 Data Description Inc.

 

LESSON 13

Options for controlling for confounders

 

Design options

Randomization: applicable to randomized controlled trials (RCTs) only; groups are similar on both measured and unmeasured factors.

Restriction: easy and inexpensive, but limits generalizability.

Matching: used most frequently with case-control studies; can gain precision; involves deciding the number of controls per case and requires matched analyses.
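For 1:1 matched case-control data, a matched analysis uses only the discordant pairs. The minimal Python sketch below computes the matched-pair odds ratio (pairs with an exposed case and unexposed control divided by pairs with an unexposed case and exposed control) and the uncorrected McNemar chi-square. The function name `matched_pair_or` and the pair counts in the usage example are hypothetical, for illustration only.

```python
def matched_pair_or(pairs):
    """Matched-pair odds ratio and McNemar chi-square for 1:1 matched case-control data.

    pairs: iterable of (case_exposed, control_exposed) booleans, one tuple per matched pair.
    Concordant pairs (both exposed or both unexposed) carry no information and are ignored.
    """
    pairs = list(pairs)
    f10 = sum(1 for case, control in pairs if case and not control)  # case exposed, control not
    f01 = sum(1 for case, control in pairs if control and not case)  # control exposed, case not
    if f01 == 0:
        raise ValueError("no pairs with an unexposed case and an exposed control; OR is undefined")
    odds_ratio = f10 / f01
    mcnemar_chi2 = (f10 - f01) ** 2 / (f10 + f01)  # 1 df, no continuity correction
    return odds_ratio, mcnemar_chi2


# Hypothetical pairs: 30 both exposed, 20 case-only exposed, 10 control-only exposed, 40 neither.
pairs = [(True, True)] * 30 + [(True, False)] * 20 + [(False, True)] * 10 + [(False, False)] * 40
or_mp, chi2 = matched_pair_or(pairs)
print(f"OR = {or_mp:.1f}, McNemar chi-square = {chi2:.2f}")  # OR = 2.0, McNemar chi-square = 3.33
```

Note that the 70 concordant pairs do not affect either statistic; only the 20 and 10 discordant pairs do.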

 

Analysis options

Stratified analysis

Mathematical modeling

 

 


Mathematical Modeling

 

Introduction to Mathematical Modeling

 

A mathematical model is a mathematical expression that describes how an outcome variable can be predicted from explanatory variables.

 

 

 

Linear regression is usually used with a continuous outcome variable (e.g., blood pressure, antibody level, weight); the predictor variables can be categorical or continuous.
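As a minimal sketch of fitting a linear regression by ordinary least squares, assuming NumPy is available: the outcome below is a hypothetical systolic blood pressure built exactly from known coefficients, with one continuous predictor (age) and one categorical predictor coded 0/1 (smoker), so the fit should recover those coefficients. All variable names and values are illustrative assumptions, not data from this lesson.

```python
import numpy as np

# Hypothetical predictors: one continuous (age, years) and one categorical coded 0/1 (smoker).
age = np.array([30.0, 40.0, 50.0, 60.0, 35.0, 45.0, 55.0, 65.0])
smoker = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Outcome built exactly from BP = 100 + 0.5*age + 8*smoker (no noise), so OLS should recover it.
bp = 100.0 + 0.5 * age + 8.0 * smoker

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones_like(age), age, smoker])
coef, *_ = np.linalg.lstsq(X, bp, rcond=None)

intercept, b_age, b_smoker = coef
print(f"intercept={intercept:.2f}, age={b_age:.2f}, smoker={b_smoker:.2f}")
# intercept=100.00, age=0.50, smoker=8.00
```

The 0/1 coding lets the categorical predictor enter the same linear expression as the continuous one; its coefficient is the difference in mean outcome between the two categories at a fixed age.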

 

The Logistic Model

 

In epidemiology the outcome variable is often dichotomous (e.g., disease vs. no disease). For a dichotomous outcome, the most widely used mathematical model is a non-linear model called the logistic model.
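For reference, the standard form of the logistic model for the probability of disease given explanatory variables X_1, ..., X_k is written below; the symbols (D, P, X_i, beta_i) are notation introduced here, not taken from the lesson. The log odds (logit) is linear in the coefficients, and for a dichotomous exposure coded 1/0 the adjusted odds ratio is e raised to that exposure's coefficient, which is how odds ratios are obtained from logistic regression output.

```latex
P = P(D = 1 \mid X_1, \dots, X_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}

\operatorname{logit} P = \ln\!\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k

\mathrm{OR}_{X_1} = \frac{\text{odds}(D = 1 \mid X_1 = 1)}{\text{odds}(D = 1 \mid X_1 = 0)} = e^{\beta_1}
\quad \text{(all other } X_i \text{ held fixed)}
```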

 

 

 


Table 14-2. Example Data 1: Hypothetical cohort study of the relationship between smoking and coronary heart disease (CHD) stratified on sex

 

Females

              Smoker   Non-Smoker   Total
CHD                5            8      13
No CHD            45          142     187
Total             50          150     200
Risk           10.0%         5.3%

Odds ratio for females (ORf) = 2.0 (0.6, 6.3)

Males

              Smoker   Non-Smoker   Total
CHD              300           50     350
No CHD           300          150     450
Total            600          200     800
Risk           50.0%        25.0%

Odds ratio for males (ORm) = 3.0 (2.1, 4.3)

----

Summary information

Directly adjusted OR = 2.9 (2.1, 4.1)

Mantel-Haenszel OR = 2.9 (2.1, 4.1)

 

 

Mantel-Haenszel chi-square p-value < .001
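The stratified analysis above can be sketched in a few lines of Python using the Table 14-2 counts. The lesson does not say which confidence interval method was used; the sketch assumes the Woolf (logit) interval, which reproduces the printed stratum-specific limits to one decimal. The Mantel-Haenszel summary OR is computed as the sum of ad/n over strata divided by the sum of bc/n. The function names are hypothetical.

```python
import math

# Table 14-2 strata as (a, b, c, d):
# a = smokers with CHD, b = non-smokers with CHD, c = smokers without CHD, d = non-smokers without CHD
strata = {
    "females": (5, 8, 45, 142),
    "males": (300, 50, 300, 150),
}


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a Woolf (logit-based) 95% confidence interval."""
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary OR = sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


for name, cells in strata.items():
    ratio, lower, upper = odds_ratio_woolf(*cells)
    print(f"OR {name}: {ratio:.1f} ({lower:.1f}, {upper:.1f})")

or_mh = mantel_haenszel_or(strata.values())
print(f"Mantel-Haenszel OR: {or_mh:.1f}")
```

Run on these counts, it prints 2.0 (0.6, 6.3) for females, 3.0 (2.1, 4.3) for males, and a Mantel-Haenszel OR of 2.9, matching the table. Each stratum's weight in the MH estimate is bc/n, so the large male stratum dominates the summary.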

 

 

 


Table 14-9. Example data 1: Hypothetical cohort study of the relationship between smoking and coronary heart disease (CHD) controlling for the sex of the individual, logistic regression model

 

 

There were 363 observations with CHD = 1 (the model gives the log odds of this outcome) and 637 with CHD = 0.

 

Log likelihood = -576.0730

Likelihood ratio = 158.1036 2 df (P = .0000)

 

Dependent Variable = CHD

 

             Coefficient   Standard error    Coef/SE   P value
CONSTANT         -3.0336            .2997   -10.1211     .0000
SMOKE             1.0618            .1733     6.1277     .0000
SEX               1.9643            .3045     6.4505     .0000

95% confidence limits

                    Coefficient                       Odds ratio
             Lower   Estimate    Upper       Lower   Estimate     Upper
SMOKE        .7222     1.0618   1.4015      2.0590     2.8916    4.0611
SEX         1.3675     1.9643   2.5612      3.9254     7.1302   12.9513
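To show where the Table 14-9 coefficients come from, the sketch below fits the same logistic model by maximum likelihood (Newton-Raphson) to the Table 14-2 counts grouped into four covariate patterns, assuming NumPy is available. The coding SMOKE = 1 for smokers and SEX = 1 for males is an assumption, since the output does not state it; it is the coding that reproduces the printed coefficients. Fitting grouped binomial counts gives the same estimates as fitting the 1,000 individual records. The function name `fit_logistic` is hypothetical.

```python
import numpy as np

# Covariate patterns from Table 14-2; columns: CONSTANT, SMOKE (1 = smoker), SEX (1 = male, assumed).
X = np.array([
    [1.0, 1.0, 0.0],  # female smokers
    [1.0, 0.0, 0.0],  # female non-smokers
    [1.0, 1.0, 1.0],  # male smokers
    [1.0, 0.0, 1.0],  # male non-smokers
])
n = np.array([50.0, 150.0, 600.0, 200.0])  # individuals per pattern
y = np.array([5.0, 8.0, 300.0, 50.0])      # CHD cases per pattern


def fit_logistic(X, y, n, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression for grouped binomial data via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))               # fitted risk in each pattern
        score = X.T @ (y - n * p)                          # gradient of the log likelihood
        info = X.T @ (X * (n * p * (1.0 - p))[:, None])    # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Re-evaluate at the final estimates for standard errors and the log likelihood.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (n * p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    loglik = np.sum(y * np.log(p) + (n - y) * np.log(1.0 - p))
    return beta, se, loglik


beta, se, loglik = fit_logistic(X, y, n)

for name, b, s in zip(["CONSTANT", "SMOKE", "SEX"], beta, se):
    print(f"{name:8s} coef={b:8.4f} se={s:.4f}")
# Odds ratio = exp(coef); 95% limits = exp(coef +/- 1.96 * SE).
for name, b, s in zip(["SMOKE", "SEX"], beta[1:], se[1:]):
    print(f"{name:8s} OR={np.exp(b):.4f} ({np.exp(b - 1.96 * s):.4f}, {np.exp(b + 1.96 * s):.4f})")
print(f"log likelihood = {loglik:.4f}")
```

The estimates and standard errors should agree with the printed CONSTANT, SMOKE and SEX rows to within rounding. The returned log likelihood (about -576.07), together with the intercept-only log likelihood (363 ln 0.363 + 637 ln 0.637, about -655.12), gives a likelihood ratio of about 158.10, consistent with the printed 158.1036. The adjusted smoking OR (about 2.89) is close to the Mantel-Haenszel OR of 2.9 from the stratified analysis.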

 


Table 14-14. Advantages and disadvantages of stratification and logistic regression.

Parameters estimated
  Stratification: RR, RD, OR, IDR, IDD, others
  Logistic regression: OR

Validity of parameters estimated
  Stratification: more valid (no model assumptions)
  Logistic regression: less valid (based on model assumptions)

Exposure and third variables
  Stratification: must be categorical
  Logistic regression: can be categorical or continuous

Number of third variables or categories in third variables
  Stratification: limited compared with logistic regression
  Logistic regression: compared with stratified analysis, can usually include many more variables, or variables with many categories