Research Projects and Interests

Model Selection, Machine Learning, Data Mining

One of my recent research interests is model selection in complex regression problems. Much of the recent work in this area focuses on parametric models, e.g., least squares and likelihood, while many biometric and econometric models are semiparametric. Many new problems are still emerging in this area.

Almost all of my research on this problem thus far has focused on extending existing regularization methods to semiparametric models, in particular the accelerated failure time model. I have investigated both rank-based estimation and Buckley-James-type estimation with a variety of penalties. In addition, I proposed rank-based survival ensembles to complement the mboost package in R. The references below can be downloaded as PDFs from my publications page, and related software is available on the software page.
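As a rough illustration of rank-based estimation with a penalty in the accelerated failure time model, the sketch below evaluates a Gehan-type loss plus an L1 (lasso) penalty. It is a minimal sketch under stated assumptions, not the implementation used in the papers listed here: the function names gehan_loss and l1_penalized_gehan, the 1/n^2 scaling, and the toy data are all hypothetical, and no optimizer is included.

```python
import numpy as np


def gehan_loss(beta, X, log_time, delta):
    """Gehan-type rank loss: n^-2 * sum_{i,j} delta_i * (e_i - e_j)^-.

    e_i = log(T_i) - x_i' beta is the AFT residual, delta_i = 1 marks an
    observed (uncensored) event, and a^- = max(-a, 0).
    """
    beta = np.asarray(beta, dtype=float)
    resid = log_time - X @ beta               # e_i for each subject
    diff = resid[:, None] - resid[None, :]    # diff[i, j] = e_i - e_j
    neg_part = np.maximum(-diff, 0.0)         # (e_i - e_j)^-
    n = len(log_time)
    return float((delta[:, None] * neg_part).sum()) / n**2


def l1_penalized_gehan(beta, X, log_time, delta, lam):
    """Gehan loss plus a lasso penalty lam * ||beta||_1 (hypothetical helper)."""
    penalty = lam * float(np.abs(np.asarray(beta, dtype=float)).sum())
    return gehan_loss(beta, X, log_time, delta) + penalty


# Toy data (assumed for illustration): log survival time equals the covariate.
X = np.array([[0.0], [1.0], [2.0]])
log_time = np.array([0.0, 1.0, 2.0])
delta = np.array([1.0, 1.0, 1.0])             # no censoring in this toy example

loss_at_truth = gehan_loss([1.0], X, log_time, delta)
loss_at_zero = gehan_loss([0.0], X, log_time, delta)
```

The loss is convex and piecewise linear in beta, so in practice it could be minimized by linear programming or a general-purpose convex solver; that step is omitted here.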

References:

Chung M, Long Q, and Johnson BA. (2012) A tutorial on rank-based coefficient estimation in small- and large-scale problems. Statistics and Computing (In press).

Long Q, Chung M, Moreno C, Johnson BA. (2011) Risk prediction for cancer recurrence through regularized estimation with simultaneous adjustment for nonlinear clinical effects. Annals of Applied Statistics (In press).

Johnson BA, Long Q, Chung M. (2011) On path restoration for censored outcomes. Biometrics (In press).

Johnson BA, Long Q. (2011) Survival ensembles by the sum of pairwise differences. Annals of Applied Statistics (In press).

Johnson BA. (2009) Rank-based estimation in the L1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data. Biostatistics 10, 659-666.

Johnson BA. (2009) On lasso for censored data. Electronic Journal of Statistics, 3, 485-506.

Johnson BA, Lin DY, and Zeng D. (2008) Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association, 103, 672-680.

Johnson BA and Peng L. (2008) Rank-based variable selection. Journal of Nonparametric Statistics, 20, 241-252.

Johnson BA. (2008) Variable selection in semiparametric linear regression with censored data. Journal of the Royal Statistical Society, Series B, 70, 351-370.

Semiparametric Theory in Missing Data, Causal Inference

An ongoing research interest of mine (thanks to my advisor, Anastasios (Butch) Tsiatis) is the analysis of coarsened data using semiparametric methods. Coarsened data are defined as a many-to-one function of the true, complete data and generalize traditional notions of missing data. A nice introduction to this topic is found in Tsiatis (2006). Special cases of coarsened data arise in survival analysis, missing data, and measurement error. My first thesis advisee, Li Li, and I have done some work in this area, and those papers are forthcoming. In 2011, one of Li's papers was accepted to JASA.
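The many-to-one definition of coarsened data can be written out as follows. This is a sketch in notation commonly used for coarsening; the symbols Z, \mathcal{C}, G_r, and \varpi are assumptions introduced here for illustration and are not defined on this page.

```latex
% Z: the full (complete) data; \mathcal{C}: the coarsening variable;
% G_r: a known many-to-one function of Z for each coarsening level r.
\[
  \text{observed data} = \{\mathcal{C},\, G_{\mathcal{C}}(Z)\},
  \qquad G_{\infty}(Z) = Z \quad \text{(no coarsening)} .
\]
% Missing data as a special case: Z = (Z_1, Z_2), with Z_2 missing when \mathcal{C} = 1.
\[
  G_{1}(Z) = Z_1 .
\]
% Coarsening at random: the coarsening probability depends on Z
% only through the part of Z that is actually observed.
\[
  P(\mathcal{C} = r \mid Z) = \varpi\{r,\, G_r(Z)\} .
\]
```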

References:

Li L, Eron J, Ribaudo H, Gulick RM, Johnson BA (2012) Evaluating the effect of early versus late ARV regimen change after failure on the initial regimen: results from the AIDS Clinical Trials Group Study A5095. Journal of the American Statistical Association (In press).

Johnson BA. (2008) Treatment-competing events in dynamic regimes. Lifetime Data Analysis, 14, 196-215.

Johnson BA and Tsiatis AA. (2005) Semiparametric inference in observational duration-response studies, with duration possibly right-censored. Biometrika, 92, 605-618.

Johnson BA and Tsiatis AA. (2004). Estimating mean response as a function of treatment duration in an observational study, where treatment duration may be informatively censored. Biometrics 60, 315-323.

Treatment and Prevention of HIV and AIDS

In 2004, I began applying my interest in complex treatment strategies to HIV and AIDS research. At that time, I was a postdoctoral fellow and met Joe Eron, Professor of Medicine at UNC-CH, through a colleague in Biostatistics, Michael Hudgens. We conceived of a novel strategy to estimate the causal effect of delayed switch from a failing antiretroviral regimen. Rather than conditioning the analysis only on those patients who failed, we estimate the combined effect of failing on the initial regimen and switching early or late to a second-line regimen. Interestingly, in our analysis of the ACTG 5095 data, our method detected a mild clinical benefit, on average, of switching within 8 weeks of confirmed virologic failure of an efavirenz-containing regimen, whereas the conventional method found no difference.

Since arriving at Emory in 2006, I have been a member of Emory's CFAR through the Biostatistics Core. In addition to working on therapeutic studies, I have also worked on several projects in prevention. My Emory collaborators include Patrick Sullivan (EPI), Rob Stephenson (Global Health), Frank Wong (BSHE), Eric Nehl (BSHE), and Vince Marconi (Medicine, Ponce Clinic, Grady and VA Hospital). We have submitted several grant applications together and papers are forthcoming.

Environmental Health

I was introduced to statistical problems in environmental and occupational health while I was a postdoctoral fellow at UNC-CH, where I worked with Larry Kupper and Stephen M. Rappaport. Rappaport is an expert in exposure biology and got me started on nonlinear regression models. The basic idea is to estimate the biomarker response curve as a function of occupational exposure. In contrast to classic pharmacokinetic data (in Davidian and Giltinan, 1995, for example), we do not see multiple outcomes per subject over time, dose, or exposure. Rather, in occupational studies of exposure, we observe outcome measurements at a single exposure dose. As one might expect, the same nonlinear models that would be applied to individual-level data fit well to a random sample from the population. We have applied some standard semiparametric tools to Chinese studies of benzene exposure and are looking to develop some novel statistical methods shortly.

Since arriving at Emory, I have collaborated with several investigators in our Environmental Health department. My main collaborators are Jeremy Sarnat, Stephanie Sarnat, Roby Greenwald, Ying Zhou, and Yang Liu. I continue to collaborate with Rappaport (UC-Berkeley).

References:

Sarnat S, Raysoni AU, Li W, Holguin F, Johnson BA, Flores-Luevano S, Garcia J, and Sarnat JA. (2012) Air pollution and acute respiratory response in a panel of asthmatic children along the U.S.-Mexico border. Environmental Health Perspectives (In press).

Taylor DJ, Kupper LL, Johnson BA, Kim S, Rappaport SM. (2008) Parametric methods for evaluating nonlinear exposure-biomarker relationships when the predictor and the response variables are measured with error. Journal of Agricultural, Biological, and Environmental Statistics 13, 367-387.

Johnson BA and Rappaport SM. (2007) On modeling metabolism-based biomarkers of exposure: a comparative analysis of nonlinear models with few repeated measurements. Statistics in Medicine 26, 1901-1919.

Kim S, Lan Q, Waidyanatha S, Chanock S, Johnson BA, Vermeulen R, Smith MT, Zhang L, Li G, Shen M, Yin S, Rothman N, Rappaport SM. (2007) Genetic polymorphisms and benzene metabolism in humans exposed to a wide range of air concentrations. Pharmacogenetics and Genomics 17, 789-801.

Kim S, Vermeulen R, Waidyanatha S, Johnson BA, Lan Q, Rothman N, Smith MT, Zhang L, Li G, Shen M, Yin S, Rappaport SM (2007) Modeling human metabolism of benzene following occupational and environmental exposures. Cancer Epidemiology Biomarkers and Prevention 15, 2246-2252.

Kim S, Vermeulen R, Waidyanatha S, Johnson BA, Lan Q, Rothman N, Smith MT, Zhang L, Li G, Shen M, Yin S, Rappaport SM. (2006). Using urinary biomarkers to elucidate dose-related patterns of human benzene metabolism. Carcinogenesis 27, 772-781.

Johnson BA, Kupper LL, Taylor DJ, Rappaport SM. (2005). Modeling exposure-biomarker relationships: applications of linear and nonlinear toxicokinetics. Journal of Agricultural, Biological, and Environmental Statistics 10, 440-459.

Pleil JD, Vette AF, Johnson BA, Rappaport SM (2004). Air levels of carcinogenic polycyclic aromatic hydrocarbons after the World Trade Center disaster. Proceedings of the National Academy of Sciences 101, 11685-11688.

Other Projects

Measurement Error In Nutrition. A common application of measurement error occurs in the analysis of nutritional information taken from dietary instruments. A standard goal of nutrition studies is to determine the relationship of a clinical outcome to the intake of a nutrient, say iron. Of course, we never observe how much iron a person actually takes in, just the types and amounts of food that the person reports eating. In large nutrition studies, food intake is typically obtained through questionnaires and/or diaries. In smaller studies, subjects may be required to give blood or urine samples, which offer much better nutrient intake information but are also more expensive to collect, which precludes their use in large trials.
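To see why error-prone intake measurements matter, the simulation below regresses an outcome on a noisy version of the true intake and compares the slope with the one obtained from the true intake. It is a minimal sketch that assumes classical additive error with unit variances, which is simpler than the structured error studied in the reference below; the variable names, sample size, and slope value are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the sketch is reproducible
n = 20000                             # assumed sample size
beta = 2.0                            # assumed true slope of outcome on intake

x = rng.normal(0.0, 1.0, n)           # true nutrient intake (never observed)
w = x + rng.normal(0.0, 1.0, n)       # questionnaire-based intake with classical error
y = beta * x + rng.normal(0.0, 1.0, n)  # clinical outcome


def ols_slope(pred, resp):
    """Least squares slope of resp on pred (with an intercept)."""
    pred_c = pred - pred.mean()
    return float(pred_c @ (resp - resp.mean()) / (pred_c @ pred_c))


slope_true = ols_slope(x, y)          # slope using the unobservable true intake
slope_naive = ols_slope(w, y)         # slope using the error-prone measurement

# Under classical error the naive slope is attenuated by the reliability ratio
# var(x) / (var(x) + var(u)), which equals 0.5 under the variances assumed here.
reliability = 1.0 / (1.0 + 1.0)
expected_naive = reliability * beta
```

Under these assumptions the naive slope is pulled toward zero by roughly the reliability ratio, which is why a smaller validation sample with biomarker measurements can be valuable for correcting the large-study estimate.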

References:

Johnson BA, Herring AH, Ibrahim JG, and Siega-Riz AM (2007) Structured measurement error in nutritional epidemiology: applications in the pregnancy, infection, and nutrition (PIN) study. Journal of the American Statistical Association, 102, 856-866.

This page created and maintained by Brent A. Johnson. Page last updated: January, 2012.
