library(ComBatFamQC)
# columns 43-104 hold the features to harmonize
features <- colnames(adni)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
# interaction term between timedays and DIAGNOSIS
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"
combat_model <- combat_harm(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni)
head(combat_model$harmonized_df)
Harmonization
ComBatFamQC provides four commonly used harmonization techniques, integrated through the ComBatFamily package developed by Dr. Andrew Chen:
- Original ComBat (Johnson et al., 2007)
- Longitudinal ComBat (Beer et al., 2020)
- ComBat-GAM (Pomponio et al., 2020)
- CovBat (Chen et al., 2021)
Users can choose between two harmonization scenarios:
- First-time Harmonization (interactive harmonization is also available through R Shiny)
- Out-of-Sample Harmonization
  - Predict from an existing ComBat model (works only for Original ComBat and ComBat-GAM)
  - Harmonize new data toward existing reference data (works for all built-in ComBat harmonization methods)
First-time Harmonization
Specify parameters carefully based on the harmonization method to be applied.
Users can also start the harmonization stage from the command-line interface via combatQC_CLI.R. In addition to the parameters required at the diagnosis stage (features, covariates, batch, smooth, random), the command-line interface requires users to set the following parameters:
- --diagnosis/-d: FALSE
- --outdir: Path to save the harmonized dataset (in .csv format)
- --mout: Path to save the ComBat model (in .rds format; optional if users do not wish to save the model)
Original ComBat
A method designed for batch effect correction in cross-sectional data with linear covariate effects.
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 --covariates 9,11,13-14 -b 16 -i 9*14 -m lm --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds
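The command-line examples identify columns by position (for example `--features 43-104` alongside `colnames(adni)[c(43:104)]` in the R example), so it can help to confirm which column names a given specification selects before running the script. The sketch below is illustrative only: `expand_index_spec` is a hypothetical helper, not part of ComBatFamQC, and it assumes the specification is a comma-separated list of single indices and `start-end` ranges, as in the examples here.

```r
# Hypothetical helper (not a ComBatFamQC function): expand an index
# specification such as "9,11,13-14" into an integer vector.
expand_index_spec <- function(spec) {
  parts <- strsplit(spec, ",", fixed = TRUE)[[1]]
  unlist(lapply(parts, function(p) {
    bounds <- as.integer(strsplit(trimws(p), "-", fixed = TRUE)[[1]])
    # a single index stays as-is; "a-b" expands to a, a+1, ..., b
    if (length(bounds) == 1) bounds else seq(bounds[1], bounds[2])
  }))
}

covariate_idx <- expand_index_spec("9,11,13-14")
feature_idx <- expand_index_spec("43-104")
print(covariate_idx)
print(length(feature_idx))
```

With a data frame `df` loaded from the unharmonized CSV, `colnames(df)[covariate_idx]` would then list the covariate names those positions refer to.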
Longitudinal ComBat
A method that accounts for intra-subject correlation in longitudinal data by incorporating random effects into the model.
combat_model_lmer <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni)
head(combat_model_lmer$harmonized_df)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 --covariates 9,11,13-14 -b 16 -i 9*14 -m lmer -r 3 --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds
ComBat-GAM
A method that preserves non-linear covariate effects by using a generalized additive model.
combat_model_gam <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni)
head(combat_model_gam$harmonized_df)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 --covariates 9,11,13-14 -b 16 -i 9*14 -m gam -s 11 --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds
CovBat
A method that corrects batch effects in the covariance of the features, in addition to batch effects in their mean and variance.
covbat_model <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni, family = "covfam")
head(covbat_model$harmonized_df)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 --covariates 9,11,13-14 -b 16 -i 9*14 -m gam -s 11 -f covfam --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds
Out-of-Sample Harmonization
from ComBat Model
Set the predict parameter to TRUE and the object parameter to the saved ComBat model.
library(dplyr)  # provides %>%
saved_model <- combat_model_gam$combat.object
# apply the fitted ComBat-GAM model to the first 1000 rows
harm_predict <- combat_harm(df = adni %>% head(1000), predict = TRUE, object = saved_model)
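The command-line interface reads the model from an .rds file, so within an R session a fitted model would need to be written to disk before it can be reused elsewhere. A minimal sketch of that round trip with base R's `saveRDS()` and `readRDS()` follows. The list used here is only a placeholder standing in for `combat_model_gam$combat.object`, and whether a file saved this way matches what `--mout` writes is an assumption that should be checked against the package documentation.

```r
# Placeholder standing in for combat_model_gam$combat.object
# (hypothetical contents; the real object is produced by combat_harm()).
saved_model <- list(method = "gam", note = "placeholder model")

# Write the model to an .rds file and read it back.
model_path <- file.path(tempdir(), "saved_model.rds")
saveRDS(saved_model, model_path)
reloaded_model <- readRDS(model_path)

# The reloaded object should be identical to the one that was saved.
round_trip_ok <- identical(saved_model, reloaded_model)
print(round_trip_ok)
```

The reloaded object could then be passed as `object = reloaded_model` in the same way as `saved_model` above.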
Using the command-line interface requires users to set the following parameters:
- --predict: TRUE
- --object/-o: Path to the saved ComBat model (in .rds format)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --predict TRUE -o path/to/saved_model.rds --outdir /path/to/harmonized_data.csv
from Reference Data
Set the reference parameter to the saved reference data. Note that the reference data must have the same columns as the new data, and the new data must contain the reference data as a subsample.
# harmonize reference data
reference_site <- adni %>% group_by(site) %>% summarize(count = n()) %>% arrange(desc(count)) %>% pull(site) %>% head(30)
reference_df <- adni %>% filter(site %in% reference_site)
features <- colnames(reference_df)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "site"
ref_model <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = reference_df)
# harmonize new data to the reference data
harm_new <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni, reference = ref_model$harmonized_df)
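The two requirements on the reference data (same columns as the new data, and reference observations contained in the new data) can be checked before calling combat_harm. The sketch below uses small made-up data frames rather than adni, and it assumes that an observation is identified by the pair `subid` and `timedays`. That choice of key is an assumption for illustration: because the reference data is already harmonized, its feature values are expected to differ from the unharmonized new data, so rows are compared on identifier columns only.

```r
# Toy stand-in for the new (unharmonized) data; values are made up.
new_df <- data.frame(
  subid    = c("s1", "s1", "s2", "s3"),
  timedays = c(0, 180, 0, 0),
  site     = c("A", "A", "B", "C"),
  roi1     = c(1.20, 1.25, 1.10, 1.30),
  stringsAsFactors = FALSE
)

# Toy stand-in for harmonized reference data: a subset of the rows,
# with feature values changed by harmonization.
reference_df <- new_df[new_df$site %in% c("A", "B"), ]
reference_df$roi1 <- reference_df$roi1 - 0.05

# Check 1: the two data sets have the same columns in the same order.
same_columns <- identical(colnames(reference_df), colnames(new_df))

# Check 2: every reference observation appears in the new data,
# matched on the assumed identifier columns.
id_cols <- c("subid", "timedays")
row_key <- function(df) do.call(paste, c(df[id_cols], sep = "|"))
ref_in_new <- all(row_key(reference_df) %in% row_key(new_df))

print(c(same_columns = same_columns, ref_in_new = ref_in_new))
```

If either check is FALSE, the mismatch is easier to fix at this point than after a failed or misleading harmonization run.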
Using the command-line interface requires users to set the following parameter:
- --reference: Path to the reference dataset
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 --covariates 9,11,13-14 -b 16 -i 9*14 -m lmer -r 3 --reference path/to/reference.csv --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds