features <- colnames(adni)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"
combat_model <- combat_harm(
type = "lm",
features = features,
batch = batch,
covariates = covariates,
interaction = interaction,
smooth = NULL,
random = NULL,
df = adni
)
head(combat_model$harmonized_df)
Harmonization
ComBatFamQC provides four commonly used harmonization techniques, integrated through the ComBatFamily package developed by Dr. Andrew Chen:
- ComBat (Johnson et al., 2007)
- Longitudinal ComBat (Beer et al., 2020)
- ComBat-GAM (Pomponio et al., 2020)
- CovBat (Chen et al., 2021)
Users can choose between two harmonization scenarios:
- First-time Harmonization (interactive harmonization is also available through Rshiny)
- Out of Sample Harmonization
  - Predict from an existing ComBat model (Longitudinal ComBat is not currently supported)
  - Harmonize new data toward existing reference data (works for all built-in ComBat harmonization methods)
First Time Harmonization
In order to run harmonization, you need to specify your harmonization parameters based on the method you plan to use.
In R, harmonization is performed with the combat_harm() function, where you provide the features, batch variable, covariates, interaction terms, and optional smooth or random effects, depending on the selected ComBat method.
Users can also start the harmonization stage from the command-line interface via ComBatQC_CLI.R. In addition to the parameters required for the diagnosis stage (features, covariates, batch, smooth, random), the command-line interface requires the following parameters:
- --diagnosis/-d: FALSE
- --outdir: Path to save the harmonized dataset (in .csv format)
- --mout: Path to save the ComBat model (in .rds format; optional if users do not wish to save the model)
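The command-line examples in this section identify columns by position rather than by name (for example --features 43-104 or --covariates 9,11,13-14), whereas the R examples use column names. The sketch below shows one way to expand such a position spec and look up the matching column names before running the CLI. The helper parse_index_spec() and the toy data frame are hypothetical and are not part of ComBatFamQC; the CLI's own parsing may differ.

```r
# Hypothetical helper (not part of ComBatFamQC): expand a column-position
# spec such as "9,11,13-14" into an integer vector of positions.
parse_index_spec <- function(spec) {
  parts <- strsplit(spec, ",", fixed = TRUE)[[1]]
  idx <- lapply(trimws(parts), function(p) {
    bounds <- as.integer(strsplit(p, "-", fixed = TRUE)[[1]])
    # A single number is one position; "a-b" is an inclusive range.
    if (length(bounds) == 1) bounds else seq(bounds[1], bounds[2])
  })
  unlist(idx)
}

# Toy data frame standing in for the real dataset (assumed column layout).
toy <- data.frame(
  subid = 1:3,
  timedays = c(0, 180, 360),
  AGE = c(70, 71, 72),
  SEX = c("F", "M", "F"),
  thickness = c(2.1, 2.3, 2.2)
)

# Map positions back to names to confirm the spec selects the intended columns.
covariate_idx <- parse_index_spec("2-3,4")
covariate_names <- colnames(toy)[covariate_idx]
print(covariate_names)
```

Checking the names this way before a run helps catch off-by-one mistakes when the column order of the input file changes.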
ComBat
A method designed for batch effect correction in cross-sectional data with linear covariate effects.
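For orientation, the model usually written for ComBat (following Johnson et al., 2007) is sketched below. The notation is introduced here for illustration and is not taken from the ComBatFamQC documentation: i indexes batches, j subjects, and v features; x_{ij} holds the covariates; the starred quantities are the empirical Bayes estimates of the additive and multiplicative batch effects.

```latex
% Model: feature value = covariate effects + additive batch shift
%        + batch-specific scaling of the residual
y_{ijv} = \alpha_v + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_v
          + \gamma_{iv} + \delta_{iv}\,\varepsilon_{ijv},
\qquad \varepsilon_{ijv} \sim \mathcal{N}(0, \sigma_v^2)

% Harmonized value: remove the estimated batch shift and scaling,
% then add the covariate effects back
y_{ijv}^{\text{ComBat}} =
  \frac{y_{ijv} - \hat{\alpha}_v - \mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}}_v - \gamma_{iv}^{*}}
       {\delta_{iv}^{*}}
  + \hat{\alpha}_v + \mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}}_v
```

The covariate term is linear in this version, which is why the method is described as suited to linear covariate effects.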
Rscript path/to/combatQC_CLI.R \
path/to/unharmonized_data.csv \
-d FALSE \
--features 43-104 \
--covariates 9,11,13-14 \
-b 16 \
-i 9*14 \
-m lm \
--outdir /path/to/harmonized_data.csv \
--mout /path/to/saved_model.rds
Longitudinal ComBat
A method that accounts for intra-subject correlation in longitudinal data by incorporating random effects into the model.
combat_model_lmer <- combat_harm(
type = "lmer",
features = features,
batch = batch,
covariates = covariates,
interaction = interaction,
smooth = NULL,
random = "subid",
df = adni
)
head(combat_model_lmer$harmonized_df)
Rscript path/to/combatQC_CLI.R \
path/to/unharmonized_data.csv \
-d FALSE \
--features 43-104 \
--covariates 9,11,13-14 \
-b 16 \
-i 9*14 \
-m lmer \
-r 3 \
--outdir /path/to/harmonized_data.csv \
--mout /path/to/saved_model.rds
ComBat-GAM
A method that preserves non-linear covariate effects by using a generalized additive model.
combat_model_gam <- combat_harm(
type = "gam",
features = features,
batch = batch,
covariates = covariates,
interaction = interaction,
smooth = "AGE",
smooth_int_type = "linear",
df = adni
)
head(combat_model_gam$harmonized_df)
Rscript path/to/combatQC_CLI.R \
path/to/unharmonized_data.csv \
-d FALSE \
--features 43-104 \
--covariates 9,11,13-14 \
-b 16 \
-i 9*14 \
-m gam \
-s 11 \
--outdir /path/to/harmonized_data.csv \
--mout /path/to/saved_model.rds
CovBat
CovBat is used for correcting covariance batch effects.
covbat_model <- combat_harm(
type = "gam",
features = features,
batch = batch,
covariates = covariates,
interaction = interaction,
smooth = "AGE",
smooth_int_type = "linear",
df = adni,
family = "covfam"
)
head(covbat_model$harmonized_df)
Rscript path/to/combatQC_CLI.R \
path/to/unharmonized_data.csv \
-d FALSE \
--features 43-104 \
--covariates 9,11,13-14 \
-b 16 \
-i 9*14 \
-m gam \
-s 11 \
-f covfam \
--outdir /path/to/harmonized_data.csv \
--mout /path/to/saved_model.rds
Out-of-Sample Harmonization
From ComBat Model
Set the predict parameter to TRUE and the object parameter to the saved ComBat model.
saved_model <- combat_model_gam$combat.object
harm_predict <- combat_harm(
df = adni %>% head(1000),
predict = TRUE,
object = saved_model
)
The command-line interface requires users to set the following parameters:
- --predict: TRUE
- --object/-o: Path to the saved ComBat model (in .rds format)
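A model object held in an R session can be written to an .rds file and read back with base R's saveRDS() and readRDS(). The sketch below uses a placeholder list in place of a real fitted model such as combat_model_gam$combat.object. Whether the CLI expects exactly this object in the file is an assumption; the --mout option is the documented way to produce the file.

```r
# Placeholder standing in for a fitted model object; in practice this would
# be something like combat_model_gam$combat.object from combat_harm().
saved_model <- list(type = "gam", batch = "manufac", note = "placeholder")

# Write the object to disk, then read it back in a later session.
model_path <- file.path(tempdir(), "saved_model.rds")
saveRDS(saved_model, model_path)
restored_model <- readRDS(model_path)

# The round trip should return an identical object.
stopifnot(identical(saved_model, restored_model))
```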
Rscript path/to/combatQC_CLI.R \
path/to/unharmonized_data.csv \
-d FALSE \
--predict TRUE \
-o path/to/saved_model.rds \
--outdir /path/to/harmonized_data.csv
From CovBat Model
Set the predict parameter to TRUE and the object parameter to the saved CovBat model.
saved_model <- covbat_model$combat.object
harm_predict <- combat_harm(
df = adni,
predict = TRUE,
object = saved_model
)
The command-line interface requires users to set the following parameters:
- -d/--diagnosis: set to FALSE to run harmonization (use TRUE to run batch effect diagnostics only)
- --predict: set to TRUE for out-of-sample harmonization using a saved CovBat model
- --object/-o: Path to the saved CovBat model file (.rds)
- --outdir: Path to write the harmonized data (.csv)
Rscript path/to/combatQC_CLI.R \
path/to/unharmonized_data.csv \
-d FALSE \
--predict TRUE \
-o path/to/saved_model.rds \
--outdir /path/to/harmonized_data.csv
From Reference Data
Set the reference parameter to the saved reference data. Note that the reference data must have the same columns as the new data, and the new data must contain the reference data as a subsample.
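The two requirements on the reference data can be checked before calling combat_harm(). The sketch below is a minimal illustration in base R: check_reference() is a hypothetical helper, not a ComBatFamQC function, and it assumes that rows can be matched on key columns such as a subject ID and a time point. It compares keys only, because feature values in a harmonized reference set are expected to differ from the raw data.

```r
# Hypothetical helper (not part of ComBatFamQC): verify that the reference
# data has the same columns as the new data and that every reference row,
# identified by key_cols, also appears in the new data.
check_reference <- function(new_df, reference_df, key_cols) {
  same_columns <- identical(colnames(new_df), colnames(reference_df))
  has_keys <- all(key_cols %in% colnames(new_df)) &&
    all(key_cols %in% colnames(reference_df))
  contained <- FALSE
  if (has_keys) {
    # Build one composite key per row from the key columns.
    make_key <- function(df) {
      do.call(paste, c(unname(as.list(df[key_cols])), sep = "|"))
    }
    contained <- all(make_key(reference_df) %in% make_key(new_df))
  }
  list(same_columns = same_columns, reference_contained = contained)
}

# Toy data standing in for the real dataset (assumed column names).
new_df <- data.frame(
  subid = c("a", "a", "b", "c"),
  timedays = c(0, 180, 0, 0),
  site = c("s1", "s1", "s2", "s3"),
  thickness = c(2.1, 2.0, 2.4, 2.2)
)
reference_df <- new_df[new_df$site %in% c("s1", "s2"), ]

result <- check_reference(new_df, reference_df, c("subid", "timedays"))
print(result)
```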
# identify reference site(s)
reference_site <- adni %>%
group_by(site) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
pull(site) %>%
head(30)
reference_df <- adni %>%
filter(site %in% reference_site)
# specify harmonization variables
features <- colnames(reference_df)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "site"
# harmonize reference data
ref_model <- combat_harm(
type = "lmer",
features = features,
batch = batch,
covariates = covariates,
interaction = interaction,
smooth = NULL,
random = "subid",
df = reference_df
)
# harmonize new data to the reference data
harm_new <- combat_harm(
type = "lmer",
features = features,
batch = batch,
covariates = covariates,
interaction = interaction,
smooth = NULL,
random = "subid",
df = adni,
reference = ref_model$harmonized_df
)
Using the command-line interface requires users to set the following parameter:
--reference: Path to the reference dataset
Rscript path/to/combatQC_CLI.R \
path/to/unharmonized_data.csv \
-d FALSE \
--features 43-104 \
--covariates 9,11,13-14 \
-b 16 \
-i 9*14 \
-m lmer \
-r 3 \
--reference path/to/reference.csv \
--outdir /path/to/harmonized_data.csv \
--mout /path/to/saved_model.rds
Learn More
- Refer to Example: Post Harmonization for a detailed illustration of post-harmonization analysis in practice.