Harmonization

ComBatFamQC provides four types of commonly used harmonization techniques, integrated through the ComBatFamily package developed by Dr. Andrew Chen, for users to consider. The four harmonization techniques include:

There are two types of harmonization scenarios users can choose from:

First Harmonization


Specify parameters carefully based on the harmonization method to be applied.

Users can also use the command-line interface via ComBatQC_CLI.R to start the harmonization stage. Apart from the same required parameters as the diagnosis stage(features, covariates, batch, smooth, random), using the command-line interface also requires users to set the following parameter:

  • --diagnosis/-d: FALSE
  • --outdir: Path to save the harmonized dataset (in .csv format)
  • --mout: Path to save the ComBat model (optional if users do not wish to save the model; in .rds format)

Original ComBat

A method designed for batch effect correction in cross-sectional data with linear covariate effects.

features <- colnames(adni)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"
combat_model <- combat_harm(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni)
head(combat_model$harmonized_df)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 
--covariates 9,11,13-14 -b 16 -i 9*14 -m lm --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds

Longitudinal ComBat

A method accounts for intra-subject correlation in longitudinal data by incorporating random effects into the model.

combat_model_lmer <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni)
head(combat_model_lmer$harmonized_df)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 
--covariates 9,11,13-14 -b 16 -i 9*14 -m lmer -r 3 --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds

ComBat-GAM

A method allows for preservation of non-linear covariate effects through use of the generalized additive model.

combat_model_gam <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni)
head(combat_model_gam$harmonized_df)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 
--covariates 9,11,13-14 -b 16 -i 9*14 -m gam -s 11 --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds

CovBat

CovBat is used for correcting covariance batch effects.

covbat_model <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni, family = "covfam")
head(covbat_model$harmonized_df)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 
--covariates 9,11,13-14 -b 16 -i 9*14 -m gam -s 11 -f covfam --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds

Out-of-Sample Harmonization


from ComBat Model

Specify predict parameter to be TRUE and object parameter to be saved ComBat model.

saved_model <- combat_model_gam$combat.object
harm_predict <- combat_harm(df = adni %>% head(1000), predict = TRUE, object = saved_model)

Using the command-line interface requires users to set the following parameter:

  • --predict: TRUE
  • --object/-o: Path to the saved ComBat model (in .rds format)
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --predict TRUE -o path/to/saved_model.rds --outdir /path/to/harmonized_data.csv

from Reference Data

Specify reference parameter to be saved reference data. To be noticed, the reference data should have identical columns as the new data and the new data should contain reference data as its sub sample.

# harmonize reference data
reference_site <- adni %>% group_by(site) %>% summarize(count = n()) %>% arrange(desc(count)) %>% pull(site) %>% head(30)
reference_df <- adni %>% filter(site %in% reference_site)
features <- colnames(reference_df)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "site"
ref_model <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = reference_df)

# harmonize new data to the reference data
harm_new <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni, reference = ref_model$harmonized_df)

Using the command-line interface requires users to set the following parameter:

  • --reference: Path to the reference dataset
Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv -d FALSE --features 43-104 --covariates 9,11,13-14 -b 16 -i 9*14 -m lmer -r 3 --reference path/to/reference.csv --outdir /path/to/harmonized_data.csv --mout /path/to/saved_model.rds