Batch Effect Diagnostics

In this section, we use the sample dataset included in the ComBatFamQC package to introduce the package's key functionality for batch effect diagnostics.

Data


The sample dataset in the package is longitudinal cortical thickness data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study (Jack et al., 2008). It contains cortical thickness measures for 62 regions from 663 unique participants, each with images collected longitudinally over 2 to 6 visits. The processed data can be downloaded from GitHub.

The key variables of interest are presented below:

  • Batch variable:
    • manufac: MRI manufacturer
  • Covariates
    • timedays: time-related variable
    • AGE
    • SEX
    • DIAGNOSIS: (AD, CN, LMCI)
    (We consider an interaction effect between timedays and DIAGNOSIS)
  • Features to harmonize
    • 62 cortical thickness regions
  • Random Effect
    • subid: subject IDs
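
Before running diagnostics, it can help to confirm that a dataset has the longitudinal structure described above. The following is a minimal base R sketch on a small made-up data frame (not the ADNI data); it only borrows the column names subid, manufac, and timedays, and all values are invented for illustration.

```r
# Toy longitudinal data; values are invented for illustration only.
toy <- data.frame(
  subid    = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3"),
  manufac  = c("GE", "GE", "GE", "Siemens", "Siemens",
               "Philips", "Philips", "Philips", "Philips"),
  timedays = c(0, 180, 365, 0, 200, 0, 90, 180, 365),
  stringsAsFactors = FALSE
)

# Number of unique participants
n_subjects <- length(unique(toy$subid))

# Number of visits (rows) per participant
visits_per_subject <- table(toy$subid)

# Number of observations per batch level
obs_per_batch <- table(toy$manufac)

# Participants scanned on more than one manufacturer
n_batches_per_subject <- tapply(toy$manufac, toy$subid,
                                function(x) length(unique(x)))
multi_batch_subjects <- names(n_batches_per_subject)[n_batches_per_subject > 1]

print(n_subjects)
print(range(as.vector(visits_per_subject)))
print(obs_per_batch)
```

The same checks can be applied to a real data frame by swapping `toy` for it, provided the column names match.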

Interactive Batch Effect Diagnostics


The diagnostic workflow is the same whether you use R or the CLI:

  1. Set up variables: Select features, batch, covariates, interactions, and random effects.

  2. Fit the diagnostic model: A linear mixed-effects model (lmer) evaluates batch effects while accounting for repeated measures.

  3. View diagnostics: Inspect plots showing batch trends, covariate effects, and distribution differences.

  4. Interpret results: Use the summaries and visualizations to determine whether harmonization is needed.
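
To make steps 1 and 2 concrete, the sketch below assembles the kind of per-feature model formula that a linear mixed-effects fit would use. It is an illustration under stated assumptions, not the package's internal code: the feature names are hypothetical, and the random-intercept form `(1 | subid)` is assumed rather than taken from the ComBatFamQC source.

```r
# Variable setup (step 1). Feature names are hypothetical placeholders.
features    <- c("region_1", "region_2")
covariates  <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- "timedays,DIAGNOSIS"
batch       <- "manufac"
random      <- "subid"

# Turn the comma-separated pair "a,b" into the R interaction term "a:b"
interaction_terms <- gsub(",", ":", interaction, fixed = TRUE)

# Right-hand side: covariates, interaction, batch, and an assumed
# random intercept per subject
rhs <- c(covariates, interaction_terms, batch,
         paste0("(1 | ", random, ")"))

# Build one formula per feature (step 2 would fit each of these)
build_formula <- function(feature) {
  as.formula(paste(feature, "~", paste(rhs, collapse = " + ")))
}
model_formulas <- lapply(features, build_formula)
names(model_formulas) <- features

print(model_formulas[["region_1"]])

# With lme4 installed and a data frame `df` holding these columns,
# a single feature could then be fitted along these lines:
# fit <- lme4::lmer(model_formulas[["region_1"]], data = df)
```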

The corresponding code for each interface is shown below.

The R interface prepares the data, fits the models, and launches an interactive Shiny app for diagnostics using visual_prep() and comfam_shiny().

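library(ComBatFamQC)

# Columns 43-104 hold the 62 cortical thickness features
features <- colnames(adni)[43:104]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
# Interaction between timedays and DIAGNOSIS
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"

# Fit the diagnostic models with a subject-level random effect
result_lmer <- visual_prep(
  type = "lmer",
  features = features,
  batch = batch,
  covariates = covariates,
  interaction = interaction,
  smooth = NULL,
  random = "subid",
  df = adni
)

# Launch the interactive Shiny app
comfam_shiny(result_lmer)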

What this does (summary):

  • Selects columns 43–104 as features (62 cortical thickness regions).

  • Uses manufac as the batch variable.

  • Includes timedays, AGE, SEX, and DIAGNOSIS, plus the timedays × DIAGNOSIS interaction.

  • Treats subid as a random effect for subject-level correlation.

  • Launches a Shiny app to explore diagnostic plots interactively.

If you prefer the CLI, you can run the same workflow directly from the terminal. The CLI mirrors the R workflow and is especially convenient for automation, scripting, or large-scale batch runs.

Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv \
  -d TRUE -v TRUE \
  --features 43-104 \
  --covariates 9,11,13-14 \
  -b 16 -i 9*14 -r 3 -m lmer

Main options overview:

  • --diagnosis/-d: whether to run batch effect diagnostics (default TRUE)
  • --visualization/-v: whether to generate diagnostic visuals (default TRUE)
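  • --features/-f: feature columns (e.g., 43-104)
  • --covariates/-c: covariate columns (e.g., 9,11,13-14)
  • --batch/-b: batch column
  • -i: interaction term, given as column indices joined by * (9*14 in the example above)
  • --smooth/-s: smooth term column (for gam)
  • --random/-r: random-effect column (for lmer)
  • -m: model type (lmer in the example above)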
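
The CLI identifies columns by position, so it is worth checking which columns a specification such as 9,11,13-14 actually selects. The helper below is a hypothetical illustration in base R, not the CLI's own parser; `expand_cols` and the placeholder column names are assumptions made for this sketch.

```r
# Hypothetical helper: expand a spec like "9,11,13-14" into integer indices.
expand_cols <- function(spec) {
  parts <- strsplit(spec, ",", fixed = TRUE)[[1]]
  idx <- lapply(parts, function(p) {
    bounds <- as.integer(strsplit(trimws(p), "-", fixed = TRUE)[[1]])
    # "a-b" becomes the full range a..b; a single number stays as is
    if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds
  })
  unlist(idx)
}

covariate_idx <- expand_cols("9,11,13-14")
feature_idx   <- expand_cols("43-104")

print(covariate_idx)
print(length(feature_idx))

# Map indices back to names; placeholder names stand in for a real header.
cols <- paste0("V", 1:104)
print(cols[covariate_idx])

# The reverse lookup (names to indices) uses match()
print(match(c("V9", "V14"), cols))
```

With a real CSV, replacing `cols` with `colnames(read.csv(...))` would show the actual variable names behind each index before the CLI is run.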

After running either workflow, ComBatFamQC will produce diagnostic plots and summaries that help you visually and quantitatively assess batch effects before harmonization.

Exporting Results


Once diagnostics are complete, you may want to:

  • Save tabular summaries (e.g., per-feature statistics, model summaries), or

  • Produce a standalone report that documents the diagnostic analysis.

ComBatFamQC supports both use cases via R functions and CLI options.

Export from R

You can export results either as an Excel file or as a Quarto report. Both options use the same diagnostics object (result_lmer) produced in the previous step.

# Export as an Excel file
diag_save(path = "path/to/dir", result = result_lmer, use_quarto = FALSE)

# Export as a Quarto report (requires the quarto package)
library(quarto)
diag_save(path = "path/to/dir", result = result_lmer, use_quarto = TRUE)
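
In scripted runs it can be useful to choose the export format based on whether Quarto is available. The following sketch makes several assumptions: the output directory is a hypothetical temporary location, creating it beforehand may or may not be required by diag_save(), and the availability check relies on `quarto::quarto_path()` returning NULL when no Quarto installation is found.

```r
# Hypothetical output location; replace with your own directory.
out_dir <- file.path(tempdir(), "combatfamqc_diagnostics")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Quarto export needs both the R package and the Quarto command-line tool
quarto_ready <- requireNamespace("quarto", quietly = TRUE) &&
  !is.null(quarto::quarto_path())

# Export only if a diagnostics object from the previous step exists
if (exists("result_lmer") &&
    requireNamespace("ComBatFamQC", quietly = TRUE)) {
  ComBatFamQC::diag_save(path = out_dir, result = result_lmer,
                         use_quarto = quarto_ready)
}
```

When Quarto is not found, this falls back to the Excel export instead of failing partway through a batch job.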

Export from CLI

From the command line, you can run diagnostics and export results in a single step or perform diagnostics first and export later. The CLI supports both Excel-style outputs and Quarto report generation.

Rscript path/to/combatQC_CLI.R path/to/unharmonized_data.csv \
  -d TRUE -v FALSE \
  --features 43-104 \
  --covariates 9,11,13-14 \
  -b 16 -i 9*14 -r 3 -m lmer \
  --outdir /path/to/dir \
  --quarto TRUE

Main options overview:

  • --visualization/-v: set to FALSE when you only want to export results
  • --outdir: directory for outputs
  • --quarto/-q: whether to generate a Quarto report

Learn More