Statistical Data Analysis of Microbiomes and Metabolomics

Author(s): Yinglin Xia, Jun Sun
Publication Date: February 3, 2022
Copyright © 2022 American Chemical Society
eISBN: 9780841299160
DOI: 10.1021/acsinfocus.7e5035
Read Time: five to six hours
Collection: 1
Publisher: American Chemical Society

Compared with data from other research fields, microbiome and metabolomics data are complex, and each type has its own unique characteristics. Choosing an appropriate statistical test or method is therefore a critical step in analyzing microbiome and metabolomics data. However, this remains a difficult task both for biomedical researchers without a statistical background and for biostatisticians without research experience in these fields.

Statistical Data Analysis of Microbiomes and Metabolomics focuses on data analysis, statistical methods, and models. The general goal of this primer is to provide our readers with:

  • An overview of the challenges of analyzing microbiome and metabolomics data with standard models and methods.

  • The newly developed methods and models designed specifically to target the unique characteristics of microbiome data.

  • The strengths and weaknesses of these newly developed methods and models.

  • A comparison of methods within the same category, based on their nature and capabilities, including whether each method fits different types of data.

  • Explanations of whether the tested methods and models, given their assumptions and attributes, are suited to the tested data.

  • References to real studies that illustrate each of the important methods and models.

Graduate students studying the microbiome and metabolomics; statisticians working on microbiome and metabolomics projects, whether for their own research or for collaborative experimental design, grant applications, and data analysis; and researchers investigating biomedical and biochemical projects involving microbiome, metabolome, and multi-omics data will all benefit from reading this work.

Detailed Table of Contents
About the Series
Preface
Chapter 1. Introduction to Microbiome Research Themes and to Characteristics of Microbiome and Metabolomics Data
1.1 Introduction and Overview
1.2 Microbiome Research Themes
1.2.1 Association and Mediation Analyses of Environment, Microbiome, and Host
1.2.2 Host and Microbiome Multi-Omics Integration
1.3 Characteristics of Microbiome and Metabolomics Data
1.3.1 Characteristics of Microbiome Data
1.3.1.1 Microbiome Data Are Classified into Hierarchical Taxonomic Ranks and Encoded as a Phylogenetic Tree
1.3.1.2 Microbiome Data Are Multivariate or High Dimensional
1.3.1.3 Microbiome Data Are Compositional
1.3.1.4 Microbiome Data Are Overdispersed
1.3.1.5 Microbiome Data Are Sparse and Zero-Inflated
1.3.1.6 Microbiome Data Are Heterogeneous
1.3.2 Characteristics of Metabolomics Data
1.3.2.1 Metabolomics Data Are High Dimensional
1.3.2.2 Metabolomics Data Are Compositional
1.3.2.3 Metabolomics Data Are Heterogeneous
1.4 Challenges for Statistical Analysis of Microbiome and Metabolomics Data
1.4.1 High-Dimensionality Causes the Large P and Small N Problem
1.4.2 Compositionality Causes the Dependency Problem
1.4.3 Sparsity with Excess Zeros Causes the Overdispersion and Zero-Inflation Problems
1.4.4 Heterogeneity Challenges Data Integration, Modeling, and Meta-Analysis
1.5 That’s a Wrap
1.6 Read These Next
Chapter 2. Cross-Sectional Statistical Methods for Microbiome Data Analysis
2.1 Introduction and Overview
2.2 Count-Based Models
2.2.1 Overdispersed and Zero-Inflated Models
2.2.1.1 Methods Based on Differential Expression Analysis for RNA-Seq Data
2.2.1.2 Methods Based on Classical Overdispersed and Zero-Inflated Models
2.2.1.3 Methods Based on Dirichlet-Multinomial Distribution
2.2.1.4 Methods Based on Generalized Linear Models (GLMs) and Generalized Linear Mixed Models (GLMMs)
2.3 Relative Abundance-Based Models
2.3.1 Methods for Differential Abundance
2.3.2 Methods Based on Zero-Inflated Beta Models
2.4 Compositional Abundance-Based Models
2.4.1 Methods for Differential Abundance
2.4.2 Logistic Normal Multinomial Models
2.5 That’s a Wrap
2.6 Read These Next
Chapter 3. Longitudinal Methods for Analysis of Microbiome Data
3.1 Introduction and Overview
3.2 Proportion (Continuous)-Based Models Based on GLMMs
3.2.1 Linear Mixed-Effects Models (LMMs)
3.2.2 Zero-Inflated Gaussian Mixed Models (ZIGMMs)
3.3 Two-Stage Beta Regression Models
3.4 Count-Based Models Based on GLMM
3.5 Multivariate Longitudinal Methods for Microbiome Data Analysis
3.5.1 Dirichlet/Paired Multinomial (DM and PM) Distributions-Based Methods
3.5.2 Multivariate Distance/Kernel-Based Methods
3.5.3 Multivariate GEE-Based Methods
3.5.4 Latent Variable SEM-Based Methods
3.6 That’s a Wrap
3.7 Read These Next
Chapter 4. Mediational Methods for Analysis of Microbiome Data
4.1 Introduction and Overview
4.2 Framework of SEM-Based Mediation Analysis
4.2.1 Product-of-Coefficients Method
4.2.2 Difference-of-Coefficients Method
4.3 Framework of Counterfactual Mediation Analysis
4.3.1 Rubin’s Counterfactual Model
4.3.2 Pearl’s Natural Direct Effect and Natural Indirect Effect Model
4.4 Distance-Based Mediation Methods
4.4.1 Omnibus Test of Mediation Effect (MedTest)
4.4.2 Multivariate Omnibus Distance Mediation Analysis (MODIMA)
4.5 Compositional-Based Mediation Methods
4.5.1 Causal Compositional Mediation Model (CCMM)
4.5.2 Sparse Microbial Causal Mediation Model (SparseMCMM)
4.5.3 IsometricLRTMM Method
4.6 Nonparametric Mediation Methods
4.6.1 Information Theory
4.6.2 Nonparametric Entropy Mediation (NPEM)
4.6.2.1 Step 1
4.6.2.2 Step 2
4.6.2.3 Step 3
4.6.2.4 Step 4
4.7 Latent Variable SEM-Based Mediation Methods
4.8 Model Selection-Based Mediation Methods
4.9 That’s a Wrap
4.10 Read These Next
Chapter 5. Univariate Metabolomics Data Analysis
5.1 Introduction and Overview
5.2 Univariate Approach for Analysis of Metabolomics Data
5.3 Performing Normality Test Using Shapiro–Wilk’s Test
5.4 Statistical Hypothesis Testing of Metabolomics Data
5.4.1 Parametric Statistical Methods
5.4.1.1 Student’s t-Test and Welch’s t-Test
5.4.1.2 ANOVA
5.4.2 Nonparametric Statistical Methods
5.4.2.1 Wilcoxon Rank-Sum Test and Mann–Whitney U-Test
5.4.2.2 Kruskal–Wallis Test
5.4.2.3 Wilcoxon Signed-Rank Test
5.4.2.4 Remarks on Nonparametric Methods
5.5 Performing Multiple Testing Correction to Adjust the P-Values
5.6 Constructing Volcano Plots to Identify Differential Metabolites
5.6.1 Introduction to the Volcano Plot
5.6.2 Classic Volcano Plot (CVP)
5.6.3 Robust Volcano Plot (RVP)
5.6.3.1 Constructing an RVP
5.6.3.2 Comparison of RVP with Other Methods in Determining Differential Metabolites
5.6.4 Bayesian Volcano Plot (BVP)
5.7 Limitations of Classic Univariate Metabolomics Analysis
5.8 That’s a Wrap
5.9 Read These Next
Chapter 6. Multivariate Metabolomics Data Analysis
6.1 Introduction and Overview
6.2 Principal Component Analysis (PCA)
6.3 The Family of Partial Least Squares
6.3.1 Partial Least Squares (PLS)
6.3.2 Sparse Partial Least Squares (sPLS)
6.4 Discriminant Analysis (DA)
6.4.1 Linear Discriminant Analysis (LDA)
6.4.2 Linear Discriminant Analysis Effect Size (LEfSe)
6.4.3 Partial Least Squares-Discriminant Analysis (PLS-DA)
6.4.4 Sparse Linear Discriminant Analysis (SLDA and sLDA)
6.4.4.1 SLDA
6.4.4.2 sLDA
6.5 The Family of Orthogonal Projections to Latent Structures
6.5.1 Orthogonal Projections to Latent Structures (O-PLS)
6.5.2 Two-Way Orthogonal Partial Least Squares (O2-PLS)
6.5.3 Kernel-Based Orthogonal Projections to Latent Structures (K-OPLS)
6.6 Machine Learning
6.6.1 Support Vector Machines (SVMs)
6.6.2 Random Forest (RF)
6.7 Clustering
6.7.1 Hierarchical Clustering Analysis (HCA)
6.7.2 K-Means Clustering
6.8 That’s a Wrap
6.9 Read These Next
Bibliography
Glossary
Index
Reviewer quotes
Useful insights around statistical data analysis for microbiome and metabolomics research
Xiaotao Shen, Ph.D., Stanford University School of Medicine.
I will recommend this work to my colleagues who study the microbiome, metabolome, and multi-omics data analysis, and their application on precision medicine.
Author Info
Yinglin Xia
Yinglin Xia is a Research Associate Professor in the Department of Medicine at the University of Illinois at Chicago (UIC). He was a Research Assistant Professor in the Department of Biostatistics and Computational Biology at the University of Rochester (Rochester, NY) before joining AbbVie (North Chicago, IL) as a Clinical Statistician, and he joined UIC as a Research Associate Professor in 2015. Dr. Xia has successfully applied his skills in statistical study design and data analysis to clinical trials, medical statistics, the biomedical sciences, and the social and behavioral sciences. He has published more than 120 statistical methodology and research papers in peer-reviewed journals. He serves on the editorial boards of 9 scientific journals and has served as a reviewer for more than 90 scientific journals. Dr. Xia is the lead author of Statistical Analysis of Microbiome Data with R (Springer Nature, 2018), which was the first statistics book on microbiome studies.
Jun Sun
Jun Sun is a tenured Professor of Medicine at the University of Illinois at Chicago. She is an elected fellow of the American Gastroenterological Association (AGA) and the American Physiological Society (APS), and she chairs the AGA Microbiome and Microbial Therapy section. She is an internationally recognized expert on the microbiome and the vitamin D receptor in inflammation. Dr. Sun has published more than 210 scientific articles in peer-reviewed journals and 6 books on the microbiome. She is on the editorial boards of more than 10 peer-reviewed international scientific journals and serves on study sections for national and international research foundations. Dr. Sun is a believer in scientific art and artistic science. She enjoys writing her science papers in English and poems in Chinese. Her poetry collection《让时间停留在这一刻》(“Let Time Stay Still at This Moment”) was published in 2018.