Divergence of Root Microbiota in Different Habitats based on Weighted Correlation Networks

Jing Tang; Xinzhu He; Ming Tang; Xiaorong Xu; Ximin Zhang

doi:10.3791/62205

A subscription to JoVE is required to view this content. Sign in or start your free trial.

Summary

Network analysis was applied to evaluate the association of various ecological microbial communities, such as soil, water and rhizosphere. Presented here is a protocol on how to use the WGCNA algorithm to analyze different co-occurrence networks that may occur in the microbial communities due to different ecological environments.

Abstract

The root microbiome plays an important role in plant growth and environmental adaptation. Network analysis is an important tool for studying communities, which can effectively explore the interaction relationship or co-occurrence model of different microbial species in different environments. The purpose of this manuscript is to provide details on how to use the weighted correlation network algorithm to analyze different co-occurrence networks that may occur in microbial communities due to different ecological environments. All analysis of the experiment is performed in the WGCNA package. WGCNA is an R package for weighted correlation network analysis. The experimental data used to demonstrate these methods were microbial community data from the NCBI (National Center for Biotechnology Information) database for three niches of the rice (Oryza sativa) root system. We used the weighted correlation network algorithm to construct co-abundance networks of microbial community in each of the three niches. Then, differential co-abundance networks among endosphere, rhizoplane and rhizosphere soil were identified. In addition, the core genera in network were obtained by the "WGCNA" package, which plays an important regulated role in network functions. These methods enable researchers to analyze the response of microbial network to environmental disturbance and verify different microbial ecological response theories. The results of these methods show that the significant differential microbial networks identified in the endosphere, rhizoplane and rhizosphere soil of rice.

Introduction

Microbiome research has important implications for understanding and manipulating ecosystem processes¹^,². Microbial populations are interconnected by interacting ecological networks, whose characteristics can affect the response of microorganisms to environmental changes³^,⁴. Furthermore, the properties of these networks affect the stability of microbial communities, and are closely associated with soil function⁵. Weighted gene correlation network analysis has now been widely applied for research on the relationship between genes and microbial communities⁶. Previous studies have focused mainly on the associations between networks of different genes or populations and the outside world⁷. However, the differences in correlation networks formed by microbial populations under different environmental conditions have been scarcely investigated. The purpose of the research presented in this paper is to provide insights and details on the rapid implementation of the WGCNA algorithm to construct a co-occurrence network of microbiome samples collected under different environmental conditions. Based on the analysis results, we assessed the composition and differences of the population and further discussed the relationship between different microbial populations. The following basic flow of weighted correlation network algorithm⁸ was applied. First, a similarity matrix needed to be constructed by calculating the Pearson correlation coefficient between the Operational Taxonomic Units (OTU) expression profiles. Then, the parameters of the adjacency functions (the power or the sigmoid adjacency functions) were adopted with a scale-free topology criterion, the similarity matrix was transformed into an adjacency matrix, and each co-occurrence network corresponded to an adjacency matrix. We used average linkage hierarchical clustering coupled with the TOM-based dissimilarity to group OTUs with coherent expression profiles into modules. Further, we calculated the relationship between conservative statistics and the related parameter analysis modules, finally identifying the hub OTU in the module. These methods are particularly suitable for analysis of the differences in network structures among various microbial populations under divergent environmental conditions. In this manuscript, we have described in detail the method of co-expression network development, the analysis of the dissimilarities between the modules, and have provided a brief overview of the steps in the procedure applied to obtain the core species in different module networks.

Protocol

1. Data Download

Download the data of the accession PRJNA386367 form the NCBI database. From the data of the accession PRJNA386367, select the rhizosphere, rhizoplane, and endosphere microbiome data from rice plants grown for 14 weeks in a submerged rice field in Arbuckle, California in 2014.
NOTE: The rhizosphere, rhizoplane, and endosphere microbiome data were presented by the OTUs table in accession PRJNA386367.

2. Optimal power value determination

NOTE: The WGCNA package contains all of the following functional parameters. WGCNA is an R package for weighted correlation network analysis. The key command lines refer to the Supplement S1.

In the R language environment, open the Rstudio software and install the WGCNA package.
Load the data and use the goodSamplesGenes function to check the correctness of the data. Execute the command lines:
"gsg = goodSamplesGenes(datExpr0, verbose = 3)
gsg$allOK "
Click Run.
Check for outliers and store samples that meet the requirements. When the check result is TRUE, continue to the next step. Save the result.
Use the PickSoftThreshold function to calculate the scale-free index R² of the two groups of the data under different power values. Execute the command line:
"sft = pickSoftThreshold(datExpr0, powerVector = powers, verbose = 5)"
Click Run.
Visualize the results (Figure 1). Execute the command line:
"plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("ES_Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,cex=cex1,col="red");
abline(h=0.9,col="red")
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("ES_Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")"
Click Run.
NOTE: The premise of the weighted correlation network algorithm is that the established co-expression network structure conforms to the standards of the scale-free topology criterion, increasing its robustness. A scale-free index closer to 1 indicates a network structure that is closer to the scale-free network.
Select the power value when the scale-free index R² squared greater than 0.9 and proceed to the next step of analysis.
NOTE: When the scale-free index is close to 1, the network structure is closer to the scale-free network. When analyzing two or more networks, it is necessary to choose to make each network close to the power value of the scale-free network to satisfy the comparability between the co-expressed networks.

3. Construction of a co-expression network and module identification

NOTE: Based on the above calculated power value, the co-occurrence network is constructed. The key command lines refer to the Supplement S2.

Use the adjacency function in the WGCNA package to add signed parameters for the construction of a symbolic co-occurrence network. Execute the command line:
"adjacency = adjacency(datExpr0, power = softPower)"
Click Run.
Apply the TOM-similarity function to develop a topological overlapping network and calculate the dissimilarity network. Execute the command line:
"TOM = TOMsimilarity(adjacency);
dissTOM = 1-TOM"
Click Run.
NOTE: The signed parameter was added to set the topology overlap network type.
Use the hclust function to select the average linkage hierarchical clustering method for hierarchical clustering. Execute the command line:
"geneTree = hclust(as.dist(dissTOM), method = "average");"
Click Run.
Use the cutreeDynamic function to perform dynamic branch cutting and set the minClusterSize parameter to 30. Obtain the module recognition result. Execute the command line:
"dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM, deepSplit = 2, pamRespectsDendro = FALSE, minClusterSize = minModuleSize);"
Click Run.
NOTE: The minimum module size could not be lower than 30.
Calculate the module eigen of each OTUs module by the moduleEigengenes function. Execute the command line:
"MEList = moduleEigengenes(datExpr0, colors = dynamicColors)
MEs = MEList$eigengenes"
Click Run.
NOTE: The module eigen represented the overall OTU expression level in the module. It was not a specific OTU, but the first principal component of each cluster obtained by singular network value decomposition.
Perform the cluster function based on the correlation coefficient of module eigen. Use the mergeCloseModules function to merge the modules with a value lower than 0.25. Execute the command line:
"merge = mergeCloseModules(datExpr0, dynamicColors, cutHeight = MEDissThres, verbose = 3)"
Click Run.
Finally, use the plotDendroAndColors function for visualization to obtain the module assignment display diagram of each co-expression network (Figure 2). Use the table function to extract the module attribution corresponding of each OTin the module assignment table. Execute the command line:
"plotDendroAndColors(geneTree, mergedColors, "Merged dynamic",dendroLabels = FALSE,
hang = 0.03,addGuide = TRUE, guideHang = 0.05,
main = "ES_Gene dendrogram and module colors")"
Click Run.
NOTE: In the module assignment diagram of the co-expressing network, different colors represent different modules, and gray represents OTUs that cannot be classified into any module. A greater number of OTUs in the gray module indicates that the early-stage preprocessing quality of the expression matrix is poor.

4. Module comparison

NOTE: This method can be used to compare the network modules of two ecological microbial communities. In this article, compare the differences of microbial network modules between endosphere and rhizoplane, endosphere and rhizosphere, rhizosphere and rhizoplane.

Preservation test
1. Load the parameters and results of the two data sets saved in the previous steps.
2. Set the network module assignment result of a group of microbial data as the reference group, whereas the other group as the test group.
3. Use the modulePreservation function to calculate the values of conservativeness statistical parameters Z_summary and medianRank. Execute the command line:
  "system.time({mp=modulePreservation(multiExpr,
  multiColor,referenceNetworks=1,
  nPermutation=100, randomSeed=1,quickCor=0,verbose=3)})"
  Click Run.
  NOTE: This result can quantified the conservativeness between modules. Z_summary>10 indicates that two modules are highly preserved, whereas Z_summary<2 denotes non- preserved modules. medianRank expresses the relative preservation of the module assessed by ranking. Higher medianRank values denote non- preserved modules. (The key command lines refer to the Supplement S3.)
4. Use the plot function to visualize the results (Figure 3). Get the parameters Z_summary and medianRank (Table 1).
  NOTE: The network modules that satisfy both the Z_summary value less than 2 and the median Rank value at the top, is the most highly non-preserved module in the two ecological microbial communities.
5. Based on the results of the aforementioned two statistical parameters to identify the module with most highly non-preserved module of the two networks.
Correlation analysis of the module membership
1. Set the module assignment results of the two networks were set as the reference and the test group, respectively.
  NOTE: The settings need to be the same as Preservation test.
2. Use the corPvalueStudent function to extract the kME (module membership) value of each OTU in several candidate modules.
  Execute the command line:
  "Pvalue = as.data.frame(corPvalueStudent(as.matrix
  (ModuleMembership), Samples))"
  Click Run.
  NOTE: kME stands for the degree of module membership. ME stands for module eigen, which represents the overall level of OTU expression in the module. kME is the correlation coefficient between each OTU and the ME. Quantify the importance of OTU in the network by the kME value of OTU. (The key command lines refer to the Supplement S4.)
3. Then, use the verboseScatterplot function to calculate the correlation coefficient of the kME value of the corresponding OTUs in the two networks and draw the correlation analysis diagram (Figure 4).
  Execute the command line:
  "verboseScatterplot(abs(TModuleMembership
  [TmoduleGenes, Tcolumn]),
  abs(NModuleMembership[NmoduleGenes, Ncolumn]),
  xlab = paste("kME in", "ES"),
  ylab = paste("kME in", "RP"),
  main = paste("lightyellow"),
  cex.main = 1.7, cex.lab = 1.6, cex.axis = 1.6, col = modulecolor)"
  Click Run.
4. Select the module with the smallest correlation coefficient of the kME value of the OTU of the two networks. Consider this module to have the largest difference of the two networks.

5. Analysis of the microbial differential network module

Obtain data of the dominant bacteria phyla through statistical analysis of the OTU sequence set of the module with the largest difference.
NOTE: The OTU sequence set of the module with the largest difference is summed by the taxonomy of phyla. The dominant bacteria phyla accounted for more than 10%.
Then, use the exportNetworkToCytoscape function to obtain the file containing the interaction relationship information of the OTU in the largest differential module.
Execute the command line:
"cyt = exportNetworkToCytoscape(modTOM,
edgeFile = paste("NEW-ES_CytoscapeInput-edges-", modules , ".txt", sep=""),
nodeFile = paste("NEW-ES_CytoscapeInput-nodes-", modules, ".txt", sep=""),
weighted = TRUE,threshold = 0.5, nodeNames = modProbes,
altNodeNames = modGenes, nodeAttr = moduleColors[inModule])"
Click Run.
Import the file into Cytoscape. Set the threshold to 0.5 and adjust other parameters as needed.
Construct a co-occurrence network of differential microorganisms (Figure 5).
Obtained the information of the core genus that has the most important regulatory role in the network.
NOTE: According to the the kME value of OUT, the core genus can be defined.
Finally, the functions of the core genus were assessed and its influence on the entire difference network was analyzed.

Results

The representative results in this article were downloaded from the 2014 California Abaker rice root microbiome data in the NCBI database (PRJNA386367)⁹. The data includes the rhizosphere, rhizoplane, and endosphere microbiome samples from rice plants grown for 14 weeks in a submerged rice field. We used the WGCNA algorithm to select the power value that satisfied the three networks that were close to the scale-free network (Figure 1) and developed three co-expression...

Discussion

Correlation networks have been increasingly used in bioinformatics applications. WGCNA is a systems biology method for descriptive analysis of the relationships between various elements of a biological system¹². R software package was used in earlier work on WGCNA¹³^,¹⁴^,¹⁵. The package includes functions for network construction, module detection, calculations of topological properties, data simulation, vi...

Disclosures

The authors have nothing to disclose.

Acknowledgements

The development of this manuscript was supported by funds from National Natural Science Foundation of China-Guizhou Provincial People's Government Karst Science Research Center Project (U1812401), Doctoral Research Project of Guizhou Normal University (GZNUD[2017]1), Science and Technology Support Project of Guizhou Province (QKHZC[2021]YB459) and the Science and Technology Project of the Guiyang([2019]2-8).

The authors would like to thank Edwards J.A et al for providing rice microbiome data in public databases and support from TopEdit (www.topeditsci.com) for its linguistic assistance during the preparation of this manuscript.

Materials

Name	Company	Catalog Number	Comments
R	The University of Auckland	version 4.0.2	R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
RStdio	JJ Allaire	version 1.4.1103	The RStudio IDE is a set of integrated tools designed to help you be more productive with R and Python.
Cytoscape		version 3.7.1	Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
NCBI database			The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.

References

Philippot, L., Raaijmakers, J. M., Lemanceau, P., vander Putten, W. H. Going back to the roots: the microbial ecology of the rhizosphere. Nature Reviews Microbiology. 11, 789-799 (2013).
Fierer, N. Embracing the unknown: disentangling the complexities of the soil microbiome. Nature Review Microbiology. 15 (10), 579-590 (2017).
Jin, J., Wang, G. H., Liu, X. B., Liu, J. D., Chen, X. L., Herbert, S. J. Temporal and spatial dynamics of bacterial community in the rhizosphere of soybean genotypes grown in a black soil. Pedosphere. 19 (6), 808-816 (2009).
Ma, B., et al. Genetic correlation network prediction of forest soil microbial functional organization. ISME J. 12 (10), 2492-2505 (2018).
de Vries, F. T., et al. Soil bacterial networks are less stable under drought than fungal networks. Nature Communications. 9 (1), 3033 (2018).
Colin, C., et al. Correlating transcriptional networks to breast cancer survival: a large-scale coexpression analysis. Carcinogenesis. (10), 2300-2308 (2013).
Ma, B., Zhao, K., Lv, X., et al. Genetic correlation network prediction of forest soil microbial functional organization. ISME J. 12, 2492-2505 (2018).
Zhang, B., Horvath, S. A general framework for weighted gene co-expression network analysis. Statistical applications in genetics and molecular biology. 4 (1), (2005).
Edwards, J. A., et al. Compositional shifts in root-associated bacterial and archaeal microbiota track the plant life cycle in field-grown rice. PLoS Biology. 16 (2), 2003862 (2018).
Bashan, Y., De-Bashan, L. E. How the Plant Growth-Promoting Bacterium Azospirillum Promotes Plant Growth-A Critical Assessment. Advances in Agronomy. 108, 77-136 (2010).
Lovley, D. R., et al. Geobacter: The Microbe Electric's Physiology, Ecology, and Practical Applications. Advances in Microbial Physiology. 59, 1 (2011).
Langfelder, P., Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 9, 559 (2008).
Zhang, B., Horvath, S. A General Framework for Weighted Gene Co-expression Network Analysis. Statistical Applications in Genetics and Molecular Biology. 4 (1), 17 (2005).
Horvath, S., Dong, J. Geometric interpretation of Gene Co-expression Network Analysis. PLoS Computational Biology. 4 (8), 1000117 (2008).
Langfelder, P., Horvath, S. Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology. 1, 54 (2007).
Horvath, S., et al. Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target. Proceedings of the National Academy of Sciences of the United States of America. 103 (46), 17402-17407 (2006).
Carlson, M. R., et al. and Sequence Conservation: Predictions from Modular Yeast Co-expression Networks. BMC Genomics. 7 (1), 40 (2006).
Fuller, T., et al. Weighted Gene Co-expression Network Analysis Strategies Applied to Mouse Weight. Mammalian Genome. 6 (18), 463-472 (2007).
Yin, L., Wang, Y., Lin, Y., et al. Explorative analysis of the gene expression profile during liver regeneration of mouse: a microarray-based study[J]. Artificial Cells Nanomedicine & Biotechnology. 47 (1), 1113-1121 (2019).
Oldham, M., Horvath, S., Geschwind, D. Conservation and Evolution of Gene Co-expression Networks in Human and Chimpanzee Brains. Proceedings of the National Academy of Sciences of the United States of America. 103 (47), 17973-17978 (2006).
Oldham, M. C., et al. Functional organization of the transcriptome in human brain. Nature Neuroscience. 11 (11), 1271-1282 (2008).
Keller, M. P., et al. A gene expression network model of type 2 diabetes links cell cycle regulation in islets with diabetes susceptibility. Genome Research. 18 (5), 706-716 (2008).
Weston, D., Gunter, L., Rogers, A., Wullschleger, S. Connecting genes, coexpression modules, and molecular signatures to environmental stress phenotypes in plants. BMC Systems Biology. 2 (1), 16 (2008).
Jorda´n, F. Keystone species and food webs. Biological Sciences. 364, 1733-1741 (2009).
Backhed, F., Ley, R. E., Sonnenburg, J. L., Peterson, D. A., Gordon, J. I. Host-Bacterial Mutualism in the Human Intestine. Science. 307, 1915-1920 (2009).

Reprints and Permissions

Request permission to reuse the text or figures of this JoVE article

Request Permission

Explore More Articles

Microbiota Root Microbiota WGCNA Algorithm Co occurrence Network Microbial Species Hierarchical Clustering Scale free Network Adjacency Function TOM Similarity Clustering Analysis Dynamic Branch Cutting Module Size NCBI Database Data Visualization

This article has been published

Video Coming Soon

Keep me updated: