Summary

The Inherent Dynamics Visualizer is an interactive visualization package that connects to a gene regulatory network inference tool for enhanced, streamlined generation of functional network models. The visualizer can be used to make more informed decisions for parameterizing the inference tool, thus increasing confidence in the resulting models.

Abstract

Developing gene regulatory network models is a major challenge in systems biology. Several computational tools and pipelines have been developed to tackle this challenge, including the newly developed Inherent Dynamics Pipeline. The Inherent Dynamics Pipeline consists of several previously published tools that work synergistically and are connected in a linear fashion, where the output of one tool is used as input for the following tool. As with most computational techniques, each step of the Inherent Dynamics Pipeline requires the user to make choices about parameters that do not have a precise biological definition. These choices can substantially impact the gene regulatory network models produced by the analysis. For this reason, the ability to visualize and explore the consequences of various parameter choices at each step can help increase confidence in the choices and the results. The Inherent Dynamics Visualizer is a comprehensive visualization package that streamlines the process of evaluating parameter choices through an interactive interface within a web browser. The user can separately examine the output of each step of the pipeline, make intuitive changes based on visual information, and benefit from the automatic production of necessary input files for the Inherent Dynamics Pipeline. The Inherent Dynamics Visualizer provides an unparalleled level of access to a highly intricate tool for the discovery of gene regulatory networks from time series transcriptomic data.

Introduction

Many important biological processes, such as cell differentiation and environmental response, are governed by sets of genes that interact with each other in a gene regulatory network (GRN). These GRNs produce the transcriptional dynamics needed for activating and maintaining the phenotype they control, so identifying the components and topological structure of the GRN is key to understanding many biological processes and functions. A GRN may be modeled as a set of interacting genes and/or gene products described by a network whose nodes are the genes and whose edges describe the direction and form of interaction (e.g., activation/repression of transcription, post-translational modification, etc.) [1]. Interactions can then be expressed as parameterized mathematical models describing the impact a regulating gene has on the production of its target(s) [2,3,4]. Inference of a GRN model requires both an inference of the structure of the interaction network and estimation of the underlying interaction parameters. A variety of computational inference methods have been developed that ingest time series gene expression data and output GRN models [5]. Recently, a new GRN inference method was developed, called the Inherent Dynamics Pipeline (IDP), that utilizes time series gene expression data to produce GRN models with labeled regulator-target interactions that are capable of producing dynamics that match the observed dynamics in the gene expression data [6]. The IDP is a suite of tools connected linearly into a pipeline and can be broken down into three steps: a Node Finding step that ranks genes based on gene expression characteristics known or suspected to be related to the function of the GRN [7,8], an Edge Finding step that ranks pairwise regulatory relationships [8,9], and a Network Finding step that produces GRN models that are capable of producing the observed dynamics [10,11,12,13,14,15].

Like most computational methods, the IDP requires a set of user-specified arguments that dictate how the input data is analyzed, and different sets of arguments can produce different results on the same data. For example, several methods, including the IDP, contain arguments that apply some threshold on the data, and increasing or decreasing this threshold between successive runs of the particular method can produce dissimilar results between runs (see Supplement Note 10: Network inference methods of reference [5]). Understanding how each argument may impact the analysis and subsequent results is important for achieving high confidence in the results. Unlike most GRN inference methods, the IDP consists of multiple computational tools, each having its own set of arguments that a user must specify and each having its own results. While the IDP provides extensive documentation on how to parameterize each tool, the dependence of each tool on the output of the previous step makes parameterizing the entire pipeline without intermediate analyses challenging. For instance, arguments in the Edge and Network Finding steps are likely to be informed by prior biological knowledge, and so will depend on the dataset and/or organism. To interrogate intermediate results, a basic understanding of programming, as well as a deep understanding of all the result files and their contents from the IDP, would be needed.

The Inherent Dynamics Visualizer (IDV) is an interactive visualization package that runs in a user's browser window and provides a way for users of the IDP to assess the impact of their argument choices on results from any step in the IDP. The IDV navigates a complicated directory structure produced by the IDP and gathers the necessary data for each step and presents the data in intuitive and interactive figures and tables for the user to explore. After exploring these interactive displays, the user can produce new data from an IDP step that can be based on more informed decisions. These new data can then be immediately used in the next respective step of the IDP. Additionally, exploration of the data can help determine whether an IDP step should be rerun with adjusted parameters. The IDV can enhance the use of the IDP, as well as make the use of the IDP more intuitive and approachable, as demonstrated by investigating the core oscillator GRN of the yeast cell-cycle. The following protocol includes IDP results from a fully parameterized IDP run versus an approach that incorporates the IDV after runs of each IDP step, i.e., Node, Edge, and Network Finding.

Protocol

1. Install the IDP and IDV

NOTE: This section assumes that Docker, conda, pip, and Git are already installed (Table of Materials).

  1. In a terminal, enter the command: git clone https://gitlab.com/biochron/inherent_dynamics_pipeline.git
  2. Follow the install instructions in the IDP's README file.
  3. In a terminal, enter the command: git clone https://gitlab.com/bertfordley/inherent_dynamics_visualizer.git
    NOTE: Cloning of the IDV should happen outside of the IDP's top-level directory.
  4. Follow the install instructions in the IDV's README file.
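
Before cloning the repositories, it can help to confirm that the tools named in the note above are available on the PATH. The following is a minimal Python sketch and is not part of the IDP or the IDV; the function name check_prerequisites is hypothetical, and the check relies only on shutil.which from the standard library.

```python
import shutil

# Command-line tools that the protocol assumes are already installed.
REQUIRED_TOOLS = ("docker", "conda", "pip", "git")


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on the PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_prerequisites().items():
        print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

A value of None only indicates that the executable was not found on the PATH; it does not verify versions or that the Docker daemon is running.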

2. Node finding

  1. Create a new IDP configuration file that parametrizes the Node Finding step.
    NOTE: All quotation marks in the following steps should not be typed out. The quotation marks are only used here as a delimiter between the protocol text and what is to be typed out.
    1. Add the main IDP arguments to the configuration file.
    2. Open a new text file in a text editor and type "data_file =", "annotation_file =", "output_dir =", "num_proc =", and "IDVconnection = True" on individual lines.
    3. For "data_file", after the equals sign, type the path to and name of the respective time series file and type a comma after the name. If more than one time series data set is being used, separate the file names with commas. See Supplemental File 1 and Supplemental File 2 for examples of time series gene expression files.
    4. For "annotation_file", after the equals sign, type the path to and name of the annotation file. See Supplemental File 3 for an example of an annotation file.
    5. For "output_dir", after the equals sign, type the path to and name of the folder where results will be saved.
    6. For "num_proc", after the equals sign, type the number of processes the IDP should use.
    7. Add Node Finding arguments to the configuration file.
    8. In the same text file as in step 2.1.1, type in the order presented "[dlxjtk_arguments]", "periods =", and "dlxjtk_cutoff =" on individual lines. Place these after the main arguments.
    9. For "periods", after the equals sign, if one time series data set is being used, type each period length separated by commas. For more than one time series data set, type each set of period lengths as before but place square brackets around each set and place a comma between the sets.
    10. For "dlxjtk_cutoff", after the equals sign, type an integer specifying the maximum number of genes to retain in the gene_list_file output by de Lichtenberg by JTK_CYCLE (DLxJTK) (Table 1).
      NOTE: It is highly recommended to review the dlxjtk_arguments sections in the IDP README to get a better understanding of each argument. See Supplemental File 4 for an example of a configuration file with the Node Finding arguments specified.
  2. In the terminal, move into the IDP directory, named inherent_dynamics_pipeline.
  3. In the terminal, enter the command: conda activate dat2net
  4. Run the IDP using the configuration file created in step 2.1 by running this command in the terminal, where <config file name> is the name of the file: python src/dat2net.py <config file name>
  5. In the terminal, move to the directory named inherent_dynamics_visualizer and enter the command: ./viz_results.sh <results_directory>
    NOTE: <results_directory> will point to the directory used as the output directory for the IDP.
  6. In a web browser, enter http://localhost:8050/ as the URL.
  7. With the IDV now open in the browser, click on the Node Finding tab and select the node finding folder of interest from the dropdown menu.
  8. Manually curate a new gene list from the gene list table in the IDV to be used for subsequent IDP steps.
    1. To extend or shorten the gene list table, click on the up or down arrows or manually enter an integer between 1 and 50 in the box next to Gene expression of DLxJTK-ranked genes. Top:.
    2. In the gene list table, click on the box beside a gene to view its gene expression profile in a line graph. Multiple genes can be added.
    3. Optionally specify the number of equally sized bins to compute and order genes by the time interval containing their peak expression, by inputting an integer into the input box above the gene list table labeled Input integer to divide the first cycle into bins:.
      NOTE: This option is specific to oscillatory dynamics and might not be applicable to other types of dynamics.
    4. Select a heatmap viewing preference by clicking on an option under Order Genes By:. First Cycle Max Expression (Table 1) orders genes based on the time of the gene-expression peak in the first cycle.
      NOTE: DLxJTK Rank orders genes based on the periodicity ranking from the DLxJTK algorithm of the IDP.
    5. Click on the Download Gene List button to download the gene list into the file format needed for the Edge Finding step. See Supplemental File 5 for an example of a gene list file.
  9. In the Editable Gene Annotation Table, label each gene as a target, a regulator, or both to prepare the annotation file for a new Edge Finding run. If a gene is a regulator, label the gene as an activator, repressor, or both.
    1. To label a gene as an activator, click on the cell in the tf_act column and change the value to 1. To label a gene as a repressor, change the value in the tf_rep column to 1. A gene will be allowed to act as both an activator and a repressor in the Edge Finding step by setting the values in both the tf_act and tf_rep columns to 1.
    2. To label a gene as a target, click on the cell in the target column and change the value to 1.
  10. Click on the Download Annot. File button to download the annotation file into the file format needed for the Edge Finding step.
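
The configuration file described in step 2.1 can also be generated programmatically. The following Python sketch assembles the main arguments and the Node Finding arguments in the order given above. The function name build_node_finding_config and all file paths and values in the usage note are hypothetical, and the exact syntax the IDP accepts (especially for bracketed period sets) should be checked against the IDP README and Supplemental File 4.

```python
def build_node_finding_config(data_files, annotation_file, output_dir,
                              num_proc, periods, dlxjtk_cutoff):
    """Return the text of a Node Finding configuration file.

    data_files: list of paths to time series files.
    periods: a list of period lengths (one data set) or a list of lists
             (one inner list per data set).
    """
    # Each file name is followed by a comma, as described in step 2.1.3.
    data_value = " ".join(f"{path}," for path in data_files)

    if periods and isinstance(periods[0], (list, tuple)):
        # Several data sets: bracket each set and separate the sets with commas.
        periods_value = ", ".join(
            "[" + ", ".join(str(p) for p in group) + "]" for group in periods
        )
    else:
        # One data set: a plain comma-separated list of period lengths.
        periods_value = ", ".join(str(p) for p in periods)

    lines = [
        f"data_file = {data_value}",
        f"annotation_file = {annotation_file}",
        f"output_dir = {output_dir}",
        f"num_proc = {num_proc}",
        "IDVconnection = True",
        "[dlxjtk_arguments]",
        f"periods = {periods_value}",
        f"dlxjtk_cutoff = {dlxjtk_cutoff}",
    ]
    return "\n".join(lines) + "\n"
```

For example, build_node_finding_config(["data/ts1.tsv"], "data/annot.tsv", "results", 4, [95, 100, 105], 50) returns text that can be written to a file and passed to python src/dat2net.py in step 2.4.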

3. Edge finding

  1. Create a new IDP configuration file that parametrizes the Edge Finding step.
    1. Add the main IDP arguments to the configuration file. Open a new text file in a text editor and repeat steps 2.1.1 to 2.1.6.
    2. Add Edge Finding arguments to the configuration file.
    3. In the same text file as in step 3.1.1, type in the order presented "[lempy_arguments]", "gene_list_file =", "[netgen_arguments]", "edge_score_column =", "edge_score_threshold =", "num_edges_for_list =", "seed_threshold =", and "num_edges_for_seed =" on individual lines. These should go below the main arguments.
    4. For "gene_list_file", after the equals sign, enter the path to and name of the gene list file generated in step 2.8.5.
    5. For "edge_score_column", after the equals sign, enter either "pld" or "norm_loss" to specify which data frame column from the lempy output is used to filter the edges.
    6. Select either "edge_score_threshold" or "num_edges_for_list", and delete the other. If "edge_score_threshold" was selected, enter a number between 0 and 1. This number will be used to filter edges based on the column specified in step 3.1.5.
      1. If "num_edges_for_list" was selected, enter a value equal to or less than the number of possible edges. This number will be used to filter the edges based on how they are ranked in the column specified in step 3.1.5. The edges left over will be used to build networks in Network Finding.
    7. Select either "seed_threshold" or "num_edges_for_seed" and delete the other. If "seed_threshold" was selected, enter a number between 0 and 1. This number will be used to filter edges based on the column specified in step 3.1.5.
      1. If "num_edges_for_seed" was selected, enter a value equal to or less than the number of possible edges. This number will be used to filter the edges based on how they are ranked in the column specified in step 3.1.5. The edges left over will be used to build the seed network (Table 1) used in Network Finding.
        NOTE: It is highly recommended to review the lempy_arguments and netgen_arguments sections in the IDP README to get a better understanding of each argument. See Supplemental File 7 for an example of a configuration file with the Edge Finding arguments specified.
  2. Repeat steps 2.2 and 2.3.
  3. Run the IDP using the configuration file created in step 3.1 by running this command in the terminal, where <config file name> is the name of the file: python src/dat2net.py <config file name>
  4. If the IDV is still running, stop it by pressing Ctrl+C in the terminal window. Repeat steps 2.5 and 2.6.
  5. With the IDV open in the browser, click on the Edge Finding tab and select the edge finding folder of interest from the drop-down menu.
    NOTE: If multiple datasets are used in Edge Finding, then make sure to select the last dataset that was used in the Local Edge Machine (LEM) analysis (Table 1). It is important when selecting edges for the seed network or edge list based on LEM results to look at the last time series data listed in the configuration file as this output incorporates all preceding datafiles in its inference of regulatory relationships between nodes.
  6. To extend or shorten the edge table, manually enter an integer in the input box under Number of Edges:.
  7. Optionally filter edges on the LEM ODE parameters. Click and drag to move either the left side or the right side of each parameter's slider to remove edges from the edge table that have parameters outside of their new allowed parameter bounds.
  8. Optionally create a new seed network if a different seed network is wanted than the one proposed by the IDP. See Supplemental File 8 for an example of a seed network file.
    1. From the dropdown menu under Network:, select either From Seed (to select the seed network) or From Selection.
    2. Deselect/select edges from the edge table by clicking the corresponding checkboxes adjacent to each edge to remove/add edges from the seed network.
  9. Click on the Download DSGRN NetSpec button to download the seed network in the Dynamic Signatures Generated by Regulatory Networks (DSGRN) (Table 1) network specification format.
  10. Select additional nodes and edges to be used in the Network Finding step.
    1. Select edges from the edge table by clicking the corresponding checkboxes to include in the edge list file used in Network Finding.
    2. Click on Download Node and Edge Lists to download the node list and edge list files in the format required for their use in Network Finding. See Supplemental File 9 and Supplemental File 10 for examples of edge and node list files, respectively.
      NOTE: The node list must contain all the nodes in the edge list file, so the IDV automatically creates the node list file based on the selected edges. Two options are available for viewing the edges in Edge Finding. The LEM Summary Table option presents the edges as a ranked list of the top 25 edges. The Top-Line LEM Table option presents the edges in a concatenated list of the top three ranked edges for each possible regulator. The number of edges shown for each option can be adjusted by changing the number in the Number of Edges input box.
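
The two filtering modes from steps 3.1.6 and 3.1.7 (a score threshold or a fixed number of top-ranked edges) and the rule that the node list must cover every node in the edge list can be illustrated with a short Python sketch. This is not the IDP or IDV implementation. The function names, the (regulator, target, score) tuple layout, the inclusive threshold, and the higher_is_better switch are all assumptions made for illustration; whether a higher or lower value is better depends on the score column chosen, so consult the IDP README.

```python
def filter_edges(scored_edges, threshold=None, num_edges=None,
                 higher_is_better=True):
    """Keep edges either by score threshold or by rank, but not both.

    scored_edges: list of (regulator, target, score) tuples (assumed layout).
    """
    if (threshold is None) == (num_edges is None):
        raise ValueError("Specify exactly one of threshold or num_edges.")
    # Rank edges from best to worst according to the chosen direction.
    ranked = sorted(scored_edges, key=lambda e: e[2], reverse=higher_is_better)
    if num_edges is not None:
        return ranked[:num_edges]
    if higher_is_better:
        return [e for e in ranked if e[2] >= threshold]
    return [e for e in ranked if e[2] <= threshold]


def nodes_from_edges(edges):
    """Return the sorted list of every gene that appears in any edge."""
    nodes = set()
    for regulator, target, *_ in edges:
        nodes.update((regulator, target))
    return sorted(nodes)
```

Deriving the node list from the retained edges, rather than maintaining it separately, guarantees that no edge refers to a node missing from the node list.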

4. Network finding

  1. Create a new IDP configuration file that parametrizes the Network Finding step.
    1. Add the main IDP arguments to the configuration file. Open a new text file in a text editor and repeat steps 2.1.1 to 2.1.6.
    2. Add Network Finding arguments to the configuration file.
    3. In the same text file as in step 4.1.1, type in the order presented "[netper_arguments]", "edge_list_file =", "node_list_file =", "seed_net_file =", "range_operations =", "numneighbors =", "maxparams =", "[[probabilities]]", "addNode =", "addEdge =", "removeNode =", and "removeEdge =" on individual lines, below the main arguments.
    4. For "seed_net_file", "edge_list_file", and "node_list_file", after the equals sign, enter the path to and name of the seed network file and the edge and node list files generated in steps 3.9 and 3.10.2.
    5. For "range_operations", after the equals sign, type two numbers separated by a comma. The first and second numbers are, respectively, the minimum and maximum number of node or edge additions or removals made per network.
    6. For "numneighbors", after the equals sign, enter a number that represents how many networks to find in Network Finding.
    7. For "maxparams", after the equals sign, enter a number that represents the maximum number of DSGRN parameters to allow for a network.
    8. For each of the arguments "addNode", "addEdge", "removeNode", and "removeEdge", enter a value between 0 and 1 after the equals sign. The four values must sum to 1.
      NOTE: It is highly recommended to review the netper_arguments and netquery_arguments sections in the IDP README to get a better understanding of each argument. See Supplemental File 11 and Supplemental File 12 for examples of a configuration file with the Network Finding arguments specified.
  2. Repeat steps 2.2 and 2.3.
  3. Run the IDP using the configuration file created in step 4.1 by running this command in the terminal, where <config file name> is the name of the file: python src/dat2net.py <config file name>
  4. If the IDV is still running, stop it by pressing Ctrl+C in the terminal window. Repeat steps 2.5 and 2.6.
  5. With the IDV open in the browser, click on the Network Finding tab and select the network finding folder of interest.
  6. Select a network or set of networks to generate an edge prevalence table (Table 1) and to view the networks along with their respective query results.
    1. Two options are available for selecting networks: Option 1 - Input lower and upper bounds on query results by inputting minimum and maximum values in the input boxes corresponding to the x-axis and y-axis of the plot. Option 2 - Click and drag over the scatterplot to draw a box around the networks to be included. After selection or input bounds are entered, press the Get Edge Prevalence From Selected Networks button.
      NOTE: If more than one DSGRN query was specified, use the radio buttons labeled with the query type to switch between each query's results. The same applies if more than one epsilon (noise level) was specified.
  7. Click the arrows beneath the edge prevalence table to move to the next page of the table. Press Download Table to download the edge prevalence table.
  8. Input an integer in the Network Index input box to display a single network from the selection made in step 4.6. Click on Download DSGRN NetSpec to download the displayed network in the DSGRN network specification format.
  9. Search networks for similarity to a specified motif or network of interest.
    1. Use the checkboxes corresponding to each edge to select edges to be included in the network or motif used for the similarity analysis. Click on Submit to create the similarity scatterplot for the selected motif or network.
      NOTE: Use the arrows in the edge list to sort alphabetically and the arrows beneath the table to move to the next page of the table.
    2. To select a network or set of networks, click and drag over the scatterplot to draw a box around the networks to be included. The selection is used to generate an edge prevalence table and to view the networks along with their respective query results.
      NOTE: If more than one DSGRN query was specified, use the radio buttons labeled with the query type to switch between each query's results. The same applies if more than one epsilon (noise level) was specified.
    3. Repeat steps 4.7 and 4.8 to download the edge prevalence table and the displayed network for the similarity analysis, respectively.
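
The constraints stated in steps 4.1.5 and 4.1.8 (two values for "range_operations", and four probabilities between 0 and 1 that sum to 1) can be checked before running the IDP. The following Python sketch is one possible pre-flight check and is not part of the IDP; the function name check_netper_arguments and the requirement that the minimum not exceed the maximum are assumptions made for illustration.

```python
import math

PROBABILITY_KEYS = ("addNode", "addEdge", "removeNode", "removeEdge")


def check_netper_arguments(range_operations, probabilities):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    if len(range_operations) != 2:
        problems.append("range_operations needs exactly two numbers")
    else:
        low, high = range_operations
        if low > high:
            problems.append("range_operations minimum exceeds maximum")

    missing = [k for k in PROBABILITY_KEYS if k not in probabilities]
    if missing:
        problems.append("missing probabilities: " + ", ".join(missing))
    else:
        values = [probabilities[k] for k in PROBABILITY_KEYS]
        if any(not 0 <= v <= 1 for v in values):
            problems.append("each probability must be between 0 and 1")
        # Compare with a tolerance so that floating-point rounding
        # (e.g., 0.1 + 0.2 + 0.3 + 0.4) does not trigger a false alarm.
        if not math.isclose(sum(values), 1.0, abs_tol=1e-9):
            problems.append("probabilities must sum to 1")

    return problems
```

Returning a list of messages rather than raising on the first failure lets all problems in a configuration be reported in one pass.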

Results

The steps described textually above and graphically in Figure 1 were applied to the core oscillating GRN of the yeast cell-cycle to see if it is possible to discover functional GRN models that are capable of producing the dynamics observed in time series gene expression data collected in a yeast cell-cycle study [16]. To illustrate how the IDV can clarify and improve IDP output, the results of performing this analysis in two ways were compared: 1) running all steps ...

Discussion

The inference of GRNs is an important challenge in systems biology. The IDP generates model GRNs from gene expression data using a sequence of tools that utilize the data in increasingly complex ways. Each step requires decisions about how to process the data and what elements (genes, functional interactions) will be passed to the next layer of the IDP. The impacts of these decisions on IDP results are not always obvious. To help in this regard, the IDV provides useful interactive visualizations of the outputs from individua...

Disclosures

The authors have nothing to disclose.

Acknowledgements

This work was funded by the NIH grant R01 GM126555-01 and NSF grant DMS-1839299.

Materials

Docker: https://docs.docker.com/get-docker/
Git: https://git-scm.com/
Inherent Dynamics Pipeline: https://gitlab.com/biochron/inherent_dynamics_pipeline
Inherent Dynamics Visualizer: https://gitlab.com/bertfordley/inherent_dynamics_visualizer
Miniconda: https://docs.conda.io/en/latest/miniconda.html
Pip: https://pip.pypa.io/en/stable/

References

  1. Karlebach, G., Shamir, R. Modelling and analysis of gene regulatory networks. Nature Reviews Molecular Cell Biology. 9 (10), 770-780 (2008).
  2. Äijö, T., Lähdesmäki, H. Learning gene regulatory networks from gene expression measurements using non-parametric molecular kinetics. Bioinformatics. 25 (22), 2937-2944 (2009).
  3. Huynh-Thu, V. A., Sanguinetti, G. Combining tree-based and dynamical systems for the inference of gene regulatory networks. Bioinformatics. 31 (10), 1614-1622 (2015).
  4. Oates, C. J., et al. Causal network inference using biochemical kinetics. Bioinformatics. 30 (17), 468-474 (2014).
  5. Marbach, D., et al. Wisdom of crowds for robust gene network inference. Nature Methods. 9 (8), 796-804 (2012).
  6. Inherent Dynamics Pipeline. Available from: https://gitlab.com/biochron/inherent_dynamics_pipeline (2021).
  7. Motta, F. C., Moseley, R. C., Cummins, B., Deckard, A., Haase, S. B. Conservation of dynamic characteristics of transcriptional regulatory elements in periodic biological processes. bioRxiv. (2020).
  8. LEMpy. Available from: https://gitlab.com/biochron/lempy (2021).
  9. McGoff, K. A., et al. The local edge machine: inference of dynamic models of gene regulation. Genome Biology. 17, 214 (2016).
  10. Cummins, B., Gedeon, T., Harker, S., Mischaikow, K. Model rejection and parameter reduction via time series. SIAM Journal on Applied Dynamical Systems. 17 (2), 1589-1616 (2018).
  11. Cummins, B., Gedeon, T., Harker, S., Mischaikow, K. Database of Dynamic Signatures Generated by Regulatory Networks (DSGRN). Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 300-308 (2017).
  12. Cummins, B., Gedeon, T., Harker, S., Mischaikow, K. DSGRN: Examining the dynamics of families of logical models. Frontiers in Physiology. 9, 549 (2018).
  13. DSGRN. Available from: https://github.com/marciogameiro/DSGRN (2021).
  14. Dsgrn_Net_Gen. Available from: https://github.com/breecummins/dsgrn_net_gen (2021).
  15. Dsgrn_Net_Query. Available from: https://github.com/breecummins/dsgrn_net_query (2021).
  16. Orlando, D. A., et al. Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature. 453 (7197), 944-947 (2008).
  17. Monteiro, P. T., et al. YEASTRACT+: a portal for cross-species comparative genomics of transcription regulation in yeasts. Nucleic Acids Research. 48 (1), 642-649 (2020).
  18. de Bruin, R. A. M., et al. Constraining G1-specific transcription to late G1 phase: The MBF-associated corepressor Nrm1 acts via negative feedback. Molecular Cell. 23 (4), 483-496 (2006).
  19. Horak, C. E., et al. Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae. Genes & Development. 16 (23), 3017-3033 (2002).
  20. Cherry, J. M., et al. Saccharomyces genome database: The genomics resource of budding yeast. Nucleic Acids Research. 40, 700-705 (2012).
  21. Zhu, G., et al. Two yeast forkhead genes regulate the cell cycle and pseudohyphal growth. Nature. 406 (6791), 90-94 (2000).
  22. Loy, C. J., Lydall, D., Surana, U. NDD1, a high-dosage suppressor of cdc28-1N, is essential for expression of a subset of late-S-phase-specific genes in Saccharomyces cerevisiae. Molecular and Cellular Biology. 19 (5), 3312-3327 (1999).
  23. Cho, C. Y., Kelliher, C. M., Haase, S. B. The cell-cycle transcriptional network generates and transmits a pulse of transcription once each cell cycle. Cell Cycle. 18 (4), 363-378 (2019).
  24. Smith, L. M., et al. An intrinsic oscillator drives the blood stage cycle of the malaria parasite Plasmodium falciparum. Science. 368 (6492), 754-759 (2020).
