Method Article
Many researchers generate "medium-sized", low-velocity, multi-dimensional data, which can be managed more efficiently with databases than with spreadsheets. Here we provide a conceptual overview of databases, including visualizing multi-dimensional data, linking tables in relational database structures, mapping semi-automated data pipelines, and using the database to elucidate the meaning of the data.
Science relies on increasingly complex data sets for progress, but common data management methods such as spreadsheet programs are inadequate for the growing scale and complexity of this information. While database management systems have the potential to rectify these issues, they are not commonly utilized outside of business and informatics fields. Yet, many research labs already generate "medium-sized", low-velocity, multi-dimensional data that could greatly benefit from implementing similar systems. In this article, we provide a conceptual overview explaining how databases function and the advantages they provide in tissue engineering applications. Structural fibroblast data from individuals with a lamin A/C mutation were used to illustrate examples within a specific experimental context. Examples include visualizing multi-dimensional data, linking tables in a relational database structure, mapping a semi-automated data pipeline to convert raw data into structured formats, and explaining the underlying syntax of a query. Outcomes from analyzing the data were used to create plots of various arrangements, and significant differences in cell organization in aligned environments were demonstrated between the positive control of Hutchinson-Gilford progeria, a well-known laminopathy, and all other experimental groups. In comparison to spreadsheets, database methods were enormously time-efficient, simple to use once set up, allowed immediate access to original file locations, and increased data rigor. In response to the National Institutes of Health (NIH) emphasis on experimental rigor, it is likely that many scientific fields will eventually adopt databases as common practice due to their strong capability to effectively organize complex data.
In an era where scientific progress is heavily driven by technology, handling large amounts of data has become an integral facet of research across all disciplines. The emergence of new fields such as computational biology and genomics underscores how critical the proactive utilization of technology has become. These trends are certain to continue due to Moore's law and steady progress gained from technological advances1,2. One consequence, however, is the rising quantities of generated data that exceed the capabilities of previously viable organization methods. Although most academic laboratories have sufficient computational resources for handling complex data sets, many groups lack the technical expertise necessary to construct custom systems suited for developing needs3. Having the skills to manage and update such data sets remains critical for efficient workflow and output. Bridging the gap between data and expertise is important for efficiently handling, updating, and analyzing a broad spectrum of multifaceted data.
Scalability is an essential consideration when handling large data sets. Big data, for instance, is a flourishing area of research that involves revealing new insights from processing data characterized by huge volumes, large heterogeneity, and high rates of generation, such as audio and video4,5. Using automated methods of organization and analysis is mandatory for this field to appropriately handle torrents of data. Many technical terms used in big data are not clearly defined, however, and can be confusing; for instance, "high-velocity" data is often associated with millions of new entries per day, whereas "low-velocity" data might be only hundreds of entries per day, such as in an academic lab setting. Although there are many exciting findings yet to be discovered using big data, most academic labs do not require the scope, power, and complexity of such methods for addressing their own scientific questions5. While it is undeniable that scientific data grow increasingly complex with time6, many scientists continue to use methods of organization that no longer meet their expanding data needs. For example, convenient spreadsheet programs are frequently used to organize scientific data, but at the cost of being unscalable, error-prone, and time-inefficient in the long run7,8. Conversely, databases are an effective solution to this problem, as they are scalable, relatively cheap, and easy to use in handling the varied data sets of ongoing projects.
Immediate concerns that arise when considering schemas of data organization are cost, accessibility, and time investment for training and usage. Frequently used in business settings, database programs are either relatively inexpensive or free, far more economical than the funding required to support big data systems. In fact, a variety of both commercially available and open source software exists for creating and maintaining databases, such as Oracle Database, MySQL, and Microsoft (MS) Access9. Many researchers would also be encouraged to learn that several MS Office academic packages come with MS Access included, further minimizing cost considerations. Furthermore, nearly all developers provide extensive documentation online, and there is a plethora of free online resources such as Codecademy, W3Schools, and SQLBolt to help researchers understand and utilize structured query language (SQL)10,11,12. Like any programming language, learning how to use databases and code in SQL takes time to master, but with the ample resources available the process is straightforward and well worth the effort invested.
Databases can be powerful tools for increasing data accessibility and ease of aggregation, but it is important to discern which data would most benefit from greater control of organization. Multi-dimensionality refers to the number of conditions against which a measurement can be grouped, and databases are most powerful when managing many different conditions13. Conversely, information with low dimensionality is simplest to handle using a spreadsheet program; for example, a data set containing years and a value for each year has only one possible grouping (measurements against years). High-dimensional data, such as from clinical settings, would require a large degree of manual organization to maintain effectively, a tedious and error-prone process beyond the scope of spreadsheet programs13. Non-relational (NoSQL) databases also fulfill a variety of roles, primarily in applications where data do not organize well into rows and columns14. In addition to being frequently open source, these organizational schemas include graphical associations, time series data, and document-based data. NoSQL databases scale better than SQL databases but cannot create complex queries, so relational databases are better in situations that require consistency, standardization, and infrequent large-scale data changes15. Databases are best at effectively grouping and updating data into the large array of conformations often needed in scientific settings13,16.
The main intent of this work, therefore, is to inform the scientific community about the potential of databases as scalable data management systems for "medium-sized", low-velocity data, as well as to provide a general template using specific examples of patient-sourced cell-line experiments. Other similar applications include geospatial data of river beds, questionnaires from longitudinal clinical studies, and microbial growth conditions in growth media17,18,19. This work highlights common considerations for and the utility of constructing a database coupled with the data pipeline necessary to convert raw data into structured formats. The basics of database interfaces and coding for databases in SQL are provided and illustrated with examples to allow others to gain the knowledge applicable to building basic frameworks. Finally, a sample experimental data set demonstrates how easily and effectively databases can be designed to aggregate multifaceted data in a variety of ways. This information provides context, commentary, and templates for assisting fellow scientists on the path towards implementing databases for their own experimental needs.
For the purposes of creating a scalable database in a research laboratory setting, data from experiments using human fibroblast cells was collected over the past three years. The primary focus of this protocol is to report on the organization of computer software to enable the user to aggregate, update, and manage data in the most cost- and time-efficient manner possible, but the relevant experimental methods are provided as well for context.
Experimental setup
The experimental protocol for preparing samples has been described previously20,21, and is presented briefly here. Constructs were prepared by spin-coating rectangular glass coverslips with a 10:1 mixture of polydimethylsiloxane (PDMS) and curing agent, then applying 0.05 mg/mL fibronectin in either an unorganized (isotropic) arrangement or a micropatterned arrangement of 20 µm lines with 5 µm gaps (lines). Fibroblast cells were seeded at passage 7 (or passage 16 for positive controls) onto the coverslips at optimal densities and left to grow for 48 h, with media changed after 24 h. The cells were then fixed using 4% paraformaldehyde (PFA) solution and 0.0005% nonionic surfactant, after which the coverslips were immunostained for cell nuclei (4′,6-diamidino-2-phenylindole [DAPI]), actin (Alexa Fluor 488 phalloidin), and fibronectin (polyclonal rabbit anti-human fibronectin). A secondary stain for fibronectin using goat anti-rabbit IgG antibodies (Alexa Fluor 750 goat anti-rabbit) was applied, and a preservation agent was mounted onto all coverslips to prevent fluorescence fading. Nail polish was used to seal coverslips onto microscope slides, which were then left to dry for 24 h.
Fluorescence images were obtained as described previously20 using a 40x oil immersion objective coupled with a digital charge coupled device (CCD) camera mounted on an inverted motorized microscope. Ten randomly selected fields of view were imaged for each coverslip at 40x magnification, corresponding to a 6.22 pixels/µm resolution. Custom-written codes were used to quantify different variables from the images describing the nuclei, actin filaments, and fibronectin; corresponding values, as well as organization and geometry parameters, were automatically saved in data files.
Cell lines
More extensive documentation on all sample data cell lines can be found in prior publications20. To describe briefly, the data collection was approved and informed consent was performed in accordance with UC Irvine Institutional Review Board (IRB # 2014-1253). Human fibroblast cells were collected from three families with different variations of the lamin A/C (LMNA) gene mutation: heterozygous LMNA splice-site mutation (c.357-2A>G)22 (family A); LMNA nonsense mutation (c.736 C>T, pQ246X) in exon 423 (family B); and LMNA missense mutation (c.1003C>T, pR335W) in exon 624 (family C). Fibroblast cells were also collected from other individuals in each family as related mutation-negative controls, referred to as "Controls", and others were purchased as unrelated mutation-negative controls, referred to as "Donors". As a positive control, fibroblast cells from an individual with Hutchinson-Gilford progeria (HGPS) were purchased; these had been grown from a skin biopsy taken from an 8-year-old female patient with HGPS possessing a LMNA G608G point mutation25. In total, fibroblasts from 22 individuals were tested and used as data in this work.
Data types
Fibroblast data fell into one of two categories: cellular nuclei variables (i.e., percentage of dysmorphic nuclei, area of nuclei, nuclei eccentricity)20 or structural variables stemming from the orientational order parameter (OOP)21,26,27 (i.e., actin OOP, fibronectin OOP, nuclei OOP). This parameter is equal to the maximum eigenvalue of the mean order tensor of all the orientation vectors, and it is defined in detail in previous publications26,28. These values are aggregated into a variety of possible conformations, such as values against age, gender, disease status, presence of certain symptoms, etc. Examples of how these variables are used can be found in the results section.
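As a concrete illustration, the OOP defined above can be computed directly from a set of orientation angles. The sketch below is in Python rather than the custom MATLAB codes used in the published analysis; the function name and the angle-list input format are assumptions made for illustration. For the 2D case, the maximum eigenvalue of the traceless mean order tensor T = ⟨2 u uᵀ − I⟩ with u = (cos θ, sin θ) reduces to √(⟨cos 2θ⟩² + ⟨sin 2θ⟩²).

```python
import math

def orientational_order_parameter(angles_rad):
    """Compute the 2D orientational order parameter (OOP) from a list of
    orientation angles in radians. The OOP is the maximum eigenvalue of the
    mean order tensor T = <2 u u^T - I>, u = (cos t, sin t); for this
    traceless symmetric 2x2 tensor it equals sqrt(<cos 2t>^2 + <sin 2t>^2).
    OOP = 1 for perfectly aligned orientations, 0 for fully isotropic ones."""
    n = len(angles_rad)
    c = sum(math.cos(2 * a) for a in angles_rad) / n  # mean tensor component T11
    s = sum(math.sin(2 * a) for a in angles_rad) / n  # mean tensor component T12
    return math.hypot(c, s)
```

A set of identical angles yields an OOP of 1 (perfect alignment, as in the micropatterned "lines" condition), while angles spread uniformly over [0, π) yield an OOP near 0 (the isotropic condition).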
Example codes and files
The example codes and other files based on the data above can be downloaded with this paper, and their names and types are summarized in Table 1.
NOTE: See Table of Materials for the software versions used in this protocol.
1. Evaluate if the data would benefit from a database organization scheme
2. Organize the database structure
NOTE: Relational databases store information in the form of tables. Tables are organized in schema of rows and columns, similar to spreadsheets, and can be used to link identifying information within the database.
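A minimal sketch of such a linked-table structure is shown below, using Python's built-in SQLite driver for portability (the protocol itself uses MS Access); all table and column names here are hypothetical, chosen only to echo the cell-line and measurement data described above. The key idea is the foreign key: each measurement row stores an identifier that points back to one row of subject metadata.

```python
import sqlite3

# Hypothetical linked-table sketch: CellLines holds subject metadata,
# Measurements holds per-image values, and line_id links the two.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CellLines (
    line_id     INTEGER PRIMARY KEY,
    designator  TEXT NOT NULL,          -- e.g. 'Patient', 'Control', 'Donor'
    mutation    TEXT,
    age         INTEGER
);
CREATE TABLE Measurements (
    meas_id     INTEGER PRIMARY KEY,
    line_id     INTEGER REFERENCES CellLines(line_id),
    pattern     TEXT,                   -- 'isotropic' or 'lines'
    actin_oop   REAL,
    nuclei_area REAL
);
""")
con.execute("INSERT INTO CellLines VALUES (1, 'Patient', 'LMNA c.357-2A>G', 45)")
con.execute("INSERT INTO Measurements VALUES (1, 1, 'lines', 0.82, 152.3)")
# A JOIN follows the foreign key so one query combines metadata and values.
row = con.execute("""
    SELECT c.designator, m.pattern, m.actin_oop
    FROM Measurements m JOIN CellLines c ON m.line_id = c.line_id
""").fetchone()
```

Because subject metadata lives in exactly one row, correcting a cell line's details updates every associated measurement at once, which is a key advantage over duplicating that information across spreadsheet rows.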
3. Set up and organize the pipeline
4. Create the database and queries
NOTE: If tables store information in databases, then queries are requests to the database for information matching specific criteria. There are two methods of creating the database: starting from a blank document or starting from the existing files. Figure 4 shows a sample query using SQL syntax, designed to run against the database relationships shown in Figure 2.
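The specific query of Figure 4 is not reproduced here, but a query of the same general shape can be illustrated with a hypothetical results table (table and column names are assumptions): filter rows with WHERE, collapse them into per-group aggregates with GROUP BY, and order the output.

```python
import sqlite3

# Hypothetical table of per-image results, used only to demonstrate
# the structure of a filtering-and-aggregating SQL query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Results (designator TEXT, pattern TEXT, actin_oop REAL)")
con.executemany("INSERT INTO Results VALUES (?, ?, ?)", [
    ("Patient", "lines", 0.80), ("Patient", "lines", 0.84),
    ("Control", "lines", 0.90), ("Control", "isotropic", 0.30),
])
rows = con.execute("""
    SELECT designator, AVG(actin_oop) AS mean_oop, COUNT(*) AS n
    FROM Results
    WHERE pattern = 'lines'          -- filter rows before grouping
    GROUP BY designator              -- one output row per experimental group
    ORDER BY designator
""").fetchall()
```

Re-aggregating the same data against a different condition (e.g., age or disease status) requires only changing the GROUP BY column, which is what makes multi-dimensional grouping nearly effortless compared with rebuilding a spreadsheet layout.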
5. Move the output tables to a statistical software for significance analysis
Multi-dimensionality of the data
In the context of the example data set presented here, the subjects, described in the Methods section, were divided into groups of individuals from the three families with the heart disease-causing LMNA mutation ("Patients"), related mutation-negative controls ("Controls"), unrelated mutation-negative controls ("Donors"), and an individual with Hutchinson-Gilford progeria syndrome (HGPS) as a positive control.
Technical discussion of the protocol
The first step when considering the use of databases is to evaluate if the data would benefit from such an organization.
The next essential step is to create an automated code that will ask the minimum input from the user and generate the table data structure. In the example, the user entered the category of data type (cell nuclei or structural measurements), cell lines' subject designator, and number of files being selected. The rele...
The authors have nothing to disclose.
This work is supported by the National Heart, Lung, and Blood Institute at the National Institutes of Health, grant number R01 HL129008. The authors especially thank the LMNA gene mutation family members for their participation in the study. We also would like to thank Linda McCarthy for her assistance with cell culture and maintaining the lab spaces, Nasam Chokr for her participation in cell imaging and the nuclei data analysis, and Michael A. Grosberg for his pertinent advice with setting up our initial Microsoft Access database as well as answering other technical questions.
Name | Company | Catalog Number | Comments |
4′,6-diamidino-2-phenylindole (DAPI) | Life Technologies, Carlsbad, CA | ||
Alexa Fluor 488 Phalloidin | Life Technologies, Carlsbad, CA | ||
Alexa Fluor 750 goat anti-rabbit | Life Technologies, Carlsbad, CA | ||
digital CCD camera ORCAR2 C10600-10B | Hamamatsu Photonics, Shizuoka Prefecture, Japan | ||
fibronectin | Corning, Corning, NY | ||
IX-83 inverted motorized microscope | Olympus America, Center Valley, PA | ||
Matlab R2018b | Mathworks, Natick, MA | ||
MS Access | Microsoft, Redmond, WA | ||
paraformaldehyde (PFA) | Fisher Scientific Company, Hanover Park, IL | ||
polyclonal rabbit anti-human fibronectin | Sigma Aldrich Inc., Saint Louis, MO | ||
polydimethylsiloxane (PDMS) | Ellsworth Adhesives, Germantown, WI | ||
Prolong Gold Antifade | Life Technologies, Carlsbad, CA | ||
rectangular glass coverslips | Fisher Scientific Company, Hanover Park, IL | ||
Triton-X | Sigma Aldrich Inc., Saint Louis, MO |