This method can help answer key questions in the biomarker detection field about generating multiple solutions. The main advantage of this technique is that it provides a user-friendly graphical user interface that assists biomedical researchers in detecting multiple feature subsets. Begin by loading the data matrix and class labels into the software.
Click Load Data Matrix to select the user-specified data matrix file, and click Load Class Labels to select the corresponding class label file. To set the class labels and the number of top-ranked features, select the names of the positive and negative classes in the appropriate drop-down boxes, and select 10 as the number of top-ranked features in the Top X drop-down box for a comprehensive screen of the feature subsets. To tune the system parameters for different performance targets, select accuracy as the performance measurement in the Accuracy/Balanced Accuracy drop-down box for the selected extreme learning machine classifier.
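The choice between accuracy and balanced accuracy matters when the two classes differ in size. The minimal Python sketch below assumes the standard definitions (accuracy is (TP + TN) / N; balanced accuracy is the mean of sensitivity and specificity); it is not the software's own code, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN for a binary problem with the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred, positive):
    """Fraction of all samples that are classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + tn + fp + fn)

def balanced_accuracy(y_true, y_pred, positive):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Class sizes of the first data set (95 negative, 33 positive) and a trivial
# classifier that calls every sample negative.
y_true = ["neg"] * 95 + ["pos"] * 33
y_pred = ["neg"] * 128
acc = accuracy(y_true, y_pred, "pos")            # 95 / 128 = 0.7421875
bacc = balanced_accuracy(y_true, y_pred, "pos")  # (0.0 + 1.0) / 2 = 0.5
```

With the 95 negative and 33 positive samples of the first data set, a classifier that ignores the positive class still reaches an accuracy of about 0.74 and clears a 0.7 cutoff, while its balanced accuracy of 0.5 does not.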
Then, enter a cutoff value of 0.7 for the specified performance measurement in the Performance Cutoff input box. To run the pipeline, click Analyze, using 0.7 as the default performance measurement cutoff and 10 as the default number of best feature subsets.
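The screen configured in these steps can be pictured as an exhaustive search over three-feature subsets. The sketch below is an assumption about how such a pipeline might work, not the software's actual code: it takes the top X ranked features as given, scores every triplet, keeps the triplets at or above the cutoff, and reports the N best. A nearest-centroid rule scored by resubstitution accuracy is swapped in as a stand-in for the extreme learning machine, and all names are hypothetical. With X = 10 there are C(10, 3) = 120 candidate triplets.

```python
from itertools import combinations
from math import dist

def nearest_centroid_accuracy(rows, labels):
    """Stand-in scorer, NOT the extreme learning machine: resubstitution
    accuracy of a nearest-centroid rule on the given rows."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [row for row, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Ties between centroids go to the first class in sorted order.
    predictions = [min(classes, key=lambda c: dist(row, centroids[c])) for row in rows]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def screen_triplets(ranked_features, data, labels, top_x=10, cutoff=0.7,
                    n_best=10, score_fn=nearest_centroid_accuracy):
    """Score every 3-feature subset of the top_x ranked features, keep those
    scoring >= cutoff (best first), and also return the n_best of them."""
    top = ranked_features[:top_x]
    qualified = []
    for triplet in combinations(top, 3):
        # One row per sample, holding only the three selected features.
        rows = [[data[f][i] for f in triplet] for i in range(len(labels))]
        score = score_fn(rows, labels)
        if score >= cutoff:
            qualified.append((score, triplet))
    # Stable sort: subsets with equal scores keep their rank-based order.
    qualified.sort(key=lambda item: item[0], reverse=True)
    return qualified, qualified[:n_best]

# Toy data: feature "A" separates the classes; B, C and D have equal class means.
labels = ["P", "P", "N", "N"]
data = {"A": [5, 6, 0, 1], "B": [1, 2, 1, 2], "C": [3, 3, 3, 3], "D": [0, 1, 0, 1]}
qualified, best = screen_triplets(["A", "B", "C", "D"], data, labels, top_x=4)
```

Passing the scorer as score_fn keeps the search separate from the classifier, so an extreme learning machine or a cross-validated estimate could be plugged in without touching the screening loop.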
Then collect and interpret the feature subsets detected by the software. To generate a 3D scatter plot of the top 10 feature subsets with the best classification performances, click Analyze. The three features of each subset are sorted in ascending order of their ranks, and these ranks are used as the F1, F2, and F3 axes. Change the performance cutoff value to 0.7 and click Analyze to generate a 3D scatter plot of the feature subsets whose performance measurement value is greater than or equal to the performance cutoff.
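The plotting convention described here, in which each subset's three feature ranks are sorted in ascending order and become its F1, F2, and F3 coordinates, can be sketched as follows. The function and the example data are hypothetical; only the coordinate rule and the cutoff filter come from the procedure.

```python
def scatter_coordinates(scored_subsets, rank_of, cutoff=0.7):
    """Turn each 3-feature subset scoring >= cutoff into an (F1, F2, F3, score)
    point, where F1 <= F2 <= F3 are the subset's feature ranks in ascending order."""
    points = []
    for score, triplet in scored_subsets:
        if score >= cutoff:
            f1, f2, f3 = sorted(rank_of[feature] for feature in triplet)
            points.append((f1, f2, f3, score))
    return points

# Hypothetical ranks (1 = top-ranked feature) and scored subsets.
rank_of = {"A": 1, "B": 2, "C": 3, "D": 4}
scored = [(1.0, ("C", "A", "D")), (0.5, ("B", "C", "D")), (0.7, ("D", "B", "A"))]
points = scatter_coordinates(scored, rank_of)
```

Sorting the ranks means every unordered subset maps to exactly one point, so the same three features listed in a different order cannot appear twice in the plot.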
Then click 3D Tuning to open a new window for manually tuning the viewing angles of the 3D scatter plot, and click Reduce to reduce the redundancy of the detected feature subsets. To annotate a gene at both the DNA and protein sequence levels, open the DAVID database web page and click the Gene ID Conversion link to input the feature IDs of the first biomarker subset of the prepared data set. Click the Gene List link and click Submit List to retrieve the annotations of interest, then click Show Gene List to obtain the list of gene symbols.
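The procedure does not describe the rule that Reduce applies. As a purely hypothetical illustration of redundancy reduction, the sketch below walks the subsets from best to worst and drops any subset that shares more than max_shared features with one already kept; both the greedy rule and the threshold are assumptions, not the software's method.

```python
def reduce_redundancy(scored_subsets, max_shared=1):
    """Hypothetical greedy filter: visit (score, subset) pairs from best to worst
    and keep a subset only if it shares at most max_shared features with every
    subset kept so far."""
    kept = []
    for score, subset in sorted(scored_subsets, key=lambda item: item[0], reverse=True):
        if all(len(set(subset) & set(other)) <= max_shared for _, other in kept):
            kept.append((score, subset))
    return kept

scored = [(0.9, ("A", "B", "D")), (1.0, ("A", "B", "C")), (0.8, ("A", "E", "F"))]
kept = reduce_redundancy(scored)  # ("A", "B", "D") shares A and B with a better subset
```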
Next, open the GeneCards database web page and enter the name of the gene of interest into the query input box to find the annotations of this gene. Open the Online Mendelian Inheritance in Man (OMIM) database and search for the gene to find its annotations in that database. To annotate the encoded proteins, open the UniProt Knowledgebase (UniProtKB) web page and search it for the annotations of the gene.
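As a scripted alternative to the UniProtKB web search, the sketch below builds a query against the UniProt REST API. It assumes a human data set (taxonomy ID 9606), restricts results to reviewed Swiss-Prot entries, and uses TP53 only as an example symbol; the helper names are hypothetical. Only the URL builder runs offline; fetch_uniprot_tsv needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

def uniprot_search_url(gene_symbol, taxon_id=9606):
    """Build a UniProtKB search URL returning accession, gene names and protein name as TSV."""
    params = {
        "query": f"gene_exact:{gene_symbol} AND organism_id:{taxon_id} AND reviewed:true",
        "format": "tsv",
        "fields": "accession,gene_names,protein_name",
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"

def fetch_uniprot_tsv(gene_symbol, taxon_id=9606):
    """Download the TSV result (requires network access)."""
    with urlopen(uniprot_search_url(gene_symbol, taxon_id), timeout=30) as response:
        return response.read().decode("utf-8")

url = uniprot_search_url("TP53")
```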
Open the Group-based Prediction System (GPS) web server, retrieve the protein sequence encoded by the biomarker gene from UniProtKB, and use the online GPS tool to predict the protein's post-translational modification residues. To annotate the protein-protein interactions and their enriched functional modules, open the STRING web server page and search the STRING database with the list of genes of interest to find their coordinated functional properties. To export the detected biomarker subsets for further analysis, click Export Table and select the appropriate text format for saving the file.
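Similarly, the gene list can be sent to the STRING API instead of the web form. This sketch only builds the request URL for the network method; it assumes human genes (species 9606) and uses TP53 and EGFR purely as example identifiers, and the helper name is hypothetical. STRING's documentation also asks callers to identify themselves with a caller_identity parameter, which is omitted here.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"

def string_network_url(gene_symbols, species=9606, output_format="tsv"):
    """Build a STRING 'network' request; multiple identifiers are joined by carriage returns."""
    params = {"identifiers": "\r".join(gene_symbols), "species": species}
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"

url = string_network_url(["TP53", "EGFR"])
```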
Then, export the visualization plots as individual image files by clicking Save under each plot and selecting the appropriate image format for each file. In this representative experiment, two data sets were formatted as CSV files and loaded into the software as demonstrated. For the first data set, 128 samples with 12,625 features and their individual class labels were loaded, giving a final data matrix of 95 negative samples and 33 positive samples.
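Before loading, the CSV inputs can be checked with a short script. The sketch below assumes a features-by-samples layout for the data matrix (a header row of sample IDs, then one row per feature with its name in the first column) and a headerless class label file of sample_id,label rows; these layouts and all function names are assumptions, not the software's documented format. Counting labels per class confirms totals such as 95 negative and 33 positive samples.

```python
import csv
import io
from collections import Counter

def load_data_matrix(handle):
    """Read a features-by-samples CSV (assumed layout: header row of sample IDs,
    first column of feature names, one row per feature)."""
    reader = csv.reader(handle)
    sample_ids = next(reader)[1:]  # drop the corner cell above the feature names
    features = {}
    for row in reader:
        if row:  # skip blank lines
            features[row[0]] = [float(value) for value in row[1:]]
    return sample_ids, features

def load_class_labels(handle):
    """Read 'sample_id,label' rows (assumed layout, no header) into a dict."""
    return {row[0]: row[1] for row in csv.reader(handle) if row}

# Tiny in-memory example standing in for the user's files.
matrix_csv = io.StringIO("feature,S1,S2,S3\nG1,1.0,2.0,3.0\nG2,0.5,0.1,0.9\n")
labels_csv = io.StringIO("S1,tumor\nS2,normal\nS3,tumor\n")
samples, features = load_data_matrix(matrix_csv)
labels = load_class_labels(labels_csv)
class_counts = Counter(labels[s] for s in samples)  # KeyError if a sample lacks a label
print(len(features), "features x", len(samples), "samples;", class_counts)
```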
Similar operations were conducted for the second, more difficult data set. Searching for a user-specified keyword in the feature names reveals a histogram of the matching features for each data set. After the pipeline was executed for each data set, 120 qualified biomarker subsets were detected for the easy-to-discriminate data set, with 57 triplet biomarker subsets achieving 100% accuracy.
Only 76 biomarker subsets were detected for the difficult data set, however, and with lower accuracy, suggesting that biomarkers are phenotype specific, which is another major challenge in biomarker detection. While using this procedure, it is important to remember that a feature selection problem may have multiple solutions with the same best performance.
After its development, this technique paved the way for biomedical researchers to explore biomarker detection with multiple solutions.