To begin, start a new Jupyter Notebook session by opening a new terminal window, typing jupyter notebook, and pressing Enter. On the Jupyter Notebook homepage, select the notebook titled M01 expression data pre-processing.ipynb to open it in a new browser tab.
This notebook will normalize and scale the input data, handle missing data, and remove outliers. In the second cell of the notebook, replace the placeholder your_dataset_name.csv with the actual name of the dataset file.
In the last cell of the notebook, replace M01_output_data.csv with the preferred name for the output data file.
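For orientation, the kind of processing described above can be sketched in a few lines of pandas. This is only an illustrative sketch, not the notebook's actual code: the function name preprocess, the median imputation, the z-score scaling, and the cutoff of 3 standard deviations are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

def preprocess(df, z_cutoff=3.0):
    """Hypothetical helper: impute, scale, and drop outlier rows.

    Assumptions (not taken from the notebook): missing values are
    filled with the column median, scaling is a z-score, and a row
    counts as an outlier if any value lies beyond z_cutoff.
    """
    # Fill missing values with each column's median
    filled = df.fillna(df.median())
    # Z-score each column; guard against zero variance
    std = filled.std(ddof=0).replace(0, 1.0)
    scaled = (filled - filled.mean()) / std
    # Keep only rows where every value is within the cutoff
    keep = (scaled.abs() <= z_cutoff).all(axis=1)
    return scaled[keep]

# Toy data standing in for a real dataset file
toy = pd.DataFrame({"a": [1.0, 2.0, np.nan, 3.0],
                    "b": [10.0, 20.0, 30.0, 40.0]})
result = preprocess(toy)
strict = preprocess(toy, z_cutoff=1.0)
```

With a real file, the table would typically be loaded with pd.read_csv and the result written with DataFrame.to_csv, using the file names set in the second and last cells.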
For each data type, such as proteomics, metabolomics, continuous clinical data, and binary clinical data, use the command in the fourth cell to determine the indices of the first and last columns. Check the column names to locate the columns that belong to each data type. Then, in the fifth cell, specify the column positions by replacing col_start and col_end with the first and last column indices for each data type.
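As a rough illustration of this step, the sketch below lists each column's position next to its name and then selects a block of columns by its first and last index. The toy column names and the helper select_block are hypothetical, and the sketch assumes zero-based indices with an inclusive col_end; the notebook's own convention may differ, so check it against the cell's comments.

```python
import pandas as pd

# Hypothetical toy table; real column names will differ
df = pd.DataFrame({
    "sample_id": ["s1", "s2"],
    "prot_A": [1.0, 2.0],
    "prot_B": [0.5, 0.7],
    "metab_X": [10.0, 12.0],
    "age": [54, 61],
    "sex_binary": [0, 1],
})

# Print each column's zero-based position next to its name
for idx, name in enumerate(df.columns):
    print(idx, name)

def select_block(frame, col_start, col_end):
    """Return columns col_start..col_end (assumed inclusive)."""
    return frame.iloc[:, col_start:col_end + 1]

# In this toy table the proteomics columns sit at positions 1 and 2
proteomics = select_block(df, 1, 2)
metabolomics = select_block(df, 3, 3)
```

Printing the positions first makes it harder to be off by one when reading col_start and col_end from a wide table.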
Select Cell, then Run All from the menu bar in Jupyter to create the output data file in the specified folder.