The significance of this experimental manuscript lies in the application of near-infrared spectroscopy (NIRS) and data mining algorithms to online monitoring in the actual process industry. The main advantages of this technique are its speed, the nondestructive nature of near-infrared detection, and the practicality of the partial least squares, or PLS, algorithm.
To begin, connect the spectrometer as described in the text protocol. To set up measurement parameters, use the OPUS software. Navigate to the Measure menu, and select the Advanced Measurement command.
In the dialog that opens, define the measurement parameters on the different tabs. To store the experiment file, click the Advanced tab. Define the resolution as four inverse centimeters.
Define the number of scans as 16 in both the sample and background scan time entry fields. Define the path for automatically storing the measured data, and set the spectral range of the stored data from 4,000 inverse centimeters to 12,500 inverse centimeters. Select absorbance as the data type for the result spectrum, and save the experiment file.
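As a quick check on what this range covers, the wavenumber limits can be converted to wavelength with the standard relation below, which places the measurement in the 800 to 2,500 nanometer near-infrared region. The second line is only a rule of thumb for the 16-scan averaging, assuming the noise is random and uncorrelated between scans; it is not a figure reported in this protocol.

```latex
\lambda\,[\mathrm{nm}] = \frac{10^{7}}{\tilde{\nu}\,[\mathrm{cm^{-1}}]}, \qquad
\frac{10^{7}}{4000} = 2500\ \mathrm{nm}, \qquad
\frac{10^{7}}{12500} = 800\ \mathrm{nm}

\frac{\mathrm{SNR}_{N}}{\mathrm{SNR}_{1}} \approx \sqrt{N} = \sqrt{16} = 4
```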
Now click the Optic tab. In the aperture setting dropdown list, select the same value that is used to acquire the sample spectrum. Then click the Basic tab.
Click the Background Single Channel button to measure the background. Then place the sample into the optical path of the spectrometer to measure the sample spectrum. Enter the sample description and sample form in the corresponding entry fields. This information is stored together with the spectrum.
Now click the Sample Single Channel button to start the online measurement. Save the NIR spectrum of each scan as an OPUS file. Use the OPUS software to read the original spectral set.
On the File menu, click the Load File command. In the dialog that opens, select the spectrum file, and click the Open button.
The spectrum is displayed in the spectrum window. Next, use the spectral pre-processing functions in the Unscrambler to obtain a spectral data set pre-processed with the first-order derivative. First, open the Unscrambler, which is multivariate data analysis and experimental design software.
Then select the Import command under File. Import the OPUS file as an original NIR spectral dataset. Select the Transform command under Modify.
Then select the Savitzky-Golay derivative under Derivatives. In Scope, define the samples and variables as all samples and all variables. In Parameters, define the number of smoothing points as 13 and the derivative order as the first derivative.
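The Unscrambler performs this step through its menus. To show what the transform computes, the following is a minimal NumPy sketch of a 13-point Savitzky-Golay first derivative. The polynomial order of 2 is an assumption because the protocol does not state it, the function name `sg_first_derivative` is hypothetical, and the edge points are handled by repeating the end values, which may differ from the Unscrambler's edge treatment. The derivative is expressed per data point, not per inverse centimeter.

```python
import numpy as np


def sg_first_derivative(spectra, window=13, polyorder=2):
    """Savitzky-Golay first derivative of each row (one spectrum per row).

    Hypothetical helper: polyorder=2 and edge padding are assumptions.
    """
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    arr = np.asarray(spectra, dtype=float)
    rows = np.atleast_2d(arr)
    half = window // 2
    # Local abscissa -half..half centred on the point being processed
    x = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit: row 1 of pinv(A) gives the slope at x = 0
    A = np.vander(x, polyorder + 1, increasing=True)
    coeffs = np.linalg.pinv(A)[1]
    out = np.empty_like(rows)
    for i, row in enumerate(rows):
        # Repeat the end values so the output keeps the input length
        padded = np.pad(row, half, mode="edge")
        # Reversing the kernel turns np.convolve into a sliding dot product
        out[i] = np.convolve(padded, coeffs[::-1], mode="valid")
    return out if arr.ndim == 2 else out[0]
```

Applied to a matrix of spectra (samples by wavenumbers), each row is differentiated independently. Away from the edges, `scipy.signal.savgol_filter(x, 13, 2, deriv=1)` should give the same values.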
Click OK to compute the derivative. Next, perform vector normalization on the sample spectra to normalize the absorbance values. Select the normalization command under Modify.
In Scope, define the samples and variables as all samples and all variables. Select vector normalization as the type, and click OK to perform the normalization.
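Vector normalization is commonly taken to mean scaling each spectrum to unit Euclidean length, so that spectra with different overall absorbance levels become comparable. The sketch below assumes that definition; some implementations mean-center each spectrum before scaling, so it may not reproduce the Unscrambler's output exactly. The function name `vector_normalize` is hypothetical.

```python
import numpy as np


def vector_normalize(spectra):
    """Scale each spectrum (row) to unit Euclidean (L2) norm.

    Assumption: no mean-centring before scaling.
    """
    arr = np.asarray(spectra, dtype=float)
    rows = np.atleast_2d(arr)
    # One norm per spectrum, kept as a column so division broadcasts per row
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cannot normalize an all-zero spectrum")
    out = rows / norms
    return out if arr.ndim == 2 else out[0]
```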
To select the appropriate number of principal components, open MATLAB and import the MAT file containing the preprocessed near-infrared spectral data by dragging it into the workspace. Then open the prepared M-file in the Editor: click Open on the Editor tab, select the M-file in its storage directory, and confirm the selection.
In MATLAB, run the script to extract 15 principal components according to the optimization objective, and fit an ordinary least squares regression, or OLSR, model between the extracted components and the O-cresol concentration. Determine the R-squared value for each number of components and how it changes as components are added. Here, 10 is selected as the appropriate number of principal components, with an R-squared value of 0.9917.
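The authors' M-file is not reproduced in the text. As a rough illustration of this component-selection step, the sketch below extracts PLS components (the latent variables the text calls principal components) with the NIPALS algorithm for a single response, then fits an ordinary least squares regression of the concentration on the first k component scores and reports R-squared for each k. NIPALS and the function names `pls1_scores` and `r2_by_components` are assumptions, not the authors' implementation.

```python
import numpy as np


def pls1_scores(X, y, n_components):
    """Extract up to n_components PLS score vectors with NIPALS (single response y)."""
    Xr = np.asarray(X, dtype=float)
    yr = np.asarray(y, dtype=float)
    Xr = Xr - Xr.mean(axis=0)  # column-centre the spectra
    yr = yr - yr.mean()        # centre the concentrations
    scores = []
    for _ in range(n_components):
        w = Xr.T @ yr                      # weights: covariance of each variable with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # y is fully explained; stop early
            break
        w = w / norm
        t = Xr @ w                         # score vector for this component
        tt = t @ t
        p = Xr.T @ t / tt                  # X loading
        q = (yr @ t) / tt                  # y loading
        Xr = Xr - np.outer(t, p)           # deflate X and y before the next component
        yr = yr - q * t
        scores.append(t)
    return np.column_stack(scores) if scores else np.empty((len(yr), 0))


def r2_by_components(X, y, max_components=15):
    """R-squared of an OLS fit of y on the first k PLS scores, for k = 1..max_components."""
    y = np.asarray(y, dtype=float)
    T = pls1_scores(X, y, max_components)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = []
    for k in range(1, T.shape[1] + 1):
        A = np.column_stack([np.ones(len(y)), T[:, :k]])  # intercept + first k scores
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2.append(1.0 - (resid @ resid) / ss_tot)
    return np.array(r2)
```

Plotting the returned values against k shows the R-squared trend. Because the models are nested, the training R-squared never decreases, so the number of components is chosen where the curve levels off rather than at its maximum.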
To validate the goodness of fit and accuracy of the partial least squares regression, or PLSR, model, repeat the modeling process with 10 principal components. Evaluate the model by 10-fold cross-validation, using plots of the percent variance explained in the NIR spectral data, the residuals, and the mean square prediction error of cross-validation, or MSPECV. Plotted here are the residuals, which are the differences between the O-cresol reference values and the PLSR model estimates.
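The 10-fold cross-validation can likewise be sketched in NumPy. This is a stand-in for the authors' MATLAB script under several assumptions: it uses NIPALS PLS1 (MATLAB's `plsregress`, for example, uses the SIMPLS algorithm, and its `'CV'` option returns a cross-validated MSE), a random fold assignment with a fixed seed, and hypothetical function names. For each number of components k, every held-out fold is predicted by a model trained on the other nine folds, and the squared prediction errors are averaged over all samples. The number of components is assumed to be below the rank of each training set.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1: returns centring constants, weights W, loadings P and y-loadings q."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)  # assumes n_components < rank of the training data
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        qa = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)   # deflate X and y
        yr = yr - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    return x_mean, y_mean, np.column_stack(W), np.column_stack(P), np.array(q)


def pls1_coefficients(W, P, q, k):
    """Regression vector from the first k components: b = W_k (P_k^T W_k)^-1 q_k."""
    Wk, Pk = W[:, :k], P[:, :k]
    return Wk @ np.linalg.solve(Pk.T @ Wk, q[:k])


def cv_mspe(X, y, max_components=15, n_folds=10, seed=0):
    """Mean square prediction error of k-fold cross-validation for 1..max_components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    sq_err = np.zeros(max_components)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        x_mean, y_mean, W, P, q = pls1_fit(X[train_idx], y[train_idx], max_components)
        for k in range(1, max_components + 1):
            b = pls1_coefficients(W, P, q, k)
            pred = y_mean + (X[test_idx] - x_mean) @ b
            sq_err[k - 1] += np.sum((y[test_idx] - pred) ** 2)
    return sq_err / n
```

Plotting `cv_mspe(X, y)` against the number of components gives a curve of the kind used to confirm the choice of 10 components; unlike the training R-squared, this error can rise again once extra components start fitting noise.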
The plotted data show that PLSR measurement of the O-cresol content based on the NIR spectral data is highly accurate. The cross-validation mean square error measures the difference between the reference and predicted O-cresol content: the smaller the value, the more accurate the predictive model of O-cresol content.
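In symbols, the residual and the cross-validated error described above can be written as follows, where y_i is the reference O-cresol value of sample i, ŷ_i is its PLSR estimate, ŷ_{-i}(k) is the prediction for sample i from the k-component model trained without the fold that contains i, and n is the number of samples. This is the usual definition; the authors' exact normalization is not stated.

```latex
e_i = y_i - \hat{y}_i

\mathrm{MSPE}_{\mathrm{CV}}(k) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_{-i}(k) \right)^{2}
```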
The mean square prediction error of cross-validation for the PLSR-based O-cresol concentration measurement decreases as the number of principal components increases and reaches an acceptable minimum at 10 principal components. This indicates that PLSR provides stable measurement of O-cresol concentration using NIRS.
When attempting this procedure, the most important step is to obtain accurate reference values for the compositions, because they are the basis for all the preprocessing and modeling performed in later stages. In addition to PLS, other popular machine learning algorithms, such as deep learning and decision trees, can also be used in this procedure. Combined with NIRS detection technology, we believe that the proposed data mining approach is a useful template for the transformation of modern industry from automation to intelligent manufacturing.