This article provides a workflow for scientists to construct an experimental design table and to analyze experimental results over a variety of mixture and process factors, without requiring tedious and potentially volatile statistical decisions. The resulting models can be jointly optimized over multiple responses and used to produce informative graphics to summarize both the joint response surface and the predictions of the individual responses. These graphics are easier to interpret than the parameter estimates from the underlying statistical models, and are helpful in representing the factor settings that produce the most desirable responses.
Lipid and nanoparticle formulation scientists frequently need to construct new recipes for different payloads or when changing lipids or process settings. We provide a robust approach to formulation optimization that minimizes the potential for error in design construction, and avoids the need for extensive statistical knowledge during analysis. Summarize the purpose of the experiment in a date-stamped document.
List the responses that will be measured during the experiment. Select the factors that will be varied and those that will be held constant during the study. Establish the ranges for the varying factors and the relevant decimal precision for each.
Decide the study design size using the minimum and maximum heuristics. Open JMP and navigate the menu bar to DOE, Special Purpose, Space Filling Design. Enter the study responses.
Enter the study factors and the ranges. Input the predetermined number of runs for the design. Generate the space filling design table for the chosen factors and run size.
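For readers who want to see what a space filling design over mixture and process factors looks like outside the software, here is a minimal Python sketch. It is not the algorithm JMP uses. It draws random feasible candidates and then greedily keeps the candidate farthest from the runs already chosen (a maximin heuristic). The factor names, the ranges, and the function names are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical factor ranges (assumptions for illustration only).
MIXTURE_BOUNDS = {
    "ionizable": (0.10, 0.60),
    "helper": (0.10, 0.60),
    "cholesterol": (0.10, 0.60),
    "peg": (0.01, 0.05),
}
PROCESS_BOUNDS = {"np_ratio": (6.0, 12.0)}


def random_mixture(bounds, rng, max_tries=10_000):
    """Draw one mixture that sums to 1 and respects the (low, high) bounds."""
    names = list(bounds)
    for _ in range(max_tries):
        # Sample all but the last component, then give the remainder to the last.
        point = {n: rng.uniform(*bounds[n]) for n in names[:-1]}
        last = 1.0 - sum(point.values())
        low, high = bounds[names[-1]]
        if low <= last <= high:
            point[names[-1]] = last
            return point
    raise ValueError("no feasible mixture found; check the factor ranges")


def space_filling_design(mixture_bounds, process_bounds, n_runs,
                         n_candidates=2000, seed=1):
    """Pick n_runs well-spread points from random feasible candidates."""
    rng = random.Random(seed)
    ranges = {**mixture_bounds, **process_bounds}
    candidates = []
    for _ in range(n_candidates):
        point = random_mixture(mixture_bounds, rng)
        for name, (low, high) in process_bounds.items():
            point[name] = rng.uniform(low, high)
        candidates.append(point)

    def scaled(point):
        # Put every factor on a 0-1 scale so distances are comparable.
        return [(point[n] - lo) / (hi - lo) for n, (lo, hi) in ranges.items()]

    coords = [scaled(p) for p in candidates]
    chosen = [0]
    # nearest[i] is the distance from candidate i to its closest chosen run.
    nearest = [math.dist(c, coords[0]) for c in coords]
    while len(chosen) < n_runs:
        best = max(range(len(coords)), key=nearest.__getitem__)
        chosen.append(best)
        nearest = [min(d, math.dist(c, coords[best]))
                   for d, c in zip(nearest, coords)]
    return [candidates[i] for i in chosen]


design = space_filling_design(MIXTURE_BOUNDS, PROCESS_BOUNDS, n_runs=12)
```

Each entry of `design` is one run. The remainder trick used to enforce the sum constraint does not sample the feasible region uniformly, which is acceptable for a sketch but is one reason to rely on the software for the real design.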
Add a notes column to the table for annotating any manually created runs. If applicable, manually incorporate benchmark control runs into the design table. Include a replicate for one of the control benchmarks.
Mark the benchmark name in the notes column and color code the benchmark replicate rows for easy identification. Round the mixture factor levels to the appropriate granularity. Copy the rounded values and paste them into the original columns.
Delete the redundant copies of the rounded columns. After rounding the lipid factors, verify that their sum equals 100%. If any row's sum does not equal one (100%), manually adjust one of the mixture factors, ensuring that it stays within the factor range. Delete the sum column after the adjustments are done.
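The rounding and sum check can be illustrated with a short Python sketch. It rounds each lipid proportion to a chosen granularity, then pushes any leftover onto one designated component and confirms that the component stays in range. The function name, the example values, the 0.5% step, and the choice of cholesterol as the adjusted factor are assumptions made for illustration.

```python
def round_mixture(levels, step, bounds, adjust):
    """Round mixture levels to multiples of `step` so that they sum to 1.

    Any rounding surplus or deficit is absorbed by the `adjust` component,
    which must remain inside its (low, high) range.
    """
    # Work in integer multiples of the step to avoid floating-point drift.
    units = {name: round(value / step) for name, value in levels.items()}
    target = round(1.0 / step)
    units[adjust] += target - sum(units.values())

    low, high = bounds[adjust]
    adjusted = units[adjust] * step
    if not (low - 1e-12 <= adjusted <= high + 1e-12):
        raise ValueError(f"{adjust} left its allowed range after adjustment")
    return {name: round(u * step, 10) for name, u in units.items()}


# Hypothetical design row and ranges (illustrative values only).
row = {"ionizable": 0.4128, "helper": 0.2329, "cholesterol": 0.3315, "peg": 0.0228}
bounds = {"cholesterol": (0.10, 0.60)}
rounded = round_mixture(row, step=0.005, bounds=bounds, adjust="cholesterol")
```

In this example the independently rounded values overshoot the total by one step, so the cholesterol level is lowered by 0.5% to restore a sum of 100%.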
Follow the same procedure used for rounding the mixture factors to round the process factors to their respective granularity. Format the lipid columns to appear as percentages with a desired number of decimals. If you have added manual runs such as benchmarks, re-randomize the table row order. To do this, add a new column with random values.
Sort this column in ascending order by right clicking on its column header, and then delete the column. Generate ternary plots to visualize the design points over the lipid factors. Also examine the run distribution over the process factors.
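Re-randomizing the run order amounts to sorting the rows by a temporary column of random numbers and then discarding that column. A minimal Python sketch of the idea follows. The function name and the use of normally distributed sort keys are assumptions, and any continuous random values would serve equally well.

```python
import random


def rerandomize(rows, seed=None):
    """Return the rows in a new random order using a temporary sort key."""
    rng = random.Random(seed)
    # Attach a random value to each row (the temporary column).
    keyed = [(rng.gauss(0.0, 1.0), row) for row in rows]
    # Sort ascending on the random value only.
    keyed.sort(key=lambda pair: pair[0])
    # Drop the temporary value and keep the rows.
    return [row for _, row in keyed]


runs = [f"run_{i}" for i in range(1, 11)]
shuffled = rerandomize(runs, seed=3)
```

Benchmark and replicate rows added by hand are shuffled together with the generated runs, so they no longer sit in a block at the end of the table.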
The formulation scientists should confirm feasibility of all the runs. If infeasible runs exist, restart the design considering the newly discovered constraints. Run the experiment in the order provided by the design table.
Record the readouts in the column built into the experimental table. Plot the readings and examine the distributions of the responses. Examine the relative distance between the color coded replicate runs if one was included. This allows for an understanding of the total process and analytic variation at the benchmark, compared to the variability due to the changes in the factor settings across the entire factor space.
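The comparison between replicate variation and variation across the factor space can also be put into a single number. The Python sketch below divides the spread of the replicate readouts by the spread of all readouts. The function name, the example readouts, and the use of the range as the measure of spread are assumptions for illustration, and the workflow itself relies on visual inspection.

```python
def replicate_spread_ratio(responses, replicate_rows):
    """Spread of the replicate runs as a fraction of the overall spread."""
    replicates = [responses[i] for i in replicate_rows]
    replicate_range = max(replicates) - min(replicates)
    overall_range = max(responses) - min(responses)
    if overall_range == 0:
        raise ValueError("all responses are identical")
    return replicate_range / overall_range


# Hypothetical readouts; rows 2 and 4 are the benchmark and its replicate.
potency = [2.1, 5.4, 3.3, 8.0, 3.6, 6.2]
ratio = replicate_spread_ratio(potency, replicate_rows=[2, 4])
```

A small ratio suggests that the factor settings drive most of the observed variation, while a ratio close to one suggests that process and analytic noise dominate.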
Graph the runs on ternary plots. Color the points according to the responses to get a model independent view of the behavior over the mixture factors. Right click on any of the resulting graphs, select Row Legend and then select the response column.
Repeat this for each response. Build an independent model for each response as a function of the study factors. Delete the model script that was created by the space filling design.
Select Analyze, Fit Model. Construct a full model comprising all candidate effects. This model should include the main effects of each factor, two and three-way interactions, quadratic and partial cubic terms in the process factors and Scheffe cubic terms for the mixture factors.
Select all of the study factors. Change the entry for degree to three from the default of two. Then select Factorial to Degree.
Select only the non-mixture factors and then select Macros, Partial Cubic. Select only the mixture factors and then select Macros, Scheffe Cubic. Disable the default no intercept option.
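To get a feel for how large the full candidate model becomes, the terms can be enumerated in a short Python sketch. The factor names are hypothetical. The Scheffe cubic terms have the standard form x_i * x_j * (x_i - x_j). The partial cubic terms are taken here to be of the form x_i * x_i * x_j, which is an assumption about what the macro generates and should be checked against the effects list shown in the software.

```python
from itertools import combinations, permutations


def candidate_terms(mixture, process):
    """List the candidate effects of the full model as readable strings."""
    factors = mixture + process
    terms = []
    # Factorial to degree three: main effects, two-way and three-way interactions.
    for degree in (1, 2, 3):
        terms += ["*".join(combo) for combo in combinations(factors, degree)]
    # Quadratic terms in the process factors.
    terms += [f"{p}*{p}" for p in process]
    # Partial cubic terms in the process factors (assumed form x_i*x_i*x_j).
    terms += [f"{a}*{a}*{b}" for a, b in permutations(process, 2)]
    # Scheffe cubic terms in the mixture factors: x_i*x_j*(x_i - x_j).
    terms += [f"{a}*{b}*({a}-{b})" for a, b in combinations(mixture, 2)]
    return terms


# Hypothetical factor names (illustrative only).
mixture = ["ionizable", "helper", "cholesterol", "peg"]
process = ["np_ratio", "flow_rate"]
terms = candidate_terms(mixture, process)
```

With four mixture factors and two process factors this already gives 51 candidate effects, typically more than the number of runs, which is why a selection method is needed rather than fitting the full model directly.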
Specify the response column and change the Personality to Generalized Regression. Save this model setup to the data table for easy recall. Select Save to Data Table.
Click Run. For estimation method, select SVEM Forward Selection. Expand the Advanced Controls, Force Terms menus and uncheck the boxes corresponding to the mixture main effects.
Only the Intercept term will remain checked. Click Go. Plot the actual responses by their predicted responses from the SVEM model to verify reasonable predictive ability. Click the red triangle next to SVEM Forward Selection and select Save Columns, Save Prediction Formula.
This creates a new column containing the prediction formula in the data table. Repeat the model building steps for each response. After all of the responses have prediction columns saved to the data table, plot the response traces for all of the predicted response columns using the profiler function. Select Graph, Profiler and select all of the prediction columns created in the previous step for Y, Prediction Formula, then click OK.
Identify candidate optimal formulations. Set the desirability function for each response whether it should be maximized, minimized or matched to a target. This also entails setting the relative importance weights for each response.
To generate the first candidate, set any primary responses to use importance weight 1.0 and any secondary responses to use importance weight 0.2. Instruct the profiler to find the optimal factor settings that maximize the desirability function. Select Optimization and Desirability, Maximize Desirability.
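The role of the importance weights can be seen in a small Python sketch of a joint desirability calculation. It uses simple linear desirability ramps and combines them with an importance-weighted geometric mean. The ramps are a simplification, since the profiler uses smoother curves, and the response names, limits, and predicted values are hypothetical.

```python
import math


def maximize(value, low, high):
    """Linear desirability: 0 at or below `low`, 1 at or above `high`."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def minimize(value, low, high):
    """Linear desirability: 1 at or below `low`, 0 at or above `high`."""
    return 1.0 - maximize(value, low, high)


def joint_desirability(desirabilities, weights):
    """Importance-weighted geometric mean of the individual desirabilities."""
    if any(d <= 0.0 for d in desirabilities.values()):
        return 0.0
    total = sum(weights[name] for name in desirabilities)
    log_sum = sum(weights[name] * math.log(d) for name, d in desirabilities.items())
    return math.exp(log_sum / total)


# Hypothetical predictions for one candidate formulation.
individual = {
    "potency": maximize(80.0, low=0.0, high=100.0),  # primary response
    "size": minimize(90.0, low=60.0, high=160.0),    # secondary response
}
weights = {"potency": 1.0, "size": 0.2}
overall = joint_desirability(individual, weights)
```

Because potency carries five times the weight of size, the joint value sits much closer to the potency desirability than to the size desirability.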
Record the optimal factor settings along with a note about the importance weightings used for each response. For categorical factors such as ionizable lipid type, find the conditionally optimal formulations for each factor level. First, set the desired level of each factor in the profiler.
Then hold the control key and left click inside the graph of that factor and select Lock Factor Setting. Then select Optimization and Desirability, Maximize Desirability to find the conditional optimum with this factor locked in its current setting. When finished, unlock the factor settings before proceeding.
Repeat the optimization process after modifying the importance weights of the responses, perhaps by optimizing only the primary responses, by giving some of the secondary responses more or less importance weight, or by setting their goal to None. Record the new optimal candidate. Produce graphical summaries of the optimal regions of the factor space.
Create a data table that contains 50,000 rows that are populated with randomly generated factor settings within the allowed factor space, along with the corresponding predicted values from the reduced models for each of the responses, as well as the joint desirability function. Select Output Random Table. Change the value of how many runs to simulate to 50,000 and click OK.
In the newly created table, add a new column that calculates the percentile of the desirability column. Use this percentile column in the ternary plots instead of the raw desirability column. Right click the Desirability column header and select New Formula Column, Distributional, Cumulative Probability.
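The percentile column is simply the empirical cumulative probability of each desirability value within the simulated table. A minimal Python sketch follows. The function name is hypothetical, and the handling of ties and the exact scaling may differ from what the software computes.

```python
from bisect import bisect_right


def cumulative_probability(values):
    """Fraction of values less than or equal to each value (empirical CDF)."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]


# Hypothetical desirability values for four simulated formulations.
desirability = [0.2, 0.9, 0.5, 0.5]
percentile = cumulative_probability(desirability)
```

Coloring by percentile rather than by raw desirability spreads the color scale evenly over the simulated formulations, so the best few percent stand out even when the raw desirability values are bunched together.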
Generate the following graphics. Repeatedly alter the color scheme of the graphics in order to display the predictions for each response, and for the cumulative probability column. Construct ternary plots for the four lipid factors.
In the table, navigate to Graph, Ternary Plot. Select the mixture factors for X, Plotting and click OK. Right click in one of the resulting graphs, select Row Legend and then select the predicted response column.
Change the colors drop down to Jet. This will show the best and worst performing regions with respect to the lipid factors. The current figure shows the percentiles of the joint desirability when considering maximizing potency with importance equal to 1 and minimizing size with importance equal to 0.2, while averaging over any factors that are not shown on the ternary plot axes.
Repeatedly change the color scheme of the graphics in order to display the predictions for each response. Similarly, plot the 50,000 color coded points representing unique formulations against the non-mixture process factors, either singly or jointly, and look for relationships between the responses and the factors.
Look for the factor settings that produce the points yielding the highest desirability. This figure shows the joint desirability of all of the formulations that could be formed with each of the three ionizable lipid types. The most desirable formulations use H102, with H101 providing some potentially competitive alternatives.
Explore different combinations of factors that might lead to different responses. Save the prediction profiler and its remembered settings back to the data table. Prepare a table listing the optimal candidates identified previously.
Include the benchmark control with the set of candidate runs that will be formulated and measured. If any of the formulations from the experiment were found to yield desirable results, perhaps by outperforming the benchmark, select the best to add to the candidate table and retest along with the new formulations. Right click on the remembered settings table in the profiler and select Make into Data Table.
Carry out the confirmation runs by constructing the formulations and gathering the readouts. Compare the performance of the candidate optimal formulations. The workflow has been used in many applications.
In most cases, we have observed at least a four- to five-fold improvement in potency compared to benchmark formulations that were set using one-factor-at-a-time optimization. Improvements are especially noticeable when secondary responses are jointly targeted. It is also possible to use simulation to show the expected quality of the optimal candidates produced by this procedure.
Using a known data generating function for the example experiment described in the paper, we can compare the quality of the candidate optimal formulations obtained from the space filling designs and SVEM based analysis used in this workflow to those obtained from the traditional mixture analysis techniques. With the quality of the optimal formulation shown on the vertical axis and the number of runs in the design shown on the horizontal axis, the blue points represent the performance of the unreduced full statistical model over 150 simulations. The amber points represent the performance of the traditional single shot forward selection based on the AICc objective function.
The green points represent the performance of the SVEM based forward selection approach used in this workflow. The SVEM analysis allows us to obtain better optimal candidates in fewer runs. There will be occasional studies with additional complexity that require the help of a statistician for design and analysis.
Studies that are extremely high priority, where the run size is more limited than usual, or where there are a large number of categorical factors or a single categorical factor with a large number of levels, may be approached differently by a statistician, using either optimal or hybrid designs in place of the space filling design specified in the workflow.