The secretion mechanism of the Salmonella type III secretion system (T3SS) is not currently understood, so choosing proteins to secrete via the T3SS is largely guesswork. Verifying a protein’s compatibility with the T3SS is labor intensive, which makes incorrect guesses costly. In response, my goal was to construct an empirical, statistically derived model to predict whether a protein can be secreted by the engineered T3SS, which was developed in part by the Tullman-Ercek Lab. As an additional constraint, only computationally derived variables were used, so that a prediction could be made with the amino acid sequence as the sole input. Initial models based on secretion attempts made throughout the Tullman-Ercek Lab’s history of T3SS research yielded encouraging results. Motivated by this success, the training set was expanded to 80 proteins to add stability and improve coverage of the protein parameter space. I wrote over 1000 lines of MATLAB code for this project. The approaches applied include partial least squares discriminant analysis (PLS-DA) and support vector machines (SVM). Modeling was unsuccessful because the training set is still too small for machine learning approaches. Current efforts focus on disseminating the T3SS production system so that other users can expand the training set as they apply it to their own use cases.
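
To make the sequence-only constraint concrete, the following is a minimal Python sketch of how a few descriptors can be computed from an amino acid sequence alone. It is illustrative only: the project code was written in MATLAB, and the abstract does not list which variables were used. The function name `sequence_features` and the four descriptors chosen here (length, mean Kyte-Doolittle hydropathy, a crude net charge, and hydrophobic fraction) are assumptions, not the project's actual feature set.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Return a dict of descriptors derived only from the sequence.

    Hypothetical feature set for illustration; not the one used in the project.
    """
    seq = seq.upper()
    if not seq or any(res not in KD for res in seq):
        raise ValueError("sequence must be non-empty and use the 20 standard amino acids")
    n = len(seq)
    positive = sum(seq.count(r) for r in "KR")
    negative = sum(seq.count(r) for r in "DE")
    return {
        "length": n,
        # Mean hydropathy over all residues (often called GRAVY).
        "gravy": sum(KD[r] for r in seq) / n,
        # Crude net charge: counts Lys/Arg as +1 and Asp/Glu as -1, ignores His and termini.
        "net_charge": positive - negative,
        "frac_hydrophobic": sum(seq.count(r) for r in "AILMFVW") / n,
    }
```

A table of such descriptors, one row per protein with a secreted or not-secreted label, is the kind of input a classifier such as PLS-DA or an SVM would be trained on.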

Metcalf, K., J. Bevington, S. Rosales, E. Valdivia, and D. Tullman-Ercek. 2016. Proteins adopt functionally active conformations in the extracellular space after secretion by the type III secretion system. Microbial Cell Factories.