P02-19

Prediction of antibody non-specificity and identification of antibody candidates using machine learning with NGS data from selection experiments

Kenta SUMITOMO *1, Ryo MATSUNAGA1, 2, Takanori YOKOO1, Seisho KINOSHITA2, Makoto NAKAKIDO1, 2, Yamaguchi KIYOSHI3, Eigo SHIMIZU3, Seiya IMOTO3, Yoichi FURUKAWA3, Kouhei TSUMOTO1, 2, 3

1Dept. of Chem. Biotech., Sch. of Eng., The University of Tokyo
2Dept. of Bioeng., Sch. of Eng., The University of Tokyo
3IMSUT, The University of Tokyo


[Purpose]
Biopanning by phage display is an effective method for isolating high-affinity antibodies from a library. However, the selected antibodies do not always exhibit the desired function, and some bind non-specifically to off-target proteins. Our objective is to establish a method that analyzes the phage display screening process itself in order to select a diverse set of antibodies with distinct binding modes and kinetic parameters. To this end, our study focuses on identifying promising antibody candidates from the early rounds of biopanning.
[Methods]
In our laboratory, phage display experiments targeting multiple antigens were conducted using a synthetic VHH library, and extensive sequence data were obtained by next-generation sequencing (NGS). From these data, we defined antibody sequences that appeared in common across experiments against different antigens as non-specific, and we developed a "Non-Specific Score" to quantify this property.
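The abstract does not give the formula for the Non-Specific Score, so the following is only a minimal sketch of one plausible definition: the fraction of additional antigen experiments in which a sequence is observed. The function name `non_specific_scores`, the `min_count` threshold, and the normalization are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def non_specific_scores(experiments, min_count=1):
    """Score each sequence by how many antigen experiments it appears in.

    experiments: dict mapping antigen name -> Counter of NGS read counts
                 per sequence (hypothetical input layout).
    min_count:   reads required for a sequence to count as observed
                 (assumed threshold, not from the abstract).

    Returns a dict mapping sequence -> score in [0, 1], where 0 means the
    sequence was seen for a single antigen only and 1 means it was seen
    for every antigen.
    """
    n_experiments = len(experiments)
    if n_experiments < 2:
        raise ValueError("need at least two antigen experiments")

    # Count the number of experiments in which each sequence is observed.
    seen_in = Counter()
    for counts in experiments.values():
        for seq, reads in counts.items():
            if reads >= min_count:
                seen_in[seq] += 1

    # Normalize so that a sequence unique to one antigen scores 0.
    return {seq: (k - 1) / (n_experiments - 1) for seq, k in seen_in.items()}
```

Under this assumed definition, a sequence recovered against two of three antigens scores 0.5, and raising `min_count` makes the score less sensitive to sequencing noise.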
Next, using a pre-trained protein language model, we built a model to predict non-specific sequences within the library. After filtering out the predicted non-specific sequences, we assembled an early-round dataset consisting of potentially antigen-specific antibody sequences. We then trained a machine learning model on the frequently observed sequences in this dataset and applied it, in combination with clustering, to the entire early-round dataset to identify potential antibody candidates.
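The filtering and clustering steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the amino-acid composition vector in `embed` is a placeholder for protein language model embeddings, the nearest-centroid classifier stands in for the unspecified prediction model, and the greedy threshold clustering is one simple choice among many. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq):
    """Placeholder embedding: amino-acid composition.

    A real pipeline would use protein language model embeddings here.
    """
    counts = Counter(seq)
    total = len(seq) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_nearest_centroid(seqs, labels):
    """Return {label: mean embedding} from labelled training sequences."""
    groups = {}
    for seq, label in zip(seqs, labels):
        groups.setdefault(label, []).append(embed(seq))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(model, seq):
    """Assign the label whose centroid is closest to the sequence."""
    vec = embed(seq)
    return min(model, key=lambda label: distance(model[label], vec))


def greedy_cluster(seqs, threshold):
    """Put each sequence in the first cluster whose representative
    (its first member) lies within `threshold`; otherwise open a new one."""
    clusters = []  # list of (representative embedding, member sequences)
    for seq in seqs:
        vec = embed(seq)
        for rep, members in clusters:
            if distance(rep, vec) <= threshold:
                members.append(seq)
                break
        else:
            clusters.append((vec, [seq]))
    return [members for _, members in clusters]


def filter_and_cluster(model, early_round_seqs, threshold):
    """Drop predicted non-specific sequences, then cluster the remainder."""
    kept = [s for s in early_round_seqs if predict(model, s) != "non_specific"]
    return greedy_cluster(kept, threshold)
```

In such a sketch, one representative per cluster (for example, its most frequent member) could be taken forward as a candidate, which favors diversity over raw read count.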
[Results and Discussion]
Our model successfully predicted the non-specificity of sequences, even those not identified by NGS analysis alone, which enabled a library-scale assessment of this property. By leveraging this predictive power to filter out non-specific binders, our approach led to the identification of a diverse set of promising antibody candidates, including those difficult to discover using conventional enrichment-based methods. Interestingly, many of these candidates identified from the early rounds were not significantly enriched in subsequent panning rounds. Despite this lack of enrichment, they still demonstrated binding to the target antigen.
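For reference, enrichment between panning rounds is commonly expressed as a log2 fold change in relative read frequency. The sketch below shows that calculation; the abstract does not state which enrichment measure or significance criterion was used, so the function name `log2_enrichment` and the pseudocount are assumptions made for illustration.

```python
import math


def log2_enrichment(early_counts, late_counts, pseudocount=1.0):
    """Log2 fold change of relative frequency from an early to a late round.

    early_counts, late_counts: dicts mapping sequence -> NGS read count.
    pseudocount: added to every count so that sequences missing from one
                 round do not cause division by zero (assumed value).
    """
    sequences = set(early_counts) | set(late_counts)
    n = len(sequences)
    early_total = sum(early_counts.values()) + pseudocount * n
    late_total = sum(late_counts.values()) + pseudocount * n

    result = {}
    for seq in sequences:
        f_early = (early_counts.get(seq, 0) + pseudocount) / early_total
        f_late = (late_counts.get(seq, 0) + pseudocount) / late_total
        result[seq] = math.log2(f_late / f_early)
    return result
```

A value near or below zero corresponds to a sequence that did not gain frequency in later rounds, which is the situation described above for many of the early-round candidates.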
[Conclusions]
Our findings suggest that the early rounds of biopanning contain a diverse pool of valuable antibodies. These antibodies can interact effectively with the target antigen, even if they are not enriched during the conventional selection process. This highlights the potential of our computational approach to rescue promising antibody candidates that would otherwise be overlooked.