Leveraging Machine Learning and Protein Language Models for Antibody Developability Triage

Monoclonal antibodies (mAbs) are a successful class of biologic drugs, with more than 130 approved by regulators as of early 2025. Even so, new mAbs fail at high rates in clinical trials, which calls for more efficient screening earlier in discovery. Advances in single-cell sequencing have produced large datasets of paired antibody sequences, and these datasets have supported screening statistics that predict antibody developability. Physicochemical properties are central to developability: features such as surface hydrophobic patches and stability influence both how well production scales and how likely an antibody is to succeed clinically.

To make antibody selection more efficient, a pipeline that combines machine learning with protein language models has been proposed. Each antibody sequence is encoded with a protein language model, and the encoded sequences are compared with those of known clinical mAbs. Unsupervised clustering then identifies library antibodies that group with clinical mAbs, so that candidates with promising developability characteristics can be selected. In addition, supervised machine learning models are used to distinguish approved from discontinued clinical antibodies, giving an indication of the likelihood of clinical success.
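
The clustering step can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the source does not name the clustering algorithm or the language model, so this example assumes plain k-means with a deterministic farthest-point start, and it uses hand-written 2-D vectors in place of real protein language model embeddings. The names `kmeans`, `select_clinical_like`, and `min_clinical` are hypothetical.

```python
import math

def farthest_point_init(points, k):
    """Deterministic start: first point, then the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means (an assumed algorithm); returns one label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_clinical_like(library, clinical, k=2, min_clinical=1):
    """Keep library antibodies whose cluster holds at least `min_clinical` clinical mAbs."""
    lib_names = list(library)
    points = [library[n] for n in lib_names] + list(clinical.values())
    labels = kmeans(points, k)
    lib_labels, clin_labels = labels[: len(lib_names)], labels[len(lib_names):]
    counts = {}
    for lab in clin_labels:
        counts[lab] = counts.get(lab, 0) + 1
    return [n for n, lab in zip(lib_names, lib_labels) if counts.get(lab, 0) >= min_clinical]

# Toy 2-D vectors standing in for language-model embeddings (illustrative values only).
library = {"lib_A": [0.9, 1.1], "lib_B": [1.2, 0.8], "lib_C": [5.0, 5.2], "lib_D": [4.8, 5.1]}
clinical = {"mab_1": [1.0, 1.0], "mab_2": [1.1, 0.9]}

selected = select_clinical_like(library, clinical, k=2)
print(selected)
```

Raising `min_clinical` is one simple way to make this step stricter: a cluster must then contain more clinical mAbs before its library members are kept.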

The pipeline was demonstrated on a test dataset of B-cell receptor sequences, where it identified antibodies with properties similar to those of therapeutic mAbs. It combines three steps (physicochemical filtering, unsupervised clustering, and supervised classification) to narrow large antibody libraries down to potential therapeutic candidates. The stringency of each triage step is adjustable, so screening can be tailored to specific criteria, which improves the efficiency and accuracy of candidate selection.
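
The three-step funnel with adjustable stringency might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the `Candidate` fields, the threshold names, and the default values are hypothetical, and the cluster membership and classifier probability are taken as precomputed inputs rather than derived here.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    hydrophobic_patch_score: float  # hypothetical feature; lower is better in this sketch
    in_clinical_cluster: bool       # assumed output of the unsupervised clustering step
    p_approved: float               # assumed classifier probability of "approved"

def triage(candidates, max_patch_score=1.0, require_clinical_cluster=True,
           min_p_approved=0.5):
    """Apply the three filters in order and report how many candidates survive each."""
    # Step 1: physicochemical filtering.
    stage1 = [c for c in candidates if c.hydrophobic_patch_score <= max_patch_score]
    # Step 2: keep antibodies that cluster with clinical mAbs (can be switched off).
    stage2 = [c for c in stage1 if c.in_clinical_cluster or not require_clinical_cluster]
    # Step 3: supervised classification threshold.
    stage3 = [c for c in stage2 if c.p_approved >= min_p_approved]
    counts = {
        "input": len(candidates),
        "physicochemical": len(stage1),
        "clustering": len(stage2),
        "classification": len(stage3),
    }
    return stage3, counts

# Illustrative candidates with made-up values.
candidates = [
    Candidate("ab1", 0.4, True, 0.8),
    Candidate("ab2", 1.6, True, 0.9),   # fails the default physicochemical cutoff
    Candidate("ab3", 0.7, False, 0.7),  # not in a clinical-like cluster
    Candidate("ab4", 0.9, True, 0.3),   # low predicted approval probability
]

strict, strict_counts = triage(candidates)
lenient, lenient_counts = triage(candidates, max_patch_score=2.0,
                                 require_clinical_cluster=False, min_p_approved=0.25)
print([c.name for c in strict], strict_counts)
print([c.name for c in lenient], lenient_counts)
```

Returning the per-stage counts alongside the survivors makes it easy to see which step removes the most candidates when a threshold is tightened or relaxed.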

The pipeline also improved the minimum and mean TAP (Therapeutic Antibody Profiler) scores of the selected antibodies, which suggests that it enriches for candidates with better developability and a higher probability of clinical success. Combining machine learning algorithms with protein language models gives a data-driven approach to antibody triage that can reduce the risk of late-stage failures in drug development. Because the pipeline can be applied to different antibody libraries, it is a useful tool for accelerating monoclonal antibody discovery and development.
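
As a small worked example of the comparison described above, the sketch below computes the minimum and mean score before and after selection. The numbers are invented, and the example assumes a single numeric score per antibody for which higher is better; the source does not specify how TAP scores are aggregated, so `summarize` and the data are purely illustrative.

```python
from statistics import mean

def summarize(scores):
    """Return the minimum and mean of a list of scores."""
    return {"min": min(scores), "mean": mean(scores)}

# Hypothetical per-antibody scores (higher assumed better for this illustration).
library_scores = {"ab1": 2.0, "ab2": 5.0, "ab3": 3.0, "ab4": 6.0}
selected_names = ["ab2", "ab4"]  # assumed output of the triage steps

before = summarize(list(library_scores.values()))
after = summarize([library_scores[n] for n in selected_names])
print("whole library:", before)
print("selected:", after)
```

Reporting the minimum next to the mean shows whether triage removed the worst-scoring antibodies, not only whether it shifted the average.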

Tags: monoclonal antibodies, phage display, clinical trials, bioinformatics, chromatography, hydrophobic interaction, regulatory, validation