De Novo Nanobody Design with Neural Networks, AlphaFold, and Docking Algorithms
Problem I am trying to solve
COVID-19 has already claimed 5.6 million lives, making it one of the deadliest pandemics in history. Nano-bodies are emerging as an effective way to counter the rise of viral mutations, as they retain the binding affinity of antibodies at one tenth of the size. But the screening and development of therapeutic nano-bodies remains an expensive and time-consuming process.
Goal of my project
To significantly accelerate the screening and development of therapeutic nano-bodies with high specificity and affinity using purely machine learning techniques. In a first attempt to couple ensemble stacking with nano-body optimization, I aimed to improve the prediction of optimal nano-body CDR-H3 sequences and thereby accelerate drug discovery.
Design Criteria
Obtain nano-bodies with a high binding affinity, measured as a change in Gibbs free energy (∆G) of -10 kcal/mol or lower (more negative)
Make predictions of binding affinity (enrichment) with an R² of at least 0.60
Maximum training/prediction time of 96 hours
The cost should not exceed $1,000.
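To make the first two criteria concrete, here is a minimal sketch of how a ∆G threshold maps to a dissociation constant (via the standard relation ∆G = RT ln Kd) and how R² can be computed from predictions. The function names and the 298.15 K temperature are assumptions for illustration, not values taken from the project.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_g_to_kd(delta_g_kcal, temp_k=298.15):
    """Convert a binding free energy (kcal/mol) to a dissociation constant (mol/L).

    Uses delta_G = R * T * ln(Kd); the default temperature is an assumption.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, `delta_g_to_kd(-10.0)` comes out to roughly 4.7e-8 mol/L, so the -10 kcal/mol criterion corresponds to binding in the tens-of-nanomolar range.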
Model Design
Machine Learning Pipeline
Datasets consisted of CDR-H3 sequences (most critical region of nano-bodies)
Acquired from a previous phage display experiment against antigen target ranibizumab
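Sequence data of this kind is typically converted to a numeric form before it reaches a neural network. The sketch below shows one common option, one-hot encoding with zero padding. The project summary does not state which encoding was used, so the scheme, the `max_len` parameter, and the function name are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(seq, max_len=20):
    """Encode a CDR-H3 sequence as a max_len x 20 matrix of 0.0/1.0 values.

    Positions beyond the end of the sequence stay all-zero (padding).
    max_len is a hypothetical cap chosen for illustration.
    """
    if len(seq) > max_len:
        raise ValueError("sequence longer than max_len")
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq):
        matrix[pos][AA_INDEX[aa]] = 1.0  # KeyError for non-standard residues
    return matrix
```

For example, `one_hot_encode("ARDY", max_len=6)` returns six rows, of which the first four each contain a single 1.0 and the last two are padding.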
Ensemble Architecture
Regression Ensemble for prediction of exact enrichment values
Classification Ensemble for prediction of strong and weak binding
6 different neural network architectures
5 different convolutional neural networks (CNNs) and one artificial neural network (ANN)
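The idea of stacking can be sketched as follows: each base network produces a prediction, and a small meta-learner learns how to combine them. The linear meta-learner, the gradient-descent fit, and all names below are assumptions for illustration; the actual meta-learner used in the project is not described here. In practice the base predictions fed to the meta-learner would come from held-out data.

```python
def fit_stacking_weights(base_preds, targets, lr=0.01, epochs=2000):
    """Fit linear combination weights over base-model predictions by
    gradient descent on mean squared error.

    base_preds: one row per sample, one column per base model.
    """
    n_models = len(base_preds[0])
    n = len(targets)
    weights = [1.0 / n_models] * n_models  # start from a plain average
    for _ in range(epochs):
        grads = [0.0] * n_models
        for row, y in zip(base_preds, targets):
            err = sum(w * p for w, p in zip(weights, row)) - y
            for j, p in enumerate(row):
                grads[j] += 2.0 * err * p / n
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

def stacked_predict(weights, row):
    """Combine one sample's base-model predictions into a single value."""
    return sum(w * p for w, p in zip(weights, row))
```

With toy data in which one base model is accurate and another is not, the fitted weights shift toward the accurate model, which is the behavior stacking relies on.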
Novel sequences produced by gradient ascent are checked to see whether they meet the biological requirements for folding into a valid protein structure
Sequences produced by gradient ascent are evaluated by the interpreter network to rank their enrichment
The 50 most highly enriched sequences are evaluated for binding affinity through a robust simulation process
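The generate, filter, and rank steps above can be sketched as follows. This is a toy illustration under stated assumptions: a per-position score matrix stands in for the trained networks, the validity check (`is_plausible`, with hypothetical length bounds) is a placeholder for the project's biological requirements, and `score_fn` stands in for the interpreter network.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_ascent(weights, steps=100, lr=0.5):
    """Maximize sum over positions of weights[pos] . softmax(logits[pos])
    by gradient ascent on the logits, then decode by per-position argmax.

    weights is a stand-in for a trained model's preferences.
    """
    logits = [[0.0] * len(AMINO_ACIDS) for _ in weights]
    for _ in range(steps):
        for pos, w_row in enumerate(weights):
            probs = softmax(logits[pos])
            expected = sum(p * w for p, w in zip(probs, w_row))
            # d(score)/d(logit_a) = p_a * (w_a - expected score at this position)
            for a, (p, w) in enumerate(zip(probs, w_row)):
                logits[pos][a] += lr * p * (w - expected)
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)] for row in logits
    )

def is_plausible(seq, min_len=5, max_len=25):
    """Hypothetical validity filter: length bounds and standard residues only."""
    return min_len <= len(seq) <= max_len and all(aa in AMINO_ACIDS for aa in seq)

def top_k(candidates, score_fn, k=50):
    """Drop implausible sequences, then keep the k highest-scoring ones."""
    valid = [s for s in candidates if is_plausible(s)]
    return sorted(valid, key=score_fn, reverse=True)[:k]
```

The sequences returned by `top_k` would then be passed on to structure prediction and docking; that stage is outside the scope of this sketch.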
Results
Conclusions
Mentors & Advisors
Mr. Sam Fung, Chemistry, Homestead High School, Cupertino, CA
Dr. Robert Damoiseaux, Molecular and Medical Pharmacology, UCLA
Recognition 2022
ISEF Grand Award 4th, Computational Biology and Bioinformatics
CSEF Best of BioPhysics
CSEF Honorable Mention, Computational Systems and Analysis
Grand Prize - Best of Championship, Board of Directors Awards
1st Award, RRI Physical Science and Engineering Category