Best Practices for Machine Learning-Assisted Protein Engineering
Davari, Mehdi D.
- 1Leibniz-Institut für Pflanzenbiochemie
- 2
- 3University of Stuttgart
Journal
Journal of Chemical Information and Modeling
ISSN
1549-9596
1549-960X
Open Access
closed
Volume
65
Start page
12655
End page
12667
Data-driven modeling based on machine learning (ML) is becoming a central component of protein engineering workflows. This perspective presents the elements necessary to develop effective, reliable, and reproducible ML models, together with a set of guidelines for ML development in protein engineering. It includes a critical discussion of good software engineering practices for developing and evaluating ML-based protein engineering projects, with an emphasis on supervised learning. The guidelines cover every step of ML development, from data acquisition to model deployment. The perspective also provides practical resources for implementing the outlined guidelines. These recommendations are further intended to support editors and scientific journals in enforcing good practices in ML-based protein engineering publications, promoting high standards across the community. The aim is to improve ML transparency and credibility by easing the adoption of software engineering best practices in ML development for protein engineering. We envision that the wide adoption and continuous updating of best practices will encourage the informed use of ML on real-world protein engineering problems.
Name
best-practices-for-machine-learning-assisted-protein-engineering.pdf
Type
Main Article
Size
5.72 MB
Format
Adobe PDF
Checksum (MD5)
7f129dfca947d32d0cd9f099b026c52a
