RUDEUS: A Machine Learning Classification System to Study DNA-Binding Proteins
Journal
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K - Proceedings
ISSN
2184-3228
Open Access
green
Volume
1
Start page
302
End page
310
DNA-binding proteins play crucial roles in biological processes such as replication, transcription, packaging, and chromatin remodeling. Their study has gained importance across scientific fields, with computational biology complementing traditional methods. While machine learning has advanced bioinformatics, generalizable pipelines for identifying DNA-binding proteins and their specific interactions remain scarce. We present RUDEUS, a Python library of hierarchical classification models that first identifies DNA-binding proteins and then distinguishes between single- and double-stranded DNA interactions. RUDEUS integrates protein language models, supervised learning, and Bayesian optimization, achieving 95% precision in DNA-binding identification and 89% accuracy in distinguishing interaction types. The library also includes tools for annotating unknown sequences and validating DNA-protein interactions through molecular docking. RUDEUS delivers competitive performance and is easily integrated into protein engineering workflows. It is released under the MIT License, and the source code and models are available in the GitHub repository https://github.com/ProteinEngineering-PESB2/RUDEUS.
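
The two-level decision described in the abstract (binding versus non-binding, then single- versus double-stranded DNA) can be illustrated with a minimal sketch. This is not the RUDEUS API: the names `NearestCentroid`, `hierarchical_predict`, `binder_model`, and `strand_model` are hypothetical, the 2-D vectors are made-up stand-ins for protein language model embeddings, and a nearest-centroid rule is swapped in for the supervised models and Bayesian optimization that the library actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class NearestCentroid:
    """Toy classifier (an assumption, not the RUDEUS model)."""

    centroids: Dict[str, List[float]]

    @classmethod
    def fit(cls, X: Sequence[Vector], y: Sequence[str]) -> "NearestCentroid":
        # Group the training vectors by label, then average each group.
        groups: Dict[str, List[Vector]] = {}
        for vector, label in zip(X, y):
            groups.setdefault(label, []).append(vector)
        centroids = {
            label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in groups.items()
        }
        return cls(centroids)

    def predict(self, x: Vector) -> str:
        # Return the label whose centroid is closest to x.
        return min(
            self.centroids,
            key=lambda label: squared_distance(x, self.centroids[label]),
        )


def hierarchical_predict(
    x: Vector, binder_model: NearestCentroid, strand_model: NearestCentroid
) -> str:
    """Stage 1 decides binding vs non-binding; stage 2 runs only on binders."""
    if binder_model.predict(x) != "binding":
        return "non-binding"
    return strand_model.predict(x)


# Hypothetical 2-D "embeddings"; a real pipeline would use protein
# language model embeddings with far more dimensions.
non_binders = [(0.0, 0.0), (0.2, 0.1)]
ss_binders = [(1.0, 1.0), (1.2, 0.8)]
ds_binders = [(1.0, -1.0), (0.9, -1.2)]

# Stage 1 is trained on all sequences, stage 2 only on the binders.
binder_model = NearestCentroid.fit(
    non_binders + ss_binders + ds_binders,
    ["non-binding"] * 2 + ["binding"] * 4,
)
strand_model = NearestCentroid.fit(
    ss_binders + ds_binders,
    ["ssDNA"] * 2 + ["dsDNA"] * 2,
)

for query in [(0.1, 0.0), (1.1, 1.0), (1.0, -0.9)]:
    print(query, hierarchical_predict(query, binder_model, strand_model))
```

Training the second-stage model only on binders mirrors the hierarchical idea: the strand-type question is asked only after a sequence has been classified as DNA-binding.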