Dual-Modality Transformer-Based Approach for Viral Protein Classification Integrating Protein Language Models and 3Di FASTA Representations
Viruses significantly impact ecosystems by influencing microbial diversity and facilitating genetic exchange, but their genomes remain poorly annotated. Accurate viral genome annotation is challenging due to limited viral protein representation in databases and rapid sequence divergence. We present...
- Authors:
- Puentes Mozo, Juanita
- Resource type:
- Undergraduate thesis
- Publication date:
- 2024
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/74920
- Online access:
- https://hdl.handle.net/1992/74920
- Keywords:
- Phage protein classification
Multi-modality approach
Viral proteins
Transformer models
Deep learning
Artificial intelligence
PHROGs database
PHROG-function prediction
Transfer learning in virology
Microbiology
- Rights
- openAccess
- License
- Attribution-NonCommercial-NoDerivatives 4.0 International
id |
UNIANDES2_6cd6305c7d07f4b4cbb33014f6a7ab0a |
---|---|
oai_identifier_str |
oai:repositorio.uniandes.edu.co:1992/74920 |
network_acronym_str |
UNIANDES2 |
network_name_str |
Séneca: repositorio Uniandes |
repository_id_str |
|
dc.title.eng.fl_str_mv |
Dual-Modality Transformer-Based Approach for Viral Protein Classification Integrating Protein Language Models and 3Di FASTA Representations |
title |
Dual-Modality Transformer-Based Approach for Viral Protein Classification Integrating Protein Language Models and 3Di FASTA Representations |
dc.creator.fl_str_mv |
Puentes Mozo, Juanita |
dc.contributor.advisor.none.fl_str_mv |
García Botero, Camilo; Reyes Muñoz, Alejandro |
dc.contributor.author.none.fl_str_mv |
Puentes Mozo, Juanita |
dc.contributor.researchgroup.none.fl_str_mv |
Facultad de Ciencias::Biología Computacional y Ecología Microbiana |
dc.subject.keyword.eng.fl_str_mv |
Phage protein classification; Multi-modality approach; Viral proteins; Transformer models; Deep learning; Artificial intelligence; PHROGs database; PHROG-function prediction; Transfer learning in virology |
topic |
Phage protein classification; Multi-modality approach; Viral proteins; Transformer models; Deep learning; Artificial intelligence; PHROGs database; PHROG-function prediction; Transfer learning in virology; Microbiology |
dc.subject.themes.none.fl_str_mv |
Microbiology |
description |
Viruses significantly impact ecosystems by influencing microbial diversity and facilitating genetic exchange, but their genomes remain poorly annotated. Accurate viral genome annotation is challenging due to limited viral protein representation in databases and rapid sequence divergence. We present a novel approach for viral protein classification by integrating text embeddings from protein language models (pLMs) and visual features from 3Di FASTA representations using transformer models. Leveraging pre-trained models such as ProteinBERT, ProteinBFD, and ESM, we performed a series of viral protein classification experiments at two levels: category level (9 classes) and PHROGs family level (1159 classes). Our model achieved superior results with PHROGs labels, attaining precision, recall, and F-score values of 0.784, 0.789, and 0.786, respectively, at the category level. The integration of 3Di image features with FASTA sequences further improved classification accuracy, enhancing true positive rates across most classes. These findings highlight the importance of accurate functional annotations and demonstrate the potential of transformer-based models in viral protein classification. The results also suggest that homology-based labels, such as those used by Pharokka, may introduce inconsistencies, warranting further investigation. Our dual-modality approach provides a robust framework for future research, promoting more precise and comprehensive protein classification methodologies. |
publishDate |
2024 |
dc.date.accessioned.none.fl_str_mv |
2024-08-02T18:46:45Z |
dc.date.available.none.fl_str_mv |
2024-08-02T18:46:45Z |
dc.date.issued.none.fl_str_mv |
2024-08-02 |
dc.type.none.fl_str_mv |
Undergraduate thesis |
dc.type.driver.none.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
dc.type.version.none.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
dc.type.coar.none.fl_str_mv |
http://purl.org/coar/resource_type/c_7a1f |
dc.type.content.none.fl_str_mv |
Text |
dc.type.redcol.none.fl_str_mv |
http://purl.org/redcol/resource_type/TP |
format |
http://purl.org/coar/resource_type/c_7a1f |
status_str |
acceptedVersion |
dc.identifier.uri.none.fl_str_mv |
https://hdl.handle.net/1992/74920 |
dc.identifier.instname.none.fl_str_mv |
instname:Universidad de los Andes |
dc.identifier.reponame.none.fl_str_mv |
reponame:Repositorio Institucional Séneca |
dc.identifier.repourl.none.fl_str_mv |
repourl:https://repositorio.uniandes.edu.co/ |
url |
https://hdl.handle.net/1992/74920 |
identifier_str_mv |
instname:Universidad de los Andes reponame:Repositorio Institucional Séneca repourl:https://repositorio.uniandes.edu.co/ |
dc.language.iso.none.fl_str_mv |
eng |
language |
eng |
dc.relation.references.none.fl_str_mv |
Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., Wang, J., Cong, Q., Kinch, L. N., Schaeffer, R. D., et al. (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), 871–876.
Bebis, G., & Georgiopoulos, M. (1994). Feed-forward neural networks. IEEE Potentials, 13(4), 27–31.
Bouras, G., Nepal, R., Houtak, G., Psaltis, A. J., Wormald, P.-J., & Vreugde, S. (2023). Pharokka: A fast scalable bacteriophage annotation tool. Bioinformatics, 39(1), btac776.
Brandes, N., Ofer, D., Peleg, Y., Rappoport, N., & Linial, M. (2022). ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics, 38(8), 2102–2110.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Câmara, G. B., Coutinho, M. G., Silva, L. M. d., Gadelha, W. V. d. N., Torquato, M. F., Barbosa, R. d. M., & Fernandes, M. A. (2022). Convolutional neural network applied to SARS-CoV-2 sequence classification. Sensors, 22(15), 5730.
Choi, S. R., & Lee, M. (2023). Transformer architecture and attention mechanisms in genome data analysis: A comprehensive review. Biology, 12(7), 1033.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Elnaggar, A., Heinzinger, M., Dallago, C., et al. (2020). ProtTrans: Towards cracking the language of life’s code through 500 self-supervised deep learning and high performance computing [j]. IEEE Trans, 685.
Fang, Z., Feng, T., Zhou, H., & Chen, M. (2022). DeepVP: Identification and classification of phage virion proteins using deep learning. GigaScience, 11, giac076.
Fang, Z., & Zhou, H. (2021). VirionFinder: Identification of complete and partial prokaryote virus virion protein from virome data using the sequence and biochemical properties of amino acids. Frontiers in Microbiology, 12, 615711.
Flamholz, Z. N., Biller, S. J., & Kelly, L. (2024). Large language models improve annotation of prokaryotic viral proteins. Nature Microbiology, 9(2), 537–549.
Hatfull, G. F., & Hendrix, R. W. (2011). Bacteriophages and their genomes. Current Opinion in Virology, 1(4), 298–303.
Heinzinger, M., Weissenow, K., Sanchez, J. G., Henkel, A., Mirdita, M., Steinegger, M., & Rost, B. (2023). Bilingual language model for protein sequence and structure. bioRxiv, 2023–07.
Jain, P., & Hirst, J. D. (2010). Automatic structure classification of small proteins using random forest. BMC Bioinformatics, 11, 1–14.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.
Li, L. H., Yatskar, M., Yin, D., Hsieh, C.-J., & Chang, K.-W. (2019). VisualBERT: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557.
McNair, K., Zhou, C., Dinsdale, E. A., Souza, B., & Edwards, R. A. (2019). PHANOTATE: A novel approach to gene identification in phage genomes. Bioinformatics, 35(22), 4537–4542.
Modak, S., Mehta, S., Sehgal, D., & Valadi, J. (2019). Application of support vector machines in viral biology. Global Virology III: Virology in the 21st Century, 361–403.
Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., et al. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118.
Shen, Y., Chen, Z., Mamalakis, M., He, L., Xia, H., Li, T., Su, Y., He, J., & Wang, Y. G. (2024). A fine-tuning dataset and benchmark for large language models for protein understanding. arXiv preprint arXiv:2406.05540.
Smug, B. J., Szczepaniak, K., Rocha, E. P., Dunin-Horkawicz, S., & Mostowy, R. J. (2023). Ongoing shuffling of protein fragments diversifies core viral functions linked to interactions with bacterial hosts. Nature Communications, 14(1), 7460.
Terzian, P., Olo Ndela, E., Galiez, C., Lossouarn, J., Pérez Bucio, R. E., Mom, R., Toussaint, A., Petit, M.-A., & Enault, F. (2021). PHROG: Families of prokaryotic virus proteins clustered using remote homology. NAR Genomics and Bioinformatics, 3(3), lqab067.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
Van Kempen, M., Kim, S. S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C. L., Söding, J., & Steinegger, M. (2024). Fast and accurate protein structure search with Foldseek. Nature Biotechnology, 42(2), 243–246.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. |
dc.rights.en.fl_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International |
dc.rights.uri.none.fl_str_mv |
http://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.rights.accessrights.none.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.rights.coar.none.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.extent.none.fl_str_mv |
22 pages |
dc.format.mimetype.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidad de los Andes |
dc.publisher.program.none.fl_str_mv |
Microbiology |
dc.publisher.faculty.none.fl_str_mv |
Facultad de Ciencias |
dc.publisher.department.spa.fl_str_mv |
Departamento de Ciencias Biológicas |
publisher.none.fl_str_mv |
Universidad de los Andes |
institution |
Universidad de los Andes |
bitstream.url.fl_str_mv |
https://repositorio.uniandes.edu.co/bitstreams/8e003380-4546-4df5-a7a3-082b7a10aa66/download https://repositorio.uniandes.edu.co/bitstreams/aced0295-62b1-4315-8a95-157817e4cbdb/download https://repositorio.uniandes.edu.co/bitstreams/2b2fb0b1-3b01-4dcb-8d80-99426a667e04/download https://repositorio.uniandes.edu.co/bitstreams/5bdb38de-304d-4483-85d0-966db4b55342/download https://repositorio.uniandes.edu.co/bitstreams/44fcbab4-0a2b-4515-9d4b-c8ec6e528492/download https://repositorio.uniandes.edu.co/bitstreams/39691d56-8fb0-426b-b6c9-21d5e9d92729/download https://repositorio.uniandes.edu.co/bitstreams/d1aebec7-3919-4dbd-ac93-ba4bea6be8c0/download https://repositorio.uniandes.edu.co/bitstreams/658b729d-8c46-40c6-a311-a36a1caca51d/download |
bitstream.checksum.fl_str_mv |
92e670e8c05956900492ba0f065966e1 e6a613d0486149d2364b8df1d441589d ae9e573a68e7f92501b6913cc846c39f 4460e5956bc1d1639be9ae6146a50347 e4c760122dccd010bde0c0be2d73afd7 a53b6c3bc21fad6674ad36582fb65201 d4518ac73012d5b40f2df50de9254998 a4e9bdfd31e9aa7bc8c56057618a4cc2 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositorio institucional Séneca |
repository.mail.fl_str_mv |
adminrepositorio@uniandes.edu.co |
_version_ |
1831927821697548288 |
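The description field reports precision, recall, and F-score at the category level but does not state how the per-class values were averaged. The sketch below shows one common convention, macro-averaging, in plain Python. The function name `macro_scores`, the toy labels, and the choice of macro-averaging are assumptions made for illustration; they are not taken from the thesis and do not reproduce its data or results.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all classes seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of p and r
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels, made up for illustration only (not data from the thesis).
y_true = ["tail", "tail", "lysis", "lysis", "head", "head"]
y_pred = ["tail", "lysis", "lysis", "lysis", "head", "tail"]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Macro-averaging weights every class equally, which matters when classes are imbalanced, as is likely with 1159 PHROG families. A support-weighted average would generally give different numbers, so the averaging scheme should be confirmed in the thesis itself before comparing against the reported values.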
JlY2hvcywgc2lubyBhIGxhIGF1dG9yaXphY2nDs24gZGUgdXNvIGFjYWTDqW1pY28gZGUgY29uZm9ybWlkYWQgY29uIGxvIGFudGVyaW9ybWVudGUgc2XDsWFsYWRvLiBMYSBwcmVzZW50ZSBhdXRvcml6YWNpw7NuIHNlIGhhY2UgZXh0ZW5zaXZhIG5vIHNvbG8gYSBsYXMgZmFjdWx0YWRlcyB5IGRlcmVjaG9zIGRlIHVzbyBzb2JyZSBsYSBvYnJhIGVuIGZvcm1hdG8gbyBzb3BvcnRlIG1hdGVyaWFsLCBzaW5vIHRhbWJpw6luIHBhcmEgZm9ybWF0byBlbGVjdHLDs25pY28sIHkgZW4gZ2VuZXJhbCBwYXJhIGN1YWxxdWllciBmb3JtYXRvIGNvbm9jaWRvIG8gcG9yIGNvbm9jZXIuPC9wPgo8cD5FbCBhdXRvciwgbWFuaWZpZXN0YSBxdWUgbGEgb2JyYSBvYmpldG8gZGUgbGEgcHJlc2VudGUgYXV0b3JpemFjacOzbiBlcyBvcmlnaW5hbCB5IGxhIHJlYWxpesOzIHNpbiB2aW9sYXIgbyB1c3VycGFyIGRlcmVjaG9zIGRlIGF1dG9yIGRlIHRlcmNlcm9zLCBwb3IgbG8gdGFudG8sIGxhIG9icmEgZXMgZGUgc3UgZXhjbHVzaXZhIGF1dG9yw61hIHkgdGllbmUgbGEgdGl0dWxhcmlkYWQgc29icmUgbGEgbWlzbWEuPC9wPgo8cD5FbiBjYXNvIGRlIHByZXNlbnRhcnNlIGN1YWxxdWllciByZWNsYW1hY2nDs24gbyBhY2Npw7NuIHBvciBwYXJ0ZSBkZSB1biB0ZXJjZXJvIGVuIGN1YW50byBhIGxvcyBkZXJlY2hvcyBkZSBhdXRvciBzb2JyZSBsYSBvYnJhIGVuIGN1ZXN0acOzbiwgZWwgYXV0b3IgYXN1bWlyw6EgdG9kYSBsYSByZXNwb25zYWJpbGlkYWQsIHkgc2FsZHLDoSBkZSBkZWZlbnNhIGRlIGxvcyBkZXJlY2hvcyBhcXXDrSBhdXRvcml6YWRvcywgcGFyYSB0b2RvcyBsb3MgZWZlY3RvcyBsYSBVbml2ZXJzaWRhZCBhY3TDumEgY29tbyB1biB0ZXJjZXJvIGRlIGJ1ZW5hIGZlLjwvcD4KPHA+U2kgdGllbmUgYWxndW5hIGR1ZGEgc29icmUgbGEgbGljZW5jaWEsIHBvciBmYXZvciwgY29udGFjdGUgY29uIGVsIDxhIGhyZWY9Im1haWx0bzpiaWJsaW90ZWNhQHVuaWFuZGVzLmVkdS5jbyIgdGFyZ2V0PSJfYmxhbmsiPkFkbWluaXN0cmFkb3IgZGVsIFNpc3RlbWEuPC9hPjwvcD4K |