Improving the Annotation of the Venom Gland Transcriptome of Pamphobeteus verdolaga, Prospecting Novel Bioactive Peptides

Authors:
Salinas Restrepo, Cristian Felipe
Misas Rivas, Elizabeth
Estrada Gómez, Sebastián
Quintana Castillo, Juan Carlos
Guzmán, Fanny
Calderón Vélez, Juan Camilo
Giraldo Cadavid, Marco Antonio
Segura Latorre, Cesar
Resource type:
Research article
Publication date:
2022
Institution:
Universidad de Antioquia
Repository:
Repositorio UdeA
Language:
eng
OAI Identifier:
oai:bibliotecadigital.udea.edu.co:10495/36078
Online access:
https://hdl.handle.net/10495/36078
Keywords:
Gene Expression Profiling
High-Throughput Nucleotide Sequencing
Peptides - Genetics
Software
Spider Venoms - chemistry
Spider Venoms - genetics
Transcriptome
Rights
openAccess
License
https://creativecommons.org/licenses/by/4.0/
Description
Summary: Spider venoms constitute a trove of novel peptides of biotechnological interest. Because little next-generation sequencing (NGS) data has been generated, less than 1% of these peptides have been described. Increasing evidence indicates that a single transcriptome assembler underestimates the genes that can be assembled. Here, the venom gland transcriptome of the spider Pamphobeteus verdolaga was re-assembled using three freely available assemblers, Trinity, SOAPdenovo-Trans, and SPAdes, to obtain a more complete annotation. Assembler performance was evaluated by contig number, N50, read representation in the assembly, and retrieval of BUSCO terms against the arthropod dataset. Of all sequences assembled across the three programs, 39.26% were common to all three assemblers, 27.88% were assembled only by Trinity, and 27.65% only by SPAdes. The non-redundant merging of the three assemblies' output permitted the annotation of 9232 sequences, 23% more than with each assembler alone and 28% more than in the previous P. verdolaga annotation; moreover, it allowed the description of 65 novel theraphotoxins. When generating data for non-model organisms, or searching for novel peptides of biotechnological interest, employing at least two different transcriptome assemblers is highly recommended.
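
Two of the ideas in the summary, the N50 statistic and the non-redundant merging of several assemblies, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequences, and the use of exact, case-insensitive sequence identity as the redundancy criterion are all assumptions made for illustration. Real workflows usually cluster near-identical contigs with a dedicated tool.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L of the contig at which, going from longest to
    shortest, the running total first reaches half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def merge_non_redundant(assemblies):
    """Merge assemblies, keeping one copy of each exact sequence.

    `assemblies` maps a (hypothetical) assembler label to a list of contig
    sequences. The result maps each distinct sequence to the set of
    assemblers that produced it. Exact identity is an assumed,
    simplified redundancy criterion.
    """
    merged = {}
    for assembler, sequences in assemblies.items():
        for sequence in sequences:
            merged.setdefault(sequence.upper(), set()).add(assembler)
    return merged


# Toy input, invented for illustration only.
toy = {
    "Trinity": ["ATGC", "GGCC"],
    "SPAdes": ["ATGC", "TTAA"],
    "SOAPdenovo-Trans": ["atgc"],
}
merged = merge_non_redundant(toy)
shared_by_all = [s for s, names in merged.items() if len(names) == len(toy)]
unique_to_trinity = [s for s, names in merged.items() if names == {"Trinity"}]
```

With a mapping like `merged`, the share of sequences common to all assemblers or unique to one of them follows from dividing the relevant counts by `len(merged)`.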