Parallel algorithm for global alignment of long aminoacid and nucleotide sequences

R. K. Tetuev; M. I. Pyatkov; A. N. Pankratov

Geodesic

Parcourir par

Parallel algorithm for global alignment of long aminoacid and nucleotide sequences

R. K. Tetuev ; M. I. Pyatkov ; A. N. Pankratov

Matematičeskaâ biologiâ i bioinformatika, Tome 12 (2017) no. 1, pp. 137-150.

Voir la notice de l'article provenant de la source Math-Net.Ru

Résumé

A parallel algorithm is developed for the global alignment of long DNA and protein sequences. The algorithm uses an arbitrary substitution matrix. An affine internal and end gap penalty systems might be set up separately for each of the two sequences. The possibility to control the choice of an optimal alignment from the set of alternative ones is implemented. The two parameters of the parallel algorithm which are called grid steps were introduced. They are used to split the global alignment matrix into blocks of specified size. The analysis of these parameters was performed in order to obtain optimal values that reduce either time complexity or memory complexity of the algorithm. It is shown that when using the block sizes that provide lowest memory complexity, the algorithm allows to align two long sequences of length

L

using the memory of size

O (L^{4 / 3})

. It is shown that the algorithm is well scaled on multi-core systems, reaching superlinear execution time acceleration. The algorithm is implemented as a high-performance parallel web application in JavaScript and is freely available at http://sbars.impb.ru/aligner.html.

Export
Comment citer

@article{MBB_2017_12_1_a2,
     author = {R. K. Tetuev and M. I. Pyatkov and A. N. Pankratov},
     title = {Parallel algorithm for global alignment of long aminoacid and nucleotide sequences},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {137--150},
     publisher = {mathdoc},
     volume = {12},
     number = {1},
     year = {2017},
     language = {ru},
     url = {https://geodesic-test.mathdoc.fr/item/MBB_2017_12_1_a2/}
}

TY  - JOUR
AU  - R. K. Tetuev
AU  - M. I. Pyatkov
AU  - A. N. Pankratov
TI  - Parallel algorithm for global alignment of long aminoacid and nucleotide sequences
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2017
SP  - 137
EP  - 150
VL  - 12
IS  - 1
PB  - mathdoc
UR  - https://geodesic-test.mathdoc.fr/item/MBB_2017_12_1_a2/
LA  - ru
ID  - MBB_2017_12_1_a2
ER  -

%0 Journal Article
%A R. K. Tetuev
%A M. I. Pyatkov
%A A. N. Pankratov
%T Parallel algorithm for global alignment of long aminoacid and nucleotide sequences
%J Matematičeskaâ biologiâ i bioinformatika
%D 2017
%P 137-150
%V 12
%N 1
%I mathdoc
%U https://geodesic-test.mathdoc.fr/item/MBB_2017_12_1_a2/
%G ru
%F MBB_2017_12_1_a2

R. K. Tetuev; M. I. Pyatkov; A. N. Pankratov. Parallel algorithm for global alignment of long aminoacid and nucleotide sequences. Matematičeskaâ biologiâ i bioinformatika, Tome 12 (2017) no. 1, pp. 137-150. https://geodesic-test.mathdoc.fr/item/MBB_2017_12_1_a2/

Bibliographie
Cité par

[1] Oplachko E. S., Ustinin D. M., Ustinin M. N., “Cloud Computing Technologies and their Application in Problems of Computational Biology”, Mathematical Biology and Bioinformatics, 8:2 (2013), 449–466 (in Russ.) | DOI

[2] Daugelaite J., O'Driscoll A., Sleator R., “An Overview of Multiple Sequence Alignments and Cloud Computing in Bioinformatics”, ISRN Biomathematics, 2013 (2013), 1–14 | DOI

[3] Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T. L., “BLAST+: architecture and applications”, BMC Bioinformatics, 10 (2009), 421 | DOI

[4] Needleman S., Wunsch C., “A general method applicable to the search for similarities in the amino acid sequence of two proteins”, J. Mol. Biol., 48:3 (1970), 443–453 | DOI

[5] Pankratov A., Pyatkov M., Tetuev R., Nazipova N., Dedus F. F., “Search for Extended Repeats in Genomes Based on the Spectral-Analytical Method”, Mathematical Biology and Bioinformatics, 7:2 (2012), 476–492 (in Russ.) | DOI

[6] Pyatkov M. I., Pankratov A. N., “SBARS: fast creation of dotplots for DNA sequences on different scales using GA-,GC-content”, Bioinformatics, 30:12 (2014), 1765–1766 | DOI

[7] NCBI BLAST: web site, (accessed 14 April 2017) https://blast.ncbi.nlm.nih.gov/Blast.cgi

[8] Rice P., Longden I., Bleasby A., “EMBOSS: The European Molecular Biology Open Software Suite”, Trends in Genetics, 16:6 (2000), 276–277 | DOI

[9] Gotoh O., “An improved algorithm for matching biological sequences”, J. Mol. Biol., 162:3 (1982), 705–708 | DOI

[10] Press W., Teukolsky S., Vetterling W., Flannery B., Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, 2007, 1256 pp. | MR

[11] Myers E., Miller W., “Optimal alignments in linear space”, Comput. Appl. Biosci., 4:1 (1988), 11–17 | DOI

[12] Hirschberg D. S., “A linear space algorithm for computing maximal common subsequences”, Communications of the ACM, 18:6 (1975), 341–343 | DOI | MR

[13] Altschul S., Gish W., Miller W., Myers E., Lipman D., “Basic local alignment search tool”, J. Mol. Biol., 215 (1990), 403–410 | DOI

[14] Driga A., Lu P., Schaeffer J., Szafron D., Charter K., Parsons I., “FastLSA: A Fast, Linear-Space, Parallel and Sequential Algorithm for Sequence Alignment”, Algorithmica, 45 (2006), 337–375 | DOI | MR

[15] Chakraborty A., Bandyopadhyay S., “FOGSAA: Fast Optimal Global Sequence Alignment Algorithm”, Scientific Reports, 2013, no. 3, 1746 | DOI

[16] Loving J., Hernandez Y., Benson G., “BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm”, Bioinformatics, 30:22 (2014), 3166–3173 | DOI | MR

[17] Farrar M., “Striped Smith-Waterman speeds database searches six times over other SIMD implementations”, Bioinformatics, 23:2 (2007), 156–161 | DOI

[18] Huson D., Chao Xie C., “A poor man's BLASTX — high-throughput metagenomic protein database search using PAUDA”, Bioinformatics, 30:1 (2014), 38–39 | DOI

[19] Galvez S., Diaz D., Hernandez P., Esteban F. J., Caballero J. A., Dorado G., “Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment”, Bioinformatics, 26 (2010), 683–686 | DOI

[20] Blom J., Jakobi T., Doppmeier D., Jaenicke S., Kalinowski J., Stoye J., Goesmann A., “Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming”, Bioinformatics, 27 (2011), 1351–1358 | DOI

[21] Levenshtein V. I., “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10 (1966), 707–710 | MR

[22] Dayhoff M., Schwartz R., Orcutt B., “A model of Evolutionary Change in Proteins”, Atlas of protein sequence and structure, 5 (1978), 345–358

[23] Henikoff S., Henikoff G., “Amino acid substitution matrices from protein blocks”, Proc. Natl. Acad. Sci. USA, 89:22 (1992), 10915–10919 | DOI

[24] Hamming R. W., “Error Detecting and Error Correcting Codes”, The Bell System Technical Journal, 29:2 (1950), 147–160 | DOI | MR

[25] Xuhua X., Bioinformatics and the Cell. Modern Computational Approaches in Genomics, Proteomics and Transcriptomics, Springer, 2007, 124–127

[26] Ibarra I., Melo F., “Interactive software tool to comprehend the calculation of optimal sequence alignments with dynamic programming”, Bioinformatics, 26:13 (2010), 1664–1669 | DOI