Combinatorial optimization of DNA sequence analysis on heterogeneous systems
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM), Department of Computer Science (DV). (Parallel Computing; DISA; HPCC)
2017 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no. 7, article id e4037. Article in journal (Refereed). Published
Abstract [en]

Analysis of DNA sequences is a data- and computation-intensive problem and therefore requires suitable parallel computing resources and algorithms. In this paper, we describe our parallel algorithm for DNA sequence analysis that determines how many times a pattern appears in the DNA sequence. The algorithm is engineered for heterogeneous platforms that comprise a host with multi-core processors and one or more many-core devices. For combinatorial optimization, we use the simulated annealing algorithm. The optimization goal is to determine the number of threads, thread affinities, and DNA sequence fractions for host and device, such that the overall execution time of DNA sequence analysis is minimized. We evaluate our approach experimentally using real-world DNA sequences of various organisms on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processing device. By running only about 5% of possible experiments, our optimization method finds a near-optimal system configuration for DNA sequence analysis that yields average speedups of 1.6× and 2× compared with host-only and device-only execution, respectively.
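The optimization described in the abstract can be illustrated with a minimal simulated annealing sketch over a discrete configuration space (thread counts, thread affinities, and the host share of the sequence). This is not the authors' implementation: the parameter values in `SPACE`, the penalty table, the cooling schedule, and the function `toy_execution_time` are all hypothetical stand-ins. In the paper the cost of a configuration comes from measured execution time on real hardware, whereas here a made-up analytic model is used so that the example runs anywhere.

```python
import math
import random

# Hypothetical search space; the values are illustrative, not taken from the paper.
SPACE = {
    "host_threads": [2, 6, 12, 24, 36, 48],
    "host_affinity": ["none", "scatter", "compact"],
    "device_threads": [2, 4, 8, 16, 30, 60, 120, 180, 240],
    "device_affinity": ["balanced", "scatter", "compact"],
    "host_fraction": [i / 10 for i in range(11)],  # share of the sequence given to the host
}

# Made-up slowdown factors per affinity setting (an assumption for this sketch).
AFFINITY_PENALTY = {"none": 1.10, "scatter": 1.00, "compact": 1.05, "balanced": 1.00}


def toy_execution_time(cfg):
    """Stand-in cost model: host and device work concurrently,
    so the total time is that of the slower side."""
    host = cfg["host_fraction"] * 100.0 / cfg["host_threads"] ** 0.9
    host *= AFFINITY_PENALTY[cfg["host_affinity"]]
    device = (1.0 - cfg["host_fraction"]) * 300.0 / cfg["device_threads"] ** 0.9
    device *= AFFINITY_PENALTY[cfg["device_affinity"]]
    return max(host, device)


def neighbor(cfg, rng):
    """Change one randomly chosen parameter to a different allowed value."""
    new = dict(cfg)
    key = rng.choice(list(SPACE))
    new[key] = rng.choice([v for v in SPACE[key] if v != cfg[key]])
    return new


def simulated_annealing(cost, rng, t_start=10.0, t_end=0.01, cooling=0.95):
    """Return (best configuration, its cost, number of cost evaluations)."""
    current = {k: rng.choice(v) for k, v in SPACE.items()}
    current_cost = cost(current)
    best, best_cost = current, current_cost
    evaluations = 1
    t = t_start
    while t > t_end:
        cand = neighbor(current, rng)
        cand_cost = cost(cand)
        evaluations += 1
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling  # geometric cooling
    return best, best_cost, evaluations


if __name__ == "__main__":
    best, best_cost, evaluations = simulated_annealing(toy_execution_time, random.Random(0))
    total = math.prod(len(v) for v in SPACE.values())
    print(best, round(best_cost, 3), f"{evaluations} of {total} configurations evaluated")
```

Because every cost evaluation corresponds to one experiment, the number of evaluations relative to the size of the full space is the quantity of interest: the search visits only a small part of the space instead of enumerating it.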

Place, publisher, year, edition, pages
John Wiley & Sons, 2017. Vol. 29, no. 7, article id e4037
National Category
Computer Systems
Research subject
Computer and Information Sciences, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-58995
DOI: 10.1002/cpe.4037
ISI: 000398712500007
Scopus ID: 2-s2.0-85006508024
OAI: oai:DiVA.org:lnu-58995
DiVA, id: diva2:1056047
Conference
The 18th IEEE international conference on computational science and engineering (CSE2015)
Available from: 2016-12-13 Created: 2016-12-13 Last updated: 2019-09-06. Bibliographically approved
In thesis
1. Programming and Optimization of Big-Data Applications on Heterogeneous Computing Systems
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Next-generation sequencing instruments enable biological researchers to generate voluminous amounts of data. In the near future, it is projected that genomics will be the largest source of big data. A major challenge of big data is the efficient analysis of very large data sets. Modern heterogeneous parallel computing systems, which comprise multiple CPUs, GPUs, and Intel Xeon Phis, can cope with the requirements of big-data analysis applications. However, utilizing these resources to their highest possible extent demands advanced knowledge of various hardware architectures and programming frameworks. Furthermore, optimized software execution on such systems demands consideration of many compile-time and run-time system parameters.

In this thesis, we study and develop parallel pattern matching algorithms for heterogeneous computing systems. We apply our pattern matching algorithm to DNA sequence analysis. Experimental evaluation results show that our parallel algorithm can achieve more than 50x speedup when executed on host CPUs and more than 30x when executed on Intel Xeon Phi, compared to the sequential version executed on the CPU.
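As a rough illustration of data-parallel pattern counting, the sketch below splits a sequence into chunks, counts the matches that start in each chunk, and sums the partial counts. It is a simplified stand-in and not the algorithm from the thesis: the function names `count_in_chunk` and `parallel_count` are hypothetical, matching relies on Python's built-in `str.find`, and a thread pool is used only to show the decomposition (CPython threads will not deliver the speedups reported above).

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_chunk(sequence, pattern, start, end):
    """Count matches (overlapping allowed) whose first character lies in [start, end).

    The search window extends len(pattern) - 1 characters past `end`, so a match
    that straddles a chunk boundary is counted once, by the chunk where it starts.
    """
    limit = end + len(pattern) - 1
    count = 0
    pos = sequence.find(pattern, start, limit)
    while pos != -1:
        count += 1
        pos = sequence.find(pattern, pos + 1, limit)
    return count


def parallel_count(sequence, pattern, workers=4):
    """Split the sequence into at most `workers` chunks and sum the partial counts."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n = len(sequence)
    chunk = max(1, -(-n // workers))  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: count_in_chunk(sequence, pattern, *b), bounds)
        return sum(parts)


if __name__ == "__main__":
    print(parallel_count("GATTACAGATTACA", "GATTACA", workers=3))
```

Assigning each match to the chunk in which it starts is what keeps the partial counts independent; on a heterogeneous platform the same chunk boundaries could, in principle, be chosen unevenly to reflect the host and device fractions.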

Thereafter, we combine machine learning and search-based meta-heuristics to determine near-optimal parameter configurations of parallel matching algorithms for efficient execution on heterogeneous computing systems. We use our approach to distribute the workload of the DNA sequence analysis application across the available host CPUs and accelerating devices and to determine the system configuration parameters of a heterogeneous system that comprises Intel Xeon CPUs and a Xeon Phi accelerator. Experimental results show that an execution that uses the resources of both the host CPUs and the accelerating device outperforms the host-only and the device-only executions.

Furthermore, we propose programming abstractions, a source-to-source compiler, and a run-time system for heterogeneous stream computing. Given source code annotated with compiler directives, the source-to-source compiler can generate device-specific code. The run-time system can automatically distribute the workload across the available host CPUs and accelerating devices. Experimental results show that our solution significantly reduces the programming effort, and that the generated code delivers better performance than the CPUs-only or GPUs-only executions.

Place, publisher, year, edition, pages
Växjö: Linnaeus University Press, 2018
Series
Linnaeus University Dissertations ; 335/2018
Keywords
Big Data, Heterogeneous Parallel Computing, Software Optimization, Source-to-source Compilation
National Category
Computer Sciences
Research subject
Computer and Information Sciences; Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-79192 (URN); 978-91-88898-14-2 (ISBN); 978-91-88898-15-9 (ISBN)
Public defence
2018-12-20, D1136, Hus D, Växjö, 15:00 (English)
Available from: 2018-12-17 Created: 2018-12-13 Last updated: 2018-12-17. Bibliographically approved

Authors

Memeti, Suejb; Pllana, Sabri
