Programming and Optimization of Big-Data Applications on Heterogeneous Computing Systems
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM), Department of Computer Science (DV). (Parallel Computing)
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Next-generation sequencing instruments enable biological researchers to generate voluminous amounts of data. In the near future, genomics is projected to become the largest source of big data. A major challenge of big data is the efficient analysis of very large data sets. Modern heterogeneous parallel computing systems, which comprise multiple CPUs, GPUs, and Intel Xeon Phis, can cope with the requirements of big-data analysis applications. However, utilizing these resources to their highest possible extent demands advanced knowledge of various hardware architectures and programming frameworks. Furthermore, optimized software execution on such systems demands consideration of many compile-time and run-time system parameters.

In this thesis, we study and develop parallel pattern matching algorithms for heterogeneous computing systems. We apply our pattern matching algorithm to DNA sequence analysis. Experimental evaluation results show that our parallel algorithm can achieve more than 50x speedup when executed on host CPUs and more than 30x speedup when executed on the Intel Xeon Phi, compared to the sequential version executed on the CPU.

Thereafter, we combine machine learning and search-based meta-heuristics to determine near-optimal parameter configurations of parallel matching algorithms for efficient execution on heterogeneous computing systems. We use our approach to distribute the workload of the DNA sequence analysis application across the available host CPUs and accelerating devices and to determine the system configuration parameters of a heterogeneous system that comprises Intel Xeon CPUs and an Intel Xeon Phi accelerator. Experimental results show that the execution that uses the resources of both the host CPUs and the accelerating device outperforms the host-only and the device-only executions.

Furthermore, we propose programming abstractions, a source-to-source compiler, and a run-time system for heterogeneous stream computing. Given source code annotated with compiler directives, the source-to-source compiler can generate device-specific code. The run-time system can automatically distribute the workload across the available host CPUs and accelerating devices. Experimental results show that our solution significantly reduces the programming effort and that the generated code delivers better performance than the CPUs-only or GPUs-only executions.

Place, publisher, year, edition, pages
Växjö: Linnaeus University Press, 2018.
Series
Linnaeus University Dissertations ; 335/2018
Keywords [en]
Big Data, Heterogeneous Parallel Computing, Software Optimization, Source-to-source Compilation
National Category
Computer Sciences
Research subject
Computer and Information Sciences; Computer and Information Sciences, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-79192; ISBN: 978-91-88898-14-2 (print); ISBN: 978-91-88898-15-9 (digital); OAI: oai:DiVA.org:lnu-79192; DiVA, id: diva2:1270537
Public defence
2018-12-20, D1136, Building D, Växjö, 15:00 (English)
Opponent
Supervisors
Available from: 2018-12-17 Created: 2018-12-13 Last updated: 2018-12-17. Bibliographically approved
List of papers
1. Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review
2019 (English). In: Computing, ISSN 0010-485X, E-ISSN 1436-5057, Vol. 101, no. 8, p. 893-936. Article in journal (Refereed). Published
Abstract [en]

While modern parallel computing systems offer high performance, utilizing these powerful computing resources to the highest possible extent demands advanced knowledge of various hardware architectures and parallel programming models. Furthermore, optimized software execution on parallel computing systems demands consideration of many parameters at compile-time and run-time. Determining the optimal set of parameters in a given execution context is a complex task, and therefore to address this issue researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for software optimization at compile-time and run-time. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of software optimization for parallel computing systems. Furthermore, it may aid in understanding the limitations of existing approaches and identification of areas for improvement.

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Parallel computing, Machine learning, Meta-heuristics, Software optimization
National Category
Computer Sciences
Research subject
Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-73712 (URN); 10.1007/s00607-018-0614-9 (DOI); 000472515600001 (ISI); 2-s2.0-85045892455 (Scopus ID)
Available from: 2018-04-27 Created: 2018-04-27 Last updated: 2019-08-29. Bibliographically approved
2. PaREM: a Novel Approach for Parallel Regular Expression Matching
2014 (English). In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), IEEE Press, 2014, p. 690-697. Conference paper, Published paper (Refereed)
Abstract [en]

Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be time-prohibitive for large problem sizes. In this paper, we describe a novel algorithm for parallel regular expression matching via deterministic finite automata. Furthermore, we present our tool PaREM, which accepts regular expressions and finite automata as input and automatically generates the corresponding code for our algorithm, which is amenable to parallel execution on shared-memory systems. We evaluate our parallel algorithm empirically by comparing it with a commonly used algorithm for sequential regular expression matching. Experiments on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 21× for 48 threads.
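
As an illustration of how DFA-based matching can be split across workers, the following is a minimal sketch of the general chunked technique, not PaREM's generated code: each worker computes, for every possible start state, the state reached after its chunk, and the per-chunk maps are then composed in order. The DFA (accepting DNA strings that contain "ACG"), the function names, and the use of a thread pool are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical DFA: states 0..3 track progress toward the pattern "ACG".
# State 3 is accepting and absorbing (once "ACG" is seen, we stay there).
NUM_STATES = 4
START, ACCEPT = 0, 3


def step(state, ch):
    """Transition function of the illustrative 'contains ACG' automaton."""
    if state == ACCEPT:
        return ACCEPT
    if ch == "A":
        return 1
    if state == 1 and ch == "C":
        return 2
    if state == 2 and ch == "G":
        return ACCEPT
    return 0


def chunk_map(chunk):
    """For each possible start state, compute the state reached after the chunk."""
    result = []
    for start in range(NUM_STATES):
        state = start
        for ch in chunk:
            state = step(state, ch)
        result.append(state)
    return result


def sequential_match(text):
    """Reference implementation: one pass over the whole input."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state == ACCEPT


def parallel_match(text, workers=4):
    """Split the input into chunks, map them concurrently, then compose the maps."""
    size = max(1, -(-len(text) // workers))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        maps = list(pool.map(chunk_map, chunks))
    state = START
    for mapping in maps:  # cheap sequential composition step
        state = mapping[state]
    return state == ACCEPT
```

Evaluating every start state per chunk multiplies the work by the number of states, which is the cost of removing the sequential dependency between chunks. In CPython the thread pool only demonstrates the structure; the pure-Python loop will not run faster because of the global interpreter lock.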

Place, publisher, year, edition, pages
IEEE Press, 2014
National Category
Engineering and Technology; Information Systems
Research subject
Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-41014 (URN); 10.1109/CSE.2014.146 (DOI); 000380512100111 (ISI); 2-s2.0-84925250594 (Scopus ID); 978-1-4799-7980-6 (ISBN)
Conference
17th International Conference on Computational Science and Engineering (CSE), 19-21 Dec. 2014, Chengdu
Available from: 2015-03-19 Created: 2015-03-19 Last updated: 2018-12-13. Bibliographically approved
3. A machine learning approach for accelerating DNA sequence analysis
2018 (English). In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 32, no. 3, p. 363-379. Article in journal (Refereed). Published
Abstract [en]

DNA sequence analysis is a data- and computationally intensive problem and therefore demands suitable parallel computing resources and algorithms. In this paper, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general-purpose host central processing units (CPUs) and one or more Xeon Phi devices. We present a parallel algorithm that shares the work of DNA sequence analysis between the host CPUs and the Xeon Phi device to reduce the overall analysis time. For automatic worksharing, we use a supervised machine learning approach, which predicts the performance of DNA sequence analysis on the host and device and accordingly maps fractions of the DNA sequence to the host and device. We evaluate our approach empirically using real-world DNA segments of humans and various animals on a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P device with 61 cores.
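
The worksharing idea can be sketched as follows, under loud assumptions: `predicted_time` is a hypothetical linear stand-in for a trained performance model (the abstract does not specify the model), and the throughput and overhead figures are invented for illustration. Given per-side predictions, the split is chosen so that the slower of host and device finishes as early as possible.

```python
def predicted_time(size_mb, throughput_mb_s, overhead_s):
    """Hypothetical stand-in for a trained performance predictor."""
    if size_mb == 0:
        return 0.0  # nothing assigned, so no start-up overhead either
    return overhead_s + size_mb / throughput_mb_s


def best_split(total_mb, host, device, steps=100):
    """Return (host_fraction, predicted_makespan) minimizing the slower side.

    `host` and `device` are (throughput_mb_s, overhead_s) pairs; both are
    illustrative assumptions, not measured values.
    """
    best = None
    for i in range(steps + 1):
        frac = i / steps
        t_host = predicted_time(total_mb * frac, *host)
        t_dev = predicted_time(total_mb * (1 - frac), *device)
        makespan = max(t_host, t_dev)  # host and device run concurrently
        if best is None or makespan < best[1]:
            best = (frac, makespan)
    return best
```

For example, `best_split(400, host=(100, 0.0), device=(300, 0.0))` assigns a quarter of the input to the host, because the assumed device throughput is three times higher.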

Place, publisher, year, edition, pages
Sage Publications, 2018
Keywords
DNA sequence analysis, machine learning, heterogeneous parallel computing
National Category
Computer and Information Sciences
Research subject
Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-54385 (URN); 10.1177/1094342016654214 (DOI); 000432133100005 (ISI); 2-s2.0-85046803969 (Scopus ID)
Available from: 2016-06-29 Created: 2016-06-29 Last updated: 2019-08-29. Bibliographically approved
4. Combinatorial optimization of DNA sequence analysis on heterogeneous systems
2017 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no. 7, article id e4037. Article in journal (Refereed). Published
Abstract [en]

Analysis of DNA sequences is a data- and computationally intensive problem, and therefore, it requires suitable parallel computing resources and algorithms. In this paper, we describe our parallel algorithm for DNA sequence analysis that determines how many times a pattern appears in the DNA sequence. The algorithm is engineered for heterogeneous platforms that comprise a host with multi-core processors and one or more many-core devices. For combinatorial optimization, we use the simulated annealing algorithm. The optimization goal is to determine the number of threads, thread affinities, and DNA sequence fractions for host and device, such that the overall execution time of DNA sequence analysis is minimized. We evaluate our approach experimentally using real-world DNA sequences of various organisms on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processing device. By running only about 5% of possible experiments, our optimization method finds a near-optimal system configuration for DNA sequence analysis that yields an average speedup of 1.6× and 2× compared with the host-only and device-only execution, respectively.
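
A minimal sketch of simulated annealing over such a configuration space is given below. The parameter values, the toy `cost` function (standing in for measured execution time), and the cooling schedule are all assumptions for illustration and are not taken from the paper.

```python
import math
import random

# Hypothetical parameter space; the values are illustrative only.
HOST_THREADS = [6, 12, 24, 48]
DEVICE_THREADS = [60, 120, 180, 240]
AFFINITIES = ["compact", "scatter", "balanced"]
HOST_PERCENT = list(range(0, 101, 10))
SPACE = [HOST_THREADS, DEVICE_THREADS, AFFINITIES, HOST_PERCENT]


def cost(cfg):
    """Toy execution-time model (an assumption replacing real measurements)."""
    host_t, dev_t, affinity, host_pct = cfg
    penalty = {"compact": 1.2, "scatter": 1.0, "balanced": 1.1}[affinity]
    t_host = (host_pct / 100) * 100.0 / host_t
    t_dev = (1 - host_pct / 100) * 400.0 / dev_t * penalty
    return max(t_host, t_dev)  # host and device run concurrently


def neighbor(cfg, rng):
    """Re-draw one randomly chosen parameter from its list of values."""
    i = rng.randrange(len(SPACE))
    new = list(cfg)
    new[i] = rng.choice(SPACE[i])
    return tuple(new)


def simulated_annealing(rng, iterations=200, temp=1.0, cooling=0.97):
    """Return (best_config, best_cost) found within the iteration budget."""
    current = tuple(rng.choice(values) for values in SPACE)
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        cand = neighbor(current, rng)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost
```

Calling `simulated_annealing(random.Random(42))` evaluates at most 201 of the 528 configurations in this toy space. The result is a heuristic optimum and is not guaranteed to match the exhaustive minimum.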

Place, publisher, year, edition, pages
John Wiley & Sons, 2017
National Category
Computer Systems
Research subject
Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-58995 (URN); 10.1002/cpe.4037 (DOI); 000398712500007 (ISI); 2-s2.0-85006508024 (Scopus ID)
Conference
The 18th IEEE international conference on computational science and engineering (CSE2015)
Available from: 2016-12-13 Created: 2016-12-13 Last updated: 2019-09-06. Bibliographically approved
5. Combinatorial Optimization of Work Distribution on Heterogeneous Systems
2016 (English). In: Proceedings of 45th International Conference on Parallel Processing Workshops (ICPPW 2016), IEEE Press, 2016, p. 151-160. Conference paper, Published paper (Refereed)
Abstract [en]

We describe an approach that uses combinatorial optimization and machine learning to share the work between the host and device of heterogeneous computing systems such that the overall application execution time is minimized. We propose to use combinatorial optimization to search for the optimal system configuration in the given parameter space (such as the number of threads, thread affinity, and work distribution for the host and device). For each system configuration that is suggested by combinatorial optimization, we use machine learning to evaluate the system performance. We evaluate our approach experimentally using a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P co-processor with 61 cores. Using our approach, we are able to find a near-optimal system configuration by performing only about 5% of all possible experiments.

Place, publisher, year, edition, pages
IEEE Press, 2016
Series
International Conference on Parallel Processing Workshops, ISSN 1530-2016
Keywords
Combinatorial Optimization, Work Distribution, Heterogeneous Systems
National Category
Computer Systems
Research subject
Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-57096 (URN); 10.1109/ICPPW.2016.35 (DOI); 000392498600019 (ISI); 2-s2.0-84990889824 (Scopus ID); 978-1-5090-2825-2 (ISBN); 978-1-5090-2826-9 (ISBN)
Conference
45th International Conference on Parallel Processing Workshops (ICPPW), 16-19 Aug. 2016, Philadelphia, Pennsylvania, USA
Funder
KK-stiftelsen (Knowledge Foundation), 20150088
Available from: 2016-10-06 Created: 2016-10-06 Last updated: 2018-12-13. Bibliographically approved
6. Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption
2017 (English). In: ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, New York, NY, USA: Association for Computing Machinery (ACM), 2017, p. 1-6. Conference paper, Published paper (Refereed)
Abstract [en]

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate the programming productivity, we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines required to parallelize the code using a specific framework. We use our tools MeterPU and x-MeterPU to evaluate the energy consumption and the performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi co-processor.

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2017
National Category
Computer Systems; Computer Sciences
Research subject
Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-67141 (URN); 10.1145/3110355.3110356 (DOI); 978-1-4503-5116-4 (ISBN)
Conference
ARMS-CC '17: the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, 28 July, 2017
Available from: 2017-08-01 Created: 2017-08-01 Last updated: 2018-12-13. Bibliographically approved
7. HSTREAM: A directive-based language extension for heterogeneous stream computing
2018 (English). In: 2018 21st IEEE International Conference on Computational Science and Engineering (CSE) / [ed] Pop, F; Negru, C; GonzalezVelez, H; Rak, J, IEEE, 2018, p. 138-145. Conference paper, Published paper (Refereed)
Abstract [en]

Big data streaming applications require utilization of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such systems requires advanced knowledge of several hardware architectures and device-specific programming models, including OpenMP and CUDA. In this paper, we present HSTREAM, a compiler directive-based language extension to support programming stream computing applications for heterogeneous parallel computing systems. The HSTREAM source-to-source compiler aims to increase programming productivity by enabling programmers to annotate the parallel regions for heterogeneous execution and generate target-specific code. The HSTREAM runtime automatically distributes the workload across CPUs and accelerating devices. We demonstrate the usefulness of the HSTREAM language extension with various applications from the STREAM benchmark. Experimental evaluation results show that HSTREAM can keep the same programming simplicity as OpenMP, and the generated code can deliver performance beyond what CPUs-only and GPUs-only executions can deliver.

Place, publisher, year, edition, pages
IEEE, 2018
Series
IEEE International Conference on Computational Science and Engineering, ISSN 1949-0828
Keywords
stream computing, heterogeneous parallel computing systems, source-to-source compilation
National Category
Computer Sciences
Research subject
Computer and Information Sciences, Computer Science; Computer and Information Sciences
Identifiers
urn:nbn:se:lnu:diva-79191 (URN); 10.1109/CSE.2018.00026 (DOI); 000458738400019 (ISI); 2-s2.0-85061051044 (Scopus ID); 978-1-5386-7649-3 (ISBN); 978-1-5386-7650-9 (ISBN)
Conference
The 21st IEEE International Conference on Computational Science and Engineering (CSE 2018), 29-31 Oct. 2018, Bucharest
Available from: 2018-12-13 Created: 2018-12-13 Last updated: 2019-08-29. Bibliographically approved

Open Access in DiVA

fulltext (2237 kB), 24 downloads
File information
File name: FULLTEXT01.pdf. File size: 2237 kB. Checksum (SHA-512):
d48e711a567f1ad0dbbf9f329a51d43f4a04bf5cb56d93b9e698f11f40e9def45426d9a6228dc0d35eb9ffcf687d0b94b1cd9cb6c47fe3a42b879e078abfde4d
Type: fulltext. Mimetype: application/pdf