lnu.se Publications
1 - 18 of 18
  • 1.
    Chozas, Adrian Calvo
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution (2017). In: International Conference on Computational Science (ICCS 2017) / [ed] Koumoutsakos, P., Lees, M., Krzhizhanovskaya, V., Dongarra, J., Sloot, P., Elsevier, 2017, p. 2121-2130. Conference paper (Refereed)
    Abstract [en]

    While modern parallel computing systems provide high performance resources, utilizing them to the highest extent requires advanced programming expertise. Programming for parallel computing systems is much more difficult than programming for sequential systems. OpenMP is an extension of the C++ programming language that enables expressing parallelism using compiler directives. While OpenMP alleviates parallel programming by reducing the lines of code that the programmer needs to write, deciding how and when to use these compiler directives is left to the programmer. Novice programmers may make mistakes that lead to performance degradation or unexpected program behavior. Cognitive computing has shown impressive results in various domains, such as health or marketing. In this paper, we describe the use of the IBM Watson cognitive system for the education of novice parallel programmers. Using the dialogue service of IBM Watson, we have developed a solution that assists the programmer in avoiding common OpenMP mistakes. To evaluate our approach we conducted a survey with a number of novice parallel programmers at Linnaeus University and obtained encouraging results with respect to the usefulness of our approach.
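    The kind of OpenMP pitfall such an assistant targets can be illustrated with a classic example (a sketch of our own, not code from the paper): parallelizing an accumulation with the sum variable shared creates a data race, and the reduction clause is the idiomatic fix.

    ```c
    /* A common novice OpenMP mistake: annotating this loop with a plain
     * "#pragma omp parallel for" while "sum" stays shared creates a data
     * race. The reduction clause is the fix: each thread keeps a private
     * partial sum that is combined at the end of the parallel region. */
    double sum_array(const double *a, int n) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }
    ```

    Without the reduction clause (or a critical/atomic alternative), concurrent updates to sum would yield nondeterministic results — exactly the class of mistake a dialogue-based assistant can flag.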

  • 2.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Li, Lu
    Linköping University.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kołodziej, Joanna
    Cracow University of Technology, Poland.
    Kessler, Christoph
    Linköping University.
    Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption (2017). In: ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, New York, NY, USA: Association for Computing Machinery (ACM), 2017, p. 1-6. Conference paper (Refereed)
    Abstract [en]

    Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or the Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate programming productivity we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines required to parallelize the code using a specific framework. We use our tools MeterPU and x-MeterPU to evaluate the energy consumption and the performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi co-processor.

  • 3.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    A machine learning approach for accelerating DNA sequence analysis (2018). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 32, no. 3, p. 363-379. Article in journal (Refereed)
    Abstract [en]

    DNA sequence analysis is a data- and computationally intensive problem and therefore demands suitable parallel computing resources and algorithms. In this paper, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general-purpose host central processing units (CPUs) and one or more Xeon Phi devices. We present a parallel algorithm that shares the work of DNA sequence analysis between the host CPUs and the Xeon Phi device to reduce the overall analysis time. For automatic worksharing we use a supervised machine learning approach, which predicts the performance of DNA sequence analysis on the host and the device and accordingly maps fractions of the DNA sequence to each. We evaluate our approach empirically using real-world DNA segments of humans and various animals on a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P device with 61 cores.
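    The worksharing decision described above can be reduced to a one-line balance condition: if a trained performance model predicts a processing rate r_h for the host and r_d for the device, both sides finish simultaneously when the host receives the fraction r_h / (r_h + r_d) of the sequence. A minimal sketch (function name and rates are illustrative, not from the paper):

    ```c
    /* If a performance model predicts that the host processes host_rate
     * characters/second and the device device_rate characters/second,
     * giving the host the fraction r_h / (r_h + r_d) equalizes the two
     * finish times:  (f * N) / r_h == ((1 - f) * N) / r_d. */
    double host_fraction(double host_rate, double device_rate) {
        return host_rate / (host_rate + device_rate);
    }
    ```

    For example, a device predicted to be three times faster than the host would leave the host a quarter of the sequence.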

  • 4.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Accelerating DNA Sequence Analysis using Intel(R) Xeon Phi(TM) (2015). In: 2015 IEEE TrustCom/BigDataSE/ISPA, IEEE Press, 2015, Vol. 3, p. 222-227. Conference paper (Refereed)
    Abstract [en]

    Genetic information is increasing exponentially, doubling every 18 months. Analyzing this information within a reasonable amount of time requires parallel computing resources. While considerable research has addressed DNA analysis using GPUs, so far not much attention has been paid to the Intel Xeon Phi coprocessor. In this paper we present an algorithm for large-scale DNA analysis that exploits the thread-level and SIMD parallelism of the Intel Xeon Phi. We evaluate our approach for various numbers of cores and thread allocation affinities in the context of real-world DNA sequences of mouse, cat, dog, chicken, human and turkey. The experimental results on Intel Xeon Phi show speed-ups of up to 10× compared to a sequential implementation running on an Intel Xeon processor E5.

  • 5.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Analyzing large-scale DNA Sequences on Multi-core Architectures (2015). In: Proceedings: IEEE 18th International Conference on Computational Science and Engineering, CSE 2015 / [ed] Plessl, C., ElBaz, D., Cong, G., Cardoso, J. M. P., Veiga, L., Rauber, T., IEEE Press, 2015, p. 208-215. Conference paper (Refereed)
    Abstract [en]

    Rapid analysis of DNA sequences is important for detecting the evolution of different viruses and bacteria at an early phase, for early diagnosis of genetic predispositions to certain diseases (such as cancer and cardiovascular diseases), and for DNA forensics. However, real-world DNA sequences may comprise several gigabytes, and DNA analysis demands adequate computational resources to be completed within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on finite automata and is suitable for analyzing very large DNA segments. We evaluate our approach for real-world DNA segments of mouse (2.7 GB), cat (2.4 GB), dog (2.4 GB), chicken (1 GB), human (3.2 GB) and turkey (0.2 GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speedups of up to 17.6×. Our approach is up to 3× faster than a pattern-based parallel approach that uses the RE2 library.
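    The finite-automata core of such an analysis can be sketched as follows (an illustrative KMP-style DFA scanner of our own, not the authors' implementation). In the parallel setting, each thread would run this scan over its own fraction of the sequence.

    ```c
    #include <string.h>

    /* Map a DNA base to a DFA input symbol (non-ACGT maps to T's slot). */
    static int base_idx(char c) {
        switch (c) { case 'A': return 0; case 'C': return 1; case 'G': return 2; default: return 3; }
    }

    /* Build a KMP-style DFA for a non-empty pattern (shorter than 64 bases),
     * then count possibly overlapping occurrences in the text. */
    long count_pattern(const char *text, const char *pat) {
        int m = (int)strlen(pat);
        int dfa[64][4];
        memset(dfa, 0, sizeof dfa);
        dfa[0][base_idx(pat[0])] = 1;
        int x = 0;                                          /* fallback (border) state */
        for (int j = 1; j < m; j++) {
            for (int c = 0; c < 4; c++) dfa[j][c] = dfa[x][c]; /* mismatch: behave like fallback */
            dfa[j][base_idx(pat[j])] = j + 1;                  /* match: advance */
            x = dfa[x][base_idx(pat[j])];
        }
        for (int c = 0; c < 4; c++) dfa[m][c] = dfa[x][c];     /* after a full match, continue from fallback */
        long hits = 0;
        int s = 0;
        for (const char *p = text; *p; p++) {
            s = dfa[s][base_idx(*p)];
            if (s == m) hits++;
        }
        return hits;
    }
    ```

    The scan touches each input character exactly once, which is what makes splitting a multi-gigabyte sequence across threads attractive.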

  • 6.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Combinatorial optimization of DNA sequence analysis on heterogeneous systems (2017). In: Concurrency and Computation: Practice and Experience, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no. 7, article id e4037. Article in journal (Refereed)
    Abstract [en]

    Analysis of DNA sequences is a data- and computationally intensive problem, and therefore it requires suitable parallel computing resources and algorithms. In this paper, we describe our parallel algorithm for DNA sequence analysis that determines how many times a pattern appears in the DNA sequence. The algorithm is engineered for heterogeneous platforms that comprise a host with multi-core processors and one or more many-core devices. For combinatorial optimization, we use the simulated annealing algorithm. The optimization goal is to determine the number of threads, thread affinities, and DNA sequence fractions for the host and device such that the overall execution time of DNA sequence analysis is minimized. We evaluate our approach experimentally using real-world DNA sequences of various organisms on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processing device. By running only about 5% of the possible experiments, our optimization method finds a near-optimal system configuration for DNA sequence analysis that yields average speedups of 1.6× and 2× compared with host-only and device-only execution, respectively.
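    A minimal sketch of a simulated-annealing loop over a (thread count, host fraction) configuration space. The execution-time model, constants, and function names here are all made up for illustration; the real approach evaluates measured or predicted DNA-analysis times.

    ```c
    #include <stdlib.h>
    #include <math.h>

    /* Hypothetical execution-time model standing in for real measurements:
     * t = number of device threads (1..61), f = fraction of work on the host. */
    double exec_time(int t, double f) {
        double device = (1.0 - f) * 100.0 / t;          /* device part scales with threads */
        double host   = f * 100.0 / 24.0;               /* host fixed at 24 cores */
        double slower = device > host ? device : host;  /* both run concurrently */
        return slower + 0.05 * t;                       /* plus per-thread overhead */
    }

    /* Simulated annealing over the (threads, fraction) space: random neighbor
     * moves; worse moves accepted with probability exp(-d/T), T cooled geometrically. */
    double anneal(int *best_t, double *best_f) {
        srand(42);                                      /* fixed seed for reproducibility */
        int t = 1;
        double f = 0.5;
        double cur = exec_time(t, f), best = cur;
        *best_t = t; *best_f = f;
        for (double T = 10.0; T > 0.01; T *= 0.95) {
            int nt = t + rand() % 7 - 3;                /* perturb thread count by up to 3 */
            if (nt < 1) nt = 1;
            if (nt > 61) nt = 61;
            double nf = f + (rand() % 21 - 10) / 100.0; /* perturb fraction by up to 0.10 */
            if (nf < 0.0) nf = 0.0;
            if (nf > 1.0) nf = 1.0;
            double cand = exec_time(nt, nf);
            double d = cand - cur;
            if (d < 0.0 || exp(-d / T) > (double)rand() / RAND_MAX) {
                t = nt; f = nf; cur = cand;
            }
            if (cur < best) { best = cur; *best_t = t; *best_f = f; }
        }
        return best;
    }
    ```

    The loop visits only on the order of a hundred configurations, mirroring the paper's point that a small fraction of the configuration space suffices to find a near-optimal setting.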

  • 7.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Combinatorial Optimization of Work Distribution on Heterogeneous Systems (2016). In: Proceedings of the 45th International Conference on Parallel Processing Workshops (ICPPW 2016), IEEE Press, 2016, p. 151-160. Conference paper (Refereed)
    Abstract [en]

    We describe an approach that uses combinatorial optimization and machine learning to share the work between the host and device of heterogeneous computing systems such that the overall application execution time is minimized. We propose to use combinatorial optimization to search for the optimal system configuration in the given parameter space (such as the number of threads, thread affinity, and the work distribution between host and device). For each system configuration that is suggested by combinatorial optimization, we use machine learning to evaluate the system performance. We evaluate our approach experimentally using a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P co-processor with 61 cores. Using our approach we are able to find a near-optimal system configuration by performing only about 5% of all possible experiments.

  • 8.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    HSTREAM: A directive-based language extension for heterogeneous stream computing (2018). In: 2018 21st IEEE International Conference on Computational Science and Engineering (CSE) / [ed] Pop, F., Negru, C., Gonzalez-Velez, H., Rak, J., IEEE, 2018, p. 138-145. Conference paper (Refereed)
    Abstract [en]

    Big data streaming applications require the utilization of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such systems requires advanced knowledge of several hardware architectures and device-specific programming models, including OpenMP and CUDA. In this paper, we present HSTREAM, a compiler directive-based language extension to support programming stream computing applications for heterogeneous parallel computing systems. The HSTREAM source-to-source compiler aims to increase programming productivity by enabling programmers to annotate parallel regions for heterogeneous execution and generate target-specific code. The HSTREAM runtime automatically distributes the workload across CPUs and accelerating devices. We demonstrate the usefulness of the HSTREAM language extension with various applications from the STREAM benchmark. Experimental evaluation results show that HSTREAM keeps the same programming simplicity as OpenMP, and the generated code can deliver performance beyond what CPU-only and GPU-only executions can deliver.

  • 9.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    PAPA: A Parallel Programming Assistant Powered by IBM Watson Cognitive Computing Technology (2018). In: Journal of Computational Science, ISSN 1877-7503, E-ISSN 1877-7511, Vol. 26, p. 275-284. Article in journal (Refereed)
    Abstract [en]

    The efficient utilization of the available resources in modern parallel computing systems requires advanced parallel programming expertise. However, parallel programming is more difficult than sequential programming. To alleviate the difficulties of parallel programming, high-level programming frameworks, such as OpenMP, have been proposed. Yet, there is evidence that novice parallel programmers make common mistakes that may lead to performance degradation or unexpected program behavior. In this paper, we present our cognitive Parallel Programming Assistant (PAPA) that aims at educating and assisting novice parallel programmers to avoid common OpenMP mistakes. PAPA combines different IBM Watson services to provide a dialog-based interaction (through text and voice) for programmers. We use the Watson Conversation service to implement the dialog-based interaction, and the Speech-to-Text and Text-to-Speech services to enable the voice interaction. The Watson Natural Language Understanding and WordsAPI Synonyms services are used to train PAPA with OpenMP-related publications. We evaluate our approach using a user experience questionnaire with a number of novice parallel programmers at Linnaeus University.

  • 10.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    PaREM: a Novel Approach for Parallel Regular Expression Matching (2014). In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), IEEE Press, 2014, p. 690-697. Conference paper (Refereed)
    Abstract [en]

    Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be time-prohibitive for large problem sizes. In this paper, we describe a novel algorithm for parallel regular expression matching via deterministic finite automata. Furthermore, we present our tool PaREM that accepts regular expressions and finite automata as input and automatically generates the corresponding code for our algorithm, which is amenable to parallel execution on shared-memory systems. We evaluate our parallel algorithm empirically by comparing it with a commonly used algorithm for sequential regular expression matching. Experiments on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 21× for 48 threads.
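    A standard way to make DFA matching data-parallel, in the spirit of (though not taken from) PaREM, is to let each worker scan its chunk starting from every possible DFA state, producing a state-to-state map, and then chain the maps in chunk order. A toy sketch with a three-state DFA:

    ```c
    /* Tiny DFA over {a,b} accepting strings that contain "ab":
     * state 0 = start, 1 = just saw 'a', 2 = saw "ab" (accepting, absorbing). */
    static const int delta[3][2] = { {1, 0}, {1, 2}, {2, 2} };
    static int sym(char c) { return c == 'a' ? 0 : 1; }

    /* Scan a chunk from EVERY possible start state, yielding a state-to-state map. */
    void chunk_map(const char *s, int len, int map[3]) {
        for (int q = 0; q < 3; q++) {
            int st = q;
            for (int i = 0; i < len; i++)
                st = delta[st][sym(s[i])];
            map[q] = st;
        }
    }

    /* The per-chunk maps are independent (and could be built by concurrent
     * threads); only the final stitch-up that chains them is sequential. */
    int match_parallel(const char *s, int len, int nchunks) {
        int state = 0;
        for (int c = 0; c < nchunks; c++) {
            int lo = c * len / nchunks, hi = (c + 1) * len / nchunks;
            int map[3];
            chunk_map(s + lo, hi - lo, map);
            state = map[state];
        }
        return state == 2;
    }
    ```

    The extra work per chunk is bounded by the number of DFA states, which is why the scheme pays off when the input is long relative to the automaton.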

  • 11.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    The Potential of Intel Xeon Phi for DNA Sequence Analysis (2015). In: ACACES 2015: Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems, 2015, p. 263-266. Conference paper (Other academic)
    Abstract [en]

    Genetic information is increasing exponentially, doubling every 18 months. Analyzing this information within a reasonable amount of time requires parallel computing resources. While considerable research has addressed DNA analysis using GPUs, so far not much attention has been paid to the Intel Xeon Phi coprocessor. In this paper we present an algorithm for large-scale DNA analysis that exploits the thread-level and the SIMD parallelism of the Intel Xeon Phi coprocessor. We evaluate our approach for various numbers of cores and thread allocation affinities in the context of real-world DNA sequences of mouse, cat, dog, chicken, human and turkey. The experimental results on Intel Xeon Phi show speed-ups of up to 10× compared to a sequential implementation running on an Intel Xeon processor E5.

  • 12.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Work Distribution of Data-Parallel Applications on Heterogeneous Systems (2016). In: High Performance Computing: ISC High Performance 2016 International Workshops, ExaComm, E-MuCoCoS, HPC-IODC, IXPUG, IWOPH, P^3MA, VHPC, WOPSSS, Frankfurt, Germany, June 19–23, 2016, Revised Selected Papers / [ed] Michela Taufer, Bernd Mohr, Julian M. Kunkel, Springer, 2016, p. 69-81. Chapter in book (Refereed)
    Abstract [en]

    Heterogeneous computing systems offer high peak performance and energy efficiency, and utilizing this potential is essential to achieve extreme-scale performance. However, optimally sharing the work among the processing elements of a heterogeneous system is not straightforward. In this paper, we propose an approach that uses combinatorial optimization to search for the optimal system configuration in a given parameter space. The optimization goal is to determine the number of threads, thread affinities, and workload partitioning such that the overall execution time is minimized. For combinatorial optimization we use simulated annealing. We evaluate our approach with a DNA sequence analysis application on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processor. The obtained results demonstrate that using the near-optimal system configuration determined by our simulated-annealing-based algorithm improves application performance.

  • 13.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Binotto, Alécio
    IBM Research, Brazil.
    Kołodziej, Joanna
    Cracow University of Technology, Poland.
    Brandic, Ivona
    Vienna University of Technology, Austria.
    A Review of Machine Learning and Meta-heuristic Methods for Scheduling Parallel Computing Systems (2018). In: Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (LOPAL 2018), New York, NY, USA: Association for Computing Machinery (ACM), 2018, article id 5. Conference paper (Refereed)
    Abstract [en]

    Optimized software execution on parallel computing systems demands consideration of many parameters at run-time. Determining the optimal set of parameters in a given execution context is a complex task, and therefore to address this issue researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for scheduling parallel computing systems. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of scheduling parallel computing systems. Furthermore, it may aid in understanding the limitations of existing approaches and identification of areas for improvement.

  • 14.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Binotto, Alécio
    IBM Research, Brazil.
    Kołodziej, Joanna
    Cracow University of Technology, Poland.
    Brandic, Ivona
    Vienna University of Technology, Austria.
    Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review (2019). In: Computing, ISSN 0010-485X, E-ISSN 1436-5057, Vol. 101, no. 8, p. 893-936. Article in journal (Refereed)
    Abstract [en]

    While modern parallel computing systems offer high performance, utilizing these powerful computing resources to the highest possible extent demands advanced knowledge of various hardware architectures and parallel programming models. Furthermore, optimized software execution on parallel computing systems demands consideration of many parameters at compile-time and run-time. Determining the optimal set of parameters in a given execution context is a complex task, and therefore to address this issue researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for software optimization at compile-time and run-time. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of software optimization for parallel computing systems. Furthermore, it may aid in understanding the limitations of existing approaches and identification of areas for improvement.

  • 15.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Ferati, Mexhid
    Linnaeus University, Faculty of Technology, Department of Informatics.
    Kurti, Arianit
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Jusufi, Ilir
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    IoTutor: How Cognitive Computing Can Be Applied to Internet of Things Education (2019). Conference paper (Refereed)
    Abstract [en]

    We present IoTutor, a cognitive computing solution for the education of students in the IoT domain. We implement IoTutor as a platform-independent web-based application that can interact with users via text or speech using natural language. We train IoTutor with selected scientific publications relevant to IoT education. To investigate users' experience with IoTutor, we asked a group of students taking an IoT master-level course at Linnaeus University to use IoTutor for a period of two weeks. We asked the students to express their opinions with respect to the attractiveness, perspicuity, efficiency, stimulation, and novelty of IoTutor. The evaluation results show a trend that students express an overall positive attitude towards IoTutor, with the majority of aspects rated higher than the neutral value.

  • 16.
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kołodziej, Joanna
    Cracow University of Technology, Poland.
    Optimal Worksharing of DNA Sequence Analysis on Accelerated Platforms (2016). In: Resource Management for Big Data Platforms: Algorithms, Modelling, and High-Performance Computing Techniques, Springer, 2016, p. 279-309. Chapter in book (Refereed)
    Abstract [en]

    In this chapter, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general purpose CPUs and one (or more) Xeon Phi coprocessors. Our parallel DNA sequence analysis algorithm is based on Finite Automata and finds patterns in large-scale DNA sequences. To determine the optimal worksharing (that is, DNA sequence fractions for the host and accelerating device) we propose a solution that combines combinatorial optimization and machine learning. The objective function that we aim to minimize is the execution time of the DNA sequence analysis. We use combinatorial optimization to efficiently explore the system configuration space and determine with machine learning the near-optimal system configuration for execution of the DNA sequence analysis. We evaluate our approach empirically using real-world DNA segments of various organisms. For experimentation, we use an accelerated platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P accelerator with 61 cores.

  • 17.
    Perez, David
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    A simulation study of a smart living IoT solution for remote elderly care (2018). In: 2018 Third International Conference on Fog and Mobile Edge Computing (FMEC), Barcelona, Spain: IEEE, 2018, p. 227-232. Conference paper (Refereed)
    Abstract [en]

    We report a simulation study of a smart living IoT solution for elderly people living in their own homes. Our study was conducted in the context of the BoIT project in Sweden, which investigates the use of various IoT devices for remote housing and care-giving services. We focus on a carephone device that enables establishing a voice connection over IP with caregivers or relatives. We have developed a simulation model to study the IoT solution for elderly care in the Växjö municipality in Sweden. The simulation model can be used to address various issues, such as determining the lack or excess of resources or long waiting times, and to study the system behavior when the number of alarms increases. Simulation results indicate that a 15% increase in the arrival rate would cause unacceptably long waiting times for patients to receive care.
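    The qualitative effect reported — waiting times blowing up once the alarm arrival rate approaches the service capacity — shows up even in a deterministic toy queue model. The sketch below is our own illustration, far simpler than the authors' simulation model (which has stochastic arrivals and multiple caregivers):

    ```c
    /* Deterministic single-caregiver queue: alarms arrive every `interarrival`
     * minutes and each takes `service` minutes to handle. Returns the mean
     * waiting time over the first n alarms. */
    double mean_wait(double interarrival, double service, int n) {
        double free_at = 0.0, total_wait = 0.0;
        for (int i = 0; i < n; i++) {
            double arrive = i * interarrival;
            double start = arrive > free_at ? arrive : free_at;  /* wait if caregiver busy */
            total_wait += start - arrive;
            free_at = start + service;
        }
        return total_wait / n;
    }
    ```

    With a 10-minute service time, alarms every 10 minutes queue not at all, while shortening the interarrival time to 8.7 minutes (roughly a 15% higher arrival rate) makes each successive wait longer than the last — the threshold behavior the simulation study quantifies.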

  • 18.
    Viebke, Andre
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Memeti, Suejb
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Pllana, Sabri
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Abraham, Ajith
    Machine Intelligence Research Labs (MIR Labs), USA.
    CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi (2019). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 75, no. 1, p. 197-227. Article in journal (Refereed)
    Abstract [en]

    Deep learning is an important component of big-data analytic tools and intelligent applications, such as self-driving cars, computer vision, speech recognition, or precision medicine. However, the training process is computationally intensive and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the required training time of deep neural networks. In this paper, we present our parallelization scheme for training convolutional neural networks (CNN), named Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS). Major features of CHAOS include support for thread and vector parallelism, non-instant updates of weight parameters during back-propagation without a significant delay, and implicit synchronization in arbitrary order. CHAOS is tailored for parallel computing systems that are accelerated with the Intel Xeon Phi. We evaluate our parallelization approach empirically using measurement techniques and performance modeling for various numbers of threads and CNN architectures. Experimental results for the MNIST dataset of handwritten digits using the total number of threads on the Xeon Phi show speedups of up to 103× compared to the execution on one thread of the Xeon Phi, 14× compared to the sequential execution on Intel Xeon E5, and 58× compared to the sequential execution on Intel Core i5.
