Memeti, Suejb
Publications (10 of 18)
Viebke, A., Memeti, S., Pllana, S. & Abraham, A. (2019). CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi. Journal of Supercomputing, 75(1), 197-227
CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi
2019 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 75, no. 1, p. 197-227. Article in journal (Refereed), Published.
Abstract [en]

Deep learning is an important component of big-data analytics tools and intelligent applications such as self-driving cars, computer vision, speech recognition, and precision medicine. However, the training process is computationally intensive and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the required training time of deep neural networks. In this paper, we present our parallelization scheme for training convolutional neural networks (CNNs), named Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS). Major features of CHAOS include support for thread and vector parallelism, non-instant updates of weight parameters during back-propagation without significant delay, and implicit synchronization in arbitrary order. CHAOS is tailored for parallel computing systems accelerated with the Intel Xeon Phi. We evaluate our parallelization approach empirically, using measurement techniques and performance modeling for various numbers of threads and CNN architectures. Experimental results for the MNIST dataset of handwritten digits, using the total number of threads on the Xeon Phi, show speedups of up to 103x compared to execution on one thread of the Xeon Phi, 14x compared to sequential execution on an Intel Xeon E5, and 58x compared to sequential execution on an Intel Core i5.
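The core Hogwild-style idea the abstract refers to — worker threads writing gradient updates into shared weights without a global lock, tolerating races — can be illustrated with a minimal toy sketch. This is a hypothetical illustration, not the paper's Xeon Phi implementation; the model (one-weight linear regression), data, and learning rate are invented:

```python
import threading
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3*x + small noise; we fit a single shared weight.
x = rng.normal(size=1000)
y = 3.0 * x + 0.01 * rng.normal(size=1000)

w = np.zeros(1)  # shared weight vector, updated without a global lock

def loss():
    return float(np.mean((w[0] * x - y) ** 2))

def worker(indices, lr=0.01):
    # Each thread sweeps its shard and writes gradient updates directly
    # into the shared weights; races are tolerated, as in Hogwild.
    for i in indices:
        grad = 2.0 * (w[0] * x[i] - y[i]) * x[i]
        w[0] -= lr * grad  # racy read-modify-write, accepted by design

initial = loss()
shards = np.array_split(np.arange(len(x)), 4)
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = loss()
print(final < initial)  # despite racy updates, training reduces the loss
```

The point of the sketch is the design choice, not the performance: a few lost updates do not prevent convergence, which is what makes lock-free parallel SGD attractive on many-core hardware.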

Place, publisher, year, edition, pages
Springer, 2019
National Category
Computer Systems
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-60938 (URN); 10.1007/s11227-017-1994-x (DOI); 000456629400014; 2-s2.0-85014542478 (Scopus ID)
Available from: 2017-02-25. Created: 2017-02-25. Last updated: 2019-08-29. Bibliographically approved.
Memeti, S., Pllana, S., Ferati, M., Kurti, A. & Jusufi, I. (2019). IoTutor: How Cognitive Computing Can Be Applied to Internet of Things Education. In: Leon Strous and Vinton G. Cerf (Eds.). Paper presented at IFIPIoT 2018 (pp. 1-16). Springer.
IoTutor: How Cognitive Computing Can Be Applied to Internet of Things Education
2019 (English). Conference paper, Published paper (Refereed).
Abstract [en]

We present IoTutor, a cognitive computing solution for educating students in the IoT domain. We implement IoTutor as a platform-independent, web-based application that can interact with users via text or speech using natural language. We train IoTutor with selected scientific publications relevant to IoT education. To investigate users' experience with IoTutor, we ask a group of students taking a master's-level IoT course at Linnaeus University to use IoTutor for a period of two weeks. We ask the students to express their opinions with respect to the attractiveness, perspicuity, efficiency, stimulation, and novelty of IoTutor. The evaluation results show a trend that students express an overall positive attitude towards IoTutor, with the majority of aspects rated higher than the neutral value.

Place, publisher, year, edition, pages
Springer, 2019
Series
IFIP Advances in Information and Communication Technology, ISSN 1868-4238 ; 548
Keywords
Internet of Things (IoT), education, cognitive computing, IBM Watson
National Category
Computer Systems
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-80835 (URN); 10.1007/978-3-030-15651-0_18 (DOI); 2-s2.0-85064686693 (Scopus ID); 978-3-030-15651-0 (ISBN); 978-3-030-15650-3 (ISBN)
Conference
IFIPIoT 2018
Funder
Knowledge Foundation, 20150088, 20150259
Available from: 2019-02-26. Created: 2019-02-26. Last updated: 2019-08-29. Bibliographically approved.
Memeti, S., Pllana, S., Binotto, A., Kołodziej, J. & Brandic, I. (2019). Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review. Computing, 101(8), 893-936
Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review
2019 (English). In: Computing, ISSN 0010-485X, E-ISSN 1436-5057, Vol. 101, no. 8, p. 893-936. Article in journal (Refereed), Published.
Abstract [en]

While modern parallel computing systems offer high performance, utilizing these powerful computing resources to the fullest extent demands advanced knowledge of various hardware architectures and parallel programming models. Furthermore, optimized software execution on parallel computing systems demands consideration of many parameters at compile-time and run-time. Determining the optimal set of parameters in a given execution context is a complex task; therefore, to address this issue, researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze, and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for software optimization at compile-time and run-time. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of software optimization for parallel computing systems. Furthermore, it may aid in understanding the limitations of existing approaches and in identifying areas for improvement.

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Parallel computing, Machine learning, Meta-heuristics, Software optimization
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-73712 (URN); 10.1007/s00607-018-0614-9 (DOI); 000472515600001; 2-s2.0-85045892455 (Scopus ID)
Available from: 2018-04-27. Created: 2018-04-27. Last updated: 2019-08-29. Bibliographically approved.
Memeti, S. & Pllana, S. (2018). A machine learning approach for accelerating DNA sequence analysis. The International Journal of High Performance Computing Applications, 32(3), 363-379.
A machine learning approach for accelerating DNA sequence analysis
2018 (English). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 32, no. 3, p. 363-379. Article in journal (Refereed), Published.
Abstract [en]

DNA sequence analysis is a data- and compute-intensive problem and therefore demands suitable parallel computing resources and algorithms. In this paper, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general-purpose host central processing units (CPUs) and one or more Xeon Phi devices. We present a parallel algorithm that shares the work of DNA sequence analysis between the host CPUs and the Xeon Phi device to reduce the overall analysis time. For automatic work-sharing, we use a supervised machine learning approach that predicts the performance of DNA sequence analysis on the host and the device and accordingly maps fractions of the DNA sequence to each. We evaluate our approach empirically using real-world DNA segments of humans and various animals on a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P device with 61 cores.
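The general idea of predicting host and device performance and splitting the input accordingly can be sketched as follows. All numbers, the linear throughput model, and the balancing rule are invented for illustration; the paper's actual learned model is not reproduced here:

```python
import numpy as np

# Hypothetical training measurements: (input size in MB, seconds).
sizes = np.array([100.0, 200.0, 400.0, 800.0])
host_times = np.array([2.1, 4.0, 8.3, 16.1])    # host CPUs
device_times = np.array([1.2, 2.3, 4.4, 8.9])   # accelerator device

# Fit simple linear performance models: time ≈ a * size + b.
host_a, host_b = np.polyfit(sizes, host_times, 1)
dev_a, dev_b = np.polyfit(sizes, device_times, 1)

def split_fraction(total_mb):
    """Fraction of the input to map to the host so that the predicted
    host and device times are equal (both sides finish together)."""
    # Solve host_a*f*N + host_b == dev_a*(1-f)*N + dev_b for f.
    n = total_mb
    f = (dev_a * n + dev_b - host_b) / ((host_a + dev_a) * n)
    return min(max(f, 0.0), 1.0)

n = 1000.0
f = split_fraction(n)
host_pred = host_a * f * n + host_b
dev_pred = dev_a * (1.0 - f) * n + dev_b
print(round(f, 3), round(host_pred, 2), round(dev_pred, 2))
```

Balancing predicted completion times is one common work-sharing heuristic: whichever side would otherwise finish early instead receives a proportionally larger share.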

Place, publisher, year, edition, pages
Sage Publications, 2018
Keywords
DNA sequence analysis, machine learning, heterogeneous parallel computing
National Category
Computer and Information Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-54385 (URN); 10.1177/1094342016654214 (DOI); 000432133100005; 2-s2.0-85046803969 (Scopus ID)
Available from: 2016-06-29. Created: 2016-06-29. Last updated: 2019-08-29. Bibliographically approved.
Memeti, S., Pllana, S., Binotto, A., Kołodziej, J. & Brandic, I. (2018). A Review of Machine Learning and Meta-heuristic Methods for Scheduling Parallel Computing Systems. In: Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (LOPAL 2018). Paper presented at the International Conference on Learning and Optimization Algorithms: Theory and Applications (LOPAL'18), Rabat, Morocco, May 2-5, 2018. New York, NY, USA: Association for Computing Machinery (ACM), Article ID 5.
A Review of Machine Learning and Meta-heuristic Methods for Scheduling Parallel Computing Systems
2018 (English). In: Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications (LOPAL 2018), New York, NY, USA: Association for Computing Machinery (ACM), 2018, article id 5. Conference paper, Published paper (Refereed).
Abstract [en]

Optimized software execution on parallel computing systems demands consideration of many parameters at run-time. Determining the optimal set of parameters in a given execution context is a complex task; therefore, to address this issue, researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze, and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for scheduling parallel computing systems. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of scheduling parallel computing systems. Furthermore, it may aid in understanding the limitations of existing approaches and in identifying areas for improvement.

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2018
Keywords
Parallel computing, machine learning, meta-heuristics, scheduling
National Category
Computer Sciences
Identifiers
urn:nbn:se:lnu:diva-76933 (URN); 10.1145/3230905.3230906 (DOI); 2-s2.0-85053484990 (Scopus ID); 978-1-4503-5304-5 (ISBN)
Conference
International Conference on Learning and Optimization Algorithms: Theory and Applications (LOPAL'18), Rabat, Morocco, May 02 - 05, 2018
Available from: 2018-07-17. Created: 2018-07-17. Last updated: 2019-08-29. Bibliographically approved.
Perez, D., Memeti, S. & Pllana, S. (2018). A simulation study of a smart living IoT solution for remote elderly care. In: 2018 Third International Conference on Fog and Mobile Edge Computing (FMEC). Paper presented at the 3rd International Conference on Fog and Mobile Edge Computing (FMEC), 23-26 April, 2018, Barcelona, Spain (pp. 227-232). Barcelona, Spain: IEEE.
A simulation study of a smart living IoT solution for remote elderly care
2018 (English). In: 2018 Third International Conference on Fog and Mobile Edge Computing (FMEC), Barcelona, Spain: IEEE, 2018, p. 227-232. Conference paper, Published paper (Refereed).
Abstract [en]

We report a simulation study of a smart living IoT solution for elderly people living in their own homes. Our study was conducted in the context of the BoIT project in Sweden, which investigates the use of various IoT devices for remote housing and care-giving services. We focus on a carephone device that enables establishing a voice connection via IP with caregivers or relatives. We have developed a simulation model to study the IoT solution for elderly care in the Växjö municipality in Sweden. The simulation model can be used to address various issues, such as determining the lack or excess of resources or long waiting times, and to study the system behavior when the number of alarms increases. Simulation results indicate that a 15% increase in the arrival rate would cause unacceptably long waiting times for patients to receive care.
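The qualitative effect reported above — a modest increase in the alarm arrival rate sharply lengthening waits — can be reproduced with a minimal multi-operator FIFO queue simulation. The rates, the number of operators, and the exponential-time assumptions below are all invented for illustration and are not the paper's model:

```python
import random
import heapq

def mean_wait(arrival_rate, service_rate, servers, n_alarms, seed=42):
    """Average time an alarm waits before an operator answers, in a FIFO
    queue with exponential interarrival and service times."""
    rng = random.Random(seed)
    t = 0.0
    free_at = [0.0] * servers  # min-heap: time when each operator is next free
    heapq.heapify(free_at)
    total_wait = 0.0
    for _ in range(n_alarms):
        t += rng.expovariate(arrival_rate)          # next alarm arrives
        earliest = heapq.heappop(free_at)           # soonest-free operator
        start = max(t, earliest)                    # wait only if all are busy
        total_wait += start - t
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / n_alarms

base = mean_wait(arrival_rate=8.0, service_rate=1.0, servers=10, n_alarms=20000)
surge = mean_wait(arrival_rate=8.0 * 1.15, service_rate=1.0, servers=10, n_alarms=20000)
print(base < surge)  # a 15% higher arrival rate lengthens the average wait
```

Because the baseline utilization (8/10) is already high, a 15% surge pushes utilization to 0.92, and queueing delay grows super-linearly near saturation, which is the behavior the study's findings point to.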

Place, publisher, year, edition, pages
Barcelona, Spain: IEEE, 2018
Keywords
remote elderly care, smart living, simulation, Internet of Things (IoT)
National Category
Human Aspects of ICT
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-74872 (URN); 10.1109/FMEC.2018.8364069 (DOI); 000444770700037; 2-s2.0-85048856982 (Scopus ID); 978-1-5386-5896-3 (ISBN); 978-1-5386-5897-0 (ISBN)
Conference
3rd International Conference on Fog and Mobile Edge Computing (FMEC), 23-26 April, 2018, Barcelona, Spain.
Available from: 2018-06-02. Created: 2018-06-02. Last updated: 2019-08-29. Bibliographically approved.
Memeti, S. & Pllana, S. (2018). HSTREAM: A directive-based language extension for heterogeneous stream computing. In: Pop, F., Negru, C., Gonzalez-Velez, H. & Rak, J. (Eds.), 2018 21st IEEE International Conference on Computational Science and Engineering (CSE). Paper presented at the 21st IEEE International Conference on Computational Science and Engineering (CSE 2018), 29-31 Oct. 2018, Bucharest (pp. 138-145). IEEE.
HSTREAM: A directive-based language extension for heterogeneous stream computing
2018 (English). In: 2018 21st IEEE International Conference on Computational Science and Engineering (CSE) / [ed] Pop, F.; Negru, C.; Gonzalez-Velez, H.; Rak, J., IEEE, 2018, p. 138-145. Conference paper, Published paper (Refereed).
Abstract [en]

Big-data streaming applications require utilization of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such systems requires advanced knowledge of several hardware architectures and device-specific programming models, including OpenMP and CUDA. In this paper, we present HSTREAM, a compiler-directive-based language extension to support programming stream computing applications for heterogeneous parallel computing systems. The HSTREAM source-to-source compiler aims to increase programming productivity by enabling programmers to annotate parallel regions for heterogeneous execution and generate target-specific code. The HSTREAM runtime automatically distributes the workload across CPUs and accelerating devices. We demonstrate the usefulness of the HSTREAM language extension with various applications from the STREAM benchmark. Experimental evaluation results show that HSTREAM can keep the same programming simplicity as OpenMP, and the generated code can deliver performance beyond what CPU-only and GPU-only executions can deliver.

Place, publisher, year, edition, pages
IEEE, 2018
Series
IEEE International Conference on Computational Science and Engineering, ISSN 1949-0828
Keywords
stream computing, heterogeneous parallel computing systems, source-to-source compilation
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science; Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-79191 (URN); 10.1109/CSE.2018.00026 (DOI); 000458738400019; 2-s2.0-85061051044 (Scopus ID); 978-1-5386-7649-3 (ISBN); 978-1-5386-7650-9 (ISBN)
Conference
The 21st IEEE International Conference on Computational Science and Engineering (CSE 2018), 29-31 Oct. 2018, Bucharest
Available from: 2018-12-13. Created: 2018-12-13. Last updated: 2019-08-29. Bibliographically approved.
Memeti, S. & Pllana, S. (2018). PAPA: A Parallel Programming Assistant Powered by IBM Watson Cognitive Computing Technology. Journal of Computational Science, 26, 275-284
PAPA: A Parallel Programming Assistant Powered by IBM Watson Cognitive Computing Technology
2018 (English). In: Journal of Computational Science, ISSN 1877-7503, E-ISSN 1877-7511, Vol. 26, p. 275-284. Article in journal (Refereed), Published.
Abstract [en]

The efficient utilization of the available resources in modern parallel computing systems requires advanced parallel programming expertise. However, parallel programming is more difficult than sequential programming. To alleviate the difficulties of parallel programming, high-level programming frameworks, such as OpenMP, have been proposed. Yet, there is evidence that novice parallel programmers make common mistakes that may lead to performance degradation or unexpected program behavior. In this paper, we present our cognitive Parallel Programming Assistant (PAPA), which aims to educate and assist novice parallel programmers in avoiding common OpenMP mistakes. PAPA combines different IBM Watson services to provide dialog-based interaction (through text and voice) for programmers. We use the Watson Conversation service to implement the dialog-based interaction, and the Speech-to-Text and Text-to-Speech services to enable voice interaction. The Watson Natural Language Understanding and WordsAPI Synonyms services are used to train PAPA with OpenMP-related publications. We evaluate our approach using a user experience questionnaire with a number of novice parallel programmers at Linnaeus University.

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Cognitive computing, Parallel programming, IBM Watson, OpenMP
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-69606 (URN); 10.1016/j.jocs.2018.01.001 (DOI); 000438001600028; 2-s2.0-85040666206 (Scopus ID)
Note

Available online 6 January 2018

Available from: 2018-01-08. Created: 2018-01-08. Last updated: 2019-08-29. Bibliographically approved.
Memeti, S., Li, L., Pllana, S., Kołodziej, J. & Kessler, C. (2017). Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption. In: ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing. Paper presented at ARMS-CC '17: the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, 28 July, 2017 (pp. 1-6). New York, NY, USA: Association for Computing Machinery (ACM).
Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption
2017 (English). In: ARMS-CC '17: Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, New York, NY, USA: Association for Computing Machinery (ACM), 2017, p. 1-6. Conference paper, Published paper (Refereed).
Abstract [en]

Many modern parallel computing systems are heterogeneous at the node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or the Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate programming productivity, we use our home-grown tool CodeStat, which enables us to determine the percentage of code lines required to parallelize the code using a specific framework. We use our tools MeterPU and x-MeterPU to evaluate energy consumption and performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing, on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi co-processor.
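A CodeStat-like productivity metric — the fraction of source lines that a given framework's constructs account for — can be approximated with a small line counter. The marker patterns and the sample snippet below are invented for illustration; CodeStat's real detection rules are not specified here:

```python
# Hypothetical framework markers; a real tool's detection rules would differ.
MARKERS = {
    "OpenMP": ("#pragma omp",),
    "CUDA": ("__global__", "__device__", "cudaMalloc", "cudaMemcpy", "<<<"),
    "OpenACC": ("#pragma acc",),
}

def framework_line_fraction(source, framework):
    """Percentage of non-empty lines containing the framework's constructs."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    hits = sum(any(m in ln for m in MARKERS[framework]) for ln in lines)
    return 100.0 * hits / len(lines)

sample = """
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
"""
print(framework_line_fraction(sample, "OpenMP"))  # one of three lines
```

A metric like this makes the productivity comparison concrete: directive-based frameworks (OpenMP, OpenACC) tend to touch few lines, while CUDA or OpenCL ports typically rewrite a much larger fraction of the code.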

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2017
National Category
Computer Systems Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-67141 (URN); 10.1145/3110355.3110356 (DOI); 978-1-4503-5116-4 (ISBN)
Conference
ARMS-CC '17: the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, 28 July, 2017
Available from: 2017-08-01. Created: 2017-08-01. Last updated: 2018-12-13. Bibliographically approved.
Memeti, S. & Pllana, S. (2017). Combinatorial optimization of DNA sequence analysis on heterogeneous systems. Paper presented at The 18th IEEE international conference on computational science and engineering (CSE2015). Concurrency and Computation, 29(7), Article ID e4037.
Combinatorial optimization of DNA sequence analysis on heterogeneous systems
2017 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no. 7, article id e4037. Article in journal (Refereed), Published.
Abstract [en]

Analysis of DNA sequences is a data- and compute-intensive problem and therefore requires suitable parallel computing resources and algorithms. In this paper, we describe our parallel algorithm for DNA sequence analysis, which determines how many times a pattern appears in a DNA sequence. The algorithm is engineered for heterogeneous platforms that comprise a host with multi-core processors and one or more many-core devices. For combinatorial optimization, we use the simulated annealing algorithm. The optimization goal is to determine the number of threads, thread affinities, and DNA sequence fractions for the host and the device such that the overall execution time of DNA sequence analysis is minimized. We evaluate our approach experimentally using real-world DNA sequences of various organisms on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processing device. By running only about 5% of the possible experiments, our optimization method finds a near-optimal system configuration for DNA sequence analysis that yields average speedups of 1.6x and 2x compared with host-only and device-only execution, respectively.
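Simulated annealing over a configuration space such as (thread count, host fraction) can be sketched as follows. The cost model, parameter ranges, and cooling schedule are invented for illustration; in the paper, the cost of a configuration comes from actually running the experiment, not from a formula:

```python
import math
import random

rng = random.Random(7)

def cost(threads, fraction):
    """Hypothetical execution-time model for a (threads, host-fraction)
    configuration; a real evaluation would time the actual run."""
    host = fraction * 100.0 / threads + 0.05 * threads  # host time + overhead
    device = (1.0 - fraction) * 60.0                    # device time
    return max(host, device)                            # sides run concurrently

def neighbor(threads, fraction):
    # Perturb one dimension of the configuration at random.
    if rng.random() < 0.5:
        threads = min(24, max(1, threads + rng.choice([-2, -1, 1, 2])))
    else:
        fraction = min(1.0, max(0.0, fraction + rng.uniform(-0.1, 0.1)))
    return threads, fraction

state = (1, 0.5)
current_cost = cost(*state)
best = (state, current_cost)
temp = 10.0
for _ in range(2000):
    cand = neighbor(*state)
    delta = cost(*cand) - current_cost
    # Accept improvements always; accept worse moves with Boltzmann probability.
    if delta < 0 or rng.random() < math.exp(-delta / temp):
        state, current_cost = cand, cost(*cand)
        if current_cost < best[1]:
            best = (state, current_cost)
    temp *= 0.995  # geometric cooling schedule

print(best[1] < cost(1, 0.5))  # annealing found a cheaper configuration
```

The occasional acceptance of worse configurations is what lets the search escape local minima while sampling only a small fraction of the configuration space, which is how a near-optimal setup can be found from roughly 5% of the possible experiments.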

Place, publisher, year, edition, pages
John Wiley & Sons, 2017
National Category
Computer Systems
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-58995 (URN); 10.1002/cpe.4037 (DOI); 000398712500007; 2-s2.0-85006508024 (Scopus ID)
Conference
The 18th IEEE international conference on computational science and engineering (CSE2015)
Available from: 2016-12-13. Created: 2016-12-13. Last updated: 2019-09-06. Bibliographically approved.