Text-Independent Speaker ID for Automatic Video Lecture Classification Using Deep Learning
Norwegian University of Science and Technology, Norway.
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (Computer Science) ORCID iD: 0000-0002-0199-2377
Norwegian University of Science and Technology (NTNU), Norway.
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (Computer Science) ORCID iD: 0000-0003-0512-6350
2019 (English). In: Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence, April 19-22, 2019, Bali, Indonesia, ACM Publications, 2019, p. 175-180. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes to use acoustic features with deep neural network (DNN) and convolutional neural network (CNN) models for classifying video lectures in a massive open online course (MOOC). The models exploit the voice pattern of the lecturer to identify the speaker and assign each video lecture to the right speaker category. Filter bank and Mel frequency cepstral coefficient (MFCC) features, along with their first- and second-order derivatives (Δ/ΔΔ), are used as input features to the proposed models. These features are extracted from the speech signal, which is obtained from the video lectures by separating the audio track from the video using FFmpeg.
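The feature-extraction step described above can be sketched in plain NumPy/SciPy. This is a minimal, illustrative implementation of the standard filter-bank/MFCC pipeline, not the authors' code; all function names and parameter values (frame size, hop, number of mel bands and cepstral coefficients) are assumptions chosen as common defaults for 16 kHz speech.

```python
# Illustrative sketch of the filter-bank/MFCC + delta pipeline (not the paper's code).
# Audio would first be separated from video, e.g.:
#   ffmpeg -i lecture.mp4 -vn -ac 1 -ar 16000 audio.wav
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank, linearly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filter-bank energies, then DCT -> cepstral coefficients.
    fb_energies = np.log(power @ fbank.T + 1e-10)
    ceps = dct(fb_energies, type=2, axis=1, norm='ortho')[:, :n_ceps]
    return fb_energies, ceps

def deltas(feat):
    # First-order derivative (Δ) via a central difference over frames;
    # applying it twice gives the second-order derivative (ΔΔ).
    padded = np.pad(feat, ((1, 1), (0, 0)), mode='edge')
    return (padded[2:] - padded[:-2]) / 2.0
```

Stacking `fb_energies` (or `ceps`) with `deltas(...)` and `deltas(deltas(...))` along the feature axis yields the Δ/ΔΔ-augmented input the abstract refers to.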

The deep learning models are evaluated using precision, recall, and F1 score, and the resulting accuracy is compared, for both acoustic features, against traditional machine learning classifiers for speaker identification. The 2D-CNN with MFCC achieves a significant improvement of 3% to 7% in classification accuracy over the DNN, and roughly twice the accuracy of shallow machine learning classifiers. With an F1 score of 85.71% for text-independent speaker identification, the proposed 2D-CNN model makes it plausible to use speaker ID as a classification approach for organizing video lectures automatically in a MOOC setting.
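The evaluation metrics named above (precision, recall, F1) can be computed per speaker class from a confusion matrix. The sketch below is a generic NumPy implementation of these standard definitions, not the authors' evaluation script; the function name and interface are illustrative.

```python
# Per-class precision, recall and F1 from predicted vs. true speaker labels.
import numpy as np

def precision_recall_f1(y_true, y_pred, n_classes):
    # Build the confusion matrix: rows = true class, columns = predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    # Precision = TP / predicted positives; recall = TP / actual positives.
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    # F1 is the harmonic mean of precision and recall.
    f1 = np.where(precision + recall > 0,
                  2 * precision * recall / np.maximum(precision + recall, 1e-12),
                  0.0)
    return precision, recall, f1
```

Averaging the per-class F1 scores (weighted by class support, or unweighted) gives a single summary figure comparable to the 85.71% reported in the abstract.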

Place, publisher, year, edition, pages
ACM Publications, 2019. p. 175-180
Keywords [en]
2D-CNN, DNN, MFCC, Filter banks, MOOC, Speaker identification, deep learning, video classification
National Category
Computer Sciences
Research subject
Computer and Information Sciences, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-88114
DOI: 10.1145/3330482.3330508
ISBN: 978-1-4503-6106-4 (print)
OAI: oai:DiVA.org:lnu-88114
DiVA, id: diva2:1343663
Conference
5th International Conference on Computing and Artificial Intelligence, April 19-22, 2019, Bali, Indonesia
Available from: 2019-08-19. Created: 2019-08-19. Last updated: 2019-09-04. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Kastrati, Zenun; Kurti, Arianit
