Text-Independent Speaker ID for Automatic Video Lecture Classification Using Deep Learning
Norwegian University of Science and Technology (NTNU), Norway.
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM). (Computer Science) ORCID iD: 0000-0002-0199-2377
Norwegian University of Science and Technology (NTNU), Norway.
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM). (Computer Science) ORCID iD: 0000-0003-0512-6350
2019 (English) In: Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence, April 19-22, 2019, Bali, Indonesia, ACM Publications, 2019, p. 175-180. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes to use acoustic features with deep neural network (DNN) and convolutional neural network (CNN) models for classifying video lectures in a massive open online course (MOOC). The models exploit the voice pattern of the lecturer to identify the speaker and to classify each video lecture into the correct speaker category. Filter bank and Mel frequency cepstral coefficient (MFCC) features, along with their first- and second-order derivatives (Δ/ΔΔ), are used as input to the proposed models. These features are extracted from the speech signal, which is obtained from the video lectures by separating the audio from the video using FFmpeg.

The deep learning models are evaluated using precision, recall, and F1 score, and the resulting accuracy for both acoustic features is compared with that of traditional machine learning classifiers for speaker identification. The 2D-CNN with MFCC features achieves a 3% to 7% improvement in classification accuracy over the DNN, and roughly twice that over shallow machine learning classifiers. With an F1 score of 85.71% for text-independent speaker identification, the proposed 2D-CNN model makes it plausible to use speaker ID as a classification approach for organizing video lectures automatically in a MOOC setting.
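The front end described in the abstract (log filter bank energies, MFCCs, and Δ/ΔΔ derivatives computed from the separated audio) might be sketched roughly as follows. This is an illustrative NumPy implementation, not the authors' code; the frame length, hop size, and filter counts are assumed values.

```python
import numpy as np

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # frame the signal, apply a Hamming window, take the power spectrum
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # log mel filter bank energies, then DCT-II -> cepstral coefficients
    fbank = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return fbank @ dct.T  # shape: (n_frames, n_ceps)

def add_deltas(feats):
    # first- and second-order derivatives (Δ/ΔΔ) via frame-wise differences
    d1 = np.gradient(feats, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.concatenate([feats, d1, d2], axis=1)

# one second of synthetic audio stands in for a lecture segment
sig = np.random.randn(16000).astype(np.float64)
f = add_deltas(mfcc(sig))
print(f.shape)  # (98, 39): 98 frames, 13 MFCCs + 13 Δ + 13 ΔΔ
```

The resulting per-frame feature matrix is the kind of input a DNN or 2D-CNN classifier could be trained on; the paper's exact window sizes and network shapes are not specified in the abstract.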

Place, publisher, year, edition, pages
ACM Publications, 2019. p. 175-180
Keywords [en]
2D-CNN, DNN, MFCC, Filter banks, MOOC, Speaker identification, deep learning, video classification
National subject category
Computer Sciences
Research subject
Computer and Information Sciences, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-88114
DOI: 10.1145/3330482.3330508
ISI: 000698607100030
Scopus ID: 2-s2.0-85071081958
ISBN: 978-1-4503-6106-4 (print)
OAI: oai:DiVA.org:lnu-88114
DiVA, id: diva2:1343663
Conference
5th International Conference on Computing and Artificial Intelligence, April 19-22, 2019, Bali, Indonesia
Available from: 2019-08-19 Created: 2019-08-19 Last updated: 2022-11-03 Bibliographically approved

Open Access in DiVA

Full text is not available in DiVA

Other links

Publisher's full text
Scopus

Person

Kastrati, Zenun; Kurti, Arianit
