Text-Independent Speaker ID for Automatic Video Lecture Classification Using Deep Learning
Norwegian University of Science and Technology, Norway.
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM). (Computer Science) ORCID iD: 0000-0002-0199-2377
Norwegian University of Science and Technology (NTNU), Norway.
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM). (Computer Science) ORCID iD: 0000-0003-0512-6350
2019 (English) In: Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence, April 19-22, 2019, Bali, Indonesia, ACM Publications, 2019, pp. 175-180. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes using acoustic features with deep neural network (DNN) and convolutional neural network (CNN) models to classify video lectures in a massive open online course (MOOC). The models exploit the voice pattern of the lecturer to identify the speaker and assign each video lecture to the right speaker category. Filter bank and Mel-frequency cepstral coefficient (MFCC) features, along with their first- and second-order derivatives (Δ/ΔΔ), are used as inputs to the proposed models. These features are extracted from the speech signal, which is obtained from the video lectures by separating the audio from the video using FFmpeg.
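As a rough illustration of the feature pipeline described above, the sketch below frames a mono speech signal, computes log mel filter bank energies, and adds first-order deltas, using plain NumPy. The frame length, hop, FFT size, and filter count are common defaults, not values taken from the paper; in practice the audio would first be separated from the video, e.g. `ffmpeg -i lecture.mp4 -ac 1 -ar 16000 lecture.wav`.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_fbank(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Frame and window the signal, take the power spectrum, apply mel filters, log."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb = mel_filter_bank(n_filters, n_fft, sr)
    return np.log(power @ fb.T + 1e-10)         # shape: (n_frames, n_filters)

def delta(feat, N=2):
    """First-order derivative (Δ) by regression over +/-N frames; apply twice for ΔΔ."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    T = feat.shape[0]
    return sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
               for n in range(1, N + 1)) / denom
```

The filter bank output, its Δ, and its ΔΔ would then be stacked along the feature axis to form the model input; an MFCC variant would additionally apply a DCT to the log filter bank energies.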

The deep learning models are evaluated using precision, recall, and F1 score, and the resulting accuracy for both acoustic features is compared with that of traditional machine learning classifiers for speaker identification. The 2D-CNN with MFCC achieves a significant improvement of 3% to 7% in classification accuracy over the DNN, and roughly double the accuracy of shallow machine learning classifiers. With an F1 score of 85.71% for text-independent speaker identification, the proposed 2D-CNN model makes speaker ID a plausible approach for automatically organizing video lectures in a MOOC setting.
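For concreteness, macro-averaged precision, recall, and F1 of the kind reported above can be computed as follows; this is a minimal NumPy sketch, equivalent in spirit to scikit-learn's `precision_recall_fscore_support(average="macro")`, shown here with made-up labels rather than the paper's data.

```python
import numpy as np

def macro_prf1(y_true, y_pred, labels):
    """Per-class precision/recall/F1, then the unweighted (macro) average."""
    ps, rs, fs = [], [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p); rs.append(r); fs.append(f)
    return np.mean(ps), np.mean(rs), np.mean(fs)

# Toy example: each label is one hypothetical speaker/lecturer category.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
p, r, f1 = macro_prf1(y_true, y_pred, labels=[0, 1, 2])
```

Macro averaging weights every speaker class equally, which matters when some lecturers have many more videos than others.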

Place, publisher, year, edition, pages
ACM Publications, 2019. pp. 175-180
Keywords [en]
2D-CNN, DNN, MFCC, Filter banks, MOOC, Speaker identification, deep learning, video classification
National Category
Research subject
Computer and Information Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-88114
DOI: 10.1145/3330482.3330508
ISI: 000698607100030
Scopus ID: 2-s2.0-85071081958
ISBN: 978-1-4503-6106-4 (print)
OAI: oai:DiVA.org:lnu-88114
DiVA, id: diva2:1343663
Conference
5th International Conference on Computing and Artificial Intelligence, April 19-22, 2019, Bali, Indonesia
Available from: 2019-08-19 Created: 2019-08-19 Last updated: 2022-11-03 Bibliographically approved

Open Access in DiVA

No full text available in DiVA

Other links

Publisher's full text | Scopus

Authors

Kastrati, Zenun; Kurti, Arianit
