Automatic subject classification of Swedish DDC: Impact of tuning and training data set
Golub, Koraljka: Linnéuniversitetet, Fakulteten för konst och humaniora (FKH), Institutionen för kulturvetenskaper (KV). (Library and Information Science; DISA-DH) ORCID iD: 0000-0003-4169-4777
Hagelbäck, Johan: Linnéuniversitetet, Fakulteten för teknik (FTK), Institutionen för datavetenskap och medieteknik (DM). ORCID iD: 0000-0002-8591-1035
2019 (English). In: 19th European NKOS Workshop, 23rd TPDL: Oslo, 12 September 2019, Networked Knowledge Organization Systems/Services/Structures, NKOS, 2019. Conference paper, oral presentation with published abstract (peer-reviewed)
Abstract [en]

The presentation builds on the NKOS 2018 presentation of automatically produced Dewey Decimal Classification (DDC) classes for the Swedish union catalogue (LIBRIS). On a dataset of 143,838 records, a Support Vector Machine with a linear kernel outperforms the Multinomial Naïve Bayes algorithm. An analysis of input features shows that using keywords, or combining titles and keywords, gives better results than using titles alone. Stemming improves the results only marginally. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, recurrent neural network) produced worse results than Naïve Bayes/Support Vector Machine, although they came close. The greatest impact comes from the number of training examples: 81.37% accuracy on the training set is achieved when at least 1,000 records per class are available, and 66.13% when few training records are available.
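
As a rough illustration of the kind of text classification the abstract describes, the sketch below implements a minimal Multinomial Naïve Bayes classifier in plain Python and lets the input be titles, keywords, or both. It is not the authors' pipeline: the records are invented toy data (not LIBRIS records), the helper names `tokens`, `train` and `predict` are hypothetical, and Laplace smoothing and whitespace tokenization are assumptions. Stemming, stop-word removal, the Support Vector Machine and the neural networks are left out.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records, invented for illustration (not LIBRIS data).
# Each record is (title, keywords, three-digit DDC class).
RECORDS = [
    ("Introduction to algorithms", "computer programming algorithms", "005"),
    ("Learning Python", "computer programming software", "005"),
    ("A history of Sweden", "history scandinavia sweden", "948"),
    ("Vikings and kings", "history scandinavia vikings", "948"),
    ("Birds of northern Europe", "zoology birds ornithology", "598"),
    ("Field guide to owls", "zoology birds owls", "598"),
]


def tokens(record, fields):
    """Lower-cased whitespace tokens taken from the chosen input fields."""
    title, keywords, _ = record
    parts = {"title": title, "keywords": keywords}
    return " ".join(parts[f] for f in fields).lower().split()


def train(records, fields):
    """Count records per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for rec in records:
        label = rec[2]
        class_counts[label] += 1
        for tok in tokens(rec, fields):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, record, fields):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens(record, fields):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    new_record = ("Owls of Sweden", "zoology birds", None)
    for fields in (("title",), ("title", "keywords")):
        model = train(RECORDS, fields)
        print(fields, "->", predict(model, new_record, fields))
    # ('title',) -> 948
    # ('title', 'keywords') -> 598
```

On this hand-built example the title alone is pulled towards the history class by the word "sweden", while adding keywords moves the prediction to the birds class. That outcome is a property of the toy data and says nothing about the accuracy figures reported in the abstract.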

Place, publisher, year, edition, pages
Networked Knowledge Organization Systems/Services/Structures, NKOS, 2019.
HSV category
Research subject
Humanities, Library and Information Science
Identifiers
URN: urn:nbn:se:lnu:diva-89737
OAI: oai:DiVA.org:lnu-89737
DiVA, id: diva2:1362386
Conference
19th European NKOS Workshop, 23rd TPDL. Oslo, 12 September 2019
Available from: 2019-10-18. Created: 2019-10-18. Last updated: 2020-01-08. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Person records

Golub, Koraljka; Hagelbäck, Johan
