Automatic subject classification of Swedish DDC: Impact of tuning and training data set
Linnaeus University, Faculty of Arts and Humanities, Department of Cultural Sciences. (Library and Information Science; DISA-DH) ORCID iD: 0000-0003-4169-4777
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). ORCID iD: 0000-0002-8591-1035
2019 (English). In: 19th European NKOS Workshop, 23rd TPDL: Oslo, 12 September 2019. Networked Knowledge Organization Systems/Services/Structures, NKOS, 2019. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

This presentation builds on the NKOS 2018 presentation of automatically produced Dewey Decimal Classification (DDC) classes for the Swedish union catalogue (LIBRIS). On a dataset of 143,838 records, a Support Vector Machine with a linear kernel outperforms the Multinomial Naïve Bayes algorithm. A comparison of input features shows that using keywords, or combining titles and keywords, gives better results than using titles alone. Stemming improves the results only marginally. Removing stop words reduced accuracy in most cases, while removing less frequent words increased it marginally. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, recurrent neural network) produced worse results than Naïve Bayes and Support Vector Machine, although they came close. The number of training examples has the greatest impact: 81.37% accuracy on the training set is achieved when at least 1,000 records per class are available, and 66.13% when few training records are available.
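To make the setup in the abstract concrete, the following is a minimal sketch, not the authors' implementation: a plain-Python Multinomial Naïve Bayes classifier with add-one smoothing, trained on titles combined with keywords, after dropping classes that have too few training records. The toy records, the class labels, the threshold of 3, and the names `tokenize`, `build_inputs`, `filter_by_class_size` and `MultinomialNB` are all assumptions made for illustration; the study itself used 143,838 LIBRIS records and a threshold of 1,000 records per class.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no stop-word removal)."""
    return text.lower().split()

def build_inputs(raw_records, use_keywords):
    """Turn (title, keywords, label) into (text, label), optionally adding keywords."""
    return [(f"{title} {kw}" if use_keywords else title, label)
            for title, kw, label in raw_records]

def filter_by_class_size(records, min_records):
    """Keep only records whose class has at least min_records examples."""
    counts = Counter(label for _, label in records)
    return [(text, label) for text, label in records if counts[label] >= min_records]

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, records):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in records:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            denom = sum(self.word_counts[label].values()) + vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words unseen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy records: (title, keywords, class label).
raw = [
    ("introduction to algebra", "mathematics equations", "510"),
    ("geometry for beginners", "mathematics shapes", "510"),
    ("calculus made easy", "mathematics derivatives", "510"),
    ("history of sweden", "history scandinavia", "948"),
    ("the viking age", "history scandinavia vikings", "948"),
    ("nordic kingdoms", "history scandinavia monarchy", "948"),
    ("a lonely title", "misc", "999"),  # class with too few records
]

train = filter_by_class_size(build_inputs(raw, use_keywords=True), min_records=3)
model = MultinomialNB().fit(train)
print(model.predict("scandinavia history"))
```

Passing `use_keywords=False` to `build_inputs` gives the titles-only variant, which is one way to compare the two feature sets the abstract mentions.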

Place, publisher, year, edition, pages
Networked Knowledge Organization Systems/Services/Structures, NKOS, 2019.
National Category
Information Studies
Research subject
Humanities, Library and Information Science
Identifiers
URN: urn:nbn:se:lnu:diva-89737; OAI: oai:DiVA.org:lnu-89737; DiVA, id: diva2:1362386
Conference
19th European NKOS Workshop, 23rd TPDL. Oslo, 12 September 2019
Available from: 2019-10-18. Created: 2019-10-18. Last updated: 2020-01-08. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Golub, Koraljka; Hagelbäck, Johan
