Automated classification of bibliographic data using SVM and Naive Bayes
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
2018 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Classification of scientific bibliographic data is an important and increasingly time-consuming task in a “publish or perish” paradigm where the number of scientific publications is steadily growing. Apart from being resource-intensive, manual classification has also been shown to be often performed with a high degree of inconsistency. Since many bibliographic databases contain a large number of already classified records, supervised machine learning for automated classification might be a solution for handling the increasing volumes of published scientific articles. In this study, automated classification of bibliographic data based on two machine learning methods, Naive Bayes and Support Vector Machine (SVM), was evaluated. The data used in the study were collected from the Swedish research database SwePub, and the features used for training the classifiers were based on abstracts and titles in the bibliographic records. The accuracy achieved ranged from 0.54 to 0.84. The classifiers based on Support Vector Machine consistently received higher scores than the classifiers based on Naive Bayes. Classification at the second level of the hierarchical classification system clearly resulted in lower scores than classification at the first level. Using abstracts as the basis for feature extraction yielded better results overall than using titles; the differences were, however, very small.
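As a rough illustration of the supervised set-up described in the abstract, the sketch below trains a multinomial Naive Bayes classifier with Laplace smoothing on a handful of labelled titles and measures accuracy on held-out titles. It is not the thesis's implementation: the toy titles, the two category labels, the whitespace tokenizer, and the class name `NaiveBayesTextClassifier` are all assumptions made for illustration, and the SVM counterpart is omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy records (invented for illustration; not taken from SwePub).
train_texts = [
    "support vector machines for text classification",
    "deep learning for image recognition",
    "clinical trial of a new diabetes drug",
    "patient outcomes after heart surgery",
]
train_labels = ["computer_science", "computer_science", "medicine", "medicine"]

test_texts = [
    "machine learning for text mining",
    "drug treatment outcomes in diabetes patients",
]
test_labels = ["computer_science", "medicine"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

On real records the same pattern would apply per classification level, with either titles or abstracts passed in as `texts`; the accuracy on four toy training titles says nothing about the scores reported in the abstract.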

Place, publisher, year, edition, pages
2018. p. 73
Keywords [en]
automated classification, machine learning, Naive Bayes, Support Vector Machine, SVM, bibliographic data, SwePub
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-75167
OAI: oai:DiVA.org:lnu-75167
DiVA, id: diva2:1214459
Subject / course
Computer Science
Supervisors
Examiners
Available from: 2018-06-07. Created: 2018-06-06. Last updated: 2018-06-07. Bibliographically approved.

Open Access in DiVA

fulltext (1902 kB), 20 downloads
File information
File name: FULLTEXT01.pdf
File size: 1902 kB
Checksum SHA-512:
cd05472694d8eb20069722631e0811c06094c3ec76e02eb39458dbb7d82fdc1770de5301e59e6d18b37958fbaee00f3aa53faa8816c65da078327f93d683c69c
Type: fulltext
Mimetype: application/pdf

Author
Nordström, Jesper