Comparing and combining two approaches to automated subject classification of text
Lunds universitet. (Library and Information Science) ORCID iD: 0000-0003-4169-4777
Lunds universitet.
2006 (English). In: Research and advanced technology for digital libraries / [ed] Julio Gonzalo, Constantino Thanos, M. Felisa Verdejo and Rafael C. Carrasco, Springer, 2006, 467-470 p. Conference paper, Published paper (Refereed)
Abstract [en]

A machine-learning approach and a string-matching approach to automated subject classification of text were compared with respect to their performance, advantages and downsides. The former was based on an SVM algorithm, while the latter comprised string-matching between a controlled vocabulary and words in the text to be classified. The data collection consisted of a subset of Compendex, classified into six different classes. It was shown that SVM on average outperforms the string-matching approach; our hypothesis that SVM yields better recall and string-matching better precision was confirmed for only one of the classes. Because the two approaches are complementary, we investigated different combinations of the two based on combining their vocabularies. The results showed that the original approaches, i.e. the machine-learning approach without background knowledge from the controlled vocabulary and the string-matching approach based on the controlled vocabulary, outperform approaches in which combinations of automatically and manually obtained terms were used. The reasons for these results need further investigation, including a larger data collection and combining the two approaches using predictions.

Place, publisher, year, edition, pages
Springer, 2006. 467-470 p.
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 4172
National Category
Information Studies
Research subject
Humanities, Library and Information Science
Identifiers
URN: urn:nbn:se:lnu:diva-37066
DOI: 10.1007/11863878_45
Libris ID: 11430390
ISBN: 978-3-540-44636-1 (print)
ISBN: 978-3-540-44638-5 (print)
OAI: oai:DiVA.org:lnu-37066
DiVA: diva2:747740
Conference
10th European Conference, ECDL 2006, Alicante, Spain, September 17-22, 2006
Available from: 2014-09-17. Created: 2014-09-17. Last updated: 2015-09-30. Bibliographically approved.

By author/editor
Golub, Koraljka