The role of different thesauri terms in automated subject classification of text
Lunds universitet. ORCID iD: 0000-0003-4169-4777
2006 (English). Conference paper, Published paper (Refereed)
Abstract [en]

The paper explores to what degree different types of terms in the Engineering Information (Ei) thesaurus and classification scheme influence automated subject classification performance. Preferred terms, their synonyms, broader, narrower, and related terms, and captions are examined in combination with a stemmer and a stop-word list. The algorithm comprises string-to-string matching between words in the documents to be classified and words in term lists derived from the Ei thesaurus and classification scheme. The data collection for evaluation consists of some 35,000 scientific paper abstracts from the Compendex database. A subset of the Ei thesaurus and classification scheme is used, comprising 92 classes at up to five hierarchical levels from general engineering. The results show that preferred terms perform best, whereas captions perform worst. Stemming improves performance in most cases, whereas the stop-word list does not have a significant impact.
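
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class names, term lists, stop-word set, and the crude suffix stripper `naive_stem` are all hypothetical stand-ins, and a term is counted as matched when all of its words occur in the document, which is only one possible reading of "string-to-string matching".

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, use_stemming=True, use_stop_words=True):
    """Lowercase, split into words, then optionally drop stop words and stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if use_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        words = [naive_stem(w) for w in words]
    return words


def classify(document, term_lists, **options):
    """Score each class by how many of its terms match words in the document."""
    doc_counts = Counter(tokenize(document, **options))
    scores = {}
    for class_name, terms in term_lists.items():
        score = 0
        for term in terms:
            term_words = tokenize(term, **options)
            # A term matches when every one of its words occurs in the document.
            if term_words and all(w in doc_counts for w in term_words):
                score += min(doc_counts[w] for w in term_words)
        if score:
            scores[class_name] = score
    # Highest score first; ties broken alphabetically by class name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical term lists for two made-up classes.
term_lists = {
    "welding": ["welding", "welded joints"],
    "bridges": ["bridges", "bridge decks"],
}
document = "Fatigue of welded joints in steel bridges"

with_stemming = classify(document, term_lists, use_stemming=True)
without_stemming = classify(document, term_lists, use_stemming=False)
```

In this toy example, stemming lets the term "welding" match the word "welded" in the document, so the hypothetical "welding" class scores higher with stemming than without it.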

Place, publisher, year, edition, pages
IEEE Press, 2006. 961-965 p.
National Category
Information Studies
Research subject
Humanities, Library and Information Science
Identifiers
URN: urn:nbn:se:lnu:diva-37064
DOI: 10.1109/WI.2006.169
ISBN: 0-7695-2747-7 (print)
OAI: oai:DiVA.org:lnu-37064
DiVA: diva2:747735
Conference
IEEE/WIC/ACM International Conference on Web Intelligence, Hong Kong, December 18-22, 2006
Available from: 2014-09-17 Created: 2014-09-17 Last updated: 2015-09-30 Bibliographically approved

By author/editor
Golub, Koraljka