Importance of HTML structural elements and metadata in automated subject classification
Lunds universitet. ORCID iD: 0000-0003-4169-4777
Lunds universitet.
2005 (English). In: Research and advanced technology for digital libraries / [ed] Andreas Rauber, Stavros Christodoulakis, A Min Tjoa, Springer, 2005, 368-378 p. Conference paper, Published paper (Refereed)
Abstract [en]

The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection that was used comprised 1000 Web pages in engineering, to which Engineering Information classes had been manually assigned. The significance indicators were derived using several different methods: (total and partial) precision and recall, semantic distance and multiple regression. It was shown that for best results all the elements have to be included in the classification process. The exact way of combining the significance indicators turned out not to be overly important: using the F1 measure, the best combination of significance indicators yielded no more than 3% higher performance results than the baseline.
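
As a rough illustration of the two ideas the abstract names, the sketch below combines per-element term matches using significance indicators as weights and computes the F1 measure. It is a minimal sketch under stated assumptions, not the authors' implementation: the weight values, the element keys, and the function names `combined_score` and `f1` are all hypothetical and chosen only for this example.

```python
from typing import Dict

# Hypothetical significance indicators (weights) for each Web page element.
# These numbers are illustrative assumptions, not values from the study.
WEIGHTS: Dict[str, float] = {
    "metadata": 2.0,
    "title": 3.0,
    "headings": 2.0,
    "text": 1.0,
}


def combined_score(matches: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of term-match counts per page element for one candidate class."""
    return sum(weights.get(element, 0.0) * count for element, count in matches.items())


def f1(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 measure: harmonic mean of precision and recall (0.0 when undefined)."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


if __name__ == "__main__":
    # One title match and four main-text matches for a candidate class.
    print(combined_score({"title": 1, "text": 4}, WEIGHTS))
    # 8 correct assignments, 2 wrong assignments, 2 missed assignments.
    print(f1(8, 2, 2))
```

Setting every weight to the same value gives an unweighted baseline against which a tuned combination of weights could be compared.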

Place, publisher, year, edition, pages
Springer, 2005. 368-378 p.
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 3652
National Category
Information Studies
Research subject
Humanities, Library and Information Science
Identifiers
URN: urn:nbn:se:lnu:diva-37071
DOI: 10.1007/11551362_33
ISBN: 978-3-540-28767-4 (print)
ISBN: 978-3-540-31931-3 (print)
OAI: oai:DiVA.org:lnu-37071
DiVA: diva2:747760
Conference
Research and Advanced Technology for Digital Libraries, Proceedings of ECDL 2005 – the 9th European Conference on Research and Advanced Technology for Digital Libraries, Vienna, Austria, September 18-23, 2005
Available from: 2014-09-17. Created: 2014-09-17. Last updated: 2015-09-30. Bibliographically approved.

By author/editor
Golub, Koraljka