The impact of deep learning on document classification using semantically rich representations
Norwegian University of Science and Technology, Norway.ORCID iD: 0000-0002-0199-2377
Norwegian University of Science and Technology, Norway.
Norwegian University of Science and Technology, Norway.
2019 (English). In: Information Processing & Management, ISSN 0306-4573, E-ISSN 1873-5371, Vol. 56, no 5, p. 1618-1632. Article in journal (Refereed). Published.
Abstract [en]

This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Integrating the acquired terminology into the ontology extends the capabilities of semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. The semantically rich representations obtained from the first module serve as input to the document classification module, which aims to find the most appropriate category for each document through deep learning. Three different deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a feedforward network with three hidden layers and 1024 neurons obtains the highest document classification performance on the INFUSE dataset. The performance in terms of F1 score is further increased by almost five percentage points, to 78.10%, for the same network configuration when the relevant terminology integrated into the ontology is applied to enrich the document representation. Furthermore, we conducted a comparative performance evaluation using various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.
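
As a rough illustration of the classification module described in the abstract, the following sketch builds a feedforward network with three hidden layers and computes an F1 score. It is not the authors' implementation: the abstract does not state the activation functions, the initialisation, the training procedure, or whether 1024 is the width of each hidden layer, so the ReLU and softmax choices, the per-layer width, and all function names here (`build_network`, `predict_proba`, `f1_score`) are assumptions made for illustration. The weights are random and untrained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Randomly initialise one dense layer (assumed He-style scaling)."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine map: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax over class scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(n_features, n_classes, hidden_units=1024, hidden_layers=3, seed=0):
    """Three hidden layers by default; 1024 units per layer is an assumption."""
    rng = random.Random(seed)
    sizes = [n_features] + [hidden_units] * hidden_layers + [n_classes]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(network, x):
    """Forward pass: ReLU on hidden layers, softmax on the output layer."""
    for layer in network[:-1]:
        x = relu(dense(x, layer))
    return softmax(dense(x, network[-1]))


def f1_score(y_true, y_pred, positive):
    """F1 for one category: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Small width keeps this pure-Python demo fast; the input is a toy vector.
    net = build_network(n_features=5, n_classes=3, hidden_units=8)
    probs = predict_proba(net, [0.2, 0.0, 1.0, 0.5, 0.1])
    print("predicted category:", probs.index(max(probs)))
    print("F1:", f1_score([1, 1, 0, 0], [1, 0, 1, 0], positive=1))
```

In this sketch the input vector stands in for a semantically rich document representation, for example one weight per ontology concept; how those weights are derived is outside what the abstract specifies.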

Place, publisher, year, edition, pages
Elsevier, 2019. Vol. 56, no 5, p. 1618-1632
Keywords [en]
Document representation, Document classification, Deep learning, Ontology, Machine learning
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Information Systems
Identifiers
URN: urn:nbn:se:lnu:diva-88769
DOI: 10.1016/j.ipm.2019.05.003
ISI: 000474504100002
Scopus ID: 2-s2.0-85065664667
OAI: oai:DiVA.org:lnu-88769
DiVA, id: diva2:1346435
Available from: 2019-08-27. Created: 2019-08-27. Last updated: 2024-09-03. Bibliographically approved.
Authority records

Kastrati, Zenun