The impact of deep learning on document classification using semantically rich representations
Norwegian University of Science and Technology, Norway. ORCID iD: 0000-0002-0199-2377
Norwegian University of Science and Technology, Norway.
Norwegian University of Science and Technology, Norway.
2019 (English). In: Information Processing & Management, ISSN 0306-4573, E-ISSN 1873-5371, Vol. 56, no. 5, pp. 1618-1632. Article in journal (Refereed). Published
Abstract [en]

This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Integrating the acquired terminology into the ontology extends the capabilities of semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. The semantically rich representations obtained from the first module serve as input to the document classification module, which aims to find the most appropriate category for each document through deep learning. Three different deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a feedforward network with three hidden layers and 1024 neurons obtains the highest document classification performance on the INFUSE dataset. The performance in terms of F1 score is further increased by almost five percentage points, to 78.10%, for the same network configuration when the relevant terminology integrated into the ontology is applied to enrich the document representation. Furthermore, we conducted a comparative performance evaluation using various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.
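
As a rough illustration of the classification module described in the abstract, the following sketch builds a feedforward network with three hidden layers and runs one forward pass over a document vector. It is not the authors' implementation. The ReLU and softmax activations, the random weight initialization, the input size, the number of categories, and all function names are assumptions made for this example, and the reading of "1024 neurons" as the width of each hidden layer is likewise an assumption. The demo uses a small hidden width so that it runs quickly in pure Python.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j] holds the inputs to output unit j."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    # Assumed hidden activation; the abstract does not name one.
    return [max(0.0, a) for a in values]


def softmax(values):
    # Subtract the maximum for numerical stability.
    peak = max(values)
    exps = [math.exp(a - peak) for a in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(n_features, n_classes, hidden_width=1024, n_hidden=3, seed=0):
    """Stack n_hidden equally wide hidden layers plus one output layer."""
    rng = random.Random(seed)
    sizes = [n_features] + [hidden_width] * n_hidden + [n_classes]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(network, x):
    """Forward pass: ReLU on hidden layers, softmax over the categories."""
    *hidden_layers, output_layer = network
    for layer in hidden_layers:
        x = relu(dense(x, layer))
    return softmax(dense(x, output_layer))


# Hypothetical sizes: a 20-dimensional document vector and 4 categories.
network = build_network(n_features=20, n_classes=4, hidden_width=32)
rng = random.Random(1)
doc_vector = [rng.random() for _ in range(20)]
probs = predict_proba(network, doc_vector)
predicted_category = max(range(len(probs)), key=probs.__getitem__)
```

Here `doc_vector` stands in for the semantically rich representation produced by the first module, and `predicted_category` is the index of the most probable category. The weights are untrained, so the prediction itself carries no meaning; the sketch only shows the shape of the computation.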

Place, publisher, year, edition, pages
Elsevier, 2019. Vol. 56, no. 5, pp. 1618-1632
Keywords [en]
Document representation, Document classification, Deep learning, Ontology, Machine learning
National subject category
Computer Sciences
Research subject
Computer and Information Sciences, Informatics
Identifiers
URN: urn:nbn:se:lnu:diva-88769
DOI: 10.1016/j.ipm.2019.05.003
ISI: 000474504100002
Scopus ID: 2-s2.0-85065664667
OAI: oai:DiVA.org:lnu-88769
DiVA, id: diva2:1346435
Available from: 2019-08-27. Created: 2019-08-27. Last updated: 2024-09-03. Bibliographically approved

Person

Kastrati, Zenun