The impact of deep learning on document classification using semantically rich representations
Norwegian University of Science and Technology, Norway. ORCID iD: 0000-0002-0199-2377
Norwegian University of Science and Technology, Norway.
Norwegian University of Science and Technology, Norway.
2019 (English). In: Information Processing & Management, ISSN 0306-4573, E-ISSN 1873-5371, Vol. 56, no. 5, pp. 1618-1632. Article in journal (Refereed). Published.
Abstract [en]

This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Integrating the acquired terminology into the ontology extends the capabilities of semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. The semantically rich representations obtained from the first module serve as input to the document classification module, which aims to find the most appropriate category for each document through deep learning. Three deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a feedforward network with three hidden layers and 1024 neurons obtains the highest document classification performance on the INFUSE dataset. When the relevant terminology integrated into the ontology is applied to enrich the document representation, the F1 score of the same network configuration increases by almost five percentage points, to 78.10%. Furthermore, we conducted a comparative performance evaluation with various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.
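The two-module pipeline described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The toy ontology (`ONTOLOGY`), the term-frequency concept vector, the ReLU activations, the He-style initialisation, the softmax output and the reading of "1024 neurons" as 1024 units per hidden layer are all assumptions made for illustration, and the helper names `concept_vector`, `init_mlp` and `forward` are hypothetical. The weights are untrained, so the predicted category only demonstrates the data flow.

```python
import math
import random

# Hypothetical toy ontology: each concept lists its label plus acquired terminology.
# Illustrative only; this is NOT the real-life ontology used in the paper.
ONTOLOGY = {
    "Investment": ["investment", "equity", "venture capital"],
    "Loan": ["loan", "credit", "microfinance"],
    "Grant": ["grant", "subsidy"],
}


def concept_vector(text, ontology):
    """Module 1 (sketch): one dimension per concept, weighted by term frequency.

    Naive substring counting is an assumed simplification of semantic enrichment.
    """
    lowered = text.lower()
    return [float(sum(lowered.count(term) for term in terms))
            for terms in ontology.values()]


def init_mlp(input_dim, hidden_sizes, num_classes, seed=0):
    """Create (weights, biases) for each dense layer of a feedforward network."""
    rng = random.Random(seed)
    sizes = [input_dim, *hidden_sizes, num_classes]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / fan_in)  # assumed He-style init, suited to ReLU
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """Module 2 (sketch): dense layers with ReLU, then softmax over categories."""
    h = x
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        # ReLU on hidden layers (assumed activation); raw logits on the output layer.
        h = z if i == len(layers) - 1 else [max(0.0, v) for v in z]
    m = max(h)  # subtract the max logit for numerical stability
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    categories = list(ONTOLOGY)
    doc = "The fund offers equity investment and venture capital to start-ups."
    x = concept_vector(doc, ONTOLOGY)  # [3.0, 0.0, 0.0]
    # Three hidden layers; 1024 units per layer is an assumed reading of the abstract.
    net = init_mlp(len(x), [1024, 1024, 1024], len(categories))
    probs = forward(net, x)
    print(categories[probs.index(max(probs))])  # untrained, so arbitrary
```

In practice the weights would be learned from labelled documents by backpropagation; the sketch only shows how a concept-level representation from the first module feeds the classifier in the second.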

Place, publisher, year, edition, pages
Elsevier, 2019. Vol. 56, nr 5, s. 1618-1632
Keywords [en]
Document representation, Document classification, Deep learning, Ontology, Machine learning
Research programme
Computer and Information Sciences, Informatics
Identifiers
URN: urn:nbn:se:lnu:diva-88769
DOI: 10.1016/j.ipm.2019.05.003
ISI: 000474504100002
Scopus ID: 2-s2.0-85065664667
OAI: oai:DiVA.org:lnu-88769
DiVA, id: diva2:1346435
Available from: 2019-08-27 Created: 2019-08-27 Last updated: 2024-09-03 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Person

Kastrati, Zenun
