2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]
This Ph.D. project explores how to improve the effectiveness and efficiency of knowledge discovery in databases (KDD) in medical research through the effective integration of domain expertise. The aim was to develop and evaluate a KDD framework that improves the efficiency and accuracy of knowledge discovery from real-world health data.
Knowledge discovery from electronic health records (EHRs) is complex due to data inconsistencies. Collaborative feature engineering, termed knowledge-driven feature engineering (KDFE), is crucial. During KDFE, new variables, referred to as features, are generated through collaboration between domain experts, computer scientists, and medical researchers.
A case study, involving two medical projects, demonstrated the significant impact of manual KDFE (mKDFE) on classification performance, measured by the area under the receiver operating characteristic curve (AUROC). Compared to a baseline, mKDFE increased the average AUROC from 0.62 to 0.82 in Project 1 and from 0.61 to 0.89 in Project 2 (p < 0.001).
To optimise KDD, an automated KDFE (aKDFE) framework was developed. This framework supports automated feature engineering, constructing informative features from EHR data. The framework effectively collects and aggregates domain knowledge to generate features that are more informative than those directly recorded in EHRs or manually engineered (mKDFE), as is common in many medical research projects today. aKDFE outperforms mKDFE by automating manual processes and enhancing predictive power.
Clinical decision support systems (CDSSs), like Janusmed Riskprofile, contain valuable domain knowledge in the form of risk scores. Studies were conducted to explore CDSS risk scores and their impact on aKDFE effectiveness. These findings highlight the potential of aKDFE to streamline medical research by leveraging both automated feature engineering and expert knowledge.
aKDFE offers several advantages over mKDFE: (i) increased efficiency through automated knowledge discovery and feature engineering (FE) processes; (ii) enhanced effectiveness due to superior predictive power; and (iii) explicit and transparent operation sequences for data pivoting and feature generation from EHR features.
The long-term objective is to equip medical researchers with augmented computational expertise, minimising dependence on data scientists. Future improvements may include: (i) assessing advanced event-based models; (ii) leveraging large language models (LLMs) to capture and structure domain knowledge; and (iii) exploring multi-agent knowledge discovery.
Place, publisher, year, edition, pages
Växjö: Linnaeus University Press, 2025. p. 100
Series
Linnaeus University Dissertations ; 573
Keywords
Feature Engineering (FE), Medical Registry Research, Knowledge Discovery in Databases (KDD), Electronic Health Record (EHR), Healthcare domain Knowledge, Iterative feature Engineering, Clinical Decision Support System (CDSS)
National Category
Information Systems
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-138540 (URN)
10.15626/LUD.573.2025 (DOI)
978-91-8082-305-0 (ISBN)
978-91-8082-304-3 (ISBN)
Public defence
2025-06-09, Azur, Våningsplan 2, hus Vita, Kalmar, 13:00 (English)
Opponent
Supervisors
2025-05-20, 2025-05-16, 2025-05-20. Bibliographically approved