May predictive uncertainty be improved by efficient use of experimental information for QSARs?: Weighting versus averaging in linear regression
Linnaeus University, Faculty of Health and Life Sciences, Department of Biology and Environmental Science.
2013 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Experimental values of physicochemical properties or activities used to build QSARs are subject to both uncertainty and variability, which affect the uncertainty in QSAR predictions to a greater or lesser degree. Sources of variation include experiments being performed at different labs and under different practices. QSARs are mostly developed using only one value per compound, even though more than one experimental value may be available for a given compound. This thesis investigated whether including more information, i.e. multiple point estimates of endpoint values instead of averages, may enhance the predictive performance of a QSAR regression. Regressions built on averages for each compound were compared to weighted regressions built on all experimental values, where the weights were assigned such that every compound received the same total weight. Predictive performances were compared for QSAR data from models in Papa et al. (2009). For two of the four models, the weighted model showed improved predictivity compared to the average model, indicating that uncertainty in QSAR predictions might be improved by using weighting instead of averaging. To allow general conclusions, a simulation experiment was done on artificially generated QSAR data sets. The comparisons between modeling approaches were restricted to models judged as having good predictivity on average, namely those with R > 0.6 for the training data, and where at least one of the approaches succeeded reasonably well in assessing the predictive uncertainty. Neither modeling approach always had better predictive performance than the other, and the differences in predictive performance, as judged by Kullback-Leibler divergences, often fell within the “barely worth mentioning” zone. Weighted linear regression performed worse on average than regression on averages, and its performance worsened as the expected number of experimental values per compound increased (p-value less than 2 × 10^-16). Neither the number of compounds per descriptor nor the expected total variance influenced the relative performances of the models. The general conclusion is that no model type is always favored in terms of predictivity; which approach is best depends on the specific data set. It could therefore be worthwhile to consider both types when developing a QSAR by linear regression.
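The two approaches compared in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the synthetic data, the single descriptor, the noise level, and the helper name `fit_wls` are all assumptions made for illustration. The sketch fits an ordinary regression on per-compound averages and a weighted regression on all values, with each value weighted by one over the number of values for its compound so that every compound has total weight 1. In this simplified setting, where all replicates of a compound share the same descriptor value, the two fits give the same coefficients and differ only in the residual spread they see, which is the part that matters for predictive uncertainty.

```python
import numpy as np

def fit_wls(X, y, w):
    """Weighted least squares: scale rows by sqrt(w), then solve ordinary least squares."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical synthetic data (assumption): 30 compounds, one descriptor,
# and between 1 and 4 experimental values per compound.
rng = np.random.default_rng(0)
n_compounds = 30
x = rng.normal(size=n_compounds)
counts = rng.integers(1, 5, size=n_compounds)
compound = np.repeat(np.arange(n_compounds), counts)  # compound index of each value
y_all = 1.0 + 2.0 * x[compound] + rng.normal(scale=0.5, size=compound.size)

# Approach A (averaging): one mean endpoint value per compound, unweighted fit.
y_mean = np.bincount(compound, weights=y_all, minlength=n_compounds) / counts
X_avg = np.column_stack([np.ones(n_compounds), x])
coef_avg = fit_wls(X_avg, y_mean, np.ones(n_compounds))

# Approach B (weighting): all values, each weighted 1/k_i for its compound,
# so the weights of every compound sum to 1.
X_all = np.column_stack([np.ones(compound.size), x[compound]])
w = 1.0 / counts[compound]
coef_wls = fit_wls(X_all, y_all, w)

# Residual sums of squares: the weighted fit also sees within-compound scatter.
rss_avg = np.sum((y_mean - X_avg @ coef_avg) ** 2)
rss_wls = np.sum(w * (y_all - X_all @ coef_wls) ** 2)

print("coefficients (averaging):", coef_avg)
print("coefficients (weighting):", coef_wls)
print("RSS averaging:", rss_avg, "weighted RSS:", rss_wls)
```

The weighted residual sum of squares equals the averaging one plus the weighted within-compound scatter, so any comparison of predictive uncertainty between the approaches hinges on how that extra term is turned into a variance estimate. That step is deliberately left out here because the abstract does not specify it.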

Place, publisher, year, edition, pages
2013. 35 p.
Keyword [en]
QSAR, predictive uncertainty, linear regression
National Category
Environmental Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-25375
OAI: oai:DiVA.org:lnu-25375
DiVA: diva2:617111
Subject / course
Environmental Science
Educational program
Environmental Risk Analysis Master Programme, 60 credits
Uppsok
Life Earth Science
Supervisors
Examiners
Available from: 2013-04-23. Created: 2013-04-22. Last updated: 2013-04-23. Bibliographically approved.
