Reinforcement Learning and Dynamical Systems
Lindenberg, Björn. Linnaeus University, Faculty of Technology, Department of Mathematics. ORCID iD: 0000-0003-2756-5186
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis concerns reinforcement learning and dynamical systems in finite discrete problem domains. Studies of artificial intelligence through reinforcement learning involve developing models and algorithms for scenarios in which an agent interacts with an environment. By taking actions the agent may induce changes in the observed environment, and a modeled reward system reinforces correct behavior through learning algorithms. Reinforcement learning can be used in a wide variety of domains, such as finance, robotics, games, routing and health care. However, as the subject matures there is an increasing need to rely more heavily on advanced concepts in mathematics and deep learning to further our understanding of existing problems and to find new algorithmic insights. Discrete dynamical systems arise in virtually any setting as soon as a set of elements is subjected to iteration by a defining function. The function may be seen to represent the passing of time or to define the rules for state transitions. If the set of elements is finite but very large, applications appear in several different fields, such as operations research, cryptography and biology, where understanding properties of the structure and long-term behavior without explicit enumeration is key.
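To make the idea of iteration on a finite set concrete, the following minimal sketch (an illustration added here, not material from the thesis; the map f below is a made-up example) iterates a function on Z_n from a starting state and reports how long the orbit takes to enter a cycle and how long that cycle is. Because the state set is finite, every orbit must eventually repeat a state and hence settle on a cycle.

    # Minimal illustration: every orbit of a map on a finite set eventually
    # enters a cycle, because some state must repeat.

    def orbit_profile(f, x0):
        """Iterate f from x0; return (transient length, cycle length)."""
        seen = {}              # state -> step at which it was first visited
        x, step = x0, 0
        while x not in seen:
            seen[x] = step
            x = f(x)
            step += 1
        first_visit = seen[x]
        return first_visit, step - first_visit

    if __name__ == "__main__":
        n = 10_000
        f = lambda x: (x * x + 1) % n      # a hypothetical map on Z_n
        tail, cycle = orbit_profile(f, x0=3)
        print(f"transient length: {tail}, cycle length: {cycle}")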

In Paper I we extend the model of categorical reinforcement learning with a group-aided training procedure involving multiple agents. By having the agents learn through shared distributional information but act independently, we argue for an accelerated learning process. We show empirically that the procedure can lead to much more robust learning, stronger individual agent performance and good ensemble efficiency.

In Paper II we continue to build upon distributional reinforcement learning for finite Markov processes. A common approach among existing algorithms is to apply transformations to agent returns for stability and flexibility over a variety of tasks. We show that one of the most successful of these methods may fail for a stochastic process. As a solution we introduce a new distributional operator that handles a large class of transformations with guaranteed theoretical convergence. We also propose an approximating single-actor algorithm based on these insights, which achieves state-of-the-art performance compared to similar algorithms when tested.

In Paper III we focus on the issue of efficient exploration in reinforcement learning by studying the regret that a learning algorithm incurs relative to an optimal policy. Specifically, we derive a Bayesian regret bound for Thompson sampling on linear multi-armed bandits with Gaussian reward models, where environment uncertainty is described by a set of multivariate normal-gamma distributions.

In Paper IV we derive sharp bounds on the number of iterations needed for any linear finite dynamical system to stabilize on an inner set of cycles. These bounds may be used in cycle analysis and criterion tests to understand the long-term behavior of extremely large systems.

Place, publisher, year, edition, pages
Växjö: Linnaeus University Press, 2023, p. 38
Series
Linnaeus University Dissertations ; 494
Keywords [en]
artificial intelligence, distributional reinforcement learning, Markov decision processes, Bellman operators, deep learning, multi-armed bandits, Bayesian bandits, conjugate priors, Thompson sampling, linear finite dynamical systems, cycle orbits, fixed-point systems
National Category
Mathematics
Research subject
Natural Science, Mathematics; Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-120715, DOI: 10.15626/LUD.494.2023, ISBN: 9789180820301 (print), ISBN: 9789180820318 (electronic), OAI: oai:DiVA.org:lnu-120715, DiVA id: diva2:1756782
Public defence
2023-06-09, Weber, building K, Linnaeus University, Växjö, 14:30 (English)
Available from: 2023-05-16 Created: 2023-05-15 Last updated: 2023-05-16. Bibliographically approved
List of papers
1. Distributional reinforcement learning with ensembles
2020 (English). In: Algorithms, E-ISSN 1999-4893, Vol. 13, no. 5, p. 1-13, article id 118. Article in journal (Refereed), Published
Abstract [en]

It is well known that ensemble methods often provide enhanced performance in reinforcement learning. In this paper, we explore this concept further by using group-aided training within the distributional reinforcement learning paradigm. Specifically, we propose an extension to categorical reinforcement learning, where distributional learning targets are implicitly based on the total information gathered by an ensemble. We empirically show that this may lead to much more robust initial learning, a stronger individual performance level, and good efficiency on a per-sample basis.
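As a rough sketch of how group-aided categorical targets could be formed (an illustration under assumptions, not the paper's exact construction; the averaging scheme and function names below are hypothetical), each of K agents keeps a categorical value distribution over a fixed support of atoms, and a shared learning target is obtained by pooling the agents' individual target distributions while each agent still acts independently:

    import numpy as np

    # Hypothetical sketch: K agents keep categorical value distributions over
    # a fixed support of atoms; a shared learning target is formed by averaging
    # the agents' individual (already projected) target distributions.

    N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0
    support = np.linspace(V_MIN, V_MAX, N_ATOMS)   # fixed atom locations (context only)

    def shared_target(agent_targets):
        """Pool per-agent categorical targets into one ensemble target.

        agent_targets: array of shape (K, N_ATOMS), each row a probability vector.
        """
        target = agent_targets.mean(axis=0)
        return target / target.sum()               # guard against rounding drift

    def cross_entropy(agent_probs, target):
        """Loss an individual agent would minimize against the shared target."""
        return -np.sum(target * np.log(agent_probs + 1e-12))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        K = 4
        raw = rng.random((K, N_ATOMS))
        agent_targets = raw / raw.sum(axis=1, keepdims=True)
        t = shared_target(agent_targets)
        print("ensemble target sums to", t.sum())
        print("loss of agent 0:", cross_entropy(agent_targets[0], t))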

Place, publisher, year, edition, pages
MDPI, 2020
Keywords
distributional reinforcement learning, multiagent learning, ensembles, categorical reinforcement learning
National Category
Mathematics
Research subject
Natural Science, Mathematics
Identifiers
urn:nbn:se:lnu:diva-94230 (URN), 10.3390/a13050118 (DOI), 000568115000016 (), 2-s2.0-85085486191 (Scopus ID)
Available from: 2020-05-08 Created: 2020-05-08 Last updated: 2023-05-15. Bibliographically approved
2. Conjugated Discrete Distributions for Distributional Reinforcement Learning
2022 (English). In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, 2022, Vol. 36, no. 7, p. 7516-7524. Conference paper, Published paper (Refereed)
Abstract [en]

In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among existing algorithms, both single-actor and distributed, is either to clip rewards or to apply a transformation method to Q-functions in order to handle a large variety of magnitudes in real discounted returns. We show theoretically that one of the most successful methods may not yield an optimal policy if the process is non-deterministic. As a solution, we argue that distributional reinforcement learning lends itself to remedying this situation completely. By introducing a conjugated distributional operator we can handle a large class of transformations of real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards using a proper distributional metric given by the Cramér distance. To evaluate its performance in a stochastic setting, we train agents on a suite of 55 Atari 2600 games using sticky actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework.
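The core difficulty can be illustrated with a short numerical sketch (added here for illustration; the transformation h below is a common choice in the deep-RL literature, and the two-point reward distribution is made up): when the return R is stochastic, a nonlinear transformation does not commute with expectation, so h(E[R]) and E[h(R)] disagree, which is the kind of bias a distributional treatment of unaltered returns avoids.

    import numpy as np

    # Illustration of the underlying issue: for a stochastic return R, a
    # nonlinear transformation h does not commute with expectation, so a method
    # that effectively learns h(E[R]) is not learning E[h(R)] (or E[R]).

    EPS = 1e-2

    def h(x):
        """A return transformation of the kind used in the deep-RL literature."""
        return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0) + EPS * x

    rng = np.random.default_rng(1)
    # A made-up stochastic return: 0 or 100 with equal probability.
    returns = rng.choice([0.0, 100.0], p=[0.5, 0.5], size=1_000_000)

    print("h(E[R]) =", h(returns.mean()))    # transform of the expected return
    print("E[h(R)] =", h(returns).mean())    # expected transformed return
    # The two values differ noticeably, which is why tracking the whole return
    # distribution (and transforming it as a distribution) can avoid the bias.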

Place, publisher, year, edition, pages
Association for the Advancement of Artificial Intelligence, 2022
Series
Proceedings of the AAAI Conference on Artificial Intelligence ; 36
National Category
Probability Theory and Statistics
Research subject
Mathematics, Applied Mathematics
Identifiers
urn:nbn:se:lnu:diva-115774 (URN), 10.1609/aaai.v36i7.20716 (DOI), 2-s2.0-85140362116 (Scopus ID), 9781577358763 (ISBN)
Conference
The Thirty-Sixth AAAI Conference on Artificial Intelligence was held virtually from February 22-March 1, 2022
Available from: 2022-08-16 Created: 2022-08-16 Last updated: 2023-06-27. Bibliographically approved
3. Stabilization bounds for linear finite dynamical systems
2018 (English). In: Journal of Algebra, ISSN 0021-8693, E-ISSN 1090-266X, Vol. 511, p. 516-534. Article in journal (Refereed), Published
Abstract [en]

A problem common to all applications of linear finite dynamical systems is analyzing the dynamics without enumerating every possible state transition. Of particular interest is the long-term dynamical behaviour. In this paper, we study the number of iterations needed for a system to settle on a fixed set of elements. As our main result, we present two upper bounds on the number of iterations needed, each of which may be readily applied to a fixed point system test or to a computation of limit cycles. The bounds are based on submodule properties of iterated images and on reduced systems modulo a prime. We also provide examples where our bounds are optimal.
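For intuition, here is a tiny brute-force sketch of the phenomenon the paper bounds without enumeration (the matrix and modulus below are made-up examples, not from the paper): the images A^k(Z_m^n) of a linear map over a finite ring form a decreasing chain, and the step at which the chain stops shrinking is the number of iterations needed to stabilize on the inner set of cycles.

    import itertools
    import numpy as np

    # Tiny brute-force illustration (the paper derives bounds precisely so that
    # this kind of enumeration is unnecessary): the images A^k(Z_m^n) of a
    # linear map over a finite ring form a decreasing chain that stabilizes
    # after finitely many iterations; states in the stable image lie on cycles.

    def image_sizes(A, m, max_iters=10):
        """Sizes of A^k(Z_m^n) for k = 0, 1, 2, ... until they stop shrinking."""
        n = A.shape[0]
        current = {tuple(v) for v in itertools.product(range(m), repeat=n)}
        sizes = [len(current)]
        for _ in range(max_iters):
            current = {tuple(A @ np.array(v) % m) for v in current}
            sizes.append(len(current))
            if sizes[-1] == sizes[-2]:
                break
        return sizes

    if __name__ == "__main__":
        A = np.array([[2, 1],
                      [0, 3]])          # a made-up example over Z_4
        print(image_sizes(A, m=4))      # prints [16, 8, 4, 4]: stable after 2 steps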

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Finite rings, Linear finite dynamical systems, Fixed point systems, Height
National Category
Mathematics
Research subject
Natural Science, Mathematics
Identifiers
urn:nbn:se:lnu:diva-77488 (URN), 10.1016/j.jalgebra.2018.04.009 (DOI), 000441495100028 (), 2-s2.0-85045314644 (Scopus ID)
Available from: 2018-08-31 Created: 2018-08-31 Last updated: 2023-05-15. Bibliographically approved
4. Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors
(English). Manuscript (preprint) (Other academic)
Abstract [en]

We consider Thompson sampling for linear bandit problems with finitely many independent arms, where rewards are sampled from normal distributions that depend linearly on unknown parameter vectors and have unknown variance. Specifically, with a Bayesian formulation we consider multivariate normal-gamma priors to represent environment uncertainty for all involved parameters. We show that our chosen sampling prior is a conjugate prior to the reward model and derive a Bayesian regret bound for Thompson sampling under the condition that the 5/2-moment of the variance distribution exists.
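For intuition about the sampling scheme, here is a self-contained sketch of Thompson sampling with a normal-gamma prior for a finite-armed linear Gaussian bandit (an illustrative toy under assumptions, not the manuscript's algorithm or experiments; the arm features, prior parameters and simulated environment are made up). It uses the standard conjugate update for Bayesian linear regression with unknown noise precision and, at each round, acts greedily on a single posterior sample of the parameters:

    import numpy as np

    # Hedged toy sketch: Thompson sampling for a finite-armed linear Gaussian
    # bandit with a normal-gamma prior on (weights w, noise precision tau).
    # Arm features, prior parameters and the simulated environment are made up.

    rng = np.random.default_rng(0)

    d, K, T = 3, 5, 2000
    arms = rng.normal(size=(K, d))        # hypothetical arm feature vectors
    true_w = rng.normal(size=d)           # unknown environment parameters
    true_tau = 4.0                        # unknown noise precision

    # Prior:  tau ~ Gamma(a0, b0),  w | tau ~ N(mu0, (tau * Lambda0)^-1)
    a0, b0 = 1.0, 1.0
    mu0 = np.zeros(d)
    Lambda0 = np.eye(d)

    # Sufficient statistics of the conjugate posterior
    Lam = Lambda0.copy()
    bvec = Lambda0 @ mu0
    yy, n = 0.0, 0

    for t in range(T):
        # Posterior parameters of the normal-gamma distribution
        mu_n = np.linalg.solve(Lam, bvec)
        a_n = a0 + n / 2.0
        b_n = b0 + 0.5 * (yy + mu0 @ Lambda0 @ mu0 - mu_n @ Lam @ mu_n)

        # Thompson step: draw (tau, w) from the posterior, act greedily on the draw
        tau = rng.gamma(shape=a_n, scale=1.0 / b_n)
        w = rng.multivariate_normal(mu_n, np.linalg.inv(tau * Lam))
        arm = int(np.argmax(arms @ w))

        # Observe a reward and update the sufficient statistics
        x = arms[arm]
        y = x @ true_w + rng.normal(scale=1.0 / np.sqrt(true_tau))
        Lam += np.outer(x, x)
        bvec += y * x
        yy += y * y
        n += 1

    print("posterior mean weights:", np.round(np.linalg.solve(Lam, bvec), 3))
    print("true weights:          ", np.round(true_w, 3))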

Keywords
multi-armed bandits, linear bandits, Thompson sampling
National Category
Computer Sciences; Probability Theory and Statistics
Research subject
Mathematics, Applied Mathematics; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-120710 (URN), 10.48550/arXiv.2303.03348 (DOI)
Projects
Algorithms for reinforcement learning (Algoritmer för förstärkningsinlärning)
Available from: 2023-05-15 Created: 2023-05-15 Last updated: 2023-05-26. Bibliographically approved

Open Access in DiVA

Comprehensive summary (fulltext, application/pdf, 2358 kB)

Authority records

Lindenberg, Björn
