- Boxes: definitions
- Ellipses: theorems and lemmas
- Blue border: the statement of this result is ready to be formalized; all prerequisites are done
- Orange border: the statement of this result is not ready to be formalized; the blueprint needs more work
- Blue background: the proof of this result is ready to be formalized; all prerequisites are done
- Green border: the statement of this result is formalized
- Green background: the proof of this result is formalized
- Dark green background: the proof of this result and all its ancestors are formalized
- Dark green border: this is in Mathlib
A finite set of natural numbers is defined as \(\mathbb {N}_n := \left\{ 0, \dots , n-1 \right\} \).
A list \(\mathbb {L}_n \colon \mathbb {N}_n \to \tau \) of type \(\tau \) and length \(n\) is a function from \(\mathbb {N}_{n} := \left\{ 0, \dots , n-1 \right\} \) to the type \(\tau \).
The preimage \(f^{-1}\colon \mathcal{Y} \to \mathcal{X}\) of a function \(f\colon \mathcal{X} \to \mathcal{Y}\) is defined as
A finite probability distribution \(\Delta (n)\) for \(n\in \mathbb {N}\) is defined as
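A standard formulation, as a sketch (the blueprint's precise definition may differ):
\[ \Delta (n) := \Big\{ p \in [0,1]^{\mathbb {N}_n} \;\Big|\; \sum _{i \in \mathbb {N}_n} p_i = 1 \Big\} . \]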
We call a probability distribution degenerate if \(p_i = 1\) for some \(i\); otherwise the probability distribution is supported.
The measure \(\mathbb {P}[\mathcal{S}]\) of the set \(\mathcal{S} \subseteq \mathbb {N}\) given a probability space is defined as
The definition requires that the set is decidable.
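A sketch of the intended formulation, assuming the finite probability space has sample space \(\Omega = \mathbb {N}_n\) and probability function \(p\); decidability of \(\mathcal{S}\) is what lets the membership test be treated as a boolean:
\[ \mathbb {P}[\mathcal{S}] := \sum _{\omega \in \Omega } \mathbb {I}(\omega \in \mathcal{S}) \, p(\omega ). \]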
A random variable is a measurable function \(\tilde{x}\colon \mathbb {N}\to \mathcal{Y}\) where \(\mathcal{Y}\) is a measurable space. Note that, the way we define random variables, they are independent of a particular probability space. The measurability requirement is vacuous in our finite probability spaces, where the event space is the full power set.
For any complete probability space \((\Omega , \mathcal{F}, P)\) and a random variable \(\tilde{b}\colon \Omega \to \mathbb {B}\), the operator \(\mathbb {P}\) is defined as
where the boolean is interpreted as \(0\) for false and \(1\) for true.
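A sketch consistent with this interpretation (assuming the finite setting used throughout):
\[ \mathbb {P}\left[ \tilde{b} \right] := \sum _{\omega \in \Omega } P(\left\{ \omega \right\} ) \, \mathbb {I}(\tilde{b}(\omega )). \]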
For any probability space \((\Omega , \mathcal{F}, P)\) and a random variable \(\tilde{b}\colon \Omega \to \mathbb {B}\):
The cumulative distribution function \(F_{\tilde{x}}\colon \mathbb {R}\to [0,1]\) of a real-valued random variable \(\tilde{x}\colon \Omega \to \mathbb {R}\) is defined as
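A sketch of the standard formulation:
\[ F_{\tilde{x}}(t) := \mathbb {P}\left[ \tilde{x} \le t \right] , \qquad t \in \mathbb {R}. \]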
The probability mass function \(p_{\tilde{x}}\colon \mathcal{E} \to [0,1]\) for a discrete random variable \(\tilde{x}\colon \Omega \to \mathcal{E}\) with a finite \(\mathcal{E}\) is defined as
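A sketch of the standard formulation:
\[ p_{\tilde{x}}(e) := \mathbb {P}\left[ \tilde{x} = e \right] , \qquad e \in \mathcal{E}. \]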
Events \(A,B \in \mathcal{F}\) are independent when
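The standard condition, as a sketch:
\[ \mathbb {P}\left[ A \cap B \right] = \mathbb {P}\left[ A \right] \cdot \mathbb {P}\left[ B \right] . \]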
Random variables \(\tilde{x}\colon \Omega \to \mathcal{X}\) and \(\tilde{y}\colon \Omega \to \mathcal{Y}\) are independent when
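A sketch of the standard condition:
\[ \mathbb {P}\left[ \tilde{x} = x \wedge \tilde{y} = y \right] = \mathbb {P}\left[ \tilde{x} = x \right] \cdot \mathbb {P}\left[ \tilde{y} = y \right] \qquad \forall x \in \mathcal{X},\ y \in \mathcal{Y}. \]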
Assume \(\tilde{x},\tilde{y}\colon \Omega \to \mathbb {R}\) and \(c\in \mathbb {R}\). Then:
A random variable defined on a finite probability space \(P\) is a mapping \(\tilde{x}\colon \Omega \to \mathbb {R}\).
A boolean set is \(\mathbb {B} = \left\{ \operatorname {false}, \operatorname {true}\right\} \).
We use \(\tilde{s}_k\colon \mathcal{H} \to \mathcal{S}\) to denote the 0-based \(k\)-th state of each history.
We use \(\tilde{a}_k\colon \mathcal{H} \to \mathcal{A}\) to denote the 0-based \(k\)-th action of each history.
The optimal history-dependent value function \(\hat{u}_t^{\star }\colon \mathcal{H} \to \mathbb {R}\) is defined for a horizon \(t\in \mathbb {N}\) as
Suppose that \(t\in \mathbb {N}\). Then:
Assume a horizon \(T\in \mathbb {N}\). Then:
The set of decision rules \(\mathcal{D}\) is defined as \(\mathcal{D} := \mathcal{A}^{\mathcal{S}}. \) A single action \(a \in \mathcal{A}\) can also be interpreted as a decision rule \(d := s \mapsto a\).
The history-dependent optimal Bellman operator \(L_{\mathrm{h}}^{\star }\colon \mathcal{U} \to \mathcal{U}\) is defined as
where the value function \(\tilde{u}\) is interpreted as a random variable defined on the sample space \(\Omega = \mathcal{H}\).
The history-dependent policy Bellman operator \(L_{\mathrm{h}}^{\pi } \colon \mathcal{U} \to \mathcal{U}\) is defined for each \(\pi \in \Pi _{\mathrm{HR}}\) as
where the value function \(\tilde{u}\) is interpreted as a random variable defined on the sample space \(\Omega = \mathcal{H}\).
A Markov Bellman operator \(L^{\star }\colon \mathcal{V} \to \mathcal{V}\) is defined as
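A sketch of a common form of this operator, writing \(P(s,a,s')\) for the probability that \(P(s,a)\) assigns to \(s'\) (the blueprint's exact definition may differ):
\[ (L^{\star } v)(s) := \max _{a \in \mathcal{A}} \sum _{s' \in \mathcal{S}} P(s, a, s') \left( r(s, a, s') + v(s') \right) , \qquad s \in \mathcal{S}. \]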
A Markov policy Bellman operator \(L_k^{\pi }\colon \mathcal{V} \to \mathcal{V}\) for \(\pi \in \Pi \) is defined as
The expectation of a random variable \(\tilde{x} \colon \Omega \to \mathbb {R}\) is
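A sketch of the standard finite-space formulation, assuming a finite probability space \((\Omega , p)\):
\[ \mathbb {E}\left[ \tilde{x} \right] := \sum _{\omega \in \Omega } p(\omega ) \, \tilde{x}(\omega ). \]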
The conditional expectation of \(\tilde{x} \colon \Omega \to \mathbb {R}\) conditioned on \(\tilde{b} \colon \Omega \to \mathbb {B}\) is defined as
where we define that \(x / 0 = 0\) for each \(x\in \mathbb {R}\).
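A sketch of the intended formulation under this convention:
\[ \mathbb {E}\left[ \tilde{x} \mid \tilde{b} \right] := \frac{\mathbb {E}\left[ \tilde{x} \cdot \mathbb {I}(\tilde{b}) \right]}{\mathbb {P}\left[ \tilde{b} \right]}. \]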
The random conditional expectation of a random variable \(\tilde{x} \colon \Omega \to \mathbb {R}\) conditioned on \(\tilde{y} \colon \Omega \to \mathcal{Y}\) for a finite set \(\mathcal{Y}\) is the random variable \(\mathbb {E}\left[ \tilde{x} \mid \tilde{y} \right]\colon \Omega \to \mathbb {R}\) defined as
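A sketch of the standard formulation, built from the boolean conditional expectation above:
\[ \mathbb {E}\left[ \tilde{x} \mid \tilde{y} \right](\omega ) := \mathbb {E}\left[ \tilde{x} \mid \tilde{y} = \tilde{y}(\omega ) \right] , \qquad \omega \in \Omega , \]
where \(\tilde{y} = \tilde{y}(\omega )\) denotes the boolean random variable \(\omega ' \mapsto \left( \tilde{y}(\omega ') = \tilde{y}(\omega ) \right)\).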
The history-dependent expectation is defined for each \(t\in \mathbb {N}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(\hat{h}\in \mathcal{H}\) and a \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\) as
In the \(\mathbb {E}\) operator above, the random variable \(\tilde{x}\) lives in a probability space \((\Omega , p)\) where \(\Omega = \mathcal{H}(\hat{h}, t)\) and \(p(h) = p^{\mathrm{h}}(h, \pi ), \forall h\in \Omega \). Moreover, if \(\hat{h}\) is a state, then it is interpreted as a history with the single initial state.
The conditional history-dependent expectation is defined for each \(t\in \mathbb {N}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(\hat{h}\in \mathcal{H}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\), and \(\tilde{b}\colon \mathcal{H} \to \mathbb {B}\) as
In the \(\mathbb {E}\) operator above, the random variables \(\tilde{x}\) and \(\tilde{b}\) live in a probability space \((\Omega , p)\) where \(\Omega = \mathcal{H}(\hat{h}, t)\) and \(p(h) = p^{\mathrm{h}}(h, \pi ), \forall h\in \Omega \). Moreover, if \(\hat{h}\) is a state, then it is interpreted as a history with the single initial state.
The random conditional history-dependent expectation is defined for each \(t\in \mathbb {N}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(\hat{h}\in \mathcal{H}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\), and \(\tilde{y}\colon \mathcal{H} \to \mathcal{V}\) as
In the \(\mathbb {E}\) operator above, the random variables \(\tilde{x}\) and \(\tilde{y}\) live in a probability space \((\Omega , p)\) where \(\Omega = \mathcal{H}(\hat{h}, t)\) and \(p(h) = p^{\mathrm{h}}(h, \pi ), \forall h\in \Omega \). Moreover, if \(\hat{h}\) is a state, then it is interpreted as a history with the single initial state.
Suppose that \(\tilde{x}\colon \Omega \to \mathbb {R}\) is a random variable and that \(\tilde{x}(\Omega ) := \left\{ \tilde{x}(\omega ) \mid \omega \in \Omega \right\} \) is finite. Then
Suppose that \(\tilde{x}\colon \Omega \to \mathbb {R}\) is a random variable and that \(\tilde{x}(\Omega ) := \left\{ \tilde{x}(\omega ) \mid \omega \in \Omega \right\} \) is finite. Then
A history \(h\) in a set of histories \(\mathcal{H}\) is a sequence of states and actions defined for \(M\) recursively as
where \(s_0, s \in \operatorname {Fin}(S)\), \(a \in \operatorname {Fin}(A)\), and \(h'\in \mathcal{H}\).
The history probability distribution \(p^{\mathrm{h}}_T \colon \Pi _{\mathrm{HR}}\to \Delta (\mathcal{H}(\hat{h},T))\) is defined for each \(T \in \mathbb {N}\), \(\pi \in \Pi _{\mathrm{HR}}\), and \(h\in \mathcal{H}(\hat{h},T)\) as
Moreover, the function \(p^{\mathrm{h}}\) maps policies to valid probability distributions.
The length \(|h|\in \mathbb {N}\) of a history \(h\in \mathcal{H}\) is defined as
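A sketch of the intended recursion, writing \(\langle s_0 \rangle \) for the single-state history:
\[ |\langle s_0 \rangle | := 0, \qquad |\langle h', a, s \rangle | := |h'| + 1. \]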
The set \(\mathcal{H}_{\mathrm{NE}}\) of non-empty histories is
Following histories \(\mathcal{H}(h,t) \subseteq \mathcal{H}\) for \(h\in \mathcal{H}\) of length \(t\in \mathbb {N}\) are defined recursively as
The set of histories \(\mathcal{H}_{t}\) of length \(t \in \mathbb {N}\) is defined recursively as
An indicator function \(\mathbb {I}\colon \mathbb {B} \to \left\{ 0, 1 \right\} \) is defined for \(b\in \mathbb {B}\) as
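A sketch of the standard formulation:
\[ \mathbb {I}(b) := \begin{cases} 1 & \text{if } b = \operatorname {true}, \\ 0 & \text{if } b = \operatorname {false}. \end{cases} \]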
The joint probability of two random variables \(\tilde{x}\colon \Omega \to \mathcal{X}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) is defined as
One can also define a joint PMF for two discrete random variables \(p_{\tilde{x}, \tilde{y}}(x,y)\) for \(x\in \mathcal{X}\) and \(y\in \mathcal{Y}\) as
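A sketch of the standard formulations for the joint probability and the joint PMF:
\[ \mathbb {P}\left[ \tilde{x} = x \wedge \tilde{y} = y \right] \qquad \text{and} \qquad p_{\tilde{x}, \tilde{y}}(x, y) := \mathbb {P}\left[ \tilde{x} = x \wedge \tilde{y} = y \right] . \]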
A Markov decision process \(M := (S, A, P, r)\) consists of:
a positive integer \(S \in \mathbb {N}_{>0}\) representing the number of states, with index set \(\operatorname {Fin}(S) = \{ 0,1, \dots , S-1\} \)
a positive integer \(A \in \mathbb {N}_{>0}\) representing the number of actions, with index set \(\operatorname {Fin}(A) = \{ 0,1, \dots , A-1\} \)
a transition function \(P \colon \operatorname {Fin}(S) \times \operatorname {Fin}(A) \to \Delta (\operatorname {Fin}(S))\), mapping each state–action pair to a finite probability distribution over next states
a reward function \(r \colon \operatorname {Fin}(S) \times \operatorname {Fin}(A) \times \operatorname {Fin}(S) \to \mathbb {R}\), mapping each transition \((s,a,s')\) to a real number
A finite-horizon objective is defined as \(O := (s_0, T)\), where \(s_0\in \mathcal{S}\) is the initial state and \(T\in \mathbb {N}\) is the horizon.
A policy \(\pi ^{\star }\in \Pi _{\mathrm{HR}}\) is return optimal for an objective \(O\) if
For each \(t\in \mathbb {N}\), a policy \(\pi ^{\star }\in \Pi _{\mathrm{HR}}\) is optimal if
A policy \(\pi ^{\star }\in \Pi _{\mathrm{HR}}\) that is optimal in the sense of the preceding definition is also return optimal for any initial state \(s_0\) and horizon \(T\).
The optimal finite-horizon policy \(\pi ^{\star }_t, t\in \mathbb {N}\) is defined as
where \(a_0\) is an arbitrary action.
The set of history-dependent policies is \(\Pi _{\mathrm{HR}}:= \Delta (\mathcal{A})^{\mathcal{H}}. \)
The set of Markov deterministic policies \(\Pi _{\mathrm{MD}}\) is \(\Pi _{\mathrm{MD}}:= \mathcal{D}^{\mathbb {N}}. \) A Markov deterministic policy \(\pi \in \Pi _{\mathrm{MD}}\) can also be interpreted as \(\bar{\pi } \in \Pi _{\mathrm{HR}}\):
where \(\delta \) is the Dirac distribution, and \(s_{|h|}\) is the history’s last state.
The set of stationary deterministic policies \(\Pi _{\mathrm{SD}}\) is defined as \(\Pi _{\mathrm{SD}} := \mathcal{D}. \) A stationary policy \(\pi \in \Pi _{\mathrm{SD}}\) can be interpreted as \(\bar{\pi } \in \Pi _{\mathrm{HR}}\):
where \(\delta \) is the Dirac distribution and \(s_{|h|}\) is the history’s last state.
A function \(f\colon \mathcal{X} \to \mathcal{Y}\) with \(\sigma \)-algebras \(\mathcal{D} \subseteq 2^{\mathcal{X}}\) and \(\mathcal{E} \subseteq 2^{\mathcal{Y}}\) is measurable when the pre-image of each \(E\in \mathcal{E}\) is in \(\mathcal{D}\):
Here, the application of \(f^{-1}\) to a set \(E \subseteq \mathcal{Y}\) is defined as:
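A sketch of the standard formulation:
\[ f^{-1}(E) := \left\{ x \in \mathcal{X} \mid f(x) \in E \right\} . \]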
A finite probability measure \(p\colon \Omega \to \mathbb {R}_+\) on a finite set \(\Omega \) is any function that satisfies
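A sketch of the intended condition (nonnegativity is already built into the codomain \(\mathbb {R}_+\)):
\[ \sum _{\omega \in \Omega } p(\omega ) = 1. \]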
A complete probability space consists of:
A sample space \(\Omega := \mathbb {N}_n\), which is the set of all possible outcomes.
An implicit event space \(\mathcal{F} := 2^{\Omega }\), which we do not define explicitly in Lean.
A probability function \(\mathbb {P} \in \Delta (n)\) that satisfies:
\(\mathbb {P}(\Omega ) = 1\)
\(\mathbb {P}(A_1 \cup A_2) = \mathbb {P}(A_1) + \mathbb {P}(A_2)\) for all \(A_1, A_2\in \mathcal{F}\) such that \(A_1 \cap A_2 = \emptyset \).
The conditional probability of \(\tilde{b}\colon \Omega \to \mathbb {B}\) on \(\tilde{c}\colon \Omega \to \mathbb {B}\) is defined as
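A sketch of the intended formulation, using the \(x / 0 = 0\) convention:
\[ \mathbb {P}\left[ \tilde{b} \mid \tilde{c} \right] := \frac{\mathbb {P}\left[ \tilde{b} \wedge \tilde{c} \right]}{\mathbb {P}\left[ \tilde{c} \right]}. \]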
The history-reward random variable \(\tilde{r}^{\mathrm{h}}\colon \mathcal{H} \to \mathbb {R}\) for \(h = \langle h', a, s \rangle \in \mathcal{H}\), where \(h'\in \mathcal{H}\), \(a\in \mathcal{A}\), and \(s\in \mathcal{S}\), is defined recursively as
The history-dependent DP value function \(u_t^{\star }\in \mathcal{U}\) for \(t\in \mathbb {N}\) is defined as
The history-dependent DP value function \(u_t^{\pi } \in \mathcal{U}\) for a policy \(\pi \in \Pi _{\mathrm{HR}}\) and \(t\in \mathbb {N}\) is defined as
A history-dependent policy value function \(\hat{u}^{\pi }_t\colon \mathcal{H} \to \mathbb {R}\) for each \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), and \(t\in \mathbb {N}\) is defined as
The optimal value function \(v^{\star }_t\in \mathcal{V}, t\in \mathbb {N}\) is defined as
The Markov policy value function \(v^{\pi }_t\in \mathcal{V}_{\mathrm{M}}, t \in \mathbb {N}\) for \(\pi \in \Pi _{\mathrm{MD}}\) is defined as
The set of independent value functions is defined as \(\mathcal{V} := \mathbb {R}^{\mathcal{S}}\).
The set of history-dependent value functions \(\mathcal{U}\) is defined as
The set of Markov value functions is defined as \(\mathcal{V}_{\mathrm{M}} := \mathbb {R}^{\mathbb {N}\times \mathcal{S}}\).
Suppose that \(\tilde{b}, \tilde{c} \colon \Omega \to \mathbb {B}\). Then:
where the equality applies for all \(\omega \in \Omega \).
Let \(\tilde{y}\colon \Omega \to \mathcal{Y}\) with a finite \(\mathcal{Y}\). Then
The history-reward random variable \(\tilde{r}^{\mathrm{h}}_k \colon \mathcal{H} \to \mathbb {R}\) for \(h = \langle h', a, s \rangle \in \mathcal{H}\), where \(h'\in \mathcal{H}\), \(a\in \mathcal{A}\), and \(s\in \mathcal{S}\), is defined as the \(k\)-th reward (0-based) of a history.
The history-reward random variable \(\tilde{r}^{\mathrm{h}}_{\ge k}\colon \mathcal{H} \to \mathbb {R}\) for \(h = \langle h', a, s \rangle \in \mathcal{H}\), where \(h'\in \mathcal{H}\), \(a\in \mathcal{A}\), and \(s\in \mathcal{S}\), is defined as the sum of the \(k\)-th and later rewards (0-based) of a history.
The history-reward random variable \(\tilde{r}^{\mathrm{h}}_{\le k} \colon \mathcal{H} \to \mathbb {R}\) for \(h = \langle h', a, s \rangle \in \mathcal{H}\), where \(h'\in \mathcal{H}\), \(a\in \mathcal{A}\), and \(s\in \mathcal{S}\), is defined as the sum of the \(k\)-th and earlier rewards (0-based) of a history.
The finite-horizon objective function for an objective \(O\) and a policy \(\pi \in \Pi _{\mathrm{HR}}\) is defined as
Suppose that \(u^1, u^2 \in \mathcal{U}\) satisfy that \(u^1(h) \ge u^2(h), \forall h\in \mathcal{H}\). Then
For each \(\pi \in \Pi _{\mathrm{HR}}\) and \(t\in \mathbb {N}\):
For each \(t\in \mathbb {N}\):
Suppose that \(\tilde{x} \colon \Omega \to \mathbb {R}\) and \(c\in \mathbb {R}\). Then
Random variables \(\tilde{x}, \tilde{y} \colon \Omega \to \mathbb {R}\) satisfy that
Suppose that \(\tilde{x}, \tilde{y} \colon \Omega \to \mathbb {R}\) and \(\tilde{z}\colon \Omega \to \mathcal{V}\) are random variables and \(c\in \mathbb {R}\), such that \(\tilde{y}(\omega ) = c + \tilde{x}(\omega )\) for all \(\omega \in \Omega \). Then
Suppose that \(\tilde{x}, \tilde{z} \colon \Omega \to \mathbb {R}\) satisfy that
Then
A random variable \(\tilde{x}\colon \Omega \to \mathbb {R}\) and \(c\in \mathbb {R}\) satisfy that
Suppose that the random variable \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\) satisfies for some \(k, t \in \mathbb {N}\), with \(k \le t\), that
where \(h_{\le k}\) is the prefix of \(h\) of length \(k\). Then for each \(h\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}\):
Suppose that \(\tilde{x}, \tilde{y} \colon \Omega \to \mathbb {R}\) satisfy that
Then
Let \(\tilde{x} \colon \Omega \to \mathbb {R}\) be a random variable. Then:
Let \(\tilde{x} \colon \Omega \to \mathbb {R}\) and \(\tilde{b} \colon \Omega \to \mathbb {B}\) be random variables. Then:
Let \(\tilde{x} \colon \Omega \to \mathbb {R}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) be random variables with \(\mathcal{Y}\) finite. Then:
Suppose that \(\tilde{c} \colon \Omega \to \mathbb {B}\) such that \(\mathbb {P}\left[ \tilde{c} \right] = 0\). Then for any \(\tilde{x} \colon \Omega \to \mathbb {R}\):
Suppose that \(\tilde{x}, \tilde{y} \colon \mathcal{H} \to \mathbb {R}\) satisfy that \(\tilde{x}(h) = \tilde{y}(h), \forall h \in \mathcal{H}\). Then \(\forall h\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):
Suppose that \(\tilde{x}, \tilde{y} \colon \mathcal{H} \to \mathbb {R}\). Then \(\forall h\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):
For each \(\beta > 0\), \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(t\in \mathbb {N}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\), \(s\in \mathcal{S}\), \(a\in \mathcal{A}\):
Assume \(\tilde{x} \colon \mathcal{H} \to \mathbb {R}\) and \(c \in \mathbb {R}\). Then \(\forall h\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):
Suppose that \(c\in \mathbb {R}\). Then \(\forall h\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):
Assume \(h \in \mathcal{H}\) and \(f \colon \mathcal{H} \to \mathbb {R}\), and let \(s_0 := s_{|h|}(h)\).
Then
For each \(\hat{h}\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), and \(t\in \mathbb {N}\):
where \(\tilde{\operatorname {id}}\) is the identity function, \(|\cdot |\) is the length of a history (0-based), and \(\tilde{s}_k\colon \mathcal{H} \to \mathcal{S}\) and \(\tilde{a}_k\colon \mathcal{H} \to \mathcal{A}\) are the 0-based \(k\)-th state and action, respectively, of each history.
For \(h \in \mathcal{H}\):
Suppose that \(\tilde{b}, \tilde{c} \colon \Omega \to \mathbb {B}\), then
Suppose that \(\tilde{c} \colon \Omega \to \mathbb {B}\) such that \(\mathbb {P}\left[ \tilde{c} \right] = 0\). Then for any \(\tilde{b} \colon \Omega \to \mathbb {B}\):
For each \(\hat{h}\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(t\in \mathbb {N}\), \(h\in \mathcal{H}\):
where \(k_0:= |\hat{h}|\).
For each \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), and \(t\in \mathbb {N}\):
where \(k_0:=|h|\).
Let \(\tilde{x} \colon \Omega \to \mathcal{X}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) be random variables with a finite set \(\mathcal{Y}\). Then:
For each \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(t\in \mathbb {N}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\) and \(\tilde{y}\colon \mathcal{H} \to \mathcal{V}\):
Let \(\tilde{b} \colon \Omega \to \mathbb {B}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) be random variables with a finite set \(\mathcal{Y}\). Then:
For any discrete \(\tilde{x}\colon \Omega \to \mathbb {R}\) and \(g\colon \mathbb {R}\to \mathbb {R}\):
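A sketch of the standard identity (the law of the unconscious statistician), assuming \(\tilde{x}(\Omega )\) is finite:
\[ \mathbb {E}\left[ g(\tilde{x}) \right] = \sum _{x \in \tilde{x}(\Omega )} p_{\tilde{x}}(x) \, g(x). \]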