Probability

UNH Elite Team

Remark 0.0.0.1
#

This document omits basic intuitive definitions, such as comparison and arithmetic operations on random variables. Arithmetic operations on random variables are performed pointwise on the sample space. Please see the Lean file for complete details.

0.1 Probability

The treatment of probability mostly follows the treatment in

Deisenroth, M.P., Faisal, A.A. and Ong, C.S. (2021) Mathematics for machine learning.

Although this document states that the values are in \(\mathbb {R}\), the Lean file defines everything in terms of rational numbers \(\mathbb {Q}\) for computability reasons.

0.1.1 Prelude

This section sets up some basic tools for working with probabilities. See Probability/Prelude.lean.

Definition 0.1.1.1
#

A real number \(p\in \mathbb {R}\) is a probability value when \(p \ge 0 \wedge p \le 1\).

The lean file also defines several basic properties for definition 0.1.1.1 which we do not summarize here.

Definition 0.1.1.2
#

A finite set of natural numbers is defined as \(\mathbb {N}_n := \left\{ 0, \dots , n-1 \right\} \).

Definition 0.1.1.3
#

A list \(\mathbb {L}_n \colon \mathbb {N}_n \to \tau \) of type \(\tau \) and length \(n\) is a function from \(\mathbb {N}_{n} := \left\{ 0, \dots , n-1 \right\} \) to \(\tau \).

The following definition generalizes the inverse of a function to non-injective functions, which do not have an inverse.

Definition 0.1.1.4 Preimage
#

The preimage \(f^{-1}\colon \mathcal{Y} \to \mathcal{X}\) of a function \(f\colon \mathcal{X} \to \mathcal{Y}\) is defined as

\begin{equation} \label{eq:preimage-set} f^{-1}(y) := \left\{ x \in \mathcal{X} \mid f(x) = y \right\} . \end{equation}
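To illustrate definition 0.1.1.4, here is a minimal Python sketch of the set-valued preimage for a finite domain; the function name and the example map are illustrative and are not part of the Lean development.

    def preimage(f, domain, y):
        """Return the preimage f^{-1}(y) = {x in domain | f(x) = y}."""
        return {x for x in domain if f(x) == y}

    # Example: f(x) = x mod 3 on {0, ..., 8}; the preimage of 1 is {1, 4, 7}.
    assert preimage(lambda x: x % 3, range(9), 1) == {1, 4, 7}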

0.1.2 Defs

This section contains definitions of probability spaces, probability operators, and the expectation operator. See Probability/Defs.lean.

Definition 0.1.2.1
#

The set of finite probability distributions \(\Delta _n\) for \(n\in \mathbb {N}\) is defined as

\[ \Delta _n := \left\{ p \in \mathbb {L}_n \mid \sum _{i \in \mathbb {N}_n} p_i = 1, p_i \ge 0, \forall i \in \mathbb {N}_n \right\} . \]

We call a probability distribution degenerate if \(p_i = 1\) for some \(i\); otherwise the probability distribution is supported.
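As a quick check, the following Python sketch tests membership in \(\Delta _n\) using exact rationals, mirroring the \(\mathbb {Q}\)-valued Lean definitions; the function name is our own.

    from fractions import Fraction

    def is_distribution(p):
        """Check that a list of rationals sums to 1 with nonnegative entries."""
        return sum(p) == 1 and all(q >= 0 for q in p)

    assert is_distribution([Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)])  # supported
    assert is_distribution([Fraction(1), Fraction(0), Fraction(0)])           # degenerate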

The formal model we use to represent random events and processes is the following.

Definition 0.1.2.2 Complete Probability Space
#

A complete probability space consists of:

  1. A sample space \(\Omega := \mathbb {N}_n\), which is the set of all possible outcomes.

  2. An implicit event space \(\mathcal{F} := 2^{\Omega }\), which we do not define explicitly in Lean.

  3. A probability function \(\mathbb {P} \in \Delta _n\) that satisfies

    1. \(\mathbb {P}(\Omega ) = 1\)

    2. \(\mathbb {P}(A_1 \cup A_2) = \mathbb {P}(A_1) + \mathbb {P}(A_2)\) for all \(A_1, A_2\in \mathcal{F}\) such that \(A_1 \cap A_2 = \emptyset \).

The notation \(2^{\Omega }\) in definition 0.1.2.2 represents the power set of \(\Omega \).

The set \(\mathcal{F}\) is often, but not always, the power set of \(\Omega \). One reason \(\mathcal{F}\) may not be the power set is that a smaller \(\sigma \)-algebra is necessary when \(\Omega \) is uncountable; see Borel sets. Another reason to define an \(\mathcal{F}\) that is not the power set is to represent the available information. We will see an example of this later. Unless specified otherwise, we assume that the event set for a finite sample space is its power set and that the \(\sigma \)-algebra of \(\mathbb {R}\) is the Borel \(\sigma \)-algebra.

Definition 0.1.2.3 Measure
#

The measure \(\mathbb {P}[\mathcal{S}]\) of a set \(\mathcal{S} \subseteq \mathbb {N}\) given a probability space is defined as

\[ \mathbb {P}[\mathcal{S}] := \sum _{i \in \Omega } \mathbb {P}(i) \cdot \mathbb {I}[ i\in \mathcal{S} ]. \]

The definition requires that the set is decidable.

Example 0.1.2.4
#

Suppose I want to model the behavior of coin flips. I have a coin that is equally likely to come up heads and tails. I flip the coin twice. One possible probability space would be

  1. \(\Omega := \left\{ \mathrm{HH}, \mathrm{HT}, \mathrm{TH}, \mathrm{TT} \right\} \).

  2. \(\mathcal{F} := 2^{\Omega } = \left\{ \emptyset , \Omega , \left\{ \mathrm{HH} \right\} , \left\{ \mathrm{HH}, \mathrm{HT} \right\} , \dots \right\} \).

  3. \(P(\left\{ \mathrm{HH} \right\} ) := \frac{1}{4}, P(\left\{ \mathrm{HH}, \mathrm{HT} \right\} ) := \frac{1}{2}\). The probabilities of other sets are derived using the properties in definition 0.1.2.2.
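The example can be reproduced numerically. The following Python sketch builds this probability space with exact rationals and derives the measures of a few events from the point probabilities; all names are our own choices.

    from fractions import Fraction

    # Sample space and point probabilities for two fair coin flips.
    Omega = ["HH", "HT", "TH", "TT"]
    p = {w: Fraction(1, 4) for w in Omega}

    def measure(event):
        """P(A): sum the point probabilities over outcomes in the event A."""
        return sum(p[w] for w in event)

    assert measure({"HH"}) == Fraction(1, 4)
    assert measure({"HH", "HT"}) == Fraction(1, 2)
    assert measure(set(Omega)) == 1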

For the sake of completeness, we also include the notion of measurability. However, since we are only concerned with complete probability spaces, all random variables we define will be measurable.

Definition 0.1.2.5
#

A function \(f\colon \mathcal{X} \to \mathcal{Y}\) with \(\sigma \)-algebras \(\mathcal{D} \subseteq 2^{\mathcal{X}}\) and \(\mathcal{E} \subseteq 2^{\mathcal{Y}}\) is measurable when the pre-image of each \(E\in \mathcal{E}\) is in \(\mathcal{D}\):

\[ f^{-1}(E) \in \mathcal{D}. \]

Here, the application of \(f^{-1}\) to a set \(E \subseteq \mathcal{Y}\) is defined as:

\begin{equation} f^{-1}(E) := \bigcup _{e \in E} f^{-1}(e) . \end{equation}

The measurability in the definition of random variables is required for us to reason about their probability distributions.

Definition 0.1.2.6 Random Variable
#

A random variable is a measurable function \(\tilde{x}\colon \mathbb {N}\to \mathcal{Y}\), where \(\mathcal{Y}\) is a measurable set. Note that, the way we define random variables, they are independent of a particular probability space. The measurability requirement is vacuous for complete probability spaces.

When manipulating random variables, we denote them with a tilde, such as \(\tilde{x}, \tilde{s}\). This corresponds to the common practice of using uppercase letters to denote random variables. We drop the tilde when treating the random variable as an ordinary function.

[TODO: We could get a lot of mileage from treating random variables as vectors. Expectation and probability are then linear operators on this vector space of random variables. This is only possible when using a complete vector space. Question: Do we need finiteness for this? That is, do random variables need to be defined for a finite \(\Omega \)?]

Remark 0.1.2.7
#

We define some ad-hoc operations on random variables. However, it would be better to treat them as a vector space. Then probability and expectation can be seen as linear operators on this vector space.

Another notable definition is that equality with a scalar, \(\tilde{x} = x_0\), is interpreted as a boolean random variable \(\tilde{b}\colon \Omega \to \left\{ 0,1 \right\} \):

\[ \tilde{b}(\omega ) := (\tilde{x} = x_0)(\omega ) := \mathbb {I}[ x(\omega ) = x_0]. \]

We use the operator (function) \(\mathbb {P}\) to denote the probability of the subset of the sample space that corresponds to a given condition. We use a non-standard definition for convenience (and show later that this definition equals the standard one).

Definition 0.1.2.8
#

For any complete probability space, \((\Omega , \mathcal{F}, P)\) and a random variable \(\tilde{b}\colon \Omega \to \mathbb {B}\), the operator \(\mathbb {P}\) is defined as

\[ \mathbb {P}\left[ \tilde{b} \right] := \sum _{i\in \Omega } \mathbb {P}(i) b(i), \]

where the boolean is interpreted as \(0\) for false and \(1\) for true.

The definition can be seen as an inner product between the probabilities and the random variable.
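A minimal Python sketch of definition 0.1.2.8, reading booleans as 0/1; the helper name and the uniform space are our own choices.

    from fractions import Fraction

    Omega = range(4)                 # sample space {0, 1, 2, 3}
    p = [Fraction(1, 4)] * 4         # point probabilities

    def prob(b):
        """P[b] = sum_i P(i) * b(i), with booleans read as 0 or 1."""
        return sum(p[i] * int(b(i)) for i in Omega)

    # Probability that the outcome is even.
    assert prob(lambda i: i % 2 == 0) == Fraction(1, 2)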

The following theorem shows that our definition of a probability operator is equal to the standard definition.

Theorem 0.1.2.9
#

For any probability space, \((\Omega , \mathcal{F}, P)\) and a random variable \(\tilde{b}\colon \Omega \to \mathbb {B}\):

\[ \mathbb {P}\left[\tilde{b} \right] = \mathbb {P}[b^{-1}(\operatorname {true})]. \]

Probability distributions are typically defined for random variables. The distribution of a random variable is characterized by the cumulative distribution function (CDF), defined as follows.

Definition 0.1.2.10 CDF
#

The cumulative distribution function \(F_{\tilde{x}}\colon \mathbb {R}\to [0,1]\) of a real-valued random variable \(\tilde{x}\colon \Omega \to \mathbb {R}\) is defined as

\[ F_{\tilde{x}}(x) := \mathbb {P}\left[\tilde{x} \le x\right], \qquad \forall x\in \mathbb {R}. \]

Some random variables also have a probability density function (PDF), and discrete random variables have a probability mass function (PMF). In this class, we will be primarily dealing with discrete random variables.

Definition 0.1.2.11 PMF
#

The probability mass function \(p_{\tilde{x}}\colon \mathcal{E} \to [0,1]\) for a discrete random variable \(\tilde{x}\colon \Omega \to \mathcal{E}\) with a finite \(\mathcal{E}\) is defined as

\[ p_{\tilde{x}}(e) := \mathbb {P}\left[\tilde{x} = e\right], \qquad \forall e\in \mathcal{E} . \]
Definition 0.1.2.12
#

The joint probability of two random variables \(\tilde{x}\colon \Omega \to \mathcal{X}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) is defined as

\[ \mathbb {P}\left[\tilde{x} = x, \tilde{y} = y\right] := \mathbb {P}\left[\tilde{x} = x \wedge \tilde{y} = y\right] , \forall x\in \mathcal{X}, y\in \mathcal{Y}. \]

One can also define a joint PMF for two discrete random variables \(p_{\tilde{x}, \tilde{y}}(x,y)\) for \(x\in \mathcal{X}\) and \(y\in \mathcal{Y}\) as

\[ p_{\tilde{x}, \tilde{y}}(x,y) = \mathbb {P}\left[\tilde{x} = x, \tilde{y} = y\right]. \]
Definition 0.1.2.13
#

The conditional probability of \(\tilde{b}\colon \Omega \to \mathbb {B}\) on \(\tilde{c}\colon \Omega \to \mathbb {B}\) is defined as

\[ \mathbb {P}\left[ \tilde{b} \mid \tilde{c} \right] := \frac{ \mathbb {P}[\tilde{b} \wedge \tilde{c} ] }{ \mathbb {P}\left[ \tilde{c} \right]} \]
Definition 0.1.2.14 Independent RV
#

Events \(A,B \in \mathcal{F}\) are independent when

\[ \mathbb {P}[ A \cap B ] = \mathbb {P}[ A ] \cdot \mathbb {P}[ B ]. \]

Random variables \(\tilde{x}\colon \Omega \to \mathcal{X}\) and \(\tilde{y}\colon \Omega \to \mathcal{Y}\) are independent when

\[ F_{\tilde{x}, \tilde{y}}(x,y) = F_{\tilde{x}}(x) \cdot F_{\tilde{y}}(y), \forall x\in \mathcal{X}, y\in \mathcal{Y}. \]

Discrete random variables \(\tilde{x}\) and \(\tilde{y}\) are independent if and only if

\[ \mathbb {P}\left[\tilde{x} = x, \tilde{y} = y\right] = \mathbb {P}\left[\tilde{x} = x\right] \cdot \mathbb {P}\left[\tilde{y} = y\right], \forall x\in \mathcal{X}, y\in \mathcal{Y}. \]

The expectation of a random variable is its mean. As with probabilities, we use a non-standard definition for the sake of computational convenience and then show that it is equivalent to the common definition.

Definition 0.1.2.15 Expected Value
#

Suppose that \(\tilde{x}\colon \Omega \to \mathbb {R}\) is a random variable and that \(x(\Omega ) := \left\{ x(\omega ) \mid \omega \in \Omega \right\} \) is finite. Then

\[ \mathbb {E}\left[\tilde{x}\right] := \sum _{\omega \in \Omega } x(\omega ) \cdot \mathbb {P}(\omega ). \]

The following theorem shows that our definition of expected value is equal to the standard definition of expected values.

Theorem 0.1.2.16 Expected Value
#

Suppose that \(\tilde{x}\colon \Omega \to \mathbb {R}\) is a random variable and that \(x(\Omega ) := \left\{ x(\omega ) \mid \omega \in \Omega \right\} \) is finite. Then

\[ \mathbb {E}\left[\tilde{x}\right] = \sum _{x\in \tilde{x}(\Omega )} x\cdot \mathbb {P}\left[\tilde{x} = x\right]. \]

In general probability spaces, the expectation is defined using the Lebesgue integral:

\[ \mathbb {E}\left[\tilde{x}\right] := \int _{\Omega } \tilde{x} \, dP, \]

which is equivalent to the form in the theorem above in finite probability spaces.
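The two formulas can be compared numerically. A small Python sketch, with an arbitrarily chosen finite space and random variable (both our own):

    from fractions import Fraction

    Omega = range(6)
    p = [Fraction(1, 6)] * 6            # uniform probability function
    x = lambda w: w % 2                 # random variable: parity of the outcome

    # Definition 0.1.2.15: sum over outcomes.
    E1 = sum(x(w) * p[w] for w in Omega)
    # Theorem 0.1.2.16: sum over the values taken by the random variable.
    values = {x(w) for w in Omega}
    E2 = sum(v * sum(p[w] for w in Omega if x(w) == v) for v in values)
    assert E1 == E2 == Fraction(1, 2)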

Some of the most important properties of expectations are as follows.

Theorem 0.1.2.17
#

Assume \(\tilde{x},\tilde{y}\colon \Omega \to \mathbb {R}\) and \(c\in \mathbb {R}\). Then:

\begin{align*} \mathbb {E}\left[c\cdot \tilde{x}\right] & = c \cdot \mathbb {E}\left[\tilde{x}\right]\\ \mathbb {E}\left[\tilde{x} + \tilde{y}\right] & = \mathbb {E}\left[\tilde{x}\right] + \mathbb {E}\left[\tilde{y}\right]\\ \mathbb {E}\left[\tilde{x} \cdot \tilde{y}\right] & = \mathbb {E}\left[\tilde{x}\right] \cdot \mathbb {E}\left[\tilde{y}\right] \quad \text{for independent } \tilde{x}, \tilde{y}. \end{align*}
Definition 0.1.2.18
#

A random variable defined on a finite probability space \(P\) is a mapping \(\tilde{x}\colon \Omega \to \mathbb {R}\).

0.1.3 Induction

This section includes properties that enable induction on probability spaces.

0.1.4 Basic

This section states the basic properties of probability spaces and operators.

Definition 0.1.4.1
#

A finite probability measure \(p\colon \Omega \to \mathbb {R}_+\) on a finite set \(\Omega \) is any function that satisfies

\[ \sum _{\omega \in \Omega } p(\omega ) = 1. \]
Theorem 0.1.4.2 Law of the unconscious statistician
#

For any discrete \(\tilde{x}\colon \Omega \to \mathbb {R}\) and \(g\colon \mathbb {R}\to \mathbb {R}\):

\[ \mathbb {E}\left[g(\tilde{x})\right] = \sum _{x\in \tilde{x}(\Omega )} g(x) \cdot \mathbb {P}\left[\tilde{x} = x\right]. \]

For the remainder of ??, we assume that \(P = (\Omega , p)\) is a finite probability space. All random variables are defined on the space \(P\) unless specified otherwise.

0.1.5 OLD CONTENT: TO BE MOVED

Here, we define probability and expectation operators.

Definition 0.1.5.1
#

A boolean set is \(\mathbb {B} = \left\{ \operatorname {false}, \operatorname {true}\right\} \).

Definition 0.1.5.2
#

The expectation of a random variable \(\tilde{x} \colon \Omega \to \mathbb {R}\) is

\[ \mathbb {E}\left[ \tilde{x} \right] := \sum _{\omega \in \Omega } p(\omega ) \cdot \tilde{x}(\omega ). \]
Definition 0.1.5.3
#

An indicator function \(\mathbb {I}\colon \mathbb {B} \to \left\{ 0, 1 \right\} \) is defined for \(b\in \mathbb {B}\) as

\[ \mathbb {I}(b) := \begin{cases} 1 & \text{if } b = \operatorname {true}, \\ 0 & \text{if } b = \operatorname {false}. \end{cases} \]
Definition 0.1.5.4
#

The conditional expectation of \(\tilde{x} \colon \Omega \to \mathbb {R}\) conditioned on \(\tilde{b} \colon \Omega \to \mathbb {B}\) is defined as

\[ \mathbb {E}\left[ \tilde{x} \mid \tilde{b} \right] := \frac{1}{\mathbb {P}[\tilde{b}]} \mathbb {E}\left[ \tilde{x} \cdot \mathbb {I}\circ \tilde{b} \right], \]

where we define that \(x / 0 = 0\) for each \(x\in \mathbb {R}\).

Remark 0.1.5.5
#

It is common to prohibit conditioning on a zero-probability event, both for expectations and probabilities. In this document, we follow the Lean convention, where division by \(0\) is \(0\); see div_zero. However, even some basic probability and expectation results may require the assumption that the conditioning event has nonzero probability.
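The convention can be made concrete with a short Python sketch of definition 0.1.5.4; the function and variable names are our own.

    from fractions import Fraction

    Omega = range(4)
    p = [Fraction(1, 4)] * 4

    def cond_expect(x, b):
        """E[x | b] with the convention that division by zero yields zero."""
        pb = sum(p[w] for w in Omega if b(w))           # P[b]
        num = sum(p[w] * x(w) for w in Omega if b(w))   # E[x * (I o b)]
        return num / pb if pb != 0 else Fraction(0)

    x = lambda w: w
    assert cond_expect(x, lambda w: w >= 2) == Fraction(5, 2)
    assert cond_expect(x, lambda w: False) == 0         # conditioning on a null event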

Definition 0.1.5.6
#

The random conditional expectation of a random variable \(\tilde{x} \colon \Omega \to \mathbb {R}\) conditioned on \(\tilde{y} \colon \Omega \to \mathcal{Y}\) for a finite set \(\mathcal{Y}\) is the random variable \(\mathbb {E}\left[ \tilde{x} \mid \tilde{y} \right]\colon \Omega \to \mathbb {R}\) defined as

\[ \mathbb {E}\left[ \tilde{x} \mid \tilde{y} \right](\omega ) := \mathbb {E}\left[ \tilde{x} \mid \tilde{y} = \tilde{y}(\omega ) \right], \quad \forall \omega \in \Omega . \]
Remark 0.1.5.7
#

The Lean file defines expectations more broadly, for a data type \(\rho \) that is more general than just \(\mathbb {R}\). The main reason is to support both \(\mathbb {R}\) and \(\mathbb {R}_+\). However, in principle, the definitions could be used to reason about expectations that go beyond real numbers and may include other algebras, such as vectors or matrices.

Lemma 0.1.5.8
#

Suppose that \(\tilde{b}, \tilde{c} \colon \Omega \to \mathbb {B}\). Then:

\[ \mathbb {I}\left( \tilde{b} \wedge \tilde{c} \right) = \mathbb {I}( \tilde{b} ) \cdot \mathbb {I}( \tilde{c} ), \]

where the equality applies for all \(\omega \in \Omega \).

Theorem 0.1.5.9

Suppose that \(\tilde{c} \colon \Omega \to \mathbb {B}\) such that \(\mathbb {P}\left[ \tilde{c} \right] = 0\). Then for any \(\tilde{x} \colon \Omega \to \mathbb {R}\):

\[ \mathbb {E}\left[ \tilde{x} \mid \tilde{c} \right] = 0. \]
Proof

Immediate from the definition and the fact that \(0 \cdot x = 0\) for \(x\in \mathbb {R}\).

Theorem 0.1.5.10

Suppose that \(\tilde{c} \colon \Omega \to \mathbb {B}\) such that \(\mathbb {P}\left[ \tilde{c} \right] = 0\). Then for any \(\tilde{b} \colon \Omega \to \mathbb {B}\):

\[ \mathbb {P}\left[ \tilde{b} \mid \tilde{c} \right] = 0. \]
Proof

Immediate from theorem 0.1.5.9.

Theorem 0.1.5.11

Suppose that \(\tilde{b}, \tilde{c} \colon \Omega \to \mathbb {B}\), then

\[ \mathbb {P}\left[ \tilde{b} \wedge \tilde{c} \right] = \mathbb {P}\left[ \tilde{b} \mid \tilde{c} \right] \cdot \mathbb {P}\left[ \tilde{c} \right]. \]
Proof

The property holds immediately when \(\mathbb {P}\left[ \tilde{c} \right] = 0\). Assume that \(\mathbb {P}\left[ \tilde{c} \right] {\gt} 0\). Then:

\begin{align*} \mathbb {P}\left[ \tilde{b} \wedge \tilde{c} \right] & = \mathbb {E}\left[ \mathbb {I}(\tilde{b} \wedge \tilde{c}) \right] & & \text{[\cref{def:probability}]} \\ & = \mathbb {E}\left[ \mathbb {I}(\tilde{b}) \cdot \mathbb {I}(\tilde{c}) \right] & & \text{[\cref{lem:ind-and-eq-prod-ind}]}\\ & = \frac{1}{\mathbb {P}\left[ \tilde{c} \right]}\mathbb {E}\left[ \mathbb {I}(\tilde{b}) \cdot \mathbb {I}(\tilde{c}) \right] \cdot \mathbb {P}\left[ \tilde{c} \right] & & \cdot 1 \\ & = \mathbb {E}\left[ \mathbb {I}(\tilde{b}) \mid \tilde{c} \right] \cdot \mathbb {P}\left[ \tilde{c} \right] & & \text{[\cref{def:expect-cnd}]} \\ & = \mathbb {P}\left[ \tilde{b} \mid \tilde{c} \right] \cdot \mathbb {P}\left[ \tilde{c} \right] & & \text{[\cref{def:probability_cnd}]}. \end{align*}
Lemma 0.1.5.12

Let \(\tilde{y}\colon \Omega \to \mathcal{Y}\) with a finite \(\mathcal{Y}\). Then

\[ \mathbb {P}[\tilde{y} = \tilde{y}(\omega )] \ge p(\omega ), \quad \forall \omega \in \Omega . \]
Proof
\begin{align*} \mathbb {P}[\tilde{y} = \tilde{y}(\omega )] & = \sum _{\omega ' \in \Omega } p(\omega ') \cdot \mathbb {I}(\tilde{y}(\omega ') = \tilde{y}(\omega )) & & \text{[\cref{def:probability}]} \\ & \ge p(\omega ) & & \text{[$\omega \in \Omega $ and $p(\omega ') \ge 0, \forall \omega '\in \Omega $]}. \end{align*}
Remark 0.1.5.13
#

Theorem 0.1.5.19 shows the equivalence of expectations for surely equal random variables.

Theorem 0.1.5.14

Random variables \(\tilde{x}, \tilde{y} \colon \Omega \to \mathbb {R}\) satisfy that

\[ \mathbb {E}\left[ \tilde{x} + \tilde{y} \right] = \mathbb {E}\left[ \tilde{x} \right] + \mathbb {E}\left[ \tilde{y} \right]. \]
Proof

From the distributive property of sums.

Theorem 0.1.5.15
#

A constant \(c\in \mathbb {R}\), interpreted as a constant random variable on \(\Omega \), satisfies

\[ \mathbb {E}\left[ c \right] = c. \]
Theorem 0.1.5.16

Suppose that \(\tilde{x} \colon \Omega \to \mathbb {R}\) and \(c\in \mathbb {R}\). Then

\[ \mathbb {E}\left[ c + \tilde{x} \right] = c + \mathbb {E}\left[ \tilde{x} \right]. \]
Proof

From ??.

Theorem 0.1.5.17

Suppose that \(\tilde{x}, \tilde{y} \colon \Omega \to \mathbb {R}\) and \(\tilde{z}\colon \Omega \to \mathcal{V}\) are random variables and \(c\in \mathbb {R}\), such that \(\tilde{y}(\omega ) = c + \tilde{x}(\omega ), \forall \omega \in \Omega \). Then

\[ \mathbb {E}\left[ \tilde{y} \mid \tilde{z} \right](\omega ) = c + \mathbb {E}\left[ \tilde{x} \mid \tilde{z} \right](\omega ), \quad \forall \omega \in \Omega . \]
Proof
Theorem 0.1.5.18
#

Suppose that \(\tilde{x}, \tilde{y} \colon \Omega \to \mathbb {R}\) satisfy that

\[ \forall \omega \in \Omega , p(\omega ) {\gt} 0 \Rightarrow \tilde{x}(\omega ) \ge \tilde{y}(\omega ). \]

Then

\[ \mathbb {E}\left[\tilde{x}\right] \ge \mathbb {E}\left[\tilde{y}\right] . \]
Theorem 0.1.5.19 Congruence of Expectation

Suppose that \(\tilde{x}, \tilde{z} \colon \Omega \to \mathbb {R}\) satisfy that

\[ \forall \omega \in \Omega , p(\omega ) {\gt} 0 \Rightarrow \tilde{x}(\omega ) = \tilde{z}(\omega ). \]

Then

\[ \mathbb {E}\left[ \tilde{x} \right] = \mathbb {E}\left[ \tilde{z} \right]. \]
Proof

Immediately from the congruence of sums.

0.1.6 The Laws of the Unconscious Statistician

Theorem 0.1.6.1

Let \(\tilde{x} \colon \Omega \to \mathbb {R}\) be a random variable. Then:

\[ \mathbb {E}\left[ \tilde{x} \right] = \sum _{x\in \tilde{x}(\Omega )} \mathbb {P}\left[ \tilde{x} = x \right] \cdot x. \]
Proof

Let \(\mathcal{X} := \tilde{x}(\Omega )\), which is a finite set. Then:

\begin{align*} \mathbb {E}\left[ \tilde{x} \right] & = \sum _{\omega \in \Omega } p(\omega ) \cdot \tilde{x}(\omega ) & & \text{[\cref{def:expect}]} \\ & = \sum _{\omega \in \Omega } \sum _{x\in \mathcal{X}} p(\omega ) \cdot \tilde{x}(\omega ) \cdot \mathbb {I}(x = \tilde{x}(\omega )) & & \text{[??]} \\ & = \sum _{\omega \in \Omega } \sum _{x\in \mathcal{X}} p(\omega ) \cdot x \cdot \mathbb {I}(x = \tilde{x}(\omega )) & & \text{[??]} \\ & = \sum _{x\in \mathcal{X}} x \cdot \sum _{\omega \in \Omega } p(\omega ) \cdot \mathbb {I}(x = \tilde{x}(\omega )) & & \text{[??]}\\ & = \sum _{x\in \mathcal{X}} x \cdot \mathbb {E}\left[ \mathbb {I}(x = \tilde{x}(\omega )) \right] & & \text{[\cref{def:expect}]} \\ & = \sum _{x\in \mathcal{X}} x \cdot \mathbb {P}\left[x = \tilde{x}(\omega ) \right]. & & \text{[\cref{def:probability}]} \end{align*}

The following theorem generalizes the theorem above.

Theorem 0.1.6.2
#

Let \(\tilde{x} \colon \Omega \to \mathbb {R}\) and \(\tilde{b} \colon \Omega \to \mathbb {B}\) be random variables. Then:

\[ \mathbb {E}\left[ \tilde{x} \mid \tilde{b} \right] = \sum _{x\in \tilde{x}(\Omega )} \mathbb {P}\left[ \tilde{x} = x \mid \tilde{b} \right] \cdot x. \]
Theorem 0.1.6.3
#

Let \(\tilde{x} \colon \Omega \to \mathbb {R}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) be random variables with \(\mathcal{Y}\) finite. Then:

\[ \mathbb {E}\left[ \mathbb {E}\left[ \tilde{x} \mid \tilde{y} \right] \right] = \sum _{y\in \mathcal{Y}} \mathbb {E}\left[ \tilde{x} \mid \tilde{y} = y \right] \cdot \mathbb {P}\left[ \tilde{y} = y \right]. \]

0.1.7 Total Expectation and Probability

Theorem 0.1.7.1 Law of Total Probability
#

Let \(\tilde{b} \colon \Omega \to \mathbb {B}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) be random variables with a finite set \(\mathcal{Y}\). Then:

\[ \sum _{y\in \mathcal{Y}} \mathbb {P}\left[\tilde{b} \wedge (\tilde{y} = y) \right] = \mathbb {P}\left[ \tilde{b} \right]. \]
Theorem 0.1.7.2 Law of Total Expectation

Let \(\tilde{x} \colon \Omega \to \mathcal{X}\) and \(\tilde{y} \colon \Omega \to \mathcal{Y}\) be random variables with a finite set \(\mathcal{Y}\). Then:

\[ \mathbb {E}\left[\mathbb {E}\left[\tilde{x} \mid \tilde{y}\right]\right] = \mathbb {E}\left[ \tilde{x} \right]. \]
Proof

Recall that we allow division by 0 and assume that \(x / 0 = 0\).

\begin{align*} \mathbb {E}\left[\mathbb {E}\left[\tilde{x} \mid \tilde{y}\right]\right] & = \sum _{\omega \in \Omega } p(\omega ) \cdot \mathbb {E}\left[ \tilde{x} \mid \tilde{y} \right](\omega ) & & \text{[\cref{def:expect}]}\\ & = \sum _{\omega \in \Omega } p(\omega ) \cdot \mathbb {E}\left[ \tilde{x} \mid \tilde{y} = \tilde{y}(\omega ) \right] & & \text{[\cref{def:expect-cnd-rv}]}\\ & = \sum _{\omega \in \Omega } \frac{p(\omega )}{\mathbb {P}\left[ \tilde{y}= \tilde{y}(\omega )\right]} \sum _{\omega '\in \Omega } p(\omega ') \cdot \tilde{x}(\omega ') \cdot \mathbb {I}\left( \tilde{y}(\omega ') = \tilde{y}(\omega ) \right) & & \text{[\cref{def:expect-cnd}]} \\ & = \sum _{\omega '\in \Omega } p(\omega ') \cdot \tilde{x}(\omega ') \cdot \sum _{\omega \in \Omega } \frac{p(\omega )}{\mathbb {P}\left[ \tilde{y}= \tilde{y}(\omega )\right]} \mathbb {I}\left( \tilde{y}(\omega ') = \tilde{y}(\omega ) \right) & & \text{[rearrange]} \\ & = \sum _{\omega '\in \Omega } p(\omega ') \cdot \tilde{x}(\omega ') \cdot \sum _{\omega \in \Omega } \frac{p(\omega )}{\mathbb {P}\left[ \tilde{y} = \tilde{y}(\omega ')\right]} \mathbb {I}\left( \tilde{y}(\omega ') = \tilde{y}(\omega ) \right) & & \text{[equal when } \tilde{y}(\omega ') = \tilde{y}(\omega )\text{]} \\ & = \sum _{\omega '\in \Omega } p(\omega ') \cdot \tilde{x}(\omega ') & & \text{[see below]} \\ & = \mathbb {E}\left[ \tilde{x} \right]. \end{align*}

Above, we used the fact that

\[ p(\omega ') \cdot \sum _{\omega \in \Omega } \frac{p(\omega )}{\mathbb {P}\left[ \tilde{y} = \tilde{y}(\omega ')\right]} \mathbb {I}\left( \tilde{y}(\omega ') = \tilde{y}(\omega ) \right) = p(\omega '), \]

which follows by analyzing two cases. First, when \(p(\omega ') = 0\), the equality holds immediately. Second, if \(p(\omega ') {\gt} 0\), then \(\mathbb {P}\left[ \tilde{y} = \tilde{y}(\omega ')\right] {\gt} 0\) by lemma 0.1.5.12, and we get from ?? that

\[ \sum _{\omega \in \Omega } \frac{p(\omega )}{\mathbb {P}\left[ \tilde{y} = \tilde{y}(\omega ')\right]} \mathbb {I}\left( \tilde{y}(\omega ') = \tilde{y}(\omega ) \right) = \frac{\mathbb {P}\left[ \tilde{y} = \tilde{y}(\omega ')\right]}{\mathbb {P}\left[ \tilde{y} = \tilde{y}(\omega ')\right]} = 1, \]

which completes the step.

The following proof is simpler but may require some more advanced properties.

Alternate proof
\begin{align*} \mathbb {E}\left[\mathbb {E}\left[\tilde{x} \mid \tilde{y}\right] \right] & = \sum _{y\in \mathcal{Y}} \mathbb {E}\left[\tilde{x} \mid \tilde{y} = y \right] \cdot \mathbb {P}\left[ \tilde{y} = y \right] \\ & = \sum _{y\in \mathcal{Y}} \sum _{x\in \mathcal{X}} x \cdot \mathbb {P}\left[\tilde{x} = x \mid \tilde{y} = y \right] \cdot \mathbb {P}\left[ \tilde{y} = y \right] \\ & = \sum _{y\in \mathcal{Y}} \sum _{x\in \mathcal{X}} x \cdot \mathbb {P}\left[\tilde{x} = x, \tilde{y} = y \right] \\ & = \sum _{x\in \mathcal{X}} x \cdot \sum _{y\in \mathcal{Y}} \mathbb {P}\left[\tilde{x} = x, \tilde{y} = y \right] \\ & = \sum _{x\in \mathcal{X}} x \cdot \mathbb {P}\left[\tilde{x} = x \right] = \mathbb {E}\left[ \tilde{x} \right]. \end{align*}
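The identity can also be checked numerically on a small example. A Python sketch following definitions 0.1.5.2, 0.1.5.4, and 0.1.5.6, with a space, probabilities, and random variables chosen arbitrarily by us:

    from fractions import Fraction

    Omega = range(4)
    p = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]
    x = lambda w: w          # real-valued random variable
    y = lambda w: w % 2      # conditioning random variable with a finite range

    def cond_expect(x, b):
        pb = sum(p[w] for w in Omega if b(w))
        return (sum(p[w] * x(w) for w in Omega if b(w)) / pb) if pb else Fraction(0)

    # E[E[x | y]], computed outcome by outcome, equals E[x].
    lhs = sum(p[w] * cond_expect(x, lambda w2, w=w: y(w2) == y(w)) for w in Omega)
    rhs = sum(p[w] * x(w) for w in Omega)
    assert lhs == rhs == Fraction(13, 8)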

0.2 Formal Decision Framework

0.2.1 Markov Decision Process

Definition 0.2.1.1
#

A Markov decision process \(M := (S, A, P, r)\) consists of:

  • a positive integer \(S \in \mathbb {N}_{{\gt}0}\) representing the number of states, with index set \(\operatorname {Fin}(S) = \{ 0,1, \dots , S-1\} \)

  • a positive integer \(A \in \mathbb {N}_{{\gt}0}\) representing the number of actions, with index set \(\operatorname {Fin}(A) = \{ 0,1, \dots , A-1\} \)

  • a transition function \(P \colon \operatorname {Fin}(S) \times \operatorname {Fin}(A) \to \Delta (\operatorname {Fin}(S))\), mapping each state–action pair to a finite probability distribution over next states

  • a reward function \(r \colon \operatorname {Fin}(S) \times \operatorname {Fin}(A) \times \operatorname {Fin}(S) \to \mathbb {R}\), mapping each transition \((s,a,s')\) to a real number
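A minimal Python sketch of this data structure with exact rationals; the class and field names are our own, and the two-state example is purely illustrative.

    from dataclasses import dataclass
    from fractions import Fraction
    from typing import Callable, List

    @dataclass
    class MDP:
        """A finite MDP (S, A, P, r) in the sense of definition 0.2.1.1."""
        S: int                                       # number of states
        A: int                                       # number of actions
        P: Callable[[int, int], List[Fraction]]      # P(s, a): distribution over next states
        r: Callable[[int, int, int], Fraction]       # r(s, a, s'): reward of a transition

    # Two states and two actions: action 1 switches the state, action 0 keeps it.
    M = MDP(
        S=2, A=2,
        P=lambda s, a: [Fraction(1) if s2 == (s + a) % 2 else Fraction(0) for s2 in range(2)],
        r=lambda s, a, s2: Fraction(1) if s2 == 1 else Fraction(0),
    )
    assert sum(M.P(0, 1)) == 1       # each row of P is a probability distribution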

0.2.2 Histories

We implicitly assume in the remainder of the section an MDP \(M = (S, A, P, r)\).

Definition 0.2.2.1
#

A history \(h\) in the set of histories \(\mathcal{H}\) of \(M\) is a sequence of states and actions defined recursively as

\[ h := \langle s_0 \rangle \quad \text{or} \quad h := \langle h', a, s \rangle , \]

where \(s_0, s \in \operatorname {Fin}(S)\), \(a \in \operatorname {Fin}(A)\), and \(h'\in \mathcal{H}\).

Definition 0.2.2.2
#

The length \(|h|\in \mathbb {N}\) of a history \(h\in \mathcal{H}\) is defined as

\begin{align*} |\langle s \rangle | & := 0, \\ |\langle h', a, s \rangle | & := 1 + |h'|, \qquad h' \in \mathcal{H}. \end{align*}
Definition 0.2.2.3
#

The set \(\mathcal{H}_{\mathrm{NE}}\) of non-empty histories is

\[ \mathcal{H}_{\mathrm{NE}} := \left\{ h \in \mathcal{H} \mid |h| \ge 1 \right\} . \]
Definition 0.2.2.4
#

The set of histories \(\mathcal{H}(h,t) \subseteq \mathcal{H}\) following \(h\in \mathcal{H}\) for \(t\in \mathbb {N}\) additional steps is defined recursively as

\[ \mathcal{H}(h,t) := \begin{cases} \left\{ h \right\} & \text{if } t = 0, \\ \left\{ \langle h', a, s \rangle \mid h'\in \mathcal{H}(h, t-1), a\in \mathcal{A}, s\in \mathcal{S} \right\} & \text{otherwise}. \end{cases} \]
Definition 0.2.2.5
#

The set of histories \(\mathcal{H}_{t}\) of length \(t \in \mathbb {N}\) is defined recursively as

\[ \mathcal{H}_t = \begin{cases} \left\{ \langle s \rangle \mid s\in \mathcal{S} \right\} & \text{if } t = 0, \\ \left\{ \langle h, a, s\rangle \mid h \in \mathcal{H}_{t-1}, a\in \mathcal{A}, s\in \mathcal{S} \right\} & \text{otherwise}. \end{cases} \]
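The recursion can be enumerated directly for small models. A Python sketch in which histories are tuples \((s_0, a_0, s_1, \dots )\); the representation is our own choice.

    def histories(S, A, t):
        """All histories of length t (definition 0.2.2.5) as tuples (s0, a0, s1, ...)."""
        if t == 0:
            return [(s,) for s in range(S)]
        return [h + (a, s) for h in histories(S, A, t - 1)
                           for a in range(A) for s in range(S)]

    # With 2 states and 2 actions there are 2 * (2*2)^t histories of length t.
    assert len(histories(2, 2, 2)) == 2 * (2 * 2) ** 2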
Theorem 0.2.2.6

For \(h \in \mathcal{H}\):

\[ |h'| = |h| + t, \qquad \forall h'\in \mathcal{H}(h, t). \]
Proof

The theorem follows by induction on \(t\) from the definition.

Definition 0.2.2.7
#

We use \(\tilde{s}_k\colon \mathcal{H} \to \mathcal{S}\) to denote the 0-based \(k\)-th state of each history.

Definition 0.2.2.8
#

We use \(\tilde{a}_k\colon \mathcal{H} \to \mathcal{A}\) to denote the 0-based \(k\)-th action of each history.

Definition 0.2.2.9
#

The history-reward random variable \(\tilde{r}^{\mathrm{h}}\colon \mathcal{H} \to \mathbb {R}\) is defined recursively for \(h = \langle h', a, s \rangle \in \mathcal{H}\) with \(h'\in \mathcal{H}\), \(a\in \mathcal{A}\), and \(s\in \mathcal{S}\) as

\[ \tilde{r}^{\mathrm{h}}(\langle s_0 \rangle ) := 0, \qquad \tilde{r}^{\mathrm{h}}(h) := r(s_{|h'|}(h'), a, s) + \tilde{r}^{\mathrm{h}}(h'). \]
Definition 0.2.2.10
#

The history-reward random variable \(\tilde{r}^{\mathrm{h}}_k \colon \mathcal{H} \to \mathbb {R}\) is defined as the \(k\)-th reward (0-based) of a history.

Definition 0.2.2.11
#

The history-reward random variable \(\tilde{r}^{\mathrm{h}}_{\le k} \colon \mathcal{H} \to \mathbb {R}\) is defined as the sum of the \(k\)-th and all earlier rewards (0-based) of a history.

Definition 0.2.2.12
#

The history-reward random variable \(\tilde{r}^{\mathrm{h}}_{\ge k}\colon \mathcal{H} \to \mathbb {R}\) is defined as the sum of the \(k\)-th and all later rewards (0-based) of a history.

0.2.3 Policies

Definition 0.2.3.1
#

The set of decision rules \(\mathcal{D}\) is defined as \(\mathcal{D} := \mathcal{A}^{\mathcal{S}}. \) A single action \(a \in \mathcal{A}\) can also be interpreted as a decision rule \(d := s \mapsto a\).

Definition 0.2.3.2
#

The set of history-dependent policies is \(\Pi _{\mathrm{HR}}:= \Delta (\mathcal{A})^{\mathcal{H}}. \)

Definition 0.2.3.3
#

The set of Markov deterministic policies \(\Pi _{\mathrm{MD}}\) is \(\Pi _{\mathrm{MD}}:= \mathcal{D}^{\mathbb {N}}. \) A Markov deterministic policy \(\pi \in \Pi _{\mathrm{MD}}\) can also be interpreted as \(\bar{\pi } \in \Pi _{\mathrm{HR}}\):

\[ \bar{\pi }(h) := \delta \left[ \pi (|h|, s_{|h|}(h)) \right], \]

where \(\delta \) is the Dirac distribution, and \(s_{|h|}\) is the history’s last state.

Definition 0.2.3.4
#

The set of stationary deterministic policies \(\Pi _{\mathrm{SD}}\) is defined as \(\Pi _{\mathrm{SD}} := \mathcal{D}. \) A stationary policy \(\pi \in \Pi _{\mathrm{SD}}\) can be interpreted as \(\bar{\pi } \in \Pi _{\mathrm{HR}}\):

\[ \bar{\pi }(h) := \delta \left[ \pi (s_{|h|}(h)) \right], \]

where \(\delta \) is the Dirac distribution and \(s_{|h|}\) is the history’s last state.

0.2.4 Distribution

Definition 0.2.4.1
#

The history probability distribution \(p^{\mathrm{h}}_T \colon \Pi _{\mathrm{HR}}\to \Delta (\mathcal{H}(\hat{h},T))\) for an initial history \(\hat{h}\in \mathcal{H}\) and \(\pi \in \Pi _{\mathrm{HR}}\) is defined for each \(T \in \mathbb {N}\) and \(h\in \mathcal{H}(\hat{h},T)\) as

\[ (p^{\mathrm{h}}_T(\pi ))(h) := \begin{cases} \mathbb {I}(h = \hat{h}) & \text{if } T = 0, \\ (p_{T-1}^{\mathrm{h}}(\pi ))(h') \cdot \pi (h',a) \cdot p(s_{|h'|}(h'), a , s) & \text{if } T \ge 1 \wedge h = \langle h', a, s \rangle . \end{cases} \]

Moreover, the function \(p^{\mathrm{h}}\) maps each policy to a valid probability distribution.

TODO: This definition needs to be updated. A probability space \((\Omega _{h,t}, 2^{\Omega _{h,t} }, \hat{p}_{h,\pi })\) is defined as

\begin{align} \label{eq:omega-def} \Omega _{h,t} & := \left\{ h' \in \mathcal{H}_{|h| + t} \mid s_k(h) = s_k(h') \wedge a_k(h) = a_k(h'), \forall k \le |h| \right\} , \\ \label{eq:phat-def} \hat{p}_{h,\pi }\left(\left\langle h', a, s \right\rangle \right) & := \begin{cases} 1 & \text{if } \left\langle h', a, s \right\rangle = h, \\ \hat{p}_{h,\pi }(h') \cdot \pi (h', a) \cdot p(s_{|h'|}(h'), a, s) & \text{otherwise}, \end{cases}\end{align}

for each \(\left\langle h', a, s \right\rangle \in \Omega _{h,t}\). The random variables are defined as \(\tilde{s}_k(h') := s_k(h')\), \(\tilde{a}_k(h') := a_k(h')\), \(\forall h'\in \Omega _{h,t}\). We interpret the subscripts analogously on all operators, including other risk measures, and \(\mathbb {E}\), and \(\mathbb {P}\).
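The recursion in definition 0.2.4.1 can be written directly as a Python sketch; histories are tuples as in the earlier sketch, and the argument names are our own.

    from fractions import Fraction

    def history_prob(P, pi, h_hat, h):
        """Probability of the history h among the followers of h_hat.

        P(s, a) is a list of next-state probabilities and pi(h, a) is the
        probability that the policy plays action a after the history h.
        """
        if len(h) == len(h_hat):
            return Fraction(1) if h == h_hat else Fraction(0)
        h_prev, a, s = h[:-2], h[-2], h[-1]
        return history_prob(P, pi, h_hat, h_prev) * pi(h_prev, a) * P(h_prev[-1], a)[s]

    # One-step example: from state 0, action 1 moves to state 1 with certainty.
    P = lambda s, a: [Fraction(1) if s2 == (s + a) % 2 else Fraction(0) for s2 in range(2)]
    pi = lambda h, a: Fraction(1, 2)        # uniformly random policy over two actions
    assert history_prob(P, pi, (0,), (0, 1, 1)) == Fraction(1, 2)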

Definition 0.2.4.2
#

The history-dependent expectation is defined for each \(t\in \mathbb {N}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(\hat{h}\in \mathcal{H}\) and a \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\) as

\begin{equation*} \mathbb {E}^{\hat{h}, \pi , t} [\tilde{x}] := \mathbb {E}\left[ \tilde{x} \right] = \sum _{h \in \mathcal{H}(\hat{h}, t)} p^{\mathrm{h}}(h, \pi ) \cdot \tilde{x}(h). \end{equation*}

In the \(\mathbb {E}\) operator above, the random variable \(\tilde{x}\) lives in a probability space \((\Omega , p)\) where \(\Omega = \mathcal{H}(\hat{h}, t)\) and \(p(h) = p^{\mathrm{h}}(h, \pi ), \forall h\in \Omega \). Moreover, if \(\hat{h}\) is a state, then it is interpreted as a history with the single initial state.

Definition 0.2.4.3
#

The history-dependent conditional expectation is defined for each \(t\in \mathbb {N}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(\hat{h}\in \mathcal{H}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\), and \(\tilde{b}\colon \mathcal{H} \to \mathbb {B}\) as

\begin{equation*} \mathbb {E}^{\hat{h}, \pi , t} [\tilde{x} \mid \tilde{b} ] := \mathbb {E}\left[ \tilde{x} \mid \tilde{b} \right]. \end{equation*}

In the \(\mathbb {E}\) operator above, the random variables \(\tilde{x}\) and \(\tilde{b}\) live in a probability space \((\Omega , p)\) where \(\Omega = \mathcal{H}(\hat{h}, t)\) and \(p(h) = p^{\mathrm{h}}(h, \pi ), \forall h\in \Omega \). Moreover, if \(\hat{h}\) is a state, then it is interpreted as a history with the single initial state.

Definition 0.2.4.4
#

The history-dependent random conditional expectation is defined for each \(t\in \mathbb {N}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(\hat{h}\in \mathcal{H}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\), and \(\tilde{y}\colon \mathcal{H} \to \mathcal{V}\) as

\begin{equation*} \mathbb {E}^{\hat{h}, \pi , t} [\tilde{x} \mid \tilde{y} ](h) := \mathbb {E}\left[ \tilde{x} \mid \tilde{y} = \tilde{y}(h) \right], \quad \forall h\in \mathcal{H}(\hat{h}, t). \end{equation*}

In the \(\mathbb {E}\) operator above, the random variables \(\tilde{x}\) and \(\tilde{y}\) live in a probability space \((\Omega , p)\) where \(\Omega = \mathcal{H}(\hat{h}, t)\) and \(p(h) = p^{\mathrm{h}}(h, \pi ), \forall h\in \Omega \). Moreover, if \(\hat{h}\) is a state, then it is interpreted as a history with the single initial state.

0.2.5 Basic Properties

Theorem 0.2.5.1

Assume \(\tilde{x} \colon \mathcal{H} \to \mathbb {R}\) and \(c \in \mathbb {R}\). Then \(\forall \hat{h}\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):

\[ \mathbb {E}^{\hat{h}, \pi , t} \left[ c + \tilde{x} \right] = c + \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{x} \right]. \]
Proof

Directly from theorem 0.1.5.19.

Theorem 0.2.5.2

Suppose that \(\tilde{x}, \tilde{y} \colon \mathcal{H} \to \mathbb {R}\). Then \(\forall \hat{h}\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):

\[ \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{x} + \tilde{y} \right] = \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{x} \right] + \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{y} \right]. \]
Proof
Theorem 0.2.5.3

Suppose that \(c\in \mathbb {R}\). Then \(\forall \hat{h}\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):

\[ \mathbb {E}^{\hat{h}, \pi , t} \left[ c \right] = c. \]
Proof
Theorem 0.2.5.4

Suppose that \(\tilde{x}, \tilde{y} \colon \mathcal{H} \to \mathbb {R}\) satisfy that \(\tilde{x}(h) = \tilde{y}(h), \forall h \in \mathcal{H}\). Then \(\forall \hat{h}\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}, t \in \mathbb {N}\):

\[ \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{x} \right] = \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{y} \right]. \]
Proof
Theorem 0.2.5.5

For each \(\hat{h}\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), and \(t\in \mathbb {N}\):

\[ \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{r}^{\mathrm{h}}\right] = \mathbb {E}^{\hat{h}, \pi , t} \left[ \sum _{k=0}^{|\tilde{\operatorname {id}}|-1} r(\tilde{s}_k, \tilde{a}_k, \tilde{s}_{k+1}) \right], \]

where \(\tilde{\operatorname {id}}\) is the identity random variable on histories, \(|\cdot |\) is the length of a history (0-based), and \(\tilde{s}_k\colon \mathcal{H} \to \mathcal{S}\) and \(\tilde{a}_k\colon \mathcal{H} \to \mathcal{A}\) are the 0-based \(k\)-th state and action, respectively, of each history.

Proof

Follows from theorem 0.2.5.1 and the equality of the reward function \(\tilde{r}^{\mathrm{h}}\) and the sum in the expectation.

Theorem 0.2.5.6

For each \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), and \(t\in \mathbb {N}\):

\[ \mathbb {E}^{h, \pi , t} \left[ \tilde{r}^{\mathrm{h}}\right] = \tilde{r}^{\mathrm{h}}(h) + \mathbb {E}^{h, \pi , t} \left[ \tilde{r}^{\mathrm{h}}_{\ge k_0} \right], \]

where \(k_0:=|h|\).

Proof

Follows from theorem 0.2.5.4.

Theorem 0.2.5.7

For each \(\hat{h}\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(t\in \mathbb {N}\), \(h\in \mathcal{H}\):

\begin{equation*} \mathbb {P}^{\hat{h},\pi ,t}[\tilde{s}_{k_0}=\tilde{s}_{k_0}(h) \wedge \tilde{a}_{k_0} = \tilde{a}_{k_0}(h) ] {\gt} 0 \quad \Rightarrow \quad \mathbb {E}^{\hat{h}, \pi , t} \left[ \tilde{r}^{\mathrm{h}}_{k_0} \mid \tilde{s}_{k_0}, \tilde{a}_{k_0}\right](h) = \tilde{r}^{\mathrm{h}}_{k_0}(h), \end{equation*}

where \(k_0:= |\hat{h}|\).

Proof
Theorem 0.2.5.8

Assume \(h \in \mathcal{H}\) and \(f \colon \mathcal{H} \to \mathbb {R}\), and let \(s_0 := s_{|h|}(h)\). Suppose that

\[ f(\left\langle h, a, s \right\rangle ) = f(\left\langle s_0, a, s \right\rangle ) , \forall a\in \mathcal{A}, s\in \mathcal{S}. \]

Then

\[ \mathbb {E}^{h,\pi ,1}\left[ \tilde{f} \right] = \mathbb {E}^{s_0,\pi ,1}\left[ \tilde{f} \right]. \]
Proof

Directly from the definition of the expectation.

0.2.6 Total Expectation

Theorem 0.2.6.1 Total Expectation

For each \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(t\in \mathbb {N}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\) and \(\tilde{y}\colon \mathcal{H} \to \mathcal{V}\):

\[ \mathbb {E}^{h, \pi , t} \left[ \mathbb {E}^{h, \pi , t} \left[ \tilde{x} \mid \tilde{y} \right] \right] = \mathbb {E}^{h, \pi , t} \left[ \tilde{x} \right]. \]
Proof
Theorem 0.2.6.2
#

Suppose that the random variable \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\) satisfies for some \(k, t \in \mathbb {N}\), with \(k \le t\), that

\[ \tilde{x}(h) = \tilde{x}(h_{\le k}) , \forall h\in \mathcal{H}, \]

where \(h_{\le k}\) is the prefix of \(h\) of length \(k\). Then for each \(h\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}\):

\[ \mathbb {E}^{h,\pi ,t} \left[ \tilde{x} \right] = \mathbb {E}^{h,\pi ,k} \left[ \tilde{x} \right]. \]

0.2.7 Conditional Properties

Theorem 0.2.7.1

For each \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), \(t\in \mathbb {N}\), \(\tilde{x}\colon \mathcal{H} \to \mathbb {R}\), \(s\in \mathcal{S}\), and \(a\in \mathcal{A}\) such that \(\mathbb {P}^{h,\pi ,t+1} \left[ \tilde{a}_{|h|} = a, \tilde{s}_{|h|+1} = s \right] {\gt} 0\):

\[ \mathbb {E}^{h,\pi ,t+1}[\tilde{x} \mid \tilde{a}_{|h|} = a, \tilde{s}_{|h|+1} = s] = \mathbb {E}^{\langle h, a, s\rangle ,\pi ,t}[\tilde{x}]. \]
Proof

Let

\[ \begin{aligned} b & := \mathbb {P}^{h,\pi ,t+1} \left[ \tilde{a}_{|h|} = a, \tilde{s}_{|h|+1} = s \right] \\ & = \hat{p}_{h,\pi }(h') \cdot \pi (h', a) \cdot p(s_{|h'|}(h'), a, s) \\ & {\gt} 0 \end{aligned} \]

where the inequality holds from the hypothesis. Also, let

\[ \mathbb {B} := \left\{ h'\in \Omega _{h,t+1} \mid a_{|h|}(h') = a \wedge s_{|h|+1}(h') = s\right\} . \]

Note that

\begin{equation} \label{eq:set-equivalence} \mathbb {B} = \Omega _{\left\langle h, a, s \right\rangle , t}, \end{equation}

which can be seen by algebraic manipulation from 12.

Using the notation above, we can show the result as

\begin{align*} \mathbb {E}^{h,\pi ,t+1}[\tilde{x} \mid \tilde{a}_{|h|} = a, \tilde{s}_{|h|+1} = s] & = \frac{1}{b} \sum _{h'\in \mathbb {B}} \hat{p}_{h,\pi }(h') \cdot x(h') & & \text{[definition]}\\ & = \frac{1}{b} \sum _{h'\in \Omega _{\left\langle h, a, s \right\rangle , t}} \hat{p}_{h,\pi }(h') \cdot x(h') & & \text{[\cref{eq:set-equivalence}]} \\ & = \sum _{h'\in \Omega _{\left\langle h, a, s \right\rangle , t}} \hat{p}_{\left\langle h, a,s \right\rangle ,\pi }(h') \cdot x(h') & & \text{[\cref{eq:phat-def}]} \\ & = \mathbb {E}^{\langle h, a, s\rangle ,\pi ,t}[\tilde{x}]. & & \text{[definition]} \end{align*}

0.3 Dynamic Program: History-Dependent Finite Horizon

In this section, we derive dynamic programming equations for histories. We assume an MDP \(M = (\mathcal{S}, \mathcal{A}, p, r)\) throughout this section.

The main idea of the proof is to:

  1. Derive (exponential-size) dynamic programming equations for the history-dependent value function of history-dependent policies

    1. Define the value function

    2. Define an optimal value function

  2. Show that value functions decompose to equivalence classes

  3. Show that the value function for the equivalence classes can be computed efficiently

0.3.1 Definitions

Definition 0.3.1.1
#

A finite-horizon objective is given by \(O := (s_0, T)\), where \(s_0\in \mathcal{S}\) is the initial state and \(T\in \mathbb {N}\) is the horizon.

In the remainder of the section, we assume an objective \(O = (s_0, T)\).

Definition 0.3.1.2
#

The finite-horizon objective function for an objective \(O\) and a policy \(\pi \in \Pi _{\mathrm{HR}}\) is defined as

\[ \rho (\pi , O) := \mathbb {E}^{s_0, \pi , T} \left[\tilde{r}^{\mathrm{h}}\right]. \]
Definition 0.3.1.3
#

A policy \(\pi ^{\star }\in \Pi _{\mathrm{HR}}\) is return optimal for an objective \(O\) if

\[ \rho (\pi ^{\star }, O) \ge \rho (\pi , O), \quad \forall \pi \in \Pi _{\mathrm{HR}}. \]
Definition 0.3.1.4
#

The set of history-dependent value functions \(\mathcal{U}\) is defined as

\[ \mathcal{U} := \mathbb {R}^{\mathcal{H}}. \]
Definition 0.3.1.5
#

A history-dependent policy value function \(\hat{u}^{\pi }_t\colon \mathcal{H} \to \mathbb {R}\) for each \(h\in \mathcal{H}\), \(\pi \in \Pi _{\mathrm{HR}}\), and \(t\in \mathbb {N}\) is defined as

\[ \hat{u}^{\pi }_t(h) := \mathbb {E}^{h, \pi , t} \left[ \tilde{r}^{\mathrm{h}}_{\ge |h|} \right], \]
Definition 0.3.1.6
#

The optimal history-dependent value function \(\hat{u}_t^{\star }\colon \mathcal{H} \to \mathbb {R}\) is defined for a horizon \(t\in \mathbb {N}\) as

\[ \hat{u}^{\star }_t(h) := \sup _{\pi \in \Pi _{\mathrm{HR}}} \hat{u}_t^{\pi }(h). \]

The following definition is another way of defining an optimal policy.

Definition 0.3.1.7
#

For each \(t\in \mathbb {N}\), a policy \(\pi ^{\star }\in \Pi _{\mathrm{HR}}\) is optimal if

\[ \hat{u}_t^{\pi ^{\star }}(h) \ge \hat{u}_t^{\pi }(h), \quad \forall \pi \in \Pi _{\mathrm{HR}}, h\in \mathcal{H}. \]
Theorem 0.3.1.8
#

A policy \(\pi ^{\star }\in \Pi _{\mathrm{HR}}\) that is optimal in the sense of definition 0.3.1.7 is also return optimal in the sense of definition 0.3.1.3 for any initial state \(s_0\) and horizon \(T\).

0.3.2 History-dependent Dynamic Program

The following definitions of history-dependent value functions use a dynamic program formulation.

Definition 0.3.2.1
#

The history-dependent policy Bellman operator \(L_{\mathrm{h}}^{\pi } \colon \mathcal{U} \to \mathcal{U}\) is defined for each \(\pi \in \Pi _{\mathrm{HR}}\) as

\[ (L_{\mathrm{h}}^{\pi } \tilde{u}) (h) := \mathbb {E}^{h, \pi , 1} \left[ \tilde{r}^{\mathrm{h}}_{|h|} + \tilde{u} \right], \quad \forall h\in \mathcal{H}, \tilde{u} \in \mathcal{U}, \]

where the value function \(\tilde{u}\) is interpreted as a random variable defined on the sample space \(\Omega = \mathcal{H}\).

Definition 0.3.2.2
#

The history-dependent optimal Bellman operator \(L_{\mathrm{h}}^{\star }\colon \mathcal{U} \to \mathcal{U}\) is defined as

\[ (L^{\star }_{\mathrm{h}} \tilde{u}) (h) := \max _{a\in \mathcal{A}} \mathbb {E}^{h, a, 1} \left[ \tilde{r}^{\mathrm{h}}_{|h|} + \tilde{u} \right], \quad \forall h\in \mathcal{H}, \tilde{u} \in \mathcal{U}, \]

where the value function \(\tilde{u}\) is interpreted as a random variable defined on the sample space \(\Omega = \mathcal{H}\).

Definition 0.3.2.3
#

The history-dependent DP value function \(u_t^{\pi } \in \mathcal{U}\) for a policy \(\pi \in \Pi _{\mathrm{HR}}\) and \(t\in \mathbb {N}\) is defined as

\[ u_t^{\pi } := \begin{cases} 0 & \text{if } t = 0, \\ L_{\mathrm{h}}^{\pi } u_{t-1}^{\pi } & \text{otherwise}. \end{cases} \]
Definition 0.3.2.4
#

The history-dependent DP value function \(u_t^{\star }\in \mathcal{U}\) for \(t\in \mathbb {N}\) is defined as

\[ u_t^{\star }:= \begin{cases} 0 & \text{if } t = 0, \\ L_{\mathrm{h}}^{\star }u_{t-1}^{\star }& \text{otherwise}. \end{cases} \]
Lemma 0.3.2.5

Suppose that \(u^1, u^2 \in \mathcal{U}\) satisfy that \(u^1(h) \ge u^2(h), \forall h\in \mathcal{H}\). Then

\[ (L_{\mathrm{h}}^{\star }u^1)(h) \ge (L_{\mathrm{h}}^{\pi } u^2)(h), \quad \forall \pi \in \Pi _{\mathrm{HR}}, h\in \mathcal{H}. \]
Proof

The following theorem shows that the history-dependent value function can be computed by the dynamic program.

Theorem 0.3.2.6

For each \(\pi \in \Pi _{\mathrm{HR}}\) and \(t\in \mathbb {N}\):

\[ \hat{u}^{\pi }_t(h) = u^{\pi }_t(h), \quad \forall h\in \mathcal{H}. \]
Proof

By induction on \(t\). The base case for \(t=0\) follows from the definition. The inductive case for \(t + 1\) follows for each \(h\in \mathcal{H}\) with \(k_0 := |h|\) as

\begin{align*} \hat{u}_{t+1}^{\pi }(h) & = \mathbb {E}^{h, \pi , t+1} \left[ \tilde{r}^{\mathrm{h}}_{\ge k_0} \right] & & \text{[\cref{def:u-pi}]}\\ & = \mathbb {E}^{h, \pi , t+1} \left[ \mathbb {E}^{h, \pi , t+1} \left[ \tilde{r}^{\mathrm{h}}_{\ge k_0} \mid \tilde{a}_{k_0}, \tilde{s}_{k_0+1} \right] \right] & & \text{[\cref{thm:total-expectation-h}]} \\ & = \mathbb {E}^{h, \pi , t+1} \left[\tilde{r}^{\mathrm{h}}_{k_0} + \mathbb {E}^{h, \pi , t+1} \left[ \tilde{r}^{\mathrm{h}}_{\ge k_0+1} \mid \tilde{a}_{k_0}, \tilde{s}_{k_0+1} \right] \right] & & \text{[\cref{thm:sum-rew-cnd}]} \\ & = \mathbb {E}^{h, \pi , t+1} \left[\tilde{r}^{\mathrm{h}}_{k_0} + \mathbb {E}^{\langle h,\tilde{a}_{k_0},\tilde{s}_{k_0+1}\rangle , \pi , t} \left[ \tilde{r}^{\mathrm{h}}_{\ge k_0+1} \right]\right] & & \text{[\cref{thm:exph-cond-eq-hist}]} \\ & = \mathbb {E}^{h, \pi , t+1} \left[\tilde{r}^{\mathrm{h}}_{k_0} + \hat{u}_t(\langle h,\tilde{a}_{k_0},\tilde{s}_{k_0+1}\rangle ; \pi ) \right] & & \text{[\cref{def:u-pi}]} \\ & = \mathbb {E}^{h, \pi , t+1} \left[\tilde{r}^{\mathrm{h}}_{k_0} + u^{\pi }_t(\langle h,\tilde{a}_{k_0},\tilde{s}_{k_0+1}\rangle ) \right] & & \text{[inductive assm]} \\ & = \mathbb {E}^{h, \pi , 1} \left[\tilde{r}^{\mathrm{h}}+ \tilde{u}^{\pi }_t \right] & & \text{[\cref{thm:exp-horizon-cut}]} \\ & = L_{\mathrm{h}}^{\pi } u_t^{\pi } & & \text{[\cref{def:DPhpi}]} \\ & = u^{\pi }_t(h). & & \text{[\cref{def:u-dp-pi}]} \end{align*}
Also, we use \(\tilde{u}_t^{\pi }\) to emphasize when we treat \(u^{\pi }_t\) as a random variable.

The following theorem shows that the optimal DP value function dominates the value function of every history-dependent policy.

Theorem 0.3.2.7

For each \(t\in \mathbb {N}\):

\[ u^{\star }_t(h) \ge \hat{u}_t(h; \pi ), \quad \forall h\in \mathcal{H}, \pi \in \Pi _{\mathrm{HR}}. \]
Proof

By induction on \(t\). The base case is immediate. The inductive case follows for \(t+1\) as follows. For each \(\pi \in \Pi _{\mathrm{HR}}\):

\begin{align*} u_{t+1}^{\star }(h) & = (L_{\mathrm{h}}^{\star }u_t^{\star })(h) & & \text{[\cref{def:DPhopt}]} \\ & \ge (L_{\mathrm{h}}^{\pi } \hat{u}_t(\cdot ; \pi ))(h) & & \text{[ind asm, \cref{thm:dp-opt-ge-dp-pi}]} \\ & = (L_{\mathrm{h}}^{\pi } u_t^{\pi })(h) & & \text{[\cref{thm:dph-correct-vf}]} \\ & = u_t^{\pi }(h) & & \text{[\cref{def:u-pi}]} \\ & = \hat{u}_t(h; \pi ). & & \text{[\cref{thm:dph-correct-vf}]} \\ \end{align*}

0.4 Expected Dynamic Program: Markov Policy

0.4.1 Optimality

We discuss results needed to prove the optimality of Markov policies.

Definition 0.4.1.1
#

The set of independent value functions is defined as \(\mathcal{V} := \mathbb {R}^{\mathcal{S}}\).

Definition 0.4.1.2
#

A Markov Bellman operator \(L^{\star }\colon \mathcal{V} \to \mathcal{V}\) is defined as

\[ (L^{\star }v)(s) := \max _{a\in \mathcal{A}} \mathbb {E}^{s, a, 1} \left[ \tilde{r}^{\mathrm{h}}+ v(\tilde{s}_{1}) \right], \quad \forall v \in \mathcal{V}, s\in \mathcal{S}, \]
Definition 0.4.1.3
#

The optimal value function \(v^{\star }_t\in \mathcal{V}, t\in \mathbb {N}\) is defined as

\[ v^{\star }_t := \begin{cases} 0 & \text{if } t = 0 \\ (L^{\star }v^{\star }_{t-1}) & \text{otherwise}. \end{cases} \]
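The recursion for \(v^{\star }_t\) is a finite-horizon value iteration. A Python sketch using the same two-state example as earlier; all names are our own.

    from fractions import Fraction

    def bellman_opt(P, r, S, A, v):
        """One application of the optimal Bellman operator to the value vector v."""
        return [max(sum(P(s, a)[s2] * (r(s, a, s2) + v[s2]) for s2 in range(S))
                    for a in range(A))
                for s in range(S)]

    def optimal_values(P, r, S, A, t):
        """v*_t: t applications of the optimal Bellman operator to the zero vector."""
        v = [Fraction(0)] * S
        for _ in range(t):
            v = bellman_opt(P, r, S, A, v)
        return v

    # Action 1 switches the state, action 0 keeps it; reward 1 for landing in state 1.
    P = lambda s, a: [Fraction(1) if s2 == (s + a) % 2 else Fraction(0) for s2 in range(2)]
    r = lambda s, a, s2: Fraction(1) if s2 == 1 else Fraction(0)
    assert optimal_values(P, r, 2, 2, 3) == [Fraction(3), Fraction(3)]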
Theorem 0.4.1.4

Suppose that \(t\in \mathbb {N}\). Then:

\[ v^{\star }_t(s_{|h|}(h)) = u^{\star }_t(h), \quad \forall h\in \mathcal{H}. \]
Proof

By induction on \(t\). The base case follows immediately from the definition. The inductive step for \(t+1\) follows, with \(s_0 := s_{|h|}(h)\), as:

\begin{align*} u^{\star }_{t+1}(h) & = \max _{a\in \mathcal{A}} \mathbb {E}^{h, a, 1} \left[ \tilde{r}^{\mathrm{h}}_{|h|} + \tilde{u}^{\star }_t \right] & & \text{[\cref{def:u-dp-opt}]} \\ & = \max _{a\in \mathcal{A}} \mathbb {E}^{h, a, 1} \left[ \tilde{r}^{\mathrm{h}}_{|h|} + v^{\star }_t(\tilde{s}_{|h|+1}) \right] & & \text{[inductive asm.]} \\ & = \max _{a\in \mathcal{A}} \mathbb {E}^{s_0, a, 1} \left[ \tilde{r}^{\mathrm{h}}_{|h|} + v^{\star }_t(\tilde{s}_{1}) \right] & & \text{[\cref{thm:exph-horizon-trim}]} \\ & = \max _{a\in \mathcal{A}} \mathbb {E}^{s_0, a, 1} \left[ \tilde{r}^{\mathrm{h}}+ v^{\star }_t(\tilde{s}_{1}) \right] & & \text{[\cref{thm:exph-congr}]} \\ & = v^{\star }_{t+1}(s_{|h|}(h)) & & \text{[\cref{def:v-dp-opt}]}. \end{align*}
Definition 0.4.1.5
#

The optimal finite-horizon policy \(\pi ^{\star }_t, t\in \mathbb {N}\) is defined as

\[ \pi ^{\star }_t(k,s) := \begin{cases} \arg \max _{a\in \mathcal{A}} \mathbb {E}^{s, a, 1} \left[ \tilde{r}^{\mathrm{h}}+ v_{t-k}^{\star }(\tilde{s}_{1}) \right] & \text{if } k \le t, \\ a_0 & \text{otherwise}, \end{cases} \]

where \(a_0\) is an arbitrary action.
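The argmax in the definition can be extracted from the same Bellman update. A self-contained Python sketch (the helper name and the two-state example are our own):

    from fractions import Fraction

    def greedy_action(P, r, S, A, v, s):
        """An action attaining the maximum in the Bellman update at state s."""
        return max(range(A),
                   key=lambda a: sum(P(s, a)[s2] * (r(s, a, s2) + v[s2]) for s2 in range(S)))

    # With one step to go (v = 0), the greedy action in state 0 switches to the
    # rewarding state 1.
    P = lambda s, a: [Fraction(1) if s2 == (s + a) % 2 else Fraction(0) for s2 in range(2)]
    r = lambda s, a, s2: Fraction(1) if s2 == 1 else Fraction(0)
    assert greedy_action(P, r, 2, 2, [Fraction(0)] * 2, 0) == 1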

Theorem 0.4.1.6

Assume a horizon \(T\in \mathbb {N}\). Then:

\[ v^{\star }_{T-|h|}(s_{|h|}(h)) = u^{\pi ^{\star }_{T}}_{T-|h|}(h), \quad \forall h\in \left\{ h\in \mathcal{H} \mid |h| \le T \right\} . \]
Proof

Fix some \(T \in \mathbb {N}\). By induction on \(k\) from \(k = T\) to \(k=0\). The base case is immediate from the definition. We prove the inductive case for \(k-1\) from \(k\) as

\begin{align*} u^{\pi ^{\star }_T}_{T-k + 1}(h) & = \mathbb {E}^{h, \pi ^{\star }_T, 1} \left[ \tilde{r}^{\mathrm{h}}_k + \tilde{u}^{\pi ^{\star }_T}_{T-k} \right] & & \text{[\cref{def:DPhpi}]} \\ & = \mathbb {E}^{h, a^{\star }, 1} \left[ \tilde{r}^{\mathrm{h}}_k + \tilde{u}^{\pi ^{\star }_T}_{T-k} \right] & & \text{[???]} \\ & = \mathbb {E}^{h, a^{\star }, 1} \left[ \tilde{r}^{\mathrm{h}}_{k} + v_{T-k}^{\star }(\tilde{s}_1) \right] & & \text{[ind asm]} \\ & = \mathbb {E}^{s_0, a^{\star }, 1} \left[ \tilde{r}^{\mathrm{h}}+ v_{T-k}^{\star }(\tilde{s}_1) \right] & & \text{[\cref{thm:exph-horizon-trim}]} \\ & = \max _{a\in \mathcal{A}} \mathbb {E}^{s_0, a, 1} \left[ \tilde{r}^{\mathrm{h}}+ v_{T-k}^{\star }(\tilde{s}_1) \right] & & \text{[???]} \\ & = v_{T-k+1}^{\star }(s_0). & & \text{[\cref{def:v-dp-opt}]} \end{align*}

Here, \(k := |h|\), \(a^{\star }:= \pi ^{\star }_T(k, s_0)\), and \(s_0 := s_{|h|}(h)\).

0.4.2 Evaluation

We discuss results pertinent to the evaluation of Markov policies.

Markov value functions depend on the length of the history.

Definition 0.4.2.1
#

The set of Markov value functions is defined as \(\mathcal{V}_{\mathrm{M}} := \mathbb {R}^{\mathbb {N}\times \mathcal{S}}\).

Definition 0.4.2.2
#

A Markov policy Bellman operator \(L^{\pi }\colon \mathcal{V}_{\mathrm{M}} \to \mathcal{V}_{\mathrm{M}}\) for \(\pi \in \Pi _{\mathrm{MD}}\) is defined as

\[ (L^{\pi } v)(k, s) := \mathbb {E}^{s, \pi (k,s), 1} \left[ \tilde{r}^{\mathrm{h}}+ v(k + 1, \tilde{s}_{1}) \right], \quad \forall v\in \mathcal{V}_{\mathrm{M}}, k\in \mathbb {N}, s\in \mathcal{S}. \]
Definition 0.4.2.3
#

The Markov policy value function \(v^{\pi }_t\in \mathcal{V}_{\mathrm{M}}, t \in \mathbb {N}\) for \(\pi \in \Pi _{\mathrm{MD}}\) is defined as

\[ v^{\pi }_t := \begin{cases} 0 & \text{if } t = 0, \\ (L^{\pi } v^{\pi }_{t-1}) & \text{otherwise}. \end{cases} \]