### New blog!!!


My blog is being migrated to my new site, mathspp.com!!

- RGS


This post's problem is brought to you by my struggles while cooking. I bought 4 raw chicken hamburgers: two of them were plain chicken burgers and the other two were already seasoned, "American-style" (whatever that meant). In practice, I could tell them apart because the American-style burgers were orange and the plain chicken burgers were light-pink*ish*.

I had never had one of those "American-style" (AS) burgers and I was slightly afraid I wouldn't enjoy them, so I decided I would have half of a regular burger and half of an AS burger for dinner.

I started cooking the burgers, and at some point I couldn't tell them apart by colour, as you can see in the first picture of this post: they all looked the same colour! So I panicked a little bit: how can I be sure that for my dinner I will only have half of a regular burger and half of an AS burger? Of course in my mind I couldn't just take a bite of each, because that is not math…

This problem that I am posting here today was inspired by an awesome video by 3blue1brown.

**Problem statement:** for a given $\epsilon > 0$, is there a way for you to cover all the rational numbers in the interval $[0, 1]$ with small intervals $I_k$, such that the sum of the lengths of the intervals $I_k$ is less than $\epsilon$? In other words (with almost no words), for what values of $\epsilon > 0$ is there a collection $\{I_k\}$ of intervals such that
$$\left(\mathbb{Q}\cap [0,1]\right) \subseteq \left(\bigcup_k I_k \right) \wedge \sum_k |I_k| < \epsilon$$

**Solution:** such a family of intervals always exists, for any value of $\epsilon > 0$. We start by noticing that there are only countably many rational numbers in the interval $[0, 1]$, which means I can order them as $q_1, q_2, q_3, \ldots$. If you haven't solved the problem yet, take the hint I just gave you and try to solve it.

After enumerating the rationals inside $[0, 1]$, we define $…
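The text cuts off at this point, so here is a small sketch of one standard way to use the enumeration. It is an assumption, not necessarily the choice made in the post: give the $k$-th rational $q_k$ an interval centred on it with length $\epsilon / 2^{k+1}$, so that all the lengths together add up to at most $\epsilon / 2 < \epsilon$. The helper names `rationals_in_unit_interval` and `covering_intervals` are made up for this illustration.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def rationals_in_unit_interval():
    """Yield every rational in [0, 1] exactly once: 0, 1, 1/2, 1/3, 2/3, ..."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:  # skip fractions that are not in lowest terms
                yield Fraction(p, q)


def covering_intervals(epsilon, n):
    """Intervals for the first n rationals.

    Assumed construction: the k-th rational (k = 1, 2, ...) sits at the
    centre of an interval of length epsilon / 2**(k + 1).
    """
    eps = Fraction(epsilon)
    intervals = []
    for k, q in enumerate(islice(rationals_in_unit_interval(), n), start=1):
        half = eps / 2 ** (k + 2)  # half of the length epsilon / 2**(k + 1)
        intervals.append((q - half, q + half))
    return intervals


intervals = covering_intervals(Fraction(1, 100), 1000)
total = sum(right - left for left, right in intervals)
print(total < Fraction(1, 100))  # True
```

Using `Fraction` keeps the arithmetic exact, so the comparison with $\epsilon$ is not affected by floating-point rounding. No matter how many intervals are generated, the total length stays below $\epsilon / 2$ because the lengths form a geometric series.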

In a previous post I defined a Markov Decision Process and explained all of its components; now, we will explore what the discount factor $\gamma$ really is and how it influences the MDP.

Let us start with the complete example from the last post:

In this MDP the states are Hungry and Thirsty (which we will represent with $H$ and $T$) and the actions are Eat and Drink (which we will represent with $E$ and $D$). The transition probabilities are specified by the numbers on top of the arrows. In the previous post we put forward that the best policy for this MDP was defined as $$\begin{cases} \pi(H) = E\\ \pi(T) = D\end{cases}$$ but I didn't really prove that. I will do that in a second, but first what are all the other possible policies? Well, recall that the policy $\pi$ is the *"best strategy"* to be followed, and $\pi$ is formally seen as a function from the states to the actions, i.e. $\pi: S \to A$. Because of that, we must know what $\pi(H)$ a…
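Since a policy is just a function $\pi: S \to A$, the candidate policies for this MDP can be listed mechanically: pick one action for each state. Here is a minimal sketch, assuming policies are stored as plain dictionaries; the function name `all_policies` is made up for this illustration.

```python
from itertools import product

states = ["H", "T"]   # Hungry, Thirsty
actions = ["E", "D"]  # Eat, Drink


def all_policies(states, actions):
    """Return every function from states to actions, each as a dict."""
    return [
        dict(zip(states, choice))
        for choice in product(actions, repeat=len(states))
    ]


policies = all_policies(states, actions)
for pi in policies:
    print(pi)
# {'H': 'E', 'T': 'E'}
# {'H': 'E', 'T': 'D'}
# {'H': 'D', 'T': 'E'}
# {'H': 'D', 'T': 'D'}
```

With 2 states and 2 actions there are $2^2 = 4$ policies, and the second one printed is the policy $\pi(H) = E$, $\pi(T) = D$ mentioned above.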

