### New blog!!!


My blog is being migrated to my new website: mathspp.com!!

- RGS



In this previous post I defined a Markov Decision Process and explained all of its components; now, we will be exploring what the discount factor $\gamma$ really is and how it influences the MDP.
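Before digging in, here is a quick illustration (mine, not taken from the post itself) of what $\gamma$ does: the reward received $t$ steps from now is weighed by $\gamma^t$, so a smaller $\gamma$ makes the agent care less about the distant future.

```python
# Illustrative only: the discounted return weighs the reward received
# t steps ahead by gamma**t.
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1, 1, 1, 1]  # a constant reward of 1 for four steps
print(discounted_return(rewards, 0.0))  # 1.0 (only the immediate reward counts)
print(discounted_return(rewards, 1.0))  # 4.0 (all rewards count equally)
print(discounted_return(rewards, 0.5))  # 1.875
```

Notice how sliding $\gamma$ from $0$ to $1$ interpolates between "only now matters" and "all time steps matter equally".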

Let us start with the complete example of last post:

In this MDP the states are Hungry and Thirsty (which we will represent with $H$ and $T$) and the actions are Eat and Drink (which we will represent with $E$ and $D$). The transition probabilities are specified by the numbers on top of the arrows. In the previous post we put forward that the best policy for this MDP was defined as $$\begin{cases} \pi(H) = E\\ \pi(T) = D\end{cases}$$ but I didn't really prove that. I will do that in a second, but first: what are all the other possible policies? Well, recall that the policy $\pi$ is the *"best strategy"* to be followed, and $\pi$ is formally seen as a function from the states to the actions, i.e. $\pi: S \to A$. Because of that, we must know what $\pi(H)$ a…
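Since $\pi: S \to A$ assigns one action to each state, a small sketch (my own, with illustrative names) can enumerate every deterministic policy for this two-state, two-action MDP:

```python
from itertools import product

states = ["H", "T"]   # Hungry, Thirsty
actions = ["E", "D"]  # Eat, Drink

# A deterministic policy picks one action per state, so there are
# len(actions) ** len(states) = 2**2 = 4 possible policies.
policies = [
    dict(zip(states, choice))
    for choice in product(actions, repeat=len(states))
]

for pi in policies:
    print(pi)
print(len(policies))  # 4
```

One of the four printed policies is exactly the candidate best one, `{"H": "E", "T": "D"}`.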


This post's problem is brought to you by my struggles while cooking. I bought 4 raw chicken hamburgers; two of them were plain chicken burgers, and the other two were already seasoned, "american-style" (whatever that meant). In practice, I could tell them apart because the american-style burgers were orange and the plain chicken burgers were light-pink*ish*.

I had never had one of those "american-style" (AS) burgers and I was slightly afraid I wouldn't enjoy them, so I decided I would have half of a regular burger and half of the AS burger for dinner.

I started cooking the burgers, and at some point I couldn't tell them apart by colour, as you can see in the first picture of this post: they all looked the same colour! So I panicked a little bit: how can I be sure that for my dinner I will only have half of a regular burger and half of an AS burger? Of course in my mind I couldn't just take a bite of each, because that is not math…
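For a taste of the kind of counting involved (this is my own quick check, not necessarily the computation the post goes on to make): if I just grabbed two burger halves from two burgers picked at random, how likely am I to get one of each kind?

```python
# Illustrative sketch: probability that two burgers picked at random
# out of {regular, regular, AS, AS} are one of each type.
from fractions import Fraction
from itertools import combinations

burgers = ["regular", "regular", "AS", "AS"]
pairs = list(combinations(burgers, 2))          # all 6 unordered pairs
mixed = sum(1 for a, b in pairs if a != b)      # pairs with one of each

print(Fraction(mixed, len(pairs)))  # 2/3
```

So grabbing blindly only works two times out of three, which is exactly why the indistinguishable colours were a problem.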


In this post I will introduce Markov Decision Processes, a common tool used in Reinforcement Learning, a branch of Machine Learning. By the end of the post you will be able to make some sense of the figure above!

I will couple the formal details, definitions and maths with an intuitive example that will accompany us throughout this post. In later posts we will make our example more complete and use other examples to explain other properties and characteristics of the MDPs.

Let me introduce the context of the example:

From a simplistic point of view, I only have two moods: "*hungry*" and "*thirsty*". Thankfully, my parents taught me how to eat and how to drink, so that I can fulfill the needs I mentioned earlier. Of course, eating when I am hungry makes me happy, just as drinking when I am thirsty makes me happy! Not only that, but eating when I am hungry *usually* satisfies me, much like drinking when I am thirsty *usually* satisfies me.
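To make the pieces concrete, here is a minimal sketch of how this example MDP could be written down in code. The transition probabilities below are **placeholders** (the real ones live on the arrows of the figure), so treat the numbers as illustrative only:

```python
# Placeholder sketch of the example MDP; the 0.9/0.1 values are assumed,
# not taken from the post's figure.
states = ["Hungry", "Thirsty"]
actions = ["Eat", "Drink"]

# transitions[(state, action)] maps each possible next state to its probability.
transitions = {
    ("Hungry", "Eat"):    {"Thirsty": 0.9, "Hungry": 0.1},  # eating *usually* satisfies
    ("Hungry", "Drink"):  {"Hungry": 1.0},
    ("Thirsty", "Drink"): {"Hungry": 0.9, "Thirsty": 0.1},  # drinking *usually* satisfies
    ("Thirsty", "Eat"):   {"Thirsty": 1.0},
}

# Sanity check: the probabilities out of every (state, action) pair sum to 1.
for pair, dist in transitions.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, pair
print("all transition distributions sum to 1")
```

The point is just the shape of the object: a set of states, a set of actions, and one probability distribution over next states for each (state, action) pair.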

Supp…

