
Showing posts from September, 2018

Markov Decision Processes 02: how the discount factor works

In the previous post I defined a Markov Decision Process and explained all of its components; now we will explore what the discount factor $\gamma$ really is and how it influences the MDP. Let us start with the complete example from the last post: in this MDP the states are Hungry and Thirsty (which we will represent with $H$ and $T$) and the actions are Eat and Drink (which we will represent with $E$ and $D$). The transition probabilities are specified by the numbers on top of the arrows. In the previous post we put forward that the best policy for this MDP was defined as $$\begin{cases} \pi(H) = E\\ \pi(T) = D\end{cases}$$ but I didn't really prove that. I will do that in a second, but first, what are all the other possible policies? Well, recall that the policy $\pi$ is the "best strategy" to be followed, and $\pi$ is formally seen as a function from the states to the actions, i.e. $\pi: S \to A$. Because of that, we must know what $\pi(H)...
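As a quick aside, enumerating the possible policies is easy to do programmatically. Here is a minimal sketch (the Python representation is mine, not taken from the post) that lists all $|A|^{|S|} = 4$ deterministic policies for the two-state, two-action MDP above, and shows the discounted return that $\gamma$ weighs:

```python
from itertools import product

states = ["Hungry", "Thirsty"]   # S = {H, T}
actions = ["Eat", "Drink"]       # A = {E, D}

# A deterministic policy assigns one action to each state,
# so there are |A|^|S| = 2^2 = 4 possible policies.
policies = [dict(zip(states, choice)) for choice in product(actions, repeat=len(states))]
for pi in policies:
    print(pi)
# {'Hungry': 'Eat', 'Thirsty': 'Eat'}
# {'Hungry': 'Eat', 'Thirsty': 'Drink'}   <- the policy claimed above to be the best
# {'Hungry': 'Drink', 'Thirsty': 'Eat'}
# {'Hungry': 'Drink', 'Thirsty': 'Drink'}

# The discount factor gamma weighs future rewards: a reward received t steps
# from now is worth gamma**t of its face value.
def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_return([1, 1, 1, 1], gamma=0.9))   # 1 + 0.9 + 0.81 + 0.729 = 3.439
```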

Markov Decision Processes 01: the basics

In this post I will introduce Markov Decision Processes, a common tool used in Reinforcement Learning, a branch of Machine Learning. By the end of the post you will be able to make some sense of the figure above! I will couple the formal details, definitions and maths with an intuitive example that will accompany us throughout this post. In later posts we will make our example more complete and use other examples to explain other properties and characteristics of MDPs. Let me introduce the context of the example: from a simplistic point of view, I only have two moods: "hungry" and "thirsty". Thankfully, my parents taught me how to eat and how to drink, so I can fulfill the needs I mentioned earlier. Of course, eating when I am hungry makes me happy, just as drinking when I am thirsty makes me happy! Not only that, but eating when I am hungry usually satisfies me, much like drinking when I am thirsty usually satisfies me. ...
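For readers who like to see things in code, here is one possible way of writing the example's components down. This is just a sketch of mine with placeholder numbers; the actual transition probabilities and rewards live in the post's figure.

```python
# States and actions of the hungry/thirsty example.
states = ["Hungry", "Thirsty"]
actions = ["Eat", "Drink"]

# Transition model: P[state][action] is a list of (next_state, probability) pairs.
# The probabilities below are placeholders, not the values from the figure.
P = {
    "Hungry":  {"Eat":   [("Thirsty", 0.9), ("Hungry", 0.1)],
                "Drink": [("Hungry", 1.0)]},
    "Thirsty": {"Drink": [("Hungry", 0.9), ("Thirsty", 0.1)],
                "Eat":   [("Thirsty", 1.0)]},
}

# Reward model: R[(state, action)] is the immediate reward for taking that action
# in that state (again, placeholder values).
R = {("Hungry", "Eat"): 1.0, ("Hungry", "Drink"): 0.0,
     ("Thirsty", "Drink"): 1.0, ("Thirsty", "Eat"): 0.0}
```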

Twitter proof: folding my way to the moon

In this twitter proof we will see how the exponential function can mess with objects from our daily lives! Claim: with fewer than $50$ folds, a piece of paper will be so thick that it will cover the distance from the Earth to the Moon. Twitter proof: a common sheet of paper is $0.1$mm thick. If we fold it once, it becomes $0.2$mm thick. Folding twice, $0.4$mm. Folding $49$ times, the paper becomes $2^{49}\times 0.1$mm thick, which is around $5.63\times 10^{13}$mm or $5.63\times10^7$km, $141$ times the distance from the Earth to the Moon ($398818$km). ...
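The arithmetic in the proof is easy to check in a few lines; here is a quick sketch, using the same Earth-Moon distance the post quotes:

```python
# Thickness of a 0.1 mm sheet after 49 folds, compared with the Earth-Moon distance.
thickness_mm = 0.1 * 2 ** 49          # each fold doubles the thickness
thickness_km = thickness_mm / 1e6     # 1 km = 10^6 mm
earth_moon_km = 398_818               # distance used in the post

print(f"{thickness_mm:.2e} mm = {thickness_km:.2e} km")                      # ~5.63e13 mm = ~5.63e7 km
print(f"{thickness_km / earth_moon_km:.0f} times the Earth-Moon distance")   # ~141
```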

Twitter proof: neural networks and the linear activation function

In this post we will see why it is not helpful to have two consecutive layers of neurons with linear activation functions in neural networks. With just a bit of maths we can conclude that $n$ consecutive linear layers compute exactly the same functions as a single layer. Claim: having two fully connected layers of neurons using linear activation functions is the same as having just one layer with a linear activation function. We just have to lay down some notation in order for the maths to be doable. Assume the two consecutive layers of linear neurons are preceded by a layer with $n_0$ neurons, whose outputs are $o_1, o_2, \cdots, o_{n_0}$. Let us say that after that layer, there is a layer of $n_1$ neurons with linear activation functions $f_i(x) = a_ix + b_i$; the weight from neuron $t$ of the previous layer to neuron $i$ of this layer is $w_{t,i}$. The second layer of neurons has $n_2$ neurons, with linear activation functions $f_i'(x) = a_i'x+b_...
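Here is a quick numerical check of the claim using NumPy with random weights; the matrix form below is my own way of packaging the post's per-neuron weights $w_{t,i}$ and activations $f_i(x) = a_ix + b_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1, n2 = 4, 5, 3                      # sizes of the three layers

W1, a1, b1 = rng.normal(size=(n1, n0)), rng.normal(size=n1), rng.normal(size=n1)
W2, a2, b2 = rng.normal(size=(n2, n1)), rng.normal(size=n2), rng.normal(size=n2)

def two_linear_layers(o0):
    o1 = a1 * (W1 @ o0) + b1              # first layer: f_i(x) = a_i x + b_i
    return a2 * (W2 @ o1) + b2            # second layer: f_i'(x) = a_i' x + b_i'

# The composition collapses into a single affine map o2 = M @ o0 + c,
# i.e. one layer with a linear activation function.
M = (a2[:, None] * W2) @ (a1[:, None] * W1)
c = (a2[:, None] * W2) @ b1 + b2

o0 = rng.normal(size=n0)
assert np.allclose(two_linear_layers(o0), M @ o0 + c)
```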

Pledging to do 100 days of Machine Learning and progress log!

It is a shame but I kind of dropped this when I was $41\%$ done... I hope I man up and finish this in the near future. After watching this video from Siraj Raval, I decided to jump right on board the #100daysofMLcode initiative! (even though I am something like 73 days late...) The goal here is to devote (at least) 1h every day, for the next 100 days, to studying ML or writing code! According to the rules posted by Siraj, I must: Make a public pledge for this, which this post is; Make a log of everything, which this post will also be; Whenever I see something related to this #100DaysofMLCode, be supportive! Progress log: For day $0$ I wrote this post and spent quite some time thinking about what I will do throughout. I am thinking of studying several topics about ML and then writing educational posts here on the blog. For today I wrote this twitter proof, tackling a mathematical property of neural networks with linear activation functions. ...

MatchWalker, a puzzle game of shape and colour

In today's post I will be sharing a game I made with just under $400$ lines in Processing, a wrapper around Java that makes drawing to the screen really easy. The goal of the game is really simple: go from the cell you are standing on (marked with the black outline of the ellipse, in the screenshot) to the cell that is framed in white. To do that, you can move a "cursor" (the black frame) with the $AWSD$ keys to choose the next cell you want to go to. To move, press the space bar. There are a couple of rules to moving, though: You can only move to the selected cell if it is in the same row or same column as the cell you are in; You can only move to the selected cell if it has the same colour or the same shape as the cell you are in. Rule number $1$ says you can only go in the directions these orange arrows cover: Rule number $2$ says that, from the cells specified by the above rule, you can only go to the white circle, diamond or vertical el...
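The two rules translate directly into a single legality check. Here is a minimal sketch of that check in Python; the actual game is written in Processing, and the board representation below is made up for illustration:

```python
def is_legal_move(board, src, dst):
    """board maps (row, col) -> (colour, shape); src and dst are (row, col) cells."""
    same_line = src[0] == dst[0] or src[1] == dst[1]          # rule 1: same row or column
    colour_s, shape_s = board[src]
    colour_d, shape_d = board[dst]
    same_look = colour_s == colour_d or shape_s == shape_d    # rule 2: same colour or shape
    return same_line and same_look

# Tiny example board with made-up cells.
board = {
    (0, 0): ("white", "circle"),
    (0, 2): ("white", "diamond"),
    (2, 0): ("orange", "circle"),
    (1, 1): ("blue", "square"),
}
print(is_legal_move(board, (0, 0), (0, 2)))   # True: same row and same colour
print(is_legal_move(board, (0, 0), (1, 1)))   # False: different row and column
```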

Pocket maths: your verification code is 446267

It has become quite common for online services to provide some form of 2-factor authentication when logging in from unknown devices. For example, whenever I try to access my Gmail account from a computer I have never used, I get a text message with a one-time-use 6-digit code. One day I was using that same service to log in to my email, when I noticed that one of the digits in the security code appeared twice, like the $1$ in $315641$. But when I read the other text messages from Google, I noticed that there were far more security codes with repeated digits than security codes with six different digits. I found that weird and decided to compute the probabilities of these events, just to check whether my intuition was tricking me or not... We are about to compute some probabilities regarding these $6$-digit codes - which I will start calling PINs for the sake of brevity - with the rather intuitive formula $$P(\text{some property}\ A) = \frac{\text{# P...
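The counting that formula sets up can be done exactly. Here is a short sketch computing how likely it is that a uniformly random 6-digit code has six different digits (my calculation, not a number quoted from the post):

```python
from math import perm

digits, length = 10, 6

# Favourable outcomes: ordered choices of 6 distinct digits out of 10.
all_distinct = perm(digits, length)          # 10 * 9 * 8 * 7 * 6 * 5 = 151200
total = digits ** length                     # 10^6 codes, assuming each digit is uniform

p_distinct = all_distinct / total
print(f"P(all digits different) = {p_distinct:.4f}")      # 0.1512
print(f"P(at least one repeat)  = {1 - p_distinct:.4f}")  # 0.8488
```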