Deep Q

less than 1 minute read

Published: September 12, 2018

The discount factor reduces the weight given to rewards that arrive in the future.
The further ahead in time you look, the less certain the rewards become (e.g., coins might not be in the same place in the future).
We do not want the machine to memorize a sequence, so discounting future rewards discourages it from pursuing an exact sequence.
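
To make the effect of discounting concrete, here is a minimal sketch that sums a list of rewards while weighting the reward at step t by gamma to the power t. The function name `discounted_return` and the value gamma = 0.9 are assumptions chosen for illustration, not values taken from these notes.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    # Walk backwards so every earlier step adds one more factor of gamma
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# The same coin (reward 1) collected now versus three steps later
now = discounted_return([1, 0, 0, 0])
later = discounted_return([0, 0, 0, 1])
print(now, later)
```

The later coin contributes less to the total than the immediate one, which is the sense in which distant rewards are played down.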

Deep Q Learning Algorithm

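
As a rough sketch of the update target that Q-learning methods commonly use, the snippet below computes the reward plus the discounted best Q-value of the next state for a single transition. The function name `td_target`, the hard-coded list of next-state Q-values, and gamma = 0.9 are illustrative assumptions; in deep Q-learning a neural network would produce those Q-values.

```python
def td_target(reward, next_q_values, done, gamma=0.9):
    """Target for one transition: reward plus discounted best next Q-value."""
    if done:
        # No future reward is available after a terminal state
        return reward
    return reward + gamma * max(next_q_values)

# One transition with three possible next actions (values are made up)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0, 1.0], done=False)
print(target)
```

The network's prediction for the action actually taken would then be nudged toward this target.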

Published: February 17, 2019

Supervised learning algorithms require a target output for each training example.

Underfitting can result from having too few hidden neurons or too little data.

Overtraining (overfitting) can result from having too many hidden neurons.

Each neuron can have many inputs but produces only one output.
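
A single neuron can be sketched as a weighted sum of its inputs passed through an activation function. The name `neuron`, the example weights, and the choice of a sigmoid activation are assumptions made for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """Combine many inputs into one output value."""
    # Weighted sum of all inputs plus a bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation (an assumed choice) squashes z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Three inputs go in, a single number comes out
output = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0)
print(output)
```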

The activation function of a hidden layer must be nonlinear; otherwise, stacked layers collapse into a single linear transformation.
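
To illustrate why the hidden activation needs to be nonlinear, the sketch below compares two stacked layers with and without a ReLU in between. The weight matrices, the input, and the helper names `matmul` and `relu` are made up for this example; it shows one input rather than a general proof.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def relu(m):
    """Replace negative entries with zero."""
    return [[max(0, v) for v in row] for row in m]

W1 = [[1, -1], [2, 0]]   # hidden-layer weights (illustrative)
W2 = [[1, 1]]            # output-layer weights (illustrative)
x = [[1], [3]]           # input as a column vector

# Without an activation, two layers behave like one layer with weights W2 * W1
stacked = matmul(W2, matmul(W1, x))
collapsed = matmul(matmul(W2, W1), x)

# With ReLU between the layers, the result differs from that single layer
nonlinear = matmul(W2, relu(matmul(W1, x)))
print(stacked, collapsed, nonlinear)
```

Here the purely linear stack and the single merged layer agree, while the version with ReLU gives a different answer, so only the nonlinear network can represent something a single layer cannot.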