Deep Q



  • The discount factor slightly reduces the weight given to rewards that arrive further in the future.
  • The further ahead you look in time, the more the actual rewards can diverge from what was expected (e.g., coins might not be in the same place in the future).
  • We do not want the machine to memorize a sequence, so discounting future rewards discourages it from pursuing an exact sequence.
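
To make the effect of discounting concrete, here is a minimal sketch in Python. The function name `discounted_return`, the symbol `gamma` for the discount factor, and the sample reward values are all assumptions chosen for illustration; they do not come from any particular library. The reward at step t is weighted by gamma raised to the power t, so later rewards count for less.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step t by gamma**t.

    `rewards` is a list of per-step rewards (hypothetical values here),
    and `gamma` is the discount factor, assumed to lie between 0 and 1.
    """
    total = 0.0
    for t, reward in enumerate(rewards):
        # Rewards further in the future get a smaller weight.
        total += (gamma ** t) * reward
    return total


# Three identical rewards, e.g. one coin collected at each step.
rewards = [1.0, 1.0, 1.0]

discounted = discounted_return(rewards, gamma=0.5)    # later coins count less
undiscounted = discounted_return(rewards, gamma=1.0)  # every coin counts fully
print(discounted, undiscounted)
```

With a discount factor below 1, the same coin is worth less the later it is collected, which is what keeps the agent from relying too heavily on distant, uncertain rewards.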

Deep Q Learning Algorithm

[Image: screenshot of the Deep Q Learning algorithm]