Deep Q
Published:
- The discount factor weights future rewards less than immediate ones: the further away a reward is, the less it counts.
- The further ahead you look in time, the more the actual rewards can diverge from what was expected (e.g., coins might not be in the same place in the future).
- We do not want the machine to memorize a sequence, so discounting future rewards discourages it from pursuing an exact sequence.
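
As a rough illustration of the points above, here is a minimal sketch of a discounted return, assuming the usual convention that a reward received `t` steps in the future is multiplied by `gamma ** t`. The function name `discounted_return`, the reward list, and the value `gamma=0.9` are all made up for this example and are not from any particular library.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step t by gamma ** t."""
    total = 0.0
    # Walk backwards so each earlier step applies one more factor of gamma
    # to everything that comes after it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three identical rewards, one per time step (hypothetical values).
rewards = [1.0, 1.0, 1.0]

# With gamma < 1, later rewards contribute less than earlier ones.
print(discounted_return(rewards, gamma=0.9))

# With gamma = 1 there is no discounting, so this is a plain sum.
print(discounted_return(rewards, gamma=1.0))
```

A smaller `gamma` makes the agent care more about immediate rewards, while a `gamma` close to 1 makes distant rewards count almost as much as near ones.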