
5. Epsilon (ε), Epsilon-Greedy Policy and Epsilon Decay

Epsilon-Greedy Policy

In reinforcement learning, the epsilon-greedy policy is a strategy used to balance exploration and exploitation:

  • Exploration: The agent randomly selects actions to explore the environment and discover new knowledge.
  • Exploitation: The agent selects actions based on its current knowledge to maximize the reward.

Epsilon (ε)

  • Epsilon (ε): A parameter that determines the probability of choosing a random action (exploration) versus the best-known action (exploitation); a short sketch after this list shows how it is used to select actions.
    • When ε is high, the agent explores more.
    • When ε is low, the agent exploits more.
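
To make this concrete, here is a minimal sketch of epsilon-greedy action selection for a Q-table agent. The names choose_action, action_size, and q_table are illustrative assumptions rather than the article's original code; only self.epsilon corresponds to the parameter described above.

```python
import numpy as np

def choose_action(self, state):
    # With probability epsilon, explore: pick an action uniformly at random.
    if np.random.rand() < self.epsilon:
        return np.random.randint(self.action_size)
    # Otherwise exploit: pick the action with the highest estimated Q-value.
    return int(np.argmax(self.q_table[state]))
```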

Epsilon Decay

  • Epsilon Decay: To ensure the agent initially explores the environment but gradually shifts to exploiting its knowledge, ε is decreased over time (see the sketch after this list).
    • self.epsilon_decay: A factor by which ε is multiplied after each episode to reduce its value gradually.
    • self.epsilon_min: The minimum value of ε to ensure that the agent always retains a small probability of exploring.
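
The decay itself is usually a one-line update applied at the end of each episode. The check described in the next section typically looks like the following sketch; the attribute names match those above, while the surrounding agent class is assumed.

```python
def decay_epsilon(self):
    # Reduce epsilon after an episode, but never below the minimum
    # exploration rate, so some random exploration always remains.
    if self.epsilon > self.epsilon_min:
        self.epsilon *= self.epsilon_decay
```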

Purpose of the Code

The decay step shown above checks whether the current value of ε is greater than the minimum threshold (self.epsilon_min). If it is, ε is multiplied by the decay factor (self.epsilon_decay) to decrease its value gradually. This ensures:

  1. Initial Exploration: At the start, the agent explores the environment widely due to a higher ε.
  2. Gradual Shift to Exploitation: Over time, as the agent learns, ε decreases, leading the agent to exploit its learned policy more frequently (a worked example of this schedule follows the list).
  3. Prevent Stagnation: By ensuring ε never goes below a certain minimum value (self.epsilon_min), the agent retains some degree of exploration to avoid getting stuck in local optima.
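
As a rough illustration of how quickly this schedule shifts from exploration to exploitation, assume ε starts at 1.0 with epsilon_decay = 0.995 and epsilon_min = 0.01 (example values, not taken from the original code). After N episodes ε is roughly 1.0 × 0.995^N: about 0.61 after 100 episodes, about 0.08 after 500, and it would fall to about 0.007 after 1000 episodes if the epsilon_min floor did not clamp it at 0.01 first.

```python
def epsilon_after(episodes, start=1.0, decay=0.995, minimum=0.01):
    # Closed-form value of epsilon after a given number of multiplicative
    # decays, clamped at the minimum exploration rate (example values).
    return max(minimum, start * decay ** episodes)

print(epsilon_after(100))   # ~0.606: the agent still explores often
print(epsilon_after(1000))  # 0.01: clamped at epsilon_min
```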