Release notes

MRL v0.3.0

Added a new predefined policy, AlphaBetaPolicy, implementing the alpha-beta search algorithm with rollouts.
Made report generator policies configurable, enabling monitoring of the trained policy against user-defined combinations of test policies.
Replaced the model retention scheme based on direct comparison with previous models with a tournament-based ranking system using the TrueSkill rating algorithm.
Standardized terminology: reward now refers to immediate game transition outcomes, while payoff denotes the outcome of an entire game or play sequence.
Renamed PayoffPerspective to RewardPerspective and PayoffObservable to RewardObservable.

Fixed issues that were negatively affecting training effectiveness
Added support for Dirichlet root noise in MCTS simulations to increase training data diversity
Made the evaluation policy configurable, enabling optimization of trained models for specific evaluation strategies
Fixed an issue that prevented training from resuming correctly after an initial session
Improved validation messages for incorrect configurations

The initial release v0.1.0 includes: