Release notes

MRL v0.3.0

  • Added a new predefined policy, AlphaBetaPolicy, implementing the alpha-beta search algorithm with rollouts.

  • Made report generator policies configurable, enabling monitoring of the trained policy against user-defined combinations of test policies.

  • Replaced the model retention scheme based on direct comparison with previous models with a tournament-based ranking system using the TrueSkill rating algorithm.

  • Standardized terminology: reward now refers to immediate game transition outcomes, while payoff denotes the outcome of an entire game or play sequence.

  • Renamed PayoffPerspective to RewardPerspective and PayoffObservable to RewardObservable.
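
The "alpha-beta search with rollouts" behind the new AlphaBetaPolicy can be sketched on a toy game. Everything below is illustrative rather than MRL's actual API: the game is Nim (take 1-3 stones, taking the last stone wins), and the leaf evaluation at the depth horizon is a simple average of random playouts.

```python
import random

def moves(stones):
    # Legal moves in this toy Nim: take 1, 2 or 3 stones.
    return [m for m in (1, 2, 3) if m <= stones]

def rollout_value(stones, n=20):
    # Estimate the win rate of the side to move by playing n random games.
    wins = 0
    for _ in range(n):
        s, side = stones, 0          # side 0 is the player to move here
        while s > 0:
            s -= random.choice(moves(s))
            side ^= 1                # the other side is now to move
        if side == 1:                # side 0 made the last move and won
            wins += 1
    return wins / n

def alphabeta(stones, depth, alpha, beta, maximizing):
    # Values are win probabilities in [0, 1] from the root player's view.
    if stones == 0:
        # The previous player took the last stone and won.
        return 0.0 if maximizing else 1.0
    if depth == 0:
        v = rollout_value(stones)    # rollout evaluation at the horizon
        return v if maximizing else 1.0 - v
    if maximizing:
        value = 0.0
        for m in moves(stones):
            value = max(value, alphabeta(stones - m, depth - 1,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                # beta cutoff
        return value
    else:
        value = 1.0
        for m in moves(stones):
            value = min(value, alphabeta(stones - m, depth - 1,
                                         alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break                # alpha cutoff
        return value

def best_move(stones, depth=6):
    # Pick the move with the highest alpha-beta value for the root player.
    return max(moves(stones),
               key=lambda m: alphabeta(stones - m, depth - 1, 0.0, 1.0, False))
```

With enough depth the search is exact (e.g. from 5 stones it takes 1, leaving the losing position 4); when the horizon is reached first, the noisy rollout estimate stands in for a learned evaluation.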

MRL v0.2.0

  • Fixed several issues that were degrading training effectiveness.

  • Added support for Dirichlet root noise in MCTS simulations to increase training data diversity.

  • Made the evaluation policy configurable, enabling trained models to be optimized for specific evaluation strategies.

  • Fixed an issue that prevented training from resuming correctly after an initial session.

  • Improved validation messages for incorrect configurations.
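
The Dirichlet root noise added in this release follows the usual AlphaZero-style recipe: at the search root, the move priors are mixed with a Dirichlet sample so that self-play occasionally explores moves the current network underrates. A minimal sketch, assuming nothing about MRL's API; the function names and the defaults alpha=0.3, epsilon=0.25 (values commonly cited from the AlphaZero paper) are illustrative only.

```python
import random

def dirichlet_sample(alpha, n):
    # Sample Dirichlet(alpha, ..., alpha) via normalized Gamma draws.
    g = [random.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]

def add_root_noise(priors, alpha=0.3, epsilon=0.25):
    # Mix the root priors with Dirichlet noise:
    #   p'_i = (1 - epsilon) * p_i + epsilon * eta_i,  eta ~ Dir(alpha).
    noise = dirichlet_sample(alpha, len(priors))
    return [(1 - epsilon) * p + epsilon * e for p, e in zip(priors, noise)]
```

Because both the priors and the noise are probability distributions, the mixture still sums to 1, and every move keeps a strictly positive prior, which is what drives the extra diversity in the generated training data.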

MRL v0.1.0

The initial release, v0.1.0, includes:

  • The game framework and the game runner;

  • An implementation of AlphaZero;

  • Implementations of three example games: TicTacToe, StraightFour and Xiangqi;

  • Documentation, tutorials and examples.