###############
 Release notes
###############

************
 MRL v0.3.0
************

- Added a new predefined policy, AlphaBetaPolicy, implementing the alpha-beta search algorithm with rollouts.
- Made report generator policies configurable, enabling monitoring of the trained policy against user-defined combinations of test policies.
- Replaced the model retention scheme, which directly compared each new model with previous ones, with a tournament-based ranking system using the TrueSkill rating algorithm.
- Standardized terminology: reward now refers to immediate game transition outcomes, while payoff denotes the outcome of an entire game or play sequence.
- Renamed PayoffPerspective to RewardPerspective and PayoffObservable to RewardObservable.

************
 MRL v0.2.0
************

- Fixed issues that were degrading training effectiveness.
- Added support for Dirichlet root noise in MCTS simulations to increase training data diversity.
- Made the evaluation policy configurable, enabling optimization of trained models for specific evaluation strategies.
- Fixed an issue that prevented training from resuming correctly after an initial session.
- Improved validation messages for incorrect configurations.

************
 MRL v0.1.0
************

The initial release v0.1.0 includes:

- The game framework and the game runner;
- An implementation of AlphaZero;
- Implementations of example games: TicTacToe, StraightFour and Xiangqi;
- Documentation, tutorials and examples.
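For readers unfamiliar with the search algorithm behind the v0.3.0 AlphaBetaPolicy entry, the following is a textbook alpha-beta sketch in negamax form. It is illustrative only: the function and parameter names are assumptions, not MRL's API, and MRL's policy additionally evaluates positions with rollouts, which this sketch abstracts behind a generic `evaluate` callback.

```python
def alpha_beta(node, depth, alpha, beta, evaluate, children):
    """Textbook alpha-beta minimax in negamax form (illustrative sketch).

    `evaluate(node)` scores a position from the perspective of the player
    to move; `children(node)` lists successor positions. In MRL's
    AlphaBetaPolicy, leaf evaluation is done with rollouts instead.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    value = float("-inf")
    for child in kids:
        # Negamax: a child's value is the negation of the opponent's best.
        value = max(value, -alpha_beta(child, depth - 1, -beta, -alpha,
                                       evaluate, children))
        alpha = max(alpha, value)
        if alpha >= beta:  # prune: the opponent will never allow this branch
            break
    return value
```

With a two-leaf toy tree where the children score -1 and -3 from their own perspective, the root player picks the move worth 3 to itself.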
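The Dirichlet root noise mentioned in the v0.2.0 notes follows the AlphaZero convention of mixing sampled noise into the root move priors so self-play explores more diverse lines. The sketch below is a minimal illustration of that mixing rule; the function name and defaults are assumptions and do not reflect MRL's actual interface.

```python
import random

def add_dirichlet_root_noise(priors, alpha=0.3, epsilon=0.25):
    """Mix Dirichlet noise into root-node move priors (illustrative sketch).

    `priors` maps each legal move to its prior probability. Each prior p
    becomes (1 - epsilon) * p + epsilon * n, where n is drawn from a
    Dirichlet(alpha) distribution over the legal moves.
    """
    # Sample Dirichlet(alpha) by normalizing independent Gamma(alpha) draws.
    gammas = [random.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    return {
        move: (1.0 - epsilon) * p + epsilon * n
        for (move, p), n in zip(priors.items(), noise)
    }
```

Because both the priors and the noise are probability distributions, the mixed values remain a valid distribution over the same moves.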