######
 Game
######

A game defines the rules and structure of an interactive environment in which
one or more players make decisions over time.

Conceptually, a game has two main responsibilities:

- **State generation**: it acts as a factory for creating the initial state of
  a new game.
- **State transitions**: it defines how the game evolves when players take
  actions, producing new states according to the game rules.

A game also defines **perspectives**. A perspective represents how a specific
player observes the game and which actions are available to that player.

This abstraction allows the framework to support:

- perfect-information games (all players observe the full state),
- partial-information games (players observe only part of the state),
- different observation encodings for different algorithms.

A game must implement the following protocols:

.. code:: python

    @runtime_checkable
    class Game(Protocol[StateCo, Player]):
        @abstractmethod
        def make_initial_state(self) -> StateCo:
            """Create the initial state of the game"""

        @abstractmethod
        def get_players(self) -> tuple[Player, ...]:
            """Returns the player identifiers."""

        @abstractmethod
        def get_perspectives(self) -> Mapping[Player, Perspective]:
            """Return the player perspectives."""


    class Perspective(Protocol[StateContra, ObservationCo, ActionSpaceCo]):
        @abstractmethod
        def get_observation(self, state: StateContra) -> ObservationCo:
            """Returns an observation made by a player under the perspective"""

        @abstractmethod
        def get_action_space(self, state: StateContra) -> ActionSpaceCo:
            """Returns the space of actions the player is allowed to make in the state"""

A game compatible with the AlphaZero algorithm must also implement the
following protocols:

.. code:: python

    @runtime_checkable
    class Restorable(Protocol[StateCo, ObservationContra]):
        @abstractmethod
        def restore(self, observation: ObservationContra) -> StateCo:
            """Returns a state that is compatible with the given observation"""


    @runtime_checkable
    class TurnBased(Protocol[State, ActionContra]):
        @abstractmethod
        def update(self, state: State, action: ActionContra) -> State:
            """Updates the game state according to the action of the active player"""


    class RewardObservable(Protocol[StateContra, Player, ObservationCo]):
        @abstractmethod
        def get_perspectives(self) -> Mapping[Player, RewardPerspective[StateContra, ObservationCo]]:
            """Return the player perspectives."""

In addition, the perspective for an MCTS game must implement the following
protocols:

.. code:: python

    @runtime_checkable
    class RewardPerspective(Perspective, Protocol[StateContra, ObservationCo]):
        @abstractmethod
        def get_reward(self, state: StateContra) -> Reward:
            """Returns observed reward in the given state"""


    @runtime_checkable
    class HasActionSpaceDimension(Protocol):
        @property
        @abstractmethod
        def action_space_dimension(self) -> int:
            """Returns the number of distinct actions allowed in the action space."""


    @runtime_checkable
    class MCTSPerspective(Protocol[StateContra, ActionCo]):
        @abstractmethod
        def get_core(self, state: StateContra) -> np.ndarray:
            """Return the essential part of the observation, used by the oracle
            to compute value and probabilities.
            """

        @abstractmethod
        def get_action_space(self, state: StateContra) -> tuple[ActionCo, ...]:
            """Returns the observed action space as a tuple of actions"""

Note: The signature of ``get_action_space`` in ``MCTSPerspective`` requires the
action space to be a tuple of action objects, unlike the more generic
``get_action_space`` signature defined in ``Perspective``.

The games listed below are built into the mrl library.

***********
 TicTacToe
***********

.. code:: yaml

    name: TicTacToe
    first_player: X

    name: MCTSTicTacToe
    first_player: random

This is a classic 3×3 grid game in which two players alternately place their
symbol (``X`` or ``O``) on the board. The objective is to create a line of
three identical symbols horizontally, vertically, or diagonally.

Actions are represented by integers from ``0`` to ``8`` corresponding to the
nine cells of the grid.

Two implementations are provided:

- ``TicTacToe``: a standard implementation suitable for manual play and basic
  policy evaluation.
- ``MCTSTicTacToe``: an implementation designed for use with MCTS and AlphaZero
  training.

In the MCTS variant, the board state is encoded as a vector of 18 elements by
stacking two 9-element binary representations: one for the current player's
pieces and one for the opponent's pieces.

The ``first_player`` parameter determines who moves first. It can be ``X``,
``O``, or ``random``.

The game includes built-in terminal and Tk GUI interfaces for manual play.

***************
 Straight Four
***************

.. code:: yaml

    name: StraightFour
    first_player: O

    name: MCTSStraightFour
    first_player: random

Two players attempt to create a line of four of their symbols on a 7×7 board by
dropping tokens into columns.

Actions are represented by integers from ``0`` to ``6`` corresponding to the
seven columns of the board.

Two implementations are provided:

- ``StraightFour``: a standard implementation suitable for manual play and
  policy evaluation.
- ``MCTSStraightFour``: a variant adapted for MCTS and AlphaZero training.

In the MCTS variant, the board is encoded as two ``7×7`` matrices, one for the
positions of each player's tokens.

The ``first_player`` parameter determines which player moves first (``X``,
``O``, or ``random``).

Terminal and Tk GUI interfaces are provided for manual play.

*********
 Xiangqi
*********

.. code:: yaml

    name: Xiangqi
    first_player: Red
    step_limit: 200

    name: MCTSXiangqi
    first_player: Black

This is an implementation of **Xiangqi**, a two-player strategy game played on
a 9×10 board. Rules can be found at xiangqi.com.

Note that, unlike the standard ruleset, this implementation does not enforce
repetition rules. In this implementation, the game always ends in a draw once
``step_limit`` is reached (200 by default).

Actions are represented as tuples:

``(x_origin, y_origin, x_destination, y_destination)``

which specify the piece to move and its destination square.

Two implementations are available:

- ``Xiangqi``: a standard version for manual play and evaluation.
- ``MCTSXiangqi``: a variant designed for MCTS and AlphaZero training.

In the MCTS version, actions are encoded as integers in the range
``0``\ –\ ``127``. The board state is represented as a tensor of shape
``14 × 10 × 9``, where each channel corresponds to the presence of a specific
piece type.

The ``first_player`` parameter determines whether ``Red`` or ``Black`` moves
first, although ``Red`` always moves first under the official rules.

Terminal and GUI interfaces are provided for manual play.

*********************
 Rock Paper Scissors
*********************

.. code:: yaml

    name: RockPaperScissors
    number_of_rounds: 5

This is an example **discrete-time simultaneous game**. In each round, both
players simultaneously choose one of three actions: ``rock``, ``paper``, or
``scissors``.

The winner of the round is determined by the standard rules:

- rock beats scissors
- scissors beats paper
- paper beats rock

The game is repeated for ``number_of_rounds`` rounds. The overall winner is the
player who wins the most rounds.

Actions are represented by their symbolic names (``rock``, ``paper``,
``scissors``).

This example demonstrates how the framework can represent games where all
players act simultaneously rather than sequentially.
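
The round and match rules above can be sketched in a few lines of plain Python.
This is only an illustration, not the library's implementation: the names
``BEATS``, ``round_winner`` and ``overall_winner`` are hypothetical, players are
identified by the indices ``0`` and ``1``, and returning ``None`` when both
players win the same number of rounds is an assumption, since the text does not
say how such a match is scored.

.. code:: python

    from collections import Counter
    from typing import Optional

    # Hypothetical lookup table: each action maps to the action it beats.
    BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


    def round_winner(action_a: str, action_b: str) -> Optional[int]:
        """Return 0 if the first player wins the round, 1 if the second
        player wins, and None if both chose the same action."""
        if action_a == action_b:
            return None
        return 0 if BEATS[action_a] == action_b else 1


    def overall_winner(rounds: list[tuple[str, str]]) -> Optional[int]:
        """Return the index of the player who won the most rounds.

        Assumption: None is returned when both players won equally often.
        """
        wins = Counter(round_winner(a, b) for a, b in rounds)
        if wins[0] == wins[1]:
            return None
        return 0 if wins[0] > wins[1] else 1

Because both actions of a round are passed in together, neither player's choice
can depend on the other's, which is the property that distinguishes a
simultaneous game from a turn-based one.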