######## Oracle ######## An oracle is an object that estimates the expected payoff for a player in a given state and determines the optimal strategy for that state. A strategy is represented as a probability distribution over the actions available to the player. An oracle must implement the following protocol: .. code:: python class Oracle(ABC, Generic[Observation]): @abstractmethod def get_value(self, observation: Observation) -> Payoff: """Return the expected payoff in the given observation""" @abstractmethod def get_probabilities( self, observation: Observation, legal_mask: LegalMask ) -> Probabilities: """Returns the probabilities for each action. The legal mask defines which actions are allowed. """ Below there is a list of oracles which are built-in the mrl library. For most uses in the Game Runner, implementing this ``Oracle`` protocol is enough. AlphaZero training is stricter: ``run_alpha_zero`` requires a ``TrainableOracle`` that can also save and load its parameters. *************** RandomRollout *************** .. code:: yaml name: RandomRollout number_of_rollouts: 15 .. note:: This oracle is not trainable with AlphaZero. This oracle estimates the expected payoff by performing ``number_of_rollouts`` simulations using random actions. The expected payoff is computed as the average payoff across all simulations. The returned action probability distribution is uniform over the legal actions. ************** OpenSpielMLP ************** .. code:: yaml name: OpenSpielMLP capacity: input_size: 18 output_size: 9 nn_width: 2 nn_depth: 2 .. note:: This oracle is trainable with AlphaZero. This oracle is implemented as a multi-layer perceptron (MLP). It takes as input the state of the game encoded as an ``n``-dimensional vector and produces: - a probability distribution over the available actions, and - an estimate of the expected payoff. The neural network follows the `OpenSpiel `_ architecture. It contains ``nn_depth`` hidden layers in addition to the input and output layers. Each hidden layer has ``nn_width`` neurons, and each layer is fully connected to the next. The output consists of two heads: - a **policy head**, which produces logits for the action probability distribution; - a **value head**, which estimates the expected payoff. ``input_size`` defines the dimension of the input vector, while ``output_size`` defines the number of possible actions and therefore the dimension of the probability distribution. *************** OpenSpielConv *************** .. code:: yaml name: OpenSpielConv capacity: input_shape: [2, 7, 7] output_size: 7 nn_width: 2 nn_depth: 2 .. note:: This oracle is trainable with AlphaZero. This oracle is implemented as a convolutional neural network. It takes as input the state of the game encoded as a array with shape ``(channels × height × width)``. Reshaping is performed automatically, so a perspective may return an array of any shape, provided that the total number of elements equals channels × height × width. The network produces both: - a probability distribution over the available actions, and - an estimate of the expected payoff. The architecture follows the `OpenSpiel `_ design. It includes ``nn_depth`` convolutional layers in addition to the input and output layers. Each convolutional layer produces ``nn_width`` output channels. Batch normalization and ReLU activation functions are applied after each convolutional layer. ***************** OpenSpielResnet ***************** .. code:: yaml name: OpenSpielResnet capacity: input_shape: [2, 7, 7] output_size: 7 nn_width: 2 nn_depth: 2 .. note:: This oracle is trainable with AlphaZero. This oracle is implemented as a convolutional neural network with residual connections. The network takes the game state encoded as a tensor with shape ``(channels × height × width)`` and produces both a probability distribution over the available actions and an estimate of the expected payoff. The architecture follows the `OpenSpiel `_ residual design. It includes ``nn_depth`` residual blocks in addition to the input and output layers. Each residual block contains two convolutional layers producing ``nn_width`` channels. The output of a residual block is obtained by adding the block input to the output of the second convolutional layer, forming a residual connection.