Oracle
An oracle is an object that estimates the expected payoff for a player in a given state and determines the optimal strategy for that state. A strategy is represented as a probability distribution over the actions available to the player.
An oracle must implement the following protocol:
class Oracle(ABC, Generic[Observation]):
    @abstractmethod
    def get_value(self, observation: Observation) -> Payoff:
        """Return the expected payoff for the given observation."""

    @abstractmethod
    def get_probabilities(
        self, observation: Observation, legal_mask: LegalMask
    ) -> Probabilities:
        """Return the probabilities for each action.

        The legal mask defines which actions are allowed.
        """
The oracles built into the mrl library are listed below.
For most uses in the Game Runner, implementing this Oracle protocol
is enough. AlphaZero training is stricter: run_alpha_zero requires a
TrainableOracle that can also save and load its parameters.
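To illustrate the protocol, a minimal custom oracle might look like the sketch below. It is self-contained, so it redeclares Oracle locally and uses stand-in types: treating Payoff as a float and LegalMask and Probabilities as plain lists is an assumption made for the example, not the definitions used by mrl. UniformOracle is a hypothetical name.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

# Stand-in types (assumptions for this sketch, not mrl's own definitions).
Observation = TypeVar("Observation")
Payoff = float
LegalMask = List[bool]
Probabilities = List[float]


class Oracle(ABC, Generic[Observation]):
    """Local copy of the protocol, so the example runs on its own."""

    @abstractmethod
    def get_value(self, observation: Observation) -> Payoff:
        ...

    @abstractmethod
    def get_probabilities(
        self, observation: Observation, legal_mask: LegalMask
    ) -> Probabilities:
        ...


class UniformOracle(Oracle[Observation]):
    """Hypothetical oracle: neutral value, uniform policy over legal actions."""

    def get_value(self, observation: Observation) -> Payoff:
        # No knowledge of the game, so report a neutral payoff.
        return 0.0

    def get_probabilities(
        self, observation: Observation, legal_mask: LegalMask
    ) -> Probabilities:
        legal_count = sum(legal_mask)
        if legal_count == 0:
            raise ValueError("at least one action must be legal")
        # Illegal actions get probability 0; legal ones share the mass equally.
        return [1.0 / legal_count if ok else 0.0 for ok in legal_mask]


oracle = UniformOracle()
probabilities = oracle.get_probabilities(None, [True, False, True, True])
```

Because such an oracle has no parameters to save or load, it would fit the plain protocol but not the stricter TrainableOracle requirement.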
RandomRollout
name: RandomRollout
number_of_rollouts: 15
Note
This oracle is not trainable with AlphaZero.
This oracle estimates the expected payoff by performing
number_of_rollouts simulations using random actions.
The expected payoff is computed as the average payoff across all simulations. The returned action probability distribution is uniform over the legal actions.
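The rollout procedure can be sketched on a toy game. Everything in this example is hypothetical and only illustrates the idea: the token game, the function names, and the +1/-1 payoff convention are assumptions, not part of mrl.

```python
import random
from typing import List


def legal_actions(tokens: int) -> List[int]:
    """Toy game: remove 1 or 2 tokens; whoever takes the last token wins."""
    return [a for a in (1, 2) if a <= tokens]


def rollout(tokens: int, rng: random.Random) -> float:
    """Play random moves to the end of the game (assumes tokens >= 1).

    Returns +1.0 if the player to move at the start wins, otherwise -1.0.
    """
    player = 0  # 0 is the player to move in the starting state
    while True:
        tokens -= rng.choice(legal_actions(tokens))
        if tokens == 0:
            return 1.0 if player == 0 else -1.0
        player = 1 - player


def estimate_value(tokens: int, number_of_rollouts: int, rng: random.Random) -> float:
    """Average the payoff over number_of_rollouts random simulations."""
    total = sum(rollout(tokens, rng) for _ in range(number_of_rollouts))
    return total / number_of_rollouts


def uniform_policy(legal_mask: List[bool]) -> List[float]:
    """Uniform distribution over the legal actions, zero elsewhere."""
    legal_count = sum(legal_mask)
    return [1.0 / legal_count if ok else 0.0 for ok in legal_mask]


rng = random.Random(0)
value = estimate_value(tokens=5, number_of_rollouts=15, rng=rng)
policy = uniform_policy([True, False, True])
```

A larger number_of_rollouts reduces the variance of the estimate at the cost of more simulation time.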
OpenSpielMLP
name: OpenSpielMLP
capacity:
  input_size: 18
  output_size: 9
  nn_width: 2
  nn_depth: 2
Note
This oracle is trainable with AlphaZero.
This oracle is implemented as a multi-layer perceptron (MLP). It takes
as input the state of the game encoded as an n-dimensional vector
and produces:
a probability distribution over the available actions, and
an estimate of the expected payoff.
The neural network follows the OpenSpiel architecture. It
contains nn_depth hidden layers in addition to the input and output
layers. Each hidden layer has nn_width neurons, and each layer is
fully connected to the next.
The output consists of two heads:
a policy head, which produces logits for the action probability distribution;
a value head, which estimates the expected payoff.
input_size defines the dimension of the input vector, while
output_size defines the number of possible actions and therefore the
dimension of the probability distribution.
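As a rough illustration of how the capacity parameters shape such a network, the pure-Python sketch below builds a tiny two-head MLP and runs one forward pass. It is not the mrl or OpenSpiel implementation: the random initialization, the ReLU activations, the tanh on the value head, the masking step, and the helper names (build_mlp, forward, masked_softmax) are all assumptions made for the example.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], layer: Layer) -> List[float]:
    weights, biases = layer
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, biases)]


def build_mlp(input_size: int, output_size: int, nn_width: int, nn_depth: int, seed: int = 0):
    rng = random.Random(seed)
    sizes = [input_size] + [nn_width] * nn_depth
    # nn_depth fully connected hidden layers, each with nn_width neurons.
    hidden = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    policy_head = init_layer(sizes[-1], output_size, rng)  # one logit per action
    value_head = init_layer(sizes[-1], 1, rng)             # scalar payoff estimate
    return hidden, policy_head, value_head


def forward(params, x: List[float]) -> Tuple[List[float], float]:
    hidden, policy_head, value_head = params
    for layer in hidden:
        x = [max(0.0, v) for v in dense(x, layer)]  # ReLU (assumed activation)
    logits = dense(x, policy_head)
    value = math.tanh(dense(x, value_head)[0])      # squashed to [-1, 1] (assumed)
    return logits, value


def masked_softmax(logits: List[float], legal_mask: List[bool]) -> List[float]:
    """Turn logits into probabilities, giving illegal actions probability 0."""
    top = max(l for l, ok in zip(logits, legal_mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, legal_mask)]
    total = sum(exps)
    return [e / total for e in exps]


params = build_mlp(input_size=18, output_size=9, nn_width=2, nn_depth=2)
logits, value = forward(params, [1.0] * 18)
probs = masked_softmax(logits, [True] * 4 + [False] * 5)
```

With the configuration shown earlier, the policy head yields 9 logits and the value head yields a single number.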
OpenSpielConv
name: OpenSpielConv
capacity:
  input_shape: [2, 7, 7]
  output_size: 7
  nn_width: 2
  nn_depth: 2
Note
This oracle is trainable with AlphaZero.
This oracle is implemented as a convolutional neural network. It takes
as input the state of the game encoded as an array with shape (channels
× height × width). Reshaping is performed automatically, so a
perspective may return an array of any shape, provided that the total
number of elements equals channels × height × width.
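The element-count requirement can be illustrated with a small reshaping sketch. The helper reshape_chw is hypothetical, and the row-major (C-style) ordering it uses is an assumption; the ordering actually used by mrl is not stated here.

```python
from typing import List, Sequence


def reshape_chw(
    flat: Sequence[float], channels: int, height: int, width: int
) -> List[List[List[float]]]:
    """Reshape a flat sequence into nested lists of shape (channels, height, width)."""
    expected = channels * height * width
    if len(flat) != expected:
        raise ValueError(f"expected {expected} elements, got {len(flat)}")
    # Row-major order (assumed): width varies fastest, then height, then channels.
    return [
        [
            [flat[(c * height + h) * width + w] for w in range(width)]
            for h in range(height)
        ]
        for c in range(channels)
    ]


# input_shape: [2, 7, 7] needs 2 * 7 * 7 = 98 elements.
board = reshape_chw(list(range(98)), channels=2, height=7, width=7)
```

A perspective that returned 97 or 99 elements could not be reshaped to [2, 7, 7], and this sketch raises a ValueError in that case.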
The network produces both:
a probability distribution over the available actions, and
an estimate of the expected payoff.
The architecture follows the OpenSpiel design. It includes
nn_depth convolutional layers in addition to the input and output
layers. Each convolutional layer produces nn_width output channels.
Batch normalization and ReLU activation functions are applied after each convolutional layer.
OpenSpielResnet
name: OpenSpielResnet
capacity:
input_shape: [2, 7, 7]
output_size: 7
nn_width: 2
nn_depth: 2
Note
This oracle is trainable with AlphaZero.
This oracle is implemented as a convolutional neural network with residual connections.
The network takes the game state encoded as a tensor with shape
(channels × height × width) and produces both a probability
distribution over the available actions and an estimate of the expected
payoff.
The architecture follows the OpenSpiel residual design. It
includes nn_depth residual blocks in addition to the input and
output layers.
Each residual block contains two convolutional layers producing
nn_width channels. The output of a residual block is obtained by
adding the block input to the output of the second convolutional layer,
forming a residual connection.
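The residual connection itself can be sketched independently of any particular convolution. In the example below, the two layers are passed in as plain functions standing in for the convolutional layers. The ReLU between them is an assumption, and normalization and any activation after the addition are left out, so this is a simplification rather than the actual block.

```python
from typing import Callable, List

Tensor = List[float]  # flattened activations, for illustration only
Layer = Callable[[Tensor], Tensor]


def relu(x: Tensor) -> Tensor:
    return [max(0.0, v) for v in x]


def residual_block(x: Tensor, conv1: Layer, conv2: Layer) -> Tensor:
    """Add the block input to the output of the second layer."""
    out = conv2(relu(conv1(x)))
    # The skip connection requires the input and output shapes to match.
    return [a + b for a, b in zip(x, out)]


def double(x: Tensor) -> Tensor:
    """Stand-in for the first convolution."""
    return [2.0 * v for v in x]


def plus_one(x: Tensor) -> Tensor:
    """Stand-in for the second convolution."""
    return [v + 1.0 for v in x]


result = residual_block([1.0, -2.0], double, plus_one)

# If the second layer outputs zeros, the block reduces to the identity.
identity = residual_block([1.0, -2.0], double, lambda x: [0.0] * len(x))
```

The identity case hints at why residual blocks are easy to stack: a block can leave its input unchanged when its layers contribute nothing.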