nmmo.core.env module#

class nmmo.core.env.Env(config: ~nmmo.core.config.Default = <nmmo.core.config.Default object>, seed=None)#

Bases: ParallelEnv
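
A minimal construction sketch, assuming only the class and config paths shown in the signature above; the reset call follows the reset() documentation below:

    from nmmo.core.config import Default
    from nmmo.core.env import Env

    # Construct an environment with the stock config and a fixed seed;
    # real experiments typically subclass Default to customize the game.
    env = Env(config=Default(), seed=42)
    obs = env.reset()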

action_space(agent)#

Neural MMO Action Space

Args:

agent: Agent ID

Returns:

actions: gym.spaces object containing the structured actions for the specified agent. Each action is parameterized by a list of discrete-valued arguments. These consist of both fixed, k-way choices (such as movement direction) and selections from the observation space (such as targeting).
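
A brief inspection sketch; agent ID 1 is an assumption, and env is an already-constructed Env instance:

    # Inspect the structured action space for one agent and draw a random,
    # structurally valid sample.
    space = env.action_space(1)
    print(space)
    sample = space.sample()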

property agents: List[str]#

The list of currently active agent IDs, for conformity with the PettingZoo API

change_task(new_tasks: List[Union[Tuple[Task, float], Task]], task_encoding: Optional[Dict[int, ndarray]] = None, embedding_size: int = 16, reset: bool = True, map_id=None, seed=None, options=None)#

Changes the task given to each agent

Args:

new_tasks: The tasks to complete and calculate rewards for
task_encoding: A mapping from eid to encoded task
embedding_size: The size of each embedding
reset: Whether to reset the environment
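
A hedged usage sketch; my_task is a placeholder for an existing nmmo Task instance and the 1.0 weight is arbitrary:

    # Replace the task list with a single task at weight 1.0 and reset.
    env.change_task([(my_task, 1.0)], embedding_size=16, reset=True)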

close()#

For conformity with the PettingZoo API only; rendering is external

property max_num_agents: int#
metadata: Dict[str, Any] = {'name': 'neural-mmo', 'render.modes': ['human']}#
property num_agents: int#
observation_space(agent: int)#

Neural MMO Observation Space

Args:

agent: Agent ID

Returns:

observation: gym.spaces object containing the structured observation for the specified agent. Each visible object is represented by continuous and discrete vectors of attributes. A 2-layer attentional encoder can be used to convert this structured observation into a flat vector embedding.
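
A brief inspection sketch, again assuming agent ID 1 and an existing env instance:

    # Inspect the per-agent observation space; each entry describes the
    # continuous/discrete attribute vectors of one type of visible object.
    obs_space = env.observation_space(1)
    print(obs_space)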

possible_agents: List[str]#
render(mode='human')#

For conformity with the PettingZoo API only; rendering is external

reset(map_id=None, seed=None, options=None)#

OpenAI Gym API reset function

Loads a new game map and returns initial observations

Args:

map_id: Map index to load. Selects a random map by default

Returns:

observations, as documented by _compute_observations()

Notes:

Neural MMO simulates a persistent world. Ideally, you should reset the environment only once, upon creation. In practice, this approach limits the number of parallel environment simulations to the number of CPU cores available. At small and medium hardware scale, we therefore recommend the standard approach of resetting after a long but finite horizon: ~1000 timesteps for small maps and 5000+ timesteps for large maps.
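
A sketch of the finite-horizon pattern described above; HORIZON is a user-chosen value, and the empty actions dict relies on step() treating unprovided actions as no-ops:

    HORIZON = 1000  # ~1000 for small maps, 5000+ for large maps

    obs = env.reset()
    for _ in range(HORIZON):
        # Placeholder no-op policy; substitute your agents' real action
        # computation here.
        obs, rewards, dones, infos = env.step({})
    obs = env.reset()  # begin the next rollout on a fresh map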

seed(seed=None)#

Reseeds the environment (making it deterministic).

state() ndarray#

Returns a global view of the environment appropriate for centralized-training, decentralized-execution methods such as QMIX

step(actions: Dict[int, Dict[str, Dict[str, Any]]])#

Simulates one game tick or timestep

Args:

actions: A dictionary of agent decisions of format:

    {
      agent_1: {
          action_1: [arg_1, arg_2],
          action_2: [...],
          ...
      },
      agent_2: {
          ...
      },
      ...
    }

Where agent_i is the integer index of the i'th agent.

The environment only evaluates provided actions for provided
agents. Unprovided action types are interpreted as no-ops and
illegal actions are ignored.

It is also possible to specify invalid combinations of valid
actions, such as two movements or two attacks. In this case,
one will be selected arbitrarily from each incompatible set.

A well-formed algorithm should do none of the above. We only
perform this conditional processing to make batched action
computation easier.

Returns:

(dict, dict, dict, None):

observations:

A dictionary of agent observations of format:

{
  agent_1: obs_1,
  agent_2: obs_2,
  ...
}

Where agent_i is the integer index of the i’th agent and obs_i is specified by the observation_space function.

rewards:

A dictionary of agent rewards of format:

{
  agent_1: reward_1,
  agent_2: reward_2,
  ...
}

Where agent_i is the integer index of the i'th agent and reward_i is the reward of the i'th agent.

By default, agents receive -1 reward for dying and 0 reward for all other circumstances. Override Env.reward to specify custom reward functions.

dones:

A dictionary of agent done booleans of format:

{
  agent_1: done_1,
  agent_2: done_2,
  ...
}

Where agent_i is the integer index of the i’th agent and done_i is a boolean denoting whether the i’th agent has died.

Note that obs_i will be a garbage placeholder if done_i is true. This is provided only for conformity with PettingZoo. Your algorithm should not attempt to leverage observations outside of trajectory bounds. You can omit garbage obs_i values by setting omitDead=True.

infos:

A dictionary of agent infos of format:

{
  agent_1: None,
  agent_2: None,
  ...
}

Provided for conformity with PettingZoo
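
A stepping sketch that avoids hard-coding action names by sampling from action_space; a trained policy would emit actions in the same nested structure:

    # Build a well-formed actions dict for every active agent by sampling the
    # per-agent action space, then advance one tick. Random actions may be
    # illegal in game terms; per the notes above, illegal actions are ignored.
    actions = {
        agent_id: env.action_space(agent_id).sample()
        for agent_id in env.agents
    }
    obs, rewards, dones, infos = env.step(actions)

    # Drop placeholder observations for agents that are already done.
    obs = {aid: ob for aid, ob in obs.items() if not dones.get(aid, False)}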

tasks: List[Tuple[Task, float]]#
property unwrapped: ParallelEnv#