nmmo.core.env module#
- class nmmo.core.env.Env(config: nmmo.core.config.Default = <nmmo.core.config.Default object>, seed=None)#
Bases: ParallelEnv
- action_space(agent)#
Neural MMO Action Space
- Args:
agent: Agent ID
- Returns:
actions: gym.spaces object containing the structured actions for the specified agent. Each action is parameterized by a list of discrete-valued arguments. These consist of both fixed, k-way choices (such as movement direction) and selections from the observation space (such as targeting).
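A minimal usage sketch (assuming a default-configured environment; the exact action keys depend on the active config):

```python
from nmmo.core.env import Env

env = Env()                        # default config
space = env.action_space(agent=1)

# The returned gym.spaces object is keyed by action type; each action's
# arguments are discrete-valued and can be sampled or enumerated.
print(space)
print(space.sample())              # one arbitrary (not necessarily legal) structured action
```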
- property agents: List[str]#
For conformity with the PettingZoo API only; rendering is external
- change_task(new_tasks: List[Union[Tuple[Task, float], Task]], task_encoding: Optional[Dict[int, ndarray]] = None, embedding_size: int = 16, reset: bool = True, map_id=None, seed=None, options=None)#
Changes the task given to each agent
- Args:
new_tasks: The tasks to complete and to calculate rewards from, given as Task objects or (Task, weight) tuples.
task_encoding: A mapping from agent id (eid) to an encoded task embedding.
embedding_size: The size of each task embedding.
reset: Whether to reset the environment after changing tasks.
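A hedged sketch of swapping tasks mid-run; build_tasks() is a hypothetical stand-in for whatever code constructs your Task objects and is not part of the documented API:

```python
# Hypothetical helper: build_tasks() represents your own task-construction
# code; change_task accepts Task objects or (Task, weight) tuples.
my_tasks = build_tasks()

env.change_task(
    new_tasks=my_tasks,
    embedding_size=16,   # size of each task embedding
    reset=True,          # start a fresh episode under the new tasks
)
```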
- close()#
For conformity with the PettingZoo API only; rendering is external
- property max_num_agents: int#
- metadata: Dict[str, Any] = {'name': 'neural-mmo', 'render.modes': ['human']}#
- property num_agents: int#
- observation_space(agent: int)#
Neural MMO Observation Space
- Args:
agent: Agent ID
- Returns:
observation: gym.spaces object containing the structured observation for the specified agent. Each visible object is represented by continuous and discrete vectors of attributes. A 2-layer attentional encoder can be used to convert this structured observation into a flat vector embedding.
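A minimal inspection sketch (assuming a default-configured environment):

```python
from nmmo.core.env import Env

env = Env()
obs_space = env.observation_space(agent=1)

# The space is structured by visible-object type; printing it shows the
# continuous and discrete attribute layout an encoder would need to handle.
print(obs_space)
```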
- possible_agents: List[str]#
- render(mode='human')#
For conformity with the PettingZoo API only; rendering is external
- reset(map_id=None, seed=None, options=None)#
OpenAI Gym API reset function
Loads a new game map and returns initial observations
- Args:
map_id: Map index to load. Selects a random map by default.
- Returns:
observations, as documented by _compute_observations()
- Notes:
Neural MMO simulates a persistent world. Ideally, you should reset the environment only once, upon creation. In practice, this approach limits the number of parallel environment simulations to the number of CPU cores available. At small and medium hardware scale, we therefore recommend the standard approach of resetting after a long but finite horizon: ~1000 timesteps for small maps and 5000+ timesteps for large maps
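A minimal sketch of the finite-horizon reset pattern described above (the horizon value is illustrative):

```python
from nmmo.core.env import Env

env = Env()
obs = env.reset(seed=42)                 # load a map, get initial observations

HORIZON = 1000                           # ~1000 for small maps, 5000+ for large
# ... run HORIZON timesteps of env.step(...) ...

obs = env.reset(map_id=None, seed=43)    # None selects a random map
```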
- seed(seed=None)#
Reseeds the environment (making it deterministic).
- state() → ndarray#
Returns a global view of the environment appropriate for centralized-training, decentralized-execution methods such as QMIX.
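A minimal sketch, assuming state() is populated in your build; the array could, for example, feed a centralized critic:

```python
# Global snapshot for centralized training (e.g. a QMIX-style mixer input)
global_view = env.state()
print(global_view.shape)
```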
- step(actions: Dict[int, Dict[str, Dict[str, Any]]])#
Simulates one game tick or timestep
- Args:
actions: A dictionary of agent decisions of format:
{ agent_1: { action_1: [arg_1, arg_2], action_2: [...], ... }, agent_2: { ... }, ... }
Where agent_i is the integer index of the i'th agent. The environment only evaluates provided actions for provided agents; unprovided action types are interpreted as no-ops and illegal actions are ignored. It is also possible to specify invalid combinations of valid actions, such as two movements or two attacks. In this case, one will be selected arbitrarily from each incompatible set. A well-formed algorithm should do none of the above; we only perform this conditional processing to make batched action computation easier.
- Returns:
(dict, dict, dict, dict):
- observations:
A dictionary of agent observations of format:
{ agent_1: obs_1, agent_2: obs_2, ... }
Where agent_i is the integer index of the i’th agent and obs_i is specified by the observation_space function.
- rewards:
A dictionary of agent rewards of format:
{ agent_1: reward_1, agent_2: reward_2, ... }
Where agent_i is the integer index of the i’th agent and reward_i is the reward of the i’th agent.
By default, agents receive -1 reward for dying and 0 reward for all other circumstances. Override Env.reward to specify custom reward functions
- dones:
A dictionary of agent done booleans of format:
{ agent_1: done_1, agent_2: done_2, ... }
Where agent_i is the integer index of the i’th agent and done_i is a boolean denoting whether the i’th agent has died.
Note that obs_i will be a garbage placeholder if done_i is true. This is provided only for conformity with PettingZoo. Your algorithm should not attempt to leverage observations outside of trajectory bounds. You can omit garbage obs_i values by setting omitDead=True.
- infos:
A dictionary of agent infos of format:
{ agent_1: None, agent_2: None, … }
Provided for conformity with PettingZoo
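A minimal rollout sketch; sampling each agent's action space is one convenient way to produce the nested actions dictionary (a learned policy would supply these values instead):

```python
from nmmo.core.env import Env

env = Env()
obs = env.reset()

for _ in range(16):
    # One structured action per live agent, keyed by agent id
    actions = {
        agent_id: env.action_space(agent_id).sample()
        for agent_id in env.agents
    }
    obs, rewards, dones, infos = env.step(actions)
```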
- property unwrapped: ParallelEnv#