News: 2.0 is now in soft launch. The environment is stable and a working version of the RL baseline is available, but not everything has been fully tuned and documented.

Neural MMO is a computationally accessible, open-source research platform that simulates populations of agents in virtual worlds. We challenge you to train a team of agents to complete tasks they have never seen before against opponents they have never seen before on maps they have never seen before. Our objective is to spur research on increasingly general and cognitively realistic environments.

Click to demo Neural MMO in your browser


Competition

Compete for $20,000 in prizes at NeurIPS 2023

Baselines

Training curves and metrics on WandB

Paper

Read our NeurIPS 2021 Datasets paper

Discord

Join our community for support and discussion

GitHub

Neural MMO is free and open-source software

Twitter

Follow the creator on Twitter

Neural MMO

Your team of 8 Agents must collect food and water to survive. Each Agent has 8 individual professions that help it collect resources, and Agents can level up their skills in each profession.

Resources can be used to create consumable items that restore food, water, and health, as well as ammunition that increases damage in combat. Higher-level resources create better consumables and ammunition. Agents can also trade items on a global market.

Agents may acquire armor to protect themselves in combat and weapons to increase their damage output. Agents can attack each other using one of three styles: Melee, Range, and Magic. The world is populated by NPCs that can be defeated to obtain items and increase power.

We provide a Docker container that includes Neural MMO and GPU-accelerated baselines and guarantees correct dependencies and environment setup. We recommend the following setup for local containerized development:
  • Install Docker, VSCode, and the VSCode Dev Containers extension.

  • Clone the competition branch of PufferTank on Linux/MacOS/WSL

  • VSCode: F1 -> “Remote-Containers: Open Folder in Container” -> Select PufferTank folder

git clone https://github.com/pufferai/puffertank --branch=competition

# WARNING: No pip package is available during the soft launch; use Docker or install from source.
# Officially supported platforms: Ubuntu 20.04/22.04, WSL, and MacOS.

# Quotes for Mac (zsh) compatibility.
pip install "nmmo"

# Clone baselines repository
git clone https://github.com/neuralmmo/baselines
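
Once the pip package is published after the soft launch, you can sanity-check the install (assuming the package exposes a version string, as most packages do):

# Hypothetical check; requires the published package
python -c "import nmmo; print(nmmo.__version__)"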

Installing from source is only recommended for developers of Neural MMO who can't run PufferTank.

mkdir neural-mmo && cd neural-mmo

git clone https://github.com/neuralmmo/environment
git clone https://github.com/neuralmmo/baselines

cd environment && pip install -e ".[all]"

# If you want a local copy of the client:
# WSL users should run this part on Windows.
# Download Cocos2d to open the client.
git clone https://github.com/neuralmmo/client

Harvest resources with various uses

Forage for food and water to maintain your health

Fight other agents and NPCs with Melee, Range, and Magic

Interact with Non-Playable Characters of varying friendliness


Train combat and profession skills to access higher level items and equipment

Acquire consumables and ammunition through professions

Increase offensive and defensive capabilities with weapons and armor

Trade items and equipment with other agents on a global market

Navigate procedurally generated maps

Contributors

Joseph Suarez: Creator and lead developer of Neural MMO.

CarperAI team for NMMO 2.0:
  • David Bloomin: Rewrite of the engine for 2.0, port and development of the RL baseline

  • Kyoung Whan Choe: Rewrite of Neural MMO game code and logging for 2.0, contributions to the RL baseline and task system

  • Hao Xiang Li: Neural MMO 2.0 task system

  • Ryan Sullivan: Integration with Syllabus for the curriculum learning baseline

  • Nishaanth Kanna: Co-developer of the ELM curriculum baseline

  • Daniel Scott: Co-developer of the ELM curriculum baseline

  • Rose S. Shuman: Technical writing for this documentation site and for the competition

  • Herbie Bradley: Supervision of the curriculum generation baseline with OpenELM

  • Louis Castricato: Co-founder and team lead of CarperAI; supervisor of CarperAI development efforts.

Parametrix.ai Team. Competition orchestrators and creators of the 2.0 web client.
  • Kirsty You: Product manager, Parametrix.ai

  • Yuhao Jiang: Machine learning researcher, Parametrix.ai

  • Qimai Li: Senior machine learning researcher, Parametrix.ai

  • Jiaxin Chen: Senior machine learning researcher, Parametrix.ai. Co-organizer of the 3rd and 4th Neural MMO Challenges

  • Xiaolong Zhu: Senior R&D Director, Parametrix.ai

Nick Jenkins: Layout and design for the competition poster. Adversary.design.

Sara Earle: Created 3D assets and 2D icons for items in NMMO 2.0. Hire her on UpWork if you like what you see here.

Previous open-source contributors, listed by time since latest contribution. Discord handles have been used for individuals who have not granted explicit permission to display their real names:
  • Thomas Cloarec: Developed the dynamic programming backend for scripted baseline agents

  • Jack Garbus: Major contributions to the logging framework, feedback on the documentation and tutorials

  • @tdimeola: Feedback on the documentation and tutorials

  • @cehinson: Mac build of the Unity3D client

  • Yilun Du: Assisted with experiments for 1.0 at OpenAI

BibTeX Citation
@inproceedings{nmmo_neurips,
   author = {Suarez, Joseph and Du, Yilun and Zhu, Clare and Mordatch, Igor and Isola, Phillip},
   booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
   editor = {J. Vanschoren and S. Yeung},
   pages = {},
   title = {The Neural MMO Platform for Massively Multiagent Research},
   url = {https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/44f683a84163b3523afe57c2e008bc8c-Paper-round1.pdf},
   volume = {1},
   year = {2021}
}

2023 Competition

Successfully complete the most tasks to win! At stake are $20,000 in prizes sponsored by Parametrix.ai. All submissions receive A100 compute credits for training sponsored by Stability.ai. The competition is currently planned for the start of July 2023.

Neural MMO (NMMO) has three tracks in which to compete and win. In all tracks, the objective is for your team of 8 Agents to accomplish more tasks than the 15 opposing teams. There are 128 Agents in play at the start of each round, and your submission will be evaluated over thousands of rounds with increasingly difficult tasks. Lobbies are formed by a matchmaking algorithm that selects 16 teams of similar skill level. For the RL and Curriculum tracks, all entrants receive up to 8 hours of free A100 compute time per submission for training.

Train teams of agents using RL to complete tasks. Customize the RL algorithm, model, and reward structure, but leverage a fixed baseline curriculum of tasks for training.

This is an opportunity for RL enthusiasts to test their skills by building agents that can survive and thrive in a massively multiagent environment full of potential adversaries. Your task is to implement a policy that defines how your team of 8 Agents performs in a novel environment. At the outset of each game, your team will receive a randomly generated task. Complete the task to score a point. We will evaluate submissions against each other over thousands of games. Whoever scores the most points wins.

The RL track includes control over the RL algorithm, environment reward signal, observation featurization, and the neural network architecture. The presentation and sampling of tasks are provided by the baseline and are treated as constants. All RL agent teams are trained on the same baseline task curriculum. While hybrid methods are allowed, with the new emphasis on tasks, it is unlikely that pure traditional scripting will be effective.
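
As an illustration of reward customization, here is a minimal sketch of reward shaping written against the env.step signature shown in the Platform section of this page. The wrapper class and bonus value are hypothetical, not part of the baseline:

from nmmo import Env

class ShapedRewardEnv(Env):
    """Hypothetical sketch: add a small per-tick survival bonus to task rewards."""
    def step(self, actions):
        obs, rewards, dones, infos = super().step(actions)
        # Every agent still receiving rewards gets a +0.01 bonus this tick
        rewards = {agent: r + 0.01 for agent, r in rewards.items()}
        return obs, rewards, dones, infos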

We release a baseline repository that includes a model adapted from NetEase’s winning submission to the NeurIPS 2022 competition, a fixed curriculum of procedurally generated tasks, a single-file CleanRL PPO implementation, PufferLib integration for simpler training, and WandB for logging and visualization. The baseline is designed to be easy to use and modify. We encourage you to use it as a starting point for your own submissions.

To get started:
  • train.py contains the main training file. Modify hyperparameters and scale here.

  • cleanrl_ppo_lstm.py contains the CleanRL PPO implementation. Modify it to alter the training algorithm. This version includes PufferLib integration and asynchronous environment execution.

  • /model contains the network definition. This is an advanced architecture with a custom featurizer and multiple subnetworks dedicated to processing different types of information.

  • /feature_extractor preprocesses observations from the environment before they are passed to the network. It separately processes the map, inventory, and market observations.

# Run training. This is very memory intensive!
# We are working on a smaller config.
# The --use_serial_vecenv flag puts envs on a
# local process and is useful for debugging.
python train.py

# Evaluate a trained checkpoint
python -m tools.evaluate --model.checkpoint model_weights/achievements_4x10_new.200.pt

The Curriculum track is a great way for programmers to compete and participate without needing advanced knowledge of AI. In this track, you will design unique and useful curricula for training successful teams on tasks. A curriculum is a structured set of tasks presented to the RL algorithm in an order designed to maximize learning. You design the task generator, task sampler, and reward in Python.

All submitted curricula will be applied to the same baseline RL policy to control a team of agents. Your objective is to create a curriculum of tasks that results in better, more robust learning, such that agents are able to complete tasks not seen during training. You will receive performance metrics showing how effective the curriculum is, so you can iterate on it. The reinforcement learning algorithm, observation featurization, and neural network architecture are provided by the baseline and remain constant across teams.
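
For a flavor of what a curriculum might look like, here is a minimal sketch of a difficulty-ramping task sampler. The function and its progress argument are hypothetical illustrations rather than the baseline's actual sampler interface; it reuses the DistanceTraveled predicate (from the predicate namespace p) shown later on this page:

def sample_task(progress: float):
    """Hypothetical sampler: scale task difficulty with training progress in [0, 1]."""
    # Ask for short journeys early in training and long ones later
    dist = int(8 + progress * 120)
    return p.DistanceTraveled(dist=dist)

# e.g., scenario.add_tasks(sample_task(progress=0.5))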

The baseline for this track includes a fixed curriculum of tasks and OpenELM integration. For researchers and advanced users, we encourage approaches leveraging ELM and provide a code generation model with the baselines.

By default, Neural MMO provides a reward of 1 for every tick the agent stays alive. Our goal is a flexible, powerful, high-level API for defining rewards that is simple enough for even a language model to program. For example, to reward teams for exploring the map:

scenario = Scenario(config)
scenario.add_tasks(p.DistanceTraveled(dist=64))
env.change_task(scenario.tasks)

We define a list of tasks, one for each team, to collectively travel 64 tiles away from the starting position. Agents are gradually rewarded as they move away, with the total reward summing to 1 on completion.
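
Conceptually (a simplified sketch of the idea, not the library's internal implementation), the task computes a score in [0, 1] each tick, and the per-tick reward is the change in that score:

def distance_score(dist_traveled: float, target: float = 64.0) -> float:
    # Fraction of the target distance covered, clipped to 1.0 on completion
    return min(dist_traveled / target, 1.0)

# reward_this_tick = distance_score(dist_now) - distance_score(dist_prev)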

Glossary of key terms
  • GameState is a simplified read-only snapshot view of the environment.

  • Group is an immutable set of agents.

  • Predicate is a special, clipped case of Task.

  • Scenario is a utility class to help assign subjects to tasks.

  • Task is a mapping from GameState to a reward shared across its subject Group. We provide utilities that cover many use cases.

Get started by defining your own tasks, building from our provided set of operators:

task = t.OR(p.CountEvent(event='PLAYER_KILL', N=5), p.TickGE(num_tick=5))
task = task * 5
scenario.add_tasks(task)

# Rewarding the agent for increasing time isn't helpful for training
# Try improving this task!

Some possibilities include using OR to combine different tasks and count progress toward either, and MUL (the overloaded “*” operator) to scale up rewards. It is also possible to explicitly assign subjects and groups to tasks:

env.change_task([StayAlive(Group([agent])) for agent in agents])

More expressivity is available through the @define_task and @define_predicate decorators:

@t.define_task
def KillTask(gs: GameState,
             subject: Group): # Annotated with Group to expose env variables
  """ Reward 0.1 per player defeated, with a bonus for the 1st and 3rd kills."""
  num_kills = len(subject.event.PLAYER_KILL)
  score = num_kills * 0.1

  if num_kills >= 1:
    score += 1

  # You can use other tasks in a definition!
  if p.CountEvent(subject=subject, event='PLAYER_KILL',N=3)(gs) == 1.0:
    score += 1

  return score

# scenario also accepts fn(Group -> Task) and calls it for all desired
# Groups. The default behavior (passing in a Task) is similar to the
# lambda definition below.
# Here, tasks are defined per agent instead of per team.
scenario.add_tasks(lambda agent: KillTask(subject=agent), groups='agents')

We return a score for an input GameState, and the reward each tick is the change in score. Advanced usage can involve directly inheriting from the base Task class or its subclasses:

# TaskOperator itself is a subclass of Task
class Repeat(TaskOperator):
  def __init__(self, task: Task, subject: Group=None):
    """ The reward each turn is the value of the operand."""
    super().__init__(lambda n: n==1, task, subject=subject)
    self._current_score = 0

  def _evaluate(self, gs: GameState) -> float:
    self._current_score += self._tasks[0](gs)
    return self._current_score

  def sample(self, config: Config, **kwargs):
    return super().sample(config, Repeat, **kwargs)
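
For example, Repeat turns a per-tick predicate into an accumulating score. Reusing the StayAlive predicate and Group class shown earlier (this pairing is illustrative, not from the baseline):

# Hypothetical usage: the score grows by 1 for every tick the agent survives
task = Repeat(StayAlive(Group([agent])))
scenario.add_tasks(task)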

Combine RL and Curriculum approaches. Entrants provide their own compute and may win any way possible; just don't hack our servers!

Deploy both RL and Curriculum approaches to create the ultimate 8-Agent team policy. All methods are allowed, and there are no constraints on (self-provided) compute. The only restriction: no unauthorized modifications of the game or other submissions.

If you are here, you know how to get started. Use any of the above baselines or build your own from scratch. This is the only track that does not strictly require winners to open-source their code. However, we strongly encourage you to do so.

Platform

The project was inspired by classic Massively Multiplayer Online Role-Playing Games (MMOs), a genre defined by interaction with a large number of other players. It is a platform for creating intelligent agents parameterized by neural networks. Our goal is to support a broad base of multiagent research that would be impractical or impossible to conduct using other environments. Unlike other game genres typically used in research, MMOs simulate persistent worlds that support rich player interactions and a wider variety of progression strategies. These properties seem important to intelligence in the real world. The massively multiagent setting allows player teams to interact in interesting ways and to pursue entirely different strategies.

from nmmo import Env

# Default environment - see API for config options
env = Env(config=None)
obs = env.reset()

while True:
    actions = {}  # Compute actions with your model
    obs, rewards, dones, infos = env.step(actions)
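
To smoke-test the environment without a trained model, a common PettingZoo pattern is to sample random actions. This sketch assumes the standard ParallelEnv interface; check the API reference for the exact action schema:

# Random actions for every live agent, sampled from each agent's action space
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
obs, rewards, dones, infos = env.step(actions)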

Environments provide a standard PettingZoo API. Join our community Discord and post in #support for help (please do not raise GitHub issues for support). See the cards at the top of this page for source code, baselines, the latest publications, social media, and news!