Networking for Mesos-Managed Containers

While networking plays a key role in data center infrastructure, it is, for now, beyond the scope of Mesos to address the concerns of networking setup, topology, and performance. However, Mesos can ease integration with existing networking solutions and enable features such as IP-per-container, task-granular isolation, and service discovery. A one-size-fits-all networking solution is rarely practical, because requirements and available solutions vary across cloud-only, on-premise, and hybrid deployments.

One of the primary goals of networking support in Mesos was to provide a pluggable mechanism that lets users enable custom networking solutions as needed. As a result, several extensions were added to Mesos components in version 0.25.0 to enable networking support. All the extensions are opt-in, so older frameworks and applications without networking support can coexist with newer ones.

The rest of this document describes the overall architecture of all the involved components, configuration steps for enabling IP-per-container, and required framework changes.

How does it work?

Mesos Networking Architecture

A key observation is that networking support is enabled via a Mesos module, so the Mesos master and agents are completely oblivious to it. It is entirely up to the networking module to provide the desired support. In addition, IP requests are handled on a best-effort basis. The framework should therefore be prepared to handle requests that are ignored (when the module(s) are not present) or declined (when the IPs can't be assigned for various reasons).

To maximize backwards compatibility with existing frameworks, schedulers must opt in to network isolation per container. Schedulers opt in using new data structures in the TaskInfo message.

Terminology

  • IP Address Management (IPAM) Server:
      • assigns IPs on demand
      • recycles IPs once they have been released
      • (optionally) can tag IPs with a given string/id

  • IPAM client:
      • tightly coupled with a particular IPAM server
      • acts as a bridge between the "Network Isolator Module" and the IPAM server
      • communicates with the server to request/release IPs

  • Network Isolator Module (NIM):
      • a Mesos module for the Agent implementing the Isolator interface
      • looks at TaskInfos to detect the IP requirements for the tasks
      • communicates with the IPAM client to request/release IPs
      • communicates with an external network virtualizer/isolator to enable network isolation

  • Cleanup Module:
      • responsible for doing a cleanup (releasing IPs, etc.) during an Agent lost event, dormant otherwise
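
The division of labor between the IPAM server and the IPAM client can be illustrated with a minimal in-memory sketch. The class and method names below (InMemoryIPAM, IPAMClient, request_ip, release_ip) are hypothetical and do not correspond to any Mesos or net-modules API; a real IPAM server would be a separate network service.

```python
import ipaddress


class InMemoryIPAM:
    """Hypothetical IPAM server that hands out addresses from one subnet."""

    def __init__(self, cidr):
        # Usable host addresses of the subnet, kept as a FIFO free pool.
        self._free = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self._assigned = {}  # ip -> optional tag

    def request(self, tag=None):
        """Assign the next free address, or return None if none are left."""
        if not self._free:
            return None
        ip = self._free.pop(0)
        self._assigned[ip] = tag
        return ip

    def release(self, ip):
        """Return an address to the pool so that it can be recycled."""
        if ip in self._assigned:
            del self._assigned[ip]
            self._free.append(ip)

    def tagged(self, tag):
        """List the assigned addresses that carry the given tag."""
        return sorted(ip for ip, t in self._assigned.items() if t == tag)


class IPAMClient:
    """Hypothetical bridge between an isolator module and the server."""

    def __init__(self, server):
        self._server = server

    def request_ip(self, container_id):
        # Tag each address with the container that asked for it.
        return self._server.request(tag=container_id)

    def release_ip(self, ip):
        self._server.release(ip)
```

In this sketch a None result from request_ip stands for a declined request, which the caller is expected to handle.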

Framework requests IP address for containers

  1. A Mesos framework uses the TaskInfo message to request IPs for each container being launched. (The request is ignored if the Mesos cluster doesn't have support for IP-per-container.)

  2. The Mesos Master processes TaskInfos and forwards them to the Agent for launching tasks.

Network isolator module gets IP from IPAM server

  1. The Mesos Agent inspects the TaskInfo to detect the container requirements (MesosContainerizer in this case) and prepares various Isolators for the to-be-launched container.
  2. The NIM inspects the TaskInfo to decide whether or not to enable network isolation.
  3. If network isolation is to be enabled, the NIM requests IP address(es) via the IPAM client and informs the Agent.
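
The decision a NIM makes when it inspects a TaskInfo might look like the following sketch. It models a TaskInfo as a plain Python dict whose keys mirror the message fields; the function names are hypothetical, and a real module would work with the protobuf messages from C++.

```python
def find_network_infos(task_info):
    """Return the NetworkInfo list of a TaskInfo-like dict, or []."""
    # Default command executor: TaskInfo.container.network_infos
    container = task_info.get("container")
    if container is None:
        # Custom executor: TaskInfo.executor.container.network_infos
        container = task_info.get("executor", {}).get("container")
    if container is None:
        return []
    return container.get("network_infos", [])


def should_isolate(task_info):
    """Isolation is opt-in: enable it only if a NetworkInfo is present."""
    return len(find_network_infos(task_info)) > 0


def requested_addresses(task_info):
    """Flatten every IPAddress request across all NetworkInfos."""
    requests = []
    for info in find_network_infos(task_info):
        requests.extend(info.get("ip_addresses", []))
    return requests
```

A task without any NetworkInfo is left alone, which is what lets older frameworks coexist with newer ones.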

Agent launches container with a network namespace

  1. The Agent launches a container within a new network namespace.
  2. The Agent calls into the NIM to perform "isolation".
  3. The NIM then calls into the network virtualizer to isolate the container.

Network virtualizer assigns IP address to the container and isolates it

  1. The NIM then "decorates" the TaskStatus with the IP information.
  2. The IP address(es) from the TaskStatus are made available at the Master's /state endpoint.
  3. The TaskStatus is also forwarded to the framework to inform it of the IP addresses.
  4. When a task is killed or lost, the NIM communicates with the IPAM client to release the corresponding IP address(es).

Cleanup module detects lost Agents and performs cleanup

  1. The cleanup module gets notified if there is an Agent-lost event.

  2. The cleanup module communicates with the IPAM client to release all IP address(es) associated with the lost Agent. The IPAM may have a grace period before the address(es) are recycled.
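
The cleanup behaviour, including an IPAM-side grace period, can be sketched as follows. Everything here is an assumption made for illustration: the AgentLeases class, its method names, and the use of plain numeric timestamps are hypothetical and not taken from any Mesos module.

```python
class AgentLeases:
    """Hypothetical bookkeeping of which IPs belong to which Agent."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.by_agent = {}  # agent_id -> set of assigned IPs
        self.pending = []   # (recycle_at, ip) pairs waiting out the grace period
        self.free = []      # IPs that may be handed out again

    def assign(self, agent_id, ip):
        self.by_agent.setdefault(agent_id, set()).add(ip)

    def on_agent_lost(self, agent_id, now):
        """Release every IP of the lost Agent; recycling is deferred."""
        ips = sorted(self.by_agent.pop(agent_id, set()))
        for ip in ips:
            self.pending.append((now + self.grace_seconds, ip))
        return ips

    def recycle(self, now):
        """Move IPs whose grace period has expired back to the free pool."""
        ready = [ip for recycle_at, ip in self.pending if recycle_at <= now]
        self.pending = [(t, ip) for t, ip in self.pending if t > now]
        self.free.extend(ready)
        return ready
```

Deferring the recycle step gives an Agent that reappears shortly after being marked lost a chance to keep its addresses, at the cost of holding them out of the pool a little longer.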

Configuration

The network isolator module is not part of the standard Mesos distribution. However, there is an example implementation at https://github.com/mesosphere/net-modules.

Once the network isolation module has been built into a shared dynamic library, it can be loaded into the Mesos Agent (see the modules documentation for instructions on building and loading a module).

Enabling frameworks for IP-per-container capability

NetworkInfo

A new NetworkInfo message has been introduced:

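message NetworkInfo {
  enum Protocol {
    IPv4 = 1;
    IPv6 = 2;
  }

  // A single IP address request.
  message IPAddress {
    // IP version to request from the IPAM.
    optional Protocol protocol = 1;

    // Static IP address for the container (if supported by the NIM).
    optional string ip_address = 2;
  }

  repeated IPAddress ip_addresses = 5;

  // Name of the network the container should join.
  optional string name = 6;

  optional Protocol protocol = 1 [deprecated = true]; // Since 0.26.0

  optional string ip_address = 2 [deprecated = true]; // Since 0.26.0

  // Network groups that the container's interface belongs to.
  repeated string groups = 3;

  optional Labels labels = 4;
};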

When requesting an IP address from the IPAM, one needs to set the protocol field to IPv4 or IPv6. Setting ip_address to a valid IP address allows the framework to specify a static IP address for the container (if supported by the NIM). This is helpful in situations where a task must be bound to a particular IP address even as it is killed and restarted on a different node.

Setting name to a valid network name allows the framework to specify a network for the container to join. It is up to the network isolator to decide how to interpret this field; for example, the network/cni isolator interprets it as the name of a CNI network.
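
As an illustration of these field rules, a framework could validate a request before placing it in a TaskInfo. The sketch below builds NetworkInfo-shaped Python dicts; the helper names make_ip_address and make_network_info are hypothetical, and the version check between protocol and ip_address is an extra sanity check assumed here, not something the message definition itself enforces.

```python
import ipaddress

# Maps the Protocol enum names to IP version numbers.
PROTOCOLS = {"IPv4": 4, "IPv6": 6}


def make_ip_address(protocol=None, ip_address=None):
    """Build one IPAddress entry; omitted fields stay unset."""
    if protocol is not None and protocol not in PROTOCOLS:
        raise ValueError("unknown protocol: %r" % (protocol,))
    entry = {}
    if protocol is not None:
        entry["protocol"] = protocol
    if ip_address is not None:
        # ipaddress.ip_address raises ValueError for malformed input.
        parsed = ipaddress.ip_address(ip_address)
        if protocol is not None and parsed.version != PROTOCOLS[protocol]:
            raise ValueError("%s is not an %s address" % (ip_address, protocol))
        entry["ip_address"] = str(parsed)
    return entry


def make_network_info(ip_addresses=(), name=None, groups=()):
    """Build a NetworkInfo-shaped dict from IPAddress entries."""
    info = {"ip_addresses": list(ip_addresses), "groups": list(groups)}
    if name is not None:
        info["name"] = name
    return info
```

For example, make_network_info([make_ip_address(ip_address="10.1.2.3")]) describes a request for a static address, while make_network_info(name="network1") describes a request to join a named network.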

Examples of specifying network requirements

Frameworks that want to enable IP-per-container need to provide a NetworkInfo message in the TaskInfo. Here are a few examples:

  1. A request for one address of unspecified protocol version using the default command executor

TaskInfo {
  ...
  command: ...,
  container: ContainerInfo {
    network_infos: [
      NetworkInfo {
        ip_addresses: [
          IPAddress {
            protocol: None;
            ip_address: None;
          }
        ]
        groups: [];
        labels: None;
      }
    ]
  }
}

  2. A request for one IPv4 and one IPv6 address, in two groups using the default command executor

TaskInfo {
  ...
  command: ...,
  container: ContainerInfo {
    network_infos: [
      NetworkInfo {
        ip_addresses: [
          IPAddress {
            protocol: IPv4;
            ip_address: None;
          },
          IPAddress {
            protocol: IPv6;
            ip_address: None;
          }
        ]
        groups: ["dev", "test"];
        labels: None;
      }
    ]
  }
}

  3. A request for two network interfaces, each with one IP address, each in a different network group using the default command executor

TaskInfo {
  ...
  command: ...,
  container: ContainerInfo {
    network_infos: [
      NetworkInfo {
        ip_addresses: [
          IPAddress {
            protocol: None;
            ip_address: None;
          }
        ]
        groups: ["foo"];
        labels: None;
      },
      NetworkInfo {
        ip_addresses: [
          IPAddress {
            protocol: None;
            ip_address: None;
          }
        ]
        groups: ["bar"];
        labels: None;
      }
    ]
  }
}

  4. A request for a specific IP address using a custom executor

TaskInfo {
  ...
  executor: ExecutorInfo {
    ...,
    container: ContainerInfo {
      network_infos: [
        NetworkInfo {
          ip_addresses: [
            IPAddress {
              protocol: None;
              ip_address: "10.1.2.3";
            }
          ]
          groups: [];
          labels: None;
        }
      ]
    }
  }
}

  5. A request for joining a specific network using the default command executor

TaskInfo {
  ...
  command: ...,
  container: ContainerInfo {
    network_infos: [
      NetworkInfo {
        name: "network1";
      }
    ]
  }
}

NOTE: The Mesos Containerizer will reject any CommandInfo that has a ContainerInfo. For this reason, when opting in to network isolation with the Mesos Containerizer, set TaskInfo.ContainerInfo.NetworkInfo.

Address Discovery

The NetworkInfo message allows frameworks to request IP address(es) to be assigned at task launch time on the Mesos agent. After opting in to network isolation for a given executor's container in this way, frameworks will need to know what address(es) were ultimately assigned in order to perform health checks, or any other out-of-band communication.

This is accomplished by adding a new field to the TaskStatus message.

message ContainerStatus {
  repeated NetworkInfo network_infos;
}

message TaskStatus {
  ...
  optional ContainerStatus container;
  ...
};

Further, the container IP addresses are also exposed via the Master's state endpoint. The JSON output from the Master's state endpoint contains a list of task statuses. If a task's container was started with its own IP address, the assigned IP address will be exposed as part of the TASK_RUNNING status.

NOTE: Since per-container address(es) are strictly opt-in from the framework, the framework may ignore the IP address(es) provided in StatusUpdate if it didn't set NetworkInfo in the first place.
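
A client of the state endpoint could collect the assigned addresses along the lines of the sketch below. The JSON layout it walks (frameworks, tasks, statuses, and the container_status key) is an assumption about the endpoint's output and should be checked against a real cluster; the function name assigned_addresses is hypothetical.

```python
import json

# Assumed JSON key for the ContainerStatus of a task status; adjust it if
# the output of your cluster uses a different name.
CONTAINER_KEY = "container_status"


def assigned_addresses(state_json):
    """Map each task id to the IPs found in its TASK_RUNNING statuses."""
    state = json.loads(state_json)
    result = {}
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            ips = []
            for status in task.get("statuses", []):
                if status.get("state") != "TASK_RUNNING":
                    continue
                container = status.get(CONTAINER_KEY, {})
                for info in container.get("network_infos", []):
                    for addr in info.get("ip_addresses", []):
                        ip = addr.get("ip_address")
                        if ip and ip not in ips:
                            ips.append(ip)
            result[task["id"]] = ips
    return result
```

An empty list for a task means no address was reported, which covers tasks that did not opt in as well as requests that were ignored or declined.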

Writing a Custom Network Isolator Module

A network isolator module implements the Isolator interface provided by Mesos. The module is loaded as a dynamic shared library into the Mesos Agent and gets hooked up in the container launch sequence. A network isolator may communicate with external IPAM and network virtualizer tools to fulfill framework requirements.

In terms of the Isolator API, there are three key callbacks that a network isolator module should implement:

  1. Isolator::prepare() gives the module a chance to decide whether or not to enable network isolation for the given task container. If network isolation is to be enabled, the Isolator::prepare call informs the Agent to create a private network namespace for the container. This is also the interface that generates an IP address (statically or with the help of an external IPAM agent) for the container.

  2. Isolator::isolate() gives the module the opportunity to isolate the container after it has been created but before the executor is launched inside the container. This involves creating a virtual Ethernet adapter for the container and assigning it an IP address. The module can also use an external network virtualizer/isolator to set up the network for the container.

  3. Isolator::cleanup() is called when the container terminates. This allows the module to perform any cleanup, such as recovering resources and releasing IP addresses as needed.
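
The order and responsibilities of the three callbacks can be sketched in Python as shown below. This is an illustration of the lifecycle only: the real Isolator interface is a C++ API with different signatures, and the SketchNetworkIsolator and StubIPAM classes, along with the dict returned by prepare, are hypothetical.

```python
class StubIPAM:
    """Hypothetical IPAM client backed by a fixed list of addresses."""

    def __init__(self, pool):
        self.pool = list(pool)
        self.released = []

    def request(self):
        return self.pool.pop(0)

    def release(self, ip):
        self.released.append(ip)


class SketchNetworkIsolator:
    """Mirrors the prepare / isolate / cleanup sequence of a NIM."""

    def __init__(self, ipam):
        self.ipam = ipam
        self.containers = {}  # container_id -> {"ips": [...], "isolated": bool}

    def prepare(self, container_id, network_infos):
        """Decide on isolation and obtain addresses for the container."""
        if not network_infos:
            return None  # the task did not opt in, so nothing is done
        ips = []
        for info in network_infos:
            for request in info.get("ip_addresses", []):
                # Use the static address if one was given, else ask the IPAM.
                ips.append(request.get("ip_address") or self.ipam.request())
        self.containers[container_id] = {"ips": ips, "isolated": False}
        # Stand-in for asking the Agent to create a network namespace.
        return {"new_network_namespace": True}

    def isolate(self, container_id):
        """Stand-in for wiring up the container's network interface."""
        state = self.containers.get(container_id)
        if state is None:
            return []
        state["isolated"] = True
        return list(state["ips"])

    def cleanup(self, container_id):
        """Release the container's addresses when it terminates."""
        state = self.containers.pop(container_id, None)
        if state is not None:
            for ip in state["ips"]:
                self.ipam.release(ip)
```

Keeping the per-container state in one dict makes cleanup a single lookup, so a terminated container cannot leave addresses behind in this model.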