Roles

Many modern host-level operating systems (e.g. Linux and the BSDs) support multiple users. Similarly, Mesos is a multi-user cluster management system, with the expectation of a single Mesos cluster managing an organization's resources and servicing the organization's users.

As such, Mesos has to address a number of requirements related to resource management:

  • Fair sharing of the resources amongst users
  • Providing resource guarantees to users (e.g. quota, priorities, isolation)
  • Providing accurate resource accounting
    • How many resources are allocated / utilized / etc?
    • Per-user accounting

In Mesos, we refer to these "users" as roles. More precisely, a role within Mesos refers to a resource consumer within the cluster. This resource consumer could represent a user within an organization, but it could also represent a team, a group, a service, a framework, etc.

Schedulers subscribe to one or more roles in order to receive resources and schedule work on behalf of the resource consumer(s) they are servicing.

Some examples of resource allocation guarantees that Mesos provides:

  • Guaranteeing that a role is allocated a specified amount of resources (via quota).
  • Ensuring that some (or all) of the resources on a particular agent are allocated to a particular role (via reservations).
  • Ensuring that resources are fairly shared between roles (via DRF).
  • Expressing that some roles should receive a higher relative share of the cluster (via weights).

Roles and access control

There are two ways to control which roles a framework is allowed to subscribe to. First, ACLs can be used to specify which framework principals can subscribe to which roles. For more information, see the authorization documentation.

Second, a role whitelist can be configured by passing the --roles flag to the Mesos master at startup. This flag specifies a comma-separated list of role names. If the whitelist is specified, only roles that appear in the whitelist can be used. To change the whitelist, the Mesos master must be restarted. Note that in a high-availability deployment of Mesos, you should take care to ensure that all Mesos masters are configured with the same whitelist.

In Mesos 0.26 and earlier, you should typically configure both ACLs and the whitelist, because in these versions of Mesos, any role that does not appear in the whitelist cannot be used.

In Mesos 0.27, this behavior has changed: if --roles is not specified, the whitelist permits any role name to be used. Hence, in Mesos 0.27, the recommended practice is to only use ACLs to define which roles can be used; the --roles command-line flag is deprecated.
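The whitelist rule described above can be sketched in a few lines of Python. This is only an illustration of the Mesos 0.27 behavior as stated here, not the master's actual implementation; the function names parse_whitelist and role_permitted are hypothetical.

```python
from typing import Optional, Set


def parse_whitelist(roles_flag: Optional[str]) -> Optional[Set[str]]:
    """Parse a comma-separated --roles value.

    None stands for "the flag was not given" (an assumption of this sketch).
    """
    if roles_flag is None:
        return None
    return {name.strip() for name in roles_flag.split(",") if name.strip()}


def role_permitted(role: str, whitelist: Optional[Set[str]]) -> bool:
    """Mesos 0.27 rule as described above: no whitelist permits any role."""
    if whitelist is None:
        return True
    return role in whitelist


whitelist = parse_whitelist("web,batch")
print(role_permitted("web", whitelist))        # role is in the whitelist
print(role_permitted("analytics", whitelist))  # role is not in the whitelist
print(role_permitted("analytics", None))       # no whitelist configured
```

The sketch models only the membership check; ACLs, which are the recommended mechanism, are evaluated separately.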

Associating frameworks with roles

A framework specifies which roles it would like to subscribe to when it subscribes with the master.

As a framework developer, you must specify the roles you would like to subscribe to via the FrameworkInfo.roles field.
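As a rough illustration, the following Python sketch builds the JSON body of a SUBSCRIBE call with FrameworkInfo.roles set. It assumes the v1 scheduler HTTP API message shape and that the framework declares the MULTI_ROLE capability, neither of which is stated in this document; the helper name build_subscribe_call and the example user, framework name, and role names are hypothetical. The call is only constructed and printed, not sent.

```python
import json
from typing import Dict, List


def build_subscribe_call(user: str, name: str, roles: List[str]) -> Dict:
    """Build a SUBSCRIBE message whose FrameworkInfo lists the desired roles."""
    framework_info = {
        "user": user,
        "name": name,
        # The roles the framework would like to subscribe to.
        "roles": roles,
        # Assumption: the roles field is used together with this capability.
        "capabilities": [{"type": "MULTI_ROLE"}],
    }
    return {"type": "SUBSCRIBE", "subscribe": {"framework_info": framework_info}}


call = build_subscribe_call("root", "example-framework", ["engineering", "analytics"])
print(json.dumps(call, indent=2))
```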

As a user, you can typically specify which role(s) a framework will subscribe to when you start the framework. How to do this depends on the user interface of the framework you're using. For example, a single-user scheduler might take a --mesos_role command-line flag, and a multi-user scheduler might take a --mesos-roles command-line flag or sync with the organization's LDAP system to automatically adjust which roles it is subscribed to as the organization's structure changes.

Multiple frameworks in the same role

Multiple frameworks can be subscribed to the same role. This can be useful: for example, one framework can create a persistent volume and write data to it. Once the task that writes data to the persistent volume has finished, the volume will be offered to other frameworks subscribed to the same role; this might give a second ("consumer") framework the opportunity to launch a task that reads the data produced by the first ("producer") framework.

However, configuring multiple frameworks to use the same role should be done with caution, because all the frameworks will have access to any resources that have been reserved for that role. For example, if a framework stores sensitive information on a persistent volume, that volume might be offered to a different framework subscribed to the same role. Similarly, if one framework creates a persistent volume, another framework subscribed to the same role might "steal" the volume and use it to launch a task of its own. In general, multiple frameworks sharing the same role should be prepared to collaborate with one another to ensure that role-specific resources are used appropriately.

Associating resources with roles

A resource is assigned to a role using a reservation. Resources can either be reserved statically (when the agent that hosts the resource is started) or dynamically: frameworks and operators can specify that a certain resource should subsequently be reserved for use by a given role. For more information, see the reservation documentation.

Default role

The role named * is special. Unreserved resources are currently represented as having the special * role (the idea being that * matches any role). By default, all the resources at an agent node are unreserved (this can be changed via the --default_role command-line flag when starting the agent).

In addition, when a framework registers without providing a FrameworkInfo.role, it is assigned to the * role. In Mesos 1.3, frameworks should use the FrameworkInfo.roles field, which does not assign a default of *, but frameworks can still specify * explicitly if desired. Frameworks and operators cannot make reservations to the * role.

Invalid role

A role name must be a valid directory name, so it cannot:

  • Be an empty string
  • Be . or ..
  • Start with -
  • Contain any slash, backspace, or whitespace character
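The rules above translate directly into a small validator. The following Python sketch checks only the four rules listed; the function name role_name_error is hypothetical, and treating "slash" as the forward slash and "backspace" as the \b character are assumptions of the sketch.

```python
from typing import Optional


def role_name_error(role: str) -> Optional[str]:
    """Return a description of the first rule violated, or None if the name passes."""
    if role == "":
        return "role name is empty"
    if role in (".", ".."):
        return "role name is '.' or '..'"
    if role.startswith("-"):
        return "role name starts with '-'"
    for ch in role:
        # Assumption: "slash" means '/', "backspace" means '\b'.
        if ch == "/" or ch == "\b" or ch.isspace():
            return f"role name contains forbidden character {ch!r}"
    return None


for candidate in ["analytics", "", "..", "-batch", "team/web", "data science"]:
    print(repr(candidate), "->", role_name_error(candidate))
```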

Roles and resource allocation

By default, the Mesos master uses weighted Dominant Resource Fairness (wDRF) to allocate resources. In particular, this implementation of wDRF first identifies which role is furthest below its fair share of the role's dominant resource. Each of the frameworks subscribed to that role is then offered additional resources in turn.

The resource allocation process can be customized by assigning weights to roles: a role with a weight of 2 will be allocated twice the fair share of a role with a weight of 1. By default, every role has a weight of 1. Weights can be configured using the /weights operator endpoint, or else using the deprecated --weights command-line flag when starting the Mesos master.
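To make the effect of weights concrete, here is a simplified Python sketch of the role-selection step. It assumes that a role's dominant share is its largest fraction of any cluster resource and that weights are applied by dividing that share by the role's weight; the real allocator is more involved, and the names dominant_share and next_role as well as the example numbers are hypothetical.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(allocated: Resources, total: Resources) -> float:
    """Largest fraction of any single cluster resource held by a role."""
    return max(
        (allocated.get(name, 0.0) / amount for name, amount in total.items() if amount > 0),
        default=0.0,
    )


def next_role(allocations: Dict[str, Resources],
              weights: Dict[str, float],
              total: Resources) -> str:
    """Pick the role furthest below its weighted fair share."""
    def weighted_share(role: str) -> float:
        # Roles without an explicit weight default to a weight of 1.
        return dominant_share(allocations[role], total) / weights.get(role, 1.0)

    # Sorting first makes ties resolve deterministically by role name.
    return min(sorted(allocations), key=weighted_share)


total = {"cpus": 100.0, "mem": 1000.0}
allocations = {
    "web": {"cpus": 40.0, "mem": 100.0},    # dominant share 0.4 (cpus)
    "batch": {"cpus": 10.0, "mem": 300.0},  # dominant share 0.3 (mem)
}

print(next_role(allocations, {}, total))            # equal weights
print(next_role(allocations, {"web": 2.0}, total))  # web weighted 2x
```

With equal weights, batch has the lower dominant share and is served next. Giving web a weight of 2 halves its weighted share to 0.2, so web is chosen instead.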

Roles and quota

In order to guarantee that a role is allocated a specific amount of resources, quota can be specified via the /quota endpoint.

The resource allocator will first attempt to satisfy the quota requirements, before fairly sharing the remaining resources. For more information, see the quota documentation.
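As an illustration, the Python sketch below builds a request body for the /quota endpoint. The JSON shape (a role plus a list of scalar guarantee resources) is an assumption based on the legacy quota request format and may differ between Mesos versions, so check the quota documentation before relying on it; the helper name build_quota_request and the example role and amounts are hypothetical. The body is only constructed and printed, not sent.

```python
import json
from typing import Dict, List


def build_quota_request(role: str, guarantee: Dict[str, float]) -> Dict:
    """Build a quota request guaranteeing the given scalar resources to a role."""
    resources: List[Dict] = [
        # Assumed format: one SCALAR resource entry per guaranteed resource.
        {"name": name, "type": "SCALAR", "scalar": {"value": amount}}
        for name, amount in sorted(guarantee.items())
    ]
    return {"role": role, "guarantee": resources}


body = build_quota_request("analytics", {"cpus": 12, "mem": 6144})
print(json.dumps(body, indent=2))
```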

Role vs. Principal

A principal identifies an entity that interacts with Mesos; principals are similar to user names. For example, frameworks supply a principal when they register with the Mesos master, and operators provide a principal when using the operator HTTP endpoints. An entity may be required to authenticate with its principal in order to prove its identity, and the principal may be used to authorize actions performed by an entity, such as resource reservation and persistent volume creation/destruction.

Roles, on the other hand, are used exclusively for resource allocation, as covered above.