Many modern host-level operating systems (e.g. Linux, BSDs, etc.) support multiple users. Similarly, Mesos is a multi-user cluster management system, with the expectation of a single Mesos cluster managing an organization's resources and servicing the organization's users.
As such, Mesos has to address a number of requirements related to resource management:
- Fair sharing of the resources amongst users
- Providing resource guarantees to users (e.g. quota, priorities, isolation)
- Providing accurate resource accounting
  - How many resources are allocated / utilized / etc.?
  - Per-user accounting
In Mesos, we refer to these "users" as roles. More precisely, a role within Mesos refers to a resource consumer within the cluster. This resource consumer could represent a user within an organization, but it could also represent a team, a group, a service, a framework, etc.
Schedulers subscribe to one or more roles in order to receive resources and schedule work on behalf of the resource consumer(s) they are servicing.
Some examples of resource allocation guarantees that Mesos provides:
- Guaranteeing that a role is allocated a specified amount of resources (via quota).
- Ensuring that some (or all) of the resources on a particular agent are allocated to a particular role (via reservations).
- Ensuring that resources are fairly shared between roles (via DRF).
- Expressing that some roles should receive a higher relative share of the cluster (via weights).
Roles and access control
There are two ways to control which roles a framework is allowed to subscribe to. First, ACLs can be used to specify which framework principals can subscribe to which roles. For more information, see the authorization documentation.
Second, a role whitelist can be configured by passing the --roles flag to the Mesos master at startup. This flag specifies a comma-separated list of role names. If the whitelist is specified, only roles that appear in the whitelist can be used. To change the whitelist, the Mesos master must be restarted. Note that in a high-availability deployment of Mesos, you should take care to ensure that all Mesos masters are configured with the same whitelist.
In Mesos 0.26 and earlier, you should typically configure both ACLs and the whitelist, because in these versions of Mesos, any role that does not appear in the whitelist cannot be used.
In Mesos 0.27, this behavior has changed: if --roles is not specified, the whitelist permits any role name to be used. Hence, in Mesos 0.27, the recommended practice is to only use ACLs to define which roles can be used; the --roles command-line flag is deprecated.
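The whitelist semantics described above can be sketched in a few lines of Python. This is only an illustration of the rule "no whitelist means any role is permitted", not Mesos code: the function names parse_whitelist and role_permitted are hypothetical, and trimming whitespace around role names is an assumption of this sketch.

```python
from typing import Optional, Set


def parse_whitelist(roles_flag: Optional[str]) -> Optional[Set[str]]:
    """Turn a comma-separated --roles value into a set of role names.

    Returns None when the flag was not given at all. Trimming whitespace
    around each name is an assumption of this sketch.
    """
    if roles_flag is None:
        return None
    return {name.strip() for name in roles_flag.split(",") if name.strip()}


def role_permitted(role: str, whitelist: Optional[Set[str]]) -> bool:
    """Apply the Mesos 0.27 rule described in the text above."""
    if whitelist is None:
        # No --roles flag: the whitelist permits any role name.
        return True
    # A whitelist was given: only listed roles can be used.
    return role in whitelist


whitelist = parse_whitelist("analytics,web")
print(role_permitted("web", whitelist))    # True
print(role_permitted("batch", whitelist))  # False
print(role_permitted("batch", None))       # True
```

In a real deployment the decision is made by the master (and, preferably, by ACLs); the sketch only shows why omitting --roles in 0.27 leaves role names unrestricted.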
Associating frameworks with roles
A framework specifies which roles it would like to subscribe to when it subscribes with the master.
As a framework developer, you must specify the roles you would like to
subscribe to via the
As a user, you can typically specify which role(s) a framework will subscribe to when you start the framework. How to do this depends on the user interface of the framework you're using. For example, a single-user scheduler might take a --mesos_role command-line flag, and a multi-user scheduler might take a --mesos-roles command-line flag or sync with the organization's LDAP system to automatically adjust which roles it is subscribed to as the organization's structure changes.
Multiple frameworks in the same role
Multiple frameworks can be subscribed to the same role. This can be useful: for example, one framework can create a persistent volume and write data to it. Once the task that writes data to the persistent volume has finished, the volume will be offered to other frameworks subscribed to the same role; this might give a second ("consumer") framework the opportunity to launch a task that reads the data produced by the first ("producer") framework.
However, configuring multiple frameworks to use the same role should be done with caution, because all the frameworks will have access to any resources that have been reserved for that role. For example, if a framework stores sensitive information on a persistent volume, that volume might be offered to a different framework subscribed to the same role. Similarly, if one framework creates a persistent volume, another framework subscribed to the same role might "steal" the volume and use it to launch a task of its own. In general, multiple frameworks sharing the same role should be prepared to collaborate with one another to ensure that role-specific resources are used appropriately.
Associating resources with roles
A resource is assigned to a role using a reservation. Resources can either be reserved statically (when the agent that hosts the resource is started) or dynamically: frameworks and operators can specify that a certain resource should subsequently be reserved for use by a given role. For more information, see the reservation documentation.
The role named * is special. Unreserved resources are currently represented as having the special * role (the idea being that * matches any role). By default, all the resources at an agent node are unreserved (this can be changed via the --default_role command-line flag when starting the agent).
In addition, when a framework registers without providing a FrameworkInfo.role, it is assigned to the * role. In Mesos 1.3, frameworks should use the FrameworkInfo.roles field, which does not assign a default of *, but frameworks can still specify * explicitly if desired. Frameworks and operators cannot make reservations to the
A role name must be a valid directory name, so it cannot:
- Be an empty string
- Start with
- Contain any slash, backspace, or whitespace character
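As a rough illustration, the naming rules that are spelled out in full above can be checked as follows. This is a sketch, not the validation code Mesos itself uses: the function name role_name_problems is hypothetical, "slash" is assumed to mean the forward slash, and "whitespace" is assumed to mean anything Python's str.isspace() reports. Rules that are not fully stated in the list are not modeled.

```python
from typing import List


def role_name_problems(name: str) -> List[str]:
    """Return the reasons a role name violates the rules listed above.

    An empty list means none of the modeled rules is violated.
    """
    problems = []
    if name == "":
        problems.append("is an empty string")
    if "/" in name:  # assumption: "slash" means the forward slash
        problems.append("contains a slash")
    if "\b" in name:  # backspace character (0x08)
        problems.append("contains a backspace")
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    return problems


print(role_name_problems("eng"))       # []
print(role_name_problems("eng team"))  # ['contains whitespace']
print(role_name_problems("eng/web"))   # ['contains a slash']
```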
Roles and resource allocation
By default, the Mesos master uses weighted Dominant Resource Fairness (wDRF) to allocate resources. In particular, this implementation of wDRF first identifies which role is furthest below its fair share of the role's dominant resource. Each of the frameworks subscribed to that role is then offered additional resources in turn.
The resource allocation process can be customized by assigning weights to roles: a role with a weight of 2 will be allocated twice the fair share of a role with a weight of 1. By default, every role has a weight of 1. Weights can be configured using the /weights operator endpoint, or else using the --weights command-line flag when starting the Mesos master.
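To make the effect of weights concrete, here is a simplified sketch of how a weighted DRF ordering could pick the next role to receive offers. It is not the Mesos allocator: the function names, role names, and resource figures are made up for illustration, and the sketch assumes a role's dominant share is its largest fraction of any cluster resource, divided by the role's weight.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(allocated: Resources, total: Resources) -> float:
    """Largest fraction of any single cluster resource held by a role."""
    return max(allocated.get(name, 0.0) / amount for name, amount in total.items())


def next_role(allocations: Dict[str, Resources],
              total: Resources,
              weights: Dict[str, float]) -> str:
    """Pick the role furthest below its weighted fair share."""
    def weighted_share(role: str) -> float:
        # Roles without an explicit weight default to 1, as in the text.
        return dominant_share(allocations[role], total) / weights.get(role, 1.0)

    # Sorting first makes ties resolve deterministically by role name.
    return min(sorted(allocations), key=weighted_share)


total = {"cpus": 100.0, "mem": 1000.0}
allocations = {
    "analytics": {"cpus": 30.0, "mem": 100.0},  # dominant share 0.3 (cpus)
    "web": {"cpus": 10.0, "mem": 400.0},        # dominant share 0.4 (mem)
}

print(next_role(allocations, total, {}))            # analytics
print(next_role(allocations, total, {"web": 2.0}))  # web
```

With equal weights, analytics holds the smaller dominant share and is served first. Giving web a weight of 2 halves its weighted share to 0.2, so web moves ahead of analytics.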
Roles and quota
In order to guarantee that a role is allocated a specific amount of resources, quota can be specified via the /quota endpoint.
The resource allocator will first attempt to satisfy the quota requirements, before fairly sharing the remaining resources. For more information, see the quota documentation.
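The two-stage idea (quota first, then fair sharing of what is left) can be sketched for a single resource type. This is a deliberately simplified model under stated assumptions, not the allocator's actual behavior: the function allocate and all role names and numbers are hypothetical, every role is assumed to want as much as it is given, and the remainder is split in proportion to weights rather than by running DRF.

```python
from typing import Dict


def allocate(total: float,
             quotas: Dict[str, float],
             weights: Dict[str, float]) -> Dict[str, float]:
    """Satisfy quotas first, then split the remainder by weight."""
    allocation = {role: 0.0 for role in weights}
    remaining = total

    # Stage 1: grant each role its quota while resources last.
    for role in sorted(quotas):
        granted = min(quotas[role], remaining)
        allocation[role] = allocation.get(role, 0.0) + granted
        remaining -= granted

    # Stage 2: share whatever is left in proportion to the weights.
    weight_sum = sum(weights.values())
    for role, weight in weights.items():
        allocation[role] += remaining * weight / weight_sum

    return allocation


result = allocate(
    total=100.0,
    quotas={"db": 40.0},
    weights={"db": 1.0, "web": 1.0, "batch": 2.0},
)
print(result)  # {'db': 55.0, 'web': 15.0, 'batch': 30.0}
```

Here db receives its 40 units of quota up front; the remaining 60 units are then divided 1:1:2, which is why db ends up with 55 in total.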
Role vs. Principal
A principal identifies an entity that interacts with Mesos; principals are similar to user names. For example, frameworks supply a principal when they register with the Mesos master, and operators provide a principal when using the operator HTTP endpoints. An entity may be required to authenticate with its principal in order to prove its identity, and the principal may be used to authorize actions performed by an entity, such as resource reservation and persistent volume creation/destruction.
Roles, on the other hand, are used exclusively for resource allocation, as covered above.