Mesos Configuration

The Mesos master and slave accept a variety of configuration options through command-line arguments or environment variables. To list the available options, run mesos-master --help or mesos-slave --help. Each option can be set in two ways:

  • By passing it to the binary using --option_name=value, either specifying the value directly, or specifying a file in which the value resides (--option_name=file://path/to/file). The path can be absolute or relative to the current working directory.
  • By setting the environment variable MESOS_OPTION_NAME (the option name with a MESOS_ prefix added to it).

Configuration values are searched for first in the environment, then on the command-line.
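
As an illustration of the two forms above, the following Python sketch derives the environment variable name for an option and formats the equivalent command-line argument. The helper names env_var_name and cli_argument, and the path /var/log/mesos, are hypothetical and not part of Mesos.

```python
def env_var_name(option_name: str) -> str:
    """Return the environment variable for an option: MESOS_ + upper-cased name."""
    return "MESOS_" + option_name.upper()


def cli_argument(option_name: str, value: str) -> str:
    """Return the --option_name=value form of the same setting."""
    return f"--{option_name}={value}"


# /var/log/mesos is only an example path.
print(env_var_name("log_dir"))                    # MESOS_LOG_DIR
print(cli_argument("log_dir", "/var/log/mesos"))  # --log_dir=/var/log/mesos
```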

Important Options

If you have special compilation requirements, please refer to ./configure --help when configuring Mesos. Additionally, the documentation lists only a subset of the options. A definitive source for which flags your version of Mesos supports can be found by running the binary with the flag --help, for example mesos-master --help.

Master and Slave Options

These options can be supplied to both masters and slaves.

Flag Explanation
--ip=VALUE IP address to listen on
--[no-]help Prints this help message (default: false)
--log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified; does not affect logging to stderr)
--logbufsecs=VALUE How many seconds to buffer log messages for (default: 0)
--logging_level=VALUE Log messages at or above this level; possible values: 'INFO', 'WARNING', 'ERROR'; if the quiet flag is used, this will affect just the logs from log_dir (if specified) (default: INFO)
--port=VALUE Port to listen on (master default: 5050 and slave default: 5051)
--[no-]quiet Disable logging to stderr (default: false)
--[no-]version Show version and exit. (default: false)

Master Options

Required Flags

Flag Explanation
--quorum=VALUE The size of the quorum of replicas when using 'replicated_log' based registry. It is imperative to set this value to be a majority of masters i.e., quorum > (number of masters)/2.

NOTE Not required if master is run in standalone mode (non-HA).
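
The majority rule for --quorum, quorum > (number of masters)/2, can be worked through with a short Python sketch. The function name minimum_quorum is hypothetical; it returns the smallest value that satisfies the inequality.

```python
def minimum_quorum(num_masters: int) -> int:
    """Smallest integer strictly greater than num_masters / 2."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


for masters in (1, 3, 5):
    # Prints "1 1", "3 2", "5 3": one master, then a 2-of-3 and a 3-of-5 majority.
    print(masters, minimum_quorum(masters))
```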

--work_dir=VALUE Where to store the persistent information stored in the Registry.
--zk=VALUE ZooKeeper URL (used for leader election amongst masters). May be one of:
zk://host1:port1,host2:port2,.../path
zk://username:password@host1:port1,host2:port2,.../path
file:///path/to/file (where file contains one of the above)

NOTE Not required if master is run in standalone mode (non-HA).

Optional Flags

Flag Explanation
--acls=VALUE The value is a JSON formatted string of ACLs. Remember you can also use the file:///path/to/file or /path/to/file argument value format to write the JSON in a file.

See the ACLs protobuf in mesos.proto for the expected format.

JSON file example:

{
  "register_frameworks": [
    {
      "principals": { "type": "ANY" },
      "roles": { "values": ["a"] }
    }
  ],
  "run_tasks": [
    {
      "principals": { "values": ["a", "b"] },
      "users": { "values": ["c"] }
    }
  ],
  "shutdown_frameworks": [
    {
      "principals": { "values": ["a", "b"] },
      "framework_principals": { "values": ["c"] }
    }
  ]
}
--allocation_interval=VALUE Amount of time to wait between performing (batch) allocations (e.g., 500ms, 1sec, etc). (default: 1secs)
--[no-]authenticate If authenticate is 'true' only authenticated frameworks are allowed to register. If 'false' unauthenticated frameworks are also allowed to register. (default: false)
--[no-]authenticate_slaves If 'true' only authenticated slaves are allowed to register.

If 'false' unauthenticated slaves are also allowed to register. (default: false)

--authenticators=VALUE Authenticator implementation to use when authenticating frameworks and/or slaves. Use the default crammd5, or load an alternate authenticator module using --modules. (default: crammd5)
--cluster=VALUE Human readable name for the cluster, displayed in the webui.
--credentials=VALUE Either a path to a text file with a list of credentials, each line containing 'principal' and 'secret' separated by whitespace, or, a path to a JSON-formatted file containing credentials. Path should be of the form file:///path/to/file or /path/to/file

JSON file Example:

{
  "credentials": [
    {
      "principal": "sherman",
      "secret": "kitesurf"
    }
  ]
}

Text file Example:

    username secret 
--external_log_file=VALUE Specifies the externally managed log file. This file will be exposed in the webui and HTTP API. This is useful when using stderr logging as the log file is otherwise unknown to Mesos.
--framework_sorter=VALUE Policy to use for allocating resources between a given user's frameworks. Options are the same as for --user_sorter. (default: drf)
--hooks=VALUE A comma separated list of hook modules to be installed inside master.
--hostname=VALUE The hostname the master should advertise in ZooKeeper. If left unset, the hostname is resolved from the IP address that the master binds to.
--[no-]log_auto_initialize Whether to automatically initialize the replicated log used for the registry. If this is set to false, the log has to be manually initialized when used for the very first time. (default: true)
--modules=VALUE List of modules to be loaded and be available to the internal subsystems.

Use --modules=filepath to specify the list of modules via a file containing a JSON formatted string. Remember you can also use the file:///path/to/file or /path/to/file argument value format to write the JSON in a file.

Use --modules="{...}" to specify the list of modules inline.

JSON file example:

{
  "libraries": [
    {
      "file": "/path/to/libfoo.so",
      "modules": [
        {
          "name": "org_apache_mesos_bar",
          "parameters": [
            {
              "key": "X",
              "value": "Y"
            }
          ]
        },
        {
          "name": "org_apache_mesos_baz"
        }
      ]
    },
    {
      "name": "qux",
      "modules": [
        {
          "name": "org_apache_mesos_norf"
        }
      ]
    }
  ]
}
--offer_timeout=VALUE Duration of time before an offer is rescinded from a framework.

This helps fairness when running frameworks that hold on to offers, or frameworks that accidentally drop offers.

--rate_limits=VALUE The value could be a JSON formatted string of rate limits or a file path containing the JSON formatted rate limits used for framework rate limiting.

Remember you can also use the file:///path/to/file or /path/to/file argument value format to write the JSON in a file.

See the RateLimits protobuf in mesos.proto for the expected format.

Example:

{
  "limits": [
    {
      "principal": "foo",
      "qps": 55.5
    },
    {
      "principal": "bar"
    }
  ],
  "aggregate_default_qps": 33.3
}
--recovery_slave_removal_limit=VALUE For failovers, limit on the percentage of slaves that can be removed from the registry *and* shut down after the re-registration timeout elapses. If the limit is exceeded, the master will fail over rather than remove the slaves.

This can be used to provide safety guarantees for production environments. Production environments may expect that across Master failovers, at most a certain percentage of slaves will fail permanently (e.g. due to rack-level failures).

Setting this limit would ensure that a human needs to get involved should an unexpected widespread failure of slaves occur in the cluster.

Values: [0%-100%] (default: 100%)

--slave_removal_rate_limit=VALUE The maximum rate (e.g., 1/10mins, 2/3hrs, etc) at which slaves will be removed from the master when they fail health checks. By default slaves will be removed as soon as they fail the health checks.

The value is of the form 'Number of slaves'/'Duration'
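
To make the 'Number of slaves'/'Duration' form concrete, here is a minimal Python sketch that converts such a value into slaves per second. The function name and the unit table are assumptions: the table covers only the secs, mins and hrs units that appear in this document and may not match everything Mesos accepts.

```python
import re

# Assumption: only the duration units used in this document's examples.
UNIT_SECONDS = {"secs": 1, "mins": 60, "hrs": 3600}


def parse_rate_limit(value: str) -> float:
    """Parse 'N/Duration' (e.g. '1/10mins') into slaves removed per second."""
    match = re.fullmatch(r"(\d+)/(\d+)(secs|mins|hrs)", value)
    if match is None:
        raise ValueError(f"unrecognized rate limit: {value!r}")
    count, amount, unit = match.groups()
    seconds = int(amount) * UNIT_SECONDS[unit]
    if seconds == 0:
        raise ValueError("duration must be positive")
    return int(count) / seconds


# 1/10mins allows one removal every 600 seconds.
print(parse_rate_limit("1/10mins"))
```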

--registry=VALUE Persistence strategy for the registry;

available options are 'replicated_log', 'in_memory' (for testing). (default: replicated_log)

--registry_fetch_timeout=VALUE Duration of time to wait in order to fetch data from the registry after which the operation is considered a failure. (default: 1mins)
--registry_store_timeout=VALUE Duration of time to wait in order to store data in the registry after which the operation is considered a failure. (default: 5secs)
--[no-]registry_strict Whether the Master will take actions based on the persistent information stored in the Registry. Setting this to false means that the Registrar will never reject the admission, readmission, or removal of a slave. Consequently, 'false' can be used to bootstrap the persistent state on a running cluster.

NOTE: This flag is *experimental* and should not be used in production yet. (default: false)

--roles=VALUE A comma separated list of the allocation roles that frameworks in this cluster may belong to.
--[no-]root_submissions Can root submit frameworks? (default: true)
--slave_reregister_timeout=VALUE The timeout within which all slaves are expected to re-register when a new master is elected as the leader. Slaves that do not re-register within the timeout will be removed from the registry and will be shut down if they attempt to communicate with the master.

NOTE: This value has to be at least 10mins. (default: 10mins)

--user_sorter=VALUE Policy to use for allocating resources between users. May be one of:

dominant_resource_fairness (drf) (default: drf)

--webui_dir=VALUE Directory path of the webui files/assets (default: /usr/local/share/mesos/webui)
--weights=VALUE A comma separated list of role/weight pairs of the form 'role=weight,role=weight'. Weights are used to indicate forms of priority.
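
The 'role=weight,role=weight' form of --weights can be read as a simple mapping. The Python sketch below is illustrative only: parse_weights and the role names prod and dev are hypothetical, and treating weights as floating-point numbers is an assumption.

```python
from typing import Dict


def parse_weights(value: str) -> Dict[str, float]:
    """Split 'role=weight,role=weight' into a {role: weight} mapping."""
    weights: Dict[str, float] = {}
    for pair in value.split(","):
        role, sep, weight = pair.strip().partition("=")
        if not sep or not role:
            raise ValueError(f"expected role=weight, got {pair!r}")
        weights[role] = float(weight)  # assumption: weights are numeric
    return weights


print(parse_weights("prod=2,dev=1"))  # {'prod': 2.0, 'dev': 1.0}
```
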
--whitelist=VALUE A filename which contains a list of slaves (one per line) to advertise offers for. The file is watched, and periodically re-read to refresh the slave whitelist. By default there is no whitelist / all machines are accepted. (default: None)

Example:

file:///etc/mesos/slave_whitelist

--zk_session_timeout=VALUE ZooKeeper session timeout. (default: 10secs)

Flags available when configured with '--with-network-isolator'

Flag Explanation
--max_executors_per_slave=VALUE Maximum number of executors allowed per slave. The network monitoring/isolation technique imposes an implicit resource acquisition on each executor (a number of ephemeral ports); as a result, only a certain number of executors can run on each slave.

This flag was added as a hack to avoid frameworks getting offers when we have allocated all of the ephemeral port range on the slave.

Slave Options

Required Flags

Flag Explanation
--master=VALUE This specifies how to connect to a master or a quorum of masters. This flag works with 3 different techniques. It may be one of:
  1. hostname or ip to a master or comma-delimited list of masters, e.g.,
    --master=localhost:5050
    --master=10.0.0.5:5050,10.0.0.6:5050
    
  2. zookeeper or quorum hostname/ip + port + master registration path, e.g.,
    --master=zk://host1:port1,host2:port2,.../path
    --master=zk://username:password@host1:port1,host2:port2,.../path
    
  3. a path to a file containing either one of the above options. You can also use the file:///path/to/file syntax to read the argument from a file which contains one of the above.

Optional Flags

Flag Explanation
--attributes=VALUE Attributes of machine, in the form:

rack:2 or 'rack:2;u:1'

--authenticatee=VALUE Authenticatee implementation to use when authenticating against the master. Use the default crammd5, or load an alternate authenticatee module using --modules. (default: crammd5)
--[no-]cgroups_enable_cfs Cgroups feature flag to enable hard limits on CPU resources via the CFS bandwidth limiting subfeature. (default: false)
--cgroups_hierarchy=VALUE The path to the cgroups hierarchy root (default: /sys/fs/cgroup)
--[no-]cgroups_limit_swap Cgroups feature flag to enable memory limits on both memory and swap instead of just memory. (default: false)
--cgroups_root=VALUE Name of the root cgroup (default: mesos)
--container_disk_watch_interval=VALUE The interval between disk quota checks for containers. This flag is used for the posix/disk isolator. (default: 15secs)
--containerizer_path=VALUE The path to the external containerizer executable used when external isolation is activated (--isolation=external).
--containerizers=VALUE Comma separated list of containerizer implementations to compose in order to provide containerization.

Available options are 'mesos', 'external', and 'docker' (on Linux). The order the containerizers are specified is the order they are tried (--containerizers=mesos). (default: mesos)

--credential=VALUE A path to a text file with a single line containing 'principal' and 'secret' separated by whitespace.

Or a path containing the JSON formatted information used for one credential.

Path should be of the form file://path/to/file.

Remember you can also use the file:///path/to/file argument value format to read the value from a file.

JSON file example:
{
  "principal": "username",
  "secret": "secret"
}
--default_container_image=VALUE The default container image to use if not specified by a task, when using external containerizer.
--default_container_info=VALUE JSON formatted ContainerInfo that will be included into any ExecutorInfo that does not specify a ContainerInfo.

See the ContainerInfo protobuf in mesos.proto for the expected format.

Example:

{
  "type": "MESOS",
  "volumes": [
    {
      "host_path": "./.private/tmp",
      "container_path": "/tmp",
      "mode": "RW"
    }
  ]
}
--default_role=VALUE Any resources in the --resources flag that omit a role, as well as any resources that are not present in --resources but that are automatically detected, will be assigned to this role. (default: *)
--disk_watch_interval=VALUE Periodic time interval (e.g., 10secs, 2mins, etc) to check the disk usage (default: 1mins)
--docker=VALUE The absolute path to the docker executable for docker containerizer. (default: docker)
--docker_remove_delay=VALUE The amount of time to wait before removing docker containers (e.g., 3days, 2weeks, etc). (default: 6hrs)
--docker_sandbox_directory=VALUE The absolute path for the directory in the container where the sandbox is mapped to. (default: /mnt/mesos/sandbox)
--docker_stop_timeout=VALUE The time as a duration for docker to wait after stopping an instance before it kills that instance. (default: 0secs)
--[no-]enforce_container_disk_quota Whether to enable disk quota enforcement for containers. This flag is used for the 'posix/disk' isolator. (default: false)
--executor_registration_timeout=VALUE Amount of time to wait for an executor to register with the slave before considering it hung and shutting it down (e.g., 60secs, 3mins, etc) (default: 1mins)
--executor_shutdown_grace_period=VALUE Amount of time to wait for an executor to shut down (e.g., 60secs, 3mins, etc) (default: 5secs)
--external_log_file=VALUE Specifies the externally managed log file. This file will be exposed in the webui and HTTP API. This is useful when using stderr logging as the log file is otherwise unknown to Mesos.
--frameworks_home=VALUE Directory path prepended to relative executor URIs (default: )
--gc_delay=VALUE Maximum amount of time to wait before cleaning up executor directories (e.g., 3days, 2weeks, etc).

Note that this delay may be shorter depending on the available disk usage. (default: 1weeks)

--gc_disk_headroom=VALUE Adjust disk headroom used to calculate maximum executor directory age. Age is calculated by:

gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage)) every --disk_watch_interval duration. gc_disk_headroom must be a value between 0.0 and 1.0 (default: 0.1)
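
The age formula above can be worked through numerically. The following Python sketch is only an illustration: the function name is hypothetical, durations are expressed in seconds, and disk usage is assumed to be a fraction between 0.0 and 1.0.

```python
def max_executor_dir_age(gc_delay_secs: float,
                         gc_disk_headroom: float,
                         disk_usage: float) -> float:
    """gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage), in seconds."""
    if not 0.0 <= gc_disk_headroom <= 1.0:
        raise ValueError("gc_disk_headroom must be between 0.0 and 1.0")
    return gc_delay_secs * max(0.0, 1.0 - gc_disk_headroom - disk_usage)


ONE_WEEK_SECS = 7 * 24 * 3600  # the default gc_delay of 1weeks

# With the default headroom of 0.1 and a disk that is 50% full,
# directories are kept for about 2.8 days instead of the full week.
print(max_executor_dir_age(ONE_WEEK_SECS, 0.1, 0.5) / 86400)

# Once usage plus headroom reaches 100%, the allowed age drops to zero.
print(max_executor_dir_age(ONE_WEEK_SECS, 0.1, 0.95))
```
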
--hadoop_home=VALUE Path to find Hadoop installed (for fetching framework executors from HDFS) (no default, look for HADOOP_HOME in environment or find hadoop on PATH)
--hooks=VALUE A comma separated list of hook modules to be installed inside the slave.
--hostname=VALUE The hostname the slave should report.

If left unset, the hostname is resolved from the IP address that the slave binds to.

--isolation=VALUE Isolation mechanisms to use, e.g., 'posix/cpu,posix/mem', or 'cgroups/cpu,cgroups/mem', or network/port_mapping (configure with flag: --with-network-isolator to enable), or 'external', or load an alternate isolator module using the --modules flag. (default: posix/cpu,posix/mem)
--launcher_dir=VALUE Directory path of Mesos binaries (default: /usr/local/lib/mesos)
--modules=VALUE List of modules to be loaded and be available to the internal subsystems.

Remember you can also use the file:///path/to/file or /path/to/file argument value format to have the value read from a file.

Use --modules="{...}" to specify the list of modules inline.

JSON file example:


{
  "libraries": [
    {
      "file": "/path/to/libfoo.so",
      "modules": [
        {
          "name": "org_apache_mesos_bar",
          "parameters": [
            {
              "key": "X",
              "value": "Y"
            }
          ]
        },
        {
          "name": "org_apache_mesos_baz"
        }
      ]
    },
    {
      "name": "qux",
      "modules": [
        {
          "name": "org_apache_mesos_norf"
        }
      ]
    }
  ]
}
--perf_duration=VALUE Duration of a perf stat sample. The duration must be less than the perf_interval. (default: 10secs)
--perf_events=VALUE List of comma-separated perf events to sample for each container when using the perf_event isolator. Default is none.

Run command 'perf list' to see all events. Event names are sanitized by downcasing and replacing hyphens with underscores when reported in the PerfStatistics protobuf, e.g., cpu-cycles becomes cpu_cycles; see the PerfStatistics protobuf for all names.
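
The sanitization rule described above (downcasing and replacing hyphens with underscores) can be sketched in Python as follows. The function name is hypothetical and this is not the Mesos implementation.

```python
def sanitize_perf_event(name: str) -> str:
    """Lower-case the event name and replace hyphens with underscores."""
    return name.lower().replace("-", "_")


print(sanitize_perf_event("cpu-cycles"))  # cpu_cycles
```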

--perf_interval=VALUE Interval between the start of perf stat samples. Perf samples are obtained periodically according to perf_interval and the most recently obtained sample is returned rather than sampling on demand. For this reason, perf_interval is independent of the resource monitoring interval (default: 1mins)
--recover=VALUE Whether to recover status updates and reconnect with old executors.

Valid values for 'recover' are

reconnect: Reconnect with any old live executors.

cleanup : Kill any old live executors and exit.

Use cleanup when doing an incompatible slave or executor upgrade.

NOTE: If a checkpointed slave doesn't exist, no recovery is performed and the slave registers with the master as a new slave. (default: reconnect)

--recovery_timeout=VALUE Amount of time allotted for the slave to recover. If the slave takes longer than recovery_timeout to recover, any executors that are waiting to reconnect to the slave will self-terminate.

NOTE: This flag is only applicable when checkpoint is enabled. (default: 15mins)

--registration_backoff_factor=VALUE Slave initially picks a random amount of time between [0, b], where b = registration_backoff_factor, to (re-)register with a new master.

Subsequent retries are exponentially backed off based on this interval (e.g., 1st retry uses a random value between [0, b * 2^1], 2nd retry between [0, b * 2^2], 3rd retry between [0, b * 2^3] etc) up to a maximum of 1mins (default: 1secs)
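
The backoff schedule described above can be sketched in Python. This is an illustration under stated assumptions, not the slave's actual code: the function names are hypothetical, retry 0 stands for the initial attempt, and the 1mins maximum is assumed to cap the upper bound of the random interval.

```python
import random

MAX_BACKOFF_SECS = 60.0  # "up to a maximum of 1mins"


def backoff_upper_bound(factor_secs: float, retry: int) -> float:
    """Upper bound b * 2^retry of the random interval, capped at one minute."""
    return min(factor_secs * (2 ** retry), MAX_BACKOFF_SECS)


def next_delay(factor_secs: float, retry: int) -> float:
    """Pick a random delay in [0, upper bound] for the given retry."""
    return random.uniform(0.0, backoff_upper_bound(factor_secs, retry))


# With the default factor of 1secs the bounds double until they hit the cap.
print([backoff_upper_bound(1.0, n) for n in range(8)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```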

--resource_monitoring_interval=VALUE Periodic time interval for monitoring executor resource usage (e.g., 10secs, 1min, etc) (default: 1secs)
--resources=VALUE Total consumable resources per slave, in the form

name(role):value;name(role):value....
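
To show how the name(role):value;name(role):value form breaks down, here is a minimal Python sketch. The parser and the sample value are hypothetical; the role is treated as optional and falls back to the --default_role value, and values are kept as uninterpreted strings.

```python
import re
from typing import List, Tuple

# name, optional "(role)", then ":" and the value.
RESOURCE = re.compile(r"(?P<name>[^():;]+)(?:\((?P<role>[^()]*)\))?:(?P<value>.+)")


def parse_resources(value: str, default_role: str = "*") -> List[Tuple[str, str, str]]:
    """Split a --resources string into (name, role, value) tuples."""
    parsed = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        match = RESOURCE.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognized resource: {item!r}")
        parsed.append((match["name"], match["role"] or default_role, match["value"]))
    return parsed


print(parse_resources("cpus(prod):8;mem:4096"))
# [('cpus', 'prod', '8'), ('mem', '*', '4096')]
```
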
--slave_subsystems=VALUE List of comma-separated cgroup subsystems to run the slave binary in, e.g., memory,cpuacct. The default is none. Present functionality is intended for resource monitoring and no cgroup limits are set; they are inherited from the root mesos cgroup.
--[no-]strict If strict=true, any and all recovery errors are considered fatal.

If strict=false, any expected errors (e.g., slave cannot recover information about an executor, because the slave died right before the executor registered.) during recovery are ignored and as much state as possible is recovered. (default: true)

--[no-]switch_user Whether to run tasks as the user who submitted them rather than the user running the slave (requires setuid permission) (default: true)
--work_dir=VALUE Directory path to place framework work directories (default: /tmp/mesos)

Flags available when configured with '--with-network-isolator'

Flag Explanation
--ephemeral_ports_per_container=VALUE Number of ephemeral ports allocated to a container by the network isolator. This number has to be a power of 2. (default: 1024)
--eth0_name=VALUE The name of the public network interface (e.g., eth0). If it is not specified, the network isolator will try to guess it based on the host default gateway.
--lo_name=VALUE The name of the loopback network interface (e.g., lo). If it is not specified, the network isolator will try to guess it.
--egress_rate_limit_per_container=VALUE The limit of the egress traffic for each container, in Bytes/s. If not specified or specified as zero, the network isolator will impose no limits to containers' egress traffic throughput. This flag uses the Bytes type, defined in stout.
--[no-]network_enable_socket_statistics Whether to collect socket statistics (e.g., TCP RTT) for each container. (default: false)
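
Since --ephemeral_ports_per_container has to be a power of 2, a candidate value can be checked before starting the slave. The Python helper below is a hypothetical sketch of such a check, not part of Mesos.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


print(is_power_of_two(1024))  # True (the default value)
print(is_power_of_two(1000))  # False
```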

Mesos Build Configuration Options

The configure script has the following flags for optional features:

Flag Explanation
--enable-shared[=PKGS] build shared libraries [default=yes]
--enable-static[=PKGS] build static libraries [default=yes]
--enable-fast-install[=PKGS] optimize for fast installation [default=yes]
--disable-libtool-lock avoid locking (might break parallel builds)
--disable-java don't build Java bindings
--disable-python don't build Python bindings
--enable-debug enable debugging. If CFLAGS/CXXFLAGS are set, this option won't change them [default=no]
--enable-optimize enable optimizations. If CFLAGS/CXXFLAGS are set, this option won't change them [default=no]
--disable-bundled build against preinstalled dependencies instead of bundled libraries
--disable-bundled-distribute excludes building and using the bundled distribute package in lieu of an installed version in PYTHONPATH
--disable-bundled-pip excludes building and using the bundled pip package in lieu of an installed version in PYTHONPATH
--disable-bundled-wheel excludes building and using the bundled wheel package in lieu of an installed version in PYTHONPATH
--disable-python-dependency-install when the python packages are installed during make install, no external dependencies are downloaded or installed

The configure script has the following flags for optional packages:

Flag Explanation
--with-gnu-ld assume the C compiler uses GNU ld [default=no]
--with-sysroot=DIR Search for dependent libraries within DIR (or the compiler's sysroot if not specified).
--with-zookeeper[=DIR] excludes building and using the bundled ZooKeeper package in lieu of an installed version at a location prefixed by the given path
--with-leveldb[=DIR] excludes building and using the bundled LevelDB package in lieu of an installed version at a location prefixed by the given path
--with-glog[=DIR] excludes building and using the bundled glog package in lieu of an installed version at a location prefixed by the given path
--with-protobuf[=DIR] excludes building and using the bundled protobuf package in lieu of an installed version at a location prefixed by the given path
--with-gmock[=DIR] excludes building and using the bundled gmock package in lieu of an installed version at a location prefixed by the given path
--with-curl[=DIR] specify where to locate the curl library
--with-sasl[=DIR] specify where to locate the sasl2 library
--with-zlib[=DIR] specify where to locate the zlib library
--with-apr[=DIR] specify where to locate the apr-1 library
--with-svn[=DIR] specify where to locate the svn-1 library
--with-network-isolator builds the network isolator

Some influential environment variables for configure script:

Use these variables to override the choices made by `configure' or to help it to find libraries and programs with nonstandard names/locations.

Flag Explanation
JAVA_HOME location of Java Development Kit (JDK)
JAVA_CPPFLAGS preprocessor flags for JNI
JAVA_JVM_LIBRARY full path to libjvm.so
MAVEN_HOME looks for mvn at MAVEN_HOME/bin/mvn
PROTOBUF_JAR full path to protobuf jar on prefixed builds
PYTHON which Python interpreter to use
PYTHON_VERSION The installed Python version to use, for example '2.3'. This string will be appended to the Python interpreter canonical name.