tencent cloud

Elastic MapReduce

Configuring Capacity Scheduler

PDF
Modo Foco
Tamanho da Fonte
Última atualização: 2023-12-27 14:32:16

Overview

Capacity Scheduler organizes resources in a hierarchical manner, allowing multiple users to share cluster resources based on multi-level resource restrictions.

Directions

Creating a resource pool

1. Log in to the EMR console and click Details of the target Hadoop cluster in the cluster list to go to the cluster details page.
2. On the cluster details page, select Cluster Service and choose Operation > Resource Scheduling in the top-right corner of the YARN component block to go to the Resource Scheduling page.


3. Toggle on Resource scheduler and configure the scheduler.
4. Create a resource pool for Capacity Scheduler. Select Capacity Scheduler and click Create Resource Pool to create a resource pool. You can edit or clone an existing resource pool as well as create a subpool for it. You can also click Default settings to set the number of times of delayed scheduling.



Fields and parameters:
Field Name
Parameter
Description
Resource Pool Name
yarn.scheduler.capacity.<queue-path>.queues</queue-path>
Name of the resource pool or queue
Label Settings
N/A
Set the specified label that can be accessed by the queue.
Capacity
yarn.scheduler.capacity.<queue-path>.capacity</queue-path>
Available resource amount. The total capacity of the subpools of a parent pool is 100. Available resource amount = resource amount of the parent pool * percentage set here. The queue can consume more resources than the queue's capacity if there are idle resources in other queues.
Max Capacity
yarn.scheduler.capacity.<queue-path>.maximum-capacity</queue-path>
Maximum queue capacity in percentage. Because of resource sharing, the amount of resources used by a queue may exceed its capacity, and this field specifies the maximum amount of resources that can be used by the queue.
Default Label Expression
yarn.scheduler.capacity.<queue-path>.default-node-label-expression</queue-path>
If a resource request does not have a node label specified, the application will be submitted to the corresponding partition specified by this configuration item. By default, the value is empty, i.e., applications will be allocated to containers on nodes with no label.
Min User Capacity
yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent</queue-path>
Minimum resources in percentage guaranteed for each user. Each queue enforces a limit on the percentage of resources allocated to a user at any given time. When multiple users' applications are running in a queue concurrently, the amount of resources used by each user varies between a minimum and maximum value. The minimum value depends on the number of running applications, and the maximum value is determined by `minimum-user-limit-percent`.
User Resource Factor
yarn.scheduler.capacity.<queue-path>.user-limit-factor</queue-path>
Maximum amount of resources in percentage that can be used by each user. For example, if the value is `30`, the amount of resources for each user cannot exceed 30% of the queue capacity at any given time.
Max Memory per Container
yarn.scheduler.capacity.<queue-path>.maximum-allocation-mb</queue-path>
Maximum memory that can be allocated to each container. The value will overwrite and cannot be greater than that of the system's `yarn.scheduler.maximum-allocation-mb`.
Max vCores per Container
yarn.scheduler.capacity.<queue-path>.maximum-allocation-vcores</queue-path>
Maximum number of CPU cores that can be allocated to each container. The value will overwrite and cannot be greater than that of the system's `yarn.scheduler.maximum-allocation-vcores`.
Resource Pool Status
yarn.scheduler.capacity.<queue-path>.state</queue-path>
Status of the queue. The value can be `Running` or `Stopped`. If a queue is in the `Stopped` status, new applications cannot be submitted to it or any of its subqueues.
Max Apps
yarn.scheduler.capacity.<queue-path>.maximum-applications</queue-path>
Maximum number of concurrent active (both running and pending) applications allowed in the system
Max Resources for AM
yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent</queue-path>
Maximum percentage of resources in the cluster which can be used to run application masters. It controls the number of concurrent active applications.
Resource Pool Priority
yarn.scheduler.capacity.root.<leaf-queue-path>.default-application-priority</leaf-queue-path>
Configure the priority of the resource queue, which is `0` by default. The larger the value, the higher the priority.
Submission
yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications
List of users that can submit apps to the queue
Management
yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue
List of users that can manage the queue
Delayed Scheduling
yarn.scheduler.capacity.node-locality-delay
Set the allowed number of times of delayed scheduling to ensure the local execution of tasks. If the value is `-1`, delayed scheduling will be disabled.

Configuring resource pool mappings

1. In the Policy Settings section, choose Resource Pool Mappings > Create Resource Pool Mapping to create a resource pool mapping.




2. Configure Overwrite Specified Queues. This feature is disabled by default. For example, you have defined a mapped queue in Resource Pool Mappings and specified a queue other than the mapped queue when submitting a job; if the specified queue is default and Overwrite Specified Queues is enabled, the mapped queue will be used; otherwise, the specified queue will be used.

Sample label-based scheduling

1. Log in to the EMR console and click Details of the target Hadoop cluster in the cluster list to go to the cluster details page.
2. On the cluster details page, select Cluster Service and choose Operation > Resource Scheduling in the top-right corner of the YARN component block to go to the Resource Scheduling page.
3. Toggle on Resource scheduler and select Capacity Scheduler.
4. Toggle on Label-based scheduling and click Label management to go to the label management** page.


5. Click Create Label, enter the label name, and set the label type and node to be bound to as needed.


6. After the label is set, click Apply. Then, you can view and edit the resource queue of the label in the resource pool.


7. On the Resource Scheduling page, click Create Resource Pool and set Label, Capacity, Max Capacity, and other parameters as needed.
Note
The capacity and maximum capacity of a resource pool can be configured by label.

8. After the resource pool is set, click Apply to submit the task to the backend.
Caution
Proceed with caution when restarting the ResourceManager. If you are prompted that the ResourceManager will be restarted after you click Apply, check whether the operation is successful in Scheduling History and whether the ResourceManager is healthy in Role Management.

Ajuda e Suporte

Esta página foi útil?

comentários