Hadoop Ecosystem: Enabling and Configuring Capacity Schedular

Hadoop Schedulers:

Hadoop has 4 types of schedulers

1. FIFO (By default)
2. Capacity Scheduler
3. Fair Scheduler
4. HOD Scheduler (Hadoop On Demand)

In Hadoop jobs are submitted to Queues. Queues are submitting jobs to Hadoop Cluster By default the queue name is called default.

Configuring Capacity Scheduler:

To configure Capacity Scheduler, we need to do the following things:

1. Change/Add some of the configuration parameters in mapred-site.xml
2. Change/Add queues information in the capacity-scheduler.xml
3. Change/Add acls information in the mapred-queue-acls.xml

I assumed 4 queues for the cluster, their names are india, usa, aus, eng etc...

1. Changes to mapred-site.xml

<configuration>

<property>
<name>mapred.job.tracker</name>
<value>hostname:9001</value>
<description>This tells where is Job Tracker is running....</description>
</property>

<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>This number tells how many no of reducers to run</description>
</property>

<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
<description>This overrides the default schedular FIFO with Capacity Schedular</description>
</property>

<property>
<name>mapred.queue.names</name>
<value>india,usa,eng,aus</value>
<description>These are new created queue names</description>
</property>

<property>
<name>mapred.acls.enabled</name>
<value>true</value>
<description>This enables access control lists</description>
</property>

<property>
<name>mapred.job.queue.name</name>
<value>india</value>
<description>Specify the specific queue to submit the job instead of submitting to default queue.</description>
</property>
</configuration>-

2. Changes to capacity-scheduler.xml

<configuration>



<property>
<name>mapred.capacity-scheduler.maximum-system-jobs</name>
<value>3000</value>
<description>Maximum number of jobs in the system which can be initialized,
concurrently, by the CapacityScheduler.
</description>
</property>


<property>
<name>mapred.capacity-scheduler.queue.india.capacity</name>
<value>20</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.india.supports-priority</name>
<value>false</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.india.minimum-user-limit-percent</name>
<value>20</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.india.user-limit-factor</name>
<value>10</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.india.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.india.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.india.init-accept-jobs-factor</name>
<value>100</value>
</property>


<property>
<name>mapred.capacity-scheduler.queue.usa.capacity</name>
<value>30</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.usa.supports-priority</name>
<value>false</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.usa.minimum-user-limit-percent</name>
<value>20</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.usa.user-limit-factor</name>
<value>1</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.usa.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.usa.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.usa.init-accept-jobs-factor</name>
<value>10</value>
</property>


<property>
<name>mapred.capacity-scheduler.queue.eng.capacity</name>
<value>30</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.eng.supports-priority</name>
<value>false</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.eng.minimum-user-limit-percent</name>
<value>20</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.eng.user-limit-factor</name>
<value>1</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.eng.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.eng.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.eng.init-accept-jobs-factor</name>
<value>10</value>
</property>


<property>
<name>mapred.capacity-scheduler.queue.aus.capacity</name>
<value>20</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.aus.supports-priority</name>
<value>false</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.aus.minimum-user-limit-percent</name>
<value>20</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.aus.user-limit-factor</name>
<value>20</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.aus.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.aus.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>

<property>
<name>mapred.capacity-scheduler.queue.aus.init-accept-jobs-factor</name>
<value>10</value>
</property>
</configuration>

3. Changes to mapred-queue-acls.xml

<configuration>

<property>
<name>mapred.queue.india.acl-submit-job</name>
<value>*</value>
<description> Comma separated list of user and group names that are allowed
to submit jobs to the 'india' queue. The user list and the group list
are separated by a blank. For e.g. user1,user2 group1,group2.
If set to the special value '*', it means all users are allowed to
submit jobs. If set to ' '(i.e. space), no user will be allowed to submit
jobs.

It is only used if authorization is enabled in Map/Reduce by setting the
configuration property mapred.acls.enabled to true.

Irrespective of this ACL configuration, the user who started the cluster and
cluster administrators configured via mapreduce.cluster.administrators can submit jobs.
</description>
</property>

<property>
<name>mapred.queue.india.acl-administer-jobs</name>
<value>*</value>
<description> Comma separated list of user and group names that are allowed
to view job details, kill jobs or modify job's priority for all the jobs
in the 'india' queue. The user list and the group list
are separated by a blank. For e.g. user1,user2 group1,group2.
If set to the special value '*', it means all users are allowed to do
this operation. If set to ' '(i.e. space), no user will be allowed to do
this operation.

It is only used if authorization is enabled in Map/Reduce by setting the
configuration property mapred.acls.enabled to true.

Irrespective of this ACL configuration, the user who started the cluster and
cluster administrators configured via
mapreduce.cluster.administrators can do the above operations on all the jobs
in all the queues. The job owner can do all the above operations on his/her
job irrespective of this ACL configuration.
</description>
</property>

</configuration>

Once all the above changes are done, just start-all.sh to start the Hadoop cluster.
Visit the Hadoop MapReduce Cluster Administration on http://hostname:50030/jobtracker.jsp
We will see the Scheduling Information in the web page, it gives the complete info about each and every configured Queue.

****************************************Done*************************************

Hadoop Ecosystem

Thursday, August 21, 2014

Enabling and Configuring Capacity Schedular

No comments:

Post a Comment

About Me