A Survey on Job and Task Scheduling in Big Data

ALEXANDER, Mr. MALCOM MARSHALL (2014) A Survey on Job and Task Scheduling in Big Data. [Conference Paper]



Big Data refers to datasets that exceed the ability of commonly used software tools to store, share, and process them. Workload classification is a major issue for the Big Data community, particularly with respect to the evolution of job types and job sizes. Clusters consisting of a data node, a name node, and a secondary name node are formed on the basis of job type, job size, and disk performance. The MapReduce algorithm is applied to classify the workload and to perform job scheduling, and workload is allocated according to the performance of each individual machine. MapReduce processes data in two phases: map and reduce. In the map phase, the input dataset is split into key-value pairs and an intermediate output is produced; in the reduce phase, those key-value pairs undergo shuffle and sort operations. Intermediate files created by map tasks are written to local disk, and output files are written to the Hadoop Distributed File System (HDFS). The assignment of different jobs to different disks is determined after the MapReduce tasks complete. Johnson's algorithm is used to schedule the jobs and to find an optimal solution for them; it places the jobs into different pools and then performs the scheduling. The main goal is to minimize the total computation time across all jobs and to analyze performance using response-time factors in HDFS. The performance of individual jobs is evaluated based on the dataset size and the number of nodes in the Hadoop cluster.

Keywords: Hadoop; MapReduce; Johnson's algorithm
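The abstract does not spell out how Johnson's algorithm is applied, so the following is only a minimal sketch of the classical Johnson's rule for a two-stage flow shop. It assumes that each job's map phase plays the role of the first stage and its reduce phase the second, and that each stage handles one job at a time. The function names, job names, and durations are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Each job maps to (map_time, reduce_time). These durations are illustrative only.
Jobs = Dict[str, Tuple[float, float]]


def johnson_order(jobs: Jobs) -> List[str]:
    """Order jobs by Johnson's rule for a two-stage flow shop."""
    # Jobs whose first stage is no longer than their second stage go first,
    # in ascending order of first-stage time.
    front = sorted(
        (name for name, (a, b) in jobs.items() if a <= b),
        key=lambda name: jobs[name][0],
    )
    # The remaining jobs go last, in descending order of second-stage time.
    back = sorted(
        (name for name, (a, b) in jobs.items() if a > b),
        key=lambda name: jobs[name][1],
        reverse=True,
    )
    return front + back


def makespan(jobs: Jobs, order: List[str]) -> float:
    """Total completion time when jobs pass through both stages in the given order."""
    stage1_end = 0.0  # time at which the first stage becomes free
    stage2_end = 0.0  # time at which the second stage becomes free
    for name in order:
        map_time, reduce_time = jobs[name]
        stage1_end += map_time
        # The second stage cannot start until the job's first stage is finished.
        stage2_end = max(stage2_end, stage1_end) + reduce_time
    return stage2_end


if __name__ == "__main__":
    jobs = {"J1": (3, 6), "J2": (5, 2), "J3": (1, 2), "J4": (6, 6), "J5": (7, 5)}
    order = johnson_order(jobs)
    print("Johnson order:", order)
    print("Makespan (Johnson order):", makespan(jobs, order))
    print("Makespan (submission order):", makespan(jobs, list(jobs)))
```

With these made-up durations, the Johnson ordering finishes in 24 time units, compared with 27 for the submission order. Under the two-stage assumption, this is how the rule reduces total computation time.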

Item Type: Conference Paper
Subjects: Computer Science > Artificial Intelligence
Computer Science > Dynamical Systems
ID Code: 9814
Deposited On: 18 Feb 2017 20:30
Last Modified: 18 Feb 2017 20:30

