MapReduce: A Programming Model

MapReduce is presented by Dean and Ghemawat [1] as a programming model for parallelizing the computations of data-intensive applications on a cluster. It has two primitives, map and reduce. Map computes a set of intermediate key/value pairs for each input record; reduce is then applied to all values with the same key to merge them into a smaller set of values (a minimal sketch of this model appears at the end of this essay). Several implementations of MapReduce exist for different hardware infrastructures. Phoenix [2], for example, is a shared-memory implementation of MapReduce. Apache Hadoop is another, designed to run applications on large clusters built from commodity hardware [3].

MATE-CG [4] is a map-reduce framework built for CPU+GPU clusters. It aims to accelerate map-reduce applications in heterogeneous parallel environments, notably CPU+GPU clusters, and supports three execution schemes: depending on its dataset, an application can be accelerated under the CPU-only, GPU-only, or CPU+GPU scheme, whichever achieves the best performance. APIs let the implementer specify separate reduce functions for the CPU and the GPU; the GPU reduce function is written as CUDA kernels. The user also defines application-specific partitioning and splitting functions: the first partitions the dataset among the computing nodes, and the second divides data blocks into smaller chunks to be processed on the CPUs and GPUs (both are sketched below). In the runtime system, the CPU handles partitioning, scheduling, and similar bookkeeping, while the GPU is mainly responsible for accelerating the computation. Based on the user-defined partitioning parameter, the data is distributed among the nodes. After finishing...

...... middle of paper ......

... tasks. A user-defined split is performed on the input to derive tasks. Each node requests a certain number of tasks, based on its core count or a user-defined value, and a node does not request new tasks until it has completed its previous ones. For fault tolerance, MARLA uses a strike-based scheme: failed tasks are resubmitted to another worker, and if a task succeeds on the new worker, the original node receives a strike. Nodes with three strikes are considered faulty and are no longer allowed to participate in processing (this scheme is also sketched below). This avoids the costly data relocation that many other implementations use to achieve fault tolerance. To support these features, MARLA is built from three main parts: the Splitter for I/O management, the TaskController for concurrency management, and the FaultTracker for fault tolerance.
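
To make the two primitives concrete, here is a minimal single-process sketch of the model in Python. It is an illustration only: the word-count task and the function names are mine, not taken from [1], and a real system would run the map and reduce phases in parallel across machines.

```python
from collections import defaultdict

def map_fn(record):
    # Emit an intermediate (key, value) pair for each word in the record.
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    # Merge all values that share a key into a smaller set of values.
    return key, sum(values)

def mapreduce(records):
    # "Shuffle": group intermediate values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce each group independently (the parallelizable part).
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

print(mapreduce(["the map step", "the reduce step"]))
# {'the': 2, 'map': 1, 'step': 2, 'reduce': 1}
```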
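MATE-CG's actual API is not reproduced in this essay, so the following is a hypothetical sketch of the two user-defined functions described above: a partition function that distributes the dataset among computing nodes, and a split function that carves a node's block into CPU and GPU chunks. The names, the round-robin distribution, and the fraction-based split are all assumptions, not MATE-CG's real interface.

```python
def partition(dataset, num_nodes):
    # Hypothetical partition function: distribute records across nodes
    # round-robin. In MATE-CG the user supplies this logic.
    parts = [[] for _ in range(num_nodes)]
    for i, record in enumerate(dataset):
        parts[i % num_nodes].append(record)
    return parts

def split(block, cpu_fraction):
    # Hypothetical split function: divide a node's block into a chunk
    # for the CPU and a chunk for the GPU, per a tunable fraction.
    cut = int(len(block) * cpu_fraction)
    return block[:cut], block[cut:]

nodes = partition(list(range(100)), num_nodes=4)
cpu_chunk, gpu_chunk = split(nodes[0], cpu_fraction=0.3)
```

Under the CPU-only or GPU-only scheme, the split would simply assign the whole block to one side; the CPU+GPU scheme tunes the fraction for best performance.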
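MARLA's three-strikes scheme is simple enough to capture directly. The name FaultTracker comes from the essay, but the interface below is an assumption; it only records strikes and excludes nodes that reach three.

```python
class FaultTracker:
    """Three-strikes fault tracking, as described above (interface assumed)."""
    MAX_STRIKES = 3

    def __init__(self):
        self.strikes = {}

    def record_failure(self, node_id):
        # Called when a task that failed on node_id later succeeds on
        # another worker: the original node receives a strike.
        self.strikes[node_id] = self.strikes.get(node_id, 0) + 1

    def is_faulty(self, node_id):
        # Nodes with three strikes are excluded from further processing.
        return self.strikes.get(node_id, 0) >= self.MAX_STRIKES

tracker = FaultTracker()
for _ in range(3):
    tracker.record_failure("node-7")
assert tracker.is_faulty("node-7")
```

Because only per-node strike counts are tracked, benching a node requires no data relocation, which matches the motivation given in the essay.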