MapReduce for Distributed Indexing
Post by: syed hussain in All Application Design Patterns
Summary
MapReduce is a programming model commonly used within the Hadoop framework to access big data stored in the Hadoop File System (HDFS). It consists of several phases: Map, Shuffle, and Reduce.
Google popularised the term MapReduce in 2004, however, the concepts can easily be implemented using any programming language. Below is an implementation of MapReduce using C#.
Mapping Function
Function x = x + 2, and a list of integers 1 -> 3. Map will return:
: : / \ / \ 1 : 3 : / \ / \ 2 : 4 : / \ / \ 3 [] 5 []
static void MapFunction(string[] args)
{
var index = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var map = Map<int, int>(x => x + 2, index);
map.ToList<int>().ForEach(i => Console.Write(i + " "));
Console.WriteLine();
Console.ReadKey();
}
static IEnumerable<TResult> Map<T, TResult>(Func<T, TResult> func, IEnumerable<T> list)
{
foreach (var i in list)
yield return func(i);
}
Reduce
Function (x, y) => x + y, and a list of elements 1 to 3, Reduce will return an aggregate as follows:
: / \ 1 : f(0,1) = 1 / \ \ 2 : f(1,2) = 3 / \ \ 3 [] f(3,3) = 6
static void ReduceFunction(string[] args)
{
var index = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Console.WriteLine(Reduce<int, int>((x, y) => x + y, index, 0));
Console.ReadKey();
}
static T Reduce<T, U>(Func<U, T, T> func, IEnumerable<U> list, T x)
{
foreach (var i in list)
x = func(i, x);
return x;
}
Tags: