MapReduce for Distributed Indexing
Post by: syed hussain in All Application Design Patterns
Summary
MapReduce is a programming model commonly used within the Hadoop framework to access big data stored in the Hadoop File System (HDFS). It consists of several phases: Map, Shuffle, and Reduce.
Google popularised the term MapReduce in 2004, however, the concepts can easily be implemented using any programming language. Below is an implementation of MapReduce using C#.

Mapping Function
Function x = x + 2, and a list of integers 1 -> 3. Map will return:
: :
/ \ / \
1 : 3 :
/ \ / \
2 : 4 :
/ \ / \
3 [] 5 []static void MapFunction(string[] args)
{
var index = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var map = Map<int, int>(x => x + 2, index);
map.ToList<int>().ForEach(i => Console.Write(i + " "));
Console.WriteLine();
Console.ReadKey();
}
static IEnumerable<TResult> Map<T, TResult>(Func<T, TResult> func, IEnumerable<T> list)
{
foreach (var i in list)
yield return func(i);
}Reduce
Function (x, y) => x + y, and a list of elements 1 to 3, Reduce will return an aggregate as follows:
:
/ \
1 : f(0,1) = 1
/ \ \
2 : f(1,2) = 3
/ \ \
3 [] f(3,3) = 6static void ReduceFunction(string[] args)
{
var index = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Console.WriteLine(Reduce<int, int>((x, y) => x + y, index, 0));
Console.ReadKey();
}
static T Reduce<T, U>(Func<U, T, T> func, IEnumerable<U> list, T x)
{
foreach (var i in list)
x = func(i, x);
return x;
} Tags:
