Implementing PageRank using MapReduce

Posted by Nick D. on Stack Overflow
Published on 2011-02-17T13:03:56Z Indexed on 2011/02/17 23:25 UTC

Hello,

I'm trying to get my head around an issue with the theory of implementing PageRank with MapReduce.

I have the following simple scenario with three nodes: A B C.

The adjacency list is:

A { B, C }
B { A }

The PageRank for B, for example, is equal to:

PR(B) = (1-d)/N + d ( PR(A) / C(A) )

d     = damping factor
N     = total number of pages (3 here)
PR(A) = PageRank of A, the only page linking to B
C(A)  = number of outgoing links from page A
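
As a quick check of the formula, here is a minimal sketch of the arithmetic for B. It assumes the common normalized form in which N is the total number of pages (3 in this scenario), a damping factor d = 0.85, and a uniform starting rank of 1/3 for A. None of those values are given above, and the function name `pagerank_b` is purely illustrative.

```python
def pagerank_b(pr_a: float, out_links_a: int, n_pages: int = 3, d: float = 0.85) -> float:
    """PageRank of B when A is the only page linking to it.

    n_pages and d are assumed values, not taken from the question.
    """
    return (1 - d) / n_pages + d * (pr_a / out_links_a)


# A starts with an assumed uniform rank of 1/3 and has two outgoing links (B and C).
pr_b = pagerank_b(pr_a=1 / 3, out_links_a=2)
print(round(pr_b, 4))
```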

I am fine with the overall scheme and how the mapper and reducer would work, but I cannot see how C(A) would be known at the time the reducer does its calculation. When the reducer computes the PageRank of B by aggregating the incoming links to B, how will it know the number of outgoing links from each of those pages? Does this require a lookup in some external data source?
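
One common way to arrange this, sketched here under assumptions that are not part of the question, is to have the mapper do the division instead of the reducer. The mapper's input record for A already holds A's adjacency list, so C(A) is simply the length of that list. The mapper emits PR(A) / C(A) to each target, and the reducer only has to sum what it receives. The sketch simulates one iteration in plain Python rather than on a real MapReduce framework. The names `mapper`, `reducer`, `run_iteration` and the value `DAMPING = 0.85` are hypothetical, N is taken to be the total number of pages, and the rank held by dangling pages such as C is not redistributed.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor d


def mapper(page, record):
    """Emit the page's link list plus one rank contribution per outgoing link."""
    rank, out_links = record
    # Pass the graph structure through so the next iteration can reuse it.
    yield page, ("links", out_links)
    # C(page) is known here because the input record carries the adjacency list.
    for target in out_links:
        yield target, ("contrib", rank / len(out_links))


def reducer(page, values, n_pages):
    """Sum incoming contributions; no lookup of C(A) is needed at this point."""
    out_links = []
    total = 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            total += payload
    new_rank = (1 - DAMPING) / n_pages + DAMPING * total
    return page, (new_rank, out_links)


def run_iteration(graph):
    """Simulate map, shuffle (group by key), and reduce in a single process."""
    grouped = defaultdict(list)
    for page, record in graph.items():
        for key, value in mapper(page, record):
            grouped[key].append(value)
    n_pages = len(graph)
    return dict(reducer(page, values, n_pages) for page, values in grouped.items())


# The three-node scenario: A links to B and C, B links to A, C has no outgoing links.
graph = {
    "A": (1 / 3, ["B", "C"]),
    "B": (1 / 3, ["A"]),
    "C": (1 / 3, []),
}
result = run_iteration(graph)
for page in sorted(result):
    print(page, round(result[page][0], 4), result[page][1])
```

Because each reducer output has the same (rank, out_links) shape as the mapper input, the result of one iteration can be fed straight into the next, and no external data source is consulted in this arrangement.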

© Stack Overflow or respective owner