yasemin's notes: Using Distributed Cache in Hadoop
Using Distributed Cache in Hadoop
Distributed cache allows to share static data among all nodes. In order use this functionality, the data location should be set before MR job starts.
Here is an example usage for distributed cache. While working on web-graph problem I replace URLs with unique id's. If I have the url-Id mapping in memory, I can easily replace URLs with their corresponding ids. So here is the sample usage:
Here is an example usage for distributed cache. While working on web-graph problem I replace URLs with unique id's. If I have the url-Id mapping in memory, I can easily replace URLs with their corresponding ids. So here is the sample usage:
Read full article from yasemin's notes: Using Distributed Cache in Hadoop
No comments:
Post a Comment