1. Most frequent IP Addresses
Find the most frequent IP address in a web log. The log may exceed 100 GB in size.
Naive Method:
Use a HashMap to count the frequencies for each IP address.
The key is the IP address, while the value is the counts of the IP address.
Complexity:
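Time: O(N * Read), where N is the number of log entries and Read is the cost of reading one entry.
Memory: O(N'), where N' is the number of distinct IP addresses.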
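The naive method above can be sketched roughly as follows. This is a minimal illustration rather than the original author's code: the class name `IpFrequency`, the helper `extractIp`, and the assumption that each log line begins with the client IP followed by a space are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class IpFrequency {
    // Assumption: the IP address is the first space-delimited token of a line.
    static String extractIp(String line) {
        int space = line.indexOf(' ');
        return space < 0 ? line : line.substring(0, space);
    }

    // Consumes the lines once and returns the most frequent IP,
    // or null if there are no non-empty lines.
    // On a tie, the IP that reached the top count first is returned.
    static String mostFrequentIp(Iterator<String> lines) {
        Map<String, Long> counts = new HashMap<>(); // key: IP, value: its count
        String best = null;
        long bestCount = 0;
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.isEmpty()) {
                continue;
            }
            String ip = extractIp(line);
            long count = counts.merge(ip, 1L, Long::sum); // increment or insert 1
            if (count > bestCount) {
                bestCount = count;
                best = ip;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Iterator<String> sample = Arrays.asList(
                "10.0.0.1 GET /a",
                "10.0.0.2 GET /b",
                "10.0.0.1 GET /c").iterator();
        System.out.println(mostFrequentIp(sample)); // prints 10.0.0.1
    }
}
```

For a real log file, the iterator could come from `Files.lines(path).iterator()`, which reads lazily, so only the map of distinct addresses is held in memory, not the whole log.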
Optimization 1:
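There are only 2^32 distinct IPv4 addresses, i.e. about 4 G (roughly 4.3 billion) possible keys.
The longest IP address, 255.255.255.255, takes 15 characters, so storing one address as a string takes approximately 16 bytes in the worst case. The total memory needed for the keys is therefore 2^32 * 16 bytes = 64 GB.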
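The estimate above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class `IpMemoryEstimate` and its method names are hypothetical. It also shows why there are only 2^32 distinct addresses, since each of the four octets fits in 8 bits.

```java
class IpMemoryEstimate {
    // Packs a dotted-quad IPv4 string into its 32-bit value, held in a long
    // so that it stays non-negative.
    static long toUnsignedInt(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted quad: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet; // each octet contributes 8 bits
        }
        return value;
    }

    // Worst-case bytes for the keys alone if every possible address appears.
    static long worstCaseKeyBytes(int bytesPerKey) {
        long distinctAddresses = 1L << 32; // 2^32
        return distinctAddresses * bytesPerKey;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println(worstCaseKeyBytes(16) / gib);      // prints 64
        System.out.println(toUnsignedInt("255.255.255.255")); // prints 4294967295
    }
}
```

This counts only the raw key bytes. A real HashMap adds per-entry and per-object overhead on top, so the figure is a lower bound on actual usage.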
Read full article from Buttercola: Nine Chapter