java - Producing a sorted wordcount with Spark - Code Review Stack Exchange
My method using Java 8
As an addendum, I'll restate the problem from your question and show how I would solve it with plain Java 8 streams.
Input: An input file consisting of words. Output: A list of the words, sorted by how often they occur (most frequent first).
    Map<String, Long> occurenceMap = Files.readAllLines(Paths.get("myFile.txt"))
            .stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.groupingBy(i -> i, Collectors.counting()));

    List<String> sortedWords = occurenceMap.entrySet()
            .stream()
            .sorted(Comparator.comparing((Map.Entry<String, Long> entry) -> entry.getValue()).reversed())
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
This will do the following steps:
- Read all lines into a List<String> (care with large files!).
- Turn it into a Stream<String>.
- Turn that into a Stream<String> by flat mapping every String to a Stream<String>, splitting on the blanks.
- Collect all elements into a Map<String, Long>, grouping by the identity (i -> i) and using Collectors.counting() as downstream collector, such that the map value will be its count.
- Get a Set<Map.Entry<String, Long>> from the map.
- Turn it into a Stream<Map.Entry<String, Long>>.
- Sort by the reverse order of the value of the entry.
- Map the results to a Stream<String>; you lose the frequency information here.
- Collect the stream into a List<String>.
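Two of the caveats in the steps above (reading the whole file into memory, and losing the counts in the final map step) can be avoided with small changes. The following is only a sketch under assumptions: the class name WordFrequency and the method countSorted are hypothetical, it splits on runs of whitespace rather than on single blanks so that empty tokens are dropped, and it uses Files.lines to read lazily and a LinkedHashMap to keep the sorted order together with the counts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; names are illustrative and not from the original answer.
class WordFrequency {

    // Counts the words in the given lines and returns word -> count,
    // iterating from the most frequent word to the least frequent one.
    static Map<String, Long> countSorted(Stream<String> lines) {
        Map<String, Long> counts = lines
                // Assumption: split on any run of whitespace instead of a single blank.
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // An empty line still yields one empty token; drop it.
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Comparator.comparing((Map.Entry<String, Long> e) -> e.getValue()).reversed())
                // LinkedHashMap preserves insertion order, so the sort order survives.
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,
                        LinkedHashMap::new));
    }

    // Reads the file lazily, line by line, instead of loading it into a List first.
    static Map<String, Long> countSorted(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return countSorted(lines);
        }
    }
}
```

Calling WordFrequency.countSorted(Paths.get("myFile.txt")).keySet() would give the same kind of result as sortedWords, while the map values still hold the frequencies. Words with equal counts come out in no particular order, because the intermediate map is a HashMap.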
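Beware that the line
    .sorted(Comparator.comparing((Map.Entry<String, Long> entry) -> entry.getValue()).reversed())
should really be
    .sorted(Comparator.comparing(Map.Entry::getValue).reversed())
but that version does not compile. As far as I can tell, the cause is type inference: once .reversed() is chained onto it, the comparing(...) call no longer has a target type, so the compiler cannot work out the entry type for the method reference. Spelling out the parameter type in the lambda, as above, works around this.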
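For comparison, here is a sketch of two other ways to write the descending-by-value comparator without an explicitly typed lambda. Both rely on Map.Entry.comparingByValue, which is part of Java 8; the class and method names (ReverseByValue, viaReverseOrder, viaTypeWitness) are made up for this illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
class ReverseByValue {

    // Variant 1: pass a reversed value comparator to comparingByValue.
    // Nothing is chained afterwards, so the target type of sorted(...) drives inference.
    static List<String> viaReverseOrder(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Variant 2: keep the chained reversed(), but state the type arguments explicitly.
    static List<String> viaTypeWitness(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Which form reads best is a matter of taste; the first avoids the chained call altogether, the second stays closest to the line that was originally intended.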
I hope the Java 8 way can give you interesting insights.