Introducing streaming k-means in Spark 1.2 – Databricks
January 28, 2015 | by Jeremy Freeman (Howard Hughes Medical Institute)

Many real-world data are acquired sequentially over time, whether messages from social media users, time series from wearable sensors, or, in a case we are particularly excited about, the firing of large populations of neurons. In these settings, rather than wait for all the data to be acquired before performing our analyses, we can use streaming algorithms to identify patterns over time and make more targeted predictions and decisions.

One simple strategy is to build machine learning models on static data and then use the learned model to make predictions on an incoming data stream. But what if the patterns in the data are themselves dynamic? That's where streaming algorithms come in.

A key advantage of Spark is that its machine learning library (MLlib) and its library for stream processing (Spark Streaming) are built on the same core architecture for distributed analytics.
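To make the idea concrete, here is a minimal, self-contained sketch of the kind of mini-batch update a streaming k-means model performs as each batch arrives. The function names and structure are illustrative, not Spark's API: each cluster center moves to a weighted average of its previous position and the mean of the new points assigned to it, and a `decay` factor (an assumption modeled on the "forgetfulness" parameter this style of algorithm uses) discounts the old counts so the model can track dynamic patterns.

```python
def assign(point, centers):
    """Return the index of the nearest center (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(centers):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = i, d
    return best


def update_batch(centers, counts, batch, decay=1.0):
    """Fold one mini-batch of points into the model.

    centers : list of cluster centers (each a list of floats)
    counts  : effective number of points previously seen per cluster
    decay   : 1.0 averages over all history; values < 1.0 down-weight
              old data so the centers can drift with the stream
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]   # per-cluster batch sums
    m = [0] * len(centers)                  # per-cluster batch counts
    for x in batch:
        j = assign(x, centers)
        m[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    for j, c in enumerate(centers):
        if m[j] == 0:
            continue  # no new evidence for this cluster in this batch
        n = counts[j] * decay               # discounted old weight
        total = n + m[j]
        centers[j] = [(n * c[d] + sums[j][d]) / total for d in range(dim)]
        counts[j] = total
    return centers, counts
```

With `decay=1.0` this reduces to the ordinary running mean over everything seen so far; with `decay=0.0` each batch completely replaces the old centers, and intermediate values trade off stability against responsiveness.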