All About Programming: Implementing a real-time data pipeline with Spark Streaming

Implementing a real-time data pipeline with Spark Streaming | Chimpler

Real-time analytics has become a very popular topic in recent years. Whether in finance (high-frequency trading), adtech (real-time bidding), social networks (real-time activity), the Internet of Things (sensors sending real-time data), or server/traffic monitoring, real-time reporting can bring tremendous value (e.g., detecting potential attacks on a network immediately, quickly adjusting ad campaigns, …). Apache Storm is one of the most popular frameworks for aggregating data in real time, but there are many others, such as Apache S4, Apache Samza, Akka Streams, SQLStream and, more recently, Spark Streaming. According to Kyle Moses, on his page on Spark Streaming, it can process about 400,000 records per node per second for simple aggregations on small records, and it significantly outperforms other popular streaming systems such as Apache Storm (40x) and Yahoo S4 (57x).

Read full article from Implementing a real-time data pipeline with Spark Streaming | Chimpler

