极分享: high-quality sharing + professional mutual help = no software that is hard to build + no unavoidable overtime




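Big Data is highly complex and involves a great many technologies. The figure below gives an overall picture of which layer the technologies we often hear about, such as MapReduce and Hadoop, belong to.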

Figure: databricks.jpg




Brief introductions to some common technologies follow:

HBase is a distributed, column-oriented, open-source database. Its design is modeled on Google's BigTable, and it is written in Java.
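To make "column-oriented" more concrete, here is a minimal sketch of the kind of data model a BigTable-style store uses: each row key maps to columns named `family:qualifier`, and each column keeps timestamped versions of its value. The class `ToyColumnStore` and its methods are hypothetical names chosen for illustration; this is a toy in-memory model, not the HBase API.

```python
from collections import defaultdict


class ToyColumnStore:
    """Toy in-memory model of a column-oriented table (hypothetical, not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value) versions
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts):
        """Store a new version of a cell; older versions are kept."""
        self._rows[row][column].append((ts, value))

    def get(self, row, column):
        """Return the value with the highest timestamp, or None if the cell is absent."""
        versions = self._rows.get(row, {}).get(column, [])
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def scan_column(self, column):
        """Read one column across all rows, in row-key order, skipping rows without it."""
        return {
            row: self.get(row, column)
            for row in sorted(self._rows)
            if column in self._rows[row]
        }


store = ToyColumnStore()
store.put("user1", "info:name", "Ann", ts=1)
store.put("user1", "info:name", "Anna", ts=2)   # newer version of the same cell
store.put("user2", "info:name", "Bob", ts=1)
store.put("user2", "info:email", "bob@example.com", ts=1)

latest_name = store.get("user1", "info:name")
names = store.scan_column("info:name")
```

Rows here do not need to share the same set of columns, which is one reason this layout suits sparse data.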


MongoDB is currently a very popular non-relational (NoSQL) database.

YARN is the new generation of the MapReduce computing framework, also known as MRv2. It evolved from the first-generation MapReduce.

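Spark is an open-source cluster computing system based on in-memory computation, designed to make data analysis faster. It was developed in Scala by the AMP Lab at the University of California, Berkeley. Spark's in-memory computing framework suits iterative algorithms and interactive data analysis, and it can improve the timeliness and accuracy of big data processing. It has gradually won the support of many companies; Alibaba, Baidu, NetEase, and Intel are all among its users.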
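Why in-memory computation helps iterative algorithms can be sketched in plain Python. An iterative algorithm passes over the same data many times; if the data is kept in memory after the first read, later passes skip the expensive load. Everything below (`ToyDataset`, `cache`, `iterative_mean`) is a hypothetical illustration under that assumption, not Spark's API.

```python
class ToyDataset:
    """Toy stand-in for a dataset read from slow storage (hypothetical, not Spark's API)."""

    def __init__(self, loader):
        self._loader = loader   # function simulating an expensive read from storage
        self._cached = None     # in-memory copy, filled in by cache()
        self.loads = 0          # how many times the storage was actually read

    def _read(self):
        self.loads += 1
        return list(self._loader())

    def cache(self):
        """Read the data once and keep it in memory for later passes."""
        self._cached = self._read()
        return self

    def records(self):
        """Return the in-memory copy if there is one, otherwise re-read from storage."""
        return self._cached if self._cached is not None else self._read()


def iterative_mean(dataset, steps=5, rate=0.5):
    """Estimate the mean by gradient descent; each step is one full pass over the data."""
    estimate = 0.0
    for _ in range(steps):
        data = dataset.records()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= rate * gradient
    return estimate


def source():
    return [1.0, 2.0, 3.0, 4.0]


uncached = ToyDataset(source)
result_uncached = iterative_mean(uncached)   # reads storage on every step

cached = ToyDataset(source).cache()
result_cached = iterative_mean(cached)       # reads storage only once
```

Both runs compute the same estimate; the difference is that `uncached.loads` grows with the number of steps while `cached.loads` stays at one.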

Original article: https://www.linkedin.com/pulse/100-open-source-big-data-architecture-papers-anil-madan

Read full article from 极分享

