All About Programming: Apache Helix

Apache Helix - Home

Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.

What Is Cluster Management?

To understand Helix, you first need to understand cluster management. A distributed system typically runs on multiple nodes for the following reasons:

scalability
fault tolerance
load balancing

Each node performs one or more of the primary functions of the cluster, such as storing and serving data, producing and consuming data streams, and so on. Once configured for your system, Helix acts as the global brain for the system. It is designed to make decisions that cannot be made in isolation. Examples of such decisions that require global knowledge and coordination:

scheduling of maintainence tasks, such as backups, garbage collection, file consolidation, index rebuilds
repartitioning of data or resources across the cluster
informing dependent systems of changes so they can react appropriately to cluster changes
throttling system tasks and changes

While it is possible to integrate these functions into the distributed system, it complicates the code. Helix has abstracted common cluster management tasks, enabling the system builder to model the desired behavior with a declarative state model, and let Helix manage the coordination. The result is less new code to write, and a robust, highly operable system.

Read full article from Apache Helix - Home

Apache Helix - Home

What Is Cluster Management?

No comments:

Post a Comment

Labels

Popular Posts