Java Serialization - Good, Fast, and Faster | Distributed Thoughts




Anyone who has worked with object serialization, in Java or any other language, probably knows that it should be avoided whenever possible. Just as the first rule of distribution is "Do not distribute!", the first rule of serialization should be "Do not serialize!". However, in many cases, especially in distributed environments, serialization cannot be avoided and must therefore be heavily optimized to achieve reasonable throughput.

At GridGain, given the distributed nature of our product, we have always worked on optimizing our serialization routines, and starting with version 4.3.0 we have achieved our fastest results so far. In our tests, our GridOptimizedMarshaller was up to 20x faster than standard Java serialization with java.io.Serializable. If you switch to java.io.Externalizable, the GridGain marshaller is still up to 10x faster. We have even compared our marshaller to Kryo serialization, and it turns out that our marshaller is up to 5x faster than Kryo. On top of that, the footprint of GridGain serialized objects is significantly smaller than that of objects serialized with standard Java serialization.

The coolest thing here is that we do not require any custom interfaces or APIs - GridGain optimized serialization works directly with standard Java POJOs, regardless of whether they implement the java.io.Serializable interface or not. If your POJOs implement java.io.Externalizable, then our marshalling works even faster.
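For readers who have not used both interfaces, the following is a minimal sketch of the difference between java.io.Serializable and java.io.Externalizable when written through the standard java.io.ObjectOutputStream. It does not use GridGain at all; the class names ExternalizableSketch, PlainPoint and ExtPoint are hypothetical and exist only for this illustration. The Externalizable version writes its field values by hand, so the stream does not need to carry per-field descriptors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ExternalizableSketch {
    /** Hypothetical POJO relying on default Java serialization. */
    public static class PlainPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;
        long timestamp;

        public PlainPoint(int x, int y, long timestamp) {
            this.x = x;
            this.y = y;
            this.timestamp = timestamp;
        }
    }

    /** Same data, but the class writes and reads its own fields. */
    public static class ExtPoint implements Externalizable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;
        long timestamp;

        /** Externalizable requires a public no-arg constructor. */
        public ExtPoint() {
        }

        public ExtPoint(int x, int y, long timestamp) {
            this.x = x;
            this.y = y;
            this.timestamp = timestamp;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
            out.writeLong(timestamp);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Values must be read in exactly the order they were written.
            x = in.readInt();
            y = in.readInt();
            timestamp = in.readLong();
        }
    }

    /** Serializes an object with the standard ObjectOutputStream. */
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Reads back an object written by serialize(). */
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] plain = serialize(new PlainPoint(3, 4, 1234567890L));
        byte[] ext = serialize(new ExtPoint(3, 4, 1234567890L));
        ExtPoint copy = (ExtPoint) deserialize(ext);

        System.out.println("Serializable bytes:   " + plain.length);
        System.out.println("Externalizable bytes: " + ext.length);
        System.out.println("round trip ok: " + (copy.x == 3 && copy.y == 4 && copy.timestamp == 1234567890L));
    }
}
```

The exact byte counts depend on the class and field names, so the sketch prints them rather than assuming them. It compares stream size only and says nothing about speed, which is what the figures quoted above measure.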

How do we do it? The main culprit behind the poor performance of Java serialization is java.io.ObjectOutputStream, which is extremely expensive to initialize and performs poorly. The first thing we did was replace it with our own implementation, based on direct memory copying by invoking native C and Java so-called "unsafe" routines. We also serialize fields in a predefined order, which we determine through extensive object introspection; this allows us to pass only the values and not their type names or other metadata.
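The idea of writing only values in a predefined field order can be illustrated with a small sketch. This is not the GridOptimizedMarshaller implementation: it uses plain reflection and java.io.DataOutputStream instead of "unsafe" memory copying, it handles only int and long fields, and it assumes that both sides already know the class being transferred. The names FieldOrderSketch, Point, orderedFields, encode and decode are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldOrderSketch {
    /** Hypothetical POJO used only for this illustration. */
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;
        long timestamp;

        public Point() {
        }

        public Point(int x, int y, long timestamp) {
            this.x = x;
            this.y = y;
            this.timestamp = timestamp;
        }
    }

    /** Non-static, non-transient fields sorted by name, so writer and reader agree on the order. */
    static List<Field> orderedFields(Class<?> cls) {
        List<Field> fields = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (!Modifier.isStatic(mod) && !Modifier.isTransient(mod)) {
                f.setAccessible(true);
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    /** Writes only the field values; no class name, field names or type codes. */
    static byte[] encode(Object obj) throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Field f : orderedFields(obj.getClass())) {
                Class<?> type = f.getType();
                if (type == int.class) {
                    out.writeInt(f.getInt(obj));
                } else if (type == long.class) {
                    out.writeLong(f.getLong(obj));
                } else {
                    throw new IllegalArgumentException("Unsupported field type: " + type);
                }
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the values back in the same field order into a fresh instance. */
    static <T> T decode(byte[] data, Class<T> cls) throws IOException, ReflectiveOperationException {
        T obj = cls.getDeclaredConstructor().newInstance();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (Field f : orderedFields(cls)) {
                Class<?> type = f.getType();
                if (type == int.class) {
                    f.setInt(obj, in.readInt());
                } else if (type == long.class) {
                    f.setLong(obj, in.readLong());
                } else {
                    throw new IllegalArgumentException("Unsupported field type: " + type);
                }
            }
        }
        return obj;
    }

    /** Standard Java serialization, for a size comparison. */
    static byte[] javaSerialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 4, 1234567890L);
        byte[] compact = encode(p);
        Point copy = decode(compact, Point.class);

        System.out.println("field-order bytes:        " + compact.length);
        System.out.println("ObjectOutputStream bytes: " + javaSerialize(p).length);
        System.out.println("round trip ok: " + (copy.x == p.x && copy.y == p.y && copy.timestamp == p.timestamp));
    }
}
```

For this Point the compact form is two ints and one long, 16 bytes in total, while the ObjectOutputStream form additionally carries a stream header and a class descriptor. A real marshaller would also have to deal with object references, nested objects, class identification and versioning, all of which this sketch leaves out.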

Read full article from Java Serialization - Good, Fast, and Faster | Distributed Thoughts

