Complex JSON Transformation with Jolt - Hortonworks




Today we don't have a good way to perform JSON-to-JSON manipulation in NiFi, that is, transformations whose output is again JSON. Typical transforms:

  • Field renaming
  • Enrichment, e.g. default values for fields missing from sparse incoming JSON
  • Transposing, map->list, list->map, etc.
  • Obfuscating sensitive fields
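
To make the four transform types concrete, here is a minimal sketch in plain Python. It is not Jolt and not a NiFi processor; the helper names (`rename_field`, `apply_defaults`, `map_to_list`, `list_to_map`, `obfuscate`) and the sample field names are made up for illustration.

```python
def rename_field(doc, old, new):
    """Field renaming: return a copy of doc with key `old` renamed to `new`."""
    out = dict(doc)
    if old in out:
        out[new] = out.pop(old)
    return out


def apply_defaults(doc, defaults):
    """Enrichment: fill in fields missing from a sparse document."""
    out = dict(defaults)
    out.update(doc)  # values present in the incoming doc win over defaults
    return out


def map_to_list(mapping, key_name):
    """Transposing map->list: {"u1": {...}} becomes [{"id": "u1", ...}]."""
    return [{key_name: k, **v} for k, v in mapping.items()]


def list_to_map(items, key_name):
    """Transposing list->map: the inverse of map_to_list."""
    return {
        item[key_name]: {k: v for k, v in item.items() if k != key_name}
        for item in items
    }


def obfuscate(doc, fields, mask="***"):
    """Obfuscation: replace the values of sensitive top-level fields."""
    return {k: (mask if k in fields else v) for k, v in doc.items()}
```

In every case the input and the output are structured JSON-like data, which is the property a dedicated processor would need to preserve.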

Some functionality can be achieved with a ReplaceText processor, but there are major issues:

  • It operates on a plain text string, not on structured data
  • A replacement can backfire when the regex matches in an unexpected location
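
A small sketch of how a text-based replace can backfire, using Python's `re` and `json` modules as a stand-in for ReplaceText. The document and the field names (`id`, `device_id`, `owner`) are hypothetical; the intent is to rename only the top-level `id`.

```python
import json
import re

raw = '{"id": 7, "owner": {"id": 42}}'

# Text-based rename: the regex has no notion of nesting, so it also
# rewrites the "id" key inside the nested "owner" object.
text_result = re.sub(r'"id"\s*:', '"device_id":', raw)

# Structured rename: parse first, then touch only the top-level key.
doc = json.loads(raw)
doc["device_id"] = doc.pop("id")
structured_result = doc

print(text_result)
print(json.dumps(structured_result))
```

The regex version renames the nested key as well, while the structured version leaves `owner.id` untouched.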

Proposed Solution

Create a dedicated JSON Transform processor. While doing my research I settled on Jolt: http://bazaarvoice.github.io/jolt/

  • Java-based implementation. There are myriad JSON transform libraries, but most of them are JavaScript-based or even browser-only
  • Alternatives like a JSON serializer for the JDK's XSLT parsers might work, but are usually more trouble than they are worth. XSLT files aren't the most user-friendly bits either
  • A Jolt transform spec is itself JSON
  • Any complex transformation logic that can't be expressed in standard terms can be plugged into Jolt via a Java extension class
  • There is an online interactive design tool, which helps with the 'no UI' aspect: http://jolt-demo.appspot.com/

Examples

Below is an example transformation I needed in one of the flows (I would like to eventually replace a ReplaceText processor with this new transformer). The use case: rename one of the fields in the incoming JSON to bring it into a common data format that streams into a central location. Much more complicated transformations are, of course, possible, and are listed in the Jolt online demo app (link above).
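
As an illustration only, and not the spec from the original flow, a rename of this kind can be written as a Jolt `shift` operation: the field to rename maps to its new name, and `"*": "&"` copies every other field under its own key. The field names `old_name`, `new_name` and `reading` are placeholders. The Python around the spec is a sketch that parses it (showing that the spec is itself JSON) and emulates just this flat case; `emulate_flat_shift` is a hypothetical helper, not part of Jolt, and handles nothing beyond literal top-level keys and the `"*": "&"` catch-all.

```python
import json

# Hypothetical Jolt spec: rename `old_name` to `new_name`, keep the rest.
SPEC_JSON = """
[
  {
    "operation": "shift",
    "spec": {
      "old_name": "new_name",
      "*": "&"
    }
  }
]
"""


def emulate_flat_shift(doc, shift_spec):
    """Emulate a flat shift spec: literal keys plus the "*": "&" catch-all."""
    out = {}
    for key, value in doc.items():
        if key != "*" and key in shift_spec:
            out[shift_spec[key]] = value   # explicit rename
        elif shift_spec.get("*") == "&":
            out[key] = value               # pass through unchanged
    return out


chain = json.loads(SPEC_JSON)  # the spec parses as ordinary JSON
incoming = {"old_name": "sensor-7", "reading": 21.5}
result = emulate_flat_shift(incoming, chain[0]["spec"])
print(json.dumps(result))
```

The same spec can be pasted into the online demo tool linked above to check it against real sample input.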



