java - Processing CDATA from XML via DOM parser - Stack Overflow



java - Processing CDATA from XML via DOM parser - Stack Overflow
I'm suspecting that your problem is in the following line of code from the getTagValue method:
Node nValue = (Node) nlList.item(0);
You are always getting the first child! But you might have more than one.
The following example has 3 children: text node "detail ", CDATA node "with cdata" and text node " here":
<Details>detail <![CDATA[with cdata]]> here</Details>
If you run your code, you get only "detail ", you loose the rest.
The following example has 1 child: a CDATA node "detail with cdata here":
<Details><![CDATA[detail with cdata here]]></Details>
If you run your code, you get everything.
But the same example as above written this way:
<Details>     <![CDATA[detail with cdata here]]>  </Details>
now has 3 children because the spaces and line feeds are picked up as text nodes. If you run your code you get the first empty text node with a line feed, you loose the rest.
You either have to loop through all children (no matter how many) and concatenate the value of each to get the full result, or if it's not important for you to differentiate between plain text and text inside CDATA, then set the coalescing property on the document builder factory first:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();  docFactory.setCoalescing(true);

Coalescing specifies that the parser produced by this code will convert CDATA nodes to Text nodes and append it to the adjacent (if any) text node. By default the value of this is set to false.
Read full article from java - Processing CDATA from XML via DOM parser - Stack Overflow

No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts