Text Analytics in Enterprise Search - Daniel Ling
Document Categorization
Assigns a label to a document (or to other content or data).
Labels can name a topical category or a sentiment.
A threshold value decides how closely a document must match a category before the label is assigned.
Statistics and “knowledge” learned from previously labeled examples can be used.
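As an illustration of labels, thresholds and statistics from earlier examples working together, here is a minimal Python sketch assuming a multinomial naive Bayes model with add-one smoothing. Naive Bayes is one of the classifiers Mallet offers, but this code is not Mallet's API. The class name, the whitespace tokenizer and the 0.6 threshold are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic lowercase whitespace tokenizer (assumption for this sketch).
    return text.lower().split()


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one smoothing and a label threshold."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold               # minimum posterior to assign a label
        self.doc_counts = Counter()              # label -> number of training documents
        self.term_counts = defaultdict(Counter)  # label -> term -> occurrences
        self.vocab = set()

    def train(self, text, label):
        # Accumulate statistics from one previously labeled example.
        self.doc_counts[label] += 1
        for term in tokenize(text):
            self.term_counts[label][term] += 1
            self.vocab.add(term)

    def posteriors(self, text):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            label_terms = sum(self.term_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for term in tokenize(text):
                count = self.term_counts[label][term]
                score += math.log((count + 1) / (label_terms + len(self.vocab)))
            log_scores[label] = score
        # Convert log scores to probabilities that sum to 1.
        best = max(log_scores.values())
        weights = {label: math.exp(s - best) for label, s in log_scores.items()}
        total = sum(weights.values())
        return {label: w / total for label, w in weights.items()}

    def categorize(self, text):
        # Assign the best label only if it clears the threshold; otherwise None.
        probs = self.posteriors(text)
        label = max(probs, key=probs.get)
        return label if probs[label] >= self.threshold else None


if __name__ == "__main__":
    nb = NaiveBayesCategorizer(threshold=0.6)
    nb.train("great product works well", "positive")
    nb.train("love it excellent quality", "positive")
    nb.train("terrible broke after a day", "negative")
    nb.train("awful quality waste of money", "negative")
    print(nb.categorize("excellent product"))  # positive (posterior about 0.82)
    print(nb.categorize("quality"))            # None (posterior about 0.52, below 0.6)
```

The threshold is what keeps an ambiguous document such as "quality" unlabeled instead of forcing it into the marginally more likely category.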
Setting up and training Mallet:
Train the categorization component, Mallet (Machine Learning for Language Toolkit).
• Alternative components include a Lucene (TF-IDF) index (MoreLikeThis), OpenNLP, Textcat, and Classifier4j.
Run new documents against the model (or the index) built from the trained documents.
Training can be done from an interface, ad hoc, or from an index of pre-categorized documents.
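The index-based alternative (a Lucene TF-IDF index queried MoreLikeThis-style) can be pictured as "index the pre-categorized documents, then let the new document's most similar neighbours vote". The following plain-Python sketch illustrates that idea; it is not Lucene's implementation or API, and the class name, `k`, the similarity threshold and the IDF formula are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic lowercase whitespace tokenizer (assumption for this sketch).
    return text.lower().split()


class TfidfKnnCategorizer:
    """Label a document by similarity-weighted votes of its nearest neighbours."""

    def __init__(self, k=3, threshold=0.1):
        self.k = k                  # number of most similar training docs consulted
        self.threshold = threshold  # minimum cosine similarity for a neighbour to vote
        self.docs = []              # (term-frequency Counter, label) per training doc
        self.df = Counter()         # term -> number of training docs containing it

    def index(self, text, label):
        # "Training" here is simply indexing a pre-categorized document.
        tf = Counter(tokenize(text))
        self.docs.append((tf, label))
        self.df.update(tf.keys())

    def _weights(self, tf):
        # TF-IDF weights; terms absent from the training index are ignored.
        n = len(self.docs)
        return {t: c * math.log(1 + n / self.df[t]) for t, c in tf.items() if self.df[t]}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def categorize(self, text):
        # Run the new document against every indexed document.
        query = self._weights(Counter(tokenize(text)))
        scored = sorted(
            ((self._cosine(query, self._weights(tf)), label) for tf, label in self.docs),
            reverse=True,
        )
        votes = Counter()
        for sim, label in scored[: self.k]:
            if sim >= self.threshold:
                votes[label] += sim  # similarity-weighted vote
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    knn = TfidfKnnCategorizer(k=3)
    knn.index("solr lucene search index query", "search")
    knn.index("lucene index ranking query", "search")
    knn.index("mallet classifier training model", "ml")
    knn.index("naive bayes classifier model training", "ml")
    print(knn.categorize("lucene query index"))
    print(knn.categorize("weather forecast rain"))
```

Unlike the trained-model approach, "training" here is only indexing, so adding a pre-categorized document takes effect immediately; the cost is paid at classification time instead.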
Document Summarization
Summarizes a document, either at index time or on demand.
Leverages the knowledge and term statistics of both the document and the index.
Picks the “most important” sentences based on these statistics and displays them.
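A simple way to combine document and index statistics is to score each sentence by the TF-IDF weight of its terms: term frequency comes from the document, document frequency from the index. The Python sketch below assumes that scoring scheme (averaged per sentence so that long sentences are not automatically favoured). The article does not give its formula, and the `doc_freq`/`num_docs` inputs and the regex sentence splitter are assumptions.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, doc_freq, num_docs, max_sentences=2):
    """Return the highest-scoring sentences in their original order.

    doc_freq: term -> number of indexed documents containing the term (assumed input)
    num_docs: total number of documents in the index (assumed input)
    """
    sentences = split_sentences(text)
    # Document statistics: how often each term occurs in this document.
    tf = Counter(t for s in sentences for t in tokenize(s))

    def idf(term):
        # Index statistics: terms that are rare across the index weigh more.
        return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1

    def score(sentence):
        terms = tokenize(sentence)
        if not terms:
            return 0.0
        return sum(tf[t] * idf(t) for t in terms) / len(terms)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


if __name__ == "__main__":
    text = ("Solr indexes documents. The weather was nice. "
            "Solr summarizes documents using term statistics. It was a day.")
    df = {"the": 100, "was": 90, "a": 100, "it": 95, "day": 40, "weather": 30,
          "nice": 35, "solr": 3, "documents": 20, "indexes": 10, "summarizes": 2,
          "using": 50, "term": 15, "statistics": 12}
    print(summarize(text, df, num_docs=100))
```

Returning the chosen sentences in document order rather than score order keeps the summary readable.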
Example Solution: Document Summarization
A custom RequestHandler receives the document ID and the field to summarize.
A custom Search Component selects the top sentences.
The selected subset of sentences is sent back in a field of the response.
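In Solr, the RequestHandler and Search Component are Java plugin classes. The Python sketch below only mirrors the division of work described for this solution: a handler that looks up a document field by ID, and a component that picks the top sentences. Every name in it (`STORE`, `SummarizeHandler`, `TopSentencesComponent`, `distinct_words`) is hypothetical, and the distinct-word scorer is a placeholder for a real term-statistics scorer.

```python
# Hypothetical in-memory stand-in for the index: document ID -> stored fields.
STORE = {
    "doc1": {"body": "Solr is a search server. It is built on Lucene. Lucene provides indexing."},
}


def distinct_words(sentence):
    # Placeholder scorer (assumption); a real one would use term statistics.
    return len(set(sentence.lower().split()))


class TopSentencesComponent:
    """Hypothetical stand-in for the custom Search Component."""

    def __init__(self, scorer):
        self.scorer = scorer  # any function: sentence -> numeric score

    def process(self, text, max_sentences):
        # Naive sentence split on "." (assumption for this sketch).
        sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
        ranked = sorted(range(len(sentences)), key=lambda i: self.scorer(sentences[i]), reverse=True)
        # Return the chosen sentences in their original document order.
        return [sentences[i] for i in sorted(ranked[:max_sentences])]


class SummarizeHandler:
    """Hypothetical stand-in for the custom RequestHandler."""

    def __init__(self, store, component):
        self.store = store
        self.component = component

    def handle(self, doc_id, field_name, max_sentences=1):
        doc = self.store.get(doc_id)
        if doc is None or field_name not in doc:
            return {"error": f"field {field_name!r} not found for document {doc_id!r}"}
        summary = self.component.process(doc[field_name], max_sentences)
        # Send the selected subset of sentences back in a response field.
        return {"id": doc_id, "summary": " ".join(summary)}


if __name__ == "__main__":
    handler = SummarizeHandler(STORE, TopSentencesComponent(distinct_words))
    print(handler.handle("doc1", "body", max_sentences=2))
```

Keeping the scorer pluggable lets the sentence-selection logic change without touching the handler that resolves the document and formats the response.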
Please read the full article: Text Analytics in Enterprise Search - Daniel Ling