Document Summarization with LSA

This is a multi-part series on document summarization using Latent Semantic Analysis (LSA). I wrote a document summarizer and ran an exhaustive measurement pass with it, summarizing newspaper articles from the first Reuters corpus. The code is structured as a web service in Solr, using Lucene for text analysis and the OpenNLP package for tuning the algorithm with part-of-speech analysis.

Introduction

Document summarization is about finding the "themes" in a document: the important words and sentences that contain the core concepts.
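To make the idea of "themes" a little more concrete, here is a minimal sketch in plain Python. It is not the Solr/Lucene/OpenNLP code described above, and every name in it (`split_sentences`, `term_sentence_matrix`, `dominant_theme`, `summarize`) is hypothetical. It also simplifies LSA heavily: it uses raw word counts, and instead of a full singular value decomposition it uses power iteration to find only the single strongest theme, then keeps the sentences that load most heavily on it.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def term_sentence_matrix(sentences):
    # Rows are terms, columns are sentences; entries are raw term counts.
    counts = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    vocab = sorted(set().union(*counts))
    return [[c[t] for c in counts] for t in vocab]


def dominant_theme(matrix, iterations=100):
    # Power iteration on A^T A: converges toward the first right singular
    # vector of A, i.e. how strongly each sentence expresses the main theme.
    n = len(matrix[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n)) for row in matrix]  # A v
        w = [sum(matrix[i][j] * u[i] for i in range(len(matrix)))     # A^T u
             for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def summarize(text, k=2):
    # Keep the k sentences with the highest theme score, in document order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    matrix = term_sentence_matrix(sentences)
    if not matrix:
        return sentences[:k]
    scores = dominant_theme(matrix)
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:k]
    return [sentences[j] for j in sorted(top)]
```

Calling `summarize("Oil prices rose sharply. The cat sat. Oil prices fell later.", 2)` keeps the two oil sentences and drops the unrelated one, because the two oil sentences share vocabulary and so dominate the strongest theme. A fuller LSA summarizer would take several singular vectors from a complete SVD and would usually weight terms rather than count them.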
Read full article from Uncle Lance's Ultra Whiz Bang: Document Summarization with LSA #1: Introduction