[TIKA-1723] Integrate language-detector into Tika - ASF JIRA
The language-detector project at https://github.com/optimaize/language-detector is faster, supports more languages (70 vs. 13), and is more accurate than Tika's built-in language detector.
This is a first stab at integrating it, along with some initial findings. The integration raises a number of issues, especially if Chris A. Mattmann moves forward with turning language detection into a pluggable extension point.