LexGarden: 4 strategies for building type system agnostic UIMA components



One of the promises of UIMA is that a community of developers can build components for a common platform, so that best-of-breed strategies for text analysis can be realized by mixing and matching components built by disparate development groups. Unfortunately, just because a component is built on UIMA does not mean it can be seamlessly integrated into any arbitrary UIMA pipeline. Generally speaking, components that do roughly the same work can be easily mixed and matched only if they either submit to the same type system or are somehow type system agnostic (i.e. able to work with many or any type systems). While I think it behooves the community to promote standard type systems, it is also helpful (and maybe better) to think about creating more components that don't submit to any type system.

Here I summarize four strategies for creating UIMA components that are type system agnostic.

1) View Abuse
This approach takes advantage of the ability to create new views where data can be placed. It is the only approach of the four that does not require any type system at all. Not surprisingly, it is also the most limiting of the four strategies and, for most scenarios, the most "hackish". Suppose you have some piece of information that you need to add to the CAS. One approach would be to extend your type system and add that piece of information to a new type or to a new feature of an existing type. The alternative proposed here is to create a new view and put that piece of information into the view as, e.g., a string. For example, you may want to attach an identifier such as a URI to each document that is run through your pipeline. Instead of creating a new type that has a feature called "URI" and putting it there, you could create a new view called "URIView" and store the URI in that view, e.g. as its document text or, as in the snippet below, as its sofa data URI. A utility method for setting the URI of a CAS might look something like this:

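// copied from org.cleartk.util.ViewURIUtil
// (subject to this copyright/license statement)
// CAS is org.apache.uima.cas.CAS; ViewNames.URI is the view-name constant
// used by the original ClearTK code.
public static void setURI(CAS cas, String uri) {
    // Create a dedicated view whose only job is to carry the URI.
    CAS view = cas.createView(ViewNames.URI);
    // Store the URI as the view's sofa data URI; no MIME type is given.
    view.setSofaDataURI(uri, null);
}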
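
A consumer of that view needs the matching read side. The following is only a rough sketch of the round trip, not code from ClearTK: so that it runs without UIMA on the classpath, it uses a tiny stand-in class (`MiniCas`, hypothetical) that mimics just the four `CAS` methods involved (`createView`, `getView`, `setSofaDataURI`, `getSofaDataURI`). The view name `"UriView"` is likewise an assumption standing in for `ViewNames.URI`.

```java
import java.util.HashMap;
import java.util.Map;

class ViewUriSketch {
    // Hypothetical view name; the original snippet uses ViewNames.URI instead.
    static final String URI_VIEW = "UriView";

    /** Minimal stand-in for a CAS: named views, each holding one sofa data URI. */
    static class MiniCas {
        private final Map<String, MiniCas> views = new HashMap<>();
        private String sofaDataUri;

        // This stand-in rejects duplicate view names.
        MiniCas createView(String name) {
            if (views.containsKey(name)) {
                throw new IllegalStateException("view already exists: " + name);
            }
            MiniCas view = new MiniCas();
            views.put(name, view);
            return view;
        }

        MiniCas getView(String name) {
            MiniCas view = views.get(name);
            if (view == null) {
                throw new IllegalArgumentException("no such view: " + name);
            }
            return view;
        }

        // The MIME type is accepted for signature parity but ignored here.
        void setSofaDataURI(String uri, String mimeType) {
            this.sofaDataUri = uri;
        }

        String getSofaDataURI() {
            return sofaDataUri;
        }
    }

    /** Write side: same shape as the setURI method shown above. */
    static void setURI(MiniCas cas, String uri) {
        MiniCas view = cas.createView(URI_VIEW);
        view.setSofaDataURI(uri, null);
    }

    /** Read side: look the view up by name and return what was stored. */
    static String getURI(MiniCas cas) {
        return cas.getView(URI_VIEW).getSofaDataURI();
    }

    public static void main(String[] args) {
        MiniCas cas = new MiniCas();
        setURI(cas, "file:/corpus/doc-001.txt");
        System.out.println(getURI(cas)); // prints file:/corpus/doc-001.txt
    }
}
```

Against a real CAS the two helper methods should look the same with `MiniCas` replaced by `org.apache.uima.cas.CAS`. The point of the pattern is that writer and reader only have to agree on a view name, not on any type or feature.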

Read full article from LexGarden: 4 strategies for building type system agnostic UIMA components

