Using the ExpandComponent to expand a Solr Block Join | Solr Evolved



In the Solr documents above, there are two books. Each book has two pages, which will be loaded as a block with the master book record.

After loading we can issue the following parent block join query:

http://localhost:8983/solr/collection1/select?q={!parent which='type_s:parent'}text_t:solr&wt=xml&indent=true

This parent block join returns the parent book records for books where the child documents contain “solr” in the text_t field.

With the ExpandComponent, we can expand the results to include the children of the book records returned by the block join. For example:

http://localhost:8983/solr/collection1/select?q={!parent which='type_s:parent'}text_t:solr&wt=xml&indent=true&expand=true&expand.field=ISBN_s&expand.q=*:*

The query above turns on the ExpandComponent with the expand=true parameter. The expand.field=ISBN_s parameter tells the ExpandComponent to group the expanded documents by the ISBN_s field. The expand.q=*:* tells the ExpandComponent to match all of the documents within the group. This is needed because by default the ExpandComponent uses the main query to determine which documents to match, which in this case was the block join.

With the ExpandComponent on, the results will have a new “expanded” section that contains the expanded child documents from the block join.
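
As a rough illustration of the request described above, the following sketch assembles the same URL from its parameters so that the special characters in the block join query are percent-encoded. The helper name build_expand_url is hypothetical, and the host and collection are simply those from the example URL. It only builds the string; it does not contact a Solr server.

```python
from urllib.parse import urlencode


def build_expand_url(base_url, child_query, parent_filter="type_s:parent",
                     expand_field="ISBN_s"):
    """Build a select URL for a parent block join with the ExpandComponent on."""
    params = {
        # Parent block join: return parents whose children match child_query.
        "q": "{!parent which='%s'}%s" % (parent_filter, child_query),
        "wt": "xml",
        "indent": "true",
        # Turn on the ExpandComponent and group expanded docs by expand_field.
        "expand": "true",
        "expand.field": expand_field,
        # Match every document in the group instead of reusing the main query.
        "expand.q": "*:*",
    }
    return base_url + "?" + urlencode(params)


url = build_expand_url("http://localhost:8983/solr/collection1/select",
                       "text_t:solr")
print(url)
```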


Read full article from Using the ExpandComponent to expand a Solr Block Join | Solr Evolved


Transforming Result Documents - Apache Solr Reference Guide - Apache Software Foundation



Document Transformers can be used to modify the information returned about each document in the results of a query.

Using Document Transformers

When executing a request, a document transformer can be used by including it in the fl parameter using square brackets, for example:

fl=id,name,score,[shard]

Some transformers allow, or require, local parameters which can be specified as key value pairs inside the brackets:

fl=id,name,score,[explain style=nl]

Read full article from Transforming Result Documents - Apache Solr Reference Guide - Apache Software Foundation


Using Solr 4.9 new ChildDocTransformerFactory | Javalobby



Lucene & Solr 4.9 were released a couple weeks ago and introduced a new result document transformer called ChildDocTransformerFactory.

The ChildDocTransformerFactory transformer is useful when we need to get child documents that were indexed as nested documents.

There are many use cases where we want to 'join' results into a single response where this transformer can help.

Read full article from Using Solr 4.9 new ChildDocTransformerFactory | Javalobby


Journal of Learning Apache Lucene - Lucene In Action | I love programming



Lucene is a high-performance, scalable, search engine technology. Both indexing and searching features make up the Lucene API. The first part of this article takes you through an example of using Lucene to index all the text files in a directory and its subdirectories. Before proceeding to examples of analysis and searching, we'll take a brief detour to discuss the format of the index directory.

Indexing

We'll begin by creating the Indexer class that will be used to index all the text files in a specified directory. This class is a utility class with a single public method index() that takes two arguments. The first argument is a File object indexDir that corresponds to the directory where the index will be created. The second argument is another File object dataDir that corresponds to the directory to be indexed.

Read full article from Journal of Learning Apache Lucene - Lucene In Action | I love programming


Notes by 瞬之与容 on "Introduction to Information Retrieval" (23)



  • 1.1 An example information retrieval problem
    The term "unstructured data" refers to data which does not have clear, semantically overt, easy-for-a-computer structure.
    Structured data: for example, a relational database.
    Incidence matrix: columns are documents and rows are words.
    Every row can be treated as a bit vector, so Boolean queries can be answered with bitwise AND, OR, and NOT.
    Boolean retrieval model: any query can be posed in the form of a Boolean expression. The model views each document as just a set of words.
    Some basic definitions:
    Information need: the topic the user wants to know about. A query is the user's attempt to express it, and a document is relevant if it satisfies that need.
    Two key statistics for evaluating an IR system:
    1. Precision: the fraction of returned results that are relevant to the information need.
    2. Recall: the fraction of relevant documents in the collection that were returned.
  • 1.2 A first take at building an inverted index
    A term-document matrix is usually sparse, so it is better to record only the nonzero entries; this is what the inverted index does.
    Inverted index: an index that maps back from terms to the parts of a document where they occur.
    It consists of a dictionary plus postings lists.
    Tokens and normalized tokens are loosely equivalent to words (normalized tokens are also called terms).
    Sorting: the (term, docID) pairs are sorted and grouped into the inverted index; each dictionary entry also records the term's document frequency.
    A posting can hold other information, such as term frequency (how often the term occurs in the document) and positions.
    The terms are sorted alphabetically and the postings are sorted by docID.
    The dictionary can be kept in memory, but postings lists are usually on disk. If part of the postings lists is in memory, we can use linked lists or variable-length arrays.
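
The ideas in these notes can be sketched in a few lines. This is a minimal in-memory illustration, not the book's code: the three sample documents, the whitespace tokenizer, and the function names are all assumptions made for the example. It builds a dictionary of terms with docID-sorted postings, answers a Boolean AND by merging two postings lists, and computes precision and recall for a result set.

```python
def build_inverted_index(docs):
    """Map each term to a list of docIDs in increasing order."""
    index = {}
    for doc_id in sorted(docs):                          # visit docs in docID order
        for term in set(docs[doc_id].lower().split()):   # crude tokenization
            index.setdefault(term, []).append(doc_id)
    return dict(sorted(index.items()))                   # dictionary sorted by term


def intersect(p1, p2):
    """Boolean AND of two sorted postings lists using a linear merge."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def precision_recall(returned, relevant):
    """Assumes both collections are non-empty."""
    hits = len(set(returned) & set(relevant))
    return hits / len(returned), hits / len(relevant)


# Hypothetical sample collection.
docs = {1: "new home sales", 2: "home sales rise", 3: "new sales"}
index = build_inverted_index(docs)
answer = intersect(index["new"], index["sales"])
print(answer)
print(precision_recall(answer, relevant={1}))
```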

  • Read full article from Notes by 瞬之与容 on "Introduction to Information Retrieval" (23)


    The Fibonacci heap ruins my life at Mary Rose Cook



    A couple of Sundays ago, I wrote an implementation of Dijkstra’s algorithm in Clojure. The core algorithm came to twenty-five lines. I banged out the code as I sat in a coffee shop with some other people from Hacker School. I ran my program on a data set that has two-hundred nodes in a densely interconnected graph. The program produced best paths from a start node to all other nodes in the graph in about 200 milliseconds.

    I closed my laptop, finished my peanut butter, banana and honey sandwich, said goodbye to my friends and spent the rest of the afternoon wandering around the Lower East Side in the dusty sunlight.

    By Monday evening, my life had begun falling apart.

    Dijkstra’s algorithm is a way to find the shortest route from one node to another in a graph. If the cities in Britain were the nodes and the roads were the connections between the nodes, Dijkstra could be used to plan the shortest route from London to Edinburgh. And plan is the key word. The algorithm does reconnaissance. It does not go on a road trip.
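
For readers who want to see the planning step concretely, here is a minimal Python sketch of Dijkstra's algorithm. It is not the author's Clojure implementation: it uses the standard library's binary heap (heapq) as the priority queue, and the city names and road lengths are made-up values for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest known distance from source to each reachable node.

    graph maps a node to a list of (neighbour, edge_weight) pairs;
    edge weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical road map; the weights are not real distances.
roads = {
    "London": [("Leeds", 4), ("Bristol", 2)],
    "Bristol": [("Leeds", 1)],
    "Leeds": [("Edinburgh", 3)],
}
print(dijkstra(roads, "London"))
```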


    Read full article from The Fibonacci heap ruins my life at Mary Rose Cook


    Fibonacci Heaps | 酷~行天下



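    Like a binomial heap, a Fibonacci heap is a mergeable heap. Its advantage is that operations that do not delete an element need only O(1) amortized running time (for background on amortized analysis, see Chapter 17 of Introduction to Algorithms). Like a binomial heap, a Fibonacci heap is made up of a collection of trees. It is loosely based on the binomial heap, loosely in the sense that if no DECREASE-KEY or DELETE operation is ever performed on a Fibonacci heap, every tree in the heap is a binomial tree; once either of those operations is performed, however, the binomial-tree property may have to be broken in some states. For example, after DECREASE-KEY or DELETE a tree may have height k but fewer than 2^k nodes, in which case the trees in the heap are no longer binomial trees.

         Compared with a binomial heap, a Fibonacci heap is likewise made up of a collection of min-heap-ordered trees, but the trees in a Fibonacci heap are rooted and unordered. In other words, each individual tree satisfies the min-heap property, but there is no ordering among the trees in the heap (see the figure in the original article).

         For the mergeable operations on a Fibonacci heap, the key idea is to delay work for as long as possible. For example, when a new node is inserted into a Fibonacci heap or two Fibonacci heaps are merged, the trees are not consolidated; that work is left to the EXTRACT-MIN operation.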


    Read full article from Fibonacci Heaps | 酷~行天下


    algorithm - When to use Binomial Heap? - Stack Overflow



    Binomial Heap has quite special design. Personally I don't think this design is intuitive.

    Although posts such as "What is the difference between binary heaps and binomial heaps?" talk about the differences and what makes it special, I am still wondering when I should use it.

    In http://en.wikipedia.org/wiki/Binomial_heap, it says

    Because of its unique structure, a binomial tree of order k can be constructed from two trees of order k−1 trivially by attaching one of them as the leftmost child of root of the other one. This feature is central to the merge operation of a binomial heap, which is its major advantage over other conventional heaps.

    I presume that an advantage of the binomial heap is its merge. However, a leftist heap also has O(log N) merge and is much simpler, so why do we still use binomial heaps? When should I use a binomial heap?


    Read full article from algorithm - When to use Binomial Heap? - Stack Overflow


    An Introduction to Binomial Heaps: Merge Better | Charlie Marsh



    Binomial Heaps: Merge Better


    The other day, I was introduced to a really cool data structure: the binomial heap. You might be familiar with binary heaps, which use a binary tree to keep items in heap order; but binomial heaps are a little more obscure. As you would expect, they too retain heap order and are often used in implementing priority queues. However, the advantage of a binomial heap is that it supports merging two binomial heaps in O(log n) time.

    This table sums it up nicely:

    In short: with a binomial heap, you earn faster merging, but give up the O(1) find-min of a binary heap.

    How It Works: Binomial Trees

    A binomial heap is made up of a list of binomial trees, so we’ll first discuss the latter structure, which I find to be the particularly ingenious component. A binomial tree is a recursive data structure: a tree of degree zero is just a single node and a tree of degree k is two trees of degree k-1, connected.

    Thus:

    • A tree of degree 1 is just two nodes, i.e., two trees of degree 0.
    • A tree of degree 2 is four nodes, i.e., two trees of degree 1 (or two trees of two trees of degree zero = four nodes).
    • A tree of degree 3…
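
The recursive definition above can be checked with a short sketch. This is an illustration only, not code from the article: the Node class and the function names are assumptions, and the order of children within a node is not modeled. A tree of degree k is built by linking two trees of degree k-1, keeping the smaller key at the root so heap order holds.

```python
import itertools


class Node:
    def __init__(self, key):
        self.key = key
        self.children = []


def link(a, b):
    """Join two trees of equal degree; the smaller root key stays on top."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    return a


def build_binomial_tree(degree, keys):
    """Build a tree of the given degree, drawing keys from an iterator."""
    if degree == 0:
        return Node(next(keys))
    left = build_binomial_tree(degree - 1, keys)
    right = build_binomial_tree(degree - 1, keys)
    return link(left, right)


def size(node):
    return 1 + sum(size(child) for child in node.children)


for k in range(5):
    tree = build_binomial_tree(k, itertools.count())
    # Prints the degree, the node count (2**k), and the root's child count (k).
    print(k, size(tree), len(tree.children))
```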

    Here's a visual representation:


    Read full article from An Introduction to Binomial Heaps: Merge Better | Charlie Marsh


    Classic Algorithms Series (9): Binary Heap, Binomial Tree, Binomial Heap, and Fibonacci Heap



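    10. Binary Heap

     A binary heap is a complete (or nearly complete) binary tree that satisfies the heap property: the key of a parent is >= (or <=) the key of each of its children, and every left or right subtree is itself a binary heap (a max-heap or a min-heap, respectively). A binary heap is usually stored in an array: with 1-based indexing, the left child of array[i] is array[2*i] and its right child is array[2*i+1]. A binary heap supports insertion, deletion, and finding the maximum (or minimum) key, but merging two binary heaps is expensive, taking O(N) time, whereas a binomial heap or Fibonacci heap needs at most O(log N).

    11. Binomial Tree
     Define the degree of a binomial tree as the number of direct children of its root. A binomial tree of degree 0 consists of a single root node. If a binomial tree (or subtree) has degree K, its root has K children, and those children are the roots of subtrees of degree K-1, K-2, K-3, …, 1, 0.

     From the figure in the original article we can see that:
     Each time the degree of a binomial tree goes from k-1 to k, its node count grows by 2^(k-1). Starting from the single node of a degree-0 tree, a binomial tree of degree k therefore has 1 + (1 + 2 + … + 2^(k-1)) = 2^k nodes.
     The height of a binomial tree is determined by the subtree added with each increase in degree: each time the degree increases by 1, the root gains a child subtree that is a copy of the tree itself, so the node count doubles to 2N while the height grows by 1. Hence the height is H = k.
     At depth h (counting from 0), the number of nodes is C(k, h), the number of ways to choose h items from k: C(k, h) = k!/(h!*(k-h)!)

    12. Binomial Heap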

    Read full article from Classic Algorithms Series (9): Binary Heap, Binomial Tree, Binomial Heap, and Fibonacci Heap


    Panda algorithm rewards sites that made changes since last update < Content Marketing Blog | Castleford Media



    Don’t you just hate it when you spend hours and hours trying to make your website perfect, then a gigantic 250-pound panda comes along and rips it all to shreds? This week some major websites are licking their wounds after the latest Panda update rolled out. Panda updates are designed to target and penalise low-quality content – such as spam – but also sites with very little useful content. So far, sites producing news and online content have benefited from the update, while lyrics, gaming and medical sites have been hit hardest. One of the good things about this update, however,

    Read full article from Panda algorithm rewards sites that made changes since last update < Content Marketing Blog | Castleford Media


    Overview of Computational Geometry Algorithms - GameRes.com



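    The basic concepts and common algorithms of computational geometry collected in this article include the following:

      The concept of vectors

      Vector addition and subtraction

      Vector cross product

      Determining the turn direction of a polyline

      Testing whether a point lies on a segment

      Testing whether two segments intersect

      Testing whether a segment and a line intersect

      Testing whether a rectangle contains a point

      Testing whether a segment, polyline, or polygon lies inside a rectangle

      Testing whether a rectangle lies inside a rectangle

      Testing whether a circle lies inside a rectangle

      Testing whether a point lies inside a polygon

      Testing whether a segment lies inside a polygon

      Testing whether a polyline lies inside a polygon

      Testing whether a polygon lies inside a polygon

      Testing whether a rectangle lies inside a polygon

      Testing whether a circle lies inside a polygon

      Testing whether a point lies inside a circle

      Testing whether a segment, polyline, rectangle, or polygon lies inside a circle

      Testing whether a circle lies inside a circle

      Computing the nearest point on a segment to a given point

      Computing the nearest point on a polyline, rectangle, or polygon to a given point

      Computing the shortest distance from a point to a circle and the coordinates of the intersection point

      Computing the intersection of two collinear segments

      Computing the intersection of a segment or line with a segment

      Finding the intersections of a segment or line with a polyline, rectangle, or polygon

      Finding the intersections of a segment or line with a circle

      The concept of the convex hull

      Computing the convex hull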


    Read full article from Overview of Computational Geometry Algorithms - GameRes.com


    Sweep Algorithm



    Now we take a look at a better way to calculate the 2D Voronoi diagram.  This algorithm was first introduced by Steven Fortune in [2] and is often referred to as Fortune’s Algorithm as a result.  This algorithm has complexity O(n log n), which is optimal for this problem.  The following sections introduce the different constructions and terminology that are used by the solution, followed by the algorithm itself.

     

    Sections:

    1. Sweep Line
    2. Site Events
    3. Circle Events
    4. Sweep Algorithm
      1. Data structures
      2. Algorithm
      3. Complexity

     

     


    Sweep Line

     

    A common way to simplify a proximity related problem such as computing the Voronoi diagram is to use a sweep line construction.  A sweep line splits the problem domain into two regions, an explored region and an unexplored region.  It applies an ordering to the problem because we reason about the explored area based on what we’ve seen so far and ignore the unexplored area.  We can get away with not caring about the unexplored area by observing that only points that have reached the sweep line or crossed it can possibly affect the current computation since all other points are too far away.  The entire problem area is eventually examined by “sweeping” the line across the set of points from one extreme to another.

     

    Let's look at an example of the sweep line applied to calculating the Voronoi diagram of a single site.  Imagine our sweep line, denoted s, starts at the top of the image and sweeps down as seen in Figure 2.  The point set contains only one site p1.  How can we reason about the area above the sweep line?  Before the first and only site is encountered, there is nothing to reason about.  Once the sweep line has passed below site p1, what area must be in the Voronoi cell of site p1?  Since there could be a new site lurking just beneath the surface of the sweep line we can only reason that any point closer to p1 than to the sweep line itself must be in the Voronoi cell of site p1 at this moment in time.  It could be that the Voronoi cell of site p1 is much larger than what we’re assigning to it at this moment, but the sweep will eventually account for this.

     

    Reasoning in this way divides the area above the sweep line into two disjoint regions separated by a parabola.  Points on the parabola are equidistant to p1 and the sweep line.  This dividing line is termed the beach line (for reasons that will soon become apparent).  The algorithm for computing the Voronoi diagram of a set of points depends entirely on how this beach line changes as the sweep moves through the space.  The beach line’s topology changes when a new arc is added or deleted.  The following sections look at when these events can occur.
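
The parabola can be written down explicitly. As a sketch under stated assumptions (a horizontal sweep line y = s and a site at (px, py) above it; these symbols are not from the original text), setting the distance to the site equal to the distance to the line, (x - px)^2 + (y - py)^2 = (y - s)^2, and solving for y gives y = ((x - px)^2 + py^2 - s^2) / (2 (py - s)). The hypothetical helper below evaluates that arc and prints both distances so they can be compared.

```python
import math


def beach_arc_y(x, site, sweep_y):
    """y-coordinate of the point above x that is equidistant from the site
    and from the horizontal sweep line y = sweep_y."""
    px, py = site
    if py == sweep_y:
        # The site sits exactly on the sweep line, so there is no parabola yet.
        raise ValueError("site lies on the sweep line")
    return ((x - px) ** 2 + py ** 2 - sweep_y ** 2) / (2 * (py - sweep_y))


site = (0.0, 2.0)   # hypothetical site p1
sweep_y = 0.0       # current position of the sweep line
for x in (-2.0, 0.0, 2.0):
    y = beach_arc_y(x, site, sweep_y)
    to_site = math.hypot(x - site[0], y - site[1])
    to_line = abs(y - sweep_y)
    print(x, y, to_site, to_line)
```

As the sweep line moves down (s decreases), the same formula gives a wider parabola, which is how the beach line changes between events.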


    Read full article from Sweep Algorithm


    SparkNotes: Vector Multiplication: The Cross Product



    We saw in the previous section on dot products that the dot product takes two vectors and produces a scalar, making it an example of a scalar product. In this section, we will introduce a vector product, a multiplication rule that takes two vectors and produces a new vector. We will find that this new operation, the cross product, is only valid for our 3-dimensional vectors, and cannot be defined in the 2-dimensional case. The reasons for this will become clear when we discuss the kinds of properties we wish the cross product to have.

    Read full article from SparkNotes: Vector Multiplication: The Cross Product


    How to validate email address with regular expression



    Email Regular Expression Pattern
    ^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*
          @[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$;
    Description
    ^   # start of the line
      [_A-Za-z0-9-\\+]+ #  must start with one or more (+) of the characters in the brackets [ ]
      (   #   start of group #1
        \\.[_A-Za-z0-9-]+ #     followed by a dot "." and one or more (+) of the characters in the brackets [ ]
      )*   #   end of group #1, this group is optional (*)
        @   #     must contain an "@" symbol
         [A-Za-z0-9-]+      #       followed by one or more (+) of the characters in the brackets [ ]
          (   #         start of group #2 - first level TLD checking
           \\.[A-Za-z0-9]+  #           followed by a dot "." and one or more (+) of the characters in the brackets [ ]
          )*  #         end of group #2, this group is optional (*)
          (   #         start of group #3 - second level TLD checking
           \\.[A-Za-z]{2,}  #           followed by a dot "." and at least 2 of the characters in the brackets [ ]
          )   #         end of group #3
    $   # end of the line
    The combination means that the email address must start with one or more characters from “_A-Za-z0-9-\\+”, optionally followed by groups of “\\.[_A-Za-z0-9-]+”, and then an “@” symbol. The domain name must start with one or more characters from “A-Za-z0-9-”, optionally followed by dot-separated labels “\\.[A-Za-z0-9]+” (such as the .com in .com.au), and must end with a final label “\\.[A-Za-z]{2,}” (such as .com, .net, or the .au in .com.au), which starts with a dot “.” and contains at least 2 letters. Note that group #2 is the optional one (*); group #3 is required.
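
To try the pattern outside Java, here is a small Python sketch. It is a port made for illustration, not part of the original article: the doubled backslashes from the Java string literal become single backslashes in a Python raw string, the trailing ";" is dropped, and the function name and sample addresses are assumptions.

```python
import re

# Same pattern as above, written as a Python raw string.
EMAIL_PATTERN = re.compile(
    r"^[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*"
    r"@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})$"
)


def is_valid_email(address):
    """Return True if the address matches the pattern described above."""
    return EMAIL_PATTERN.match(address) is not None


for sample in ("john.doe@example.com", "a+b@site.co.uk",
               "john@example", "john@example.c", "john@.com"):
    print(sample, is_valid_email(sample))
```

Like any regex-based check, this only tests the shape of the string; it accepts some addresses that do not exist and rejects some unusual but valid ones.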
    Read full article from How to validate email address with regular expression

    Eclipse Tip: Escape text when pasting | Vasanth Dharmaraj's Blog



    When you paste some text into a string, have you seen this?
    [Image: pasting string 1.png]
    The problem is that the string is not escaped. This gives you an error. Fixing this is not easy if you have pasted text that needs a lot of escaping. I have encountered this a few times. It is annoying.
    [Image: Escape text setting.png]
    Luckily Eclipse has a simple solution for this. Go to Window > Preferences > Java > Editor > Typing and check “Escape text when pasting into a string literal”. Now when you paste the same text, here is what you get:
    Read full article from Eclipse Tip: Escape text when pasting | Vasanth Dharmaraj's Blog

    4 Useful Things To Learn In 10 Minutes To Better Your Programming Career - Forbes



    9/23/2014 @ 12:05PM

    Define the problem (requirements and constraints). Define the solution (algorithms and data structures). Prove and/or test correctness. A lot of people get really good at #3 because it’s by far the easiest, but never master the other parts and as a result remain mediocre programmers at best.  Often, a good programmer can solve a problem without writing any code at all, by using their knowledge and experience to avoid a problem or find solutions that don’t require new code.  In a team,

    Read full article from 4 Useful Things To Learn In 10 Minutes To Better Your Programming Career - Forbes


    How To Broaden Your Thinking, Find Ideas - Business Insider



    Author Michael Lewis says that people join Wall Street because they're no good at original thinking. We're here to correct that. (Photo: Reuters/Lucas Jackson)

    Coming up with original ideas — thinking outside the box, if you will — is hard to do, even for the highest-achieving among us.  "Flash Boys" author Michael Lewis says that Ivy Leaguers flock to Wall Street because that's where they can become high-status and uber-rich without needing to think of anything new.  If you, like the bankers Lewis calls out, need fresh ideas, turn to Quora.

    Read full article from How To Broaden Your Thinking, Find Ideas - Business Insider


    The Google Formula for Success - NYTimes.com



    By Steve Lohr, September 28, 2014

    Can Google’s winning ways be applied to all kinds of businesses? The authors of “How Google Works,” Eric Schmidt, Google’s former chief executive, and Jonathan Rosenberg, a former senior product manager at Google, firmly believe that they can. The critical ingredient, they argue in their new book, is to build teams, companies and corporate cultures around people they call “smart creatives.” These are digital-age descendants of yesterday’s “knowledge workers,” a term coined in 1959 by Peter Drucker, the famed management theorist.

    Read full article from The Google Formula for Success - NYTimes.com


    Nexus 9, Android L Release Date is October 24 After October 16 Google Intro - Report - International Business Times



    By Erik Pineda | September 26, 2014 12:10 PM EST

    The release of the Nexus 9, Google's upcoming HTC-built flagship tablet, is set for October 24, or a week after its grand unveiling, a new report said. Developer Paul O'Brien shared via Twitter that the HTC Nexus 9 will start rolling out on October 24 in the United Kingdom.

    Read full article from Nexus 9, Android L Release Date is October 24 After October 16 Google Intro – Report - International Business Times


    CSC378: Interval Trees




    Read full article from CSC378: Interval Trees


    Two dimensional binary searching



    Application: Searching Maps

    How do we determine some subset of a 2D map to draw on a screen, without searching the whole map?  For instance, we have an in-car navigation system, with limited CPU power, but a street level map of the whole state. We want to display the streets in our immediate vicinity on the car's display screen.

    We can treat the map as a collection of line segments on a plane.  Even more complicated structures can be broken into segments.

    Problem: Given a rectangular window region defined by two points, p1 (top left) and p2 (bottom right), return all segments from some global set that intersect this window.

    Naive solution: We can construct a 1D interval tree for each dimension, using the projection of each segment onto the X and Y axes respectively:


    Read full article from Two dimensional binary searching


    6.838 Lecture 13



  •  (Warmup) Indexing in one dimension (binary search trees):
    1. Points
    2. Segments
  • Binary indexing in two dimensions:
    1. Application: searching maps
    2. Multiple 1D trees
  • kD trees
    1. Nice features
    2. How to build them
    3. How to use them
    4. "Not just for points any more!"
    5. Application: 2D nearest neighbour & N-body problems
    6. Application: colour reduction with 3D kD trees
    7. Fixing them up dynamically
  • QuadTrees
    1. Presentation by Cyril.
  • BSP trees
    1. Nice Features
    2. Building and using
    3. Application: stabbing rays/mouse picking
    4. Application: painter's algorithm and frustum culling
  • Hierarchical triangulation (Dobkin/Kirkpatrick hierarchy)
    1. Built-in nearest neighbours
    2. How to build and use in 2D
    3. Application: convex polyhedron intersection in 3D

    Read full article from 6.838 Lecture 13


    Interval Tree - GeeksforGeeks




    Read full article from Interval Tree - GeeksforGeeks


    Visualizing range trees : Inside 206-105




    Read full article from Visualizing range trees : Inside 206-105


    Stanford CS Ed Library



    This online library collects education CS material from Stanford courses and distributes them for free. Update 2006 For learning code concepts (Java strings, loops, arrays, ...), check out Nick's experimental javabat.com server, where you can type in little code puzzles and get immediate feedback. If you think the CS Education Library is useful, please link to it at http://cslibrary.stanford.edu/
     

    Pointers and Memory

    Binky Pointer Video: A silly but memorable 3 minute animated video demonstrating the basic structure, techniques, and pitfalls of using pointers. There are separate versions of the video for C, Java, C++, Pascal, and Ada. There is also a more traditional companion text (below) that goes with the video, and a brief history of how the video was made.
    Pointer Basics: The companion text for the Binky video. Presents the same concepts and examples as the video, and includes study questions with solutions. Code is presented in C, Java, C++ and Pascal.
    Pointers and Memory: A 31 page explanation of everything you ever wanted to know about pointers and memory. Can be used as an introduction, or as review for people who mostly understand pointers. Mostly uses C, with some discussion of C++ and Java. The early sections introduce basic pointer concepts, while the later sections discuss more advanced topics such as reference pointers and dynamic arrays.

    Lists and Trees

    Linked List Basics: A 26 page introduction to the techniques and code for building linked lists in C. Includes basic examples and sample problems with solutions. Provides a basic understanding of linked lists and pointer code.
    Linked List Problems: A quick review of linked list basics followed by 18 linked list problems with solutions. The problems range from beginner, to intermediate, to advanced -- an excellent source of pointer algorithm problems.
    Binary Trees: A 27 page introduction to binary trees. Introduces the basic concepts of binary trees, and then works through a series of practice problems with solution code in C/C++ and Java. Binary trees have an elegant recursive structure, so they make a good introduction to recursive pointer algorithms.
    The Great Tree List Recursion Problem: One of the neatest pointer/recursion problems you will ever see. This is an advanced problem that uses linked lists, binary trees, and recursion. Includes solution code in Java and C.

    Languages

    Essential Perl: A quick 23 page introduction to the main features of the Perl language. Handy as an introduction or a quick reference.
    Essential C: A relatively quick, 45 page discussion of most of the practical aspects of programming in C. Explains types, variables, operators, functions, control constructs, arrays, pointers, strings, array/pointer trickery, and the standard library functions. The coverage is complete, but quick, so it is most appropriate for someone with some programming experience. (revised 4/2003)

    Unix

    Unix Programming Tools: A 16 page introduction to the most common Unix tools and their usage in the compile-link-debug process. Introduces gcc, make, gdb, emacs, and the shell. There should be enough information here to allow someone with a little Unix experience to build and debug.

    Tetris

    Stanford Tetris Project: Complete programming materials for a Tetris assignment, including a game-playing AI. A runnable version is included, along with sufficient materials for people to attempt the project. Presented at the Nifty Assignments Panel at SIG-CSE 2001.

    Read full article from Stanford CS Ed Library


    How to master in-place array modification algorithms? - Stack Overflow



    In-place modification algorithms could become very hard to handle.

    Consider a couple:

    • Inplace out-shuffle in linear time. Uses number theory.
    • In-place merge sort, was open for a few years. An algorithm came but was too complicated to be practical. Uses very complicated bookkeeping.

    Sorry, if this sounds discouraging, but there is no magic elixir which will solve all in-place algorithm problems for you. You need to work with the problem, figure out its properties and try to exploit them (as is the case with most algorithms).

    That said, for array modifications which result in a permutation of the original array, you can try the method of following the cycles of the permutation. Basically any permutation can be written as a disjoint set of cycles (see John's answer too). For instance the permutation:

    1 4 2 5 3 6  

    of 1 2 3 4 5 6 can be written as

    1 -> 1  2 -> 3 -> 5 -> 4 -> 2  6 -> 6.  

    you can read the arrow as 'goes to'.

    So to permute the array 1 2 3 4 5 6 you follow the three cycles:

    1 goes to 1.

    6 goes to 6.

    2 goes to 3, 3 goes to 5, 5 goes to 4 and 4 goes to 2.

    To follow this long cycle, you can use just one temp variable. Store 3 in it. Put 2 where 3 was. Now put 3 in 5 and store 5 in the temp and so on. Since you only use constant extra temp space to follow a particular cycle, you are doing an in-place modification of the array for that cycle.

    Now if I gave you a formula for computing where an element goes to, all you now need is the set of starting elements of each cycle.

    A judicious choice of the starting points of the cycles can make the algorithm easy. If you come up with the starting points in O(1) space, you now have a complete in-place algorithm. This is where you might actually have to get familiar with the problem and exploit its properties.

    Even if you didn't know how to compute the starting points of the cycles, but had a formula to compute the next element, you could use this method to get an O(n) time in-place algorithm in some special cases.

    For instance: if you knew the array of signed integers held only positive integers.

    You can now follow the cycles, but negate the numbers in them as an indicator of 'visited' elements. Now you can walk the array and pick the first positive number you come across and follow the cycles for that, making the elements of the cycle negative and continue to find untouched elements. In the end you just make all the elements positive again to get the resulting permutation.

    You get an O(n) time and O(1) space algorithm! Of course, we kind of 'cheated' by using the sign bits of the array integers as our personal 'visited' bitmap.
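
The sign-bit trick described above can be sketched as follows. This is an illustrative sketch, not the answer author's code: the function name and the dest callback are assumptions, indices are 0-based, and the array must hold strictly positive integers so that a negative value can serve as the "visited" mark.

```python
def permute_in_place(a, dest):
    """Move each a[i] to index dest(i), in place, by following cycles.

    Assumes a contains only strictly positive integers and that dest is a
    permutation of range(len(a)). Negative values mark visited slots.
    """
    n = len(a)
    for start in range(n):
        if a[start] < 0:
            continue                  # already placed as part of an earlier cycle
        carried = a[start]            # the single temp variable
        j = dest(start)
        while j != start:
            # Drop the carried value (marked visited) and pick up the old one.
            a[j], carried = -carried, a[j]
            j = dest(j)
        a[start] = -carried           # close the cycle
    for i in range(n):
        a[i] = -a[i]                  # restore the signs


# The example permutation from the text, shifted to 0-based indices:
# 1 -> 1, 2 -> 3 -> 5 -> 4 -> 2, 6 -> 6.
destination = [0, 2, 4, 1, 3, 5]
values = [1, 2, 3, 4, 5, 6]
permute_in_place(values, lambda i: destination[i])
print(values)
```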

    Even if the array was not necessarily integers, this method (of following the cycles, not the hack of sign bits :-)) can actually be used to tackle the two problems you state:

    • The inshuffle (or out-shuffle) problem: When 2n+1 is a power of 3, it can be shown (using number theory) that 1, 3, 3^2, etc. are in different cycles and that all cycles are covered using those. Combine this with the fact that the inshuffle is susceptible to divide and conquer, and you get an O(n) time, O(1) space algorithm (the formula is i -> 2*i modulo 2n+1). Refer to the above paper for more details.

    • The cyclic shift of an array problem: Cyclically shifting an array of size n by k also gives a permutation of the array (given by the formula i goes to i+k modulo n), and can also be solved in linear time and in-place using the cycle-following method. In fact, in terms of the number of element exchanges, the cycle-following method is better than the 3 reverses algorithm. Of course, following the cycles can kill the cache because of the access patterns, and in practice the 3 reverses algorithm might actually fare better.
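
A minimal sketch of the cyclic shift by cycle following, using the formula i goes to i+k modulo n with 0-based indices. One detail here is an addition not stated in the answer: the permutation has gcd(n, k) cycles, and indices 0 through gcd(n, k) - 1 each lie in a different one, which gives the starting points in O(1) space. The function name is hypothetical.

```python
import math


def rotate_in_place(a, k):
    """Cyclically shift a so that the element at index i moves to (i + k) % n."""
    n = len(a)
    if n == 0:
        return
    k %= n
    if k == 0:
        return
    # One pass per cycle; indices 0 .. gcd(n, k) - 1 start distinct cycles.
    for start in range(math.gcd(n, k)):
        carried = a[start]
        j = (start + k) % n
        while j != start:
            a[j], carried = carried, a[j]   # drop the carried value, pick up the old one
            j = (j + k) % n
        a[start] = carried


items = [1, 2, 3, 4, 5, 6]
rotate_in_place(items, 2)
print(items)
```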


    Read full article from How to master in-place array modification algorithms? - Stack Overflow


    algorithm - in-place permutation of a array follows this rule - Stack Overflow



    An O(n) time O(1) space solution.

    The ideas used are similar to the ideas used in the following paper: A simple in-place algorithm for Inshuffle.

    You would need to read that paper to understand the below. I suggest you also read: http://stackoverflow.com/questions/2352542/how-to-master-in-place-array-modification-algorithms

    This is basically the inverse permutation of what is solved in the paper above.

    It is enough to solve this when 2n+1 is a power of 3 (say 3^m), as we can use divide and conquer after that (like the O(nlogn) solution).

    Now 2n+1 and n+1 are relatively prime, so working modulo 3^m, we see that n+1 must be some power of 2. (See that paper again to see why: basically any number modulo 3^m which is relatively prime to 3^m is a power of 2, again modulo 3^m).

    Say n+1 = 2^k (we don't know k yet and note this is modulo 3^m).

    One way to find k: compute powers of n+1 modulo 3^m until the result becomes 1. This gives us k (and takes O(n) time at most).

    Now we can see that the cycles of the permutation (see above paper/stackoverflow link for what that is) start at

    2^a*3^b

    where 0 <= a < k, and 0 <= b < m.

    So you start with each possible pair (a,b) and follow the cycles of the permutation, and this gives an O(n) time, in-place algorithm, as you touch each element no more than a constant number of times!

    This was a bit brief(!) and if you need more info, please let me know.


    Read full article from algorithm - in-place permutation of a array follows this rule - Stack Overflow


    Gunnar Kudrjavets - The out-shuffle problem: solutions and acknowledgments



    First of all I would like to thank everyone who shared their solutions for the out-shuffle problem with the rest of us and submitted comments related to it. Looking back, it was actually three years ago that we tried to find an efficient answer to this particular programming challenge. After we had been puzzled and made no progress for some time, I contacted the most knowledgeable person in my circle of acquaintances when it comes to algorithms and data structures. His name is Ahto Truu and he is GodOfAlgorithmsAndDataStructures ;-)

    Ahto writes a regular column about programming puzzles for one of the Estonian IT magazines, and as a natural result he composed an article about this particular problem. The original article in Estonian is published here. On Monday Ahto sent me an updated English translation of this article. The PDF file can be downloaded from here. Here’s the table of contents:

    • The Problem
    • The Setup
    • A Memory-Hungry Solution
    • A Time-Hungry Solution
    • A Divide’n’Conquer Solution
    • A Combinatorial Solution
    • Acknowledgments

    As also pointed out in Ahto’s article, this paper by Ellis, Krahn, and Fan describes an algorithm that solves the out-shuffle problem in O(N) time and O(log N) space.


    Read full article from Gunnar Kudrjavets - The out-shuffle problem: solutions and acknowledgments


    Gunnar Kudrjavets - Think you're good with algorithms, try attacking this problem



    Almost all of the professional literature I've been reading lately is written by the following authors: Jon Bentley, Donald Knuth, and Robert Sedgewick. In the process I've also been going through a number of different programming problems. Here's one old problem which puzzled me and a number of my colleagues from the Speech Component Group (the team responsible for the core speech recognition engine and SAPI) about two years ago. We never came up with an efficient solution though ;-) The problem statement is pretty trivial, but be careful - it's not as simple as it seems.

    The problem. An array which contains 2N elements needs to be rearranged from

    a1, a2, a3, ..., an, b1, b2, b3, ..., bn

    to

    a1, b1, a2, b2, a3, b3, ..., an, bn.

    Of course this needs to be done as efficiently as possible in terms of both computational complexity and memory usage. The best solution known to me (an old coworker of mine from Estonia came up with the algorithm) has a complexity of O(N log(N)) and uses no more than a constant amount of memory.
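
To make the target arrangement concrete, here is a baseline sketch in Java. It is deliberately not the constant-memory solution the post asks for: it runs in O(N) time but allocates a second array of the same size, which is exactly the cost the challenge wants to avoid. The class and method names are made up for illustration.

```java
final class OutShuffleBaseline {
    // Returns a1, b1, a2, b2, ... given a1..an followed by b1..bn.
    // Uses O(N) extra memory, so it only serves as a reference for testing
    // in-place implementations.
    static int[] outShuffle(int[] in) {
        if (in.length % 2 != 0) {
            throw new IllegalArgumentException("array length must be even");
        }
        int n = in.length / 2;
        int[] out = new int[in.length];
        for (int i = 0; i < n; i++) {
            out[2 * i] = in[i];         // a(i+1) goes to an even index
            out[2 * i + 1] = in[n + i]; // b(i+1) goes right after it
        }
        return out;
    }

    public static void main(String[] args) {
        int[] shuffled = outShuffle(new int[]{1, 2, 3, 10, 20, 30});
        System.out.println(java.util.Arrays.toString(shuffled));
    }
}
```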

    Can you code up the solution which has the same characteristics as the best solution known to me? Can you do better? Is it even possible to do better? If you can solve this problem under these constraints or prove mathematically that there's no better solution then you should definitely send your CV to Microsoft ;-)


    Read full article from Gunnar Kudrjavets - Think you're good with algorithms, try attacking this problem


    Matcher (java.util.regex.Matcher)








    The java.util.regex.Matcher class is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.
    The Matcher class has a lot of useful methods. For a full list, see the official JavaDoc for the Matcher class. I will cover the core methods here.

    Java Matcher Example

    Here is a quick Java Matcher example so you can get an idea of how the Matcher works:
    String text    =
        "This is the text to be searched " +
        "for occurrences of the http:// pattern.";

    String patternString = ".*http://.*";

    Pattern pattern = Pattern.compile(patternString);

    Matcher matcher = pattern.matcher(text);
    boolean matches = matcher.matches();
    First a Pattern is created, and from that a Matcher. Then the matches() method is called, which returns true if the pattern matches the text, and false if not.
    You can do a whole lot more with the Matcher class. The rest is covered throughout the rest of this text.

    Creating a Matcher

    Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:
    String text    =
        "This is the text to be searched " +
        "for occurrences of the http:// pattern.";

    String patternString = ".*http://.*";

    Pattern pattern = Pattern.compile(patternString);

    Matcher matcher = pattern.matcher(text);

    matches()

    The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method, when the Matcher was created. Here is an example:
    boolean matches = matcher.matches();  
    If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.
    You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.

    lookingAt()

    The lookingAt() method works like the matches() method with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.
    Here is an example:
    String text    =
        "This is the text to be searched " +
        "for occurrences of the http:// pattern.";

    String patternString = "This is the";

    Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);

    System.out.println("lookingAt = " + matcher.lookingAt());
    System.out.println("matches   = " + matcher.matches());
    This example matches the regular expression "this is the" against both the beginning of the text, and against the whole text. Matching the regular expression against the beginning of the text (lookingAt()) will return true.
    Matching the regular expression against the whole text (matches()) will return false, because the text has more characters than the regular expression. The regular expression says that the text must match the text "This is the" exactly, with no extra characters before or after the expression.

    find() + start() + end()

    The find() method searches for occurrences of the regular expression in the text passed to the Pattern.matcher(text) method when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then each subsequent call to find() will move to the next match.
    The methods start() and end() will give the indexes into the text where the found match starts and ends. Actually end() returns the index of the character just after the end of the matching section. Thus, you can use the return values of start() and end() inside a String.substring() call.
    Here is an example:
    String text    =
        "This is the text which is to be searched " +
        "for occurrences of the word 'is'.";

    String patternString = "is";

    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(text);

    int count = 0;
    while(matcher.find()) {
        count++;
        System.out.println("found: " + count + " : "
                + matcher.start() + " - " + matcher.end());
    }
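
Because end() is exclusive, the pair start()/end() can be passed straight to String.substring(), as the text notes. A small sketch of that combination follows; the helper class and method names are invented here and are not part of the tutorial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherSubstringSketch {
    // Collects every match of regex in text by slicing with start() and end().
    static List<String> findAll(String text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            // end() is the index just after the match, which is what substring expects.
            found.add(text.substring(matcher.start(), matcher.end()));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a1b22c333", "\\d+"));
    }
}
```

Matcher.group() returns the same text for the current match; the substring form is shown only to illustrate how the two indexes relate.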

    Read full article from Matcher (java.util.regex.Matcher)

    Java Tutorial/XML/CDATA



    http://jexp.ru/index.php/Java_Tutorial/XML/CDATA
    Converting CDATA Nodes into Text Nodes While Parsing an XML File
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setCoalescing(true);
    Document doc = factory.newDocumentBuilder().parse(new File("infilename.xml"));





    java - Processing CDATA from XML via DOM parser - Stack Overflow



    I'm suspecting that your problem is in the following line of code from the getTagValue method:
    Node nValue = (Node) nlList.item(0);
    You are always getting the first child! But you might have more than one.
    The following example has 3 children: text node "detail ", CDATA node "with cdata" and text node " here":
    <Details>detail <![CDATA[with cdata]]> here</Details>
    If you run your code, you get only "detail "; you lose the rest.
    The following example has 1 child: a CDATA node "detail with cdata here":
    <Details><![CDATA[detail with cdata here]]></Details>
    If you run your code, you get everything.
    But the same example as above written this way:
    <Details>
        <![CDATA[detail with cdata here]]>
    </Details>
    now has 3 children because the spaces and line feeds are picked up as text nodes. If you run your code you get the first, whitespace-only text node with a line feed; you lose the rest.
    You either have to loop through all children (no matter how many) and concatenate the value of each to get the full result, or if it's not important for you to differentiate between plain text and text inside CDATA, then set the coalescing property on the document builder factory first:
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    docFactory.setCoalescing(true);

    Coalescing specifies that the parser produced by this code will convert CDATA nodes to Text nodes and append them to the adjacent text node, if any. By default this is set to false.
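
Both options from the answer can be tried on its own `<Details>` example. The sketch below is an illustration under assumptions: the class and helper names are made up, and the XML is parsed from an in-memory string rather than a file. It counts the child nodes with and without coalescing, and also concatenates all text and CDATA children by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class CdataCoalescingSketch {
    static final String XML = "<Details>detail <![CDATA[with cdata]]> here</Details>";

    // Parses the XML string; checked exceptions are wrapped to keep callers short.
    static Document parse(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing);
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 1 from the answer: loop over all children and concatenate their values.
    static String concatChildren(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            short type = kid.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(kid.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node plain = parse(XML, false).getDocumentElement();
        Node merged = parse(XML, true).getDocumentElement();
        System.out.println("children without coalescing: " + plain.getChildNodes().getLength());
        System.out.println("children with coalescing:    " + merged.getChildNodes().getLength());
        System.out.println("concatenated: " + concatChildren(plain));
    }
}
```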
    Read full article from java - Processing CDATA from XML via DOM parser - Stack Overflow

    Add CDATA section to DOM document | Examples Java Code Geeks



    In this example we shall show you how to add CDATASection to a DOM Document. We have implemented a method, that is void prettyPrint(Document xml), in order to convert a DOM into a formatted XML String. To add CDATASection to a DOM Document one should perform the following steps:

    • Obtain a new instance of a DocumentBuilderFactory, that is a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.
    • Set the parser produced so as not to validate documents as they are parsed, using setValidating(boolean validating) API method of DocumentBuilderFactory, with validating set to false.
    • Create a new instance of a DocumentBuilder, using newDocumentBuilder() API method of DocumentBuilderFactory.
    • Parse the FileInputStream with the content to be parsed, using parse(InputStream is) API method of DocumentBuilder. This method parses the content of the given InputStream as an XML document and returns a new DOM Document object.
    • Get the Document Element using getDocumentElement() API method of Document.
    • Create a CDATASection node whose value is the specified string, with createCDATASection(String data) API method of Document.
    • Append the node to the document element, using appendChild(Node newChild) API method of Node.
    • Call void prettyPrint(Document xml) method of the example. The method gets the xml Document and converts it into a formatted xml String, after transforming it with specific parameters, such as encoding. The method uses a Transformer, that is created using newTransformer() API method of TransformerFactory. The Transformer is used to transform a source tree into a result tree. After setting specific output properties to the transformer, using setOutputProperty(String name, String value) API method of Transformer, the method uses it to make the transformation, with transform(Source xmlSource, Result outputTarget) API method of Transformer. The parameters are the DOMSource with the DOM node and the result that is a StreamResult created from a StringWriter,
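
The steps above can be sketched roughly as follows. This is not the article's original listing: the class name is invented, and the input is parsed from an in-memory string instead of a FileInputStream so that the sketch is self-contained. Only standard JAXP and DOM calls named in the steps are used.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class AddCdataSketch {
    // Parses xml, appends a CDATA section to the root element, returns the serialized result.
    static String addCdata(String xml, String cdataText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = builder.parse(new ByteArrayInputStream(bytes));

            Element root = doc.getDocumentElement();
            CDATASection cdata = doc.createCDATASection(cdataText);
            root.appendChild(cdata);

            return prettyPrint(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM tree to a String using an identity Transformer.
    static String prettyPrint(Document xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(xml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addCdata("<root><item>one</item></root>", "a < b & c"));
    }
}
```

Characters such as < and & inside the CDATA section are written out unescaped, which is the usual reason for using one.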

    Read full article from Add CDATA section to DOM document | Examples Java Code Geeks




    ############################################################
    #  	Default Logging Configuration File
    #
    # You can use a different file by specifying a filename
    # with the java.util.logging.config.file system property.
    # For example java -Djava.util.logging.config.file=myfile
    ############################################################

    ############################################################
    #  	Global properties
    ############################################################

    # "handlers" specifies a comma separated list of log Handler
    # classes.  These handlers will be installed during VM startup.
    # Note that these classes must be on the system classpath.
    # By default we only configure a ConsoleHandler, which will only
    # show messages at the INFO and above levels.
    handlers= java.util.logging.ConsoleHandler

    # To also add the FileHandler, use the following line instead.
    #handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler

    # Default global logging level.
    # This specifies which kinds of events are logged across
    # all loggers.  For any given facility this global level
    # can be overriden by a facility specific level
    # Note that the ConsoleHandler also has a separate level
    # setting to limit messages printed to the console.
    #.level= INFO
    .level= ALL

    ############################################################
    # Handler specific properties.
    # Describes specific configuration info for Handlers.
    ############################################################

    # default file output is in user's home directory.
    java.util.logging.FileHandler.pattern = %h/java%u.log
    java.util.logging.FileHandler.limit = 50000
    java.util.logging.FileHandler.count = 1
    java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter

    # Limit the message that are printed on the console to INFO and above.
    java.util.logging.ConsoleHandler.level = INFO
    java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

    ############################################################
    # Facility specific properties.
    # Provides extra control for each logger.
    ############################################################

    # For example, set the com.xyz.foo logger to only log SEVERE
    # messages:
    com.xyz.foo.level = SEVERE
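
Besides the java.util.logging.config.file system property mentioned in the file's header, a configuration like this can also be loaded programmatically with LogManager.readConfiguration(InputStream). The sketch below is an assumption-laden illustration, not part of the original file: the class name is invented and the properties are supplied as an in-memory string containing only a few of the keys shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

final class LoggingConfigSketch {
    // A cut-down version of the configuration above, kept in memory for the example.
    static final String PROPS =
            "handlers= java.util.logging.ConsoleHandler\n"
            + ".level= ALL\n"
            + "java.util.logging.ConsoleHandler.level = INFO\n"
            + "com.xyz.foo.level = SEVERE\n";

    // Loads the given properties text and reports the level configured for one logger.
    static Level levelAfterLoading(String props, String loggerName) {
        try {
            byte[] bytes = props.getBytes(StandardCharsets.ISO_8859_1);
            LogManager.getLogManager().readConfiguration(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Logger.getLogger(loggerName).getLevel();
    }

    public static void main(String[] args) {
        System.out.println("com.xyz.foo level: " + levelAfterLoading(PROPS, "com.xyz.foo"));
    }
}
```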



    How Your Comfort Zone Is Sabotaging Your Success | Kevin Kleitches



    Despite what you were taught growing up, being comfortable is bad. Many people envision a lifestyle where they can achieve success without having to step outside of their comfort zone. This is delusional thinking. If you want to make progress with your dreams, you have to do things that scare you. Let me show you why.

    Living Inside Your Comfort Zone Restricts Action

    People marvel at great ideas,

    Read full article from How Your Comfort Zone Is Sabotaging Your Success | Kevin Kleitches


    lucene - Solr segments.gen and segments_N file restore - Stack Overflow



    I would guess that it is unlikely that the segments_N and segments.gen files were the only things lost, by the sound of it, but you can try using CheckIndex.

    You can run it from a command line something like:

    java -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex path/to/index -fix  

    Or you can invoke methods of it in your own implementation, something like:

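    // Open the index directory and run CheckIndex against it.
    Directory directory = FSDirectory.open(new File("path/to/index"));
    CheckIndex check = new CheckIndex(directory);
    CheckIndex.Status status = check.checkIndex();
    // fixIndex drops any segments reported as broken, so documents in them are lost.
    check.fixIndex(status);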

    Read full article from lucene - Solr segments.gen and segments_N file restore - Stack Overflow


    SOLR: java.io.FileNotFoundException: no segments* file found | Web Builder Zone



    While playing around with one of my development SOLR installations (this time under Windows), I suddenly got a weird error message when feeding data to one of the fresh cores.

     

    SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.SimpleFSDirectory@C:\temp\solr\*\data\index: files:

     

    Taking a look at the contents of the index\ directory, it was in fact empty. Seems weird, but my initial guess was that Lucene / SOLR would treat this as a new installation and create the files.

    Turns out the issue is that it won’t – as long as the index directory exists, Lucene / SOLR goes looking for the segment files.

    Thanks to an old post to the solr-dev list by Yonik, the easiest fix is to simply delete the index directory and restart your servlet container (Tomcat in this case).


    Read full article from SOLR: java.io.FileNotFoundException: no segments* file found | Web Builder Zone




    Get character data (CDATA) from xml document
    http://www.java2s.com/Code/Java/XML/GetcharacterdataCDATAfromxmldocument.htm
    import java.io.File;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.CharacterData;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    public class Main {
      public static void main(String[] args) throws Exception {
        File file = new File("data.xml");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(file);

        // Print the title of every <topic> element.
        NodeList nodes = doc.getElementsByTagName("topic");
        for (int i = 0; i < nodes.getLength(); i++) {
          Element element = (Element) nodes.item(i);
          NodeList title = element.getElementsByTagName("title");
          Element line = (Element) title.item(0);
          System.out.println("Title: " + getCharacterDataFromElement(line));
        }
      }
      // Returns the data of the first child if it is a text or CDATA node, else "".
      public static String getCharacterDataFromElement(Element e) {
        Node child = e.getFirstChild();
        if (child instanceof CharacterData) {
          CharacterData cd = (CharacterData) child;
          return cd.getData();
        }
        return "";
      }
    }
