US Firm Jeunesse Suspected of Illegal Pyramid Selling: Ordinary Juice Touted as a Miracle Cure - Sina Finance

China Economic Net, Beijing, December 22 (reporters Liu Jia and Ma Xianzhen) — In recent years, online pyramid schemes operating under the banner of "micro-commerce" have kept spreading, and foreign companies have joined in enthusiastically. Jeunesse, a company repeatedly exposed by Xinhua News Agency, Tianjin Daily and other media, is a typical example.

China Economic Net reporters verified that Jeunesse, which is suspected of online pyramid selling, has not obtained a direct-selling license; its core product, the Jeunesse Reserve resveratrol fruit juice, has no state approval number; and Jeunesse's Shanghai company has no qualification to import food. Nevertheless, the product is still being sold online.

When the reporter looked up the website of Jeunesse (Shanghai) Biotechnology Co., Ltd., the phone number listed there went unanswered.

An ordinary fruit juice, or a miracle drug?

Xinhua reporters found nearly a hundred WeChat public accounts with names such as "Jeunesse International", "Jeunesse Cloud Classroom", "Jeunesse Hong Kong" and "Crown Jeunesse". The product promoted most heavily on these accounts is Jeunesse Global's Reserve resveratrol fruit juice.

This juice, billed as "creating miracles of life — the longevity treasure of the 21st century", is claimed to treat 31 diseases, with special efficacy against everything from cancer, diabetes, coronary heart disease and cirrhosis to infertility. It is currently sold on Taobao, the Meisile online mall and other platforms, with monthly sales ranging from several hundred to over a thousand units.


Read full article from US Firm Jeunesse Suspected of Illegal Pyramid Selling: Ordinary Juice Touted as a Miracle Cure - Sina Finance


LeetCode 373. Find K Pairs with Smallest Sums | all4win78



LeetCode 373. Find K Pairs with Smallest Sums | all4win78

You are given two integer arrays nums1 and nums2 sorted in ascending order and an integer k.

Define a pair (u,v) which consists of one element from the first array and one element from the second array.

Find the k pairs (u1,v1),(u2,v2) …(uk,vk) with the smallest sums.

Example 1:

Given nums1 = [1,7,11], nums2 = [2,4,6],  k = 3  Return: [1,2],[1,4],[1,6]  The first 3 pairs are returned from the sequence: [1,2],[1,4],[1,6],[7,2],[7,4],[11,2],[7,6],[11,4],[11,6] 

Example 2:

Given nums1 = [1,1,2], nums2 = [1,2,3],  k = 2  Return: [1,1],[1,1]  The first 2 pairs are returned from the sequence: [1,1],[1,1],[1,2],[2,1],[1,2],[2,2],[1,3],[1,3],[2,3] 

Example 3:

Given nums1 = [1,2], nums2 = [3],  k = 3   Return: [1,3],[2,3]  All possible pairs are returned from the sequence: [1,3],[2,3] 

Analysis:

First, let me analyze the problem. For convenience I will write Ai for nums1[i] and Bi for nums2[i]. Observe that [Ai, Bj] is never larger than [Ai+m, Bj+n] (for m + n > 0). In other words, if [Ai, Bj] has not yet been placed into the list, then [Ai+m, Bj+n] cannot possibly be the next pair to add. Put differently, at any moment we only need to consider at most one candidate pair for each Ai (and each Bi). Based on this observation, my idea is to maintain a set S containing all pairs that are currently candidates for the list, repeatedly pick the one with the smallest sum, and then update the set.

To make this less abstract, here is an illustration. Say A = {A1, A2, A3, A4} and B = {B1, B2, B3, B4}, and suppose at some moment list = {[A1, B1], [A2, B1], [A3, B1], [A1, B2]}. In the figure below, the black edges are pairs already added to the list and the red edges are the pairs currently under consideration, i.e. S = {[A1, B3], [A2, B2], [A3, B2], [A4, B1]}.

(Figure 373_1: black edges mark the pairs already in the list; red edges mark the candidate pairs in S.)

So how do we obtain this candidate set? The rule is simple: for each Ai, if the largest element paired with Ai in the list so far is Bj, then only [Ai, Bj+1] needs to be considered; and if [Ai, B1] is already in the list but [Ai+1, B1] is not, then [Ai+1, B1] needs to be considered as well.

Based on this idea we can design the algorithm: since at every step we need the smallest-sum pair in S to append to the list, a PriorityQueue is the obvious structure for storing and updating S. At this point the problem is essentially solved; all that remains is handling the edge cases.

Knowing how LeetCode test cases tend to be, and being my usual careful self, I suspected LC might ambush me with some small edge case. For this problem it is easy to imagine that summing (or subtracting) two ints could overflow (or underflow), so I congratulated myself on my cleverness and dutifully changed int to long in a number of places. It passed on the first submit and I felt great. But being a bit obsessive, I also wanted to see what would happen without the change — and it still passed! (╯‵□′)╯︵┻━┻ LC, you disappoint me; with test cases this sloppy, how do you justify what I paid? Refund, please!
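For reference, a minimal Java sketch of the heap-based approach described above (it uses the older List<int[]> return type and compares pair sums as long values, sidestepping the overflow concern mentioned here):

```java
import java.util.*;

public class KSmallestPairs {
    // For each row i of nums1 the heap holds at most one candidate (nums1[i], nums2[j]);
    // candidates are ordered by their sum, computed as long to rule out int overflow.
    public static List<int[]> kSmallestPairs(int[] nums1, int[] nums2, int k) {
        List<int[]> result = new ArrayList<>();
        if (nums1.length == 0 || nums2.length == 0 || k <= 0) return result;
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((int[] p) -> (long) nums1[p[0]] + nums2[p[1]]));
        // Seed one candidate per row: (i, 0) for the first min(k, nums1.length) rows.
        for (int i = 0; i < Math.min(k, nums1.length); i++) heap.offer(new int[]{i, 0});
        while (k-- > 0 && !heap.isEmpty()) {
            int[] top = heap.poll();
            int i = top[0], j = top[1];
            result.add(new int[]{nums1[i], nums2[j]});
            if (j + 1 < nums2.length) heap.offer(new int[]{i, j + 1});   // next candidate in row i
        }
        return result;
    }
}
```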


Read full article from LeetCode 373. Find K Pairs with Smallest Sums | all4win78


Minimax算法研究(TicTacToe) - Univasity's (Share&Save) - ITeye技术网站



Minimax算法研究(TicTacToe) - Univasity's (Share&Save) - ITeye技术网站

The Minimax algorithm, also known as the minimax decision rule, is an algorithm for finding the minimum of the maximum possible losses (that is, it minimizes the opponent's maximum gain). It is usually implemented recursively.


Minimax is commonly used in two-player games and programs such as board games. It is a zero-sum algorithm: one side picks the option that maximizes its own advantage, while the other picks the option that minimizes the opponent's advantage, and the total of wins and losses is zero (a bit like conservation of energy: if both players start with 1 point, the loser eventually hands their point to the winner, but there are still 2 points in total). Many board games can use this algorithm, for example tic-tac-toe.
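A minimal Java sketch of the recursion, assuming a hypothetical GameState abstraction for the board (isTerminal, score and successors are placeholders I introduce for illustration, not any particular library's API):

```java
import java.util.List;

// GameState is a stand-in abstraction for a position in a two-player zero-sum game.
interface GameState {
    boolean isTerminal();
    int score();                      // evaluated from the maximizing player's point of view
    List<GameState> successors();     // all positions reachable in one move
}

class Minimax {
    static int minimax(GameState state, boolean maximizing) {
        if (state.isTerminal()) return state.score();
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (GameState next : state.successors()) {
            int value = minimax(next, !maximizing);       // the other player moves next
            best = maximizing ? Math.max(best, value) : Math.min(best, value);
        }
        return best;
    }
}
```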


Read full article from Minimax算法研究(TicTacToe) - Univasity's (Share&Save) - ITeye技术网站


546C. Soldier and Cards - 水滴失船 - 博客园



546C. Soldier and Cards - 水滴失船 - 博客园

Two players play a card game. There are n cards in total; the first player holds k1 of them and the second player holds k2.

The order of the cards in the input is the order in which they are played.

In each round the two players compare the top card of their stacks, and the card with the larger number wins. Both cards go to the winner: the larger card is placed at the very bottom of the winner's stack, the other card goes directly above it, and the winner's remaining cards stay above those two.

Output how many rounds it takes for the game to end and which player wins.

If the game falls into an infinite loop, output -1.

 

This can be solved with either a stack or a queue.

The Java program here uses stacks: each comparison takes the top of the stack, and inserting the two new cards means popping everything out and pushing it back in, which is a bit clumsy.

The Python program uses queues: cards are taken from the front for comparison and the won cards are appended at the back, so there is no excessive pushing and popping.

A queue-based Java implementation was added later as well.

An ArrayList can provide the queue behaviour, which keeps it simple.
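A compact queue-based Java version along those lines (a sketch; it caps the number of fights to detect the -1 case rather than hashing the full deck configurations):

```java
import java.util.*;

// Queue simulation: compare the fronts; the winner appends the loser's card
// and then their own (larger) card to the back of their deck.
public class SoldierAndCards {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt();
        Deque<Integer> a = new ArrayDeque<>(), b = new ArrayDeque<>();
        int k1 = sc.nextInt();
        for (int i = 0; i < k1; i++) a.addLast(sc.nextInt());
        int k2 = sc.nextInt();
        for (int i = 0; i < k2; i++) b.addLast(sc.nextInt());

        int fights = 0;
        final int LIMIT = 1_000_000;   // crude practical bound used to declare an infinite loop
        while (!a.isEmpty() && !b.isEmpty() && fights < LIMIT) {
            int x = a.pollFirst(), y = b.pollFirst();
            if (x > y) { a.addLast(y); a.addLast(x); }
            else       { b.addLast(x); b.addLast(y); }
            fights++;
        }
        if (a.isEmpty())      System.out.println(fights + " 2");
        else if (b.isEmpty()) System.out.println(fights + " 1");
        else                  System.out.println(-1);
    }
}
```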


Read full article from 546C. Soldier and Cards - 水滴失船 - 博客园


今際の国の呵呵君: [Data Structure]Segment Tree



今際の国の呵呵君: [Data Structure]Segment Tree

A segment tree is a data structure for interval (range) queries and updates. For an interval [m, n], we recursively split it in half until each piece has length 1. For example, a segment tree that stores arbitrary range sums can be drawn like this (image from the original post):

As the picture shows, every node of a segment tree has either two children or none, so a segment tree is a full binary tree, and it is also a balanced binary tree. A full binary tree with n leaf nodes has n - 1 internal nodes, so the space complexity is O(n). We implement it with an array because that is more convenient, but note that allocating 2*n slots is not enough: unlike a heap, a segment tree is not a complete binary tree, so some slots go unused; about 3*n is usually enough (4*n is always safe). Taking range sums as the example, building the tree is just a bottom-up pass that fills each node from its children, which takes O(n). The code follows; ignore the mark (lazy-propagation) field for now, we will come back to it later.
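The original post's code (including the lazy mark) is in the full article; here is a minimal Java sketch of the array-based build and a range-sum query:

```java
// Array-based segment tree for range sums; node 1 is the root, children of i are 2*i and 2*i+1.
class SegmentTree {
    private final long[] tree;
    private final int n;

    SegmentTree(int[] nums) {
        n = nums.length;
        tree = new long[4 * n];                 // 4*n is always a safe size for the array
        if (n > 0) build(1, 0, n - 1, nums);
    }

    private void build(int node, int lo, int hi, int[] nums) {
        if (lo == hi) { tree[node] = nums[lo]; return; }
        int mid = (lo + hi) >>> 1;
        build(2 * node, lo, mid, nums);
        build(2 * node + 1, mid + 1, hi, nums);
        tree[node] = tree[2 * node] + tree[2 * node + 1];   // bottom-up: parent = sum of children
    }

    long query(int l, int r) {                  // sum of nums[l..r]
        return query(1, 0, n - 1, l, r);
    }

    private long query(int node, int lo, int hi, int l, int r) {
        if (r < lo || hi < l) return 0;                     // no overlap
        if (l <= lo && hi <= r) return tree[node];          // full overlap
        int mid = (lo + hi) >>> 1;
        return query(2 * node, lo, mid, l, r) + query(2 * node + 1, mid + 1, hi, l, r);
    }
}
```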

Read full article from 今際の国の呵呵君: [Data Structure]Segment Tree


Home | Dropwizard



Home | Dropwizard

Dropwizard is a Java framework for developing ops-friendly, high-performance, RESTful web services.

Dropwizard pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done.

Dropwizard has out-of-the-box support for sophisticated configuration, application metrics, logging, operational tools, and much more, allowing you and your team to ship a production-quality web service in the shortest time possible.


    Read full article from Home | Dropwizard


    Resilient ad serving at Twitter-scale



    Resilient ad serving at Twitter-scale

    Popular events, breaking news, and other happenings around the world drive hundreds of millions of visitors to Twitter, and they generate a huge amount of traffic, often in an unpredictable manner. Advertisers seize these opportunities and react quickly to reach their target audience in real time, resulting in demand surges in the marketplace. In the midst of such variability, Twitter's ad server — our revenue engine — performs ad matching, scoring, and serving at an immense scale. The goal for our ads serving system is to serve queries at Twitter-scale without buckling under load spikes, find the best possible ad for every query, and utilize our resources optimally at all times.


    Read full article from Resilient ad serving at Twitter-scale


    Introducing Omnisearch



    Introducing Omnisearch

    Twitter has more than 310 million monthly active users who send hundreds of millions of Tweets per day, from all over the world. To make sure everyone sees the Tweets that matter most to them, we've been working on features that bring the best content to the forefront. We've refreshed the Home timeline to highlight the best Tweets first, introduced tailored content via Highlights for Android, and personalized the search results and trends pages.

    The performance of these products depends on finding the most relevant Tweets from a large set of candidates, based on a product-specific definition of "relevant." From an engineering point of view, we view these as information retrieval problems where the documents are Tweets and the product is defined by a query. For example, to show you the best Tweets first in your Home timeline, we might first find candidate Tweets from accounts you follow with a search query like this one:


    Read full article from Introducing Omnisearch


Comparing Presto, Druid, Spark SQL and Kylin (performance, architecture, etc.): what are the similarities and differences? - Zhihu

The systems you list — Presto, Druid, Spark SQL and Kylin — fall into three categories. Presto and Spark SQL both solve the distributed query problem and provide SQL query capability, but data loading is not necessarily real time. Druid guarantees real-time ingestion but does not support SQL for queries, or at present supports only a subset of SQL; personally I think it suits industrial big data, for example scenarios where a large number of sensors write data in real time. Kylin is MOLAP: it pre-aggregates the data and turns multidimensional queries into key-value lookups.


Read full article from Comparing Presto, Druid, Spark SQL and Kylin (performance, architecture, etc.): what are the similarities and differences? - Zhihu


    621. Task Scheduler | Now to Share



    621. Task Scheduler | Now to Share

Given a char array representing tasks a CPU needs to do. It contains capital letters A to Z, where different letters represent different tasks. Tasks can be done in any order. Each task takes one interval, and in each interval the CPU can either finish one task or be idle.

However, there is a non-negative cooling interval n: between two identical tasks there must be at least n intervals during which the CPU does different tasks or stays idle.

You need to return the least number of intervals the CPU will take to finish all the given tasks.
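A common counting solution (a sketch, not necessarily the article's own code) builds an idle frame around the most frequent task: the answer is max(tasks.length, (maxFreq - 1) * (n + 1) + countOfTasksWithMaxFreq).

```java
class TaskScheduler {
    public int leastInterval(char[] tasks, int n) {
        int[] freq = new int[26];
        for (char t : tasks) freq[t - 'A']++;
        int maxFreq = 0, maxCount = 0;
        for (int f : freq) {
            if (f > maxFreq) { maxFreq = f; maxCount = 1; }
            else if (f == maxFreq) maxCount++;
        }
        // Lay out (maxFreq - 1) frames of size n + 1 around the most frequent task,
        // then append the tasks that tie for the maximum frequency.
        int framed = (maxFreq - 1) * (n + 1) + maxCount;
        return Math.max(tasks.length, framed);   // if there are no idle slots, the length wins
    }
}
```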


    Read full article from 621. Task Scheduler | Now to Share


    Merge Two Balanced Binary Search Trees - GeeksforGeeks



    Merge Two Balanced Binary Search Trees - GeeksforGeeks

    Merge Two Balanced Binary Search Trees

    You are given two balanced binary search trees e.g., AVL or Red Black Tree. Write a function that merges the two given balanced BSTs into a balanced binary search tree. Let there be m elements in first tree and n elements in the other tree. Your merge function should take O(m+n) time.

    In the following solutions, it is assumed that sizes of trees are also given as input. If the size is not given, then we can get the size by traversing the tree (See this).

    Method 1 (Insert elements of first tree to second)
    Take all elements of first BST one by one, and insert them into the second BST. Inserting an element to a self balancing BST takes Logn time (See this) where n is size of the BST. So time complexity of this method is Log(n) + Log(n+1) … Log(m+n-1). The value of this expression will be between mLogn and mLog(m+n-1). As an optimization, we can pick the smaller tree as first tree.

    Method 2 (Merge Inorder Traversals)
    1) Do inorder traversal of first tree and store the traversal in one temp array arr1[]. This step takes O(m) time.
    2) Do inorder traversal of second tree and store the traversal in another temp array arr2[]. This step takes O(n) time.
    3) The arrays created in step 1 and 2 are sorted arrays. Merge the two sorted arrays into one array of size m + n. This step takes O(m+n) time.
    4) Construct a balanced tree from the merged array using the technique discussed in this post. This step takes O(m+n) time.

    Time complexity of this method is O(m+n) which is better than method 1. This method takes O(m+n) time even if the input BSTs are not balanced.
    Following is C++ implementation of this method.
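The C++ code is in the full article; an equivalent Java sketch of Method 2 (inorder traversals, merge the sorted arrays, then build a balanced tree from the merged result) looks like this:

```java
import java.util.*;

class MergeBSTs {
    static class Node {
        int val; Node left, right;
        Node(int v) { val = v; }
    }

    static Node merge(Node a, Node b) {
        List<Integer> la = new ArrayList<>(), lb = new ArrayList<>();
        inorder(a, la); inorder(b, lb);                       // O(m) + O(n), both sorted
        List<Integer> merged = mergeSorted(la, lb);           // O(m + n)
        return build(merged, 0, merged.size() - 1);           // O(m + n)
    }

    static void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out); out.add(node.val); inorder(node.right, out);
    }

    static List<Integer> mergeSorted(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size())
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    static Node build(List<Integer> sorted, int lo, int hi) {  // middle element becomes the root
        if (lo > hi) return null;
        int mid = (lo + hi) >>> 1;
        Node root = new Node(sorted.get(mid));
        root.left = build(sorted, lo, mid - 1);
        root.right = build(sorted, mid + 1, hi);
        return root;
    }
}
```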


    Read full article from Merge Two Balanced Binary Search Trees - GeeksforGeeks


    -3. Longest Substring Without Repeating Characters – DeReK – Medium



    -3. Longest Substring Without Repeating Characters – DeReK – Medium

Not a difficult problem, but it has a few pitfalls worth noting.

1. If you use a map to track the non-repeating characters and, whenever you hit a repeated character, clear the map and reset the index, you will get TLE because of the large amount of repeated work.
2. Add an auxiliary start pointer and compute the length as the difference between i and start; that avoids the TLE. Be careful with the logic for resetting start: when a repeated character appears, start is not necessarily the index right after that character's previous occurrence — take whichever is larger, the current start or that index + 1, because index + 1 may be smaller than the current start, which would give a wrong answer (see the sketch below).
3. Remember the final step: check whether s.length() - start gives a longer length.
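A sliding-window sketch in Java of points 2 and 3 above (this variant updates the best length on every step, so no separate final check is needed):

```java
import java.util.*;

class LongestUniqueSubstring {
    public int lengthOfLongestSubstring(String s) {
        Map<Character, Integer> last = new HashMap<>();   // char -> last index it was seen at
        int start = 0, best = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // On a repeat, start never moves backwards: take the larger of the two candidates.
            if (last.containsKey(c)) start = Math.max(start, last.get(c) + 1);
            last.put(c, i);
            best = Math.max(best, i - start + 1);
        }
        return best;
    }
}
```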

    Read full article from -3. Longest Substring Without Repeating Characters – DeReK – Medium


An Interview Experience – DeReK – Medium

The problem is to remove the extra parentheses. The string may contain characters other than parentheses. Returning one possible answer is enough; you do not have to return all of them.

My approach used a stack as a helper: push the index of each left paren; when a right paren appears, pop if the stack is non-empty, otherwise that right paren has to be removed. Whatever left parens remain on the stack at the end are invalid and must be removed too. There are a few catches. First, you cannot remove a right paren on the spot while you are still pushing left parens, otherwise the indices stored on the stack go stale, because the string length changes as you edit it; so an invalid right paren has to be marked with a placeholder character first and deleted at the end. Second, the final deletion pass itself modifies the length on the fly, so remember to decrement i before continuing the loop (which I forgot).

Then I was asked for a solution without a stack. I could not come up with it on my own and was given a hint: we do not actually need to remember the index of every left paren — the leftmost parentheses can be thought of as matching anything. That was his exact phrasing, and it did not really click at first. After working through an example it did: every left paren needs a right paren to match it, and anything that cannot be matched is invalid. So we only need to count the left and right parens; if there are more lefts, traverse from the back and delete the first n of them, where n is the difference; if rights need to be deleted, traverse from the front.
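One way to make that counting idea concrete in Java (a sketch, not the interviewer's exact code): a forward pass drops the unmatched right parens, then a backward pass drops the surplus left parens.

```java
class BalanceParens {
    static String removeExtra(String s) {
        StringBuilder forward = new StringBuilder();
        int balance = 0;                          // unmatched '(' seen so far
        for (char c : s.toCharArray()) {
            if (c == ')') {
                if (balance == 0) continue;       // unmatched ')' -> drop it
                balance--;
            } else if (c == '(') {
                balance++;
            }
            forward.append(c);
        }
        // `balance` is now the number of surplus '('; drop them from the right-hand side.
        StringBuilder result = new StringBuilder();
        for (int i = forward.length() - 1; i >= 0; i--) {
            char c = forward.charAt(i);
            if (c == '(' && balance > 0) { balance--; continue; }
            result.append(c);
        }
        return result.reverse().toString();
    }
}
```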

This approach is genuinely nice; I had never thought of it while practicing, so I am glad to have learned something — I just hope next time I learn it somewhere other than in the middle of an interview. Whether I pass is up to fate now; I hope it goes further!


Read full article from An Interview Experience – DeReK – Medium


    -594. Longest Harmonious Subsequence – DeReK – Medium



    -594. Longest Harmonious Subsequence – DeReK – Medium

A good easy-level problem — one where I think it is easy to go down the wrong path if you are careless. First, my own detours.

Because this is a subsequence, the original order of the array no longer matters; order simply is not what the problem tests. So we can either sort first and then analyze, or use a HashMap to count frequencies — those are the starting points of the two solutions I know.

The sorting approach first:

1. After sorting, how do we analyze? My first thought was to scan and check whether each element equals the previous one or exceeds it by exactly one. How could I make such an obvious mistake? We are not detecting an increasing run; what we want is, roughly, a subarray whose values differ by 1 overall, otherwise even an array like [1,2,3] fails the tests. Add a temporary variable for the previous value, then? Still wrong, because we would need to backtrack — and to where? One scanning pointer is not enough, so let's use two.
2. Two pointers is just our beloved sliding window, but this window is a little unusual. A sliding window normally settles its result once it cannot expand any further, but what would "cannot expand" mean here? If we settle whenever num[right] exceeds num[left] by more than 1, the test case [1,1,1,4] breaks: it should return 0 but would wrongly return 3. After going back and forth, the key is that only when num[right] is exactly num[left] + 1 can we confidently say we have a harmonious subsequence; runs like 1,1,1,1 or 1,3,5,8 do not qualify.
3. So we do not wait until the window stops growing; we settle every time the window is valid — update the result incrementally rather than at the end. Whenever num[right] equals num[left] + 1, update res and do right++. Otherwise, if num[right] exceeds num[left] by more than 1, do left++; else (they are equal) right++.

The HashMap approach is completely different: first scan the array and count each number's frequency, then scan again and check, for each number, whether number + 1 or number - 1 is also in the map; if so, compare the sum of the two frequencies with res and keep the larger. Then remove the current number from the map to avoid comparing it twice.

This idea is simple and needs only O(n) time and O(n) space, yet it runs about 50% slower than the previous one. Why? The best explanation I can think of is the cost of HashMap lookups, insertions and deletions.
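A Java sketch of the HashMap version (checking only value + 1 is enough, since every harmonious pair is examined from its smaller member):

```java
import java.util.*;

class LongestHarmoniousSubsequence {
    public int findLHS(int[] nums) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int n : nums) freq.merge(n, 1, Integer::sum);   // frequency count
        int best = 0;
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            Integer higher = freq.get(e.getKey() + 1);       // partner value exactly one larger
            if (higher != null) best = Math.max(best, e.getValue() + higher);
        }
        return best;
    }
}
```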


    Read full article from -594. Longest Harmonious Subsequence – DeReK – Medium


Understanding HBase's System Architecture in Depth - 正西风落叶下长安 - CSDN.NET

Physically, HBase consists of three types of servers arranged in a master/slave fashion: Region Server, HBase HMaster and ZooKeeper.

The Region Servers handle data reads and writes; clients access data by talking to Region Servers.

The HBase HMaster handles region assignment and operations such as creating and deleting tables.

ZooKeeper maintains the live cluster state: whether a given server is online, coordination between servers, master election, and so on.

In addition, the Hadoop DataNodes store all the data managed by the Region Servers; all HBase data is kept as HDFS files. To keep the data a Region Server manages as local as possible, Region Servers are placed according to the distribution of the HDFS DataNodes. HBase data is written locally, but when a region is moved or reassigned its data may no longer be local, and that is only resolved after a so-called compaction.

The NameNode maintains the metadata for all the physical data blocks that make up the files.


Read full article from Understanding HBase's System Architecture in Depth - 正西风落叶下长安 - CSDN.NET


    Count of Smaller Numbers After Self | Algorithms Collection



    Count of Smaller Numbers After Self | Algorithms Collection

    Count of Smaller Numbers After Self

    You are given an integer array nums and you have to return a new counts array. The counts array has the property where counts[i] is the number of smaller elements to the right of nums[i].


    Example:


    Given nums = [5, 2, 6, 1]


    • To the right of 5 there are 2 smaller elements (2 and 1).
    • To the right of 2 there is only 1 smaller element (1).
    • To the right of 6 there is 1 smaller element (1).
    • To the right of 1 there is 0 smaller element.

    Return the array [2, 1, 1, 0].
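One common O(n log n) way to compute this (a sketch; the linked article may well use a different method, such as a BST or merge sort) is to scan from the right and maintain a Binary Indexed Tree over the ranks of the values seen so far:

```java
import java.util.*;

class CountSmallerAfterSelf {
    public List<Integer> countSmaller(int[] nums) {
        int n = nums.length;
        int[] sorted = nums.clone();
        Arrays.sort(sorted);                                // used to map each value to its rank
        int[] bit = new int[n + 1];
        LinkedList<Integer> result = new LinkedList<>();
        for (int i = n - 1; i >= 0; i--) {
            int rank = lowerBound(sorted, nums[i]) + 1;     // 1-based rank of nums[i]
            result.addFirst(query(bit, rank - 1));          // strictly smaller values already seen
            update(bit, rank);                              // record this value
        }
        return result;
    }

    private int lowerBound(int[] a, int target) {           // first index with a[i] >= target
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    private void update(int[] bit, int i) {
        for (; i < bit.length; i += i & -i) bit[i]++;
    }

    private int query(int[] bit, int i) {                   // count of recorded ranks <= i
        int s = 0;
        for (; i > 0; i -= i & -i) s += bit[i];
        return s;
    }
}
```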



    Read full article from Count of Smaller Numbers After Self | Algorithms Collection


    Solr - User - Can the elevation component work with synonyms?



    Solr - User - Can the elevation component work with synonyms?

    I see two choices here.  The first, which is the only one that I can
    reasonably be sure will work, is to do synonym expansion only at index
    time.  The other is to put the fully expanded query into the elevate
    config.  I do not know if this will actually work -- the situation may
    involve more complexity.

    Often synonyms are only done for one analysis chain, but if that's the
    case, they are usually done for the query side, not the index side.
    Therefore, if the elevate config will do it, the latter option above
    would be preferred.  If you change your synonyms, you might need to also
    change your elevate config.

    Read full article from Solr - User - Can the elevation component work with synonyms?


    Good Code is Adaptable Code | Effective Software Design



    Good Code is Adaptable Code | Effective Software Design

    Some people find the diagram below very funny: it basically says that there is no way to write good code. Of course I do not agree with this. The diagram implies that writing well is a slow process, and that the requirements will have changed before we finish writing our code. I claim that we should write Adaptable Code, so that when the requirements change we will not have to "throw it all out and start over", as appears in the box in the bottom. In this sense, Good Code is synonymous with Adaptable Code. Now the question is: how do we write Adaptable Code? My answer is: Adaptable Design Up Front, an Agile approach to Software Design.

    Read full article from Good Code is Adaptable Code | Effective Software Design


    [KAFKA-4233] StateDirectory fails to create directory if any parent directory does not exist - ASF JIRA



    [KAFKA-4233] StateDirectory fails to create directory if any parent directory does not exist - ASF JIRA

    The method directoryForTask attempts to create a task directory but will silently fail to do so as it calls taskDir.mkdir(); which will only create the leaf directory.

    Calling taskDir.mkdirs(); (note the 's') will create the entire path if any parent directory is missing.

    The constructor also attempts to create a bunch of directories using the former method and should be reviewed as part of any fix.
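For illustration only (this is not the Kafka Streams source, and the path is hypothetical), the difference between the two calls:

```java
import java.io.File;

public class MkdirVsMkdirs {
    public static void main(String[] args) {
        File taskDir = new File("/tmp/kafka-streams/app-id/0_1");   // hypothetical state path
        System.out.println(taskDir.mkdir());    // false if the parent directories do not exist yet
        System.out.println(taskDir.mkdirs());   // creates any missing parents, then the leaf directory
    }
}
```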


    Read full article from [KAFKA-4233] StateDirectory fails to create directory if any parent directory does not exist - ASF JIRA


    Monitoring Kafka with Burrow - Part 1 - Hortonworks



    Monitoring Kafka with Burrow - Part 1 - Hortonworks

    Burrow automatically monitors all consumers and every partition that they consume. It does it by consuming the special internal Kafka topic to which consumer offsets are written. Burrow then provides consumer information as a centralized service that is separate from any single consumer. Consumer status is determined by evaluating the consumer's behavior over a sliding window. For each partition, data is recorded to answer the following questions:


    Read full article from Monitoring Kafka with Burrow - Part 1 - Hortonworks


    Not able to find any consumer group. Kafka version 0.10.0 · Issue #144 · linkedin/Burrow · GitHub



    Not able to find any consumer group. Kafka version 0.10.0 · Issue #144 · linkedin/Burrow · GitHub

    Check in Zookeeper (under the /consumers/console-consumer-98785 path, for example). Burrow will skip any groups in Zookeeper that do not have an offsets znode. Burrow does not list all consumer groups, only the consumer groups that are committing offsets.

    Read full article from Not able to find any consumer group. Kafka version 0.10.0 · Issue #144 · linkedin/Burrow · GitHub


    java - how to specify consumer group in Kafka Spark Streaming using direct stream - Stack Overflow



    java - how to specify consumer group in Kafka Spark Streaming using direct stream - Stack Overflow

    The direct stream API uses the low-level Kafka API, and so doesn't use consumer groups in any way. If you want to use consumer groups with Spark Streaming, you'll have to use the receiver-based API.


    Read full article from java - how to specify consumer group in Kafka Spark Streaming using direct stream - Stack Overflow


    Example of a dynamic HTML5 datalist control · Raymond Camden



    Example of a dynamic HTML5 datalist control · Raymond Camden

    I've made no secret of being a huge fan of the updates to forms within HTML5. One of the more interesting updates is the datalist control. This control gives you basic autocomplete support for an input field. At its simplest, you create the datalist control options, tie it to a control, and when the user types, they see items that match your initial list. Consider this example.


    Read full article from Example of a dynamic HTML5 datalist control · Raymond Camden


    system_design/Airbnb: Maximum Room Days at master · jxr041100/system_design · GitHub



    system_design/Airbnb: Maximum Room Days at master · jxr041100/system_design · GitHub

    You are given an array representing reservation requests; assume the start and end dates are back to back. For example, [5,1,1,5] represents the bookings Jul 1-Jul 6, Jul 6-Jul 7, Jul 7-Jul 8, Jul 8-Jul 13 (the starting date, Jul 1, is arbitrary). That explains the input. There is one constraint: check-out and check-in cannot happen on the same day, so if you accept Jul 1-Jul 6 you cannot also accept Jul 6-Jul 7. The question: given such an array, what is the maximum number of days you can rent the room out? For this example the answer is 10. Another example: [4,9,6] gives 10.
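Since the no-shared-day rule just forbids accepting two adjacent requests, this reduces to the classic "house robber" DP; a small Java sketch (my own, not from the post):

```java
class MaxRoomDays {
    static int maxDays(int[] requests) {
        int take = 0, skip = 0;              // best totals if we take / skip the current request
        for (int r : requests) {
            int newTake = skip + r;          // taking r forces the previous request to be skipped
            skip = Math.max(skip, take);     // skipping r: carry over the better of the two
            take = newTake;
        }
        return Math.max(take, skip);
    }

    public static void main(String[] args) {
        System.out.println(maxDays(new int[]{5, 1, 1, 5}));   // 10
        System.out.println(maxDays(new int[]{4, 9, 6}));      // 10
    }
}
```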

    Read full article from system_design/Airbnb: Maximum Room Days at master · jxr041100/system_design · GitHub


    system_design/转载:System design papers at master · jxr041100/system_design · GitHub



    system_design/转载:System design papers at master · jxr041100/system_design · GitHub

Following reading list is selected from the papers I had read in the past 3 years. It will help you to gain a basic knowledge of what happened in current industry and bring you a little sense about how to design a distributed system with certain principles. Feel free to post the good paper you had read in the comments for sharing. :)

Concurrency
• In Search of an Understandable Consensus Algorithm. Diego Ongaro, John Ousterhout, 2013
• A Simple Totally Ordered Broadcast Protocol. Benjamin Reed, Flavio P. Junqueira, 2008
• Paxos Made Live - An Engineering Perspective. Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone, 2007
• The Chubby Lock Service for Loosely-Coupled Distributed Systems. Mike Burrows, 2006
• Paxos Made Simple. Leslie Lamport, 2001
• Impossibility of Distributed Consensus with One Faulty Process. Michael Fischer, Nancy Lynch, Michael Patterson, 1985
• The Byzantine Generals Problem. Leslie Lamport, 1982
• An Algorithm for Concurrency Control and Recovery in Replicated Distributed Databases. PA Bernstein, N Goodman, 1984
• Wait-Free Synchronization. M Herlihy…, 1991
• ZooKeeper: Wait-free coordination for Internet-scale systems. P Hunt, M Konar, FP Junqueira, 2010

Consistency
• Highly Available Transactions: Virtues and Limitations. Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica, 2013
• Consistency Tradeoffs in Modern Distributed Database System Design. Daniel J. Abadi, 2012
• CAP Twelve Years Later: How the "Rules" Have Changed. Eric Brewer, 2012
• Optimistic Replication. Yasushi Saito and Marc Shapiro, 2005
• Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. Seth Gilbert, Nancy Lynch, 2002
• Harvest, Yield, and Scalable Tolerant Systems. Armando Fox, Eric A. Brewer, 1999
• Linearizability: A Correctness Condition for Concurrent Objects. Maurice P. Herlihy, Jeannette M. Wing, 1990
• Time, Clocks, and the Ordering of Events in a Distributed System. Leslie Lamport, 1978

Conflict-free data structures
• A Comprehensive Study of Convergent and Commutative Replicated Data Types. Mark Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011
• A Commutative Replicated Data Type For Cooperative Editing. Nuno Preguica, Joan Manuel Marques, Marc Shapiro, Mihai Letia, 2009
• CRDTs: Consistency without Concurrency Control. Mihai Letia, Nuno Preguiça, Marc Shapiro, 2009
• Conflict-free replicated data types. Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011
• Designing a commutative replicated data type. Marc Shapiro, Nuno Preguiça, 2007

Distributed programming
• Logic and Lattices for Distributed Programming. Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier, 2012
• Dedalus: Datalog in Time and Space. Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears, 2011
• MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean, Sanjay Ghemawat, 2004
• A Note On Distributed Computing. Samuel C. Kendall, Jim Waldo, Ann Wollrath, Geoff Wyant, 1994
• An Overview of the Scala Programming Language. M Odersky, P Altherr, V Cremet, B Emir, S Man, 2004
• Erlang. Joe Armstrong, 2010

    Read full article from system_design/转载:System design papers at master · jxr041100/system_design · GitHub


    [2016 indeed笔试题] Tables and Pieces - 快乐的小鸟_ZJU - 博客频道 - CSDN.NET



    [2016 indeed笔试题] Tables and Pieces - 快乐的小鸟_ZJU - 博客频道 - CSDN.NET

    Question
    There is a 6×6 table. Place a number of pieces on the table to meet the following conditions:
    There is to be either zero or one piece in each square of the table.
    Each column in the table is to have exactly three pieces.
    Each row in the table is to have exactly three pieces.
    There may already be pieces in some of the squares of the table. When si,j is 'o', there is already a piece in the square at the jth column of the ith row. When that is not the case, si,j is '.' and there is no piece in that square. Find out the number of ways to place pieces in the empty squares, which satisfies the conditions. Two ways are considered different if they have at least one square which contains a piece in one way and doesn't contain a piece in the other way.
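A small backtracking sketch (not the article's code) that counts the valid completions cell by cell, pruning on the row and column totals:

```java
public class TablesAndPieces {
    // board is 6x6, containing 'o' (pre-placed piece) or '.' (empty square).
    static int count(char[][] board) {
        return solve(board, 0, new int[6], new int[6]);
    }

    private static int solve(char[][] b, int cell, int[] row, int[] col) {
        if (cell == 36) return 1;                 // all row/column constraints were enforced on the way
        int r = cell / 6, c = cell % 6;
        int ways = 0;
        // Option 1: put a piece here (this is the only choice if the square is pre-filled with 'o').
        row[r]++; col[c]++;
        if (row[r] <= 3 && col[c] <= 3 && stillFeasible(r, c, row, col))
            ways += solve(b, cell + 1, row, col);
        row[r]--; col[c]--;
        // Option 2: leave the square empty (only allowed if it was not pre-filled).
        if (b[r][c] != 'o' && stillFeasible(r, c, row, col))
            ways += solve(b, cell + 1, row, col);
        return ways;
    }

    // After deciding the square at (r, c): a finished row/column must hold exactly three pieces,
    // and an unfinished one must still be able to reach three with the squares that remain.
    private static boolean stillFeasible(int r, int c, int[] row, int[] col) {
        if (c == 5 && row[r] != 3) return false;
        if (r == 5 && col[c] != 3) return false;
        if (row[r] + (5 - c) < 3) return false;
        if (col[c] + (5 - r) < 3) return false;
        return true;
    }
}
```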

    Read full article from [2016 indeed笔试题] Tables and Pieces - 快乐的小鸟_ZJU - 博客频道 - CSDN.NET


Google Interview Question: Dropping Glass Balls - 程序园

A building has 100 floors. You have two identical glass balls. When you drop a ball from some floor, one of two things happens: it breaks or it does not. The building has a critical floor: dropped from any floor below it, the ball will not break; dropped from that floor or any floor above it, the ball always breaks. A broken ball cannot be dropped again. Design a strategy whose worst-case number of drops is smaller than the worst case of any other strategy — that is, the most efficient strategy.
First, if you want to keep one ball for yourself, use the dumbest method: start from the first floor and go up one floor at a time; the floor at which the ball breaks is the answer. The worst case, though, may take 100 drops.
That is a high price just to save one ball, so try another approach: pick some floor, say floor N, and drop from there. If the ball breaks, you can only start testing from the first floor with the other ball, so the worst case may be N drops. If it does not break, keep going up one floor at a time, and the worst case is 100 - N more. In other words, with this method the worst case is max{N, 100 - N + 1}; the +1 is there because the first drop was made from floor N.
Still not good enough: if you are lucky, the N you picked happens to be the critical floor, but if you are unlucky you still need many drops. Look back at the second approach, though — when the ball survives, instead of going up one floor at a time, treat the remaining 100 - N floors as a smaller instance of the same problem and look for the critical floor inside it. Doesn't that turn the problem into a recursion? See below:
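A small Java sketch of that recursion for two balls (memoized): dropping the first ball from floor k costs k drops in the worst case if it breaks, and 1 + solve(floors - k) if it survives. For 100 floors it prints 14.

```java
class TwoGlassBalls {
    static int[] memo;

    static int minWorstCaseDrops(int floors) {
        memo = new int[floors + 1];
        return solve(floors);
    }

    private static int solve(int floors) {
        if (floors <= 1) return floors;          // 0 floors -> 0 drops, 1 floor -> 1 drop
        if (memo[floors] != 0) return memo[floors];
        int best = Integer.MAX_VALUE;
        for (int k = 1; k <= floors; k++) {
            int breaks = k;                      // 1 drop at floor k, then up to k - 1 linear drops below
            int survives = 1 + solve(floors - k);
            best = Math.min(best, Math.max(breaks, survives));
        }
        return memo[floors] = best;
    }

    public static void main(String[] args) {
        System.out.println(minWorstCaseDrops(100));   // 14
    }
}
```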

    Read full article from Google Interview Question: Dropping Glass Balls - 程序园


    Prioritizing your document in search results - Administrating Solr



    Prioritizing your document in search results - Administrating Solr

    You might come across situations wherein you need to promote some of your products and would like them to appear above other documents in the search result list. Additionally, you might also need to keep such products flexible and define exclusive queries applicable only to these products and not to the others. To achieve this, you might think of options such as boosting, index-time boosting, or perhaps some special field. Don't worry! This section shows how Solr helps you out with a robust component known as QueryElevationComponent.


    Read full article from Prioritizing your document in search results - Administrating Solr


    how to configure solr / lucene to perform levenshtein edit distance searching? - Stack Overflow



    how to configure solr / lucene to perform levenshtein edit distance searching? - Stack Overflow

    how to configure SOLR to perform levensthein / jaro-winkler / n-gram searches with scores returned and without doing additional stuff like tf, idf, boost and so included?

    You've got some solutions of how to obtain the desired results but none actually answers your question.

    q={!func}strdist("webspace",term,edit) will overwrite the default document scoring with the Levenshtein distance, and q={!func}strdist("webspace",term,jw) does the same for Jaro-Winkler.

    The sorting suggested above will work fine in most cases but it doesn't change the scoring function, it just sorts the results obtained with the scoring method you want to avoid. This might lead to different results and the order of the groups might not be the same.

    To see which ones would fit best, a &debugQuery=true might be enough.


    Read full article from how to configure solr / lucene to perform levenshtein edit distance searching? - Stack Overflow


    Salmon Run: Near Duplicate Detection using MinHashing and Solr



    Salmon Run: Near Duplicate Detection using MinHashing and Solr

    Recently there was a discussion on detecting near duplicate documents on LinkedIn (login required). Someone suggested using NGrams, which prompted me to suggest using More Like This (MLT) queries on Solr using shingles for terms (the general idea of querying with shingles is explained here).

    Turns out that this was a bit naive. For one thing, the Solr ShingleFilter works differently than you would expect. For example, for the phrase "lipstick on a pig", you would expect 3-grams to be the set of tokens {"lipstick on a", "on a pig"}. With Solr's Shinglefilter, for minShingleSize=3, maxShingleSize=3 and outputUnigrams=true, you get {"lipstick|lipstick on a", "on|on a pig"}, ie synonym tokens with the unigram as anchor and the n-gram(s) as synonyms. If you set outputUnigrams=false, no shingles are produced because there is no anchor for the synonym term. Further, since MLT works by matching tokens found by analyzing a query document with tokens found in index documents, the only way to implement my original suggestion would be a custom ShingleFilter.

    While I've built custom Solr components in the past, in hindsight I think it's generally a bad idea for two reasons. First, the Lucene and Solr projects are quite fast moving and APIs get changed frequently. Second, custom extension points are often regarded as "expert" and there is less concern for backward compatibility for these APIs than the user-centric ones. I guess the expectation is that people doing customizations are responsible for being on top of API changes as they occur. Maybe it's true in general, but for me it's a potential annoyance I have to deal with each time I upgrade.

    In any case, I figured out a user-centric approach to do this. Instead of analyzing the content into shingles (n-grams), we decompose the content into shingles at index time and store them in a multi-valued string (no tokenization beyond the shingling) field. When trying to find near duplicates, we search using a boolean query of shingles built from the query document. This returns a ranked list of documents where topmost document has the most shingles matching the shingles of the query document, the next one less so, and so on.

    Read full article from Salmon Run: Near Duplicate Detection using MinHashing and Solr


    Levenshtein automata can be simple and fast



    Levenshtein automata can be simple and fast

    A few days ago somebody brought up an old blog post about Lucene's fuzzy search. In this blog post Michael McCandless describes how they built Levenshtein automata based on the paper Fast String Correction with Levenshtein-Automata. This proved quite difficult:

    At first he built a simple prototype, explicitly unioning the separate DFAs that allow for up to N insertions, deletions and substitutions. But, unfortunately, just building that DFA (let alone then intersecting it with the terms in the index), was too slow.

    Fortunately, after some Googling, he discovered a paper, by Klaus Schulz and Stoyan Mihov (now famous among the Lucene/Solr committers!) detailing an efficient algorithm for building the Levenshtein Automaton from a given base term and max edit distance. All he had to do is code it up! It's just software after all. Somehow, he roped Mark Miller, another Lucene/Solr committer, into helping him do this.

    Unfortunately, the paper was nearly unintelligible! It's 67 pages, filled with all sorts of equations, Greek symbols, definitions, propositions, lemmas, proofs. It uses scary concepts like Subsumption Triangles, along with beautiful yet still unintelligible diagrams. Really the paper may as well have been written in Latin.

    Much coffee and beer was consumed, sometimes simultaneously. Many hours were spent on IRC, staying up all night, with Mark and Robert carrying on long conversations, which none of the rest of us could understand, trying desperately to decode the paper and turn it into Java code. Weeks went by like this and they actually had made some good initial progress, managing to loosely crack the paper to the point where they had a test implementation of the N=1 case, and it seemed to work. But generalizing that to the N=2 case was… daunting.


    Read full article from Levenshtein automata can be simple and fast


    Scalable thoughts...: Configure Solr Did You Mean



    Scalable thoughts...: Configure Solr Did You Mean

    Apache Solr is one of my top favorite tools. This open-source enterprise search engine is very powerful, scalable, resilient and functional. In this article, my purpose is to show how to configure the 'did you mean' feature. I used to tell coworkers that 'did you mean' is the Solr feature with the best cost-benefit ratio: as you will see, it is very easy to set up and gives the target user the feeling that your search mechanism is actually very smart.

    Some background

    Solr 'did you mean' support can be configured using the SpellCheckComponent, which is built on top of Lucene implementations of the org.apache.lucene.search.spell.StringDistance interface. This interface provides the getDistance(String string1, String string2) method. These implementations simply compare the two given strings and return a (double) factor; the closer the factor is to 1.0, the more similar the two words are.

    Read full article from Scalable thoughts...: Configure Solr Did You Mean


    Solr's mm parameter - Explanation of Min Number Should Match - Vijay Mhaskar's Blog



    Solr's mm parameter - Explanation of Min Number Should Match - Vijay Mhaskar's Blog

    This article explains the format used for specifying the "Min Number Should Match" criteria of the BooleanQuery objects built by the DisMaxRequestHandler.  Using this it is possible to specify a percentage of query words (or blocks) that should appear in a document.

    There are 3 types of "clauses" that Solr (Lucene) knows about: mandatory, prohibited, and 'optional'.  By default all words or phrases specified in the "q" param are treated as "optional" clauses unless they are preceded by a "+" or a "-". When dealing with these "optional" clauses, the "mm" option makes it possible to say that a certain minimum number of those clauses must match (mm).


    Read full article from Solr's mm parameter - Explanation of Min Number Should Match - Vijay Mhaskar's Blog


Google Interview Questions - 快乐的小鸟_ZJU - CSDN.NET

Question 1: You are given a matrix in which each number is the height of a bar. If the rainfall level is higher than a bar, the water can flow over it. The rainfall starts at 0 and increases by 1 each day. What is the earliest day on which the water has a path from src to dst? Equivalently, over all paths from the source to the destination, find the minimum of the maximum cell value along a path.

Solution: BFS plus a greedy choice. Starting from the source, add every accessible cell to a heap. Repeatedly take the top of the heap and push its accessible, unvisited neighbours onto the heap, marking cells as visited. Each time you pop the heap, update the running maximum; as soon as you reach dst, return that maximum.
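A Java sketch of that BFS-with-min-heap idea (my own illustration of the approach above, with src and dst given as row/column index pairs):

```java
import java.util.*;

class FloodPath {
    static int earliestDay(int[][] height, int[] src, int[] dst) {
        int rows = height.length, cols = height[0].length;
        boolean[][] visited = new boolean[rows][cols];
        // Always expand the lowest reachable cell first.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt((int[] p) -> height[p[0]][p[1]]));
        heap.offer(src);
        visited[src[0]][src[1]] = true;
        int highest = 0;
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            highest = Math.max(highest, height[cur[0]][cur[1]]);  // the water must rise at least this far
            if (cur[0] == dst[0] && cur[1] == dst[1]) return highest;
            for (int[] d : dirs) {
                int nr = cur[0] + d[0], nc = cur[1] + d[1];
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || visited[nr][nc]) continue;
                visited[nr][nc] = true;
                heap.offer(new int[]{nr, nc});
            }
        }
        return -1;   // dst unreachable
    }
}
```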


Question 2: the linked-list version of plus-one, required to run in O(n) time and O(1) space.


Solution: two scans, O(n). Find the trailing run of consecutive 9s; if that run reaches the end of the list, record where it begins, and in the second pass apply the +1 starting from that position (the node just before the run is incremented and the 9s become 0s).
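A sketch of that two-pass linked-list plus-one in Java (my own illustration of the idea above):

```java
class PlusOneLinkedList {
    static class Node { int val; Node next; Node(int v) { val = v; } }

    static Node plusOne(Node head) {
        // First pass: find the last node whose value is not 9.
        Node lastNotNine = null;
        for (Node cur = head; cur != null; cur = cur.next)
            if (cur.val != 9) lastNotNine = cur;
        if (lastNotNine == null) {               // all digits are 9: 999 -> 1000
            lastNotNine = new Node(0);
            lastNotNine.next = head;
            head = lastNotNine;
        }
        // Second pass (from that position): +1 here, then zero out the trailing 9s.
        lastNotNine.val++;
        for (Node cur = lastNotNine.next; cur != null; cur = cur.next) cur.val = 0;
        return head;
    }
}
```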


    Read full article from Google Interview Questions - 快乐的小鸟_ZJU - CSDN.NET


    Getting into Silicon Valley's Tech Giants as a Software Developer | Azra Rabbani | Pulse | LinkedIn



    Getting into Silicon Valley's Tech Giants as a Software Developer | Azra Rabbani | Pulse | LinkedIn

    In Silicon Valley, if you are trying to get a job as a Software Developer then you must prove yourself in certain areas, no matter at what level of experience you are. The topic of this article is the evaluation process and the skills required to crack the series of interviews a candidate should expect. Look at the following:

    Skills

    1. Data structures and Algorithms
    2. Software Analysis and Design
    3. Knowledge you gain from your past Experiences
    4. Your Attitude


    Read full article from Getting into Silicon Valley's Tech Giants as a Software Developer | Azra Rabbani | Pulse | LinkedIn


    A Map of the Bay Area's Top Web Companies on Tripline



    A Map of the Bay Area's Top Web Companies on Tripline

    This is a map of some of the most influential and relevant American web related companies located in the San Francisco Bay Area.

    Read full article from A Map of the Bay Area's Top Web Companies on Tripline


    Getting into Silicon Valley's Tech Giants as a Software Developer | Azra Rabbani | Pulse | LinkedIn



    Getting into Silicon Valley's Tech Giants as a Software Developer | Azra Rabbani | Pulse | LinkedIn

    1. Data structures and Algorithms
    2. Software Analysis and Design
    3. Knowledge you gain from your past Experiences
    4. Your Attitude

    Above are the main areas for evaluation if you are applying for a full-time role. For a contract role there can be many other criteria, or none, depending upon the preferences of the hiring manager. Sometimes only one criterion matters: if your vendor has a good "working relationship" with the hiring manager, then you don't have to be really good :D, and sometimes, no matter how good you are, you will not get the contract because someone was already SELECTED and you were called just to pad the count of candidates who appeared in the hiring process for that position huh... This might sound negative but unfortunately this is the reality we face here. Anyway, we are not covering this today, but we can discuss it in a separate and more focused post later ;)


    Read full article from Getting into Silicon Valley's Tech Giants as a Software Developer | Azra Rabbani | Pulse | LinkedIn


    subject:"Re\: Kafka Streams Failed to rebalance error"



    subject:"Re\: Kafka Streams Failed to rebalance error"

    A `CommitFailedException` can still occur if an instance misses a rebalance. I think these are two different problems. Having said this, Streams should recover from a `CommitFailedException` automatically by triggering another rebalance afterwards. Nevertheless, we know that there is an issue with rebalancing: if state recreation takes long, a rebalancing instance might miss another rebalance... This is one of the top-priority things we want to work on after 0.11 is released.

    Read full article from subject:"Re\: Kafka Streams Failed to rebalance error"


    Config Sets | Apache Solr Reference Guide



    Config Sets | Apache Solr Reference Guide

    On a multicore Solr instance, you may find that you want to share configuration between a number of different cores. You can achieve this using named configsets, which are essentially shared configuration directories stored under a configurable configset base directory.

    To create a configset, simply add a new directory under the configset base directory. The configset will be identified by the name of this directory. Then copy into it the config directory you want to share. The structure should look something like this:


    Read full article from Config Sets | Apache Solr Reference Guide


    Solr - User - Performance warning overlapping onDeckSearchers



    Solr - User - Performance warning overlapping onDeckSearchers

    You can find your answers a lot faster by inspecting the Solr logs.
    You'll see a commit message, messages about opening new searchers,
    autowarming, etc. All that is in the log file, along with timestamps.
    So rather than ask them an open ended question, you can say something
    like "I see commits coming through every N seconds. This is an
    anti-pattern. Fix this" ;)......

    Read full article from Solr - User - Performance warning overlapping onDeckSearchers


    FAQ - Solr logging "PERFORMANCE WARNING: Overlapping onDeckSearchers" and its meaning – DataStax Support



    FAQ - Solr logging "PERFORMANCE WARNING: Overlapping onDeckSearchers" and its meaning – DataStax Support

    When a commit is issued to a Solr core, it makes index changes visible to new search requests. Commits may come from an application or from an auto-commit. A "normal" commit in DSE more often than not comes from an auto-commit, which, as outlined here, is configured in the solr config file.

    Each time a commit is issued a new searcher object is created. When there are too many searcher objects this warning will be observed.

    Also if the configuration is such that a searcher has pre-warming queries, this can delay the start time meaning that the searcher is still starting up when a new commit comes in.


    Read full article from FAQ - Solr logging "PERFORMANCE WARNING: Overlapping onDeckSearchers" and its meaning – DataStax Support


    solr4 - SOLR autoCommit vs autoSoftCommit - Stack Overflow



    solr4 - SOLR autoCommit vs autoSoftCommit - Stack Overflow

    You have openSearcher=false for hard commits. Which means that even though the commit happened, the searcher has not been restarted and cannot see the changes. Try changing that setting and you will not need soft commit.

    SoftCommit does reopen the searcher. So if you have both sections, soft commit shows new changes (even if they are not hard-committed) and - as configured - hard commit saves them to disk, but does not change visibility.

    This allows you to set the soft commit to 1 second so that documents show up quickly, while hard commits happen less frequently.


    Read full article from solr4 - SOLR autoCommit vs autoSoftCommit - Stack Overflow


    Cassandra at Scale: The Problem with Secondary Indexes | Pantheon



    Cassandra at Scale: The Problem with Secondary Indexes | Pantheon

    Maybe you're a seasoned Cassandra veteran, or maybe you're someone who's stepping out into the world of NoSQL for the first time—and Cassandra is your first step. Maybe you're well versed in the problems that secondary indexes pose, or maybe you're looking for best practices before you invest too much time and effort into including Cassandra in your stack. The truth is, if you're using Cassandra or planning on using it to retrieve data efficiently, there are some limits and caveats of indexes you should be aware of.

    Here at Pantheon, we like to push things to their limits. Whether it's packing thousands of containers onto a single machine, or optimizing our PHP internals to serve up the fastest WordPress sites on the internet, we put our stack under intense pressure. Sometimes that yields diamonds, and other times that yields cracks and breakages. This article is about the latter situation—how we reached the limit of Cassandra's secondary indexes, and what we did about it. But first, some background.


    Read full article from Cassandra at Scale: The Problem with Secondary Indexes | Pantheon


    Our Solution to Solr Multiterm Synonyms: The Match Query Parser



    Our Solution to Solr Multiterm Synonyms: The Match Query Parser

    You have probably heard us talk about Solr multiterm synonyms a lot! It's a big problem that prevents a lot of organizations from getting reasonable search relevance out of Solr. The problem has been described as the "sea biscuit" problem. Because, if you have a synonyms.txt file like:

    sea biscuit => seabiscuit  

    … you unfortunately won't get what you expect at query time. This is because most Solr query parsers break up query strings on spaces before running query-time analysis. If you search for "sea biscuit" Solr sees this first as [sea] OR [biscuit]. The required analysis step then happens on each individual clause – first on just "sea" then on just "biscuit." Without analysis seeing a "sea" right before a "biscuit", query time analysis doesn't recognize the synonym listed above. Bummer.


    Read full article from Our Solution to Solr Multiterm Synonyms: The Match Query Parser


    [SOLR-10290] New Publication Model for Solr Reference Guide - ASF JIRA



    [SOLR-10290] New Publication Model for Solr Reference Guide - ASF JIRA

    The current Solr Ref Guide is hosted at cwiki.apache.org, a Confluence installation. There are numerous reasons to be dissatisfied with the current setup, a few of which are:

    • Confluence as a product is no longer designed for our use case and our type of content.
    • The writing/editing experience is painful and a barrier for all users, who need to learn a lot of Confluence-specific syntax just to help out with some content.
    • Non-committers can't really help improve documentation except to point out problems and hope someone else fixes them.
    • We really can't maintain online documentation for different versions. Users on versions other than the one that hasn't been released yet are only given a PDF to work with.

    I made a proposal in Aug 2016 (email link) to move the Ref Guide from Confluence to a new system that relies on asciidoc-formatted text files integrated with the Solr source code.

    This is an umbrella issue for the sub-tasks and related decisions to make that proposal a reality. A lot of work has already been done as part of a proof-of-concept, but there are many things left to do. Some of the items to be completed include:

    • Creation of a branch and moving the early POC work I've done to the project
    • Conversion the content and clean up of unavoidable post-conversion issues
    • Decisions about location of source files, branching strategy and hosting for online versions
    • Meta-documentation for publication process, beginner tips, etc. (whatever else people need or want)
    • Integration of build processes with the broader project

    For reference, a demo of what the new ref guide might look like is currently online at http://people.apache.org/~ctargett/RefGuidePOC/.


    Read full article from [SOLR-10290] New Publication Model for Solr Reference Guide - ASF JIRA


    Gracenote Developer Video + Sports APIs I/O Docs



    Gracenote Developer Video + Sports APIs I/O Docs

    http://spe1.tmsimg.com/assets/p8729531_b1t_v5_aa.jpg

    Read full article from Gracenote Developer Video + Sports APIs I/O Docs


    [SOLR-2242] Get distinct count of names for a facet field - ASF JIRA



    [SOLR-2242] Get distinct count of names for a facet field - ASF JIRA

    When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1.

    The feature is called "namedistinct". Here is an example:

    Parameters:
    facet.numTerms or f.<field>.facet.numTerms = true (default is false) - turn on distinct counting of terms

    facet.field - the field to count the terms
    It creates a new section in the facet section...


    Read full article from [SOLR-2242] Get distinct count of names for a facet field - ASF JIRA


    Developing a Domain Specific Language in Gremlin | DataStax



    Developing a Domain Specific Language in Gremlin | DataStax

    An earlier Aurelius blog post entitled "Educating the Planet with Pearson," spoke of the OpenClass platform and Titan's role in Pearson's goal of "providing an education to anyone, anywhere on the planet". It described the educational domain space and provided a high-level explanation of some of the conceptual entity and relationship types in the graph. For example, the graph modeled students enrolling in courses, people discussing content, content referencing concepts and other entities relating to each other in different ways. When thinking in graph terminology, these "conceptual entity and relationship types" are expressed as vertices (e.g. dots, nodes) and edges (e.g. lines, relationships), so in essence, the domain model embeds conceptual meaning into graph elements.
    At Pearson, the OpenClass domain model is extended into a programmatic construct, a DSL based on Gremlin, which abstracts away the language of the graph. Engineers and analysts can then ask questions of the graph in their educational domain language, as opposed to translating those familiar terms into the language of vertices and edges. The OpenClass DSL defines the graph schema, extends the Gremlin graph traversal language into the language of education, provides standalone functions that operate over these extensions, and exposes algorithms that are developed from those extensions and functions. Together these components form a coarsely-grained API which helps bring general accessibility to complex graph traversals.

    Read full article from Developing a Domain Specific Language in Gremlin | DataStax


    Supplejack : Schema DSL (Domain Specific Language)



    Supplejack : Schema DSL (Domain Specific Language)

    Fields

    Fields are defined using the following syntax:

    type name options do
      block
    end

    where:

    • type is the type of the field. Must be one of [string, integer, datetime, boolean].
    • name can be any valid ruby identifier (i.e. must not start with a number or a Ruby reserved word).
    • options (all optional):
      • search_boost is an integer passed to Sunspot/Solr to increase the search relevance by the given factor. Default: 1.
      • search_as determines whether the field can be searched as a filter, as fulltext, or both. Valid values are [:filter], [:fulltext] or [:filter, :fulltext]. Default: [].
      • store is a boolean value which determines whether the field is stored in the Mongo database or not. Default: true.
      • multi_value is a boolean value which determines whether the value is stored as an array or single value. Default: false.
      • solr_name is a string which is the name of the field in Solr. Default: field's name.
    • block
      • search_value is a Ruby Proc which produces the value which should be indexed by Solr. The block is executed when the field is indexed. Must be the same type as the field's type. Default: nil.

    Read full article from Supplejack : Schema DSL (Domain Specific Language)


    Why Auto Increment Is A Terrible Idea - Clever Cloud Blog



    Why Auto Increment Is A Terrible Idea - Clever Cloud Blog

    Use UUIDs as primary keys. They can be freely exposed without disclosing sensitive information, they are not predictable and they are performant.


    Faceted Search - Apache Solr Reference Guide - Apache Software Foundation



    Faceted Search - Apache Solr Reference Guide - Apache Software Foundation

    Facet & Analytics Module

    The new Facet & Analytics Module is a rewrite of Solr's previous faceting capabilities, with the following goals:

    • First class native JSON API to control faceting and analytics
    • First class integrated analytics support
    • Nest any facet type under any other facet type (such as range facet, field facet, query facet)
    • Ability to sort facet buckets by any calculated metric
    • Easier programmatic construction of complex nested facet commands
    • The structured nature of nested sub-facets are more naturally expressed in a nested structure like JSON rather than the flat structure that normal query parameters provide.
    • Support a much more canonical response format that is easier for clients to parse
    • Support a cleaner way to implement distributed faceting
    • Support better integration with other search features
    • Full integration with the JSON Request API

    Read full article from Faceted Search - Apache Solr Reference Guide - Apache Software Foundation


    Solr 6.5 Features - Solr 'n Stuff



    Solr 6.5 Features - Solr 'n Stuff

    Field Type related changes

    • PointFields (fixed-width multi-dimensional numeric & binary types enabling fast range search) are now supported
    • In-place updates to numeric docValues fields (single valued, non-stored, non-indexed) supported using atomic update syntax
    • A new LatLonPointSpatialField that uses points or doc values for query
    • It is now possible to declare a field as "large" in order to bypass the document cache

    Query

    • New sow=false request param (split-on-whitespace) for edismax & standard query parsers enables query-time multi-term synonyms
    • XML QueryParser (defType=xmlparser) now supports span queries

    Highlighting

    • hl.maxAnalyzedChars now have consistent default across highlighters
    • UnifiedSolrHighlighter and PostingsSolrHighlighter now support CustomSeparatorBreakIterator

    Streaming Expressions

    • Scoring formula is adjusted for the scoreNodes function
    • Calcite Planner now applies constant Reduction Rules to optimize plans
    • A new significantTerms Streaming Expression that is able to extract the significant terms in an index
    • StreamHandler is now able to use runtimeLib jars
    • Arithmetic operations are added to the SelectStream

    Read full article from Solr 6.5 Features - Solr 'n Stuff


    'Alt-right' Portland rally sees skirmishes with counter-protesters | US news | The Guardian



    'Alt-right' Portland rally sees skirmishes with counter-protesters | US news | The Guardian

    First published on Sunday 4 June 2017 16.18 EDT. Tension was high in Portland on Sunday as "alt-right" and opposing "antifa" activists gathered around a rightwing rally, a little over a week after two men were killed and one wounded in a stabbing attack on city transportation. Jeremy Christian, 35, was charged in the attack, in which Rick Best, 53, and Taliesin Myrddin Namkai Meche, 23, were killed after they intervened to help two young women who were the target of racial abuse.

    Read full article from 'Alt-right' Portland rally sees skirmishes with counter-protesters | US news | The Guardian


    What Happened to chrome://plugins in Google Chrome?



    What Happened to chrome://plugins in Google Chrome?

    SuperUser reader Jedi wants to know what happened to chrome://plugins in Google Chrome:

    Until recently, Google Chrome allowed a person to enable or disable plugins (like Adobe Flash Player) using the chrome://plugins page. But it seems that the page no longer exists (as of Google Chrome 57.0.2987.98). So how do I access Google Chrome's plugins now?

    What happened to chrome://plugins in Google Chrome?

    The Answer

    SuperUser contributor Steven has the answer for us:

    The chrome://plugins page was removed in Google Chrome, version 57.

    • Objective: Remove the chrome://plugins page, moving configuration for the last remaining plugin, Adobe Flash Player, to its own explicit place in content settings (including an option, in settings, to disable it).

    Source: Chromium – Issue-615738: Deprecate chrome://plugins

    Use chrome://settings/content to control when Adobe Flash content is displayed and chrome://components to display the version of Adobe Flash Player installed.


    Read full article from What Happened to chrome://plugins in Google Chrome?


    615738 - Deprecate chrome://plugins - chromium - Monorail



    615738 - Deprecate chrome://plugins - chromium - Monorail

    Objective: Remove the chrome://plugins page, moving configuration for the last remaining plugin, Flash Player, to its own explicit place in content settings (including an option, in settings, to disable it).

    Rationale: This change should make the controls for Flash Player more discoverable in settings (i.e. most users probably know what Flash is, but not what a "plugin" is), and will consolidate modes related to Flash Player (e.g. Plugin Power Savings mode) into a single location.

    Supporting rationale: Since we've deprecated NPAPI, Flash Player is now our last remaining plugin (i.e. 3rd-party binary module). The other remaining "plugins" (PDF, CDM, etc.) started life as 3rd-party code, but have since been built and maintained by Google, and at this point are effectively just specialized libraries for Chrome.

    Read full article from 615738 - Deprecate chrome://plugins - chromium - Monorail


    WeChat Verification Code Doesn't Work | How to Chat Online



    WeChat Verification Code Doesn't Work | How to Chat Online

    2-) Clear all data of WeChat from your mobile and uninstall application. Restart your phone and install WeChat again. Register to WeChat and request another verification code.

    3-) If the second step couldn't fix your problem, then you will need to contact the WeChat staff. There are various ways to do this. You can send feedback on the WeChat page of your OS app store, or you can contact the WeChat staff via the official website of the application. There is also a contact e-mail address on the site; it will be easier to resolve your problem by sending an e-mail.


    Read full article from WeChat Verification Code Doesn't Work | How to Chat Online

