All About Programming: December 2018

今際の国の呵呵君: [System Design]Location Based Service

Problem: Design a location based service like Uber?

Scenario:
Rider should be able to request a service, an available driver should be matched
We should be able to keep track of driver's location->drivers report location every 4 seconds(based on this article)

Services:

Geo Service: stores drivers location information, handle queriers like "give me drives within 2kms of this location". Since drivers report location information every 4 seconds, it will be write heavy. Assume 0.2M active drivers per day, QPS will be 200000 / 4 = 50K, Peak QPS might be 50 * 3 = 150K, writes need to be efficient

Dispatch Service: To handle riders request, matches the driver and keep track of the trip information

Read full article from 今際の国の呵呵君: [System Design]Location Based Service

今際の国の呵呵君: [System Design]TypeAhead

Scenario: Design a typeahead service which support following functionalities:

Be able to collect the data and keep track of the sentences users has typed recently
For every word user typed, return the top k hot sentences start with this prefix

Services:

Data Collection Service: collect the data form users and do preprocessing to get frequency of each service

Query Service: Store hot words for every prefix and return the top k hot words for given query request. Let's assume there are 1B active user and each of them do 10 searches daily and average sentence length is 10(every time user type a char we may need to search for the new prefix), QPS = 1B * 10 * 10 / 86400 = 1M, peak QPS = 2 * 1M = 2M. It is a pretty ready heavy service.

Storage:

Where do we get the data?

We can get the data by logging, each time user typed a full sentence and send a request, we will log the data in the disk, like <user_id, sentence, timestamp>. The log information can be stored in distributed file system like GFS, since usually the new generated data every day is really large. Let's use the estimated number above, assume each log entry will be 20 bytes the data generated every day will be 1B * 10 * 20 Bytes = 200G. And if we want to the data for past two weeks, it will be 200G * 14 = 2.8 T.

After we have the log data, we can perform the map reduce to get the <sentence, frequency> pair we want, since it is a simple key-value pair, we can store it in distributed nosql database like BigTable.

It will still be pretty expensive to store this large amount of data, actually we don't want the service to be that accurate, we just want to know the general trend. This is why we need to bring up probabilistic logging, every time we can generate a random number between [0, 1000) and if the number equals to 0, we log this data, so generally we are logging with 1 / 1000 probability. And the data generate every day can be reduced from 200G to 200M.

Read full article from 今際の国の呵呵君: [System Design]TypeAhead

LeetCode862. Shortest Subarray with Sum at Least K - f91og - 博客园

上面的出入队列顺序是这样的：首先对于每个索引i，对应的是B[i]，将这个索引作为y位置来考虑，因为双端队列保持的索引是的B[i]是递增的，为了从最大处逼近K，我们从队头依次取索引出来计算：

B[i] - B[d.getFirst()]

如果比K大，那么则要找这其中距离索引i最近的那一个：

res = Math.min(res, i - d.pollFirst());

然后是队列要keep indexes of increasing B[i]，索引判断当前的B[i]是否大于队列尾部的索引处的

B[i] <= B[d.getLast()

如果不能构成递增，根据之前的分析，当前y所在的位置i的最优解opt(y)一定不会是在前面递增的部分取，所以队列要从后往前一个个弹出队尾直至能和B[i]构成递增序列。

Read full article from LeetCode862. Shortest Subarray with Sum at Least K - f91og - 博客园

LeetCode691. Stickers to Spell Word - f91og - 博客园

We are given N different types of stickers. Each sticker has a lowercase English word on it.

You would like to spell out the given target string by cutting individual letters from your collection of stickers and rearranging them.

You can use each sticker more than once if you want, and you have infinite quantities of each sticker.

What is the minimum number of stickers that you need to spell out the target? If the task is impossible, return -1.

Example 1:

Input:

["with", "example", "science"], "thehat"

Output:

Explanation:

We can use 2 "with" stickers, and 1 "example" sticker. After cutting and rearrange the letters of those stickers, we can form the target "thehat". Also, this is the minimum number of stickers necessary to form the target string.

Example 2:

Input:

["notice", "possible"], "basicbasic"

Output:

-1

Explanation:

We can't form the target "basicbasic" from cutting letters from the given stickers.

分析

乍一看应该是个dp问题，但是不知道怎么建模，还是多做题目来积累经验吧。

首先是要明白题目的意思，从绕来绕去的描述中寻找题目真正要求的是什么。这题的输入是一组字符串，和一个目标字符串，对于给定的字符串可以取它任意的分割部分，同一个字符串可以使用多次，以此来组成目标字符串，求从输入的那组字符串中取最少的字符串个树来组成目标字符串。很繁琐的描述，剥离下无用信息其实就是从给定一个字符串集合S,以及一个目标字符串T.求使用S中字符串的最小个数，能够满足T需要的字符数。

利用数组下标与值的映射关系来存储每个sticker以及target string中的字符及其数目，并且使用回溯来遍历所有的可能。

看了下递归遍历的代码，还是不是很懂，还是去看了下dp的解法。利用dp加上回溯递归的方法遍历求解所有的可能。对于每一个sticker，应用到组成target string的话，组成了一部分，还剩下了一部分，这样递归求解即可。

利用一个map存储dp数组，key是要组成的target string，值是最优解，也就最少使用多少个sticker。

dp[s] is the minimum stickers required for string s (-1 if impossible). Note s is sorted. clearly, dp[""] = 0, and the problem asks for dp[target].

状态转移方程：

dp[s] = min(1+dp[reduced_s]) for all stickers,  here reduced_s is a new string after certain sticker applied

Read full article from LeetCode691. Stickers to Spell Word - f91og - 博客园

Math related problems on LeetCode – Algorithms and Leetcode – Medium

Greatest Common Dividor (GCD)
Prim Numbers
Ugly Number
The Smallest Larger Number

1. Greatest common divisor (GCD)

GCD (Greatest Common Divisor) or HCF (Highest Common Factor) of two numbers is the largest number that divides both of them.

Read full article from Math related problems on LeetCode – Algorithms and Leetcode – Medium

Zhiwen's Technical Note: [Leetcode] 483. Smallest Good Base

For an integer n, we call k>=2 a good base of n, if all digits of n base k are 1.

Now given a string representing n, you should return the smallest good base of n in string format.

Example 1:

Input: "13"
Output: "3"
Explanation: 13 base 3 is 111.

Example 2:

Input: "4681"
Output: "8"
Explanation: 4681 base 8 is 11111.

Example 3:

Input: "1000000000000000000"
Output: "999999999999999999"
Explanation: 1000000000000000000 base 999999999999999999 is 11.

Note:

The range of n is [3, 10^18].

The string representing n is always valid and will not have leading zeros.

Read full article from Zhiwen's Technical Note: [Leetcode] 483. Smallest Good Base

[leetcode]710. Random Pick with Blacklist Problem Solving Report - Programmer Sought

Given N, and a blacklist array, each time a random number is selected from [0, N), the random number cannot exist in the blacklist.
Idea:
The number of random numbers in the final result is definitelyN-len(blacklist), so we have to choose the last random value, one is to map the value in the blacklist to the whitelist
The number of random numbers in the result isN-len(blacklist), then we must first take the array [0, N] beforeN-len(blacklist)Elements, but before thisN-len(blacklist)What if there are elements in the blacklist? We use a pointer to scan the array from back to front. If the latter value is not in the blacklist and the previous value is in the blacklist, replace the previous one with the following value.
After the above operation, we will range from randomNShrinked toN- len(blacklist)In the range of size, save it with a map, the key is1~N-len(blacklist), the value is the number in the white list

Read full article from [leetcode]710. Random Pick with Blacklist Problem Solving Report - Programmer Sought

微软笔试题3:Demo Day - martinmateng的博客 - CSDN博客

You work as an intern at a robotics startup. Today is your company's demo day. During the demo your company's robot will be put in a maze and without any information about the maze, it should be able to find a way out.

The maze consists of N * M grids. Each grid is either empty(represented by '.') or blocked by an obstacle(represented by 'b'). The robot will be release at the top left corner and the exit is at the bottom right corner.

Unfortunately some sensors on the robot go crazy just before the demo starts. As a result, the robot can only repeats two operations alternatively: keep moving to the right until it can't and keep moving to the bottom until it can't. At the beginning, the robot keeps moving to the right.

Read full article from 微软笔试题3:Demo Day - martinmateng的博客 - CSDN博客

城市里的间谍（A Spy in the Metro UVa 1025）最详细题解 - Monica - CSDN博客

这道题一看要问最少需要等待多少时间，自然会想到用dp，那么怎么来处理这个问题呢？我们可以用dp[i][j]来表示时刻i，你现在身处第j个站，最少还需要等待多长时间，我们所知的是，在T时刻，你人一定需要在第n个车站去完成间谍任务，所以dp[T][n]=0;可以由这个位置来进行反推，比如我在时刻T-1可以仍然位于车站n或者说如果有往左开的车我就有选择往左坐车的可能；
那么我们对于一个dp[i][j]进行状态转移，有三种转移方式：

Read full article from 城市里的间谍（A Spy in the Metro UVa 1025）最详细题解 - Monica - CSDN博客

774. Minimize Max Distance to Gas Station | NOEY

题意为：给定一些坐标点，表示坐标轴上已有的gas station，同时给定K，表示将要再插入K个gas station，插完以后使gas station之间的最大距离最小。

首先要从离散的点抽象出Interval的概念，其实已有的gas station就是一个个Interval。更为关键的是，这些interval可以视为永远不变，唯一变化的就是他们里边可能插入多少个K。基于这些interval，我们每次选择插入K的时候，只要选择插入当前所有interval中dist最大的那个，一直插入直到它比第二大小为止。插完所有的K，我们就得到了最后的解。

有了以上的理解，一个priority_queue就可以轻易的解决问题，用addK()去update当前interval里的value，然后把update完的top重新插回priority_queue。解法复杂度为O(klogn)

struct Interval {      int length, k, dist;        Interval(int i) : length(i), k(1), dist(i) {}        bool operator< (const Interval& other) const {          return dist < other.dist;      }        void addK() {          k++;          dist = length/k;      }  };    priority_queue<Interval> buildQueue(const vector<int>& v) {      vector<int> ret(v.size());      adjacent_difference(v.begin(), v.end(), ret.begin());      ret.erase(ret.begin());      return priority_queue<Interval>(ret.begin(), ret.end());  }    int minMaxDist(const vector<int>& v, int k) {      auto pq = buildQueue(v);      int maxDist = (v.back() - v.front())/(k+1); // just for optimization        while(k) {          Interval top = pq.top();          pq.pop();            while(pq.top() < top || top.dist > maxDist) {              top.addK();              k--;          }          pq.push(top);      }        return pq.top().dist;  }  

这题的难点在于正确理解题意并抽象化后找出算法，关键点在于：

由于最早给定的gas station，随着K的插入，变化的并不是interval，而是每个interval内部的dist。
只关注最后的最大距离，并不关心每个gas station插在了哪里

Read full article from 774. Minimize Max Distance to Gas Station | NOEY

leetcode按题目总结（十五）—— 贪心法

738. Monotone Increasing Digits

此题大意：求比一个自然数小且单调递增的数中最大的那个。例如对于1002来说，答案是999.对于14436来说答案是13399。
解题思路：寻找要求解答案的规律，发现，答案是原来的正数，从右向左看，第一个违反递减的数字可以减1，然后后面全部用9补齐。注意边界条件：(1)例如1455中后面两个5算不算违反递减规律，其实不用考虑。因为1455的答案是1399，真正违反规律的是4。(2)例如17756，那么违反规律的是第一个7，而前7变为6后，前面的第一个7也会违反规律。这里要做的就是只要出现违反，那么就立即就地减1，继续往前扫描。

452. Minimum Number of Arrows to Burst Balloons

此题大意：给出元素是区间的数组。把区间当作是气球，然后，区间点上射箭会让气球爆炸，现在求让全部气球爆炸的最小射箭次数。
解：先按照区间的右端点的由小到大排序。然后，从第一个右端点开始射箭，看一次能射爆几个气球（只要后面区间的左端点小于射箭点就可以被射爆）。对排序好的区间一遍扫描，每次更新最近射箭点，更新的方法就是跳过可以被射爆的区间，然后每次更新射箭点，需要的箭的数量就加1。

605. Can Place Flowers

此题大意：给出0和1的数组，1代表花，现在要向数组里面加入花，要求是花和花之间必须要有间隔。
解：贪心法。从头到尾扫描一遍数组，如果能放入花，那么就放。

621. Task Scheduler

题意：给出字母数组，字母代表一种任务，每种任务单位时间为1就能解决，在给出一个CPU间隔n，表示不同任务执行之间必须要有n个单位时间的间隔。例如：A任务执行后，必须要执行n-1个其他任务，然后才能重新执行A，如果没有那么多种类的任务，那只有空闲着单位时间了。要求解的是执行完所有任务花费的最少时间，任务执行顺序不受数组顺序限制。
解：此题是贪心的，通用解法就是，把任务装进优先级队列，然后每次往每一轮里面装。
由于每个任务都是间隔时间为n，那么假设n足够大（比任务种类多），那么会发现执行完所有任务其实取决于数量最多的任务，只要每一轮间隔优先执行数量最多的任务，期间安插一些其他任务，就是最优解的形式。当n比任务种类小的时候，贪心的放入，由于每一轮都将被放满，所以最后的长度就是等于任务的数量。
有了最优形式解，那么如果编写程序模拟这样往里面填，其实还是有操作难度的，所以最好还是进一步找答案的规律。当n很大的时候，最优解的形式，假设数量最多的任务数量是maxN，那么前maxN-1轮是不会一定有的，也就是前maxN-1轮的结果一定是(maxN-1)*(n+1)。再来考虑最后一轮，由于前面已经间隔插入了其他任务，所以留到最后一层的任务必然是等于数量最多的任务的（任务数量最多的可能不止一个）。只要加上即可，当然有一种例外，那就是发现最后一轮的任务超过了间隔数，或者算出来的任务比任务的个数（下限）还要少，那就就选任务的个数做结果返回。
PS.计算数量最多任务种类时候，要从map遍历，不能从原数组进行遍历。另外，在max中使用size()时候，要显式转化为int。

253. Meeting Rooms II

题意：给出开会开始结束时间区间为元素的数组，开会房间不能重叠，求开会所需要的最少房间数量。
解：此题使用贪心法+sweepline。原理是这样的：先按会议开始时间将所有区间进行排序，然后扫描每个区间，模拟分配房间的过程：首先，第一个区间占一个房间；对于第二个区间，如果第二个区间的开始时间大于第一个区间的结束时间，意味着第二个区间可以使用第一个区间分配的房间，反之，则要新开一个房间给第二个会议使用。对于第三个房间，如果前面分配的是一个房间，那么要看第二个会议结束时间和自己的开始时间；如果前面分配的是两个房间，那么，看自己起始时间是否比其中的一个结束时间要小，要是比两个都小，那么就只能新开第三个房间，否则就占用其中一个最早结束（所以是贪心法）的房间。

Read full article from leetcode按题目总结（十五）—— 贪心法

Bipartite Matching — amoshyc's CPsolution 1.0 documentation

Bipartite Matching

Read full article from Bipartite Matching — amoshyc's CPsolution 1.0 documentation

有向图的Dijakstra算法

算法原理就不多说了，就讲一讲实现代码采用邻接矩阵的数据结构，a(i,j)代表第i个节点到第j个节点的距离，不能到达为无穷，代码中设为10000000。另外数组len代表节点0到每个节点到最短路程，每循环一次就更新一次，vis数组表示访问情况，1表示访问过，0表示未访问。算法分为2步：（1）置len数组的值，直接将矩阵的第一行给len赋值即可 vis[0] = 1; （2）n-2次循环，每次加入一个节点（为什么不是n-1次呢？最后一次就一个点了，没什么好更新的）每次循环做的事： 1）找到下一个节点（next_node函数） 2）将找到节点的后继的len值进行更新

Read full article from 有向图的Dijakstra算法

Dynamic Programming | 酷狗的小窝

题外话：以前做题的时候老是碰到dp，不过队内有两位dp大师，我也不怎么输出dp题，没想到最近看书又碰到了dp。。。DP真是个需要智商的东西，很绝望啊。。。
UPD：自己之前以为值迭代和策略迭代相比会更加不稳定，但是后来问了一下why，被告诉其实两者都会产生震荡（不稳定）。

Policy Evaluation (Prediction)

police evalution(prediction problem)：We consider how to compute the state-value function $v_\pi$ for an arbitrary policy $\pi$ .

For our purposes, iterative solution methods are most suitable. The initial approximation, $v_0$ , is chosen arbitrarily (except that the terminal state, if any, must be given value 0), and each successive

Read full article from Dynamic Programming | 酷狗的小窝

Aizu - 0531 Paint Color（离散化+imos） - MeloのblogMeloのblog

为了宣传信息竞赛，要在长方形的三合板上喷油漆来制作招牌。三合板上不需要涂色的部分预先贴好了护板。被护板隔开的区域要涂上不同的颜色，比如上图就应该涂上5种颜色。

请编写一个程序计算涂色数量，输入数据中，保证看板不会被护板全部遮住，并且护板的边一定是水平或垂直的。

输入：
第一个数是宽w（1 ≤ w ≤ 1000000），第二个数是高h（1 ≤ h ≤ 1000000）。
第二行是护板的数量n（1 ≤ n ≤ 1000），接着n行是每个护板的左下角坐标 (x1 , y1 )和右上角坐标 (x2 , y2 )，用空格隔开： x1 , y1 , x2 , y2 (0 ≤ x1< x2 ≤ w, 0 ≤ y1 < y2 ≤ h 都是整数)
招牌的坐标系如下，左下角是 (0, 0) ，右上角是(w, h) ，测试集中的30%都满足w ≤ 100, h ≤ 100, n ≤ 100。

题解：

由于N远远小于w和h，所以我们对坐标进行离散化处理。离散化的话，没有什么好说的，需要注意的是这里的给出的是坐标左下角(x1,y1)和右上角(x2,y2)，即对应的方格区域应该是左闭右开的。

这里推荐一下imos算法，可以高效的计算多维空间里重复矩阵的面积。
这里给甩出一个链接，解释的挺好的。传送BiuBiu~

Read full article from Aizu - 0531 Paint Color（离散化+imos） - MeloのblogMeloのblog

USACO Training "Fence Loops" and the Workings of Floyd-Warshall's Algorithm

Floyd-Warshall's is an algorithm for computing all-pairs shortest paths, taking a DP approach: selct any "middle node" K and two nodes I and J in which you'd like to calculate the shortest path for, and then define the shortest path between I and J as whichever is smaller, the current shortest path betwen I and J OR the current shortest path between I and K plus the current shortest path between K and J.

At first this may seem obvious. Let's delve into why it works.
If we iterate K from the beginning (1 to N), at any given point in time whilst K is strictly less than a definite number X, we can say that the current shortest path between any two nodes I and J (regardless of whether I=X or J=X) that the current shortest path will NOT include X. The start and end nodes may include X, but no node in between will be X.

The DP approach - 3 parameter version

Of course we need to understand the three-parameter version!

Let D(i, j, k) represent the shortest path between i and j passing only through vertices between 1 and k, inclusive.

Assume that we have an oracle, knowing all the shortest paths.
Let us define D recursively:

Case 1: the shortest path between i and j does not pass through k
The answer will be D(i, j, k-1) since we can shorten the vertice set to between 1 and k-1, inclusive - the shortest path will not pass through k.

Case 2: the shortest path between i and j will pass through k.
Since using Floyd-Warshall's for SP implies there are no negative weight loops, it is optimal to pass through k exactly once. In other words, the shortest path between i and k as well as the shortest path between k and j will not pass through k. Let us cut the shortest path into two shortest paths that share a endpoint k. The length of the shortest path between i and j can be defined as the shortest path between i and k plus the shortest path between k and j. As such, the answer is D(i, k, k-1) + D(k, j, k-1).

Read full article from USACO Training "Fence Loops" and the Workings of Floyd-Warshall's Algorithm

[POJ] 3258. River Hopscotch — amoshyc's CPsolution 1.0 documentation

最小值最大化，標準的二分搜。二分搜教學請參這

判斷函式為:

bool C(int d) = 移除 M 個石頭後，可否能使相鄰石頭間的最短距離 >= d。 

該函式明顯有單調性，相鄰石頭間的最短距離 d 越小越容易達成。

這個函式可以用 Greedy 實作:

從左至右迭代「起點終點外」的每個石頭 rocks[i]， 若該石頭與前一個石頭的距離 < d，我們必需移除該石頭，才能使相鄰石頭的距離 >= d。 我們計算總共有多少個石頭要移除，該個數必需 <= M。 

實作時，有兩點要注意:

1. 若 rocks[i - 1], rocks[i], rocks[i + 1] 中，rocks[i] 被移除了，    那 rocks[i + 1] 的前一顆石頭是 rocks[i - 1]，而不是 rocks[i]。  2. 上述演算法沒考慮到終點與其前一個石頭。    設中間那 N 個石頭中，最後面的那個石頭為 rocks[-2]，若與終點那石頭的距離 < d，    則要移除的是 rocks[-2]，而不是終點的石頭，因為終點那個石頭是不能移除的。 

解的空間 [0, L]。下界即為在還沒移除任何石頭前，相鄰石頭間最短距離，但要計算出此值還得多打一些程式碼，而我們知道此值必大於 0，反正 C(0) = 1，所以可以直接將下界設為 0。上界發生在 M = N 時，相鄰石頭間最短距離即為 L。

C(d) 在解的空間 [0, L] 上的分佈為:

... 1 1 1 0 0 0 0 ... 

我們的目標是求左邊數來的最後一個 1 在哪？

另外，存在全部都為 1 的情況，我們可以預判掉這情形。

Read full article from [POJ] 3258. River Hopscotch — amoshyc's CPsolution 1.0 documentation

求数组中若干个元素之和等于给定值 - Virtual_Func的博客 - CSDN博客

这个问题的基础版本是：在数组中找两个元素，使其之和等于某个给定值。解法简单：将数组排序后，用两个指针分别位于数组首与数组尾，然后计算两个指针所指元素的和，若大于给定的元素，则尾部的指针向前移动；若小于给定的元素，则首部的指针向后移动。

但该问题提升后，成为了子集和问题，这是一个NP问题。在leetcode上还有升级版本的问题：求给定数组中四个元素之和等于给定元素的所有元素的子集。该问题，只能用暴力搜索，加点启发式的暴力搜索，在数量到两个数之后用上述的两数之和的方法。

看了一个作者的博客里面的例子，理解了在数组中找若干个元素之和等于给定值的方法。作者博客见：点击打开链接

举一个简单的适合自己理解的例子：

数组为： [1,2,3,4,5,6]。找给定值 12 。元素个数不限定，只要找到的元素之和=12即可

Read full article from 求数组中若干个元素之和等于给定值 - Virtual_Func的博客 - CSDN博客

[Poj]1236——最小点基 - PerSeAwe - 博客园

给定一张图，求最小点基

[分析题解]

我是来卖萌的。
最小点基其实很好求，就是极高强连通分量的数目。
但是要注意：只有一个强连通分量的情况呀！！！！！！！！！！！！！WA了无数次。

Read full article from [Poj]1236——最小点基 - PerSeAwe - 博客园

[Algorithm]Splay(1) - PerSeAwe - 博客园

今天好好研究了下传说中的Splay，如果真的比Treap强大的话那就将常用的平衡树换成Splay。

看了一上午别人的论文什么的，又在纸上反复的做了推理，下午尝试着将几个不同的Splay写法进行整理，晚上的时候用splay解决了几个问题。总体来讲，splay的确是一个非常强大的数据结构。当做平衡树只是它的用处之一，作为"伸展树"，它还能做些别人做不了的事情。

首先把今天的结论放在最前面：（只是个人的看法，因为对Splay了解不多，有错误请指出，万分感激）

1.splay的旋转操作与Treap是完全相同的(或者说二叉平衡树都是相同的)，唯一的不同就是Splay有其独特的操作"伸展"。这也就意味着Splay的代码量稍微大一些，速度也稍微慢一些，但是其功能比treap强大而且不需要维护额外的随机值。如果题目仅仅需要进行一下非常基本，如插入，查找，删除之类的基本平衡树操作。那treap应该还是首选，因为treap的维护实在是太简单了。不过一旦涉及到一下较为特殊的操作，那就只能用splay了。

2.另外一个经常与splay进行对比的就是线段树了，与treap和splay相反的是，~~好像在维护区间这方面，splay的要比线段树快上那么一些。~~splay的常数比线段树大太多了，能用线段树的时候不要用splay了，虽然splay的维护操作很神奇。但是同样的，因为线段树的结构和操作实在是太简单了，所以代码量要比splay少那么一个级别吧。

综上所述，当遇到基本的操作时，treap和线段树因为其优秀的代码复杂度应该是首选，但是一旦出现比较复杂或是比较奇特的维护时，就应该用splay在强行乱搞来解决。我一下在想起来去年zbwmqlw神牛给我们讲课讨论的时候，碰到一道数据结构题目的时候，最常用的一句话就是："呀，用splay硬搞一下就行了。"splay的功能真是强大的太过分了。不过，话说回来，上面说这么多，只是对我像我一样的普通人，对于像nehzilrz这样的神驼来说，完全不存在这样的问题。（强烈不建议初学者观看）

Read full article from [Algorithm]Splay(1) - PerSeAwe - 博客园

POJ 2976 Dropping tests 题解《挑战程序设计竞赛》-码农场

乍看以为贪心或dp能解决，后来发现贪心策略与当前的总体准确率有关，行不通，于是二分解决。

依然需要确定一个贪心策略，每次贪心地去掉那些对正确率贡献小的考试。如何确定某个考试[a_i, b_i]对总体准确率x的贡献呢？a_i / b_i肯定是不行的，不然例子里的[0,1]会首当其冲被刷掉。在当前准确率为x的情况下，这场考试"额外"对的题目数量是a_i – x * b_i，当然这个值有正有负，恰好可以作为"贡献度"的测量。于是利用这个给考试排个降序，后k个刷掉就行了。

之后就是二分搜索了，从0到1之间搜一遍，我下面的注释应该很详细，不啰嗦了。

题目的大数据测试用例可以在http://ai.stanford.edu/~chuongdo/acm/2005/找到，有完整的输入输出可供检验自己的程序。

这题吃了点亏，开始没看清楚题目，给主循环的跳出逻辑写为while (cin >> n >> k && (n && k))，后来看到上面的测试用例里有1 0才恍然大悟，整个人表情变成下面这样：

Read full article from POJ 2976 Dropping tests 题解《挑战程序设计竞赛》-码农场

[Algorithm]01分数规划——Update：2012年7月27日 - PerSeAwe - 博客园

根据楼教主的回忆录，他曾经在某一场比赛中秒掉了一道最优比率生成树问题，导致很多人跟风失败，最终悲剧。

自己总结了一些这种问题的解法，因为水平有限，如果有错误或是麻烦的地方，尽管喷，邮箱或是下方留言。

联系我的话perseawe@163.com，欢迎讨论，请在标题前注明[acm]或是[oi]，以免被垃圾邮件。

【知识储备】

只会用到简单的公式的整理与变形，还有求和sigma。

【定义】

01分数规划问题：所谓的01分数规划问题就是指这样的一类问题，给定两个数组，a[i]表示选取i的收益，b[i]表示选取i的代价。如果选取i，定义x[i]=1否则x[i]=0。每一个物品只有选或者不选两种方案，求一个选择方案使得R=sigma(a[i]*x[i])/sigma(b[i]*x[i])取得最值，即所有选择物品的总收益/总代价的值最大或是最小。

01分数规划问题主要包含一般的01分数规划、最优比率生成树问题、最优比率环问题、最大密度子图等。我们将会对这四个问题进行讨论。

永远要记得，我们的目标是使R取到最值。这句话我会在文中反复的强调。

【一些分析】

数学分析中一个很重要的方法就是分析目标式，这样我们来看目标式。

R=sigma(a[i]*x[i])/sigma(b[i]*x[i])

我们来分析一下他有什么性质可以给我们使用。

我们先定义一个函数F(L):=sigma(a[i]*x[i])-L*sigma(b[i]*x[i])，显然这只是对目标式的一个简单的变形。分离参数，得到F(L):=sigma((a[i]-L*b[i])*x[i])。这时我们就会发现，如果L已知的话，a[i]-L*b[i]就是已知的，当然x[i]是未知的。记d[i]=a[i]-L*b[i]，那么F(L):=sigma(d[i]*x[i])，多么简洁的式子。我们就对这些东西下手了。

再次提醒一下，我们的目标是使R取到最大值。

我们来分析一下这个函数，它与目标式的关系非常的密切，L就是目标式中的R，最大化R也就是最大化L。

F的值是由两个变量共同决定的，即方案X和参数L。对于一个确定的参数L来说，方案的不同会导致对应的F值的不同，那么这些东西对我们有什么用呢?

假设我们已知在存在一个方案X使得F(L)>0，这能够证明什么?

F(L)=sigma(a[i]*x[i])-L*sigma(b[i]*x[i])>0即sigma(a[i]*x[i])/sigma(b[i]*x[i])>L也就是说，如果一个方案使得F(L)>0说明了这组方案可以得到一个比现在的L更优的一个L，既然有一个更优的解，那么为什么不用呢？

显然，d数组是随着L的增大而单调减的。也就是说，存在一个临界的L使得不存在一种方案，能够使F(L)>0. 我们猜想，这个时候的L就是我们要求的最优解。之后更大的L值则会造成无论任何一种方案，都会使F(L)<0.类似于上面的那个变形，我们知道，F(L)<0是没有意义的，因为这时候的L是不能够被取得的。当F(L)=0使，对应方案的R值恰好等于此时的L值。

综上，函数F(L)有这样的一个性质：在前一段L中可以找到一组对应的X使得F(L)>0，这就提供了一种证据，即有一个比现在的L更优的解，而在某个L值使，存在一组解使得F(L)=0,且其他的F(L)<0，这时的L无法继续增大，即这个L就是我们期望的最优解，之后的L会使得无论哪种方案都会造成F(L)<0.而我们已经知道，F(L)<0是没有任何意义的，因为此时的L值根本取不到。

Read full article from [Algorithm]01分数规划——Update：2012年7月27日 - PerSeAwe - 博客园

LintCode/LevelReviewPage.md at master · awangdev/LintCode · GitHub

第一步：生题型, 理解题意需要时间：从字面和画图而言, 就是定住房子一个个过，房子左右的distance需要足够达到heater. 目标是招尽可能小的radius, 所以house和heater紧贴着是必要的. 在for loop里面定下house，把heater当作一个区间移动, 达到的第一个合适区间，这就是当下最小的理想radius，取这个值跟既定radius作比较。比较之后，继续移动house，再试着移动heater区间去match。

Read full article from LintCode/LevelReviewPage.md at master · awangdev/LintCode · GitHub

LintCode-2/KnowledgeHash.md at master · pxu/LintCode-2 · GitHub

Arrays.asList([1,2,3]);
Partial sort: Arrays.sort(arr, 0, arr.length())
Copy: Arrays.copyOf(arr, arr.length())
Arrays.toString(int[] arr) => string representation: "[1,2,3,4]"
见到数组要想到: 是否可以/需要先排序?

Honorable Problems

Median of two sorted arrays: find kth element of two sorted array where k = n/2. Recursive findKth(A[],indexA, B[], indexB, k)

Read full article from LintCode-2/KnowledgeHash.md at master · pxu/LintCode-2 · GitHub

野球拳刷刷刷: 938. Range Sum of BST

Given the root node of a binary search tree, return the sum of values of all nodes with value between L and R (inclusive).

The binary search tree is guaranteed to have unique values.

Example 1:

Input: root = [10,5,15,3,7,null,18], L = 7, R = 15
Output: 32

Example 2:

Input: root = [10,5,15,3,7,13,18,1,null,6], L = 6, R = 10
Output: 23

Note:

The number of nodes in the tree is at most 10000.
The final answer is guaranteed to be less than 2^31.

Solution:

Generally for a tree problem, we can think of either using a recursive traversal, or bfs.

We choose to use recursion here.

Knowing the property of BST and given a root node.
1. The exit condition is that when root is null, we return 0.
2. If root.val < L, we know all nodes in its left subtree are smaller than L. So we do not need to count that. We only consider its right subtree.
3. If root.val > R, we know all nodes in its right subtree are larger than R. So we do not need to count that. We only consider its left subtree.
4. If root.val is between L and R, we need to count this value. And we need to count both its left and right subtrees recursively.

The time complexity is O(n), assume n is the total number of nodes, since each node will be traversed only once.

Read full article from 野球拳刷刷刷: 938. Range Sum of BST

Copy Books (LintCode) – I fly

Given an array A of integer with size of n( means n books and number of pages of each book) and k people to copy the book. You must distribute the continuous id books to one people to copy. (You can give book A[1],A[2] to one people, but you cannot give book A[1], A[3] to one people, because book A[1] and A[3] is not continuous.) Each person have can copy one page per minute. Return the number of smallest minutes need to copy all the books.

Example

Given array A = [3,2,4], k = 2.

Return 5( First person spends 5 minutes to copy book 1 and book 2 and second person spends 4 minutes to copy book 3. )

Challenge

Could you do this in O(n*k) time ?

Solution:

dynamical programming; dp[ia][ib] represents the smallest minutes needed for ia people to copy ib books. Initially dp[ia][ib] = min(dp[ia][ib], max(dp[ia-1][ic], sum(pages[ic], …, pages[ib])) ) for ia <= ic <= ib-1.

Read full article from Copy Books (LintCode) – I fly

leetcode 410. Split Array Largest Sum - 淡然坊 - CSDN博客

Given an array which consists of non-negative integers and an integer m, you can split the array into m non-empty continuous subarrays. Write an algorithm to minimize the largest sum among these m subarrays.

Note:
If n is the length of array, assume the following constraints are satisfied:

1 ≤ n ≤ 1000
1 ≤ m ≤ min(50, n)

Examples:

Input:  nums = [7,2,5,10,8]  m = 2    Output:  18    Explanation:  There are four ways to split nums into two subarrays.  The best way is to split it into [7,2,5] and [10,8],  where the largest sum among the two subarrays is only 18.

这道题很明显又是一道递归 with memo 或者 DP 的题。

这道题又一次印证了 map 没有 int[][]数组快的结论！因为我使用 HashMap<Integer,HashMap<Integer,Integer> 华丽丽地TLE了，但是改成 int[][] 就AC了。

我的思路是：int[row][column] ：row存储了当前数组的开始索引 left，column 存储了当前的 m。因此 memo[i][j] 就是：从 i 开始的数组要被划分为 j 个子数组时的结果值。

Read full article from leetcode 410. Split Array Largest Sum - 淡然坊 - CSDN博客

巧用二分的一些题目 | gjxhlan's blog

有些时候，某些题目从不同的角度思考都可以得到正确的结果，但是其时间和空间复杂度却大不一样。

例如一道题可以通过递推去正向求解，但是时间复杂度可能会较高；如果这道题的某些值在一个特定的范围之内，而且我们又可以给出一个采纳或放弃这个解的线性时间的判别条件的话，不妨换个思路直接对解的值采用二分查找的策略，从而可以快速排除一些不需要枚举的状态。

Read full article from 巧用二分的一些题目 | gjxhlan's blog

LeetCode – 875. Koko Eating Bananas

Koko loves to eat bananas. There are N piles of bananas, the i-th pile has piles[i] bananas. The guards have gone and will come back in H hours.

Koko can decide her bananas-per-hour eating speed of K. Each hour, she chooses some pile of bananas, and eats K bananas from that pile. If the pile has less than K bananas, she eats all of them instead, and won't eat any more bananas during this hour.

Koko likes to eat slowly, but still wants to finish eating all the bananas before the guards come back.

Return the minimum integer K such that she can eat all the bananas within H hours.

Example 1:
Input: piles = [3,6,7,11], H = 8
Output: 4

Example 2:
Input: piles = [30,11,23,4,20], H = 5
Output: 30

Example 3:
Input: piles = [30,11,23,4,20], H = 6
Output: 23

Note:
1 <= piles.length <= 10^4
piles.length <= H <= 10^9
1 <= piles[i] <= 10^9

题意：

Koko是一只喜欢吃香蕉的大猩猩。现在有N堆香蕉，第i堆有piles[i]个香蕉。管理员现在离开了，会在H（H >= piles.length）小时之后回来。Koko可以决定吃香蕉的速度，即每小时吃K个香蕉。每个小时它选择一堆香蕉，并吃掉其中的K个香蕉。如果某一堆里剩下的香蕉少于K个，它会吃完这一堆剩下的所有香蕉，但它这一小时内不会吃更多的其他堆的香蕉。

假设Koko喜欢慢慢地吃香蕉，但又想在管理员回来之前吃完所有香蕉。请问Koko每小时至少需要吃多少香蕉才能在管理员回来之前吃完呢？

例如，有4堆香蕉，每堆香蕉的数目为[3, 6, 7,11]，如果管理员在8小时后回来，那么Koko每小时吃4个香蕉。

分析：

由于Koko在一个小时内把一堆香蕉吃完之后不会再去吃其他的香蕉，那么它一小时能吃掉的香蕉的数目不会超过最多的一堆香蕉的数目（记为M）。同时，它每小时最少会吃1个香蕉，所以最终Koko决定的吃香蕉的速度K应该是在1到M之间。

我们可以应用二分查找的思路，先选取1和M的平均数，(1+M)/2，看以这个速度Koko能否在H小时内吃掉所有香蕉。如果不能在H小时内吃掉所有的香蕉，那么它需要尝试更快的速度，也就是K应该在(1+M)/2到M之间，下一次我们尝试(1+M)/2和M的平均值。

如果Koko以(1+M)/2的速度能够在H小时内吃完所有的香蕉，那么我们来判断这是不是最慢的速度。可以尝试一下稍微慢一点的速度，(1+m)/2 - 1。如果Koko以这个速度不能在H小时之内吃完所有香蕉，那么(1+M)/2就是最慢的可以在H小时吃完香蕉的速度。如果以(1+m)/2 - 1的速度也能在H小时内吃完香蕉，那么接下来Koko尝试更慢的速度，1和(1+M)/2的平均值。

以此类推，我们按照二分查找的思路总能找到让Koko在H小时内吃完所有香蕉的最慢速度K。以下是这种思路的参考代码：

Read full article from LeetCode – 875. Koko Eating Bananas

算法笔记: 领扣#852 山脉数组的峰顶索引 - Memogrocery

如果数组A满足一下性质，我们就说它是一个山脉数组：

A.length >= 3
存在某个i，满足0 < i < A.length - 1 且 A[0] < A[1] < … A[i-1] < A[i] > A[i+1] > … > A[A.length - 1]

给定一个符合上面条件的数组，返回任何一个满足A[0] < A[1] < … A[i-1] < A[i] > A[i+1] > … > A[A.length - 1] 的i。

例子 1:

输入: [0,1,0]
输出: 1

例子 2:

输入: [0,2,1,0]
输出: 1

注意:

3 <= A.length <= 10000
0 <= A[i] <= 10^6
A 是一个山，满足上面所述的条件。

解法

解法 1

基本思路

符合条件的A[i]也是数组的最大值。可以找到最大值，再返回对应索引。

Read full article from 算法笔记: 领扣#852 山脉数组的峰顶索引 - Memogrocery

Algorithm Notes: Leetcode#852 Peak Index in a Mountain Array - Memogrocery

Let's call an array A a mountain if the following properties hold:

A.length >= 3
There exists some 0 < i < A.length - 1 such that A[0] < A[1] < … A[i-1] < A[i] > A[i+1] > … > A[A.length - 1]

Given an array that is definitely a mountain, return any i such that A[0] < A[1] < … A[i-1] < A[i] > A[i+1] > … > A[A.length - 1].

Example 1:

Input: [0,1,0]
Output: 1

Example 2:

Input: [0,2,1,0]
Output: 1

Note:

3 <= A.length <= 10000
0 <= A[i] <= 10^6
A is a mountain, as defined above.

Solution

Solution 1

Basic idea

The peak is the maximum value of the array. So, we can find the maximum first and then return its index.

Read full article from Algorithm Notes: Leetcode#852 Peak Index in a Mountain Array - Memogrocery

Data-Structures-and-Algorithms/problem-solving.md at master · cormacpayne/Data-Structures-and-Algorithms · GitHub

Problem solving is the ability to reach a solution for a defined problem by, potentially, using mathematical or systematic operations (algorithms).

Not only an important skill to have in competitive programming, but also in the real world where problems arise everyday at jobs; being able to attack a problem quickly and efficiently is a skill desired by all companies.

The following steps should serve as a general timeline for solving a problem:

Define the problem
Come up with all possible cases
Implement the solution

In competitive programming, having a strong problem solving ability is just as important as having a strong programming background.

Even if you've been programming your entire life, if you don't know how to attack a problem, coding isn't going to help you

Being able to identify edge cases and worst case scenarios for a problem are also important in competitive programming.

If you just rely on the input/output given to you in the problem statement, your code might not handle ALL cases (which is necessary for an accepted solution)

How do we break a problem down?

If we are given a problem to work on, what are the steps we must take in order to get an accepted solution? Here are the basic steps to follow:

Read the problem
Manually solve the problem
Take note of the constraints
Implement a solution
Optimize your solution
Handle edge cases

Step 1: Read the problem

Yes, this is an obvious step in solving a problem, but it is a step that can cause turmoil in the long run.

Even if the background story behind a problem is silly or boring, you must read the problem all the way through to make sure that you didn't miss any important details that are critical to solving the problem.

If you spent 45 minutes solving a problem with some algorithm, but realize later that the problem asked you to do something completely different, you don't get that time back.

Step 2: Manually solve the problem

You will be given input and output for a problem, so you should take a few minutes to grab pen and paper and figure out how they got their answer.

Writing out how they derived their output from the given input will give you a general algorithm for your solution; start by taking the steps you took and turning those into lines of code.

This will also stop you from jumping into the problem straight away and saying "how did they get that answer?" when debugging your code down the road.

Step 3: Take note of the constraints

Step 3: Take note of the constraints These are the size of the input variables that are given to you by the problem (e.g., X <= 1,000,000).

Knowing the sizes of the variables that are given can eliminate possible algorithms that you chose to use.

In order to know what is a good algorithm for a given constraint, you will need to understand Big O Notation.

For most problems, you can assume that your program will be able to process ~100,000,000 operations per second.

Additionally, most problems will have a time limit between 0.5 and 3 seconds, so the number of "steps" your algorithm takes cannot exceed ~300M.

Step 4: Implement a solution

Use the algorithm/pseudocode you came up with when manually solving the test cases to implement a basic solution.

Remember: this algorithm you came up with does not need to be optimal; as long as you can come up with something that will get you the correct answer, you can worry about optimizing afterwards.

Step 5: Optimize your solution

Making sure that your code can handle the constraints of the input is key to getting AC on a problem; if you don't take this step into account, you won't be able to get anything but Time Limit Exceeded on a problem.

Thinks of ways to eliminate unnecessary loops and operations within your code in order to trim off some time so that you can efficiently solve the problem.

Step 6: Handle edge cases

Being able to come up with tricky test cases that the judge possibly came up with can save you trouble in the long run (as well as avoid a Wrong Answer). Here are some things to keep in mind:

Did you handle the minimum and maximum cases?
Did you handle potential negative numbers?
Is the problem using longs instead of ints?
Is there potential for your code to throw a runtime error? (e.g., array out of bounds, stack overflow, etc.)

Read full article from Data-Structures-and-Algorithms/problem-solving.md at master · cormacpayne/Data-Structures-and-Algorithms · GitHub

47. Permutations II - LeetCode Solution

给定一个可包含重复数字的序列，返回所有不重复的全排列。

示例:

输入: [1,1,2] 输出: [   [1,1,2],   [1,2,1],   [2,1,1] ]

题目难度：中等。
英文网址：47. Permutations II 。
中文网址：47. 全排列 II 。

思路分析

求解关键：找到重复的原因，对树进行剪枝。 1、首先将数组排序，这一步很关键，是后面剪枝的基础； 2、只处理第 1 次遇到的那个数，举个具体的例子画个图。重点理解：（1） i > 0 ，（2） nums[i] == nums[i - 1] ，（3）之前那个数还没有使用，即 marked[i-1] = false。

参考解答

Read full article from 47. Permutations II - LeetCode Solution

[LeetCode] 47. Permutations II

June 13, 2017

Given a collection of numbers that might contain duplicates, return all possible unique permutations.

For example,
[1,1,2] have the following unique permutations:

[    [1,1,2],    [1,2,1],    [2,1,1]  ]

Thought process:
If the collection contains duplicate, I cannot only use the original method I used in "46. Permutations", because it will produce duplicate permutations. For example, if the collection is [0, 1, 1], the result will contain two [0, 1, 1]s.
The idea is to maintain a rule about which one of the duplicate numbers can appear in the permutations. To do this, I can sort the array and make sure that only the number with a smaller index can appear in a duplicate permutation.
For example, as I add numbers to the permutation list, if I found that I'm adding the second 1 while the first 1 is not in the list, that means the first 1 has already been used to make the exact same permutation. So I continue to the next iteration.

Read full article from [LeetCode] 47. Permutations II

LeetCode 63. Unique Paths II – 小虾大蟹

A robot is located at the top-left corner of a m x n grid (marked 'Start' in the diagram below).

The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked 'Finish' in the diagram below).

Now consider if some obstacles are added to the grids. How many unique paths would there be?

An obstacle and empty space is marked as 1 and 0 respectively in the grid.

Read full article from LeetCode 63. Unique Paths II – 小虾大蟹

Interview Questions 5: Successor with delete – Sigmainfy – Learn, Share, Inspire

We try to relate the solution to the problem of Successor with delete to Union-Find algorithm here and discuss how to relate them and make it more straightforward in the first place from two different angles: 1) Analyze it in a recursive way and give an informal proof; 2) Draw it out by hand.

Problem Definition:

Note this problem is from the "Algorithm Part 1″ on Coursera. The detailed description is as follows:

Successor with delete. Given a set of N integers S={0,1,…,N−1} and a sequence of requests of the following form:

Remove x from S
Find the successor of x: the smallest y in S such that y >= x.

Design a data type so that all operations (except construction) should take logarithmic time or better.

Analysis:

Well, I think the Union-Find approach to solve this problem should already be known to quite a number of people, the challenge would be how to relate the solution to Union-Find approach at the first place without any hints about Union-Find. I first directly give the approach and then discuss how to relate it to Union-Find from two different angles to help consider the approach in Union-Find way more naturally.

Read full article from Interview Questions 5: Successor with delete – Sigmainfy – Learn, Share, Inspire

LeetCode 解题报告(684,685,721)-并查集介绍及应用 | 吴良超的学习笔记

有两个经典的算法可用来解决最小生成树问题： Kruskal 算法和 Prim 算法。其中 Kruskal 算法中便应用了并查集这种数据结构，该算法的步骤如下

新建图G，G中拥有原图中相同的节点，但没有边
将原图中所有的边按权值从小到大排序
从权值最小的边开始，如果这条边连接的两个节点于图G中不在同一个连通分量中，则添加这条边到图G中
重复3，直至图G中所有的节点都在同一个连通分量中

该算法的动图显示如下(摘自维基百科)

Kruskal 算法很简单，实际上 Kruskal 算法是一种贪心算法，并且已被证明最终能够收敛到最好结果。而在实现 Kruskal 算法时，则需要用到并查集这种数据结构来减小算法的时间复杂度。下面将详细介绍这种数据结构。

在介绍并查集前，顺便介绍一下 Prime 算法，Prime 算法也是一种贪心算法，而且也被证明了最终能够得到最好的结果，只是两者的侧重点不同， Kruskal 算法维护的是一个边的集合，而 Prime 算法则同时维护了一个边的集合和一个点的集合，Prim 算法的过程如下

输入：一个加权连通图，其中顶点集合为V，边集合为E；
初始化：Vnew = {x}，其中x为集合V中的任一节点（起始点），Enew = {}；
重复下列操作，直到Vnew = V：
1. 在集合E中选取权值最小的边（u, v），其中u为集合 Vnew 中的元素，而v则是V中没有加入Vnew的顶点（如果存在有多条满足前述条件即具有相同权值的边，则可任意选取其中之一）；
2. 将v加入集合Vnew中，将（u, v）加入集合Enew中；
输出：使用集合Vnew和Enew来描述所得到的最小生成树。

Read full article from LeetCode 解题报告(684,685,721)-并查集介绍及应用 | 吴良超的学习笔记

Leetcode 685. Redundant Connection II | Coding.Siskon

685. Redundant Connection II

Difficulty: Hard
Total Accepted: 11.2K
Total Submissions: 39.7K

In this problem, a rooted tree is a directed graph such that, there is exactly one node (the root) for which all other nodes are descendants of this node, plus every node has exactly one parent, except for the root node which has no parents.

The given input is a directed graph that started as a rooted tree with N nodes (with distinct values 1, 2, …, N), with one additional directed edge added. The added edge has two different vertices chosen from 1 to N, and was not an edge that already existed.

The resulting graph is given as a 2D-array of edges. Each element of edges is a pair [u, v] that represents a directed edge connecting nodes u and v, where u is a parent of child v.

Return an edge that can be removed so that the resulting graph is a rooted tree of N nodes. If there are multiple answers, return the answer that occurs last in the given 2D-array.