Problem: Design a location based service like Uber?
Scenario: Rider should be able to request a service, an available driver should be matched We should be able to keep track of driver's location->drivers report location every 4 seconds(based on this article)
Services:
Geo Service: stores drivers location information, handle queriers like "give me drives within 2kms of this location". Since drivers report location information every 4 seconds, it will be write heavy. Assume 0.2M active drivers per day, QPS will be 200000 / 4 = 50K, Peak QPS might be 50 * 3 = 150K, writes need to be efficient
Dispatch Service: To handle riders request, matches the driver and keep track of the trip information
Scenario: Design a typeahead service which support following functionalities:
Be able to collect the data and keep track of the sentences users has typed recently
For every word user typed, return the top k hot sentences start with this prefix
Services:
Data Collection Service: collect the data form users and do preprocessing to get frequency of each service
Query Service: Store hot words for every prefix and return the top k hot words for given query request. Let's assume there are 1B active user and each of them do 10 searches daily and average sentence length is 10(every time user type a char we may need to search for the new prefix), QPS = 1B * 10 * 10 / 86400 = 1M, peak QPS = 2 * 1M = 2M. It is a pretty ready heavy service.
Storage:
Where do we get the data?
We can get the data by logging, each time user typed a full sentence and send a request, we will log the data in the disk, like <user_id, sentence, timestamp>. The log information can be stored in distributed file system like GFS, since usually the new generated data every day is really large. Let's use the estimated number above, assume each log entry will be 20 bytes the data generated every day will be 1B * 10 * 20 Bytes = 200G. And if we want to the data for past two weeks, it will be 200G * 14 = 2.8 T.
After we have the log data, we can perform the map reduce to get the <sentence, frequency> pair we want, since it is a simple key-value pair, we can store it in distributed nosql database like BigTable.
It will still be pretty expensive to store this large amount of data, actually we don't want the service to be that accurate, we just want to know the general trend. This is why we need to bring up probabilistic logging, every time we can generate a random number between [0, 1000) and if the number equals to 0, we log this data, so generally we are logging with 1 / 1000 probability. And the data generate every day can be reduced from 200G to 200M.
We are given N different types of stickers. Each sticker has a lowercase English word on it.
You would like to spell out the given target string by cutting individual letters from your collection of stickers and rearranging them.
You can use each sticker more than once if you want, and you have infinite quantities of each sticker.
What is the minimum number of stickers that you need to spell out the target? If the task is impossible, return -1.
Example 1:
Input:
["with", "example", "science"], "thehat"
Output:
3
Explanation:
We can use 2 "with" stickers, and 1 "example" sticker. After cutting and rearrange the letters of those stickers, we can form the target "thehat". Also, this is the minimum number of stickers necessary to form the target string.
Example 2:
Input:
["notice", "possible"], "basicbasic"
Output:
-1
Explanation:
We can't form the target "basicbasic" from cutting letters from the given stickers.
Given N, and a blacklist array, each time a random number is selected from [0, N), the random number cannot exist in the blacklist. Idea: The number of random numbers in the final result is definitelyN-len(blacklist), so we have to choose the last random value, one is to map the value in the blacklist to the whitelist The number of random numbers in the result isN-len(blacklist), then we must first take the array [0, N] beforeN-len(blacklist)Elements, but before thisN-len(blacklist)What if there are elements in the blacklist? We use a pointer to scan the array from back to front. If the latter value is not in the blacklist and the previous value is in the blacklist, replace the previous one with the following value. After the above operation, we will range from randomNShrinked toN- len(blacklist)In the range of size, save it with a map, the key is1~N-len(blacklist), the value is the number in the white list
You work as an intern at a robotics startup. Today is your company's demo day. During the demo your company's robot will be put in a maze and without any information about the maze, it should be able to find a way out.
The maze consists of N * M grids. Each grid is either empty(represented by '.') or blocked by an obstacle(represented by 'b'). The robot will be release at the top left corner and the exit is at the bottom right corner.
Unfortunately some sensors on the robot go crazy just before the demo starts. As a result, the robot can only repeats two operations alternatively: keep moving to the right until it can't and keep moving to the bottom until it can't. At the beginning, the robot keeps moving to the right.
structInterval{intlength,k,dist;Interval(inti):length(i),k(1),dist(i){}booloperator<(constInterval&other)const{returndist<other.dist;}voidaddK(){k++;dist=length/k;}};priority_queue<Interval>buildQueue(constvector<int>&v){vector<int>ret(v.size());adjacent_difference(v.begin(),v.end(),ret.begin());ret.erase(ret.begin());returnpriority_queue<Interval>(ret.begin(),ret.end());}intminMaxDist(constvector<int>&v,intk){autopq=buildQueue(v);intmaxDist=(v.back()-v.front())/(k+1);// just for optimizationwhile(k){Intervaltop=pq.top();pq.pop();while(pq.top()<top||top.dist>maxDist){top.addK();k--;}pq.push(top);}returnpq.top().dist;}
police evalution(prediction problem):We consider how to compute the state-value function vπ for an arbitrary policy π.
For our purposes, iterative solution methods are most suitable. The initial approximation, v0, is chosen arbitrarily (except that the terminal state, if any, must be given value 0), and each successive
Floyd-Warshall's is an algorithm for computing all-pairs shortest paths, taking a DP approach: selct any "middle node" K and two nodes I and J in which you'd like to calculate the shortest path for, and then define the shortest path between I and J as whichever is smaller, the current shortest path betwen I and J OR the current shortest path between I and K plus the current shortest path between K and J.
At first this may seem obvious. Let's delve into why it works. If we iterate K from the beginning (1 to N), at any given point in time whilst K is strictly less than a definite number X, we can say that the current shortest path between any two nodes I and J (regardless of whether I=X or J=X) that the current shortest path will NOT include X. The start and end nodes may include X, but no node in between will be X.
The DP approach - 3 parameter version
Of course we need to understand the three-parameter version!
Let D(i, j, k) represent the shortest path between i and j passing only through vertices between 1 and k, inclusive.
Assume that we have an oracle, knowing all the shortest paths. Let us define D recursively:
Case 1:the shortest path between i and j does not pass through k The answer will be D(i, j, k-1) since we can shorten the vertice set to between 1 and k-1, inclusive - the shortest path will not pass through k.
Case 2: the shortest path between i and j will pass through k. Since using Floyd-Warshall's for SP implies there are no negative weight loops, it is optimal to pass through k exactly once. In other words, the shortest path between i and k as well as the shortest path between k and j will not pass through k. Let us cut the shortest path into two shortest paths that share a endpoint k. The length of the shortest path between i and j can be defined as the shortest path between i and k plus the shortest path between k and j. As such, the answer is D(i, k, k-1) + D(k, j, k-1).
Given the root node of a binary search tree, return the sum of values of all nodes with value between L and R (inclusive).
The binary search tree is guaranteed to have unique values.
Example 1:
Input: root = [10,5,15,3,7,null,18], L = 7, R = 15 Output: 32
Example 2:
Input: root = [10,5,15,3,7,13,18,1,null,6], L = 6, R = 10 Output: 23
Note:
The number of nodes in the tree is at most 10000. The final answer is guaranteed to be less than 2^31.
Solution:
Generally for a tree problem, we can think of either using a recursive traversal, or bfs.
We choose to use recursion here.
Knowing the property of BST and given a root node. 1. The exit condition is that when root is null, we return 0. 2. If root.val < L, we know all nodes in its left subtree are smaller than L. So we do not need to count that. We only consider its right subtree. 3. If root.val > R, we know all nodes in its right subtree are larger than R. So we do not need to count that. We only consider its left subtree. 4. If root.val is between L and R, we need to count this value. And we need to count both its left and right subtrees recursively.
The time complexity is O(n), assume n is the total number of nodes, since each node will be traversed only once.
Given an array A of integer with size of n( means n books and number of pages of each book) and k people to copy the book. You must distribute the continuous id books to one people to copy. (You can give book A[1],A[2] to one people, but you cannot give book A[1], A[3] to one people, because book A[1] and A[3] is not continuous.) Each person have can copy one page per minute. Return the number of smallest minutes need to copy all the books.
Example
Given array A = [3,2,4], k = 2.
Return 5( First person spends 5 minutes to copy book 1 and book 2 and second person spends 4 minutes to copy book 3. )
Challenge
Could you do this in O(n*k) time ?
Solution:
dynamical programming; dp[ia][ib] represents the smallest minutes needed for ia people to copy ib books. Initially dp[ia][ib] = min(dp[ia][ib], max(dp[ia-1][ic], sum(pages[ic], …, pages[ib])) ) for ia <= ic <= ib-1.
Given an array which consists of non-negative integers and an integer m, you can split the array into m non-empty continuous subarrays. Write an algorithm to minimize the largest sum among these m subarrays.
Note: If n is the length of array, assume the following constraints are satisfied:
1 ≤ n ≤ 1000
1 ≤ m ≤ min(50, n)
Examples:
Input: nums = [7,2,5,10,8] m = 2 Output: 18 Explanation: There are four ways to split nums into two subarrays. The best way is to split it into [7,2,5] and [10,8], where the largest sum among the two subarrays is only 18.
Koko loves to eat bananas. There are N piles of bananas, the i-th pile has piles[i] bananas. The guards have gone and will come back in H hours.
Koko can decide her bananas-per-hour eating speed of K. Each hour, she chooses some pile of bananas, and eats K bananas from that pile. If the pile has less than K bananas, she eats all of them instead, and won't eat any more bananas during this hour.
Koko likes to eat slowly, but still wants to finish eating all the bananas before the guards come back.
Return the minimum integer K such that she can eat all the bananas within H hours.
Example 1: Input: piles = [3,6,7,11], H = 8 Output: 4
Example 2: Input: piles = [30,11,23,4,20], H = 5 Output: 30
Example 3: Input: piles = [30,11,23,4,20], H = 6 Output: 23
Problem solving is the ability to reach a solution for a defined problem by, potentially, using mathematical or systematic operations (algorithms).
Not only an important skill to have in competitive programming, but also in the real world where problems arise everyday at jobs; being able to attack a problem quickly and efficiently is a skill desired by all companies.
The following steps should serve as a general timeline for solving a problem:
Define the problem
Come up with all possible cases
Implement the solution
In competitive programming, having a strong problem solving ability is just as important as having a strong programming background.
Even if you've been programming your entire life, if you don't know how to attack a problem, coding isn't going to help you
Being able to identify edge cases and worst case scenarios for a problem are also important in competitive programming.
If you just rely on the input/output given to you in the problem statement, your code might not handle ALL cases (which is necessary for an accepted solution)
How do we break a problem down?
If we are given a problem to work on, what are the steps we must take in order to get an accepted solution? Here are the basic steps to follow:
Read the problem
Manually solve the problem
Take note of the constraints
Implement a solution
Optimize your solution
Handle edge cases
Step 1: Read the problem
Yes, this is an obvious step in solving a problem, but it is a step that can cause turmoil in the long run.
Even if the background story behind a problem is silly or boring, you must read the problem all the way through to make sure that you didn't miss any important details that are critical to solving the problem.
If you spent 45 minutes solving a problem with some algorithm, but realize later that the problem asked you to do something completely different, you don't get that time back.
Step 2: Manually solve the problem
You will be given input and output for a problem, so you should take a few minutes to grab pen and paper and figure out how they got their answer.
Writing out how they derived their output from the given input will give you a general algorithm for your solution; start by taking the steps you took and turning those into lines of code.
This will also stop you from jumping into the problem straight away and saying "how did they get that answer?" when debugging your code down the road.
Step 3: Take note of the constraints
Step 3: Take note of the constraints These are the size of the input variables that are given to you by the problem (e.g., X <= 1,000,000).
Knowing the sizes of the variables that are given can eliminate possible algorithms that you chose to use.
In order to know what is a good algorithm for a given constraint, you will need to understand Big O Notation.
For most problems, you can assume that your program will be able to process ~100,000,000 operations per second.
Additionally, most problems will have a time limit between 0.5 and 3 seconds, so the number of "steps" your algorithm takes cannot exceed ~300M.
Step 4: Implement a solution
Use the algorithm/pseudocode you came up with when manually solving the test cases to implement a basic solution.
Remember: this algorithm you came up with does not need to be optimal; as long as you can come up with something that will get you the correct answer, you can worry about optimizing afterwards.
Step 5: Optimize your solution
Making sure that your code can handle the constraints of the input is key to getting AC on a problem; if you don't take this step into account, you won't be able to get anything but Time Limit Exceeded on a problem.
Thinks of ways to eliminate unnecessary loops and operations within your code in order to trim off some time so that you can efficiently solve the problem.
Step 6: Handle edge cases
Being able to come up with tricky test cases that the judge possibly came up with can save you trouble in the long run (as well as avoid a Wrong Answer). Here are some things to keep in mind:
Did you handle the minimum and maximum cases?
Did you handle potential negative numbers?
Is the problem using longs instead of ints?
Is there potential for your code to throw a runtime error? (e.g., array out of bounds, stack overflow, etc.)
Given a collection of numbers that might contain duplicates, return all possible unique permutations.
For example, [1,1,2] have the following unique permutations:
[ [1,1,2], [1,2,1], [2,1,1] ]
Thought process: If the collection contains duplicate, I cannot only use the original method I used in "46. Permutations", because it will produce duplicate permutations. For example, if the collection is [0, 1, 1], the result will contain two [0, 1, 1]s. The idea is to maintain a rule about which one of the duplicate numbers can appear in the permutations. To do this, I can sort the array and make sure that only the number with a smaller index can appear in a duplicate permutation. For example, as I add numbers to the permutation list, if I found that I'm adding the second 1 while the first 1 is not in the list, that means the first 1 has already been used to make the exact same permutation. So I continue to the next iteration.
A robot is located at the top-left corner of a m x n grid (marked 'Start' in the diagram below).
The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked 'Finish' in the diagram below).
Now consider if some obstacles are added to the grids. How many unique paths would there be?
An obstacle and empty space is marked as 1 and 0 respectively in the grid.
We try to relate the solution to the problem of Successor with delete to Union-Find algorithm here and discuss how to relate them and make it more straightforward in the first place from two different angles: 1) Analyze it in a recursive way and give an informal proof; 2) Draw it out by hand.
Problem Definition:
Note this problem is from the "Algorithm Part 1″ on Coursera. The detailed description is as follows:
Successor with delete. Given a set of N integers S={0,1,…,N−1} and a sequence of requests of the following form:
Remove x from S
Find the successor of x: the smallest y in S such that y >= x.
Design a data type so that all operations (except construction) should take logarithmic time or better.
Analysis:
Well, I think the Union-Find approach to solve this problem should already be known to quite a number of people, the challenge would be how to relate the solution to Union-Find approach at the first place without any hints about Union-Find. I first directly give the approach and then discuss how to relate it to Union-Find from two different angles to help consider the approach in Union-Find way more naturally.
In this problem, a rooted tree is a directed graph such that, there is exactly one node (the root) for which all other nodes are descendants of this node, plus every node has exactly one parent, except for the root node which has no parents.
The given input is a directed graph that started as a rooted tree with N nodes (with distinct values 1, 2, …, N), with one additional directed edge added. The added edge has two different vertices chosen from 1 to N, and was not an edge that already existed.
The resulting graph is given as a 2D-array of edges. Each element of edges is a pair [u, v] that represents a directed edge connecting nodes u and v, where u is a parent of child v.
Return an edge that can be removed so that the resulting graph is a rooted tree of N nodes. If there are multiple answers, return the answer that occurs last in the given 2D-array.
Example 1:
Input: [[1,2], [1,3], [2,3]] Output: [2,3] Explanation: The given directed graph will be like this: 1 / \ v v 2-->3
Example 2:
Input: [[1,2], [2,3], [3,4], [4,1], [1,5]] Output: [4,1] Explanation: The given directed graph will be like this: 5 <- 1 -> 2 ^ | | v 4 <- 3
Note:
The size of the input 2D-array will be between 3 and 1000.
Every integer represented in the 2D-array will be between 1 and N, where N is the size of the input array.