HTTP vs AJP in Apache and Tomcat | The Coding Craftsman




HTTP

  • Simple
  • Already provided by all servers
  • Easily configured in Apache
  • The most likely target of a security breach

AJP

  • Well supported
  • A more compact protocol
  • Easily configured in Apache and Tomcat

AJP looks slightly ahead for two reasons:

  • You're not opening up HTTP on Tomcat (in fact you should close it for security)
  • As it's a more compact protocol, there's less traffic between the front server and the back server

Except… if the front server is on the same network segment, or even on the same machine, then there's no advantage from the compactness of the protocol unless you have a huge number of tiny requests to serve.

Similarly, if the whole server is not visible to the whole internet because it's behind a load balancer, then it doesn't matter whether HTTP is open on Tomcat, as nobody could get there to use it.
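If you do pick HTTP, the Apache side needs little more than mod_proxy. A minimal sketch (host, port, and path are illustrative, and mod_proxy/mod_proxy_http must be enabled):

```apache
<VirtualHost *:80>
    ServerName www.example.com
    ProxyPreserveHost On
    # Forward /app to the Tomcat HTTP connector on the back server
    ProxyPass        "/app" "http://127.0.0.1:8080/app"
    ProxyPassReverse "/app" "http://127.0.0.1:8080/app"
</VirtualHost>
```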

Simple is better, so I declare HTTP the winner.


Read full article from HTTP vs AJP in Apache and Tomcat | The Coding Craftsman


Start an Apache Web Server in Mac OS X Mavericks & Mountain Lion




Setting Up and Starting the Apache Web Server in OS X

Versions of OS X prior to Mountain Lion could simply turn on "Web Sharing", but from 10.8 (Mountain Lion) and 10.9 (Mavericks) onward you'll need to do the following to use a local web server:

  • Launch Terminal, located in /Applications/Utilities/
  • Type the following command, replacing USERNAME with the user account short name:
  • nano /etc/apache2/users/USERNAME.conf

  • Enter the admin password when requested, then paste the following into the nano text editor:

Read full article from Start an Apache Web Server in Mac OS X Mavericks & Mountain Lion


How Apache Hadoop YARN HA Works | Cloudera Engineering Blog




YARN, the next-generation compute and resource management framework in Apache Hadoop, until recently had a single point of failure: the ResourceManager, which coordinates work in a YARN cluster. With planned (upgrades) or unplanned (node crashes) events, this central service, and YARN itself, could become unavailable.

This post details Cloudera's recent work in the Hadoop community (YARN-149) to make the ResourceManager (and thus YARN) highly available. We'll explain the high-level design for how the ResourceManager's active state is preserved (the state store) and how ResourceManagers fail over to achieve high availability (HA), then outline how to deploy an HA cluster and review some important configuration options.

Background

CDH 5 (and Apache Hadoop 2.x) ship YARN as the vehicle to manage cluster resources and share those resources among compute frameworks like MapReduce, Impala, and Apache Spark. In previous posts, you have learned the high-level architecture of YARN and how to migrate from MR1 to MR2/YARN as a user or cluster admin.

To briefly re-cap, YARN has a master/worker architecture (see below); the master (the ResourceManager) manages the resources on the workers and schedules work in the cluster. Furthermore, the ResourceManager handles all client interactions.


Read full article from How Apache Hadoop YARN HA Works | Cloudera Engineering Blog


Musings on Big Data, Part 1: What Counts as Big Data? (repost) | Hello, Data!




Google Trends shows that "big data" rose to fame as a buzzword starting in 2011. China was not slow to follow: 2012 was soon dubbed China's year of big data. Two books deserve much of the credit. First came Tu Zipei's 《大数据》 (Big Data), which walked readers through the American government's faith in data, its policies, and its practice, giving China's political and intellectual elite a baptism in data thinking; the series of trailblazing big-data initiatives Vice Premier Wang Yang launched before leaving Guangdong can be credited to this book. At the end of the year came Viktor Mayer-Schönberger's 《大数据时代》 (Big Data: A Revolution), the foundational work that systematically laid out big-data thinking. If the former focused on enlightenment (what big data can do), the latter focused on answers (how big data should be done).

China does things on a grand scale. While American readers were still waiting for the English original, the Chinese edition of Mayer-Schönberger's book was already on domestic shelves by the end of 2012; his attendance at the Cloud World Congress was seized on for timely promotion. Amid the year-end clamor came big-data industrial parks, big-data days, big-data committees, big-data degree programs, big-data labs, and big-data summit after big-data summit. The Internet of Things and big data, cloud computing and big data, huddled together for warmth to offset decision-makers' fatigue with an endless stream of new concepts. In truth, big data was hot mostly at the grassroots. Compared with IoT and cloud computing, it received rather less attention and substantive support at the highest levels of the state; even the U.S. government moved first, releasing the Big Data Research and Development Initiative in March 2012, with six agencies investing over two hundred million dollars in related research. Two hundred million dollars is a drizzle to MIIT and MOST, so why the inaction?

From my exchanges with MIIT officials and think tanks, I sense that decision-makers still harbor many doubts: what exactly is new about big data? How does it differ from the databases, data warehouses, data mining, and business intelligence that came before? How big is the market? What should China focus on developing? Where are its competitive strengths and weaknesses? Time and again, officials hold forth on stage about "big data strategy" (by my reckoning, the content mostly comes from McKinsey's 2011 "Big Data: The Next Frontier for Innovation, Competition, and Productivity" and Davos's 2012 "Big Data, Big Impact: New Possibilities for International Development"), then buttonhole any expert offstage to humbly ask what is actually new about big data.

At the risk of exposing my own shallowness, I intend to write down my understanding of big data: a deep, non-mainstream, even contrarian reading.

Start with the basic concepts, of course. Big data has four V's: Volume, Velocity, Variety, and Value. The first three V's are often misattributed to IBM; in fact an analyst at META Group (now part of Gartner), Doug Laney, proposed them back in 2001 (he wrote a blog post specifically to call out those claiming the credit). IBM is not without contribution, of course: it dropped Value and added Veracity, forming a school of its own. Other companies can only quietly curse the dictionary for having too few words that start with V.

Today, let's start with Volume.

How big is big data? A self-fulfilling prophecy of the industry giants?

IDC forecasts the volume of information created and replicated each year: 1.8 ZB in 2011 (for how big a ZB is, look it up), 2.8 ZB in 2012, and, doubling every two years (Moore's law being the ancestor of all such rules), 40 ZB by 2020. How were these figures computed? IDC won't say. The study was sponsored by EMC; EMC smiles and says nothing. If forecasts of data at rest help sell storage, the volume of data in motion is tied to network demand; hence a similar Cisco forecast: the total volume of data moved will reach 1.3 ZB in 2016. Yet none of these numbers is as evocative as Google's Eric Schmidt: from the dawn of human civilization to 2003, humanity produced 5 EB in total (heaven knows how he computed that), while by 2010 humanity produces 5 EB every two days (possibly extrapolated from IDC's figures).

Is this a self-fulfilling prophecy of the industry giants? I think so. Chris Anderson ran a Wired feature in 2008 called "The Petabyte Age"; evidently the digital age's prophet was not bold enough.

The growth in total data volume is mainly attributable to unstructured data. Broadly defined, unstructured data also includes semi-structured and multi-structured data; it is now generally held to account for over 85% of the total and to grow much faster than structured data (some say 10-50x faster). Low-information-density unstructured data is one of big data's major challenges, to be elaborated later under the Variety topic. A challenge is also an opportunity: the industry giants have coined many new concepts to greet unstructured data, NoSQL databases being the most dazzling. Database veteran Mike Stonebraker remains unreconciled to this and spares no effort pushing the purer-blooded NewSQL databases; Sybase CTO Irfan Khan even says big data (the new concept) is one big lie, claiming their data-warehouse tools have long been able to analyze big data, unstructured data included.

Forecasts of aggregate volume no doubt boost the confidence of investors in storage and networking companies, but they mean little to everyone else, who care more about the data situation of individual industries, enterprises, or even individuals.

McKinsey's definition of big data starts precisely from the large volume of individual datasets: big data refers to datasets so large that traditional database software tools can no longer capture, store, manage, and analyze them. Traditional databases generally work effectively at sizes of roughly 10-100 TB, so 10-100 TB is often taken as the threshold for big data. Not coincidentally, IDC also set the threshold at 100 TB in its definition (it also gave quantitative criteria for velocity and variety; more on that later). The method is not necessarily scientific: storing unstructured data has nothing to do with databases in the first place, and the volume a traditional file system can handle is often limited by metadata rather than raw data size, so its ceiling is higher than a database's. Still, having a simple, clear number to guide an enterprise's judgment about big data is no bad thing.


Read full article from Musings on Big Data, Part 1: What Counts as Big Data? (repost) | Hello, Data!


Redis's Author: A Deep Analysis of Redis Persistence | Hello, Data!




Redis is a distributed NoSQL database system for "key-value" data, with the advantages of high performance, persistent storage, and suitability for high-concurrency scenarios. Although a relative latecomer, it has developed very rapidly.

Recently, the author of Redis wrote on his blog that of all the discussion of Redis he sees, persistence is the most misunderstood topic, so he wrote a long article treating Redis persistence systematically. The article covers three aspects: how Redis persistence works, how reliable it is, and how it compares with other kinds of databases. The content follows:

I. How does Redis persistence work?

What is persistence? Simply put, it means putting data on a device that does not lose it when the power is cut, i.e., on disk as we usually understand it. First let's look at what a database actually does during a write operation. There are five main steps:

  • The client sends the write command to the server (data is in the client's memory).
  • The database server receives the write request (data is in the server's memory).
  • The server calls the write system call to push the data toward disk (data is in the kernel's buffer).
  • The operating system transfers the buffered data to the disk controller (data is in the disk cache).
  • The disk controller writes the data to the physical medium (data truly reaches the disk).

Failure analysis

A write goes through roughly the five steps above; now let's look at failures at various levels against those five steps:

  • When the database process fails but the system kernel is still intact, the data is safe as soon as step 3 has completed: the operating system will carry out the remaining steps and ensure that the data eventually reaches the disk.
  • When the system loses power, all the caches mentioned in the five steps are lost, and both the database and the operating system stop working. So only data that has completed step 5 is guaranteed not to be lost after a power failure.

Having understood the five steps, we will probably want answers to these questions:

  • How often does the database call write to move data into the kernel buffer?
  • How often does the kernel flush the buffered data to the disk controller?
  • When does the disk controller write its cache to the physical medium?

For the first question, the database layer usually has full control. For the second, the operating system has its default policy, but we can also force it to move data from the kernel to the disk controller with the fsync family of calls in the POSIX API. For the third, the database seemingly has no reach, but in practice the disk cache is usually disabled, or enabled only as a read cache, so writes are not cached and go straight to disk. The recommended practice is to enable the write cache only when your disk device has a battery backup.
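Steps 3 and 4 map directly onto write and fsync; a Python sketch (the file name is illustrative):

```python
import os

def durable_append(path: str, data: bytes) -> None:
    # Step 3: write() moves the data into the kernel buffer.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, data)
        # fsync forces steps 4-5: kernel buffer -> disk controller.
        # Whether it truly reaches the medium still depends on the
        # drive's write cache, as discussed above.
        os.fsync(fd)
    finally:
        os.close(fd)

durable_append("example.log", b"hello\n")
```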

Data corruption

Data corruption means data that cannot be recovered. Everything above was about how to make sure data really gets written to disk, but being written to disk does not mean the data cannot be corrupted. For example, a single write request may involve two separate write operations; if an accident happens in between, one write may complete safely while the other has not yet happened. If the database's data-file structure is organized poorly, this can leave the data completely unrecoverable.

There are generally three strategies for organizing data so that data files never become unrecoverable:

  • The crudest approach makes no recoverability guarantee in the data layout itself; instead, a synchronized backup is configured, and when a data file is corrupted the data is restored from the backup. This is effectively MongoDB with Replica Sets configured but without the operation log enabled.
  • Another approach adds an operation log on top of that: record the action of every operation as it happens, so the data can be recovered by replaying the log. Since the operation log is written append-only, a situation where the log itself is unrecoverable cannot arise. This resembles MongoDB with the operation log enabled.
  • A safer approach never modifies old data and completes every write as an append, so the data is itself a log and can never become unrecoverable. CouchDB is an excellent example of this practice.

1. Redis's first persistence strategy: RDB snapshots

Redis supports persistence that saves a snapshot of the current data into a data file. But how does a continuously written database generate a snapshot? Redis borrows the copy-on-write mechanism of the fork call: to generate the snapshot, the current process forks a child process, which then loops over all the data and writes it out as an RDB file.

We can configure when RDB snapshots are generated with Redis's save directive: for example, snapshot when there have been 100 writes within 10 minutes, or when there have been 1000 writes within an hour, and several rules may be in force together. The rules are defined in the Redis configuration file, and they can also be set while Redis is running via the CONFIG SET command, without restarting Redis.

An RDB file cannot end up corrupted, because its writes happen in a fresh process: when generating a new RDB file, the child process forked by Redis first writes the data to a temporary file, then renames the temporary file to the RDB file with the atomic rename system call, so whenever a failure strikes, the RDB file is always intact.
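The write-temp-then-atomic-rename trick in a Python sketch (file names are illustrative):

```python
import os

def save_snapshot(path: str, data: bytes) -> None:
    tmp = path + ".tmp"               # hypothetical temp-file name
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # temp file must be on disk first
    # Atomic rename: readers see either the old file or the new one,
    # never a half-written snapshot.
    os.replace(tmp, path)

save_snapshot("dump.rdb", b"snapshot-v1")
save_snapshot("dump.rdb", b"snapshot-v2")
```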

At the same time, the RDB file is one link in the internal implementation of Redis master-slave replication.

However, RDB has an evident shortcoming: once the database has a problem, the data saved in our RDB file is not fresh; everything written between the last RDB snapshot and the Redis shutdown is lost. For some workloads this is tolerable, and we recommend they persist with RDB, since the cost of enabling RDB is low. But for applications with extreme data-safety requirements that cannot tolerate data loss, RDB is of no help, so Redis introduced another important persistence mechanism: AOF logging.

2. Redis's second persistence strategy: the AOF log

AOF's full name is Append Only File and, as the name suggests, it is a log file written by appending. Unlike the logs of most databases, the AOF file is recognizable plain text, its content being one standard Redis command after another. For example, in the following experiment with Redis 2.6, we enable AOF via a startup command-line parameter:


Read full article from Redis's Author: A Deep Analysis of Redis Persistence | Hello, Data!


Base64: Principles and Implementation - 可笑痴狂 - 博客园




Base64 encoding explained
  Base64 encoding turns 3 bytes of 8 bits (3*8=24) into 4 units of 6 bits (4*6=24), then pads each 6-bit unit with two leading zero bits to form a full 8-bit byte. If fewer than 3 input bytes remain at the end, they are padded with zeros and the output uses '=' for the missing input, so the encoded text may end with one or two '=' characters.

  To guarantee that every output byte is a printable character, Base64 defines a translation table for uniform conversion. The table's size is 2^6 = 64, which is where the name Base64 comes from.

Transcoding example:
  3*8 = 4*6
  In memory, one character occupies 8 bits.
  Before conversion: s 1 3
  First convert to ASCII codes: 115 49 51
  In binary: 01110011 00110001 00110011
  Concatenated into one 24-bit group: 011100110011000100110011
  Then split into four 6-bit groups: 011100 110011 000100 110011
  A computer stores data 8 bits at a time, so each 6-bit group gets two high-order zero bits:
  00011100 00110011 00000100 00110011
  In decimal: 28 51 4 51
  Looking these up in the table gives: c z E z
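The walkthrough can be verified with Python's standard base64 module:

```python
import base64

# The worked example above: the bytes "s", "1", "3" encode to "czEz".
encoded = base64.b64encode(b"s13").decode("ascii")
print(encoded)  # -> czEz

# With only 2 input bytes, one '=' pads the missing input.
padded = base64.b64encode(b"s1").decode("ascii")
print(padded)   # -> czE=
```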

Read full article from Base64: Principles and Implementation - 可笑痴狂 - 博客园


Isotonic Regression | Algorithms Notes




Given a sequence A of n integers, design an algorithm to find a non-decreasing sequence B of n integers such that \sum_{i=1}^n |A_i - B_i| is minimized.

Solution:

This problem can be solved by minimum cost flow. Here we use dynamic programming.

Let F(k, z) be the optimal objective value for the first k elements with B[k] = z. Then we have the recurrence F(k, z) = min_{x ≤ z} F(k-1, x) + |z - A[k]|. Thus the problem becomes how to efficiently find, for each fixed k, the z minimizing F(k, z). It can be shown that F(k, z) is a piecewise linear convex function in z and that the minimizing value must be in A. Thus the dynamic programming can be sped up using slope optimization, finding the solution in O(n lg n) time.
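One common O(n lg n) realization of that slope optimization keeps a max-heap of candidate B values (a sketch of the heap formulation; the minimum-cost-flow route is not shown):

```python
import heapq

def isotonic_l1(A):
    """Minimum sum of |A_i - B_i| over non-decreasing integer B."""
    heap = []   # max-heap of current candidate B values (stored negated)
    cost = 0
    for a in A:
        heapq.heappush(heap, -a)
        if -heap[0] > a:              # best value so far exceeds a:
            cost += -heap[0] - a      # pay to pull it down to a
            heapq.heapreplace(heap, -a)
    return cost
```

For A = [1, 3, 2], the optimal B is e.g. [1, 2, 2] with cost 1; the heap pays exactly that when 2 arrives after 3.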


Read full article from Isotonic Regression | Algorithms Notes


Class Design - S.O.L.I.D | miafish




  • SRP, the Single Responsibility Principle: a class should have one, and only one, reason to change.
  • OCP, the Open Closed Principle: you should be able to extend a class's behavior without modifying it.
  • LSP, the Liskov Substitution Principle: derived classes must be substitutable for their base classes.
  • ISP, the Interface Segregation Principle: make fine-grained interfaces that are client-specific.
  • DIP, the Dependency Inversion Principle: depend on abstractions, not on concretions.

Read full article from Class Design – S.O.L.I.D | miafish


Open/Close Principle | miafish




The Open Close Principle states that the design and writing of the code should be done in a way that new functionality should be added with minimum changes in the existing code. The design should be done in a way to allow the adding of new functionality as new classes, keeping as much as possible existing code unchanged.

Software entities like classes, modules and functions should be open for extension but closed for modifications.

Based only on the schema above, one can deduce that any class directly using another class would actually violate the Open/Closed Principle. And that is right, strictly speaking. I found it quite interesting to find the limits, the moment when you draw the line and decide that it is more difficult to respect OCP than modify existing code, or the architectural cost does not justify the cost of changing existing code.


Read full article from Open/Close Principle | miafish


Liskov Substitution Principle | miafish




What is the Liskov Substitution Principle?

Child classes should never break the parent class's type definitions. Put another way, subtypes must be substitutable for their base types. As simple as that: a subclass should override the parent class's methods in a way that does not break functionality from a client's point of view.

Rectangle and Square Examples:


Read full article from Liskov Substitution Principle | miafish


The Interface Segregation Principle | miafish




The interface-segregation principle (ISP) states that no client should be forced to depend on methods it does not use.

Work example:

There are a worker and a super worker; they both eat and work, so the classes will be defined in the following way.


Read full article from The Interface Segregation Principle | miafish


Redis and its C# implementation | miafish




When we use Redis?

Redis is a fantastic choice if you want a highly scalable data store shared by multiple processes, multiple applications, or multiple servers. As just an inter-process communication mechanism it is tough to beat (think of it as a shared dictionary). The fact that you can communicate cross-platform, cross-server, or cross-application just as easily makes it a pretty great choice for many use cases. Its speed also makes it great as a caching layer.

How does Redis work?

Redis is a key-value cache and store. Why is it highly scalable and shareable by multiple processes, multiple applications, or multiple servers? Let us look at the design of Redis.

Redis Cluster data sharding

hash slot: there are 16384 hash slots in Redis Cluster.

E.g.: every node in a Redis Cluster is responsible for a subset of the hash slots, so you may have a cluster with 3 nodes, where:

  • Node A contains hash slots from 0 to 5500.
  • Node B contains hash slots from 5501 to 11000.
  • Node C contains hash slots from 11001 to 16383.
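The key-to-slot mapping can be reproduced in Python: Redis Cluster hashes the key with CRC16 (the XModem variant, which `binascii.crc_hqx` computes) modulo 16384. This sketch ignores hash tags ("{...}"):

```python
import binascii

def key_slot(key: bytes) -> int:
    # CRC16 (poly 0x1021, initial value 0) of the key, modulo 16384.
    return binascii.crc_hqx(key, 0) % 16384

# Every key lands deterministically in one of the 16384 slots,
# and each slot belongs to exactly one node (A, B, or C above).
slot = key_slot(b"foo")
```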

Redis Cluster master-slave model

When the cluster is created (or at a later time) we add a slave node to every master, so that the final cluster is composed of masters A, B, C and slaves A1, B1, C1; the system is able to continue if node B fails (B1 will become master, holding a copy of B's data).

Internal work

A,B,C nodes. B with foo inside

  1. Client => A : Cluster Hints
  2. A => Client: a map of hash slots with nodes
  3. Client => B: get foo
  4. B => Client: bar

Fault tolerance

A guesses B is failing, as the latest PING request timed out. A will not take any action without any other hint.

C sends a PONG to A with a gossip section containing information about B: C also thinks B is failing.

At this point A marks B as failed and notifies all the other nodes in the cluster, which will mark the node as failing.

If B ever comes back, the first time it PINGs any node of the cluster it will be told to shut down ASAP, as an intermittently failing node is not good for clients.

Note: the only way to rejoin a Redis cluster after a massive crash is to run redis-trib by hand.

Redis Partitioning

  • Redis Cluster (internal system)
  • Twemproxy (mid layer)
  • Clients supporting consistent hashing (client side)

How Redis expires keys

Redis keys are expired in two ways: a passive way, and an active way.

A key is passively expired simply when some client tries to access it and the key is found to be timed out.

Of course this is not enough, as there are expired keys that will never be accessed again. These keys should be expired anyway, so periodically Redis tests a few keys at random among the keys with an expire set. All the keys that are already expired are deleted from the keyspace.

Specifically this is what Redis does 10 times per second:

  1. Test 100 random keys from the set of keys with an associated expire.
  2. Delete all the keys found expired.
  3. If more than 25 keys were expired, start again from step 1.

This is a trivial probabilistic algorithm: basically, the assumption is that our sample is representative of the whole key space, and we continue to expire until the percentage of keys that are likely to be expired is under 25%.

This means that at any given moment the maximum amount of expired keys still using memory is at most equal to the maximum amount of write operations per second divided by 4.
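A simplified sketch of that sampling loop (the real implementation also caps the time spent per pass; all names here are illustrative):

```python
import random

def expire_cycle(expires, now, sample_size=100):
    """One pass of the active-expiry loop: sample, delete, repeat
    while more than 25% of the sample turned out to be expired."""
    while expires:
        sample = random.sample(list(expires), min(sample_size, len(expires)))
        expired = [k for k in sample if expires[k] <= now]
        for k in expired:
            del expires[k]
        if len(expired) * 4 <= len(sample):   # 25% or less: stop
            break

expires = {"k%d" % i: 0.0 for i in range(50)}  # all already expired
expire_cycle(expires, now=1.0)                 # empties the dict
```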


Read full article from Redis and its C# implementation | miafish


The Dependency Inversion Principle | miafish




What is the Dependency Inversion Principle?

A. High-level modules should not depend on low-level modules. Both should depend on abstractions.
B. Abstractions should not depend upon details. Details should depend upon abstractions.


Read full article from The Dependency Inversion Principle | miafish


Dependency Injection Pattern | miafish




James Shore has said that "Dependency Injection" is a 25-dollar term for a 5-cent concept; it really is a simple concept.

What is for?

To support one of the S.O.L.I.D principles, the Dependency Inversion Principle, via inversion of control.

What is it?

Dependency injection is basically providing the objects that an object needs (its dependencies) instead of having it construct them itself.


Read full article from Dependency Injection Pattern | miafish


Big table | miafish




A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

Map: Big Table is a collection of (key, value)pairs where the key identifies a row and the value is the set of columns.

Persistent: the data is stored persistently on disk.

Distributed: Big table's data is distributed among many independent machines. The table is broken up among rows, with groups of adjacent rows managed by a server. A row itself is never distributed.

Sparse: the table is sparse, meaning that different rows in a table may use different columns, with many of the columns empty for a particular row.

multidimensional: A table is indexed by rows. Each row contains one or more named column families. Column families are defined when the table is first created. Within a column family, one may have one or more named columns. All data within a column family is usually of the same type. The implementation of BigTable usually compresses all the columns within a column family together. Columns within a column family can be created on the fly. Rows, column families and columns provide a three-level naming hierarchy in identifying data. For example:

"edu.rutgers.cs" : {          // row
  "users" : {                 // column family
    "watrous" : "Donald",     // column
    "hedrick" : "Charles",    // column
    "pxk"     : "Paul"        // column
  },
  "sysinfo" : {               // another column family
    "" : "SunOS 5.8"          // column (null name)
  }
}

To get data from BigTable, you need to provide a fully-qualified name in the form column-family:column. For example, users:pxk or sysinfo:. The latter shows a null column name.

time-based: time is another dimension in Big Table data. Every column family may keep multiple versions of column family data. If an application does not specify a timestamp, it will retrieve the latest version of the column family. Alternatively, it can specify a timestamp and get the latest version that is earlier than or equal to that timestamp.

Data Model

The map is indexed by a row key, column key and a time stamp.


Read full article from Big table | miafish


Lamport & Vector Clocks | miafish




They are two algorithms used to determine the order of events in a distributed computer system.

Why do we need it?

Now, imagine process 1 sends a message to the disk asking for access to write, and then sends a message to process 2 asking it to read. Process 2 receives the message, and as a result sends its own message to the disk. Now, due to some timing delay, the disk receives both messages at the same time: how does it determine which message happened-before the other?

Or picture two synchronized servers that write timestamps to the same system: if one server falls behind by even a few milliseconds, it would quickly be impossible to know the actual order of events.

Lamport Timestamps

It follows some simple rules:

  • A process increments its counter before each event in that process;
  • When a process sends a message, it includes its counter value with the message;
  • On receiving a message, the receiver process sets its counter to be greater than the maximum of its own value and the received value before it considers the message received.


If A -> B, then L(A) < L(B), so we know which event happened first.

Vector clock

It follows rules:

  • Initially all clocks are zero.
  • Each time a process experiences an internal event, it increments its own logical clock in the vector by one.
  • Each time a process prepares to send a message, it sends its entire vector along with the message being sent.
  • Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element).
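The rules above can be sketched with clocks as lists of ints (process count and event sequence are made up for illustration):

```python
def tick(clock, i):
    # Internal event: process i increments its own entry.
    c = list(clock)
    c[i] += 1
    return c

def merge(local, received, i):
    # On receive: element-wise max of the two vectors,
    # then increment the receiver's own entry.
    c = [max(a, b) for a, b in zip(local, received)]
    c[i] += 1
    return c

# Three processes; P0 performs two events, then sends its clock to P2.
p0 = tick(tick([0, 0, 0], 0), 0)     # p0 becomes [2, 0, 0]
p2 = merge([0, 0, 1], p0, 2)         # p2 becomes [2, 0, 2]
```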

Read full article from Lamport & Vector Clocks | miafish


Dynamo DB | miafish




Techniques Used in Dynamo Systems

  • Problem: dataset partitioning. Technique: consistent hashing. Advantage: incremental, possibly linear scalability in proportion to the number of collaborating nodes.
  • Problem: highly available writes. Technique: vector clocks or dotted-version-vector sets, with reconciliation during reads. Advantage: version size is decoupled from update rates.
  • Problem: handling temporary failures. Technique: sloppy quorum and hinted handoff. Advantage: provides high availability and a durability guarantee when some of the replicas are not available.
  • Problem: recovering from permanent failures. Technique: anti-entropy using Merkle trees. Advantage: can be used to identify differences between replica owners and synchronize divergent replicas proactively.
  • Problem: membership and failure detection. Technique: gossip-based membership protocol and failure detection. Advantage: avoids having a centralized registry for storing membership and node-liveness information, preserving symmetry.

Consistent hashing: See memcached, they used the same hashing algorithm.

Vector Clock: see here for more information.

Sloppy Quorum and Hinted handoff:

In this method, if during writes we find that a destination node is down, we store a "hint" of the updated value on one of the live nodes. When the down node comes back up, the hints are pushed to it, thereby making the data consistent.

anti-entropy using Merkle tree:

Each node maintains a separate Merkle tree for each key range (the set of keys covered by a virtual node) it hosts. This allows nodes to compare whether the keys within a key range are up-to-date. In this scheme, two nodes exchange the root of the Merkle tree corresponding to the key ranges that they host in common. Subsequently, using the tree traversal scheme described above the nodes determine if they have any differences and perform the appropriate synchronization action. The disadvantage with this scheme is that many key ranges change when a node joins or leaves the system thereby requiring the tree(s) to be recalculated
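A toy illustration of the Merkle-tree comparison (SHA-256 stands in for whatever hash function a given Dynamo implementation uses; the leaf encoding is made up):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash over a list of leaf values; the last hash is
    duplicated when a level has an odd number of nodes."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Two replicas agree on a key range iff their roots match;
# only the differing subtrees need to be traversed further.
a = merkle_root([b"k1=v1", b"k2=v2", b"k3=v3"])
b = merkle_root([b"k1=v1", b"k2=v2", b"k3=v3x"])
```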

gossip-based membership protocol:


Read full article from Dynamo DB | miafish


System Scalability notes | miafish




1. Server scale: every server contains exactly the same codebase and does not store any user-related data, like sessions or profile pictures, on local disk or memory.

How to do it: this user-related data needs to be stored in a centralized data store which is accessible to all your application servers. It can be an external database or an external persistent cache, like Redis.

2. Database scale: if there are not many joins in your queries, try NoSQL.

How to do it: more database servers, sharding, or a change to NoSQL.

3. Cache: do not touch the database if you can avoid it.

A cache is a simple key-value store and it should reside as a buffering layer between your application and your data storage. Whenever your application has to read data it should first try to retrieve the data from your cache. Only if it's not in the cache should it then try to get the data from the main data source.
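The read path just described, as a minimal cache-aside sketch (plain dicts stand in for Redis and the database; keys and values are made up):

```python
cache = {}                                 # stand-in for Redis/memcached
db = {"user:1": {"name": "alice"}}         # stand-in for the main data store

def get(key):
    if key in cache:                       # 1. try the cache first
        return cache[key]
    value = db.get(key)                    # 2. fall back to the data store
    if value is not None:
        cache[key] = value                 # 3. populate the cache for next time
    return value
```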

4. Cache objects instead of database queries.

When your class has finished the "assembling" of the data array, directly store the data array, or better yet the complete instance of the class, in the cache! This allows you to easily get rid of the object whenever something changes, and makes the overall operation of your code faster and more logical.

5. Async: implement it as much as you can.

Hadoop would be a good design.

  • Can't use just one database. Use many databases, partitioned horizontally and vertically.
  • Because of partitioning, forget about referential integrity or cross-domain JOINs.
  • Forget about 100% data integrity.
  • At large scale, cost is a problem: hardware, databases, licenses, storage, power.
  • Once you're large, spammers and data-scrapers come a-knocking.
  • Cache!
  • Use asynchronous flows.
  • Reporting and analytics are challenging; consider them up-front when designing the system.
  • Expect the system to fail.
  • Don't underestimate your growth trajectory.

Read full article from System Scalability notes | miafish


Find number of bits to be flipped to get maximum 1’s in array | miafish




We have a bit array like below

{1 0 1 0 0 1 0 1}  

Number of bits in above array is 8

If we take the range [1,5], the bits in that range are [0 1 0 0 1].
Flipping the range turns them into [1 0 1 1 0].
So the array after flipping [1,5] is [1 1 0 1 1 0 0 1], with 5 ones.

If we take the range [1,6], the bits in that range are [0 1 0 0 1 0].
Flipping the range turns them into [1 0 1 1 0 1].
So the array after flipping [1,6] is [1 1 0 1 1 0 1 1], with 6 ones.

So the answer is the range [1,6]: after flipping it we get 6 ones in the array.
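The excerpt does not show the algorithm, but the usual approach is a Kadane-style scan: flipping a 0 gains a 1 and flipping a 1 loses one, so we look for the contiguous range with the maximum gain. A sketch (it assumes flipping is optional when the array is all 1s; ties between equally good ranges are possible):

```python
def max_ones_after_flip(bits):
    """Return ((l, r), total) where flipping bits[l..r] maximizes the
    number of 1s, and total is that maximum count."""
    ones = sum(bits)
    best_gain, gain, start = 0, 0, 0
    best = (0, 0)
    for i, b in enumerate(bits):
        g = 1 if b == 0 else -1       # gain of flipping this single bit
        if gain <= 0:
            gain, start = g, i        # restart the candidate range here
        else:
            gain += g
        if gain > best_gain:
            best_gain, best = gain, (start, i)
    return best, ones + best_gain
```

On the example array, [3,4] and [1,6] both gain 2, so either yields 6 ones.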


Read full article from Find number of bits to be flipped to get maximum 1's in array | miafish


Codility Challenges: Future training - TreeHeight




Here is another Codility problem solution from the Codility lessons (TreeHeight: compute the height of a binary tree). Due to copyright I can't copy the content of the problem here, so to view the problem description click here.
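Since the statement itself can't be reproduced, here is a generic sketch of the task, using the convention (as in Codility's TreeHeight) that height counts edges, so a single node has height 0:

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def height(root):
    # Height in edges: empty tree -> -1, single node -> 0.
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))
```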

Read full article from Codility Challenges: Future training - TreeHeight


Time Complexity of Loop with Powers - GeeksforGeeks




What is the time complexity of below function?

void fun(int n, int k)
{
    for (int i=1; i<=n; i++)
    {
      int p = pow(i, k);
      for (int j=1; j<=p; j++)
      {
          // Some O(1) work
      }
    }
}

Time complexity of the above function can be written as 1^k + 2^k + 3^k + … + n^k.

Let us try a few examples:

k=1  Sum = 1 + 2 + 3 + ... + n
         = n(n+1)/2
         = n^2/2 + n/2

k=2  Sum = 1^2 + 2^2 + 3^2 + ... + n^2
         = n(n+1)(2n+1)/6
         = n^3/3 + n^2/2 + n/6

k=3  Sum = 1^3 + 2^3 + 3^3 + ... + n^3
         = n^2(n+1)^2/4
         = n^4/4 + n^3/2 + n^2/4

In general, the asymptotic value can be written as n^(k+1)/(k+1) + Θ(n^k).

Note that, in asymptotic notations like Θ, we can always ignore lower-order terms. So the time complexity is Θ(n^(k+1)/(k+1)), which for constant k is simply Θ(n^(k+1)).
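A quick empirical check of the sum (the helper name is made up):

```python
def loop_iterations(n, k):
    # Total number of inner-loop iterations of fun(n, k): 1^k + 2^k + ... + n^k.
    return sum(i ** k for i in range(1, n + 1))

# The leading term n^(k+1)/(k+1) dominates: the ratio approaches 1 as n grows.
ratio = loop_iterations(1000, 2) / (1000 ** 3 / 3)
```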


Read full article from Time Complexity of Loop with Powers - GeeksforGeeks


Java theory and practice: More flexible, more scalable locking in JDK 5.0




The ReentrantLock class

The Lock framework in java.util.concurrent.lock is an abstraction of locking that allows a lock to be implemented as a Java class rather than as a language feature. This leaves room for multiple implementations of Lock, which may differ in scheduling algorithm, performance characteristics, or locking semantics. The ReentrantLock class implements Lock and has the same concurrency and memory semantics as synchronized, but adds features such as lock polling, timed lock waits, and interruptible lock waits. It also offers far better performance under heavy contention. (In other words, when many threads want to access a shared resource, the JVM can spend less time scheduling threads and more time executing them.)

What does a reentrant lock mean? Simply put, the lock has an acquisition count associated with it: if a thread holding the lock acquires it again, the count is incremented, and the lock must then be released twice before it is truly released. This mirrors the semantics of synchronized: if a thread enters a synchronized block protected by a monitor the thread already owns, the thread is allowed to proceed, and the lock is not released when the thread exits the second (or a subsequent) synchronized block, but only when it exits the first synchronized block it entered under that monitor.


Read full article from Java theory and practice: More flexible, more scalable locking in JDK 5.0


The Producer/Consumer Problem, Part 2 | 指尖的舞客




In Part 1 of the producer/consumer problem, we implemented this multithreaded communication model with the wait()/notify() methods, relying on the synchronized keyword. This post focuses on using a reentrant lock (ReentrantLock) to synchronize producers and consumers. ReentrantLock implements the Lock interface and is a Java class newly added in JDK 5.0; it was not implemented as a language feature, probably with extensibility in mind.

The relationship between ReentrantLock and the synchronized keyword is rather subtle; both are used for synchronization between threads. The differences: when contention between threads is frequent, ReentrantLock's synchronization overhead is much lower than that of synchronized; ReentrantLock is a new API and is not compatible with JDKs earlier than 5.0; and since ReentrantLock is a class, mistakes often come from forgetting to unlock, whereas programmers are familiar with the synchronized keyword and less likely to get it wrong. In fact, newer JVM implementations optimize synchronization (synchronized) quite well; as long as contention between threads is not especially frequent, the overhead of synchronized is small. So when choosing between ReentrantLock and the synchronized keyword, analyze your usage requirements and the choice follows quickly. (For understanding and comparing the two, see: http://blog.csdn.net/fw0124/article/details/6672522)
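The article's Java listing is not included in this excerpt; as an illustration of the same lock-plus-conditions pattern, here is a minimal Python sketch (buffer capacity and item count are arbitrary):

```python
import threading
from collections import deque

buf, CAP = deque(), 3
lock = threading.Lock()                  # analogue of the ReentrantLock
not_full = threading.Condition(lock)     # analogue of lock.newCondition()
not_empty = threading.Condition(lock)    # both conditions share one lock

consumed = []

def producer():
    for i in range(10):
        with not_full:
            while len(buf) >= CAP:       # wait while the buffer is full
                not_full.wait()
            buf.append(i)
            not_empty.notify()

def consumer():
    for _ in range(10):
        with not_empty:
            while not buf:               # wait while the buffer is empty
                not_empty.wait()
            consumed.append(buf.popleft())
            not_full.notify()

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```

Note the loop around each wait(): just as with ReentrantLock's Condition.await(), the predicate must be rechecked after waking.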


Read full article from The Producer/Consumer Problem, Part 2 | 指尖的舞客


Unbounded Searching Problem | Algorithms Notes




Let A[n] be an array with n elements sorted in ascending order. It is simple to construct an O(lg n) binary-search algorithm to find the position k in A[n] of a given value v. Assume that k is much less than n (i.e., k << n). Write an O(lg k) time algorithm to search for v.
(Note: you do not know the value of k in advance; only v is known.)

Solution:

This is an application of binary search. Although we do not know k in advance, we can find the range of k: find an integer i such that 2^i ≤ k < 2^{i+1}. Using linear search over i (probing A[2^i] for i = 0, 1, 2, ...) costs O(i) = O(lg k). Then applying binary search on the subarray A[2^i]..A[2^{i+1}-1] requires O(\lg (2^{i+1} - 2^i)) = O(i) = O(lg k) time, so the total time complexity is O(lg k).
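A sketch of the two phases, 0-indexed, doubling the probe index until it passes v:

```python
from bisect import bisect_left

def unbounded_search(A, v):
    """Find the index of v in sorted A in O(lg k) time,
    where k is the position of v; return -1 if absent."""
    i = 1
    while i < len(A) and A[i] < v:    # gallop: probe 1, 2, 4, 8, ...
        i *= 2
    lo, hi = i // 2, min(i, len(A) - 1)
    j = bisect_left(A, v, lo, hi + 1) # binary search in the bracketed range
    return j if j <= hi and A[j] == v else -1
```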

This idea can also be generalized to higher dimension space.



Read full article from Unbounded Searching Problem | Algorithms Notes


Saddleback Search | Algorithms Notes




M is an n * n integer matrix in which the entries of each row are in increasing order (reading left to right) and the entries in each column are in increasing order (reading top to bottom). Give an efficient algorithm to find the position of an integer x in M, or determine that x is not there. Tell how many comparisons of x with matrix entries your algorithm does in the worst case.

Solution:

If M[i][j] > x, then M[i][k] ≠ x for all j ≤ k ≤ n and M[k][j] ≠ x for all 1 ≤ k ≤ i. Likewise, M[i][j] < x reveals corresponding information. Hence, let i = 1 and j = n initially; if M[i][j] = x, report it; if M[i][j] > x or M[i][j] < x, compare x with M[i][j-1] or M[i+1][j] respectively in the next iteration, unless j-1 < 1 or i+1 > n, which implies that x is not in M. Since i and j can be incremented or decremented only O(n) times in total, the total running time is O(n).
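The walk described above, starting at the top-right corner (a 0-indexed sketch):

```python
def saddleback(M, x):
    """Search a matrix with sorted rows and columns in
    O(rows + cols) comparisons; return (i, j) or None."""
    if not M:
        return None
    i, j = 0, len(M[0]) - 1           # top-right corner
    while i < len(M) and j >= 0:
        if M[i][j] == x:
            return (i, j)
        if M[i][j] > x:
            j -= 1                    # everything below in column j is larger
        else:
            i += 1                    # everything left in row i is smaller
    return None

M = [[1, 4, 7],
     [2, 5, 8],
     [3, 6, 9]]
```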

For an m times n sorted matrix, finding an element can be done in O(m lg (n / m)) comparisons [1], where m is the number of rows and n is the number of columns, m ≤ n. First do binary search in the middle row M[mid] to find the index i such that M[mid][i] < x < M[mid][i+1]. Then the submatrix to the top-left of M[mid][i] and the submatrix to the bottom-right of M[mid][i+1] cannot contain x, and we recursively solve the problem in the remaining two submatrices.

Note: This technique can be generalized to 3D [2]

[1] Richard S. Bird, "Improving Saddleback Search: A Lesson in Algorithm Design," Mathematics of Program Construction Lecture Notes in Computer Science Volume 4014, 2006, pp 82-89

[2] N Linial, M Saks, "Searching ordered structures," Journal of Algorithms Volume 6, Issue 1, March 1985, Pages 86–103



Read full article from Saddleback Search | Algorithms Notes


The Duplicate Detection Problem | Algorithms Notes




Given an integer array A of length n, such that each integer is in the range [1, n-1]. Array A is read-only. Design a linear time algorithm to find one duplicate element using O(1) variables.

Solution:

Since each element has a value in [1, n-1], we can consider each element as a node in a linked list, that is, element i points to element A[i]. Since no element can point to node n, we can consider node n as the head of this linked list. If node i is pointed to by more than one element, then i is a duplicate value in the array. Thus, we can use the classical cycle-detection algorithm to find the entry point of the cycle, which is a duplicate element.
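A sketch using Floyd's tortoise-and-hare cycle detection, written 0-indexed so that index 0 plays the role of the head node (values in [1, n-1] mean no element ever points back to index 0):

```python
def find_duplicate(A):
    """A has n elements, each in [1, n-1]; returns one duplicated value
    in O(n) time and O(1) extra space without modifying A."""
    slow = fast = 0
    while True:                      # phase 1: find a meeting point in the cycle
        slow = A[slow]
        fast = A[A[fast]]
        if slow == fast:
            break
    slow = 0                         # phase 2: walk to the cycle entry,
    while slow != fast:              # which is the value pointed to twice
        slow = A[slow]
        fast = A[fast]
    return slow
```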


Read full article from The Duplicate Detection Problem | Algorithms Notes




Shortest Subarray with Sum is at least k | Algorithms Notes



Shortest Subarray with Sum is at least k | Algorithms Notes

Given an integer array A of length n and a number k, design a linear time algorithm to find the shortest subarray whose sum is at least k.

Solution:

Let P[i] be the sum of A[1..i]. For an ending index e, we want the largest starting index b such that the sum of A[b..e] = P[e] - P[b-1] is at least k. If there exist i < j with P[i] >= P[j], then i can never beat j as a starting point for any later ending index. Hence we maintain a queue q, initially empty, storing prefix indices that increase both by position and by P value. When processing ending index e, dequeue from the front every index x with P[e] - P[x] >= k; the last such x gives the best subarray ending at e, so update the current optimum. Then remove from the tail every index whose P value is at least P[e], and enqueue e. Since every index enters and leaves the queue at most once, this algorithm runs in linear time.
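A sketch of the deque scan over prefix sums (0-indexed; returns -1 when no subarray qualifies; the function name is mine):

```python
from collections import deque

def shortest_subarray_at_least(A, k):
    """Length of the shortest subarray of A with sum >= k, or -1 if none exists."""
    n = len(A)
    P = [0] * (n + 1)                      # P[i] = sum of A[:i]
    for i, v in enumerate(A):
        P[i + 1] = P[i] + v
    best = n + 1
    q = deque()                            # prefix indices, increasing in position and P value
    for e in range(n + 1):
        while q and P[e] - P[q[0]] >= k:   # q[0] is the best remaining start for end e
            best = min(best, e - q.popleft())
        while q and P[q[-1]] >= P[e]:      # e dominates any tail index with a larger P
            q.pop()
        q.append(e)
    return best if best <= n else -1
```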


Read full article from Shortest Subarray with Sum is at least k | Algorithms Notes


Maximum Sum and Distance | Algorithms Notes



Maximum Sum and Distance | Algorithms Notes

Given an array A of n integers, design a linear time algorithm to find i and j such that A[i] + A[j] + (i - j) is maximized subject to i > j.

Solution:

The optimal solution ending at i is to pair with the j that maximizes A[j] - j over all j < i. Thus we can scan the array from the beginning, remembering the running maximum of A[j] - j over all previous positions; the optimal solution then follows easily.
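The scan can be sketched as follows (0-indexed, so the tracked quantity is the running maximum of A[j] - j; the name is mine):

```python
def max_sum_and_distance(A):
    """Maximize A[i] + A[j] + (i - j) over i > j, in a single pass."""
    best = float('-inf')
    best_start = A[0] - 0          # running maximum of A[j] - j over j < i
    for i in range(1, len(A)):
        best = max(best, A[i] + i + best_start)
        best_start = max(best_start, A[i] - i)
    return best
```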


Read full article from Maximum Sum and Distance | Algorithms Notes




九章算法面试题26 方格取数



九章算法面试题26 方格取数

Basic: You have an n*m matrix and must walk from the top-left corner (1,1) to the bottom-right corner (n,m), moving only right or down. Each cell of the matrix holds a non-negative integer. Among all paths from the top-left to the bottom-right corner, which path has the maximum sum of numbers?


Advanced: Now you may walk from the top-left corner k times; once a cell's number has been taken, passing through that cell again adds nothing. What is the maximum total that k walks can collect?



Solution:


Basic: Use dynamic programming. Let F[i][j] denote the maximum path sum from the top-left corner to position (i,j). Then F[i][j] = max{F[i-1][j], F[i][j-1]} + A[i][j], where A[i][j] is the number in cell (i,j). The answer is F[n][m]; the time complexity is O(n*m).
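A sketch of this recurrence (0-indexed; the function name is my own):

```python
def max_path_sum(A):
    """Maximum sum over right/down paths from the top-left to the bottom-right cell."""
    n, m = len(A), len(A[0])
    NEG = float('-inf')
    F = [[NEG] * m for _ in range(n)]
    F[0][0] = A[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            up = F[i - 1][j] if i > 0 else NEG     # arrived by moving down
            left = F[i][j - 1] if j > 0 else NEG   # arrived by moving right
            F[i][j] = A[i][j] + max(up, left)
    return F[n - 1][m - 1]
```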


Advanced: Use a network-flow algorithm. Split each cell into two nodes, an in-node and an out-node, joined by an edge of capacity 1 whose cost is the cell's value. Each out-node connects to the in-nodes of the cells to its right and below with edges of unlimited capacity and cost 0. Connect a source to the in-node of the top-left cell with capacity k and cost 0, and take the out-node of the bottom-right cell as the sink. One run of minimum-cost maximum-flow from source to sink (with the cell values negated, so that the minimum-cost flow collects the maximum total) yields the answer.


From the interviewer's perspective:


The basic version is asked frequently and is one of the must-know dynamic programming problems; the number-triangle problem is similar. The advanced version is rarely asked, essentially never in North American interviews. In China, a few very strong companies with very few openings have asked it; the author was once asked exactly this problem in an interview. Generally speaking, network-flow algorithms do not come up in interviews, so readers need not worry about or prepare for such problems; mastering the basic dynamic-programming version is enough.


Read full article from 九章算法面试题26 方格取数


实用算法的分析与程序设计――递推法(倒推法) - 梦里水乡的专栏 - 博客频道 - CSDN.NET



实用算法的分析与程序设计――递推法(倒推法) - 梦里水乡的专栏 - 博客频道 - CSDN.NET

Working backwards: when the initial value is unknown but the solution or goal is known through some recurrence relation, we reverse the process to deduce the initial conditions. Because the computation in such problems is a one-to-one mapping, we can derive its recurrence formula, start from the known solution or goal, and step backwards to the problem's initial state.

Fuel depots: A heavy truck wants to cross a 1000-kilometer desert. It burns 1 liter of fuel per kilometer and can carry at most 500 liters at a time, so it clearly cannot cross the desert on a single load; the driver must set up several fuel depots along the way. How should the depots be placed, and how much fuel should be stored at each, so that the truck crosses the desert burning the least fuel?


Read full article from 实用算法的分析与程序设计――递推法(倒推法) - 梦里水乡的专栏 - 博客频道 - CSDN.NET


几本信息学竞赛算法书 - smallgyy的专栏 - 博客频道 - CSDN.NET



几本信息学竞赛算法书 - smallgyy的专栏 - 博客频道 - CSDN.NET

(1) 实用算法的分析与程序设计, by Wu Wenhu and Wang Jiande

 

(2) 算法艺术和信息学竞赛, by Liu Rujia and Huang Liang

This book systematically and comprehensively introduces the fundamentals of algorithmics: the core of the university course "Algorithms and Data Structures" as well as what is needed for the International Olympiad in Informatics (IOI) and the ACM/ICPC International Collegiate Programming Contest, analyzing a considerable number of problems along the way.
The book has three chapters: Chapter 1 covers algorithms and data structures; Chapter 2, mathematical knowledge and methods; Chapter 3, computational geometry. Rich in content, thorough in analysis, and inspiring, it suits both self-study and classroom teaching.
It serves informatics enthusiasts at all levels, contestants, coaches, and computer-science students, both as a good companion for getting started and improving and as a rich, novel reference.

(3) 算法竞赛入门经典, by Liu Rujia

This is an introductory textbook for algorithm competitions, organically combining C/C++, algorithms, and problem solving while de-emphasizing theory in favor of study methods and practical technique. Its eleven chapters cover introductory programming, loops, arrays and strings, functions and recursion, selected basic problems, fundamental data structures, brute-force search, efficient algorithm design, introductory dynamic programming, mathematical concepts and methods, and graph models and algorithms: the main knowledge points needed to start competing, with plenty of exercises. The code is standard, concise, and easy to follow; it teaches many practical programming tricks, along with development, testing, and debugging techniques rarely found in traditional language or algorithm books.
It can serve as a textbook for the NOIP second round, an introductory reference for ACM/ICPC, and a reference for IT engineers and researchers.

(4) 算法竞赛入门经典--训练指南, by Liu Rujia

算法竞赛入门经典 is one of Liu Rujia's classic works and has been loved by readers since publication; readers have long awaited this follow-up, and after three years of waiting, 算法竞赛入门经典--训练指南 has finally arrived.
As an important companion to the original, this book supplements topics the original did not cover or covered only briefly, building a fairly complete body of knowledge; through nearly 200 worked examples it explains the knowledge points, classic ways of thinking, and common implementation methods and tricks in each area.
"Broad coverage, touch-and-go treatment, emphasis on code" is its hallmark, all three chosen to align with industry practice: breadth over depth. The problems are mostly drawn from recent ACM/ICPC regional and world-finals contests, covering most of the fine-grained topics that appear in common algorithm competitions.
The book also gives complete programs for all the important classical algorithms and the core code for the key examples, suitable both for contestants studying on their own and for coaches organizing study and training.

(5) 新编实用算法分析与程序设计, by Wang Jiande and Wu Yonghui

This is a programming-contest textbook. It first presents basic algorithmic concepts and various sorting and problem-solving methods and strategies, then covers elementary number theory, computational geometry, search, and graph algorithms, and finally discusses dynamic programming. It explains the theory in detail from a teaching perspective while analyzing classic exercises from a contest perspective, training students to apply algorithms flexibly.
It can serve as a textbook for university algorithm courses and for school competition training, and as a reference for software and hardware developers.


Read full article from 几本信息学竞赛算法书 - smallgyy的专栏 - 博客频道 - CSDN.NET


ACM书籍推荐 - 疯狂算法~ - 博客频道 - CSDN.NET



ACM书籍推荐 - 疯狂算法~ - 博客频道 - CSDN.NET

Algorithms, algorithms, and more algorithms!!!

Three books for getting started: 《数据结构与算法》 (by Fu Qingxiang and Wang Xiaodong; the best algorithms textbook I have seen)
程序设计导引及在线实践, by Li Wenxin
ACM程序设计培训教程, by Wu Hao

For building up fundamentals:
算法艺术与信息学竞赛, 2nd edition, Liu Rujia
算法设计与分析, Wang Xiaodong
算法设计与试验题解, Wang Xiaodong
Cormen et al.: 《算法导论》 (Introduction to Algorithms)
组合数学, 3rd edition, translated by Feng Shunxi
计算几何-算法设计与分析, Zhou Peide
国际信息学奥林匹克竞赛指导――实用算法的分析与程序设计, Wu Wenhu and Wang Jiande

网络算法与复杂性理论, Xie Zheng and Li Jianping
《Concrete Mathematics --- A Foundation For Computer Science》, Ronald L. Graham, Donald E. Knuth, Oren Patashnik (《具体数学》; the Chinese edition is best if you can get one)
《计算机程序设计艺术》, three volumes, Knuth
组合数学的算法与程序设计
《程序设计中的组合数学》, Wu Wenhu
图论的算法与程序设计
图、网络与算法

国际大学生程序设计竞赛辅导教程, Guo Songshan and Cui Hao

《ACM国际大学生程序设计竞赛试题与解析》, all volumes (by Wu Wenhu, Tsinghua University Press)
C算法, Vol. 1: Fundamentals, Data Structures, Sorting and Searching (3rd edition)
C算法, Vol. 2: Graph Algorithms (3rd edition, Chinese translation by Zhou Liangzhong), by Sedgewick (USA)
国际大学生程序设计竞赛例题解, four volumes, Guo Songshan


Read full article from ACM书籍推荐 - 疯狂算法~ - 博客频道 - CSDN.NET


四边形不等式优化 « Yangff's Blog



四边形不等式优化 « Yangff's Blog

a<b<c<d => w[a][c]+w[b][d]<=w[a][d]+w[b][c]

Then prove that the DP table m satisfies the same inequality.

Finally, letting s[i][j] denote the optimal decision point k for state (i, j), induction on j - i gives

s[i][j]<=s[i][j+1]<=s[i+1][j+1]

也就是

s[i][j-1]<=s[i][j]<=s[i+1][j]

HOJ:2952

Merging stones.

dp[l][r] = dp[l][k] + dp[k+1][r] + c[l][r]

where k only needs to range over s[l][r-1] .. s[l+1][r]
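A sketch of the stone-merging DP with the split point k restricted to s[l][r-1] .. s[l+1][r] (Knuth's optimization; the names are mine, and c is taken to be the interval sum, as usual for this problem):

```python
def merge_stones(w):
    """Minimum total cost of merging adjacent stone piles, with Knuth's
    quadrilateral-inequality restriction s[i][j-1] <= s[i][j] <= s[i+1][j]."""
    n = len(w)
    prefix = [0] * (n + 1)
    for i, v in enumerate(w):
        prefix[i + 1] = prefix[i] + v
    dp = [[0] * n for _ in range(n)]
    s = [[0] * n for _ in range(n)]          # s[i][j]: optimal split point for i..j
    for i in range(n):
        s[i][i] = i
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            dp[i][j] = float('inf')
            cost_ij = prefix[j + 1] - prefix[i]   # c[i][j]: weight of the merged pile
            for k in range(s[i][j - 1], min(s[i + 1][j], j - 1) + 1):
                cand = dp[i][k] + dp[k + 1][j] + cost_ij
                if cand < dp[i][j]:
                    dp[i][j] = cand
                    s[i][j] = k
    return dp[0][n - 1]
```

Restricting k to this window is what brings the total work down from O(n³) to O(n²).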

HDU:3480

Sort first, and after that there is not much more to it.

HDU:2829

Same as above.

pku:1160

dp[i][j] = the cost of serving the first i villages with j post offices

dp[i][j] = min over k of { dp[k][j-1] + w[k+1][i] }

Note: when filling the table, loop over the number of post offices first, then over the villages.

That way, each time village i is processed, i-1 and i+1 have already been processed.


Read full article from 四边形不等式优化 « Yangff's Blog




Calculating Document Distance



Calculating Document Distance

Previously we looked at the first part of my Back to Basics series, where we understood and implemented Peak-Finding. This time we are going to talk about something slightly different: calculating document distance. I really recommend taking a look at the MIT course Introduction to Algorithms; for this post, watch the part about document distance.

Practicing algorithms is both fun and educating, even if you are 20 years into your career you will most certainly always learn something new when analyzing algorithms. There's no greater feeling than when you've tackled a problem for a long time and suddenly you understand it deeply enough to optimize it and play with the edge cases.

What is Document Distance?

Consider that you have two documents containing a huge amount of text in them, be it essays or websites. Now you want to know how similar these documents are, in the sense of how many words overlap between them. Conceptually, the algorithm is really simple; there are just a few steps to go through:

  1. Open and read both documents that you are going to compare. Only read words and numbers, skip special characters (spaces, dots, etc.) and convert the words to lower case
  2. Calculate the word frequency in both collections of words, this means how many times each word occurs in each document
  3. Compare the frequencies from both computations and calculate the distance

The distance itself is calculated using a predefined formula that you don't really have to pay too much attention to at this moment, unless you really fancy computations on vectors.
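The three steps above, with the usual vector-angle formula from the lecture, can be sketched as:

```python
import math
import re
from collections import Counter

def word_freq(text):
    # Steps 1 + 2: words and numbers only, lower-cased, then counted.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def doc_distance(doc1, doc2):
    # Step 3: the angle between the two word-frequency vectors
    # (0 for identical word distributions, pi/2 for no overlap).
    f1, f2 = word_freq(doc1), word_freq(doc2)
    dot = sum(f1[w] * f2[w] for w in f1)
    norm1 = math.sqrt(sum(v * v for v in f1.values()))
    norm2 = math.sqrt(sum(v * v for v in f2.values()))
    return math.acos(dot / (norm1 * norm2))
```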


Read full article from Calculating Document Distance


Understanding Peak-Finding



Understanding Peak-Finding

No matter how far we are in our careers as professional developers, it's great to freshen up on our fundamentals. Be it the importance of Memory Access Patterns or algorithms in general, it's really beneficial. I find it quite interesting that it's been a pretty long time since I sat in the algorithms and data structures course at my technical institute, and I tend to understand it completely differently now. I heard a really great thing from a professor at MIT who said the following:

You can practice really hard for two years to become a great programmer and you can practice for 10 years to become an excellent programmer. Or you can practice for two years and take an algorithms course and become an excellent programmer

A lot of us might not think about the daily algorithms and data structures that we use; in fact, we hide behind ORMs and such that hide complexity and introduce behavior that we might not be aware of, which is one of the reasons I personally like to read up on algorithms from time to time. This time, though, I've decided to share some of the things I learn and like; hopefully you'll like it and it will help you become an excellent programmer.

If you haven't seen this, MIT has a website called Open Courseware where they have video recordings, lecture notes, assignments and exams from their courses. There is one in particular which I recently found and it's excellent so far, it's called Introduction to Algorithms. If it's been a while since you looked into these topics, have a look at their content. Some of the examples and snippets here are from the lectures in this particular course.


Read full article from Understanding Peak-Finding


cresspahl: simple algorithm for 2D-Peakfinding



cresspahl: simple algorithm for 2D-Peakfinding

When it comes to analyzing data, a common task is to identify peaks in an x-y dataset. There are several ways to do that. Maybe the first idea that would come into one's mind is to look for zeros in the derivatives. This works fine for smooth theoretical curves but fails when noise is present (as is certainly the case for all kinds of experimental data).
To get rid of noise you can apply a low-pass filter to your data, but in some cases this is not desired. A simple approach is to use an algorithm that does the following:

  1. find peak candidates that are above some user-defined threshold level
  2. find out for each candidate whether it is the local maximum of a user-defined neighbourhood
  3. return the valid peaks
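Those three steps can be sketched as follows (the window parameter and the names are my own assumptions):

```python
def find_peaks(y, threshold, window=1):
    """Indices of values that are above `threshold` and are the maximum of
    their +/- `window` neighbourhood."""
    peaks = []
    for i, v in enumerate(y):
        if v < threshold:                 # step 1: candidate must clear the threshold
            continue
        lo, hi = max(0, i - window), min(len(y), i + window + 1)
        if all(v >= y[j] for j in range(lo, hi)):   # step 2: local maximum
            peaks.append(i)
    return peaks                          # step 3: the valid peaks
```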

Read full article from cresspahl: simple algorithm for 2D-Peakfinding


把二元查找树转变成排序的双向链表 | kiddy



把二元查找树转变成排序的双向链表 | kiddy

Convert a binary search tree into a sorted doubly linked list
Problem: given a binary search tree, convert it into a sorted doubly linked list. You may not create
any new nodes; only adjust the pointers.
For example, convert the binary search tree
10
/ \
6 14
/ \ / \
4 8 12 16
into the doubly linked list
4=6=8=10=12=14=16
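A sketch of the usual in-order rewiring (no new nodes; the left/right pointers are reused as the list's prev/next links):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def bst_to_dll(root):
    """In-order traversal that relinks left/right as prev/next; returns the head."""
    head = prev = None

    def visit(node):
        nonlocal head, prev
        if node is None:
            return
        visit(node.left)
        if prev is None:
            head = node          # the smallest node becomes the list head
        else:
            prev.right = node    # forward link
            node.left = prev     # backward link
        prev = node
        visit(node.right)

    visit(root)
    return head
```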

Read full article from 把二元查找树转变成排序的双向链表 | kiddy


arithmetic expression validation | kiddy



arithmetic expression validation | kiddy

Write code to do arithmetic expression validation:
digits: 0..9
operators: +, -, (, )

E.g.:
1+2-3 valid
1+(2-(3+4)) valid
-2 not valid
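One way to sketch a validator consistent with these examples: a single pass over the string, tracking parenthesis depth and the class of the previous token (no unary minus, per the -2 case; all names are mine):

```python
def is_valid(expr):
    """Validate an expression of digits, +, -, and parentheses; operators are binary."""
    prev = None       # class of the previous token: 'num', 'op', '(', ')'
    depth = 0
    i, n = 0, len(expr)
    while i < n:
        c = expr[i]
        if c.isdigit():
            if prev in ('num', ')'):      # two numbers in a row, or ")5"
                return False
            while i < n and expr[i].isdigit():
                i += 1                    # consume the whole multi-digit number
            prev = 'num'
            continue
        if c in '+-':
            if prev not in ('num', ')'):  # rejects unary minus and "++"
                return False
            prev = 'op'
        elif c == '(':
            if prev in ('num', ')'):      # rejects implicit multiplication like "2("
                return False
            depth += 1
            prev = '('
        elif c == ')':
            if prev not in ('num', ')') or depth == 0:
                return False
            depth -= 1
            prev = ')'
        else:
            return False                  # any other character is illegal
        i += 1
    return depth == 0 and prev in ('num', ')')
```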


Read full article from arithmetic expression validation | kiddy


cpp_projects/mitbbs at master · xqian/cpp_projects



cpp_projects/mitbbs at master · xqian/cpp_projects

1. Approximate string matching. Given a long string (on the order of 10 GB or more) and a short string (say, of length 10), allowing at most two mismatched characters, find all matching positions in the long string. (I feel this is the problem that sank me: I never found the right approach during the interview; the interviewer tried to give hints, but I couldn't find the direction he intended, and in the end he didn't know how to hint further, since anything beyond what he had said would have given away the answer.)

/* A suggested approach from the thread: a rolling hash. Say the short string is abcd. Compute the rolling hash of the pattern itself and of every variant differing in one or two characters, and put them all in a set, e.g. abcd, bbcd, cbcd, ..., zbcd, aacd, ..., azcd, for a total of 26*10 + 45*26*26 candidate strings. Then match against the long string. Any small prime works as the hash base, e.g. abcd ==>> a*7^4 + b*7^3 + c*7^2 + d*7. */

2. Design an API and implement a read-write lock using a mutex.
3. Use basic C/C++ language features to determine the direction of stack growth on a given system.
4. Given a set of points, find a line that passes through the most points.

For problem 4: any two points determine a line y = ax + b, i.e. a pair (a, b). Build a Map<Pair, Set<Point>>; for every pair of points compute their (a, b) and add both points to map.get(pair). Finally, return the (a, b) whose set has the largest size.
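Setting the hashing aside, the core of problem 1 can be sketched as a mismatch-counting scan (quadratic in the worst case, which is fine for a length-10 pattern; the names are mine):

```python
def fuzzy_find(text, pattern, max_mismatch=2):
    """Return every start index where `pattern` matches `text` with at most
    `max_mismatch` differing characters (substitutions only)."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatch:
                    break                 # this alignment can't qualify; bail early
        else:
            hits.append(i)                # loop finished without exceeding the budget
    return hits
```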

Read full article from cpp_projects/mitbbs at master · xqian/cpp_projects


谷歌面经 - 未名空间(mitbbs.com)



谷歌面经 - 未名空间(mitbbs.com)

I interviewed in China: two phone screens + four onsite rounds. Rejected.

Partly, they said, it coincided with campus recruiting and interviewer resources in Beijing were tight; partly I didn't interview well in the middle, so the whole process dragged on for nearly two months. At first I anxiously awaited news of each next step; later I grew calm. I finally finished the last two rounds this week and received HR's rejection on Friday evening, which at least brought closure.

Phone screen 1: a Silicon Valley interviewer, scheduled for 8 a.m. Beijing time. I waited from a little before 7 until 9 without a call, so I contacted HR, who quickly reached the interviewer. He apologized for missing the slot and asked whether I wanted to continue right away or reschedule, adding that he was about to travel, so rescheduling might mean one or two weeks later. Having already waited over an hour, I chose to continue. He called right away and threw out a problem: given two large integers represented as strings, compute their sum. Looking back, it's a fairly ordinary problem, but at the time I somehow failed to sort out the sign cases of the two numbers from the start. In the end I basically solved it, but only after revising again and again; the code was messy with plenty of redundancy, and it took quite a long time.
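That screen problem, adding two signed decimal integers given as strings, can be sketched with schoolbook addition and subtraction on the digit strings (my own naming; the sign handling is exactly the part that is easy to fumble under pressure):

```python
def add_big(a, b):
    """Sum of two signed decimal strings, returned as a decimal string."""
    def parse(s):
        sign = -1 if s[0] == '-' else 1
        return sign, s.lstrip('+-').lstrip('0') or '0'

    def cmp_mag(x, y):  # compare magnitudes of two digit strings
        if len(x) != len(y):
            return -1 if len(x) < len(y) else 1
        return -1 if x < y else (1 if x > y else 0)

    def add_mag(x, y):  # schoolbook addition, least-significant digit first
        x, y = x[::-1], y[::-1]
        out, carry = [], 0
        for i in range(max(len(x), len(y))):
            d = carry + (int(x[i]) if i < len(x) else 0) + (int(y[i]) if i < len(y) else 0)
            out.append(str(d % 10))
            carry = d // 10
        if carry:
            out.append(str(carry))
        return ''.join(reversed(out))

    def sub_mag(x, y):  # schoolbook subtraction, assumes |x| >= |y|
        x, y = x[::-1], y[::-1]
        out, borrow = [], 0
        for i in range(len(x)):
            d = int(x[i]) - borrow - (int(y[i]) if i < len(y) else 0)
            borrow = d < 0
            out.append(str(d + 10 if d < 0 else d))
        return ''.join(reversed(out)).lstrip('0') or '0'

    sa, ma = parse(a)
    sb, mb = parse(b)
    if sa == sb:                       # same sign: add magnitudes, keep the sign
        mag, sign = add_mag(ma, mb), sa
    else:                              # different signs: subtract the smaller magnitude
        c = cmp_mag(ma, mb)
        if c == 0:
            return '0'
        mag = sub_mag(ma, mb) if c > 0 else sub_mag(mb, ma)
        sign = sa if c > 0 else sb
    return ('-' if sign < 0 else '') + mag
```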

Afterwards I felt very low and full of regret; it seemed my first Google interview would end right there.

About a week later, HR called to ask how it had gone. I said not well; he agreed and said he would consider arranging another phone screen. I must again thank the HR (and the senior schoolmate who referred me, as I later learned) for quickly arranging the next screen, which let this Google interview journey continue.

Phone screen 2: also a Silicon Valley interviewer. This one went quite smoothly: a few minutes chatting about projects, then problems. The first was about binary trees: supposing you may swap the left and right subtrees under any node, given two binary trees, determine whether they are related by such swaps. The second was parenthesis-matching validation (essentially a LeetCode problem, though I only started reading 待字闺中 and learned of LeetCode after the first screen, and hadn't yet reached that problem). The last few minutes were for my questions.

Onsite: four rounds, held in two sessions more than half a month apart. Since I signed an agreement, I won't go into detail; the main problems were:

Onsite 1 & 2
1. Approximate string matching. Given a long string (on the order of 10 GB or more) and a short string (say, of length 10), allowing at most two mismatched characters, find all matching positions in the long string. (I feel this is the problem that sank me. I never found the right approach during the interview; the interviewer first tried some hints, but I couldn't find the direction he intended, and later he didn't quite know how to hint further, feeling that anything beyond what he had said would give away the answer. We were both at a loss: he thought his hints were already obvious and couldn't see why I still didn't get it, while I was anxious, sure that one missing idea would make everything simple, if only I knew which idea. He eventually went back to his code review.)
2. Find the intersection node of two linked lists. (A classic. Because the previous round's problem hadn't been finished, I was anxious and rushed to write the code without polishing it. When I showed it to the interviewer, I pointed out a redundancy myself and explained it was due to time pressure; normally I would certainly refactor it. Judging from HR's later feedback, I still lost points here.)
3. Find the smallest positive integer not present in an array of positive integers. (A LeetCode problem, which again I only discovered after the interview... but I think I answered it well, and I came up with an alternative approach that does not swap elements. No wonder the interviewer pondered my approach for a while before confirming it works.)

Onsite 3 & 4
4. Design an API and implement a read-write lock with a mutex.
5. Design an API and implement an LRUCache whose operations are all O(1).
6. Use basic C/C++ language features to determine the direction of stack growth on a given system.

Apart from the LRUCache problem, I had never seen any of these before, but I felt I answered them reasonably well on the spot. I only worried they might not make up for the mediocre showing in onsite 1 & 2, and sure enough, the rejection arrived two days later.

Finally, some thoughts on this Google interview experience. I later learned here that for algorithm-focused companies like FLAG, everyone grinds problems before interviewing. My sense, at least, is that Google interviewers hold solutions to a high standard: the method should be optimal and the code clean with no blind spots. Honestly, for a problem you have genuinely never seen, achieving that within 45 minutes is quite a challenge; sometimes just thinking through the corner cases takes a good while. But this seems to be the prevailing trend, so if you are aiming for these companies, spend time on problems and prepare well before submitting your resume; even if you never hit a problem you have seen, you will at least walk into the interview with far more confidence.

Heh, I have rambled on this long; thank you for patiently reading to the end, and I hope it helps.

========

By the way, does anyone here know about Amazon in China? I have a verbal SDE offer for payment-related work. Since graduating I have done traditional software development at Microsoft in China for over two years; my previous project was turbulent and more change is likely, so I would like a new environment, something new to broaden myself, and, incidentally, a raise. Is it worth jumping over? Thanks!

Read full article from 谷歌面经 - 未名空间(mitbbs.com)


Android 6.0 Marshmallow new features | BGR



Android 6.0 Marshmallow new features | BGR

By Brad Reed on Aug 28, 2015 at 4:40 PM

Both iOS 9 and Android 6.0 Marshmallow have been tagged as incremental upgrades that are mostly looking to add stability improvements without overwhelming us with new features. But Android fan Salman Ahmad makes the case that Marshmallow is actually a bigger deal than a lot of people — including Google — are giving it credit for being.

• Now on Tap
• Android Pay
• Automatic app data backups
• App Links (you're going to see fewer of those "what do you want to open this in?" prompts)
• Doze and App Standby
• Dark theme (removed, uncertain future)
• Visual Voicemail support
• New "Memory" section in Settings (it was there before,

Read full article from Android 6.0 Marshmallow new features | BGR


传说中的华为面试题(8分钟写出代码) - Tour大咸鱼 - 博客园



传说中的华为面试题(8分钟写出代码) - Tour大咸鱼 - 博客园

I saw this problem online; the statement: given two arrays a and b, both of size n, with arbitrary unordered values, swap elements between a and b so that the difference between the sum of a's elements and the sum of b's elements is minimized. I don't know whether it really came from Huawei, but it is very hard; at my level I certainly could not write complete code in 8 minutes. After consulting the ideas of experts online, my understanding is as follows:

  1. Sum both arrays and let the difference be A = sum(a) - sum(b).
  2. After swapping a's element i with b's element j, the difference becomes: A' = sum(a) - a[i] + b[j] - (sum(b) - b[j] + a[i]) = sum(a) - sum(b) - 2(a[i] - b[j]) = A - 2(a[i] - b[j]).
  3. Let x = a[i] - b[j]; substituting gives |A| - |A'| = |A| - |A - 2x|.
  4. Assuming A > 0, the swap shrinks the difference only when x lies in (0, A), and the closer x is to A/2 the better; if no x in (0, A) can be found, the current a and b are the answer.

  The algorithm: treat a and b as the two pans of a balance. Find i and j such that x lies in (0, A) and is as close to A/2 as possible, swap those two elements, recompute A, and repeat until no x in (0, A) can be found. Code can be implemented directly from this description.
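A direct sketch of that loop (O(n²) per pass; every accepted swap strictly shrinks |A|, so it terminates; note that this greedy reaches a local optimum, since the underlying problem is a partition variant; the names are mine):

```python
def minimize_diff(a, b):
    """Greedily swap elements between a and b to shrink |sum(a) - sum(b)|."""
    while True:
        A = sum(a) - sum(b)
        if A == 0:
            return a, b
        best = None                      # (x, i, j) with the smallest |A - 2x|
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                x = ai - bj
                improves = abs(A - 2 * x) < abs(A)      # x between 0 and A
                if improves and (best is None or abs(A - 2 * x) < abs(A - 2 * best[0])):
                    best = (x, i, j)
        if best is None:                 # no swap improves the gap: done
            return a, b
        _, i, j = best
        a[i], b[j] = b[j], a[i]
```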


Read full article from 传说中的华为面试题(8分钟写出代码) - Tour大咸鱼 - 博客园


Google V8 引擎运用了哪些优秀的算法? - Milo Yip 的回答 - 知乎



Google V8 引擎运用了哪些优秀的算法? - Milo Yip 的回答 - 知乎

Before 1980, the printf() in many C standard libraries produced "incorrect" results. Coonen did doctoral research related to this at the time [1], but it was not widely known or applied. In 1990, Steele et al. published an algorithm named Dragon4 [2], which guarantees an exact conversion by using big-number arithmetic. This algorithm went into the printf() of most C language libraries, and into the other languages that implement this functionality via printf().

Twenty years passed this way; despite some small improvements [3][4], the basic idea remained the same. Then in 2010, Loitsch published the Grisu algorithm [5], and V8 implements it.

The paper presents three algorithms: the original Grisu and its improved versions Grisu2 and Grisu3. Grisu's advantage over Dragon4-style algorithms comes down to one word: speed. Here is a brief description of each.

First, what counts as a "correct" conversion? By definition, the decimal string produced from a floating-point number must convert perfectly back to the original floating-point number. Described in C:

// Every floating-point value except +/-inf and NaN should return true.
bool Verify(double d) {
    assert(!isnan(d) && !isinf(d));
    char buffer[32]; // big enough?
    dtoa(buffer, d); // double-to-ascii (dtoa, not dota)
    char* end;
    double d2 = strtod(buffer, &end); // string back to double
    return *end == '\0' && d == d2;
}
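The same round-trip criterion is easy to demonstrate from Python, whose repr() has produced the shortest round-tripping output since 3.1 (a Gay/Grisu-style algorithm under the hood):

```python
def verify(d):
    # A conversion is "correct" when the decimal string parses back to the same double.
    return float(repr(d)) == d

# Both forms of 0.3 round-trip; only the short one is human-friendly.
assert verify(0.3)
assert float("29999999999999998e-17") == 0.3
assert repr(0.3) == "0.3"
```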

As mentioned above, Dragon4 uses big-number arithmetic and even dynamic memory allocation (did you know printf() might call malloc()?). Grisu needs none of that: 64-bit integer arithmetic alone suffices to implement the conversion! Hence the paper's title ends in "... with integers".

The Grisu algorithm is very simple, but it has one drawback: its output does not look like something meant for humans. In the paper's example, Grisu prints 0.3 as 29999999999999998e-17. This is a "correct" result; it passes Verify().

Fast as the algorithm is, most people would not accept such output, so the author developed the improved Grisu2: when converting a double using 64-bit integers, the extra 64 - 53 = 11 bits can be exploited to shorten the result (53 being the number of significand bits in a double). Grisu2 converts more than 99.9% of floating-point numbers into the shortest "correct" string; the remaining <0.1% are still "correct", just not the shortest possible.

Most people would probably have stopped there, the algorithm being proven correct with only a small fraction of cases short of the perfect result. But the author went on to develop Grisu3. Grisu3 cannot handle that handful of troublemakers either, but it can detect, during the computation, which doubles fail to get the optimal result. When the fast little department cannot settle a case, the case can be handed over to Dragon4 or some other slower algorithm.

V8 implements Grisu3 together with a big-number algorithm (I am not sure whether it is Dragon4 or something else); Google later split this out into a standalone C++ library: http://code.google.com/p/double-conversion/

To optimize the floating-point conversion in RapidJSON (miloyip/rapidjson · GitHub), and because RapidJSON is a header-only JSON library, I wrote a header-only Grisu2 library following Loitsch's implementation. It is available at miloyip/dtoa-benchmark · GitHub, which also benchmarks the performance of several dtoa() implementations. Since Grisu3 requires another, larger big-number implementation, and RapidJSON does not need the optimal result for now, I implemented only the faster and more compact Grisu2.

With Grisu2 added, RapidJSON's JSON stringify performance far exceeds that of other JSON libraries. You didn't expect the ending to be an ad, did you?

Read full article from Google V8 引擎运用了哪些优秀的算法? - Milo Yip 的回答 - 知乎


算法帝国--读书笔记 - 天才白痴梦 - 博客园



算法帝国--读书笔记 - 天才白痴梦 - 博客园

I finished a rough read of this book about three weeks ago and have put off writing this set of reading notes until now; even I can't stand my own lack of execution, and I have kept revising it. That said, "reading notes" here means something slightly different from the other reading notes on my blog: those record the experience of studying technical books and practicing hands-on (currently mainly learning skills for the job hunt before graduation, and partly out of pure interest), while this post records my mood and reflections while reading an informational IT book. There is a small difference between the two.

 

Today, algorithms have reached far beyond the fields their creators anticipated. Especially since the information age began, algorithmic applications have covered finance, healthcare, law, sports, entertainment, diplomacy, culture, national security, and much more, displaying a formidable power that originates from humans yet goes beyond them. This book, another work by New York Times bestselling author Steiner, uses one fascinating story after another to show readers how algorithms truly run the world, revealing how the "robot revolution" is quietly happening all around us.

The book suits any reader interested in the history of technology, the information revolution, the principles of algorithms, and data analysis.

------------------- from the abstract of 《算法帝国》

 

Let's first look at the book's table of contents:

Chapter 1: Wall Street, the First Domino

The hard road of a hacker's growth
The algorithms that changed Wall Street
A hacker's life on Wall Street in 1980: the right time and place
The father of algorithmic trading
Algorithms come to Hollywood
A pioneer of the tablet computer
Algorithms spread from the East Coast to the West Coast
A hacker who reached the top of Wall Street
The unknown frontier of finance

Chapter 2: A Brief History of Humans and Algorithms

Where algorithms come from
The golden ratio
The godfather of modern algorithms
Gauss: realizing the logic of algorithms
Pascal, Bernoulli, and the games of chance that changed the world
Giving form to algorithms
The Boolean logic machine

Chapter 3: The Machine-Picked Top 40

You have a 41% chance of becoming Lady Gaga
The A&R machine says it hasn't heard a single
The Bach machine
Unraveling the mystery of the Beatles

Chapter 4: The Computers' Secret Highway

Trading above all
For better or worse, money, speed, and technology go hand in hand
The speed incubator

Chapter 5: Game Theory in the System

Making algorithms cunning
Sports betting
Algorithms: the CIA's ghost
For algorithms, anything is possible in the worlds of love and baseball

Chapter 6: Paging Dr. Robot

The arbiter of life
Your robot doctor
But what about the doctor I booked?

Chapter 7: Categorizing People

Picking the right people: science, not luck
From NASA into everyday life

Chapter 8: Wall Street versus Silicon Valley

Chapter 9: Wall Street's Loss Is Everyone's Gain

They head for Silicon Valley
The damage Wall Street has done

Chapter 10: The Future Belongs to Algorithms and Their Creators

All roads
A tale of two coasts

 

What the book brought to mind

Frankly, I read chapters 1, 2, and 10 fairly carefully and went through the remaining chapters rather quickly; their content, of course, is just as splendid.

I think of chapter 1, where Peterffy on Wall Street wrote code to select and trade stocks, then joined a firm, founded companies, and recruited algorithm programmers: a crowd obsessed with stocks and algorithms to the point of forgetting to eat and sleep, crazy and full of passion, who finally used algorithms to create enormous wealth. When we speak of computer algorithms, we inevitably also think of mathematics, since computing and math are bound by countless threads. Chapter 2 introduces several great mathematicians and their contributions in detail, contributions that live on not only in mathematics but also play a crucial role in computer science and algorithms.

You may ask: what connects algorithms, mathematics, and code? In my humble opinion, mathematics is the soul of an algorithm, and code is its backbone. An algorithm without mathematical thinking loses its beauty, like a body with only a spine to hold it up and no spirit inside.

Another term bound up with algorithms is artificial intelligence. In chapter 10, the final chapter, the author discusses AI and the future. Whether the world will ultimately be controlled by machines is not a question I am qualified to discuss. For now, though, algorithms bring great convenience to our lives: finding information on the internet, Google's self-driving cars, disease prediction, and so on.

 

Finally

This post opened with the abstract of 《算法帝国》, so let me close with the last paragraph of its final chapter:

In the future, people who can write code will have plenty to do. If you can also conceive and construct complex, intricate algorithms, all the better: you may well rule the world, unless a robot gets there ahead of you.


Read full article from 算法帝国--读书笔记 - 天才白痴梦 - 博客园


Jersey - Passing ContainerRequest headers to a resource



Jersey - Passing ContainerRequest headers to a resource

In the above you are setting header values on the Servlet request object. Jersey does not sync between the Servlet request and its own request.

I recommend you utilize either servlet request attributes or Jersey request attributes rather than add request headers.

Your filter can add Jersey request properties using ContainerRequest.getProperties().

Your resource class can inject HttpContext and obtain the same properties using HttpContext.getProperties().

Read full article from Jersey - Passing ContainerRequest headers to a resource


Google Home Services Ads Launch In Google Express



Google Home Services Ads Launch In Google Express

Limited to the San Francisco Bay Area, the ads are available to plumbers, locksmiths, house cleaners and handymen.

Ginny Marvin on August 28, 2015 at 12:42 pm

Google has opened up access to its new home services ad program in AdWords Express. The new home services ad program launched in beta in the San Francisco area last month to connect service providers with local residents searching for help. Started initially for locksmiths and plumbers, it has opened up to house cleaners and handymen as well. To get started with AdWords Express home services ads, service providers in the area can fill out this form with contact information, the business' URL and the service sector of the business. Google screens all applicants and uses a third party to conduct background checks. If approved, Google will "organize the information you give us into a polished profile page".

Read full article from Google Home Services Ads Launch In Google Express


Random knight | Calculus VII



Random knight | Calculus VII

A knight walks randomly on the standard chessboard. What is the proportion of time that it will spend at e4?

The answer (1/42) is not hard to get, but I'll take the computational approach first (testing Scilab 5.5.0 beta 1, by the way). Begin by placing the knight at b1 and letting it make five random moves. This is the distribution of its position after five moves:
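The 1/42 can be checked directly without simulation: a random walk on an undirected graph spends time at each vertex in proportion to its degree, the knight-move graph on the 8x8 board has total degree 336, and e4 has the full 8 moves available (variable names here are my own):

```python
from fractions import Fraction

MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def degree(r, c):
    """Number of legal knight moves from square (r, c) on an 8x8 board."""
    return sum(0 <= r + dr < 8 and 0 <= c + dc < 8 for dr, dc in MOVES)

total_degree = sum(degree(r, c) for r in range(8) for c in range(8))   # 336
# e4 is file e (column 4) and rank 4 (row 3, counting rank 1 as row 0).
p_e4 = Fraction(degree(3, 4), total_degree)                            # 8/336 = 1/42
```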


Read full article from Random knight | Calculus VII


Three Parallel Backtracking Designs | Dr Dobb's



Three Parallel Backtracking Designs | Dr Dobb's

Before we get into the meat of this post, let me revisit the serial code I used for illustration last time. Specifically, I want to use a better method for recording queen positions.

Rather than keeping a 2D array that mimics a physical chessboard, I propose using an integer vector with length equal to the number of rows. The data stored in each element will be the column of a placed queen. This eliminates the chance that two queens will be on the same row, and the test for a legal position can simply search all lower indexed elements to see if those elements correspond to the same column (vertical attack) or are along a diagonal from the newly placed queen. If such a case is found, the newly placed queen attacks an already placed queen and stillLegal() will return FALSE since the position has no hope of leading to a legal solution.

It may seem like a "six of one, half dozen of another" situation, but having the simpler and smaller data structure to represent the current state of a partial solution will be a benefit to our parallel solutions. The simpler aspect is seen in the tests needed in stillLegal(). Checking for queens in the same column and testing the diagonal attack squares is easy, as seen in the code below. The positions of queens placed in rows "above" the currently placed queen are all that need to be examined. Any new move generated at the same level simply overwrites the previous column value in the vector and the test ignores any column values that are stored in higher indexed ("below") locations.
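With the vector representation, the legality test is just a scan of the lower-indexed rows. A sketch of the stillLegal() test described above, in Python (names are mine):

```python
def still_legal(pos, row):
    """pos[r] is the column of the queen placed in row r; checks the queen in
    `row` against every queen in a lower-indexed row."""
    for r in range(row):
        same_column = pos[r] == pos[row]                  # vertical attack
        same_diagonal = abs(pos[r] - pos[row]) == row - r  # diagonal attack
        if same_column or same_diagonal:
            return False
    return True
```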


Read full article from Three Parallel Backtracking Designs | Dr Dobb's


Problems for the chess knight ...



Problems for the chess knight ...

Trouble for the knight

The idea of looking at a single chess knight on the board goes all the way back to the 18th century, when Leonhard Euler, a Swiss mathematician, introduced a problem in Berlin in 1758. The problem, usually referred to as the Springerproblem or the Knight's Tour, is given by

Starting with an empty chess board, is there a path that has a knight visit all the fields (black and white) of the board exactly one time?

The knight is particularly interesting because of its strange way of moving. It would be much less interesting to try this with the queen, for example, since most attempts would lead to a solution. The knight, in turn, has a very limited way of moving that keeps the number of accessible fields per move at eight or below.

When Euler first thought of that problem, he imagined an 8x8 board, the regular chess board, while today's mathematical and computerized methods make us want to take a look at bigger boards, too.
But in order to find out about the chances of solving the problem, we have to take a closer look at it first.

Different flavours...

During the last almost 250 years, different ways of looking at the problem have evolved. It is remarkable that even the smallest among them took about 230 years to be sufficiently solved in a way that is scalable even for huge boards.
The problem of the Knight's Tour is often used as an example in graph theory, since it is easy to imagine and to try out, but is nevertheless a very big problem with many different flavours to emphasize. It can be used to show the ideas of symmetry, of directed and undirected tours/graphs, and of paths versus closed tours. Though it is a Hamiltonian problem, solutions can easily be found.
Since it was proven in the early 1990s to have solutions for all boards >= 5x5, it has become even more popular and commonly known, and there have been several attempts and applications in this field, ranging from simple Java applets for demonstration and visualisation purposes up to parallel computation and neural networks (why ever...)

All under a general headline

The Knight's Tour represents a special case of one of the biggest problems in computer science: The Hamilton Cycle.
R. W. Hamilton was an Irish (cheerz!) mathematician of the 19th century. The Hamilton Cycle problem describes the search for a path in a graph that connects all nodes so that they build a cycle, similar to "connect the dots" (Malen-nach-Zahlen), with the slight added difficulty that the numbers are missing and you have to find the right order yourself - and usually there are quite a few dots to connect... Hamilton Cycles belong to the class of NP-complete problems and are therefore believed to be solvable only with an effort that grows exponentially with the size of the problem.

Finding a Knight's Tour is far simpler, since the limitation to a chess board provides some benefits like undirectedness, symmetry and (here we go:) recursion. In fact, finding a Knight's Tour is so "easy" that today's home computers do it for giant boards with some billions of fields in an almost unmeasurably short period of time. You can just wait for it...
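As a sketch of how quickly one is found: backtracking over knight moves, ordered by Warnsdorff's heuristic (fewest onward moves first), produces an open tour of the 8x8 board almost instantly (names and structure are my own):

```python
def knights_tour(n=8, start=(0, 0)):
    """Open knight's tour by backtracking, trying moves in Warnsdorff order.
    Returns an n x n board of visit numbers 0..n*n-1, or None if no tour exists."""
    moves = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    board = [[-1] * n for _ in range(n)]

    def neighbors(r, c):
        return [(r + dr, c + dc) for dr, dc in moves
                if 0 <= r + dr < n and 0 <= c + dc < n and board[r + dr][c + dc] == -1]

    def solve(r, c, step):
        board[r][c] = step
        if step == n * n - 1:
            return True
        # Warnsdorff: visit the squares with the fewest onward moves first.
        for nr, nc in sorted(neighbors(r, c), key=lambda sq: len(neighbors(*sq))):
            if solve(nr, nc, step + 1):
                return True
        board[r][c] = -1   # undo and backtrack
        return False

    return board if solve(start[0], start[1], 0) else None
```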


Read full article from Problems for the chess knight ...

