One from my first job (not mine, though). Java app, deployed to four servers by rsyncing a zip file and unzipping it. One of the four fails on startup, in some way that's hard to trace (I think it might have been a JNI crash). But all four servers have the same OS, same version, same packages installed, same JVM, and it was the same zip file. We check the md5sum of the zip: it matches. In desperation, one of my colleagues writes a script to recursively go through the unpacked app and check the md5sums of all the files. They still match perfectly, and the same files are present on all machines. We get the dev team to try the app - there's a bit more variety in our devboxes than in our servers. Two of them can reproduce the failure, but there's no obvious correlation: one's Java 1.5, one's Java 1.6; one's Debian, one's Gentoo. For every combination there's another developer with a similar machine where it works fine.
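For anyone wanting to run the same kind of check, here's a rough sketch of what a recursive checksum comparison could look like. This is not my colleague's actual script (I don't have it); the function names `md5_manifest` and `diff_manifests` are made up for illustration, and it assumes you can get at both unpacked trees from one machine, or run it on each server and compare the results.

```python
import hashlib
import os


def md5_manifest(root):
    """Map each file's path (relative to root) to the MD5 hex digest of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full_path, "rb") as handle:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full_path, root)] = digest.hexdigest()
    return manifest


def diff_manifests(left, right):
    """Return (only_in_left, only_in_right, differing) as sorted lists of relative paths."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    differing = sorted(p for p in set(left) & set(right) if left[p] != right[p])
    return only_left, only_right, differing
```

Three empty lists from `diff_manifests(md5_manifest(a), md5_manifest(b))` means the two trees hold the same set of files with the same contents. Note that this only compares relative paths and file bytes, nothing else about the machines.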
Read full article from My Hardest Bug | Hacker News