sysadvent: Day 15 - Down the 'ls' Rabbit Hole



sysadvent: Day 15 - Down the 'ls' Rabbit Hole

Too often sysadmins are afraid to dive into the source code of our core utilities to see how they really work. We're happy to edit our scripts but we don't do the same with our command line utilities, libraries, and kernel. Today we're going to do some source diving in those core components. We'll answer the age-old interview question, "What happens when you type ls at the command line and press enter?" The answer to this question has infinite depth, so I'll leave out some detail, but I'll capture the essence of what is going, and I'll show the source in each component as we go. The pedants in the crowd may find much to gripe about but hopefully they'll do so by posting further detail in the comments.

Requirements

It'll be helpful if you install the source on your machine for the software we'll be looking at. Below are the commands I used to get the source for the needed packages on Ubuntu 9.10, and similar packages are available for your Linux distribution.

apt-get install linux-source   apt-get source coreutils   apt-get source bash  apt-get source libc6  apt-get install manpages-dev  

I'm using linux-source version 2.6.31.22.35, coreutils (for the code to ls) version 7.4-2ubuntu1, bash version 3.5.21, and libc6 version 2.10.1-0ubuntu18, and finally manpages-dev to get the programmer's man pages.

Starting Out - strace & bash

One of the most useful tools in the sysadmin's arsenal is strace, a command that will show you most of the standard library and system calls a program makes while it executes. We'll use this tool extensively to figure out what code we are looking for in each component.

Let's start by strace'ing bash when it runs ls. To do so, we'll start a new instance of bash under strace. Note that I'll be cutting the output of strace down a lot in the post for readability.

adamf@kid-charlemagne:~/foo$ strace bash  execve("/bin/bash", ["bash"], [/* 30 vars */]) = 0    [... wow that's a lot of output ...]    write(2, "adamf@kid-charlemagne:~/foo$ ", 29adamf@kid-charlemagne:~/foo$ ) = 29  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0  rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0  read(0,  

... and that's where the output stops. If you're new to strace the key to reading it is to make liberal use of man pages to figure out what each library call does. Be aware that the relevant pages you want are in section 2 of the man pages, so you'll need to do man 2 read to find the page on read; this is because many of the system functions have the same name as regular commands that are found in chapter 1 of the man pages.

The read call is waiting for input on file descriptor 0, which is standard input. So we type ls and hit enter (you'll see more read & write calls as you type).

There's a lot of output, but we know we want to see ls related output, so let's do the simple thing and look at the lines that have ls in them:

stat("/usr/local/sbin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)  stat("/usr/local/bin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)  stat("/usr/sbin/ls", 0x7fff03f1fd60)    = -1 ENOENT (No such file or directory)  stat("/usr/bin/ls", 0x7fff03f1fd60)     = -1 ENOENT (No such file or directory)  stat("/sbin/ls", 0x7fff03f1fd60)        = -1 ENOENT (No such file or directory)  stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=114032, ...}) = 0  stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=114032, ...}) = 0  

If we man 2 stat we see that stat returns information about a file if it can find it, and an error if it can't (much more on stat later). In this case what bash is doing is searching my $PATH environment variable in hopes of finding an executable file with the name ls. Bash will stat every directory in my $PATH, and if it can't find the file it returns command not found. In this case, Bash found ls in /bin, and then that's the last we see of the string ls in our output.

We don't see ls in our output anymore because once Bash knows it can execute the program it spawns a child process to execute that program, and we haven't told strace to follow children of the command it is tracing. It's the next few lines of strace that give this spawning away:

pipe([3, 4])                            = 0  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2c853217c0) = 30125  

If we man 2 pipe and man 2 clone we see that bash is creating a pipe (two file descriptors that can be read and written to; this way a shell can link commands input and output together when you give the shell a | character) and clone'ing itself so that there are two copies of bash running. Remember, every UNIX process is a child of another process, and a brand new process starts out as a copy of its parent. So when does ls actually happen? Let's strace ls and find out!

adamf@kid-charlemagne:~/foo$ strace ls  execve("/bin/ls", ["ls"], [/* 30 vars */]) = 0  

That first line is the key. execve is the library call to load and run a new executable. Once execve runs we're actually ls (well, the loader runs first, but that's another article). Interestingly, the call to execve is in the bash source code, not the ls source code. Let's find it in the bash code:

adamf@kid-charlemagne:/usr/src/bash-4.0/bash-4.0$ find . | xargs grep -n "execve ("  ./builtins/exec.def:201:  shell_execve (command, args, env);  ./execute_cmd.c:4323:   5) execve ()  ./execute_cmd.c:4466:      exit (shell_execve (command, args, export_env));  ./execute_cmd.c:4577:  return (shell_execve (execname, args, env));  ./execute_cmd.c:4653:/* Call execve (), handling interpreting shell scripts, and handling  ./execute_cmd.c:4656:shell_execve (command, args, env)  ./execute_cmd.c:4665:  execve (command, args, env);  

If we look at line 4323 in execute_cmd.c we see this helpful comment:

/* Execute a simple command that is hopefully defined in a disk file  somewhere.    1) fork ()  2) connect pipes  3) look up the command  4) do redirections  5) execve ()  6) If the execve failed, see if the file has executable mode set.  If so, and it isn't a directory, then execute its contents as  a shell script.  [...]  */  

And looking at line 4665 we do see the call to execve. Take a look at the code around execve - it's a bunch of error handling but nothing too hard to understand. What's interesting is what is not there; the code exists only to handle errors and nothing to handle success. That is because execve will only return if there's a failure, which makes sense - a successful call to execve means we're running something completely different!

Look around execute_cmd.c at the code around calls to shell_execve and you'll see that that code is fairly straightforward.

Inside ls(1)

Let's look at what ls is doing by creating a single file in our directory and ls'ing that file under strace.

adamf@kid-charlemagne:~/foo$ touch bar  adamf@kid-charlemagne:~/foo$ strace ls bar  execve("/bin/ls", ["ls", "bar"], [/* 30 vars */]) = 0  

Interesting! We can see that bar is now being passed to our execve call. Let's keep looking at the strace output to find bar:

stat("bar", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0  lstat("bar", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0  fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467abbe000  write(1, "bar\n", 4bar  )                    = 4  

Right at the end of the strace output we see bar a few times. It looks like bar gets passed to stat, lstat, and write. Working backwards, we can man 2 write to figure out that write sends data to a file descriptor, in this case standard out, which is our screen. So the call to write is just ls printing out bar. The next two library calls, stat and lstat, share a man page, with the difference between the commands being that lstat will get information on a symbolic link while stat will only get information on a file. Let's look in in the ls source code for these calls to see why ls does both lstat and stat:

adamf@kid-charlemagne:/usr/src/coreutils-7.4/src$ grep -n "stat (" ls.c  967:      assert (0 <= stat (Name, &sb));       \   2437:      ? fstat (fd, &dir_stat)  2438:      : stat (name, &dir_stat)) < 0)  2721:     err = stat (absolute_name, &f->stat);  2730:         err = stat (absolute_name, &f->stat);  2749:     err = lstat (absolute_name, &f->stat);  2837:         && stat (linkname, &linkstats) == 0)  

That call to lstat stands out amongst the other calls, and so it is a pretty good guess that lstat happens for some exceptional reason that programmer would notate with a comment. Looking at line 2749 in ls.c we see an interesting comment a few lines above:

         /* stat failed because of ENOENT, maybe indicating a dangling               symlink.  Or stat succeeded, ABSOLUTE_NAME does not refer to a               directory, and --dereference-command-line-symlink-to-dir is               in effect.  Fall through so that we call lstat instead.  */          }        default: /* DEREF_NEVER */        err = lstat (absolute_name, &f->stat);        do_deref = false;        break;      }  

That comment means that if we're not talking about a directory and stat has already succeeded, we need to see if we are looking at a symlink. We can see that this is true by ls'ing a directory under strace:

adamf@kid-charlemagne:~/foo$ strace ls /home/adamf/foo/  [...]  stat("/home/adamf/foo/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0  open("/home/adamf/foo/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3  fcntl(3, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)  getdents(3, /* 3 entries */, 32768)     = 72  getdents(3, /* 0 entries */, 32768)     = 0  close(3)                                = 0  fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f873dda4000  write(1, "bar\n", 4bar  

Note that there was no call to lstat this time.


Read full article from sysadvent: Day 15 - Down the 'ls' Rabbit Hole


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts