sysadvent: Day 15 - Down the 'ls' Rabbit Hole
Too often sysadmins are afraid to dive into the source code of our core utilities to see how they really work. We're happy to edit our scripts, but we don't do the same with our command line utilities, libraries, and kernel. Today we're going to do some source diving in those core components. We'll answer the age-old interview question, "What happens when you type ls at the command line and press enter?" The answer to this question has infinite depth, so I'll leave out some detail, but I'll capture the essence of what is going on, and I'll show the source in each component as we go. The pedants in the crowd may find much to gripe about, but hopefully they'll do so by posting further detail in the comments.
Requirements
It'll be helpful if you install the source on your machine for the software we'll be looking at. Below are the commands I used to get the source for the needed packages on Ubuntu 9.10, and similar packages are available for your Linux distribution.
    apt-get install linux-source
    apt-get source coreutils
    apt-get source bash
    apt-get source libc6
    apt-get install manpages-dev
I'm using linux-source version 2.6.31.22.35, coreutils (for the code to ls) version 7.4-2ubuntu1, bash version 4.0, libc6 version 2.10.1-0ubuntu18, and finally manpages-dev to get the programmer's man pages.
Starting Out - strace & bash
One of the most useful tools in the sysadmin's arsenal is strace, a command that shows you the system calls a program makes (and the signals it receives) while it executes. We'll use this tool extensively to figure out what code we are looking for in each component.
Let's start by strace'ing bash when it runs ls. To do so, we'll start a new instance of bash under strace. Note that I'll be cutting the output of strace down a lot in the post for readability.
    adamf@kid-charlemagne:~/foo$ strace bash
    execve("/bin/bash", ["bash"], [/* 30 vars */]) = 0
    [... wow that's a lot of output ...]
    write(2, "adamf@kid-charlemagne:~/foo$ ", 29adamf@kid-charlemagne:~/foo$ ) = 29
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    read(0,
... and that's where the output stops. If you're new to strace, the key to reading it is to make liberal use of man pages to figure out what each system call does. Be aware that the relevant pages are in section 2 of the man pages, so you'll need to do man 2 read to find the page on read; this is because many of the system calls have the same name as regular commands, which are documented in section 1 of the man pages.
The read call is waiting for input on file descriptor 0, which is standard input. So we type ls and hit enter (you'll see more read & write calls as you type).
There's a lot of output, but we know we want to see ls related output, so let's do the simple thing and look at the lines that have ls in them:
    stat("/usr/local/sbin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)
    stat("/usr/local/bin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)
    stat("/usr/sbin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)
    stat("/usr/bin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)
    stat("/sbin/ls", 0x7fff03f1fd60) = -1 ENOENT (No such file or directory)
    stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=114032, ...}) = 0
    stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=114032, ...}) = 0
If we man 2 stat we see that stat returns information about a file if it can find it, and an error if it can't (much more on stat later). In this case bash is searching the directories in my $PATH environment variable in hopes of finding an executable file with the name ls. Bash will stat a candidate path in each directory in my $PATH, in order, and if none of them exists it reports command not found. In this case, Bash found ls in /bin, and that's the last we see of the string ls in our output.
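The search above can be sketched in a few lines of Python, whose os.stat wraps the same stat(2) call seen in the trace. This is only an illustration under simplifying assumptions, not bash's actual code: find_in_path is a hypothetical helper, and it treats "any execute bit set" as executable, whereas a real shell checks permissions for the current user and also consults its hash table and builtins first.

```python
import os
import stat


def find_in_path(command, path=None):
    """Return the first executable regular file named `command` on `path`.

    Hypothetical helper: a simplified sketch of a $PATH search.
    Returns None when no directory contains a match.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty $PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", command)
        try:
            info = os.stat(candidate)  # one stat(2) per candidate path
        except OSError:
            continue  # e.g. ENOENT: move on to the next directory
        # Simplification: accept a regular file with any execute bit set.
        if stat.S_ISREG(info.st_mode) and info.st_mode & 0o111:
            return candidate
    return None  # a shell would report "command not found" here


if __name__ == "__main__":
    print(find_in_path("ls"))
```

Running the script under strace should show a run of stat calls much like the one above, one per $PATH directory until the first hit.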
We don't see ls in our output anymore because once Bash knows it can execute the program it spawns a child process to execute that program, and we haven't told strace to follow children of the command it is tracing. It's the next few lines of strace output that give this spawning away:
    pipe([3, 4]) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2c853217c0) = 30125
If we man 2 pipe and man 2 clone we see that bash is creating a pipe (a pair of file descriptors, one for reading and one for writing; this is how a shell links one command's output to another's input when you give it a | character) and clone'ing itself so that there are two copies of bash running. Remember, every UNIX process is a child of another process, and a brand new process starts out as a copy of its parent. So when does ls actually happen? Let's strace ls and find out!
    adamf@kid-charlemagne:~/foo$ strace ls
    execve("/bin/ls", ["ls"], [/* 30 vars */]) = 0
That first line is the key. execve is the system call that loads and runs a new executable. Once execve runs we're actually ls (well, the loader runs first, but that's another article). Interestingly, the call to execve is in the bash source code, not the ls source code. Let's find it in the bash code:
    adamf@kid-charlemagne:/usr/src/bash-4.0/bash-4.0$ find . | xargs grep -n "execve ("
    ./builtins/exec.def:201: shell_execve (command, args, env);
    ./execute_cmd.c:4323: 5) execve ()
    ./execute_cmd.c:4466: exit (shell_execve (command, args, export_env));
    ./execute_cmd.c:4577: return (shell_execve (execname, args, env));
    ./execute_cmd.c:4653:/* Call execve (), handling interpreting shell scripts, and handling
    ./execute_cmd.c:4656:shell_execve (command, args, env)
    ./execute_cmd.c:4665: execve (command, args, env);
If we look at line 4323 in execute_cmd.c we see this helpful comment:
    /* Execute a simple command that is hopefully defined in a disk file
       somewhere.

       1) fork ()
       2) connect pipes
       3) look up the command
       4) do redirections
       5) execve ()
       6) If the execve failed, see if the file has executable mode set.
          If so, and it isn't a directory, then execute its contents as
          a shell script.
       [...] */
And looking at line 4665 we do see the call to execve. Take a look at the code around execve - it's a bunch of error handling but nothing too hard to understand. What's interesting is what is not there: all of the code after the call handles errors, and none of it handles success. That is because execve only returns if it fails, which makes sense - a successful call to execve means we're running something completely different!
Look around execute_cmd.c at the code around calls to shell_execve and you'll see that that code is fairly straightforward.
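To make the pipe, fork, and execve sequence concrete, here is a minimal sketch in Python, whose os module exposes thin wrappers over these calls. It is not a translation of bash's code: run_and_capture is a hypothetical name, the sketch does no $PATH lookup or redirection handling, and the exit status 127 on exec failure simply follows the usual shell convention. Note how the only code after os.execve is error handling, just as in execute_cmd.c.

```python
import os
import sys


def run_and_capture(argv):
    """Fork, exec argv[0] in the child, and return (exit_code, stdout_bytes).

    Hypothetical helper for illustration; argv[0] must be a full path.
    """
    read_fd, write_fd = os.pipe()      # pipe(2): a read end and a write end
    pid = os.fork()                    # from here on there are two copies of us
    if pid == 0:
        # Child: point standard output at the pipe, then become the program.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execve(argv[0], argv, os.environ)
        except OSError as err:
            # Only reached if execve failed; on success this code is gone.
            os.write(2, f"{argv[0]}: {err.strerror}\n".encode())
        finally:
            os._exit(127)

    # Parent: close our copy of the write end so read() can see end-of-file.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)     # reap the child, like the shell does
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return exit_code, b"".join(chunks)


if __name__ == "__main__":
    print(run_and_capture(["/bin/ls"]))
```

Running this under strace -f (which follows child processes) should show the same pipe and clone pair as above, followed by the child's execve.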
Inside ls(1)
Let's look at what ls is doing by creating a single file in our directory and ls'ing that file under strace.
    adamf@kid-charlemagne:~/foo$ touch bar
    adamf@kid-charlemagne:~/foo$ strace ls bar
    execve("/bin/ls", ["ls", "bar"], [/* 30 vars */]) = 0
Interesting! We can see that bar is now being passed to our execve call. Let's keep looking at the strace output to find bar:
    stat("bar", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    lstat("bar", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467abbe000
    write(1, "bar\n", 4bar
    ) = 4
Right at the end of the strace output we see bar a few times. It looks like bar gets passed to stat, lstat, and write. Working backwards, we can man 2 write to figure out that write sends data to a file descriptor, in this case standard out, which is our screen. So the call to write is just ls printing out bar. The next two system calls, stat and lstat, share a man page; the difference is that stat follows a symbolic link and reports on the file it points to, while lstat reports on the link itself. Let's look in the ls source code for these calls to see why ls does both stat and lstat:
    adamf@kid-charlemagne:/usr/src/coreutils-7.4/src$ grep -n "stat (" ls.c
    967: assert (0 <= stat (Name, &sb)); \
    2437: ? fstat (fd, &dir_stat)
    2438: : stat (name, &dir_stat)) < 0)
    2721: err = stat (absolute_name, &f->stat);
    2730: err = stat (absolute_name, &f->stat);
    2749: err = lstat (absolute_name, &f->stat);
    2837: && stat (linkname, &linkstats) == 0)
That call to lstat stands out amongst the other calls, so it is a pretty good guess that lstat happens for some exceptional reason that the programmer would note with a comment. Looking at line 2749 in ls.c we see an interesting comment a few lines above:
              /* stat failed because of ENOENT, maybe indicating a dangling
                 symlink.  Or stat succeeded, ABSOLUTE_NAME does not refer to a
                 directory, and --dereference-command-line-symlink-to-dir is
                 in effect.  Fall through so that we call lstat instead.  */
            }

        default: /* DEREF_NEVER */
          err = lstat (absolute_name, &f->stat);
          do_deref = false;
          break;
        }
That comment means that if stat has already succeeded and we're not talking about a directory (or if stat failed with ENOENT), ls falls through and calls lstat as well, so that a symlink is reported as the link itself. We can see that this is true by ls'ing a directory under strace:
    adamf@kid-charlemagne:~/foo$ strace ls /home/adamf/foo/
    [...]
    stat("/home/adamf/foo/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    open("/home/adamf/foo/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
    fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
    getdents(3, /* 3 entries */, 32768) = 72
    getdents(3, /* 0 entries */, 32768) = 0
    close(3) = 0
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f873dda4000
    write(1, "bar\n", 4bar
Note that there was no call to lstat this time.
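The stat-then-maybe-lstat decision described in that ls.c comment can be sketched as follows. This is a simplified illustration of the default behaviour for a name given on the command line, not a port of ls: stat_command_line_arg is a hypothetical name, and the other dereference modes that ls supports are ignored.

```python
import os
import stat


def stat_command_line_arg(path):
    """Return (call_used, stat_result) for a name given on the command line.

    Hypothetical helper mirroring the fall-through described in ls.c.
    """
    try:
        info = os.stat(path)           # stat(2): follows symlinks
    except FileNotFoundError:
        info = None                    # ENOENT: maybe a dangling symlink
    if info is not None and stat.S_ISDIR(info.st_mode):
        # A directory, or a symlink to one: the stat result is what we want.
        return "stat", info
    # Not a directory (or stat hit ENOENT): fall through to lstat(2), which
    # describes a symlink itself rather than its target.
    return "lstat", os.lstat(path)


if __name__ == "__main__":
    for name in (".", __file__):
        call, info = stat_command_line_arg(name)
        print(name, call, oct(stat.S_IMODE(info.st_mode)))
```

For a plain file such as bar this makes both calls, matching the first trace, while for a directory it stops after stat, matching the second.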