Tuesday, May 20, 2014

How subprocess and file descriptors work in Python

Frequently, I use subprocess in Python to spawn other tools to do some tasks I need. Recently, we moved a large piece of software from Linux-based to Windows-based servers. And the nightmare begun.

There are many differences between Linux and Windows in how file descriptors (or file handles) are handled by the Python interpreter. Let see.

Subprocess

It is fairly easy to use subprocess in Python. You just use the Popen constructor. Here is an example calling the ls command and retrieving the output.
import subprocess
p = subprocess.subprocess(['ls'], stdout=subprocess.PIPE)
stdout, stderr = p.communicate()
print stdout

File Descriptors (or File Handles)

File descriptors are the integers that are used to identify the open files (or pipes, fifos, tty, anything) that any running program has currently open. If you do not know what a file descriptor is, have a look at the file descriptor Wikipedia article.

When any program is "forked", the new process (usually) inherits the file descriptors of its parent.

Python and File Descriptors

Python works more or less in a similar way. File descriptors in Python are usually not handled directly. Instead, they are managed through higher level objects, like Python file objects, which automatically handle creating and destroying file descriptors using underlying C libraries.

How to Get Opened File Descriptors in Python

In order for us to troubleshoot our issues, we needed a way to get the valid file descriptors in a Python script. So, we crafted the following script. Note that we only check file descriptors from 0 to 100, since we do not open so many files concurrently.

fd_table_status.py :

import os
import stat

_fd_types = (
    ('REG', stat.S_ISREG),
    ('FIFO', stat.S_ISFIFO),
    ('DIR', stat.S_ISDIR),
    ('CHR', stat.S_ISCHR),
    ('BLK', stat.S_ISBLK),
    ('LNK', stat.S_ISLNK),
    ('SOCK', stat.S_ISSOCK)
)

def fd_table_status():
    result = []
    for fd in range(100):
        try:
            s = os.fstat(fd)
        except:
            continue
        for fd_type, func in _fd_types:
            if func(s.st_mode):
                break
        else:
            fd_type = str(s.st_mode)
        result.append((fd, fd_type))
    return result

def fd_table_status_logify(fd_table_result):
    return ('Open file handles: ' +
            ', '.join(['{0}: {1}'.format(*i) for i in fd_table_result]))

def fd_table_status_str():
    return fd_table_status_logify(fd_table_status())

if __name__=='__main__':
    print fd_table_status_str()

When simply run, it will show all open file descriptors and their respective type:

$> python fd_table_status.py
Open file handles: 0: CHR, 1: CHR, 2: CHR
$>

The output is the same by calling fd_table_status_str() .

Inherited File Descriptors - Linux vs Windows

To see the behavior, we will run the following script in Windows and in Linux. Note that the differentiated output is marked in bold.

test_fd_handling.py :

import fd_table_status
import subprocess
import platform

fds = fd_table_status.fd_table_status_str

if platform.system()=='Windows':
    python_exe = r'C:\Python27\python.exe'
else:
    python_exe = 'python'

print '1) Initial file descriptors:\n' + fds()
f = open('fd_table_status.py', 'r')
print '2) After file open, before Popen:\n' + fds()
p = subprocess.Popen(['python', 'fd_table_status.py'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
print '3) After Popen, before reading piped output:\n' + fds()
result = p.communicate()
print '4) After Popen.communicate():\n' + fds()
del p
print '5) After deleting reference to Popen instance:\n' + fds()
del f
print '6) After deleting reference to file instance:\n' + fds()
print '7) child process had the following file descriptors:'
print result[0][:-1]

Linux output

1) Initial file descriptors:
Open file handles: 0: CHR, 1: CHR, 2: CHR
2) After file open, before Popen:
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG
3) After Popen, before reading piped output:
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG, 5: FIFO, 6: FIFO, 8: FIFO
4) After Popen.communicate():
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG
5) After deleting reference to Popen instance:
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG
6) After deleting reference to file instance:
Open file handles: 0: CHR, 1: CHR, 2: CHR
7) child process had the following file descriptors:
Open file handles: 0: FIFO, 1: FIFO, 2: FIFO, 3: REG

Windows output

1) Initial file descriptors:
Open file handles: 0: CHR, 1: CHR, 2: CHR
2) After file open, before Popen:
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG
3) After Popen, before reading piped output:
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG, 4: FIFO, 5: FIFO, 6: FIFO
4) After Popen.communicate():
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG, 5: FIFO, 6: FIFO
5) After deleting reference to Popen instance:
Open file handles: 0: CHR, 1: CHR, 2: CHR, 3: REG
6) After deleting reference to file instance:
Open file handles: 0: CHR, 1: CHR, 2: CHR
7) child process had the following file descriptors:
Open file handles: 0: FIFO, 1: FIFO, 2: FIFO

Step-by-step Output Review

I will be tracing the output of both runs step-by-step and explain (or try to explain) the output.
  1. The 3 initial file descriptors 0, 1, 2, (stdin, stdout, stderr) are connected to the TTY (the controlling terminal) in both cases. Type CHR indicates a special character device, which is our attached terminal (or pseudo-terminal).
  2. After a file is opened (for reading, but it does not matter), an new file descriptor (3) is added in both cases. Type REG indicates a regular file.
  3. After the Popen call, three additional file descriptors are added, 1 for each anonymous pipe created during the spawn of the new program. The numbers of the file descriptors do not matter, so do not pay attention to them. It is up to the OS to assign file descriptor values, so this is accepted. All goes well up to here. Type FIFO indicates a pipe (named pipe or anonymous pipe).
  4. Popen.communicate() does the following: a) sends any input to the child-process (we do not specify any), b) closes the input pipe, and c) reads all available output (both stdout and stderr) until the child terminates. In this step, we have different behavior in Windows and in Linux. In Linux, all three file descriptors in the parent process are closed, while in Windows the two output file descriptors (used for stdout and stderr by the child-process) are retained. Despite the fact that the child-process has terminated, in Windows, the two pipes remain open. This has a significant implication: if the Popen instance is not destroyed (e.g. by keeping a reference somehow, and thus not allowing the garbage collector to destroy the object), the pipes remain opened. If this is continued in further calls, the "zombie" file descriptors from the pipes will pile up, and eventually cause a Python exception "IOError: [Errno 24] Too many open files". Remember, this happens only in Windows!
  5. The reference to the Popen instance is removed, the Garbage Collector destroys the object, and the remaining pipes are also destroyed in Windows. Thus, the file descriptors are now the same with the Linux run. Expected behavior.
  6. The same happens when deleting the reference to the file object. Expected behavior.
  7. Now, we look at the file descriptors of the child-process (we retrieved the output of the child program when we did the communicate() call previously). In the Linux run, we see that the child-process has the previously opened file also available as a valid file descriptor, which is the expected behavior. But in the Windows run, we see that the child process does not have this file descriptor available! This is probably caused by the Windows-version of the Python interpreter. I have run other command-line tools and examined their behavior. It seems that the Python interpreter in Windows closes all file descriptors (besides 0, 1, 2) upon start-up. This requires further investigation to confirm the observed behavior.

Wrapper Python Script to Close All File Descriptors

Two simple scripts follow, which close all file descriptors before running a command. Both are called with the command to run as the 1st command-line argument, and the arguments to the command as the following command-line arguments.

Running the final program as a subprocess.

import sys
import os
import subprocess

if __name__=='__main__':
    os.closerange(3, 100)
    subprocess_args = sys.argv[1:]
    if not subprocess_args:
        print ("USAGE: python {0} .....\n\n"
               "where are arguments passed to subprocess.Popen")
        sys.exit(1)
    popen = subprocess.Popen(subprocess_args, stdin=0, stdout=1, stderr=2)
    exit_status = popen.wait()
    sys.exit(exit_status)

Running the final program using execv.

import sys
import os

if __name__=='__main__':
    os.closerange(3, 100)
    subprocess_args = sys.argv[1:]
    if not subprocess_args:
        print ("USAGE: python {0} .....\n\n"
               "where are arguments passed to os.execvp()")
        sys.exit(1)
    os.execvp(subprocess_args[0], subprocess_args)

Conclusions

Some important conclusions:
  • Make sure that Popen instances are destroyed.
  • If you:
    a) are in Windows, and
    b) are spawning an external tool to do some processing, and
    c) have opened files that you want to work on (e.g. move) concurrently,
    THEN: use one of the above wrapper scripts. It might save you a lot of trouble.

Further Reading and Information

Some more information can be found in Python PEP-446, which might explain the behavior observed above. May be a better solution can be found for the case of inherited file handles in Windows.

An alternative (and probably better) way for achieving the same result (but avoiding the use of the above wrapper scripts) is to make a Win32 system call and mark a file descriptor as non-inheritable. In the following example code, I define an open() replacement that opens the file and marks the resulting file descriptor as non-inheritable.

import msvcrt
import win32api
import win32con
def my_open(*args, **kwargs):
    f = open(*args, **kwargs)
    win32api.SetHandleInformation(
        msvcrt.get_osfhandle(f.fileno()),
        win32con.HANDLE_FLAG_INHERIT,
        0)
    return f

Although this behavior does not cause the same issues in Linux as it does in Windows, you can achieve the same behavior by setting close_fd=True in the subprocess.Popen() constructor. In Linux, it works fine.


1 comment:

Krishna Chaurasia said...

Thanks for the informative post!