capturer: Easily capture stdout/stderr of the current process and subprocesses


The capturer package makes it easy to capture the stdout and stderr streams of the current process and subprocesses. Output can be relayed to the terminal in real time but is also available to the Python program for additional processing. It’s currently tested on CPython 2.6, 2.7, 3.4, 3.5, 3.6 and PyPy (2.7). It’s tested on Linux and Mac OS X and may work on other Unix-like systems, but it definitely won’t work on Windows (due to the use of the platform-dependent pty module). For usage instructions please refer to the documentation.

Status

The capturer package was developed as a proof of concept over the course of a weekend, because I was curious to see if it could be done (reliably). After a weekend of extensive testing it seems to work fairly well so I’m publishing the initial release as version 1.0, however I still consider this a proof of concept because I don’t have extensive “production” experience using it yet. Here’s hoping it works as well in practice as it did during my testing :-).

Installation

The capturer package is available on PyPI which means installation should be as simple as:

$ pip install capturer

There’s actually a multitude of ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you then read up on your options before returning to these instructions ;-).

Getting started

The easiest way to capture output is to use a context manager:

import subprocess
from capturer import CaptureOutput

with CaptureOutput() as capturer:
    # Generate some output from Python.
    print "Output from Python"
    # Generate output from a subprocess.
    subprocess.call(["echo", "Output from a subprocess"])
    # Get the output in each of the supported formats.
    assert capturer.get_bytes() == b'Output from Python\r\nOutput from a subprocess\r\n'
    assert capturer.get_lines() == [u'Output from Python', u'Output from a subprocess']
    assert capturer.get_text() == u'Output from Python\nOutput from a subprocess'

The use of a context manager (the with statement) ensures that output capturing is enabled and disabled at the appropriate time, regardless of whether exceptions interrupt the normal flow of processing.

Note that the first call to get_bytes(), get_lines() or get_text() will stop the capturing of output by default. This is intended as a sane default to prevent partial reads (which can be confusing as hell when you don’t have experience with them). So we could have simply used print to show the results without causing a recursive “captured output is printed and then captured again” loop. There’s an optional partial=True keyword argument that can be used to disable this behavior (please refer to the documentation for details).
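
For example, here’s a minimal sketch of how partial=True differs from the default. The echoed strings are arbitrary and only the final (non-partial) read is asserted on, because a partial read may not have caught up with the subprocess yet:

import subprocess
from capturer import CaptureOutput

with CaptureOutput() as capturer:
    subprocess.call(["echo", "first chunk"])
    # partial=True returns whatever has been captured so far without
    # terminating the relay process, so capturing simply continues.
    partial_text = capturer.get_text(partial=True)
    subprocess.call(["echo", "second chunk"])
    # Without partial=True this first "real" read stops the capturing.
    full_text = capturer.get_text()

assert "first chunk" in full_text
assert "second chunk" in full_text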

Design choices

There are existing solutions out there to capture the stdout and stderr streams of (Python) processes. The capturer package was created for a very specific use case that wasn’t catered for by existing solutions (that I could find). This section documents the design choices that guided the development of the capturer package:

Intercepts writes to low level file descriptors

Libraries like capture and iocapture replace Python’s sys.stdout and sys.stderr file objects with fake file objects (based on StringIO). This enables capturing of (most) output written to the stdout and stderr streams from the same Python process, however any output from subprocesses is unaffected by the redirection and is not captured.

The capturer package instead intercepts writes to low level file descriptors (similar to and inspired by how pytest does it). This enables capturing of output written to the standard output and error streams from the same Python process as well as any subprocesses.
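
To make the difference concrete, here’s an illustrative sketch (not capturer’s actual implementation, which uses a pseudo terminal) that contrasts swapping out sys.stdout with redirecting file descriptor 1 itself:

import io
import os
import subprocess
import sys
import tempfile

# Swapping out sys.stdout only affects writes that go through the Python
# file object, so the subprocess below still writes to the real terminal.
fake_stdout = io.StringIO()
original_stdout, sys.stdout = sys.stdout, fake_stdout
print("captured")                          # ends up in fake_stdout
subprocess.call(["echo", "not captured"])  # bypasses sys.stdout entirely
sys.stdout = original_stdout

# Redirecting file descriptor 1 itself (similar in spirit to what capturer
# does, although capturer uses a pseudo terminal) catches both.
with tempfile.TemporaryFile() as scratch:
    sys.stdout.flush()
    saved_fd = os.dup(1)
    os.dup2(scratch.fileno(), 1)
    try:
        subprocess.call(["echo", "captured as well"])
    finally:
        os.dup2(saved_fd, 1)
        os.close(saved_fd)
    scratch.seek(0)
    print(scratch.read().decode("UTF-8"))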

Uses a pseudo terminal to emulate a real terminal

The capturer package uses a pseudo terminal created using pty.openpty() to capture output. This means subprocesses will emit ANSI escape sequences because they think they’re connected to a terminal. In the current implementation you can’t opt out of this, but feel free to submit a feature request to change this :-). This does have some drawbacks:

  • The use of pty.openpty() means you need to be running in a UNIX like environment for capturer to work (Windows definitely isn’t supported).

  • All output captured is relayed on the stderr stream by default, so capturing changes the semantics of your programs. How much this matters obviously depends on your use case. For the use cases that triggered me to create capturer it doesn’t matter, which explains why this is the default mode.

    There is experimental support for capturing stdout and stderr separately and relaying captured output to the appropriate original stream: you call CaptureOutput(merged=False) and then use the stdout and stderr attributes of the CaptureOutput object to get at the output captured on each stream (see the sketch right after this list).

    I say experimental because this method of capturing can unintentionally change the order in which captured output is emitted. To avoid interleaving output emitted on the stdout and stderr streams within a single line (which would most likely result in incomprehensible output), output is relayed on each stream separately after each line break. This means interactive prompts that block on reading from standard input without emitting a line break won’t show up (until it’s too late ;-).
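
Here is a sketch of what that experimental mode looks like in practice. The echoed strings are arbitrary and the assertions only check for substrings because, as noted above, the exact ordering of the relayed output can differ:

import subprocess
import sys
from capturer import CaptureOutput

with CaptureOutput(merged=False) as capturer:
    # Emit something on each of the two streams.
    subprocess.call(["sh", "-c", "echo to stdout; echo to stderr >&2"])
    sys.stderr.write("more output on stderr\n")
    # The stdout and stderr attributes are PseudoTerminal objects, so they
    # offer the familiar get_text(), get_lines(), save_to_path() and friends.
    stdout_text = capturer.stdout.get_text()
    stderr_text = capturer.stderr.get_text()

assert "to stdout" in stdout_text
assert "to stderr" in stderr_text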

Relays output to the terminal in real time

The main use case of capturer is to capture all output of a snippet of Python code (including any output by subprocesses) but also relay the output to the terminal in real time. This has a couple of useful properties:

  • Long running operations can provide the operator with real time feedback by emitting output on the terminal. This sounds obvious (and it is!) but it is non-trivial to implement (an understatement :-) when you also want to capture the output.
  • Programs like gpg and ssh that use interactive password prompts will render their password prompt on the terminal in real time. This avoids the awkward interaction where a password prompt is silenced but the program still hangs, waiting for input on stdin.

Contact

The latest version of capturer is available on PyPI and GitHub. The documentation is hosted on Read the Docs. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at peter@peterodding.com.

License

This software is licensed under the MIT license.

© 2017 Peter Odding.

A big thanks goes out to the pytest developers because pytest’s mechanism for capturing the output of subprocesses provided inspiration for the capturer package. No code was copied, but both projects are MIT licensed anyway, so it’s not like it’s very relevant :-).

API documentation

The following documentation is based on the source code of version 2.4 of the capturer package.

Easily capture stdout/stderr of the current process and subprocesses.

capturer.interpret_carriage_returns(text)

Alias to humanfriendly.terminal.clean_terminal_output().

In capturer version 2.1.2 the interpret_carriage_returns() function was obsoleted by humanfriendly.terminal.clean_terminal_output(). This alias remains for backwards compatibility.
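
As a quick illustration of what that cleanup does (the exact return value is whatever clean_terminal_output() produces; the comment below only shows the expected shape):

from capturer import interpret_carriage_returns

# Output that overwrites itself using carriage returns is reduced to what a
# terminal would end up showing, returned as a list of lines.
print(interpret_carriage_returns('Progress: 10%\rProgress: 100%\nDone\n'))
# Expected to print something like: ['Progress: 100%', 'Done']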

capturer.DEFAULT_TEXT_ENCODING = 'UTF-8'

The name of the default character encoding used to convert captured output to Unicode text (a string).

capturer.GRACEFUL_SHUTDOWN_SIGNAL = 10

The number of the UNIX signal used to communicate graceful shutdown requests from the main process to the output relay process (an integer). See also enable_graceful_shutdown().

capturer.TERMINATION_DELAY = 0.01

The number of seconds to wait before terminating the output relay process (a floating point number).

capturer.PARTIAL_DEFAULT = False

Whether partial reads are enabled or disabled by default (a boolean).

capturer.STDOUT_FD = 1

The number of the file descriptor that refers to the standard output stream (an integer).

capturer.STDERR_FD = 2

The number of the file descriptor that refers to the standard error stream (an integer).

capturer.enable_old_api()[source]

Enable backwards compatibility with the old API.

This function is called when the capturer module is imported. It modifies the CaptureOutput class to install method proxies for get_handle(), get_bytes(), get_lines(), get_text(), save_to_handle() and save_to_path().

capturer.create_proxy_method(name)[source]

Create a proxy method for use by enable_old_api().

Parameters: name – The name of the PseudoTerminal method to call when the proxy method is called.
Returns: A proxy method (a callable) to be installed on the CaptureOutput class.
class capturer.MultiProcessHelper[source]

Helper to spawn and manipulate child processes using multiprocessing.

This class serves as a base class for CaptureOutput and PseudoTerminal because both classes need the same child process handling logic.

__init__()[source]

Initialize a MultiProcessHelper object.

start_child(target)[source]

Start a child process using multiprocessing.Process.

Parameters: target – The callable to run in the child process. Expected to take a single argument which is a multiprocessing.Event to be set when the child process has finished initialization.
stop_children()[source]

Gracefully shut down all child processes.

Child processes are expected to call enable_graceful_shutdown() during initialization.

wait_for_children()[source]

Wait for all child processes to terminate.

enable_graceful_shutdown()[source]

Register a signal handler that converts GRACEFUL_SHUTDOWN_SIGNAL to an exception.

Used by capture_loop() to gracefully interrupt the blocking os.read() call when the capture loop needs to be terminated (this is required for coverage collection).

raise_shutdown_request(signum, frame)[source]

Raise ShutdownRequested when GRACEFUL_SHUTDOWN_SIGNAL is received.

class capturer.CaptureOutput(merged=True, encoding='UTF-8', termination_delay=0.01, chunk_size=1024, relay=True)[source]

Context manager to capture the standard output and error streams.

__init__(merged=True, encoding='UTF-8', termination_delay=0.01, chunk_size=1024, relay=True)[source]

Initialize a CaptureOutput object.

Parameters:
  • merged – Whether to capture and relay the standard output and standard error streams as one stream (a boolean, defaults to True). When this is False the stdout and stderr attributes of the CaptureOutput object are PseudoTerminal objects that can be used to get at the output captured from each stream separately.
  • encoding – The name of the character encoding used to decode the captured output (a string, defaults to DEFAULT_TEXT_ENCODING).
  • termination_delay – The number of seconds to wait before terminating the output relay process (a floating point number, defaults to TERMINATION_DELAY).
  • chunk_size – The maximum number of bytes to read from the captured streams on each call to os.read() (an integer).
  • relay – If this is True (the default) then captured output is relayed to the terminal or parent process, if it’s False the captured output is hidden (swallowed).
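
For instance, the relay option can be used to capture output without echoing it to the terminal. A small sketch based on the parameters above (the echoed string is arbitrary):

import subprocess
from capturer import CaptureOutput

# Capture quietly: the captured output is not relayed to the terminal.
with CaptureOutput(relay=False) as capturer:
    subprocess.call(["echo", "captured but not shown"])
    captured_text = capturer.get_text()

assert "captured but not shown" in captured_text
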
initialize_stream(file_obj, expected_fd)[source]

Initialize one or more Stream objects to capture a standard stream.

Parameters:
  • file_obj – A file-like object with a fileno() method.
  • expected_fd – The expected file descriptor of the file-like object.
Returns: The Stream connected to the file descriptor of the file-like object.

By default this method just initializes a Stream object connected to the given file-like object and its underlying file descriptor (a simple one-liner).

If however the file descriptor of the file-like object doesn’t have the expected value (expected_fd), two Stream objects will be created instead: one stream object will be connected to the file descriptor of the file-like object and the other will be connected to the file descriptor that was expected (expected_fd).

This approach is intended to make sure that “nested” output capturing works as expected: Output from the current Python process is captured from the file descriptor of the file-like object while output from subprocesses is captured from the file descriptor given by expected_fd (because the operating system defines special semantics for the file descriptors with the numbers one and two that we can’t just ignore).

For more details refer to issue 2 on GitHub.
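
A sketch of the nested scenario referred to above (not a verbatim test case; it just shows the shape of nested CaptureOutput blocks that this logic is meant to support):

import subprocess
from capturer import CaptureOutput

with CaptureOutput() as outer:
    print("output for the outer capturer")
    with CaptureOutput() as inner:
        # Output emitted inside the inner block, including subprocess
        # output, should be visible to the inner capturer.
        subprocess.call(["echo", "output for the inner capturer"])
        assert "output for the inner capturer" in inner.get_text()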

__enter__()[source]

Automatically call start_capture() when entering a with block.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Automatically call finish_capture() when leaving a with block.

is_capturing

True if output is being captured, False otherwise.

start_capture()[source]

Start capturing the standard output and error streams.

Raises: TypeError when output is already being captured.

This method is called automatically when using the capture object as a context manager. It’s provided under a separate name in case someone wants to extend CaptureOutput and build their own context manager on top of it.

finish_capture()[source]

Stop capturing the standard output and error streams.

This method is called automatically when using the capture object as a context manager. It’s provided under a separate name in case someone wants to extend CaptureOutput and build their own context manager on top of it.
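
For example, a wrapper that avoids the with statement might look roughly like this (a sketch of the pattern, mirroring what the context manager does for you; the printed string is arbitrary):

from capturer import CaptureOutput

capturer = CaptureOutput()
capturer.start_capture()
try:
    print("output captured without a with block")
    text = capturer.get_text()
finally:
    # Mirror what __exit__() does when the context manager is used.
    capturer.finish_capture()

assert "output captured without a with block" in text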

allocate_pty(relay_fd=None, output_queue=None, queue_token=None)[source]

Allocate a pseudo terminal.

Internal shortcut for start_capture() to allocate multiple pseudo terminals without code duplication.

merge_loop(started_event)[source]

Merge and relay output in a child process.

This internal method is used when standard output and standard error are being captured separately. It’s responsible for emitting each captured line on the appropriate stream without interleaving text within lines.

get_bytes(partial=False)

Get the captured output as binary data.

Parameters: partial – Refer to get_handle() for details.
Returns: The captured output as a binary string.

Note

This method is a proxy for the get_bytes() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.

get_handle(partial=False)

Get the captured output as a Python file object.

Parameters: partial – If True (not the default) the partial output captured so far is returned, otherwise (so by default) the relay process is terminated and output capturing is disabled before returning the captured output (the default is intended to protect unsuspecting users against partial reads).
Returns: The captured output as a Python file object. The file object’s current position is reset to zero before this function returns.

This method is useful when you’re dealing with arbitrary amounts of captured data that you don’t want to load into memory just so you can save it to a file again. In fact, in that case you might want to take a look at save_to_path() and/or save_to_handle() :-).

Warning

Two caveats about the use of this method:

  1. If partial is True (not the default) the output can end in a partial line, possibly in the middle of an ANSI escape sequence or a multi byte character.
  2. If you close this file handle you just lost your last chance to get at the captured output! (calling this method again will not give you a new file handle)

Note

This method is a proxy for the get_handle() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.
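
For example, the handle can be used to stream captured output to a file without first loading it into memory. In this sketch shutil and the filename are only used for illustration, and the destination is opened in binary mode on the assumption that the handle yields bytes:

import shutil
import subprocess
from capturer import CaptureOutput

with CaptureOutput() as capturer:
    subprocess.call(["echo", "some captured output"])
    # get_handle() returns a file object whose position has been reset to
    # the start, so its contents can be copied to another file in chunks.
    with open("captured-output.txt", "wb") as destination:
        shutil.copyfileobj(capturer.get_handle(), destination)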

get_lines(interpreted=True, partial=False)

Get the captured output split into lines.

Parameters:
  • interpreted – If True (the default) the captured output is passed through interpret_carriage_returns() before being split into lines.
  • partial – Refer to get_handle() for details.
Returns: The captured output as a list of Unicode strings.

Warning

If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).

Note

This method is a proxy for the get_lines() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.

get_text(interpreted=True, partial=False)

Get the captured output as a single string.

Parameters:
  • interpreted – If True (the default) the captured output is passed through interpret_carriage_returns().
  • partial – Refer to get_handle() for details.
Returns: The captured output as a Unicode string.

Warning

If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).

Note

This method is a proxy for the get_text() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.

save_to_handle(handle, partial=False)

Save the captured output to an open file handle.

Parameters:
  • handle – A writable file-like object.
  • partial – Refer to get_handle() for details.

Note

This method is a proxy for the save_to_handle() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.

save_to_path(filename, partial=False)

Save the captured output to a file.

Parameters:
  • filename – The pathname of the file where the captured output should be written to (a string).
  • partial – Refer to get_handle() for details.

Note

This method is a proxy for the save_to_path() method of the PseudoTerminal class. It requires merged to be True and it expects that start_capture() has been called. If this is not the case then TypeError is raised.

class capturer.OutputBuffer(fd)[source]

Helper for CaptureOutput.merge_loop().

Buffers captured output and flushes to the appropriate stream after each line break.

__init__(fd)[source]

Initialize an OutputBuffer object.

Parameters: fd – The number of the file descriptor where output should be flushed (an integer).
add(output)[source]

Add output to the buffer and flush appropriately.

Parameters: output – The output to add to the buffer (a string).
flush()[source]

Flush any remaining buffered output to the stream.

class capturer.PseudoTerminal(encoding, termination_delay, chunk_size, relay_fd, output_queue, queue_token)[source]

Helper for CaptureOutput.

Manages capturing of output and exposing the captured output.

__init__(encoding, termination_delay, chunk_size, relay_fd, output_queue, queue_token)[source]

Initialize a PseudoTerminal object.

Parameters:
  • encoding – The name of the character encoding used to decode the captured output (a string, defaults to DEFAULT_TEXT_ENCODING).
  • termination_delay – The number of seconds to wait before terminating the output relay process (a floating point number, defaults to TERMINATION_DELAY).
  • chunk_size – The maximum number of bytes to read from the captured stream(s) on each call to os.read() (an integer).
  • relay_fd – The number of the file descriptor where captured output should be relayed to (an integer or None if output_queue and queue_token are given).
  • output_queue – The multiprocessing queue where captured output chunks should be written to (a multiprocessing.Queue object or None if relay_fd is given).
  • queue_token – A unique identifier added to each output chunk written to the queue (any value or None if relay_fd is given).
attach(stream)[source]

Attach a stream to the pseudo terminal.

Parameters: stream – A Stream object.
start_capture()[source]

Start the child process(es) responsible for capturing and relaying output.

finish_capture()[source]

Stop the process of capturing output and destroy the pseudo terminal.

close_pseudo_terminal()[source]

Close the pseudo terminal’s master/slave file descriptors.

restore_streams()[source]

Restore the stream(s) attached to the pseudo terminal.

get_handle(partial=False)[source]

Get the captured output as a Python file object.

Parameters: partial – If True (not the default) the partial output captured so far is returned, otherwise (so by default) the relay process is terminated and output capturing is disabled before returning the captured output (the default is intended to protect unsuspecting users against partial reads).
Returns: The captured output as a Python file object. The file object’s current position is reset to zero before this function returns.

This method is useful when you’re dealing with arbitrary amounts of captured data that you don’t want to load into memory just so you can save it to a file again. In fact, in that case you might want to take a look at save_to_path() and/or save_to_handle() :-).

Warning

Two caveats about the use of this method:

  1. If partial is True (not the default) the output can end in a partial line, possibly in the middle of an ANSI escape sequence or a multi byte character.
  2. If you close this file handle you just lost your last chance to get at the captured output! (calling this method again will not give you a new file handle)
get_bytes(partial=False)[source]

Get the captured output as binary data.

Parameters: partial – Refer to get_handle() for details.
Returns: The captured output as a binary string.
get_lines(interpreted=True, partial=False)[source]

Get the captured output split into lines.

Parameters:
  • interpreted – If True (the default) the captured output is passed through interpret_carriage_returns() before being split into lines.
  • partial – Refer to get_handle() for details.
Returns: The captured output as a list of Unicode strings.

Warning

If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).

get_text(interpreted=True, partial=False)[source]

Get the captured output as a single string.

Parameters:
  • interpreted – If True (the default) the captured output is passed through interpret_carriage_returns().
  • partial – Refer to get_handle() for details.
Returns: The captured output as a Unicode string.

Warning

If partial is True (not the default) the output can end in a partial line, possibly in the middle of a multi byte character (this may cause decoding errors).

save_to_handle(handle, partial=False)[source]

Save the captured output to an open file handle.

Parameters:
  • handle – A writable file-like object.
  • partial – Refer to get_handle() for details.
save_to_path(filename, partial=False)[source]

Save the captured output to a file.

Parameters:
  • filename – The pathname of the file where the captured output should be written to (a string).
  • partial – Refer to get_handle() for details.
capture_loop(started_event)[source]

Continuously read from the master end of the pseudo terminal and relay the output.

This function is run in the background by start_capture() using the multiprocessing module. Its role is to read output emitted on the master end of the pseudo terminal and relay this output to the real terminal (so the operator can see what’s happening in real time) as well as to a temporary file (for additional processing by the caller).

class capturer.Stream(fd)[source]

Container for standard stream redirection logic.

Used by CaptureOutput to temporarily redirect the standard output and standard error streams.

is_redirected

True once redirect() has been called, False when redirect() hasn’t been called yet or restore() has since been called.

__init__(fd)[source]

Initialize a Stream object.

Parameters: fd – The file descriptor to be redirected (an integer).
redirect(target_fd)[source]

Redirect output written to the file descriptor to another file descriptor.

Parameters: target_fd – The file descriptor that should receive the output written to the file descriptor given to the Stream constructor (an integer).
Raises: TypeError when the file descriptor is already being redirected.
restore()[source]

Stop redirecting output written to the file descriptor.

exception capturer.ShutdownRequested[source]

Raised by raise_shutdown_request() to signal graceful termination requests (in capture_loop()).