Building pipelines

Shell pipes pass the output of one command into the input of the next. Learn how to replicate this pattern in Python using Popen, and when to transform data in Python versus composing shell tools.

By the end of this lesson you will be able to:
  • Explain how a shell pipe connects stdout of one process to stdin of the next
  • Implement a two-step pipeline in Python using Popen and stdout=PIPE
  • Decide when to do data manipulation in Python vs composing shell tools

Shell pipes are one of Unix's most powerful ideas. cmd1 | cmd2 routes the stdout of cmd1 directly into the stdin of cmd2, without either command knowing about the other. Thousands of useful data transformations are just combinations of small tools connected this way.

When you automate from Python, you sometimes need to replicate this pattern. Other times, you are better off doing the transformation in Python directly. Knowing when to do each is the skill.

How a shell pipe works

In the shell, sort data.txt | uniq -c runs sort and uniq simultaneously. The OS connects sort's file descriptor 1 (stdout) to uniq's file descriptor 0 (stdin). Data flows between them through an in-kernel buffer — sort writes, uniq reads, both processes run concurrently.

No temporary file is needed to connect them. The data in the pipe lives in kernel memory and never touches disk.
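To make the in-kernel buffer concrete, here is a minimal single-process sketch using os.pipe(). It is only an illustration: a real shell creates the pipe and then hands each end to a separate process, whereas this example writes and reads from the same process.

```python
import os

# Ask the kernel for a pipe: one descriptor to read from, one to write to.
read_fd, write_fd = os.pipe()

# Write a small payload into the kernel buffer (well under the pipe's capacity,
# so this write does not block even though nobody has read yet).
os.write(write_fd, b"banana\napple\n")

# Closing the last write end is what lets the reader eventually see EOF.
os.close(write_fd)

# Read until os.read returns b"", which signals EOF.
chunks = []
while True:
    chunk = os.read(read_fd, 1024)
    if not chunk:
        break
    chunks.append(chunk)
os.close(read_fd)

data = b"".join(chunks)
print(data.decode(), end="")
```

The reader sees EOF only after every copy of the write end has been closed, which is why descriptor bookkeeping matters once several processes share a pipe.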

Replicating a pipe with Popen

subprocess.Popen gives you enough control to replicate this:

import subprocess

# Step 1: start the first process, route its stdout to a pipe
proc1 = subprocess.Popen(
    ["echo", "banana\napple\ncherry"],
    stdout=subprocess.PIPE,
)

# Step 2: start the second process, reading from proc1's stdout
proc2 = subprocess.Popen(
    ["sort"],
    stdin=proc1.stdout,
    stdout=subprocess.PIPE,
    text=True,
)

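# Close the parent's copy of the read end so proc1 receives SIGPIPE if proc2 exits early
proc1.stdout.close()

output, _ = proc2.communicate()
proc1.wait()  # reap the first process so it does not linger as a zombie
print(output)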

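Calling proc1.stdout.close() in the parent process is important: once proc2 has started, the parent still holds its own copy of the pipe's read end. Closing that copy leaves proc2 as the only reader, so if proc2 exits early, proc1 receives SIGPIPE on its next write instead of blocking forever on a full pipe. It does not affect EOF: proc2 sees EOF once every write end of the pipe is closed, which happens when proc1 exits.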
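The effect of closing the parent's copy is easiest to see with a producer that never stops on its own. The sketch below assumes the standard yes and head utilities are installed. It is an illustration of the early-exit case, not part of the lesson's original example.

```python
import subprocess

# Producer that writes "y" lines forever until something stops it.
proc1 = subprocess.Popen(["yes"], stdout=subprocess.PIPE)

# Consumer that reads a single line and then exits.
proc2 = subprocess.Popen(
    ["head", "-n", "1"],
    stdin=proc1.stdout,
    stdout=subprocess.PIPE,
    text=True,
)

# Drop the parent's copy of the read end. After head exits there is no
# reader left, so the next write by yes fails and yes is terminated.
proc1.stdout.close()

first_line, _ = proc2.communicate()
code = proc1.wait()  # returns promptly because yes has been stopped

print(repr(first_line))
print("producer exit status:", code)
```

On Linux the producer's status is typically the negative SIGPIPE signal number. Without the close() call, the parent would keep the read end open, yes would block once the pipe buffer filled, and proc1.wait() would never return.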

You can chain more than two processes. Each Popen reads from the previous one's stdout. The pattern scales, but so does the complexity — beyond two or three steps, consider processing the data in Python between calls instead.
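Chaining generalizes into a small loop. The helper below is a sketch under assumptions: run_pipeline is a hypothetical name, not a subprocess API, and it assumes printf, sort, and uniq are available on the system.

```python
import subprocess


def run_pipeline(commands):
    """Run argv lists as a pipeline; return (final stdout text, exit codes).

    Hypothetical helper for illustration only.
    """
    procs = []
    prev_stdout = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # The child now has its own copy; close the parent's copy so
            # upstream processes get SIGPIPE if a downstream one exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)

    # Only the last process's stdout is still open in the parent.
    raw, _ = procs[-1].communicate()
    codes = [p.wait() for p in procs]
    return raw.decode(), codes


# Equivalent in spirit to: printf '%s\n' b a b | sort | uniq -c
output, codes = run_pipeline([
    ["printf", "%s\n", "b", "a", "b"],
    ["sort"],
    ["uniq", "-c"],
])
print(output, end="")
print(codes)
```

Returning every exit code matters because a failure in an early stage is otherwise invisible: the last process can exit with 0 even when an upstream command failed.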

Doing the transformation in Python

Often the cleanest approach is not to pipe commands together at all. Capture the output of the first command, transform it with Python, then either use the result directly or pass it to the next command via stdin:

import subprocess

# Run the first command and capture
result = subprocess.run(["cat", "/etc/hostname"], capture_output=True, text=True, check=True)
raw = result.stdout.strip()

# Transform in Python
transformed = raw.upper()

# Pass to the next command via stdin
result2 = subprocess.run(
    ["wc", "-c"],
    input=transformed,
    capture_output=True,
    text=True,
    check=True,
)
print(result2.stdout.strip())

The input argument supplies data as the process's stdin: a string when text=True is set, bytes otherwise. This is similar to feeding a command from a shell heredoc.

When to use shell pipes vs Python

Situation | Preferred approach
--------- | ------------------
Composing standard Unix tools (grep, sort, awk) with no logic in between | Shell pipe or shell=True for a fully static command
Any step needs conditional logic, error handling, or data structure manipulation | Python between subprocess calls
Output of one step is large and you want streaming behaviour | Popen with stdout=PIPE
You need the intermediate result for something other than piping | Capture with run(), transform in Python
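For the streaming case, a minimal sketch looks like the following. It assumes the seq utility is installed and uses it only as a stand-in for any command that produces a lot of output.

```python
import subprocess

total = 0

# Popen as a context manager closes the pipe and waits for the child on exit.
with subprocess.Popen(
    ["seq", "1", "100000"],
    stdout=subprocess.PIPE,
    text=True,
) as proc:
    # Iterating over proc.stdout yields one line at a time as it arrives,
    # so the full output is never held in memory at once.
    for line in proc.stdout:
        total += int(line)

returncode = proc.returncode
print(total)
```

Compare this with run(..., capture_output=True), which buffers the entire output before returning it.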

The key insight: a Python variable holding a string is just as good as a pipe for most purposes, and it comes with all of Python's data processing capabilities attached.
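As one illustration of that point, the counting job that sort | uniq -c does in the shell can be handled by collections.Counter once the output is captured. The printf command here is just an assumed stand-in for whatever produces the lines.

```python
import subprocess
from collections import Counter

# Capture the lines from a command (a stand-in producer for this sketch).
result = subprocess.run(
    ["printf", "%s\n", "b", "a", "b"],
    capture_output=True,
    text=True,
    check=True,
)

# Count occurrences in Python instead of piping through sort | uniq -c.
counts = Counter(result.stdout.splitlines())

for word, n in sorted(counts.items()):
    print(n, word)
```

The counts are now a regular mapping, so filtering, thresholds, or JSON output are ordinary Python rather than extra pipeline stages.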

Avoid building long pipeline chains with Popen when Python can do the transformation. Each Popen is another process, another pipe, and another error surface. Fewer moving parts means fewer failure modes.

Where to go next

Next: Python subprocess pipeline — a runnable example that builds a two-step pipeline in Python, processes the output in between, and writes the result to a file.
