Python subprocess pipeline
Chain subprocess calls, process the output in Python between steps, and write the final result to a file — a complete worked example.
- Chain two subprocess calls with Python processing in between
- Filter and transform captured output using standard Python
- Write the final processed output to a file
The previous lesson explained when to use Python between subprocess calls instead of direct shell pipes. This lesson shows the pattern in a complete, runnable example: get a directory listing, filter it with Python logic, and write the result to a file.
The pattern
The structure of any Python-mediated pipeline is the same:
- Run the first command and capture its output.
- Process the output in Python (filter, transform, aggregate).
- Either write to a file or pass the result to the next command.
The advantage over a shell pipe is that step 2 can use dictionaries, regular expressions, external libraries, conditionals, and anything else Python offers.
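As a sketch of that flexibility, the snippet below runs a first command, groups its output with a regular expression and a Counter, and hands the result to a second command. Both child commands are small Python one-liners launched through sys.executable so the example runs on any platform; they are stand-ins chosen for illustration, and any real command that prints lines could take their place.

```python
import re
import subprocess
import sys
from collections import Counter

# Step 1: run the first command and capture its output.
# Assumption: a Python one-liner stands in for a real command such as `ls`.
producer = "print('a.py'); print('b.txt'); print('c.py'); print('README')"
first = subprocess.run(
    [sys.executable, "-c", producer],
    capture_output=True,
    text=True,
    check=True,
)

# Step 2: process in Python. Count names per extension with a regex and a Counter.
ext_pattern = re.compile(r"\.(\w+)$")
counts = Counter()
for line in first.stdout.splitlines():
    match = ext_pattern.search(line)
    counts[match.group(1) if match else "(none)"] += 1

report = "\n".join(f"{ext} {n}" for ext, n in sorted(counts.items()))

# Step 3: pass the processed text to a second command through stdin.
# Assumption: this one-liner stands in for a real consumer such as `wc -l`.
consumer = "import sys; print(len(sys.stdin.read().splitlines()))"
second = subprocess.run(
    [sys.executable, "-c", consumer],
    input=report,
    capture_output=True,
    text=True,
    check=True,
)
line_count = int(second.stdout.strip())

print(report)
print(f"{line_count} distinct extensions")
```

The grouping in step 2 would take noticeably more effort with shell tools alone, while in Python it is a loop and a dictionary-like counter.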
A worked example
The example below:
- Uses os.listdir() to get directory contents (equivalent to running ls)
- Filters for Python files in Python (the "in between" step)
- Writes the filtered list to a file
import os

# Step 1: Get directory contents
# In a real script this might be subprocess.run(["find", ".", "-name", "*.py"])
# Here we use os.listdir(), which feeds the later steps in the same way
files = os.listdir(".")

# Step 2: Filter in Python, keeping only .py files
py_files = [name for name in sorted(files) if name.endswith(".py")]

# Step 3: Write to a file
with open("py_files.txt", "w") as f:
    for name in py_files:
        f.write(name + "\n")

print(f"Found {len(py_files)} Python files")

The filtering step is the key: arbitrary Python logic decides what passes through. You can check file sizes, look up metadata, or apply regex patterns, all of which are awkward to express in a bare shell pipe.
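A richer filter might combine a regex on the name with a size check. The sketch below is one way to write it; the helper name select_files, the pattern, and the size threshold are all hypothetical choices made for illustration, and it runs against a temporary directory so it does not depend on your own files.

```python
import re
import tempfile
from pathlib import Path


def select_files(directory, pattern, min_bytes):
    """Return sorted names of regular files matching pattern with size >= min_bytes."""
    regex = re.compile(pattern)
    selected = []
    for entry in Path(directory).iterdir():
        if not entry.is_file():
            continue  # skip subdirectories and other non-regular entries
        if regex.search(entry.name) and entry.stat().st_size >= min_bytes:
            selected.append(entry.name)
    return sorted(selected)


# Usage on a throwaway directory with three sample files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "test_big.py").write_text("x = 1\n" * 50)
    Path(tmp, "test_small.py").write_text("")
    Path(tmp, "notes.txt").write_text("not python\n" * 50)
    chosen = select_files(tmp, r"^test_.*\.py$", min_bytes=10)

print(chosen)  # ['test_big.py']
```

Only test_big.py passes both conditions: test_small.py is empty, and notes.txt does not match the pattern.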
Try it
The runner below chains an actual subprocess call to Python processing and writes to an in-memory buffer:
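A minimal sketch of such a run, assuming a short Python script (launched through sys.executable) as the external command and io.StringIO as the in-memory stand-in for a file:

```python
import io
import subprocess
import sys

# Step 1: run an external command and capture its output.
# Assumption: this child script lists the current directory, standing in
# for a command such as `ls`.
child_code = """
import os
for name in os.listdir("."):
    print(name)
"""
listing = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

# Step 2: filter in Python, keeping only .py files.
py_files = sorted(
    name for name in listing.stdout.splitlines() if name.endswith(".py")
)

# Step 3: write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
for name in py_files:
    buffer.write(name + "\n")

print(buffer.getvalue(), end="")
print(f"Found {len(py_files)} Python files")
```

Swapping io.StringIO() for open("py_files.txt", "w") turns the same code into the on-disk version, because both objects expose the same write method.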
Notice the three steps are clearly separated in the code. If something goes wrong, the structure tells you exactly where to look: did the command fail (step 1), did the filter produce wrong results (step 2), or did the file write fail (step 3)?
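One way to make that diagnosis explicit is to give each step its own error handling. The function below is a sketch under assumptions: the name run_pipeline, the status strings, and the choice to treat an empty result as a filter problem are all illustrative rather than part of the lesson.

```python
import os
import subprocess
import sys
import tempfile


def run_pipeline(command, out_path):
    """Run command, keep its non-empty lines, and write them to out_path.

    Returns "ok" or a string naming the step that failed.
    """
    # Step 1: the external command. A non-zero exit status or a missing
    # executable is reported as a command failure.
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return "step 1: command failed"

    # Step 2: the Python filter. An empty result is treated as a filter problem.
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        return "step 2: filter produced nothing"

    # Step 3: the file write.
    try:
        with open(out_path, "w") as f:
            f.write("\n".join(lines) + "\n")
    except OSError:
        return "step 3: write failed"
    return "ok"


# Usage: the child commands are Python one-liners standing in for real tools.
good = [sys.executable, "-c", "print('a'); print(); print('b')"]
bad = [sys.executable, "-c", "import sys; sys.exit(3)"]

with tempfile.TemporaryDirectory() as tmp:
    status_ok = run_pipeline(good, os.path.join(tmp, "out.txt"))
    status_cmd = run_pipeline(bad, os.path.join(tmp, "out.txt"))
    # The parent directory does not exist, so open() fails in step 3.
    status_write = run_pipeline(good, os.path.join(tmp, "missing_dir", "out.txt"))

print(status_ok)
print(status_cmd)
print(status_write)
```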
Passing Python output to a second command
When the second step is another subprocess rather than a file, use the input argument:
import subprocess
# Step 1: generate data in Python
data = "\n".join(["cherry", "apple", "banana", "date"])
# Step 2: pipe the Python string into an external command
result = subprocess.run(
    ["sort"],
    input=data,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
# apple
# banana
# cherry
# date

input=data is the Python equivalent of echo "$data" | sort. The subprocess reads the string as if it came from stdin.
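The input argument cannot be combined with an explicit stdin argument such as stdin=subprocess.PIPE: both are ways of providing the child's stdin, and subprocess.run raises ValueError if you pass both. Similarly, capture_output=True cannot be combined with explicit stdout or stderr arguments. input is the high-level convenience for data already in memory; stdin=PIPE, used with Popen, is for streaming.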
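The difference can be seen in a short sketch. The child command here is an assumption made for portability: a Python one-liner that upper-cases its stdin, standing in for a real filter such as sort.

```python
import subprocess
import sys

# Assumption: this one-liner stands in for an external filter command.
upper = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Passing both input= and stdin= is rejected with ValueError.
try:
    subprocess.run(upper, input="hello\n", stdin=subprocess.PIPE, text=True)
    conflict = None
except ValueError as exc:
    conflict = str(exc)
print(conflict)

# Streaming alternative: open the pipe yourself and write piece by piece.
# This simple write-then-read order is only safe for small amounts of data;
# larger volumes call for Popen.communicate() or a separate reader thread.
proc = subprocess.Popen(
    upper,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
for word in ["cherry", "apple", "banana"]:
    proc.stdin.write(word + "\n")
proc.stdin.close()  # signals end of input to the child
streamed = proc.stdout.read()
proc.stdout.close()
proc.wait()

print(streamed, end="")
```

With input=, subprocess.run handles the writing, closing, and waiting for you, which is why it is the simpler choice whenever the whole payload fits in a single string.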
Where to go next
Next: lab — subprocess pipeline — a longer exercise building a script that generates file metadata with subprocess, aggregates it in Python, and writes a JSON summary report.