Usage

Installation

Command to install performance:

python3 -m pip install performance

The command installs a new pyperformance program.
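
A quick way to check the installation is to invoke one of the actions described later in this document, for example listing the benchmark groups:

pyperformance list_groups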

If needed, the perf and six dependencies are installed automatically.

performance works on Python 2.7, 3.4 and newer.

On Python 2, the virtualenv program (or the Python module) is required to create virtual environments. On Python 3, the venv module of the standard library is used.

At runtime, Python development files (header files) may be needed to install some dependencies like dulwich_log or psutil, to build their C extensions. Commands on Fedora to install these dependencies:

  • Python 2: sudo dnf install python-devel
  • Python 3: sudo dnf install python3-devel
  • PyPy: sudo dnf install pypy-devel

In some cases, performance fails to create a virtual environment; upgrading virtualenv on the system can fix the issue. Example:

sudo python2 -m pip install -U virtualenv

Run benchmarks

Commands to compare Python 2 and Python 3 performance:

pyperformance run --python=python2 -o py2.json
pyperformance run --python=python3 -o py3.json
pyperformance compare py2.json py3.json

Note: the python3 -m performance ... syntax works as well (ex: python3 -m performance run -o py3.json), but requires installing performance on each tested Python version.

JSON files are produced by the perf module and so can be analyzed using perf commands:

python3 -m perf show py2.json
python3 -m perf check py2.json
python3 -m perf metadata py2.json
python3 -m perf stats py2.json
python3 -m perf hist py2.json
python3 -m perf dump py2.json
(...)

It’s also possible to use perf to compare results of two JSON files:

python3 -m perf compare_to py2.json py3.json --table

pyperformance actions:

run                 Run benchmarks on the running python
show                Display a benchmark file
compare             Compare two benchmark files
list                List benchmarks which the run command would run
list_groups         List all benchmark groups
venv                Actions on the virtual environment

Common options

Options available to all commands:

-p PYTHON, --python PYTHON
                      Python executable (default: use running Python)
--venv VENV           Path to the virtual environment
--inherit-environ VAR_LIST
                      Comma-separated list of environment variable names
                      that are inherited from the parent environment when
                      running benchmarking subprocesses.
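
For example, to let the benchmarking subprocesses see proxy settings from the parent environment (the variable names here are only illustrative):

pyperformance run --python=python3 --inherit-environ=http_proxy,https_proxy -o py3.json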

run

Options of the run command:

-r, --rigorous        Spend longer running tests to get more accurate
                      results
-f, --fast            Get rough answers quickly
-m, --track-memory    Track memory usage. This only works on Linux.
-b BM_LIST, --benchmarks BM_LIST
                      Comma-separated list of benchmarks to run. Can contain
                      both positive and negative arguments:
                      --benchmarks=run_this,also_this,-not_this. If there
                      are no positive arguments, we'll run all benchmarks
                      except the negative arguments. Otherwise we run only
                      the positive arguments.
--affinity CPU_LIST   Specify CPU affinity for benchmark runs. This way,
                      benchmarks can be forced to run on a given CPU to
                      minimize run to run variation. This uses the taskset
                      command.
-o FILENAME, --output FILENAME
                      Run the benchmarks on only one interpreter and write
                      benchmark into FILENAME. Provide only baseline_python,
                      not changed_python.
--append FILENAME     Add runs to an existing file, or create it if it
                      doesn't exist
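
For example, a more rigorous run of all benchmarks except the Django templates benchmark (using the -b syntax described in the Notes section), pinned to CPUs 2 and 3; the output file name is arbitrary:

pyperformance run --python=python3 --rigorous -b all,-django --affinity=2,3 -o py3_rigorous.json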

show

Usage:

show FILENAME
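
For example, to display the results produced by an earlier run:

pyperformance show py3.json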

compare

Options of the compare command:

-v, --verbose         Print more output
-O STYLE, --output_style STYLE
                      What style the benchmark output should take. Valid
                      options are 'normal' and 'table'. Default is normal.
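
For example, to compare two result files using the table output style:

pyperformance compare py2.json py3.json -O table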

list

Options of the list command:

-b BM_LIST, --benchmarks BM_LIST
                      Comma-separated list of benchmarks to run. Can contain
                      both positive and negative arguments:
                      --benchmarks=run_this,also_this,-not_this. If there
                      are no positive arguments, we'll run all benchmarks
                      except the negative arguments. Otherwise we run only
                      the positive arguments.

Use python3 -m performance list -b all to list all benchmarks.
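
The same -b syntax described in the Notes section below also works here; for example, to list the default group without the 2to3 benchmark:

pyperformance list -b default,-2to3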

venv

Options of the venv command:

-p PYTHON, --python PYTHON
                      Python executable (default: use running Python)
--venv VENV           Path to the virtual environment

Actions of the venv command:

show      Display the path to the virtual environment and its status (created or not)
create    Create the virtual environment
recreate  Force the recreation of the virtual environment
remove    Remove the virtual environment
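
For example, a possible sequence to create the environment for a given Python and then check its status (assuming options can be passed after the action):

pyperformance venv create --python=python3
pyperformance venv show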

Compile Python to run benchmarks

pyperformance actions:

compile        Compile, install and benchmark CPython
compile_all    Compile, install and benchmark multiple branches and revisions of CPython
upload         Upload JSON file

All these commands require a configuration file.

A simple configuration usable for compile (but not for compile_all or upload), doc/benchmark.conf:

[config]
json_dir = ~/prog/python/bench_json

[scm]
repo_dir = ~/prog/python/master
update = True

[compile]
bench_dir = ~/prog/python/bench_tmpdir

[run_benchmark]
system_tune = True
affinity = 2,3
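
With such a configuration, the compile command described below could be invoked as, for example:

pyperformance compile doc/benchmark.conf 3.6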

Configuration file sample with comments, doc/benchmark.conf.sample:

[config]
# Directory where JSON files are written.
# - uploaded files are moved to json_dir/uploaded/
# - results of patched Python are written into json_dir/patch/
json_dir = ~/json

# If True, compile CPython in debug mode (LTO and PGO disabled),
# run benchmarks with --debug-single-sample, and disable upload.
#
# Use this option to quickly test a configuration.
debug = False


[scm]
# Directory of CPython source code (Git repository)
repo_dir = ~/cpython

# Update the Git repository (git fetch)?
update = True

# Name of the Git remote, used to create the revision of
# the Git branch. For example, use revision 'remotes/origin/3.6'
# for the branch '3.6'.
git_remote = remotes/origin


[compile]
# Create files into bench_dir:
# - bench_dir/bench-xxx.log
# - bench_dir/prefix/: where Python is installed
# - bench_dir/venv/: Virtual environment used by performance
bench_dir = ~/bench_tmpdir

# Link Time Optimization (LTO)?
lto = True

# Profile Guided Optimization (PGO)?
pgo = True

# Install Python? If false, run Python from the build directory
#
# WARNING: Running Python from the build directory introduces subtle changes
# compared to running an installed Python. Moreover, creating a virtual
# environment using a Python run from the build directory fails in many cases,
# especially on Python older than 3.4. Only disable installation if you
# really understand what you are doing!
install = True


[run_benchmark]
# Run "sudo python3 -m perf system tune" before running benchmarks?
system_tune = True

# --benchmarks option for 'performance run'
benchmarks =

# --affinity option for 'perf system tune' and 'performance run'
affinity =

# Upload generated JSON file?
#
# Upload is disabled on patched Python, in debug mode or if install is
# disabled.
upload = False


# Configuration to upload results to a Codespeed website
[upload]
url =
environment =
executable =
project =


[compile_all]
# List of CPython Git branches
branches = default 3.6 3.5 2.7


# List of revisions to benchmark by compile_all
[compile_all_revisions]
# list of 'sha1=' (default branch: 'master') or 'sha1=branch'
# used by the "pyperformance compile_all" command

compile

Usage:

pyperformance compile CONFIG_FILE REVISION [BRANCH]
    [--patch=PATCH_FILE]
    [-U/--no-update]
    [-T/--no-tune]

Compile Python, install Python and run benchmarks on the installed Python.

Options:

  • --no-update: Don’t update the Git repository.
  • --no-tune: Don’t run perf system tune to tune the system for benchmarks.

If the branch argument is not specified:

  • If REVISION is a branch name: use it as the branch, and get the latest revision of this branch
  • Otherwise, use the master branch by default
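
For example, assuming the doc/benchmark.conf file shown above, to benchmark the latest revision of the 3.6 branch without updating the Git repository and without re-tuning the system:

pyperformance compile doc/benchmark.conf 3.6 --no-update --no-tune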

compile_all

Usage:

pyperformance compile_all CONFIG_FILE

Compile all the branches and revisions listed in CONFIG_FILE.

upload

Usage:

pyperformance upload CONFIG_FILE JSON_FILE

Upload results from a JSON file to a Codespeed website.

How to get stable benchmarks

performance virtual environment

To run benchmarks, performance first creates a virtual environment. It installs requirements with fixed versions to get a reproducible environment. The system Python has unknown modules installed with unknown versions, and can have .pth files run at Python startup which can modify Python behaviour or at least slow down Python startup.
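
The venv command described above can be used to inspect or rebuild this environment, for example:

pyperformance venv show
pyperformance venv recreate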

What is the goal of performance

A benchmark is always written for a specific purpose. Depending on how the benchmark is written and how it is run, the result can be different and so have a different meaning.

The performance benchmark suite has multiple goals:

  • Help to detect performance regression in a Python implementation
  • Validate that an optimization change makes Python faster and doesn’t introduce performance regressions, or only minor ones
  • Compare two implementations of Python, not only CPython 2 vs Python 3, but also CPython vs PyPy
  • Showcase Python performance, ideally in a way that is representative of the performance of applications running in production

Don’t disable GC nor ASLR

The perf module and performance benchmarks are designed to produce reproducible results, but not at the price of running benchmarks in a special mode which would not be used to run applications in production. For these reasons, the Python garbage collector, the Python randomized hash function and system ASLR (Address Space Layout Randomization) are not disabled. Benchmarks don’t call gc.collect() either, since CPython implements it with a stop-the-world pause, so applications avoid calling it to not kill performance.

Include outliers and spikes

Moreover, while the perf documentation explains how to reduce the random noise of the system and other applications, some benchmarks use the system and so can get different timings depending on the system workload, on I/O performance, etc. Outliers and temporary spikes in results are not automatically removed: values are summarized by computing the average (arithmetic mean) and the standard deviation, which “contain” these spikes, instead of using, for example, the median and the median absolute deviation, which ignore outliers. This is a deliberate choice, since applications running in production are impacted by such temporary slowdowns caused by various things like a garbage collection or a JIT compilation.

Warmups and steady state

A borderline issue is the benchmarks’ “warmup”. The first values of each worker process are always slower: 10% slower in the best case, but 1000% slower or more on PyPy. Right now (2017-04-14), performance ignores the first values, considered as warmup, until a benchmark reaches its “steady state”. The “steady state” can include temporary spikes every 5 values (ex: caused by the garbage collector), and it can still imply further JIT compiler optimizations but with a “low” impact on the average performance.

To be clear, “warmup” and “steady state” detection is a work in progress and a very complex topic, especially on PyPy and its JIT compiler.

Notes

pyperformance is a tool for comparing the performance of two Python implementations.

pyperformance will run Student’s two-tailed T test on the benchmark results at the 95% confidence level to indicate whether the observed difference is statistically significant.

Omitting the -b option will result in the default group of benchmarks being run; it is the same as specifying -b default.

To run every benchmark pyperformance knows about, use -b all. To see a full list of all available benchmarks, use --help.

Negative benchmarks specifications are also supported: -b -2to3 will run every benchmark in the default group except for 2to3 (this is the same as -b default,-2to3). -b all,-django will run all benchmarks except the Django templates benchmark. Negative groups (e.g., -b -default) are not supported. Positive benchmarks are parsed before the negative benchmarks are subtracted.

If --track-memory is passed, pyperformance will continuously sample the benchmark’s memory usage. This currently only works on Linux 2.6.16 and higher or Windows with PyWin32. Because --track-memory introduces performance jitter while collecting memory measurements, only memory usage is reported in the final report.
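
For example, a run combining a negative benchmark specification with memory tracking (Linux only; the output file name is arbitrary):

pyperformance run -b default,-2to3 --track-memory -o mem.json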