Motivation & Comparisons
Why?
At LinkedIn we ship hundreds of command-line utilities to every machine in our data centers and to all of our employees’ workstations. The vast majority of these utilities are written in Python. In addition to these utilities, we also have many internal libraries that are uprev’d daily.
Because of differences in iteration rate and the inherent problems present when dealing with such a huge dependency graph, we need to package the executables discretely. Initially we took advantage of the great open source tool PEX. PEX elegantly solved the isolated packaging requirement we had by including all of a tool’s dependencies inside of a single binary file that we could then distribute!
However, as our tools matured and picked up additional dependencies, we became acutely aware of the performance issues being imposed on us by pkg_resources’s Issue 510. Since PEX leans heavily on pkg_resources to bootstrap its environment, we found ourselves at an impasse: lose out on the ability to neatly package our tools in favor of invocation speed, or impose a few-second performance penalty for the benefit of easy packaging.
After spending some time investigating extricating pkg_resources from PEX, we decided to start from a clean slate, and thus shiv was created.
How?
Shiv exploits the same features of Python as PEX, packing __main__.py into a zipfile with a shebang prepended (akin to zipapps, as defined by PEP 441), extracting a dependency directory and injecting said dependencies at runtime. We have to credit the great work by @wickman, @kwlzn, @jsirois and the other PEX contributors for laying the groundwork!
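The zipapp mechanism mentioned above can be tried with nothing but the standard library. The following is a minimal sketch of the PEP 441 idea only, not shiv's own build code: it assumes a throwaway directory containing a one-line __main__.py, and the names build_demo_zipapp and run_demo are hypothetical helpers introduced for illustration.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path
from typing import Tuple


def build_demo_zipapp(workdir: Path) -> Path:
    """Zip a directory holding __main__.py and prepend a shebang line."""
    src = workdir / "app"
    src.mkdir()
    (src / "__main__.py").write_text('print("hello from a zipapp")\n')

    target = workdir / "app.pyz"
    # `interpreter` is written as the "#!" line in front of the zip data.
    zipapp.create_archive(src, target=target, interpreter="/usr/bin/env python3")
    return target


def run_demo() -> Tuple[bytes, str]:
    """Build the archive, then return its first line and its printed output."""
    with tempfile.TemporaryDirectory() as tmp:
        pyz = build_demo_zipapp(Path(tmp))
        first_line = pyz.read_bytes().splitlines()[0]
        # Python executes the archive's __main__.py when given the file directly.
        result = subprocess.run(
            [sys.executable, str(pyz)],
            capture_output=True,
            text=True,
            check=True,
        )
        return first_line, result.stdout.strip()
```

Calling run_demo() shows both halves of the trick: the file starts with an ordinary shebang line, yet the interpreter can still run it as a zip archive because zip files are read from the end.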
The primary differences between PEX and shiv are:
- shiv completely avoids the use of pkg_resources. If it is included by a transitive dependency, the performance implications are mitigated by limiting the length of sys.path. Internally, at LinkedIn, we always include the -s and -E Python interpreter flags by specifying --python "/path/to/python -sE", which ensures a clean environment.
- Instead of shipping our binary with downloaded wheels inside, we package an entire site-packages directory, as installed by pip. We then bootstrap that directory post-extraction via the stdlib’s site.addsitedir function. That way, everything works out of the box: namespace packages, real filesystem access, etc.
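The extract-then-addsitedir step described above might look roughly like the sketch below. It is an illustration under stated assumptions, not shiv's actual bootstrap: the archive layout (a top-level site-packages/ directory), the cache location, and the names extract_and_add_site_packages, demo and hypothetical_dep are all invented for the example.

```python
import importlib
import site
import sys
import tempfile
import zipfile
from pathlib import Path


def extract_and_add_site_packages(archive: Path, cache_root: Path) -> Path:
    """Extract the bundled site-packages directory and register it with `site`."""
    target = cache_root / "site-packages"
    if not target.exists():
        with zipfile.ZipFile(archive) as zf:
            # Assumed layout: dependencies live under "site-packages/" in the zip.
            members = [n for n in zf.namelist() if n.startswith("site-packages/")]
            zf.extractall(cache_root, members=members)
    # addsitedir appends the directory to sys.path and processes its .pth files.
    site.addsitedir(str(target))
    return target


def demo() -> str:
    """Bundle a one-module 'dependency', bootstrap it, and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        archive = tmp_path / "bundle.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("site-packages/hypothetical_dep.py", "GREETING = 'hi'\n")

        site_dir = extract_and_add_site_packages(archive, tmp_path / "cache")
        try:
            module = importlib.import_module("hypothetical_dep")
            return module.GREETING
        finally:
            # Undo the sys.path change so the demo leaves no trace behind.
            if str(site_dir) in sys.path:
                sys.path.remove(str(site_dir))
            sys.modules.pop("hypothetical_dep", None)
```

Because the dependencies end up as real files in a real directory, imports go through the normal filesystem import machinery and only one extra entry is added to sys.path.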
Because we optimize for a shorter sys.path and don’t include pkg_resources in the critical path, executables created with shiv can outperform ones created with PEX by almost 2x. In most cases the executables created with shiv are even faster than running a script from within a virtualenv!