Picking Lower-Bound Versions for Your Dependencies
I always get an uneasy feeling when explaining this topic to new programmers. I limit myself to something along these lines: “In this case I’m choosing these versions because of this and that; had the circumstances been different, I would probably have chosen differently”. In these sessions I try to stick to the act of coding, so I try not to digress too much into hairy matters such as version management. Nevertheless, I think a decent criterion for picking these versions is important for any developer expected to work independently.
I don’t think a precise guide can be written on the topic. The ability comes naturally with experience and some common sense. In this post I’ll try to bring forward some factors to weigh when choosing minimal dependency versions.
For concrete examples I’m focusing on the Haskell ecosystem, where I’m spending most of my time these days, but most of it should translate directly to other ecosystems.
I’m using the word package to mean a unit of code shipped with a version number. In Haskell this is a Cabal package, but it could be a Python wheel, a Ruby gem, or a Rust crate.
There are two broad, non-mutually-exclusive categories of deployment, although the distinction can get quite blurry.
Push-Based Deployment

In this kind of deployment you are directly pushing your code into an environment where it’ll be built[1] and executed. Here you have precise knowledge of the environment where your code will run.
This is the typical setting for web server applications. In this case, version bounds are not so important; it’s alright to just set the minimum versions to the ones you are using for development. However, in this scenario it’s almost always a good idea to pin all the dependencies when building. You want deterministic, repeatable builds and to avoid nasty surprises, especially when dependencies get updated. Most packaging systems provide some mechanism for freezing versions outside the package specification. In Haskell you can manually[2] pin dependency versions in cabal.config, or you can use Stackage and stack to manage dependency version sets smoothly.
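As a sketch of what manual pinning looks like, a cabal.config freeze file holds exact-version constraints (the package names and versions here are just illustrative):

```
constraints: aeson ==0.8.0.2,
             text ==1.2.0.4,
             bytestring ==0.10.4.0
```

With this file in place the solver can only ever pick exactly these versions, so the build stays repeatable no matter what gets uploaded to Hackage in the meantime.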
At the package level you want to maintain maximum flexibility so that you can test different dependency versions easily, especially to get early warnings when dependencies are updated. You could think of the version limits in the core package specification as marking the versions that will never be picked, no matter the circumstances.
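In other words, the bounds in the .cabal file itself can stay loose, only ruling out versions you know can never work (the names and versions below are illustrative):

```
build-depends: base       >=4.6,
               containers >=0.5,
               text       >=1.1
```

Anything above those floors is fair game for the solver; the actual pinning happens outside the package specification.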
Pull-Based Deployment

In this case you are packaging your code and uploading it to some kind of central package repository. Consumers will pull the package and install it on their systems. Your code will run on systems that are outside of your control.
A frequent goal is to support as many target users as possible. Often enough, users will run your code on very heterogeneous environments, and you don’t want them to run into any problems when trying your shiny new package.
By default it’s wise to aim for low dependency versions, but you don’t need to go as low as possible. It’s usually a good idea to target the lowest versions that are still getting maintenance upgrades (more about this later). For example, in the case of Haskell, unless you have a specific reason, I think it’s reasonable to tell your users that you are not supporting GHC-7.4.3 (the Haskell compiler) anymore. You can select the version cut-off based on the fact that the oldest GHC version shipped by the major Linux distributions is GHC-7.6.3, in Debian Jessie[3]. But your mileage may vary; it all depends on the users you are targeting. The more data you have about them, the better you’ll be able to make these kinds of decisions.
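One cheap way to record (and advertise) your cut-off is Cabal’s tested-with field, which lists the compiler versions you actually verify (the versions here are illustrative):

```
tested-with: GHC ==7.6.3, GHC ==7.8.4, GHC ==7.10.1
```

It’s purely informational, but it tells your users exactly how far back you intend to support.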
In my experience, for pull-based packages, it’s best to build and test your code with both the maximum and minimum versions at the same time. If running both tests takes too long, you can move one test suite, usually the minimum one, to a CI system. In any case, it only makes sense to test the latest and the oldest sets of dependencies; don’t bother trying other combinations unless you have a very specific reason, for reasons I’ll explain below.
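A minimal sketch of such a setup, as a Travis-style CI configuration (the freeze-file path and job layout are assumptions, not a fixed recipe):

```yaml
# Two jobs: one against the pinned minimum set, one against the newest allowed versions.
env:
  - DEPS=minimum
  - DEPS=latest
install:
  # For the minimum job, drop a frozen cabal.config in place before dependency solving.
  - if [ "$DEPS" = "minimum" ]; then cp ci/minimum.cabal.config cabal.config; fi
  - cabal install --only-dependencies --enable-tests
script:
  - cabal configure --enable-tests && cabal build && cabal test
```

The latest job gets no freeze file, so the solver simply picks the newest versions your bounds allow.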
Release Channels

A new library typically has just a single channel following the latest version, but for more mature ones you can usually identify at least one extra channel with more conservative releases. Notice that even if the original authors don’t explicitly maintain multiple channels, you can still find downstream packagers who keep releasing security/performance fixes while maintaining a frozen API. And even without third-party packagers, it’s possible to distinguish virtual channels with some implicit knowledge gained by social means. For example, do you know which version is mostly used by the community, or what early adopters are reporting when trying newer versions? It’s good to follow bug reports, mailing lists, or relevant forums to get a sense of the stability of the packages.
How aggressively new features get released on each channel varies greatly between authors and communities. Some communities are notoriously disciplined about maintaining strict release policies, but you should always keep an eye out for authors breaking ranks. In general, you can assume that with new features inevitably come new bugs.
Sometimes the build with the minimum dependency versions represents the most stable edition of your code (instead of just providing extended backwards compatibility), so it’s a good rule of thumb to keep it in sync with a conservative but still active channel that you trust. If you find a killer feature that you absolutely need in production, you should be aware of the responsibilities that come with living on the bleeding edge. For example, are the upstream authors responsive to bug reports? Do you have the capability to fix bugs yourself if push comes to shove? Notice that, although ugly and cumbersome, there are mechanisms for conditional compilation based on the dependency versions present at build time. In Haskell this is done with CPP macros.
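As a sketch of such a CPP guard (the package somelib and the renamed function are hypothetical, but Cabal really does define a MIN_VERSION_<pkg> macro for every direct dependency when the CPP extension is enabled):

```haskell
{-# LANGUAGE CPP #-}
module Compat (run) where

-- Suppose somelib renamed oldRun to run in its 2.0 release; this shim
-- presents a single name to the rest of the codebase either way.
#if MIN_VERSION_somelib(2,0,0)
import SomeLib (run)
#else
import qualified SomeLib

run :: IO ()
run = SomeLib.oldRun
#endif
```

The rest of your code imports Compat and never needs to know which version was picked at build time.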
While thinking about this, it’s also a good moment to consider any security implications. Can you afford the risks of the bleeding-edge? Perhaps your application is a prototype, but are you sure it’ll remain one? What would need to be done if it goes into production?
Known Working Sets
Identifying individual channels for each package doesn’t guarantee they will work together. What you need is a distribution of packages with versions that are frequently tested together for compatibility. These distributions usually have their own release channels as well.
Theoretically, you could pick and choose a different channel for each package, and although this approach may work for projects with few dependencies, in practice it quickly spirals out of control as the number of dependencies grows. Even if you avoid running into dependency hell, the combination of versions can make your build so unique that any bugs you find may be irrelevant, or hard to reproduce, for the rest of the community. By sticking to a trusted distribution you take advantage of more people looking for bugs in the same build you are working with; and if you discover a bug yourself, you’ll get better support if your working set of dependencies is a common one. This could be considered a weaker version of Linus’s Law.
When choosing a distribution it’s often helpful to distinguish two rough categories: language-runtime specific but system agnostic, and system specific but language-runtime agnostic[4].
For example, in Haskell, Stackage LTS, or even the Haskell Platform, can be considered examples of the first kind. No matter what OS or hardware architecture you are on (provided GHC supports it), if you pick the minimum versions from any of those sets, you can be pretty confident that there won’t be compatibility issues. This type of distribution works very well when your whole codebase is written for the same language runtime. However, as the number of dependencies not included in the distribution grows (for example, dependencies coming from a different language runtime), your chances of running into problems grow, too.
The other kind of distribution provides full system integration for a given hardware architecture, with the kernel and userspace applications being released together. Here I have in mind the typical Linux distributions and the BSDs[5].
System distributions provide the strongest guarantees that all the moving parts work together, especially if you get your package included in the distribution. But doing so can be quite hard: you’ll have to make sure your package adheres to all the distribution’s policies, which are usually quite strict.
That’s why integrating your code into a system distribution can sometimes add non-trivial overhead. Also, focusing on just one OS distribution can drastically reduce the number of users you can target. This is not an issue for push-based deployments, though.
Even if you don’t use system distributions directly, you can still extract a good known working set from the versions they selected. You do have to keep an eye on any patches the packagers may have introduced that are not included upstream. If those changes are not distribution specific, they are usually propagated upstream, but sometimes, for whatever reason, upstream authors don’t accept them.
For the lower bounds of pull-based Haskell packages, I usually follow the Debian stable channel, which tends to have the oldest set of versions still being maintained.
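To see what Debian stable currently ships for a given library, you can query the archive directly; for instance (the library name is illustrative, and note that Debian prefixes packaged Haskell libraries with haskell- or libghc-):

```
apt-cache policy libghc-aeson-dev   # on a Debian system
rmadison haskell-aeson              # from anywhere, via the devscripts tool
```

Either output gives you a concrete candidate for your lower bound.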
For push-based deployments, a very popular system distribution within the Haskell community is NixOS. Aside from the many advantages of the Nix package manager over conventional managers, from a curation perspective the NixOS community is extremely responsive and provides a very low barrier to entry for new Haskell packages[6].
It’s also worth mentioning that the latest upstream versions of all your dependencies can be considered a virtual quasi-distribution. Although this target obviously moves very fast, it might not be as chaotic as you’d expect. Most upstream developers give priority to the latest version, so they usually make sure their software works well with the latest versions of other dependencies. This is not a bad choice for experimental and prototype projects, and even if you are relying exclusively on pinned versions, it’s always useful to test against the latest dependencies as a way of getting a heads-up on what needs to be done to keep your code up to date.
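Even with pinned versions in place, a periodic unconstrained build gives you that heads-up; something as simple as a scheduled CI job running a sketch like this (assuming a plain cabal workflow with a cabal.config freeze file) will do:

```
cabal update                        # pull the latest package index
rm -f cabal.config                  # drop the freeze file for this run only
cabal install --only-dependencies   # the solver now picks the newest versions
cabal build && cabal test
```

When this job starts failing, you know ahead of time what the next round of upgrades will cost you.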
I deliberately omitted writing about newer projects like Docker or Snappy Ubuntu that claim to solve all our deployment issues with container superpowers. I do believe container-based deployments, both for servers and for end users, are the right trend, but as of today I’m still not sold on using Docker for production. But that’s a story for another day.
I also don’t want to get into the topic of upper version bounds, because it’s a whole different debate with many valid arguments for and against. I personally lean towards not putting any restrictions, because it makes upgrading packages an easy process, even at the cost of some users getting broken builds when new dependencies come out. This shouldn’t be a problem for production systems, because I’m assuming prudent users will be using pinned versions and building their packages before deployment. However, if your package is broken and you know you can’t fix it soon enough, I’d say it’s OK to make a quick release adding upper version bounds. I know there are many counterarguments to my position, like, for example, wanting to make sure a package released today will still be buildable in 20 years at that particular version. I’d mark that old version as deprecated and forget about it, but that’s sacrilege for some. In the end this is all bikeshedding to me, so even if I’d rather have no upper limits, I don’t think it’s a big deal having to handle them.
[1] Or wherever you build it; building there should be equivalent to building on the target system.
[2] You could use cabal freeze for quick and dirty pinning, but I don’t recommend it when strict control of versions is desired.
[3] In Wheezy, GHC-7.4.3 is still officially maintained under extended support, if you really want to go back.
[4] A particularly underrated package management system that is both language- and OS-portable is pkgsrc. Unfortunately, although its Haskell curation is improving, it’s still far behind other distributions.
[5] You could stretch the concept to include commercial app stores, or even runtime environments such as GNOME or frameworks like RxJS.
[6] Yes, creating Nix expressions adds overhead, but believe me, it’s much more accessible than creating good deb or rpm source packages.