RFC Generic Packaging for Languages that have vendor/ Trees

Aleksa Sarai
Hello *,

This is a proposal for a generic RPM packaging system for languages
that use "vendor/" trees. Please respond with any feedback you have on
the details of this proposal.

The main justification for this proposal is the recent rise of languages
that have an *enormous* number of "micro-packages" (JavaScript is the
most well-known offender here, where many widely used packages are only
a few lines long, but Rust has a similar issue, and Go and Ruby do too).
This has made it impractical (or even impossible for some languages) to
create a 1-to-1 RPM mapping for each package. So while a 1-to-1 RPM
mapping is arguably ideal (both from an ideological perspective and a
tooling perspective), the maintenance burden is far too high.

Another problem is that many projects written in these sorts of
languages these days "vendor" their dependencies, usually using a
language-specific package manager to do so. (This is slightly ironic in
my opinion, because if they'd integrated more with distributions this
ideally wouldn't be necessary, but that ship has sailed.) This is a
problem that also needs to be resolved. Luckily such projects usually
have some sort of "lock file" that describes what is present inside the
"vendor/" tree -- this is something that will be useful later. It should
be noted that a 1-to-1 RPM mapping doesn't help here either, as it would
further balloon the number of packages we would need to have (since each
project might depend on different versions). Debian has been attempting
to do this with Go packages, and as far as I can see it's quite a futile
effort because of the maintenance burden that comes with it.

At the moment, most packages deal with this problem by punting
completely on reproducibility and auditability: they vendor all
dependencies in a project, tar up the vendor/ tree, and include it in
the OBS project. For a JavaScript project this would involve just
running `yarn <blah>` (or whatever the command is) and then taking
node_modules/ and creating a node_modules.tar.xz that is referenced as a
source in the specfile. The main problem with this approach currently is
that it is completely unauditable: nobody knows what's inside that magic
vendor blob. *However* the core idea is not completely insane. The Rust
folks have also started doing the same thing with cargo-vendor.

And here we come to my proposal. The idea is to take what is already
being done in these projects, and create better tooling around it to
make the work of development, maintenance, security, and legal much
easier.

First, we need to provide more metadata about these vendor blobs in the
RPM layer, so that the security team can at least *track* what versions
of things are used by a project. And in the worst case, it should be
possible to patch a vendor blob. This would likely best be done through
RPM macros, by creating a virtual Provides for each of the vendored
libraries. This matches what Fedora does for bundled libraries[1]. The
Provides could be as simple as

    Provides: bundled(rust:nix) = 0.8.1

Or something more involved to be extra paranoid:

    Provides: bundled(rust:registry+https://github.com/rust-lang/crates.io-index:nix) = 0.8.1
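
As a rough illustration of how such Provides lines might be generated
mechanically, here is a minimal Python sketch. The function name
`bundled_provides`, the dictionary keys, and the `with_source` switch are
assumptions made for this example; they are not part of any existing RPM
macro or OBS tooling.

```python
def bundled_provides(deps, with_source=False):
    """Return RPM 'Provides:' lines for a list of vendored dependencies.

    Each dependency is a dict with 'language', 'name' and 'version' keys,
    plus an optional 'source' key (hypothetical field names).
    """
    lines = []
    # Sort so that the generated lines are stable between runs.
    for dep in sorted(deps, key=lambda d: (d["language"], d["name"], d["version"])):
        ident = dep["name"]
        # The "extra paranoid" form prefixes the name with where it came from.
        if with_source and dep.get("source"):
            ident = f'{dep["source"]}:{ident}'
        lines.append(
            f'Provides: bundled({dep["language"]}:{ident}) = {dep["version"]}'
        )
    return lines


deps = [
    {
        "language": "rust",
        "name": "nix",
        "version": "0.8.1",
        "source": "registry+https://github.com/rust-lang/crates.io-index",
    }
]

print("\n".join(bundled_provides(deps)))
print("\n".join(bundled_provides(deps, with_source=True)))
```

Run on the single `nix` entry, this prints the two Provides lines shown
above, first the short form and then the form that includes the registry.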

Secondly, in order to make this vendor archive reproducible, I propose
we have an OBS service that can be used to vendor a source tree (which
can obviously be run either locally or on OBS). It will produce all of
the vendor archives created by language-specific tools, and produce a
language-agnostic manifest of what was downloaded (the name, language,
version, git commit, and so on). The idea is that this manifest could be
used by the RPM macros above rather than writing language-specific
macros.
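
To make the idea of a language-agnostic manifest a little more concrete,
the following Python sketch shows one possible shape for it. Everything
here is an assumption for illustration: the JSON layout, the
`VendoredPackage` field names, and the `example-lib` entry are invented,
and the real service may well choose a different format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VendoredPackage:
    """One vendored dependency (hypothetical field set)."""
    language: str
    name: str
    version: str
    source: str = ""   # e.g. a registry or repository URL
    commit: str = ""   # git commit, when the language tool reports one


def dump_manifest(packages):
    """Serialise packages to JSON in a deterministic order."""
    entries = sorted(
        (asdict(p) for p in packages),
        key=lambda e: (e["language"], e["name"], e["version"]),
    )
    return json.dumps({"packages": entries}, indent=2, sort_keys=True)


def load_manifest(text):
    """Parse a manifest produced by dump_manifest()."""
    return [VendoredPackage(**entry) for entry in json.loads(text)["packages"]]


packages = [
    VendoredPackage(
        "rust", "nix", "0.8.1",
        source="registry+https://github.com/rust-lang/crates.io-index",
    ),
    # Purely illustrative entry, not a real package.
    VendoredPackage("javascript", "example-lib", "1.0.0"),
]

manifest_text = dump_manifest(packages)
print(manifest_text)
```

Sorting the entries and keys means the same vendor tree always yields a
byte-identical manifest, which is what a reproducibility check would
need to compare.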

I have already started working on the OBS service, but I would love to
hear your feedback on this proposal.

[1]: https://fedoraproject.org/wiki/Bundled_Libraries?rd=Packaging:Bundled_Libraries

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Re: RFC Generic Packaging for Languages that have vendor/ Trees

Aleksa Sarai
On 2017-12-20, Aleksa Sarai <[hidden email]> wrote:
> Secondly, in order to make this vendor archive reproducible, I propose
> we have an OBS service that can be used to vendor a source tree (which
> can obviously be run either locally or on OBS). It will produce all of
> the vendor archives created by language-specific tools, and produce a
> language-agnostic manifest of what was downloaded (the name, language,
> version, git commit, and so on). The idea is that this manifest could be
> used by the RPM macros above rather than writing language-specific
> macros.

I forgot to mention that one benefit of having it as an OBS service is
that we could run the source validator on it (in principle at least,
assuming that the language actually creates reproducible vendor trees).

To make things easier for legal, the metadata for the vendor tree could
record which subdirectories in the tree correspond to each package, and
the legal tooling could then add support for this vendor concept on that
basis (at the moment I believe that the legal tooling doesn't have a way
to handle vendor archives).
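
A small Python sketch of how such subdirectory metadata could be used to
attribute a file to a package. The `owner_of` helper, the
`language:name@version` identifiers, and the example paths are all
hypothetical; this is not how any existing legal tooling works.

```python
from pathlib import PurePosixPath


def owner_of(path, subdirs):
    """Return the package that owns `path`, or None if it is not vendored.

    `subdirs` maps a directory inside the source tree to a package
    identifier. When directories are nested, the deepest match wins.
    """
    target = PurePosixPath(path)
    best_dir, best_pkg = None, None
    for directory, package in subdirs.items():
        candidate = PurePosixPath(directory)
        # Compare whole path components, so "a" never matches "ab".
        if candidate == target or candidate in target.parents:
            if best_dir is None or len(candidate.parts) > len(best_dir.parts):
                best_dir, best_pkg = candidate, package
    return best_pkg


# Illustrative metadata with invented package names.
subdirs = {
    "node_modules/a": "javascript:a@1.0.0",
    "node_modules/a/node_modules/b": "javascript:b@2.0.0",
}

print(owner_of("node_modules/a/lib/index.js", subdirs))
print(owner_of("node_modules/a/node_modules/b/index.js", subdirs))
print(owner_of("src/main.js", subdirs))
```

Picking the deepest matching directory matters for nested layouts such
as node_modules/, where one vendored package can carry its own copy of
another.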

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Re: RFC Generic Packaging for Languages that have vendor/ Trees

Aleksa Sarai
On 2017-12-19, Neal Gompa <[hidden email]> wrote:

> * The current vendoring of rust crates is temporary. We're waiting on
> RPM 4.14[1] and the new product builder to come online (DimStar
> already slapped me once for breaking Tumbleweed with rich deps
> before...). I'm working on making rust2rpm make openSUSE-friendly spec
> files (mainly add the boilerplate header, skip conversion of SPDX to
> Fedora license tags, generate changes file) so that crates can be
> easily packaged and shipped in the distribution. Right now, Fedora has
> well over 230 Rust crates packaged[2], and the packaging for them is
> pretty trivial[3]. We've also got a good handle on cargo integration,
> so crates function as if they're in a local cargo registry for things
> to depend on.

Is there a document somewhere that explains how it works? I read through
the Fedora wiki page on Rust packaging[1] last time the RPM feature was
mentioned on this list, but it doesn't explain anything about the
current status (unless "rust2rpm" is the current status?).

> * I'm not sure why openSUSE hasn't adopted the bundled() Provides
> thing across the board anyway. There are plenty of packages that ship
> vendored trees/libraries and no one knows what they are. In general,
> it's really not a bad idea to do that. In my opinion, it's
> irresponsible to not require what you bundle to be defined.
>
> Generally speaking, I think this is a solid idea, but I solidly do not
> believe we will be continuing the vendored crates practice for much
> longer in Rust.

Okay. I just want to make sure that we don't run into the same
maintenance problem we already have with Ruby packages (which will end
up being worse due to the multi-versioning support in Rust, as well as
the existence of far more micro-packages than in the Ruby universe).
Does the current plan for Rust packaging account for that?

[1]: https://fedoraproject.org/wiki/SIGs/Rust

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
