python .pyc packaging


Bernhard M. Wiedemann-5
Via https://en.opensuse.org/openSUSE:Reproducible_Builds

I found that when we build Python packages like python-amqp or
python-binplist, the resulting rpm contains a .pyc file for every .py
file, and these .pyc files differ on every build because they contain the
timestamp of the corresponding source file, which for some source files
is the time of the build.
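
To make the timestamp dependency concrete, here is a minimal sketch that reads the source mtime back out of a compiled file. It assumes CPython 3.7 or later, where a timestamp-based .pyc starts with a 16-byte header (magic, flags, source mtime, source size); older interpreters use a shorter header. The helper names `embedded_mtime` and `compile_and_inspect` and the throwaway module are made up for illustration.

```python
import os
import py_compile
import tempfile


def embedded_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header.

    Assumes the CPython 3.7+ layout: 4 bytes magic, 4 bytes flags,
    4 bytes source mtime, 4 bytes source size (all little-endian).
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    return int.from_bytes(header[8:12], "little")


def compile_and_inspect(mtime):
    """Compile a throwaway module whose source mtime is set to `mtime`
    and return the mtime found in the resulting .pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")
        with open(src, "w") as f:
            f.write("VALUE = 42\n")
        os.utime(src, (mtime, mtime))
        pyc = py_compile.compile(
            src,
            doraise=True,
            # Force the classic timestamp header even if SOURCE_DATE_EPOCH is set.
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        return embedded_mtime(pyc)
```

If a build generates the same .py file at a different time, the value returned here changes, and so do the bytes of the .pyc, even though the code is identical.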

http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content

http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content


I was wondering how to best get those to build bit-by-bit identical rpms.

I assume we want to keep the concept of .pyc files, since they provide
some performance gain (e.g. I measured 'openstack --help' taking only
1.5 seconds with .pyc files versus 2.5 seconds without; on another
machine it was 12 vs 13 seconds).

But why do we have to ship .pyc files as part of our binary rpms? They
waste disk space and bandwidth for our mirrors and users.
They could be created in a %post or %posttrans hook when installing the
rpm (or do they need special build deps?)
It might even be that compiling them on the destination is faster than
transferring and unpacking the LZMA-compressed version.


The less intrusive alternative approach would be to touch .py files to a
constant older date (e.g. $SOURCE_DATE_EPOCH if set) before generating
the .pyc files.
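
A minimal sketch of this approach, assuming CPython 3.7 or later and the usual reproducible-builds convention of only clamping files that are newer than the reference date. The function names `clamp_and_compile` and `epoch_from_environment` are hypothetical, not part of any existing packaging macro.

```python
import os
import py_compile


def clamp_and_compile(tree, epoch):
    """Clamp the mtime of every .py file under `tree` to at most `epoch`,
    then byte-compile it. Returns the list of .pyc paths written."""
    written = []
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            src = os.path.join(dirpath, name)
            if os.stat(src).st_mtime > epoch:
                # Only files newer than the reference date are touched.
                os.utime(src, (epoch, epoch))
            pyc = py_compile.compile(
                src,
                doraise=True,
                # Keep the timestamp-style header so the effect is visible.
                invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
            )
            written.append(pyc)
    return written


def epoch_from_environment(default=0):
    """Read SOURCE_DATE_EPOCH if set, else fall back to `default`."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))
```

A build script could call `clamp_and_compile(buildroot, epoch_from_environment())` after the install step; files that already carry an older mtime from the tarball stay untouched.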

Which way do you think we should go?

Ciao
Bernhard M.
--
To unsubscribe, e-mail: [hidden email]
To contact the owner, e-mail: [hidden email]


Re: python .pyc packaging

Jan Engelhardt-4

On Friday 2017-02-17 06:33, Bernhard M. Wiedemann wrote:
>
>I assume, we want to keep the concept of .pyc files, since they provide
>some performance gain[...]
>But why do we have to ship .pyc files as part of our binary rpms? They
>waste disk space and bandwidth for our mirrors and users.
>They could be created in a %post or %posttrans hook when installing the
>rpm (or do they need special build deps?)

- It could prolong the installation time.
- rpm -qi's Size field is further away from the real installation size
  ("yast said it would take 1.2GB now it's 2.0.."-kind of thing)
- Creating them in %post, i.e. directly on the end-user system,
  in a way defeats the purpose of a precompiled distribution.

>It might even be, that compiling them on the destination is faster than
>transferring and unpacking the LZMA compressed version.

Feel free to take numbers on a 32-bit Raspberry :-p

Re: python .pyc packaging

Simon Lees-3


On 02/17/2017 05:16 PM, Jan Engelhardt wrote:

>
> On Friday 2017-02-17 06:33, Bernhard M. Wiedemann wrote:
>>
>> I assume, we want to keep the concept of .pyc files, since they provide
>> some performance gain[...]
>> But why do we have to ship .pyc files as part of our binary rpms? They
>> waste disk space and bandwidth for our mirrors and users.
>> They could be created in a %post or %posttrans hook when installing the
>> rpm (or do they need special build deps?)
>
> - It could prolong the installation time.
> - rpm -qi's Size field is further away from the real installation size
>   ("yast said it would take 1.2GB now it's 2.0.."-kind of thing)
> - Creating them in %post, i.e. directly on the end-user system,
>   in a way defeats the purpose of a precompiled distribution.
The other reason is that if you don't package them, you need to add specific
code in %postun to check whether they were created (the user may install
the package and never run it) and then remove them if they exist. Making sure
this happens for each .pyc file is often a lot more effort than just packaging
them, especially if subdirectories are involved.
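
As a rough illustration of the cleanup such a scriptlet would have to do, here is a sketch that removes cached bytecode for a list of .py files. It assumes the Python 3 `__pycache__` layout and only handles the interpreter that runs it; `remove_bytecode` is a hypothetical helper, not an existing rpm macro.

```python
import importlib.util
import os


def remove_bytecode(py_files):
    """Delete cached bytecode for the given .py paths, if any exists.

    Covers the unoptimized cache file plus the -O and -OO variants for
    the interpreter running this function. Returns the removed paths.
    """
    removed = []
    pycache_dirs = set()
    for src in py_files:
        for opt in ("", 1, 2):
            pyc = importlib.util.cache_from_source(src, optimization=opt)
            if os.path.exists(pyc):
                os.remove(pyc)
                removed.append(pyc)
            pycache_dirs.add(os.path.dirname(pyc))
    for d in pycache_dirs:
        # Drop __pycache__ directories that ended up empty.
        if os.path.isdir(d) and not os.listdir(d):
            os.rmdir(d)
    return removed
```

Cache files written by other interpreter versions carry a different tag in their name and would need the same treatment, which is part of the effort described above.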

>
>> It might even be, that compiling them on the destination is faster than
>> transferring and unpacking the LZMA compressed version.
>
> Feel free to take numbers on a 32-bit Raspberry :-p
>

--

Simon Lees (Simotek)                            http://simotek.net

Emergency Update Team                           keybase.io/simotek
SUSE Linux                           Adelaide Australia, UTC+10:30
GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B



Re: python .pyc packaging

Stefan Behlert
In reply to this post by Bernhard M. Wiedemann-5
Moin,


On Feb 17, 17 05:33:02 +0000, Bernhard M. Wiedemann wrote:

> Via https://en.opensuse.org/openSUSE:Reproducible_Builds
>
> I found that when we build python packages like python-amqp or
> python-binplist
>
> it contains a .pyc file for every .py file and for every build these
> .pyc files differ, because they contain the timestamp of the
> corresponding source file and for some source files this is the time of
> build.
>
> http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content
>
> http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content
>
>
> I was wondering how to best get those to build bit-by-bit identical rpms.
>
> I assume, we want to keep the concept of .pyc files, since they provide
> some performance gain (e.g. I measured 'openstack --help' taking only
> 1.5 seconds with .pyc files versus 2.5 seconds without (on another
> machine it was 12 vs 13 seconds))

Hm, my understanding was that the .pyc files only influence loading
times, not the run times of python programs. So I assume that for
something like 'openstack --help' the difference is significant, while
for something that runs longer it may be less so.

> But why do we have to ship .pyc files as part of our binary rpms? They
> waste disk space and bandwidth for our mirrors and users.

I sometimes hear "disk space is cheap", but trying to do a rather small
installation with python packages is difficult atm. E.g. just adding
salt-minion to a small installed system results in several dozen MB just
for the .pyc files.

> They could be created in a %post or %posttrans hook when installing the
> rpm (or do they need special build deps?)
> It might even be, that compiling them on the destination is faster than
> transferring and unpacking the LZMA compressed version.

Frankly speaking, they could be provided in a separate package, if really
needed. I would not generate them in the packages.

But "I am not a Python expert, just a user" :), so maybe there is a
significant reason to have them always available?
Just to have shorter startup-time of Python-programs doesn't sound like a
valid reason to have both .py and .pyc.

On the other hand, why not remove the .py files and just keep the .pyc?
The .py files could be still in the .src-rpms, for those who need/want
them.
That would still give one the advantages of the .pyc, without wasting
space.


        ciao,
          Stefan

--
Stefan Behlert, SUSE LINUX
 
Maxfeldstr. 5, D-90409 Nuernberg, Germany
Phone +49-911-74053-173
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

Re: python .pyc packaging

Klaus Kaempf
In reply to this post by Bernhard M. Wiedemann-5
* Bernhard M. Wiedemann <[hidden email]> [Feb 17. 2017 06:33]:
>
> But why do we have to ship .pyc files as part of our binary rpms?

I'd rather ask why we have to ship *source code* (.py files) as part of
our binary rpms ?

They waste much more space since .py files usually include full
documentation.

https://hackweek.suse.com/15/projects/1244 showed that e.g. for Salt
and its dependencies, stripping the source almost halved(!) the
package size. From ~33 to ~18 MB.


Klaus
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

Re: python .pyc packaging

Richard Biener
On Fri, 17 Feb 2017, Klaus Kaempf wrote:

> * Bernhard M. Wiedemann <[hidden email]> [Feb 17. 2017 06:33]:
> >
> > But why do we have to ship .pyc files as part of our binary rpms?
>
> I'd rather ask why we have to ship *source code* (.py files) as part of
> our binary rpms ?
>
> They waste much more space since .py files usually include full
> documentation.
>
> https://hackweek.suse.com/15/projects/1244 showed that e.g. for Salt
> and its dependencies, stripping the source almost halved(!) the
> package size. From ~33 to ~18 MB.

OTOH I remember .pyc files are not 100% portable across python
versions.  I really wonder why there's not some /var/cache/python-X.Y.Z
where .pyc files are created and cached on-demand (and that cache
configurable to not exist).
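
For reference, later CPython releases (3.8 and newer, so after this thread) grew a knob along these lines: `sys.pycache_prefix`, also settable through the `PYTHONPYCACHEPREFIX` environment variable. The sketch below only computes where bytecode would be cached under such a prefix; the helper name and the example paths are made up.

```python
import importlib.util
import sys


def cache_path_under_prefix(source_path, prefix):
    """Return where CPython would cache bytecode for `source_path`
    if bytecode caching were redirected to `prefix`."""
    saved = sys.pycache_prefix
    sys.pycache_prefix = prefix
    try:
        return importlib.util.cache_from_source(source_path)
    finally:
        # Restore the previous setting so the caller is unaffected.
        sys.pycache_prefix = saved
```

With a prefix such as `/var/cache/python`, the cache file for a module below `/usr/lib` lands in a mirrored directory tree under the prefix instead of in a `__pycache__` directory next to the source.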

Debian compiles to .pyc at install time IIRC and eventually re-compiles
them on python package upgrades.

From looking at package rebuilds I do remember seeing a change in python
trigger rebuilds of all -python packages (when ideally, if we just
shipped .py files, no build would be involved).

Richard.

--
Richard Biener <[hidden email]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Re: python .pyc packaging

Stephan Kulow-3
On 17.02.2017 at 11:12, Richard Biener wrote:

> From looking at package rebuilds I do remember seeing changing python
> triggering all -python packages to be rebuilt (when ideally when just
> shipping .py files no build would be involved).
>
Why would we prefer every user compiling over us building once? If the
pyc files are not compatible with all python versions, we need to have
a require on the python abi - so we know when to rebuild.

Greetings, Stephan

--
You have to keep fighting, fight until you drop, even if the
whole world has lost its mind, or precisely because of that.

Re: python .pyc packaging

Richard Biener
On Fri, 17 Feb 2017, Stephan Kulow wrote:

> On 17.02.2017 at 11:12, Richard Biener wrote:
>
> > From looking at package rebuilds I do remember seeing changing python
> > triggering all -python packages to be rebuilt (when ideally when just
> > shipping .py files no build would be involved).
> >
> Why would we prefer every user compiling over us building once? If the
> pyc files are not compatible with all python versions, we need to have
> a require on the python abi - so we know when to rebuild.

Sure.  But as was mentioned, if we ship .pyc, why do we also ship .py files?
Ultimately it's about choice, but then we could just as well split packages
into python-X-py and python-X-pyc, both providing python-X (or some
other way).

The incompatibility thing was just something I remember from the past,
it may be no longer true (python may just silently fall back to reading
the .py file if the .pyc is incompatible).

Richard.

> Greetings, Stephan
>
>

--
Richard Biener <[hidden email]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Re: python .pyc packaging

Stephan Kulow-3
On 17.02.2017 at 11:52, Richard Biener wrote:

> On Fri, 17 Feb 2017, Stephan Kulow wrote:
>
>> On 17.02.2017 at 11:12, Richard Biener wrote:
>>
>>> From looking at package rebuilds I do remember seeing changing python
>>> triggering all -python packages to be rebuilt (when ideally when just
>>> shipping .py files no build would be involved).
>>>
>> Why would we prefer every user compiling over us building once? If the
>> pyc files are not compatible with all python versions, we need to have
>> a require on the python abi - so we know when to rebuild.
>
> Sure.  Just as was mentioned if we ship .pyc why do we ship .py files?
> Eventually it's about choice but then we can as well split packages
> into python-X-py and python-X-pyc both providing python-X (or some
> other way).
>
Then why stop at python? Perhaps we should let people choose whether they
want to download binaries or compile their C code themselves? I'm sure you
will find other distributions as reference :)

Greetings, Stephan


Re: python .pyc packaging

Bernhard M. Wiedemann-5
In reply to this post by Stephan Kulow-3
On 2017-02-17 11:14, Stephan Kulow wrote:
> Why would we prefer every user compiling over us building once?

a) it is not the users but their machines compiling in %post or such

b) it is very fast

I did a quick benchmark on a 2.1GHz CPU via
https://gist.github.com/bmwiedemann/2db103cda98d9c750ff27e3f92f67e37

which found that compiling 400000 lines or 14MB worth of python source
files created 14MB .pyc files, within 1.6 seconds.
Compressing those into a tar.xz of 2.7MB took 6.7s
and uncompressing took 0.2 seconds.

Now, you might think that users save 1.4 seconds by uncompressing the
precompiled .pyc files instead of compiling, but they also have to download
them first, so shipping them is only faster when users have download speeds
of >2 MByte/s per machine (=16 MBit/s per machine), which unfortunately is
not true everywhere, especially in Germany aka Internet-Neuland.

on slower CPUs the balance might be better for precompiling, though.
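
The break-even figure follows directly from the benchmark numbers; a small worked example, where the function name is invented and the inputs are the values quoted above:

```python
def break_even_mbyte_per_s(archive_mb, compile_s, unpack_s):
    """Download speed at which fetching precompiled .pyc files costs as
    much time as compiling locally would have cost."""
    saved_s = compile_s - unpack_s
    return archive_mb / saved_s


# Figures from the benchmark above: 2.7 MB tar.xz, 1.6 s compile, 0.2 s unpack.
speed = break_even_mbyte_per_s(2.7, 1.6, 0.2)
print(f"{speed:.2f} MByte/s = {speed * 8:.1f} MBit/s")
# prints: 1.93 MByte/s = 15.4 MBit/s
```

Below that speed the download of the .pyc payload takes longer than the 1.4 seconds it saves, which is where the rounded ">2 MByte/s" comes from.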

Re: python .pyc packaging

Jan Engelhardt-4
In reply to this post by Stephan Kulow-3

On Friday 2017-02-17 12:01, Stephan Kulow wrote:

>>>>
>>> Why would we prefer every user compiling over us building once?
>>
>> Sure.  Just as was mentioned if we ship .pyc why do we ship .py files?
>> Eventually it's about choice but then we can as well split packages
>> into python-X-py and python-X-pyc both providing python-X (or some
>> other way).
>>
>Then why stop at python? Possibly we should make a choice if people want
>to download binaries or compile their C code themselves? I'm sure you
>will find other distributions as reference :)

Tempting proposal. In fact, so tempting that we could just..

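zypper()
{
        # Hand everything except the made-up "emerge" subcommand to the real zypper.
        if [ "$1" != "emerge" ]; then command zypper "$@"; return $?; fi
        shift; for i in "$@"; do
                # Build each package locally with host-specific flags.
                extra_optflags="-march=native" \
                bcond_withs=<(cat /etc/use_flags) \
                  osc --dont-ask-for-username build "openSUSE:Leap:42.2/$i"
                mv /var/tmp/.../*rpm /tmp/collect/ -f
        done
        # Install the locally built packages.
        command zypper in /tmp/collect/*.rpm
}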

It's trivialized here for brevity, but shows the picture (that, to
me, also looks a lot easier than portage/emerge.)

Re: python .pyc packaging

Michael Calmer
In reply to this post by Bernhard M. Wiedemann-5
Hi

and now compare this to what Klaus suggested and ship only .pyc files:

- no compiling
- smaller package => less download time as well.

Sounds like that would be the fastest one.

Additionally no trouble with cleaning up these files. Remember all these
locally compiled files would be "not owned by any package". Or we need
to generate %ghost entries for all of them.

On Friday, 17 February 2017, 12:56:24, Bernhard M. Wiedemann wrote:

> On 2017-02-17 11:14, Stephan Kulow wrote:
> > Why would we prefer every user compiling over us building once?
>
> a) it is not the users but their machines compiling in %post or such
>
> b) it is very fast
>
> I did a quick benchmark on a 2.1GHz CPU via
> https://gist.github.com/bmwiedemann/2db103cda98d9c750ff27e3f92f67e37
>
> which found that compiling 400000 lines or 14MB worth of python source
> files created 14MB .pyc files, within 1.6 seconds.
> Compressing those into a tar.xz of 2.7MB took 6.7s
> and uncompressing took 0.2 seconds.
>
> Now, you might think that users save 1.4 seconds when uncompressing the
> precompiled .pyc files, but they also have to download them first, which
> makes it only be faster when they have download speeds of >2MByte/s per
> machine (=16MBit/s per machine) which unfortunately is not true
> everywhere, especially in Germany aka Internet-Neuland.
>
> on slower CPUs the balance might be better for precompiling, though.

--
Regards

        Michael Calmer

--------------------------------------------------------------------------
Michael Calmer
SUSE LINUX GmbH, Maxfeldstr. 5, D-90409 Nuernberg
T: +49 (0) 911 74053 0
F: +49 (0) 911 74053575  - e-mail: [hidden email]
--------------------------------------------------------------------------
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
                     HRB 21284 (AG Nürnberg)


Re: python .pyc packaging

jan matejek-4
In reply to this post by Bernhard M. Wiedemann-5
hello,

replying to all comments so far

1. reason for existence of .pyc
Startup times, plain and simple. This does not matter in apps that run
for a long time, but it matters a lot for command line utilities.

Importantly, .pyc (and __pycache__ directories) is considered a *cache*.
The source code is the primary thing. The fact that python can run
purely off that cache is basically an implementation detail.


2. dropping .pyc and compiling on installation
That's certainly possible (Debian does something of the sort) and has
even some interesting advantages: it allows you to transparently support
many different python versions simultaneously with just one package install.
Of course, it would be a lot of work. Doing this manually in every
package is not realistic, we'd need some sort of automation that does
this for a package in a single macro, or maybe a "python-central" tool
that you run against the filelist in %post, or something...
You would also have to %ghost all the .pyc files, otherwise all you get
is a big ugly mess of not-owned files.
The singlespec macro set could help with this, possibly even creating
the %post/%preun scriptlets and %ghost entries automatically in packages.

This requires further discussion on the tradeoffs, probably at least a
rudimentary install time benchmark, but I'm not opposed to including
something like this.


3. dropping .pyc without replacement
That could be reasonable in some packages, if we know that they are
long-running and the startup time difference is negligible to users...
which is up to the individual maintainer, i suppose.
It's probably a bad idea for libraries, which tend to be used by
command-line tools, where startup time matters.


4. reproducible builds
For now, where it matters, let's touch generated .py files with a set
time (probably mtime of tarball?). I suspect this is not an issue in
most packages, so automation doesn't really help? But I'd be happy to
learn more, e.g. see packages that don't build reproducibly and check
out why that happens.

OTOH, doesn't rpm store mtime? Will a build with a regenerated file
count as the same as the previous build if the contents of the file are
unchanged but the metadata has changed?


5. dropping .py
That's a big NO.
First of all, it's simply not worth it. On my work machine, which has a
higher-than-usual amount of pythonic packages, the grand total size of
all "*.py" files is 208 MB. Out of a 7.6GB /usr. Salt itself, which I
installed for the purpose of this experiment, is 17 MB of that.

Second, users would kill us and I personally would be one of them.
As I noted, .pyc is a *cache* for the primary source, which is the .py
file. For python (and other languages that run from source), the
automatic presence of source code and possibility of instant
modification (for local patching, debugging, etc.) is a big advantage.
Not installing .py files by default breaks all sorts of conventions,
user expectations, goes against the spirit of open source, and is
downright power-user-hostile, all in the name of saving space that's
negligible on a typical system. We don't want to be That Distro (at
least as far as I'm concerned)

For usecases where additional 15 MB matter (such as salt-minion? I have
no idea what kind of system is the target), this might be a reasonable
step. But you should also go further and install the whole thing as a
zip file (which python can import transparently if added to sys.path);
most of it would be symbol names, which are retained in .pyc, and here
the zip compression helps a lot.
Speaking of: whole salt is 32 MB, zipped up is 8.6 MB, zipped only *.py
is 3.7 MB, zipped only *.pyc is 4.9 MB. Make your own tradeoff.
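
A minimal sketch of the zip import mechanism mentioned here, using a single throwaway module instead of a real package. The helper name `import_from_zip` and the archive name are illustrative only.

```python
import importlib
import os
import shutil
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source_text):
    """Pack a single module into a zip archive, put the archive on
    sys.path and import the module from it."""
    tmp = tempfile.mkdtemp()
    archive = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(module_name + ".py", source_text)
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # The module object stays loaded; the archive is only needed for import.
        sys.path.remove(archive)
        shutil.rmtree(tmp, ignore_errors=True)
```

A real deployment would keep the archive in place and add it to `sys.path` permanently, for example through a .pth file.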

You can go even further and vendor-include the dependent packages in the
zip file. Make the build process take the packages from their installed
locations, to pick up all updates and security fixes on rebuild. Sounds
like a good hackweek project.

(also, if 15 MB matter to you, maybe don't base your software on a
language with a 50MB stdlib of which you need maybe a third, if that...
or on a language where the sizes of source code and compiled objects are
comparable ;) )

regards
m.

On 17.2.2017 06:33, Bernhard M. Wiedemann wrote:

> Via https://en.opensuse.org/openSUSE:Reproducible_Builds
>
> I found that when we build python packages like python-amqp or
> python-binplist
>
> it contains a .pyc file for every .py file and for every build these
> .pyc files differ, because they contain the timestamp of the
> corresponding source file and for some source files this is the time of
> build.
>
> http://rb.zq1.de/compare.factory-20170208/python-amqp.html#content
>
> http://rb.zq1.de/compare.factory-20170208/python-binplist.html#content
>
>
> I was wondering how to best get those to build bit-by-bit identical rpms.
>
> I assume, we want to keep the concept of .pyc files, since they provide
> some performance gain (e.g. I measured 'openstack --help' taking only
> 1.5 seconds with .pyc files versus 2.5 seconds without (on another
> machine it was 12 vs 13 seconds))
>
> But why do we have to ship .pyc files as part of our binary rpms? They
> waste disk space and bandwidth for our mirrors and users.
> They could be created in a %post or %posttrans hook when installing the
> rpm (or do they need special build deps?)
> It might even be, that compiling them on the destination is faster than
> transferring and unpacking the LZMA compressed version.
>
>
> The less intrusive alternative approach would be to touch .py files to a
> constant older date (e.g. $SOURCE_DATE_EPOCH if set) before generating
> the .pyc files.
>
> What do you think which way to go?
>
> Ciao
> Bernhard M.
>



Re: python .pyc packaging

Kristoffer Grönlund
jan matejek <[hidden email]> writes:

> 5. dropping .py
> That's a big NO.

I just wanted to add my voice to this and agree wholeheartedly. As a
developer, this would make the packaged python modules completely
useless to me, as stepping into them with a debugger would no longer
work.

I would strongly recommend against removing .py files from the
packages.

Cheers,
Kristoffer

--
// Kristoffer Grönlund
// [hidden email]

Re: python .pyc packaging

Bernhard M. Wiedemann-5
In reply to this post by jan matejek-4
On 2017-02-17 16:00, jan matejek wrote:

> 2. dropping .pyc and compiling on installation
> That's certainly possible (Debian does something of the sort) and has
> even some interesting advantages: it allows you to transparently support
> many different python versions simultaneously with just one package install.
> Of course, it would be a lot of work. Doing this manually in every
> package is not realistic, we'd need some sort of automation that does
> this for a package in a single macro, or maybe a "python-central" tool
> that you run against the filelist in %post, or something...
> You would also have to %ghost all the .pyc files, otherwise all you get
> is a big ugly mess of not-owned files.
> The singlespec macro set could help with this, possibly even creating
> the %post/%preun scriptlets and %ghost entries automatically in packages.
yes, indeed.

> This requires further discussion on the tradeoffs, probably at least a
> rudimentary install time benchmark, but I'm not opposed to including
> something like this.

I did some quick benchmarking today, which showed that ~10 MB of source can
be compiled in 1-2 seconds (as long as you do not compile the files
one-by-one, which creates some 10x overhead for loading python).
For the typical 200 MB of .py files that would make 20-40 seconds extra,
minus the 10% that .pyc decompression would have eaten, minus the saved
transfer time.
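
Compiling a whole tree in one interpreter process, rather than starting python once per file, can be done with the standard `compileall` module. A minimal sketch, where `compile_tree` is just an illustrative wrapper:

```python
import compileall


def compile_tree(path):
    """Byte-compile every .py file below `path` in this one process.
    Returns True if all files compiled successfully."""
    return bool(compileall.compile_dir(path, quiet=1))
```

The command-line equivalent is `python3 -m compileall DIR`, which likewise pays the interpreter startup cost only once.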


> 4. reproducible builds
> For now, where it matters, let's touch generated .py files with a set
> time (probably mtime of tarball?). I suspect this is not an issue in
> most packages, so automation doesn't really help? But I'd be happy to
> learn more, e.g. see packages that don't build reproducibly and check
> out why that happens.

you can search in http://rb.zq1.de/compare.factory/reproducible.json for
'unreproducible'
and http://rb.zq1.de/compare.factory/ also has some dozen build-compare
diffs for those cases where more than just a file timestamp differed.


> OTOH, doesn't rpm store mtime? Will a build with a regenerated file
> count as same as the previous build, if contents of the file are
> unchanged but metadata are?

it can be made constant with some recent patches linked in
https://github.com/rpm-software-management/rpm/pull/144
That already allowed 70-76% of Factory to be built reproducibly.

Those patches are also in
https://build.opensuse.org/package/show/home:bmwiedemann:reproducible/rpm


> 5. dropping .py
> That's a big NO.
> First of all, it's simply not worth it. On my work machine, which has a
> higher-than-usual amount of pythonic packages, the grand total size of
> all "*.py" files is 208 MB. Out of a 7.6GB /usr. Salt itself, which I
> installed for the purpose of this experiment, is 17 MB of that.
>
> Second, users would kill us and I personally would be one of them.
> As I noted, .pyc is a *cache* for the primary source, which is the .py
> file. For python (and other languages that run from source), the
> automatic presence of source code and possibility of instant
> modification (for local patching, debugging, etc.) is a big advantage.
> Not installing .py files by default breaks all sorts of conventions,
> user expectations, goes against the spirit of open source, and is
> downright power-user-hostile, all in the name of saving space that's
> negligible on a typical system. We don't want to be That Distro (at
> least as far as I'm concerned)
strongly agree there. It would be the same loss we got with systemd vs
sysvinit scripts (e.g. the ones that handled filesystem mounts and
crypto containers, which are now handled by compiled C code)



Re: python .pyc packaging

Klaus Kaempf
In reply to this post by Kristoffer Grönlund
* Kristoffer Grönlund <[hidden email]> [Feb 17. 2017 16:59]:
> jan matejek <[hidden email]> writes:
>
> > 5. dropping .py
> > That's a big NO.
>
> I just wanted to add my voice to this and agree wholeheartedly. As a
> developer, this would make the packaged python modules completely
> useless to me, as stepping into them with a debugger would no longer
> work.

Then you just install the python-X-source package. That could even be
automated in zypper.

>
> I would strongly recommend against removing .py files from the
> packages.

I would strongly recommend for shipping the kernel source inside the
kernel-default package. SCNR ;-)

Klaus
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

Re: python .pyc packaging

Jason Craig-3
In reply to this post by jan matejek-4
I'm replying to Jan's message inline because I think it raised important
points I want to reinforce with my voice.

On 02/17/2017 08:00 AM, jan matejek wrote:
...

>
> 2. dropping .pyc and compiling on installation
> That's certainly possible (Debian does something of the sort) and has
> even some interesting advantages: it allows you to transparently support
> many different python versions simultaneously with just one package install.
> Of course, it would be a lot of work. Doing this manually in every
> package is not realistic, we'd need some sort of automation that does
> this for a package in a single macro, or maybe a "python-central" tool
> that you run against the filelist in %post, or something...
> You would also have to %ghost all the .pyc files, otherwise all you get
> is a big ugly mess of not-owned files.
> The singlespec macro set could help with this, possibly even creating
> the %post/%preun scriptlets and %ghost entries automatically in packages.
>
> This requires further discussion on the tradeoffs, probably at least a
> rudimentary install time benchmark, but I'm not opposed to including
> something like this.

I guess this could make sense to some subset of users/packages/use cases
etc., but it seems like an awful lot of work to solve a problem that I
am not sure is even there.

> 3. dropping .pyc without replacement
> That could be reasonable in some packages, if we know that they are
> long-running and the startup time difference is negligible to users...
> which is up to the individual maintainer, i suppose.
> It's probably a bad idea for libraries, which tend to be used by
> command-line tools, where startup time matters.

If I understand what you are saying this is a non-starter for me. Python
will produce the .pyc files when the .py files get run, unless the user
were to always supply a flag on the Python command line, which is not
even a possibility when using many Python command line tools. So if only
.py files are distributed, then there will be a lot of .pyc cruft left
on the system after a package is uninstalled.
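
The flag in question is `-B` (or the `PYTHONDONTWRITEBYTECODE` environment variable). The sketch below, assuming Python 3 and a writable directory, imports a throwaway module in a child interpreter and reports whether a `__pycache__` directory was left behind; all names in it are made up for the demonstration.

```python
import os
import subprocess
import sys
import tempfile


def import_leaves_cache(extra_args):
    """Import a throwaway module in a child interpreter started with
    `extra_args` and report whether a __pycache__ directory appeared."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
            f.write("VALUE = 1\n")
        env = dict(os.environ)
        # Make sure the environment does not already influence the result.
        for var in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX",
                    "PYTHONSAFEPATH"):
            env.pop(var, None)
        subprocess.run(
            [sys.executable, *extra_args, "-c", "import demo_mod"],
            cwd=tmp, env=env, check=True,
        )
        return os.path.isdir(os.path.join(tmp, "__pycache__"))
```

Calling it with `[]` and with `["-B"]` shows the difference between the default behaviour and a run with bytecode writing disabled.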

> 4. reproducible builds
> For now, where it matters, let's touch generated .py files with a set
> time (probably mtime of tarball?). I suspect this is not an issue in
> most packages, so automation doesn't really help? But I'd be happy to
> learn more, e.g. see packages that don't build reproducibly and check
> out why that happens.

This seems a reasonable strategy to help solve the original question.

> 5. dropping .py
> That's a big NO.
> First of all, it's simply not worth it. On my work machine, which has a
> higher-than-usual amount of pythonic packages, the grand total size of
> all "*.py" files is 208 MB. Out of a 7.6GB /usr. Salt itself, which I
> installed for the purpose of this experiment, is 17 MB of that.

With this I wholeheartedly agree. We cannot drop the .py files. Python software
*is* the .py files. Python is an interpreted language and the fact that
the .pyc concept exists doesn't change that. Also, does any other distro
do this? There should be a good reason behind doing something no one
else is doing.

...
> (also, if 15 MB matter to you, maybe don't base your software on a
> language with a 50MB stdlib of which you need maybe a third, if that...
> or on a language where the sizes of source code and compiled objects are
> comparable ;) )

This is the key point IMO. I don't think a distribution should be trying
to "solve" the "problems" it sees in a particular programming language.
When a developer chooses a language they are also choosing its
particular strengths and limitations, whether the developer understands
that or not. If a particular user/use case can't abide something on
the order of tens to hundreds of MB (this population has to be the
tiniest sliver of the distro's potential user base), then they should
look at alternative software. It shouldn't be the job of a general
purpose Linux distribution to "solve" the tradeoffs implicit in the
developer's choices when the vast majority of users don't need such a
"solution".

--
Jason Craig

Re: python .pyc packaging

Brüns, Stefan
In reply to this post by Klaus Kaempf
On Friday, 17 February 2017 20:08:53 CET Klaus Kaempf wrote:

> * Kristoffer Grönlund <[hidden email]> [Feb 17. 2017 16:59]:
> > jan matejek <[hidden email]> writes:
> > > 5. dropping .py
> > > That's a big NO.
> >
> > I just wanted to add my voice to this and agree wholeheartedly. As a
> > developer, this would make the packaged python modules completely
> > useless to me, as stepping into them with a debugger would no longer
> > work.
>
> Then you just install the python-X-source package. That could even be
> automated in zypper.

Maybe add Recommends: packageand(python-X, python-sources), so it's easy to
install sources for all installed python-packages and restore the current
status quo.

Option 5) actually said dropping sources, which is IMHO either badly worded or
just a bad idea.

Splitting out the sources (.py) from the bytecode (.pyc) would allow us to
remove any redundant data and still have fast startup times. Especially for
containers and small devices (RPi and the like) this would be a clear win. Probably most
installed systems would never need the sources, and even developers will
likely only need the sources of a few packages. Hands up, who has debuginfo/
debugsource installed for *all* their installed packages?
 
> > I would strongly recommend against removing .py files from the
> > packages.
>
> I would strongly recommend for shipping the kernel source inside the
> kernel-default package. SCNR ;-)

Don't forget to put the compiler and linker into the initrd, so we get a
freshly compiled kernel every time the system boots. ;-)

Kind regards,

Stefan


Re: python .pyc packaging

todd rme
On Fri, Feb 17, 2017 at 3:00 PM, Stefan Bruens
<[hidden email]> wrote:

> On Friday, 17 February 2017 20:08:53 CET Klaus Kaempf wrote:
>> * Kristoffer Grönlund <[hidden email]> [Feb 17. 2017 16:59]:
>> > jan matejek <[hidden email]> writes:
>> > > 5. dropping .py
>> > > That's a big NO.
>> >
>> > I just wanted to add my voice to this and agree wholeheartedly. As a
>> > developer, this would make the packaged python modules completely
>> > useless to me, as stepping into them with a debugger would no longer
>> > work.
>>
>> Then you just install the python-X-source package. That could even be
>> automated in zypper.
>
> Maybe add Recommends: packageand(python-X, python-sources), so it's easy to
> install sources for all installed python-packages and restore the current
> status quo.
>
> Option 5) actually said dropping sources, which is IMHO either badly worded or
> just a bad idea.
>
> Splitting out the sources (.py) from the bytecode (.pyc) would allow us to
> remove any redundant data and still have fast startup times. Especially for
> containers and small devices (RPi and the like) this would be a clear win. Probably most
> installed systems would never need the sources, and even developers will
> likely only need the sources of a few packages. Hands up, who has debuginfo/
> debugsource installed for *all* their installed packages?

As others have said, Python is an interpreted language, not a compiled
language like C.  The .py files are the code that is executed.  The
.pyc files are just an optional cache.  In fact, Python 3.2 and later
put them in a __pycache__ directory.
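For reference, the cache location can be asked from the interpreter itself. A small sketch, assuming Python 3.4 or later; the source path is a made-up example and does not need to exist.

```python
import importlib.util
import sys

# Hypothetical install path, used only to show the mapping.
source_path = "/usr/lib/python3/site-packages/demo_pkg/connection.py"

# Where this interpreter would cache the bytecode for that source file.
cache_path = importlib.util.cache_from_source(source_path)

# The mapping is reversible: the cache path leads back to the source path.
round_trip = importlib.util.source_from_cache(cache_path)

print(cache_path)
print(sys.implementation.cache_tag)
```

The file name carries the interpreter's cache tag, which is what lets several Python versions share one __pycache__ directory.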

More seriously, .pyc files are not intended for stand-alone use and
Python is not designed to work this way.  Python allows a lot of
tinkering with its internals, so I would not count on the .pyc files
even working reliably on their own, and the bugs that are introduced
could be rare and hard to track down.

This won't even work by default in recent versions of Python.  As I
mentioned, these files go in the __pycache__ directory by default, and
Python does not load a .pyc from there when the matching .py file is
missing.  So we would need to change Python from the upstream default
behavior just to make this proposal work at all.
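Here is a small experiment that illustrates the point, offered as a sketch rather than anything definitive. It assumes Python 3.7 or later and a scratch module named demo_mod (made up). The module is compiled, the source is deleted, and an import is attempted twice: first with the .pyc only in __pycache__, then with a copy placed at the old pre-__pycache__ location next to where the source used to be.

```python
import os
import py_compile
import shutil
import subprocess
import sys
import tempfile


def can_import(directory, module):
    """Return True if a fresh interpreter can import module from directory."""
    env = dict(os.environ, PYTHONPATH=directory)
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        cwd=directory,
        env=env,
        capture_output=True,
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as site_dir:
    # demo_mod is a hypothetical module used only for this illustration.
    source = os.path.join(site_dir, "demo_mod.py")
    with open(source, "w") as handle:
        handle.write("VALUE = 1\n")

    cached = py_compile.compile(source, doraise=True)  # written under __pycache__
    os.remove(source)

    # Only __pycache__/demo_mod.<tag>.pyc is left: the import system ignores it.
    from_pycache_only = can_import(site_dir, "demo_mod")

    # A bare demo_mod.pyc beside the missing source is still accepted.
    shutil.copy(cached, os.path.join(site_dir, "demo_mod.pyc"))
    from_legacy_location = can_import(site_dir, "demo_mod")

print(from_pycache_only, from_legacy_location)
```

So a bytecode-only package would have to install its .pyc files at the legacy location rather than where the interpreter writes them by default.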

Re: python .pyc packaging

Jan Engelhardt-4

On Friday 2017-02-17 21:55, Todd Rme wrote:
>As others have said, Python is an interpreted language, not a compiled
>language like C.

That's nonsense. Who says Python *has* to be interpreted? Who says C
*has* to be compiled? Python and C are each a language (with a more
or less large standard library behind it). I have yet to see a
language that cannot be compiled; if one exists, it is most likely some
esoteric one.