Random "Job seems to be stuck here" build failures

Michal Srb
Hi,

I am working with the llvm4 package, which usually takes a very long time to build:
https://build.opensuse.org/package/show/devel:tools:compiler/llvm4
https://build.opensuse.org/package/show/home:michalsrb:branches:devel:tools:compiler/llvm4

I am seeing random but frequent build failures where the build log simply ends
(sometimes even in the middle of a line) and, after a long wait, the build is
terminated. For example:

> ...
> [14840s] -- Installing: /home/abuild/rpmbuild/BUILDROOT/llvm4-4.0.1-38.5.i386/usr/include/llvm/Transforms/Utils/CtorUtils.h
> [14840s] -- Installing: /home/abuild/rpmbuild/BUILDROOT/llvm4-4.0.1-38.5.i386/usr/include/llvm/Transforms/Utils/EscapeEnumerator.h
> [14840s] -- Installing: /home/abuild/rpmbuild/BUILDROOT/llvm4-4.0.1-38.5.i386/usr/include/llvm/Transf
>
> Job seems to be stuck here, killed. (after 28800 seconds of inactivity)
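
For reference, the two numbers in this excerpt can be pulled out mechanically:
the last log timestamp (14840 s, about 4.1 hours into the build) and the
inactivity period before the kill (28800 s, exactly 8 hours). The following is
only a sketch in Python, not something the build service provides. The function
names are made up for illustration, and the regular expressions assume the log
format shown in the excerpt above.

```python
import re

# Assumed formats, taken from the excerpt above.
STAMP = re.compile(r"^\[\s*(\d+)s\]")
STUCK = re.compile(
    r"Job seems to be stuck here, killed\. \(after (\d+) seconds of inactivity\)"
)

# Tail of the log from the example above (quote markers removed).
LOG_TAIL = """\
[14840s] -- Installing: /home/abuild/rpmbuild/BUILDROOT/llvm4-4.0.1-38.5.i386/usr/include/llvm/Transforms/Utils/EscapeEnumerator.h
[14840s] -- Installing: /home/abuild/rpmbuild/BUILDROOT/llvm4-4.0.1-38.5.i386/usr/include/llvm/Transf

Job seems to be stuck here, killed. (after 28800 seconds of inactivity)
"""


def summarize_stuck_log(log_text):
    """Return (last_timestamp_s, inactivity_s), or None if there is no kill notice."""
    last_stamp = None
    for line in log_text.splitlines():
        stamp = STAMP.match(line)
        if stamp:
            last_stamp = int(stamp.group(1))
        stuck = STUCK.search(line)
        if stuck:
            return last_stamp, int(stuck.group(1))
    return None


def as_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


if __name__ == "__main__":
    last, idle = summarize_stuck_log(LOG_TAIL)
    print(f"last output at {as_hours(last):.1f} h, killed after {as_hours(idle):.1f} h idle")
```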

This has never happened in a local build. In the build service it happens
randomly and in different parts of the build: for example, in the middle of
"make install" in the %install section, in the middle of running tests in the
%check section, or in the middle of debuginfo extraction. That makes me think
that the problem is not in the package, but that something is wrong in the
build service.

Could there be something broken in the package?
If it is a problem in the build service, can I do something to reduce the
chance of it happening?

Thanks,
Michal Srb
--
To unsubscribe, e-mail: [hidden email]
To contact the owner, e-mail: [hidden email]

Re: Random "Job seems to be stuck here" build failures

Adrian Schröter
On Thursday, 7 September 2017, 13:23:29 CEST, Michal Srb wrote:

> It never happened in local build. In build service it happens randomly and in
> different parts of the build. For example in the middle of "make install" in
> %install section, or in the middle of running tests in %check section, or in
> the middle of debuginfo extraction. That makes me think that the problem is
> not in the package, but something is wrong in build service.

But not during %build?

That would point to some I/O or disk space problem, maybe.

Can you detect any pattern in the job history? E.g., does it only fail on
lamb7x systems or the like?

You may need to require more disk space then ...

> Could there be something broken in the package?
> If it is problem in buildservice, can I do something to reduce the chance of
> it happening?

Hard to say. You could try a local build using "--vm-type=kvm" to build the
same way our workers do. Can you reproduce it then?

It could also be a kernel bug in the distro being used. Does it happen only on
distro X, maybe?

In the worst case you will need to ping me when you see the build hanging on
some worker, and I will need to trigger a kernel trace to get an idea why it
is hanging ....

--

Adrian Schroeter
email: [hidden email]

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
 
Maxfeldstraße 5                        
90409 Nürnberg
Germany


Re: Random "Job seems to be stuck here" build failures

Andreas Schwab-2
In reply to this post by Michal Srb
On Sep 07 2017, Michal Srb <[hidden email]> wrote:

> It never happened in local build. In build service it happens randomly and in
> different parts of the build. For example in the middle of "make install" in
> %install section, or in the middle of running tests in %check section, or in
> the middle of debuginfo extraction. That makes me think that the problem is
> not in the package, but something is wrong in build service.

Could be an OOM situation.  Try looking at the resource usage of a
successful build.

For example,
https://build.opensuse.org/package/statistics/devel:tools:compiler/llvm4?arch=x86_64&repository=openSUSE_Factory
says it needs 6GB of memory, but in _constraints only 4GB are requested.
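
To make the comparison concrete, here is a rough sketch in Python that reads the
memory request from a _constraints file and compares it with an observed peak.
It is an illustration only: the XML layout (constraints/hardware/memory/size
with a unit attribute) is my assumption of what the package's _constraints
looks like, the helper names are invented, and the 4 GB and 6 GB figures are
the ones quoted above.

```python
import xml.etree.ElementTree as ET

# Conversion factors to GB for the unit attribute (assumed values: K, M, G, T).
UNIT_TO_GB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}

# Assumed shape of the _constraints file, with the 4 GB request mentioned above.
EXAMPLE_CONSTRAINTS = """\
<constraints>
  <hardware>
    <memory>
      <size unit="G">4</size>
    </memory>
  </hardware>
</constraints>
"""


def requested_memory_gb(constraints_xml):
    """Return the requested memory in GB, or None if no memory size is set."""
    root = ET.fromstring(constraints_xml)
    size = root.find("./hardware/memory/size")
    if size is None:
        return None
    unit = size.get("unit")
    if unit not in UNIT_TO_GB:
        raise ValueError(f"unexpected unit: {unit!r}")
    return float(size.text.strip()) * UNIT_TO_GB[unit]


def memory_shortfall_gb(constraints_xml, observed_peak_gb):
    """Return how many GB the observed peak exceeds the request by (0 if none)."""
    requested = requested_memory_gb(constraints_xml)
    if requested is None:
        return observed_peak_gb
    return max(0.0, observed_peak_gb - requested)


if __name__ == "__main__":
    # 6 GB is the peak reported by the statistics page mentioned above.
    print(memory_shortfall_gb(EXAMPLE_CONSTRAINTS, 6.0))
```

A non-zero result means a worker that only just satisfies the request could run
out of memory at peak usage.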

Andreas.

--
Andreas Schwab, SUSE Labs, [hidden email]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: Random "Job seems to be stuck here" build failures

Michal Srb
In reply to this post by Adrian Schröter
On Thursday, 7 September 2017, 13:37:16 CEST, Adrian Schröter wrote:
> > It never happened in local build. In build service it happens randomly and
> > in different parts of the build. For example in the middle of "make
> > install" in %install section, or in the middle of running tests in %check
> > section, or in the middle of debuginfo extraction. That makes me think
> > that the problem is not in the package, but something is wrong in build
> > service.
>
> but not during %build?

Sometimes it happens during %build too.

> Would point to some IO or disk space problem maybe.

The package has a _constraints file that asks for 30 GB of disk space, which
should be enough. But yes, maybe it is some I/O problem.

> Can you detect any pattern in the jobhistory? Eg. It only fails on lamb7x
> systems or alike?

I have only 7 samples right now. I don't know how to get at the logs from
older builds, or whether that is possible at all.

In those 7 samples it failed on SLE_12_SP2, openSUSE_Factory,
openSUSE_Leap_42.2 and openSUSE_Leap_42.3.
Architectures were x86_64, i586, armv6l.
Build hosts were lamb77, lamb76, lamb74, lamb78 and armbuild15.
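
As a quick check of the "lamb7x" question, the host names above can be grouped
by family. This is just a sketch in Python with a made-up helper that replaces
the last digit of a host name with "x". Note that it counts distinct hosts, not
failures, since I did not record which of the 7 samples ran on which host.

```python
import re
from collections import Counter

# Distinct build hosts listed above.
FAILED_HOSTS = ["lamb77", "lamb76", "lamb74", "lamb78", "armbuild15"]


def host_family(hostname):
    """Replace the final digit with 'x', e.g. 'lamb77' becomes 'lamb7x'."""
    return re.sub(r"\d$", "x", hostname)


def family_counts(hosts):
    """Count how many of the given hosts fall into each family."""
    return Counter(host_family(host) for host in hosts)


if __name__ == "__main__":
    for family, count in family_counts(FAILED_HOSTS).most_common():
        print(family, count)
```

Four of the five hosts are in the lamb7x family, but armbuild15 is not, so the
failures are not limited to lamb7x.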

> Hard to say, you could try local build using "--vm-type=kvm" to build like
> on our workers. Can you reproduce it then?

I tried that and it built correctly. But I tried only once or twice, so I
can't tell whether I was just lucky.
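
To get more than one or two samples, the local build could be repeated in a
loop. The following Python sketch shows the idea under some assumptions: the
repository and architecture are one of the failing combinations listed above,
the spec file name llvm4.spec is a guess, the helper names are invented, and the
timeout only stops waiting for the osc process itself, so a hung VM may still
need to be cleaned up by hand.

```python
import subprocess


def build_command(repository, arch, specfile):
    """Build the osc command line for a local build in a KVM guest."""
    return ["osc", "build", "--vm-type=kvm", repository, arch, specfile]


def repeat_build(command, attempts, timeout_s, run=subprocess.run):
    """Run the command several times and record 'ok', 'failed' or 'timeout'."""
    results = []
    for _ in range(attempts):
        try:
            completed = run(command, timeout=timeout_s)
            results.append("ok" if completed.returncode == 0 else "failed")
        except subprocess.TimeoutExpired:
            # The build did not finish in time, which is what a hang looks like.
            results.append("timeout")
    return results


if __name__ == "__main__":
    # Hypothetical invocation; the spec file name is an assumption.
    cmd = build_command("openSUSE_Factory", "x86_64", "llvm4.spec")
    print(repeat_build(cmd, attempts=5, timeout_s=12 * 3600))
```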

> In worst case you need to ping me when you see the build is hanging on
> some worker and I need to trigger a kernel trace to get an idea why it
> is hanging ....

I'll ping you if I catch it happening. Thanks!

Michal Srb

Re: Random "Job seems to be stuck here" build failures

Michal Srb
In reply to this post by Andreas Schwab-2
On Thursday, 7 September 2017, 13:41:53 CEST, Andreas Schwab wrote:
> Could be an OOM situation.  Try looking at the resource usage of a
> succeeding build.
>
> For example,
> https://build.opensuse.org/package/statistics/devel:tools:compiler/llvm4?arch=x86_64&repository=openSUSE_Factory
> says it needs 6GB of memory, but in _constraints only 4GB are requested.

Cool, I didn't know about this statistics page. Memory usage peaks when
linking the main libraries, whereas the build gets stuck in other, random
places. But I'll try to increase the limit anyway.

Michal