flapping build

Max-2
Hi.

I've got a weird issue with one of the OBS jobs which I don't know how to debug. From
time to time the build job fails.

It fails with "Illegal instruction" error - the code uses tricky SSE optimizations
which are heavily dependent on processor features.

The problem is that I do not see any pattern:
- any of the Ubuntu or Debian builds might fail (several at once or just one)
- I'm unable to reproduce this locally
- the build failure is gone next day without any code changes

Overall this seems like some sort of spooky race condition. If I "trigger rebuild"
then the build is back to normal. I guess that it depends on what kind of VM/CPU the
build has been scheduled on. Is there a way to get additional info about the VM on
which the build is running?

An example build log is attached, but I'm unable to make sense of it.

--
Max Suraev <[hidden email]> http://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Alt-Moabit 93
* 10559 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschaeftsfuehrer / Managing Director: Harald Welte





debian_i586.log (558K)

Re: flapping build

Henne Vogelsang-2
Hey,

On 20.07.2017 11:47, Max wrote:

> It fails with "Illegal instruction" error - the code uses tricky SSE optimizations
> which are heavily dependent on processor features.

Find out which & create a build constraint? :-)

http://openbuildservice.org/help/manuals/obs-reference-guide/cha.obs.build_job_constraints.html#idm140109221460960
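A build constraint is a `_constraints` file committed next to the package sources. The sketch below writes one that requires specific CPU flags, using the `<hardware><cpu><flag>` elements described in the linked guide. The flags listed (ssse3, sse4_1) are only placeholders picked from the worker that built successfully later in this thread; which flags the SSE code really needs is an assumption you have to verify. The function name write_constraints is hypothetical:

```shell
#!/bin/sh
# Sketch: write an OBS _constraints file into a package checkout so that
# the scheduler only picks workers whose CPU advertises the listed flags.
# The flag names are placeholders; replace them with the ones your code needs.
write_constraints() {
    dir="${1:-.}"
    cat > "$dir/_constraints" <<'EOF'
<constraints>
  <hardware>
    <cpu>
      <flag>ssse3</flag>
      <flag>sse4_1</flag>
    </cpu>
  </hardware>
</constraints>
EOF
}
```

Usage from inside an `osc checkout` of the package: `write_constraints . && osc add _constraints && osc commit -m "Require SSE4.1 capable workers"`.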

Henne

--
Henne Vogelsang
http://www.opensuse.org
Everybody has a plan, until they get hit.
        - Mike Tyson
--
To unsubscribe, e-mail: [hidden email]
To contact the owner, e-mail: [hidden email]


Re: flapping build

Frank Schreiner-2
On Thursday, 20 July 2017, 12:18:44 CEST, Henne Vogelsang wrote:

> Hey,
>
> On 20.07.2017 11:47, Max wrote:
> > It fails with "Illegal instruction" error - the code uses tricky SSE
> > optimizations which are heavily dependent on processor features.
>
> Find out which & create a build constraint? :-)
>
> http://openbuildservice.org/help/manuals/obs-reference-guide/cha.obs.build_job_constraints.html#idm140109221460960
>
> Henne
An additional hint:

You can use "osc workerinfo" to find out a bit more about the worker the
job was running on:

## BUILD FAILED:
# osc workerinfo x86_64:build36:1 |grep sse
      <flag>sse</flag>
      <flag>sse2</flag>
      <flag>sse4a</flag>
      <flag>misalignsse</flag>

## BUILD WORKED
# osc workerinfo x86_64:cloud117:1 |grep sse
      <flag>sse</flag>
      <flag>sse2</flag>
      <flag>ssse3</flag>
      <flag>sse4_1</flag>
      <flag>sse4_2</flag>
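To see at a glance which flags differ between a failing and a working worker, the two `osc workerinfo` outputs can be reduced to sorted flag lists and diffed. A minimal sketch, assuming the XML prints one `<flag>` element per line as in the output above; the function names extract_flags and compare_workers are hypothetical:

```shell
#!/bin/sh
# extract_flags: read "osc workerinfo" XML on stdin, print one flag per line.
extract_flags() {
    sed -n 's|.*<flag>\([^<]*\)</flag>.*|\1|p' | sort -u
}

# compare_workers FAILED_ID GOOD_ID
# Lines starting with "<" exist only on the first worker, ">" only on the second.
compare_workers() {
    a="$(mktemp)"
    b="$(mktemp)"
    osc workerinfo "$1" | extract_flags > "$a"
    osc workerinfo "$2" | extract_flags > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

Usage: `compare_workers x86_64:build36:1 x86_64:cloud117:1`. Temporary files are used instead of bash process substitution so the sketch also runs under a plain POSIX sh.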




Re: flapping build

Max-2
In reply to this post by Henne Vogelsang-2
Excellent advice, thank you.

Is there some sort of negation operator for those constraints? I mean, setting a bunch
of flags would most likely fix our build, but it would be nicer to fix our code
instead, i.e. to find the particular combination of present/absent flags that causes the
build failure. Is there a way to constrain a build job to workers _without_ the cpu flag
sse2, for example?


Re: flapping build

Max-2
In reply to this post by Frank Schreiner-2
Looks very useful, thank you. Where do I get this "build36" or "cloud117" from? Also,
what does the ":1" at the end refer to?
Basically, how do I gather the parameters for "osc workerinfo" from the failed build log?


Re: flapping build

Frank Schreiner-2
On Thursday, 20 July 2017, 12:31:32 CEST, Max wrote:
> Looks very useful, thank you. Where do I get this "build36" or "cloud117"
> from? Also, what does the ":1" at the end refer to?

It is:

<arch>:<host>:<vm_num>

In the log you sent, you can find a line like

[    0s] build36 started "build libosmocore_0.9.6.20170719.dsc" at Wed Jul 19 19:49:50 UTC 2017.

build36 is the host. You can find the arch here (maybe there is a better solution with osc, I don't know):

https://build.opensuse.org/monitor

And vm_num=1 is always a good choice; normally the other VMs on the same host should be the same.
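The lookup described above can be scripted: pull the host name from the "started" line of the build log and assemble the `<arch>:<host>:<vm_num>` argument. A minimal sketch, assuming the log line has exactly the form quoted above; the function names extract_build_host and worker_id are hypothetical, and the arch passed in must be the worker's host arch, which is not in the log:

```shell
#!/bin/sh
# Print the worker host name from an OBS build log, assuming a line like
#   [    0s] build36 started "build <package>" at <date>.
extract_build_host() {
    sed -n 's/^\[ *[0-9]*s] \([^ ]*\) started "build .*/\1/p' "$1" | head -n 1
}

# Assemble the <arch>:<host>:<vm_num> argument for "osc workerinfo",
# using vm_num 1 as recommended above.
worker_id() {
    printf '%s:%s:1\n' "$1" "$2"
}
```

Usage, with x86_64 assumed as the host arch: `osc workerinfo "$(worker_id x86_64 "$(extract_build_host debian_i586.log)")" | grep sse`.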





Re: flapping build

Frank Schreiner-2
> build36 is the host. Here you can find the arch (maybe there is a better
> solution with osc - I don't know)
>
Yes, there is:

osc api /worker/_status|grep build36
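A plain `grep build36` also matches hosts such as build360, so it can help to anchor on the `workerid` attribute and print only the `hostarch` value. A minimal sketch, assuming each worker entry sits on a single line of the `/worker/_status` output as in the listing later in this thread; the function name host_arch is hypothetical:

```shell
#!/bin/sh
# host_arch HOST: read "osc api /worker/_status" output on stdin and print
# the hostarch of the given worker host (once, even if it has many VMs).
# Matching 'workerid="HOST:' avoids also matching e.g. build360 for build36.
host_arch() {
    grep "workerid=\"$1:" \
        | sed -n 's/.*hostarch="\([^"]*\)".*/\1/p' \
        | sort -u
}
```

Usage: `osc api /worker/_status | host_arch build36`.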



Re: flapping build

Max-2
Exactly what I was looking for, thank you.


Re: flapping build

Max-2
In reply to this post by Frank Schreiner-2
Is the <arch> the "hostarch=" parameter of the worker obtained from "osc api
/worker/_status", or the arch of the failed build?

For example, my i586 build failed while the x86_64 build is OK (they ran on different
workers).

Getting data on the worker of the failed i586 build:
osc api /worker/_status|grep build34
  <idle workerid="build34:1" hostarch="x86_64"/>
  <idle workerid="build34:2" hostarch="x86_64"/>
  <idle workerid="build34:3" hostarch="x86_64"/>
  <idle workerid="build34:5" hostarch="x86_64"/>
  <idle workerid="build34:6" hostarch="x86_64"/>
  <building workerid="build34:4" hostarch="x86_64" project="Kernel:linux-next" repository="standard" package="kernel-vanilla" arch="i586" starttime="1500631045"/>

Getting sse flags:
osc workerinfo x86_64:build34:1 |grep sse
      <flag>sse</flag>
      <flag>sse2</flag>
      <flag>sse4a</flag>
      <flag>misalignsse</flag>

but
osc workerinfo x586:build34:1 |grep sse
Server returned an error: HTTP Error 404: remote error  unknown worker
remote error: unknown worker


Re: flapping build

Adrian Schröter
On Friday, 21 July 2017, 13:41:55 CEST, Max wrote:
> The <arch> is the "hostarch=" parameter of the worker obtained from "osc api
> /worker/_status" or the arch of the failed build?

The hostarch is the architecture of the host ;)

It is independent of the build target.

i586, but also emulated architectures (e.g. armv6), can be built on
an x86_64 host.

> For example I got build for i586 failed while x86_64 build is ok (were on different
> workers).
>
> Getting data on the worker of failed i586 build:
> osc api /worker/_status|grep build34
>   <idle workerid="build34:1" hostarch="x86_64"/>
>   <idle workerid="build34:2" hostarch="x86_64"/>
>   <idle workerid="build34:3" hostarch="x86_64"/>
>   <idle workerid="build34:5" hostarch="x86_64"/>
>   <idle workerid="build34:6" hostarch="x86_64"/>
>   <building workerid="build34:4" hostarch="x86_64" project="Kernel:linux-next"
> repository="standard" package="kernel-vanilla" arch="i586" starttime="1500631045"/>
>
> Getting sse flags:
> osc workerinfo x86_64:build34:1 |grep sse
>       <flag>sse</flag>
>       <flag>sse2</flag>
>       <flag>sse4a</flag>
>       <flag>misalignsse</flag>
>
> but
> osc workerinfo x586:build34:1 |grep sse
>
> Server returned an error: HTTP Error 404: remote error  unknown worker
> remote error: unknown worker

Right, there is no i586 worker host, only an x86_64 one, which can also run
i586 builds either by booting a 32-bit kernel or via a personality switch.

However, it is then up to the kernel and CPU to offer the optimizations in
32-bit legacy mode. I suppose that at least sse4a won't be available there.

But from the OBS point of view, all of this is a matter of the package content; we only offer the VM here.
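Since the outcome depends on how an i586 build is actually executed on the x86_64 host, it may help to record the execution mode in the build log alongside the CPU flags. A minimal sketch; the function name print_build_mode is hypothetical, and calling it early in your own build script is an assumption:

```shell
#!/bin/sh
# Log the machine name reported by the kernel and the word size of the
# userland. Under a 32-bit kernel or a personality switch (e.g. linux32 /
# setarch), "uname -m" typically reports i686 even on an x86_64 host.
print_build_mode() {
    echo "uname -m:  $(uname -m)"
    echo "LONG_BIT:  $(getconf LONG_BIT)"
}
```

Together with the worker's flag list from `osc workerinfo`, this tells you which CPU and which mode a failing i586 build actually ran with.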
 



--

Adrian Schroeter
email: [hidden email]

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
 
Maxfeldstraße 5                        
90409 Nürnberg
Germany

