Leap 42.3

Robert Schweikert-6
Hi,

Not even sure as to how to describe the issue or where to start looking.

What is happening appears to be an issue with oom or process limits, but
I have no idea how to nail this down. My work habits have not changed:
compared to when I was running Leap 42.2, I have as many browser tabs
open as before (a lot), and as many terminal and editor windows open as
well (again, a lot of those).

The symptoms I am seeing are

- systemd-coredump process running off and on and when it runs it sucks
up 100% cpu
- I get "no more process" errors when trying to open a new terminal window
- Rhythmbox appears to be the most frequent victim; it gets killed or
dies

dmesg does not have any "trace" information, but there is a message
pointing to some kind of trouble:

[178155.684433] Corrupted low memory at ffff880000004000 (4000 phys) =
002777f2

Anyway, the whole thing is annoying and I'd sure like to figure out
what's going on.

Thoughts?

Thanks,
Robert

--
Robert Schweikert                   MAY THE SOURCE BE WITH YOU
Distinguished Architect                       LINUX
Team Lead Public Cloud
[hidden email]
IRC: robjo


Re: Leap 42.3

Robert Schweikert-6
On 09/06/2017 04:49 PM, Robert Schweikert wrote:

> Hi,
>
> Not even sure as to how to describe the issue or where to start looking.
>
> What is happening appears to be an issue with oom or process limits, but
> I have no idea how to nail this down. My work habits have not changed:
> compared to when I was running Leap 42.2, I have as many browser tabs
> open as before (a lot), and as many terminal and editor windows open as
> well (again, a lot of those).
>
> The symptoms I am seeing are
>
> - systemd-coredump process running off and on and when it runs it sucks
> up 100% cpu
> - I get "no more process" errors when trying to open a new terminal window
> - Rhythmbox appears to be the most frequent victim; it gets killed or
> dies
>
> dmesg does not have any "trace" information, but there is a message
> pointing to some kind of trouble:
>
> [178155.684433] Corrupted low memory at ffff880000004000 (4000 phys) =
> 002777f2
>
> Anyway, the whole thing is annoying and I'd sure like to figure out
> what's going on.
>
And to follow up with a bit more information, I find lots of these in
the system log:

Sep 06 16:30:01 mountain systemd[1]: Stopping User Manager for UID 0...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Default.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Basic System.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Sockets.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Timers.
Sep 06 16:30:01 mountain systemd[11849]: Reached target Shutdown.
Sep 06 16:30:01 mountain systemd[11849]: Starting Exit the Session...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Paths.
Sep 06 16:30:01 mountain systemd[11849]: Received SIGRTMIN+24 from PID
11897 (kill).
Sep 06 16:30:01 mountain systemd[11852]: pam_unix(systemd-user:session):
session closed for user root

But well I am not logged in as root, i.e. I am not constantly logging in
and out. I do have a terminal window open where I am root, but that's
it. Or maybe I get one of these blocks every time I run osc build?

Help is appreciated,
Robert

--
Robert Schweikert                   MAY THE SOURCE BE WITH YOU
Distinguished Architect                       LINUX
Team Lead Public Cloud
[hidden email]
IRC: robjo


Re: Leap 42.3

Robert Schweikert-6
On 09/06/2017 05:51 PM, Tyler McClanahan wrote:
> Robert,
>
> You could be running into this bug here:
>
> https://bugzilla.opensuse.org/show_bug.cgi?id=1017652
>
> If you are running GNOME,

Running XFCE

> I have this same issue. The indexer hangs on certain files. Do you see anything in your message logs that shows
> "tracker" errors?

Nothing with tracker:

journalctl --all | grep tracker

Anyway, I killed the running tracker-* processes and they do not appear to
return; let's see if anything changes.

Thanks,
Robert

--
Robert Schweikert                   MAY THE SOURCE BE WITH YOU
Distinguished Architect                       LINUX
Team Lead Public Cloud
[hidden email]
IRC: robjo


Re: Leap 42.3

Carlos E. R.-2
In reply to this post by Robert Schweikert-6
On 2017-09-06 23:13, Robert Schweikert wrote:
> On 09/06/2017 04:49 PM, Robert Schweikert wrote:


>> The symptoms I am seeing are
>>
>> - systemd-coredump process running off and on and when it runs it sucks
>> up 100% cpu

This means that some other process has crashed, and systemd-coredump is
collecting and compacting its garbage. You have to find out what that
other process is.

Run "coredumpctl" and it will tell you the list.

In "journalctl" you will find more information around the times it lists.
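
As a rough sketch of that lookup: both tools ship with systemd, "rhythmbox" is only an example match taken from the symptoms quoted in this thread, and the time window is a made-up example. The output depends entirely on the machine.

```shell
# Sketch only: requires systemd-coredump to be in use on this machine.
if command -v coredumpctl >/dev/null 2>&1; then
    have_coredumpctl=yes
    # One line per recorded crash: time, PID, UID, signal, executable.
    coredumpctl --no-pager list || echo "no core dumps recorded"
    # Details for crashes of one program, matched by command name.
    coredumpctl --no-pager info rhythmbox || true
else
    have_coredumpctl=no
    echo "coredumpctl is not installed here"
fi

# Journal entries around a crash time (example window, adjust to the list above).
if command -v journalctl >/dev/null 2>&1; then
    journalctl --no-pager --since "2017-09-06 16:25" --until "2017-09-06 16:35" || true
fi
```

The executable column of the list is usually enough to tell which program keeps crashing.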


>> - I get "no more process" errors when trying to open a new terminal window
>> - Rhythmbox appears to be the most frequent victim; it gets killed or
>> dies
>>
>> dmesg does not have any "trace" information, but there is a message
>> pointing to some kind of trouble:
>>
>> [178155.684433] Corrupted low memory at ffff880000004000 (4000 phys) = 002777f2

Probably unrelated.


>> Anyway, the whole thing is annoying and I'd sure like to figure out
>> what's going on.
>>
>
> And to follow up with a bit more information, I find lots of these in
> the system log:
>
> Sep 06 16:30:01 mountain systemd[1]: Stopping User Manager for UID 0...
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Default.
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Basic System.
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Sockets.
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Timers.
> Sep 06 16:30:01 mountain systemd[11849]: Reached target Shutdown.
> Sep 06 16:30:01 mountain systemd[11849]: Starting Exit the Session...
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Paths.
> Sep 06 16:30:01 mountain systemd[11849]: Received SIGRTMIN+24 from PID 11897 (kill).
> Sep 06 16:30:01 mountain systemd[11852]: pam_unix(systemd-user:session): session closed for user root
Irrelevant.

--
Cheers / Saludos,

                Carlos E. R.

  (from 42.2 x86_64 "Malachite" (Minas Tirith))


Re: Leap 42.3

Andrei Borzenkov
In reply to this post by Robert Schweikert-6
On 07.09.2017 00:13, Robert Schweikert wrote:
...

>
> And to follow up with a bit more information, I find lots of these in
> the system log:
>
> Sep 06 16:30:01 mountain systemd[1]: Stopping User Manager for UID 0...
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Default.
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Basic System.
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Sockets.
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Timers.
> Sep 06 16:30:01 mountain systemd[11849]: Reached target Shutdown.
> Sep 06 16:30:01 mountain systemd[11849]: Starting Exit the Session...
> Sep 06 16:30:01 mountain systemd[11849]: Stopped target Paths.
> Sep 06 16:30:01 mountain systemd[11849]: Received SIGRTMIN+24 from PID
> 11897 (kill).
> Sep 06 16:30:01 mountain systemd[11852]: pam_unix(systemd-user:session):
> session closed for user root
>
> But well I am not logged in as root, i.e. I am not constantly logging in
> and out.
Cron, for example, does. Check the logs just before this to see what opens the session.


Re: Leap 42.3

Johannes Weberhofer-2
In reply to this post by Robert Schweikert-6
On 06.09.2017 at 22:49, Robert Schweikert wrote:
> - systemd-coredump process running off and on and when it runs it sucks
> up 100% cpu

We see such errors when Chromium in development mode crashes.

--
Johannes Weberhofer
Weberhofer GmbH, Austria, Vienna

--
To unsubscribe, e-mail: [hidden email]
To contact the owner, e-mail: [hidden email]

Re: Leap 42.3

Carlos E. R.-2
On 2017-09-08 12:44, Johannes Weberhofer wrote:
> On 06.09.2017 at 22:49, Robert Schweikert wrote:
>> - systemd-coredump process running off and on and when it runs it sucks
>> up 100% cpu
>
> We see such errors when Chromium in development mode crashes.

The log should say what it is.

systemd-coredump uses a lot of CPU because it compresses the core
images, and they are always large, perhaps huge files (gigabytes in a
case of mine).

You can configure it in "/etc/systemd/coredump.conf"

Ah, and "ulimit -c" doesn't help here.


I think systemd-coredump should be improved: initially dump without
compression. Later, after dumping, compress just one file at a time, in
background. If a process dumps repeatedly, because it is started again
and again, don't dump.


--
Cheers / Saludos,

                Carlos E. R.
                (from 42.2 x86_64 "Malachite" at Telcontar)


Re: Leap 42.3

John Andersen-2
On 09/08/2017 07:03 AM, Carlos E. R. wrote:
> I think systemd-coredump should be improved: initially dump without
> compression. Later, after dumping, compress just one file at a time, in
> background. If a process dumps repeatedly, because it is started again
> and again, don't dump.

Isn't that what you get when you set in /etc/systemd/coredump.conf
that storage is external and compression is off?

These will still be cleaned by systemd-tmpfiles wouldn't they?

--
After all is said and done, more is said than done.


Re: Leap 42.3

Carlos E. R.-2
On 2017-09-08 20:22, John Andersen wrote:
> On 09/08/2017 07:03 AM, Carlos E. R. wrote:
>> I think systemd-coredump should be improved: initially dump without
>> compression. Later, after dumping, compress just one file at a time, in
>> background. If a process dumps repeatedly, because it is started again
>> and again, don't dump.
>
> Isn't that what you get when you set in /etc/systemd/coredump.conf
> that storage is external and compression is off?

I don't know what "external" means, but I guess it means "inside the log
or outside" (man confirms). Compression can be set to off, yes. And they
are deleted in a week by default.

But there is no adjustment possible when something dies and restarts
repeatedly: that keeps all CPU cores busy.

Also, there is no choice of compressor or options, like use the fastest
method, nor of a filter to not collect the coredump of some processes.

I have a process that sometimes crashes and dumps core, sometimes a huge
one, and the CPU is busy for many minutes; during this time I cannot start
it again, and I need the process.

The most I can do is disable compression.
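
For reference, a minimal sketch of the settings discussed here, assuming the stock option names from coredump.conf(5). The scratch file is only a way to review the fragment before merging it into /etc/systemd/coredump.conf as root; the variable name conf_sketch is made up for this example.

```shell
# Write the proposed settings to a scratch file; nothing under /etc is touched.
conf_sketch="$(mktemp)"
cat > "$conf_sketch" <<'EOF'
[Coredump]
# Keep dumps as files on disk rather than inside the journal.
Storage=external
# Skip compression, which is what keeps the CPU busy on large dumps.
Compress=no
EOF

# Show the fragment for review.
cat "$conf_sketch"
```

According to the same man page, Storage=none logs the crash without storing the core at all, which may be an option when the dumps themselves are not needed.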

> These will still be cleaned by systemd-tmpfiles wouldn't they?

Yes.

--
Cheers / Saludos,

                Carlos E. R.
                (from 42.2 x86_64 "Malachite" at Telcontar)


Re: Leap 42.3

Robert Schweikert-6
In reply to this post by Johannes Weberhofer-2
On 09/08/2017 06:44 AM, Johannes Weberhofer wrote:
> On 06.09.2017 at 22:49, Robert Schweikert wrote:
>> - systemd-coredump process running off and on and when it runs it sucks
>> up 100% cpu
>
> We see such errors when Chromium in development mode crashes.
>


Yeah I think Chrome is to blame here :(

--
Robert Schweikert                   MAY THE SOURCE BE WITH YOU
Distinguished Architect                       LINUX
Team Lead Public Cloud
[hidden email]
IRC: robjo


Re: Leap 42.3

Aaron Digulla
On 11.09.2017 at 22:26, Robert Schweikert wrote:
> On 09/08/2017 06:44 AM, Johannes Weberhofer wrote:
>> On 06.09.2017 at 22:49, Robert Schweikert wrote:
>>> - systemd-coredump process running off and on and when it runs it sucks
>>> up 100% cpu
>> We see such errors when Chromium in development mode crashes.
>>
>
> Yeah I think Chrome is to blame here :(
>

Another possible reason is that you don't have enough processes. Chrome
renders each tab in a different process. That means chrome needs a ton
of entries in the process table. To see how many you have, use:

ulimit -a|grep proc

The command to check how many processes you use is:

ps -fTu $USER | wc -l

If the numbers are close, opening new tabs or terminals can fail because
these operations create many new processes (BASH will need them to
process the information in the start up scripts).

To fix this, add these two lines to /etc/security/limits.conf:

*               hard    nproc           1700
*               soft    nproc           1200
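
A small sketch for checking what is actually in effect, assuming bash. pam_limits applies limits.conf when a session starts, so a changed value only shows up after logging in again. The variable names soft and hard are only for illustration.

```shell
# Limits as seen by the current shell; "unknown" if the shell lacks "ulimit -u".
soft="$(ulimit -Su 2>/dev/null || echo unknown)"
hard="$(ulimit -Hu 2>/dev/null || echo unknown)"
echo "nproc soft=$soft hard=$hard"

# Active (non-comment) nproc entries from the main file and the drop-in directory.
grep -hs 'nproc' /etc/security/limits.conf /etc/security/limits.d/*.conf \
    | grep -v '^[[:space:]]*#' || true
```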

Regards,

--
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://blog.pdark.de/


Re: Leap 42.3

Robert Schweikert-6
On 09/13/2017 02:46 PM, Aaron Digulla wrote:

> On 11.09.2017 at 22:26, Robert Schweikert wrote:
>> On 09/08/2017 06:44 AM, Johannes Weberhofer wrote:
>>> On 06.09.2017 at 22:49, Robert Schweikert wrote:
>>>> - systemd-coredump process running off and on and when it runs it sucks
>>>> up 100% cpu
>>> We see such errors when Chromium in development mode crashes.
>>>
>>
>> Yeah I think Chrome is to blame here :(
>>
>
> Another possible reason is that you don't have enough processes. Chrome
> renders each tab in a different process. That means chrome needs a ton
> of entries in the process table. To see how many you have, use:
>
> ulimit -a|grep proc
~> ulimit -a|grep proc
max user processes              (-u) 1200

>
> The command to check how many processes you use is:
>
> ps -fTu $USER | wc -l

1056

Probably close enough to the limit to cause trouble when opening a few
more tabs and windows.....
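
To put a number on "close enough", here is a hedged sketch that turns the two figures into a percentage. The helper usage_percent is a made-up name for this example; the live part assumes bash and the procps version of ps.

```shell
# usage_percent USED LIMIT -> integer percentage of the limit already in use
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Figures reported above: 1056 entries against a soft limit of 1200.
usage_percent 1056 1200    # prints 88

# Live values for the current user: one line per thread, no header.
limit="$(ulimit -Su 2>/dev/null || true)"
used="$(ps -L -u "$(id -u)" -o pid= 2>/dev/null | wc -l)"
case "$limit" in
    ''|*[!0-9]*) echo "no numeric nproc limit (got: ${limit:-nothing})" ;;
    *) echo "using $used of $limit ($(usage_percent "$used" "$limit")%)" ;;
esac
```

Unlike "ps -fTu $USER | wc -l", this count leaves out the header line, so it comes out one lower.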

>
> If the numbers are close, opening new tabs or terminals can fail because
> these operations create many new processes (BASH will need them to
> process the information in the start up scripts).
>
> To fix this, add these two lines to /etc/security/limits.conf:
>
> *               hard    nproc           1700
> *               soft    nproc           1200

Ahh configuration knobs, cranked up a bit.

Thanks,
Robert

--
Robert Schweikert                   MAY THE SOURCE BE WITH YOU
Distinguished Architect                       LINUX
Team Lead Public Cloud
[hidden email]
IRC: robjo


Re: Leap 42.3

Anton Aylward-2
In reply to this post by Aaron Digulla
On 13/09/17 02:46 PM, Aaron Digulla wrote:
> Another possible reason is that you don't have enough processes. Chrome
> renders each tab in a different process. That means chrome needs a ton
> of entries in the process table.

Just out of interest, what is the algorithm  for dealing with the proc table?
I gather from what you write that it is a (somewhat) static array as opposed to
a dynamically created tree?

What is the search and/or insert or compression algorithm?  Is there some hash
which might also be expanded for faster lookup in the nearly full situation?



--
         A: Yes.
     >   Q: Are you sure?
     >>  A: Because it reverses the logical flow of conversation.
     >>> Q: Why is top posting frowned upon?


Re: Leap 42.3

Aaron Digulla
On 13.09.2017 at 21:29, Anton Aylward wrote:

> On 13/09/17 02:46 PM, Aaron Digulla wrote:
>> Another possible reason is that you don't have enough processes. Chrome
>> renders each tab in a different process. That means chrome needs a ton
>> of entries in the process table.
> Just out of interest, what is the algorithm  for dealing with the proc table?
> I gather from what you write that it is a (somewhat) static array as opposed to
> a dynamically created tree?
>
> What is the search and/or insert or compression algorithm?  Is there some hash
> which might also be expanded for faster lookup in the nearly full situation?

The process list is already dynamic.  The limit is a security feature:
https://en.wikipedia.org/wiki/Fork_bomb

In a nutshell: This is to prevent your computer from locking up because
someone made a mistake (program endlessly creates processes in a loop)
or a denial of service attack (creating processes to bring the
performance to a crawl).

Now, this is 2017 and people are starting to use all those nice CPU
cores so the "1000 processes per user should be enough for anyone" is no
longer true.

On my computer, Chrome needs 400 entries (each thread counts as one
process), Thunderbird 50, Firefox 45. With the Version 55 of Firefox,
the situation will get worse.

Maybe openSUSE should set the default to 2000?

Regards,

--
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://blog.pdark.de/


Re: Leap 42.3

Patrick Shanahan-2
* Aaron Digulla <[hidden email]> [09-13-17 17:11]:

> On 13.09.2017 at 21:29, Anton Aylward wrote:
> > On 13/09/17 02:46 PM, Aaron Digulla wrote:
> >> Another possible reason is that you don't have enough processes. Chrome
> >> renders each tab in a different process. That means chrome needs a ton
> >> of entries in the process table.
> > Just out of interest, what is the algorithm  for dealing with the proc table?
> > I gather from what you write that it is a (somewhat) static array as opposed to
> > a dynamically created tree?
> >
> > What is the search and/or insert or compression algorithm?  Is there some hash
> > which might also be expanded for faster lookup in the nearly full situation?
>
> The process list is already dynamic.  It's a security feature:
> https://en.wikipedia.org/wiki/Fork_bomb
>
> In a nutshell: This is to prevent your computer from locking up because
> someone made a mistake (program endlessly creates processes in a loop)
> or a denial of service attack (creating processes to bring the
> performance to a crawl).
>
> Now, this is 2017 and people are starting to use all those nice CPU
> cores so the "1000 processes per user should be enough for anyone" is no
> longer true.
>
> On my computer, Chrome needs 400 entries (each thread counts as one
> process), Thunderbird 50, Firefox 45. With the Version 55 of Firefox,
> the situation will get worse.
>
> Maybe openSUSE should set the default to 2000?

My Tumbleweed systems are set to 4096, and I didn't change them :)
--
(paka)Patrick Shanahan       Plainfield, Indiana, USA          @ptilopteri
http://en.opensuse.org    openSUSE Community Member    facebook/ptilopteri
Registered Linux User #207535                    @ http://linuxcounter.net
Photos: http://wahoo.no-ip.org/piwigo                    paka @ IRCnet freenode

Re: Leap 42.3

Anton Aylward-2
In reply to this post by Aaron Digulla
On 13/09/17 05:10 PM, Aaron Digulla wrote:

>>
>> What is the search and/or insert or compression algorithm?  Is there some hash
>> which might also be expanded for faster lookup in the nearly full situation?
> The process list is already dynamic.  It's a security feature:
> https://en.wikipedia.org/wiki/Fork_bomb
>
> In a nutshell: This is to prevent your computer from locking up because
> someone made a mistake (program endlessly creates processes in a loop)
> or a denial of service attack (creating processes to bring the
> performance to a crawl).

Yes, I'm quite aware of a fork bomb; it is not something new.
Applying a per-user process limit as opposed to merely the system wide process
limit is likely adequate.

Having, for each user on a heavily multi-user system, Patrick's 4K per-user
setting nearly filled by Chrome -- or are we really talking about THREADS rather
than complete processes?[1] -- leads to a pretty big main proc table.  Or are
they indexed on a per-user basis as well?

The research I can find googling around on kernel hashing seems a bit out of
date and general.  I've seen it mentioned that it is

- a linear table
- a linked list
- a hash-indexed "table" but format of the "table" unspecified.

Please note, I'm not commenting on the data structure itself, only on its access
methods, creation, destruction.


One thing I do realise: with a VM kernel, tables can be resized.  Grab a new
page set, copy into the larger space giving the table a new upper limit, reset
pointers, release old page set.

Whether you SHOULD is quite another matter.  The circumstances that force you to
do this might be a problem that is in need of a solution first and foremost.




[1] I understand that for Chrome it is actually processes, but other
applications seem to spawn threads, which look remarkably like processes to some
process-listing tools.  I run htop or "ps -eLf" and find firefox has 54 threads,
thunderbird has 91.
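
A hedged sketch for separating the two counts: it sums threads per command name, so it shows how many entries each application contributes. The helper threads_by_command is a made-up name; "nlwp" is the procps ps column for the number of threads in a process.

```shell
# Sum thread counts per command name from "NLWP COMMAND" lines on stdin,
# largest first. The command name may contain spaces.
threads_by_command() {
    awk '{ c = $0; sub(/^[ \t]*[0-9]+[ \t]+/, "", c); n[c] += $1 }
         END { for (c in n) print n[c], c }' | sort -rn
}

# Live data, top ten commands by total thread count (procps ps assumed).
if command -v ps >/dev/null 2>&1; then
    ps -e -o nlwp=,comm= 2>/dev/null | threads_by_command | head -n 10
fi
```

A command that shows a large total from a single process is using threads; one that reaches it through many processes will also appear many times in plain "ps -e" output.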

--
         A: Yes.
     >   Q: Are you sure?
     >>  A: Because it reverses the logical flow of conversation.
     >>> Q: Why is top posting frowned upon?


Re: Leap 42.3

Johannes Weberhofer-2
In reply to this post by Aaron Digulla
On 13.09.2017 at 20:46, Aaron Digulla wrote:
> Another possible reason is that you don't have enough processes. Chrome
> renders each tab in a different process. That means chrome needs a ton
> of entries in the process table. To see how many you have, use:
  ...
> To fix this, add these two lines to /etc/security/limits.conf:
>
> *               hard    nproc           1700
> *               soft    nproc           1200

Thanks for all your responses! Today I got another core-dump:

Sep 18 11:54:51 c-web1 kernel: mmap: chromium (8060): VmData 2147631104 exceed data ulimit 2147483647. Update limits or use boot option ignore_rlimit_data.
Sep 18 11:54:51 c-web1 kernel: do_trap: 117 callbacks suppressed
Sep 18 11:54:51 c-web1 kernel: traps: chromium[8060] trap int3 ip:5630a8cc7d7e sp:7ffe4b3e3610 error:0
Sep 18 11:54:52 c-web1 systemd-coredump[14985]: Process 8060 (chromium) of user 1027 dumped core.


In my Leap 42.3 installation the values you suggested for /etc/security/limits.conf were already set. I have now increased them to 2200/2000; I wonder whether that PC will now stop core-dumping.
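
Note that the kernel line in this log names the data ulimit (the data-segment size, "ulimit -d" in bash) rather than the process count, so the nproc change may not be the knob that matters for this particular crash. A quick sketch of the arithmetic and of how to look at the current value; the two numbers are copied from the log, and the variable names are only for illustration.

```shell
# Values taken from the kernel message above (bytes).
vmdata=2147631104
data_limit=2147483647
echo "chromium asked for $((vmdata - data_limit)) bytes more than the limit"

# Soft data-segment limit of the current shell; bash prints KiB or "unlimited".
ulimit -Sd 2>/dev/null || echo "ulimit -d is not supported by this shell"
```

limits.conf has a "data" item (in KB) next to "nproc", which is presumably what the kernel's "Update limits" hint refers to.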

Best regards
--
Johannes Weberhofer
Weberhofer GmbH, Austria, Vienna
