watchdog, anyone?

Peter Suetterlin-2

  Hi list,

is anyone of you using a (hardware) watchdog to reset the computer if the OS gets
stuck?  If so, what HW do you use, and what SW on the Linux side?  I saw TW
(only) has freeipmi-bmc-watchdog, which seems a bit overkill(?) - the 'simple'
watchdog daemon is only available via home: repos...

I had tried the iTCO watchdog of my Skylake computer here, but while it reset
the machine, the boot would hang forever in POST.  Seems not too uncommon :(

Hints/tips highly welcome :)

--
To unsubscribe, e-mail: [hidden email]
To contact the owner, e-mail: [hidden email]


Re: watchdog, anyone?

Andrei Borzenkov
03.02.2018 17:22, Peter Suetterlin wrote:
>
>   Hi list,
>
> is anyone of you using a (hardware) watchdog to reset the computer if the OS gets
> stuck?  If so, what HW do you use, and what SW on the Linux side?  I saw TW

It's not like you really have a choice. If you mean a hardware
watchdog, it is whatever your hardware implements. Or you can use softdog.
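
To see what your hardware (or softdog) actually exposes, a rough Python
sketch like the following could help. It assumes a kernel built with
CONFIG_WATCHDOG_SYSFS, so that each registered watchdog appears under
/sys/class/watchdog with an "identity" attribute; the function name
list_watchdogs is made up for this example.

```python
from pathlib import Path


def list_watchdogs(sysfs_root="/sys/class/watchdog"):
    """Return {device name: identity string} for every registered watchdog.

    Hypothetical helper: relies on the sysfs attributes that exist only
    when the kernel was built with CONFIG_WATCHDOG_SYSFS.
    """
    found = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return found  # no watchdog class in sysfs, so nothing is registered
    for entry in sorted(root.iterdir()):
        try:
            found[entry.name] = (entry / "identity").read_text().strip()
        except OSError:
            # Attribute missing or unreadable on this kernel
            found[entry.name] = "(identity not exposed)"
    return found
```

Calling list_watchdogs() on the machine in question returns one entry per
/dev/watchdog<n>; an empty result means no watchdog driver is loaded.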

> (only) has freeipmi-bmc-watchdog, which seems a bit overkill(?) - the 'simple'
> watchdog daemon is only available via home: repos...
>

You probably misunderstand how a watchdog works. You need something to
periodically poke it. This "something" needs to know how to talk to the
watchdog. Either you have a kernel driver that implements the standardized
interface, in which case you can really use a "simple" daemon - although
then you could simply enable the watchdog in systemd, which is always
running anyway - or you need a dedicated program that knows how to access
the watchdog. bmc-watchdog is obviously useful only if your system actually
has a BMC (under whatever name) with watchdog support. Do you have one?
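
For illustration, the "poking" through the standard interface can be
sketched in a few lines of Python. This is only a minimal sketch, not the
real watchdog daemon: the name keep_alive, the 10 second interval and the
pings/sleep parameters are assumptions made for the example. It relies on
the documented behaviour of the kernel watchdog API that any write to
/dev/watchdog restarts the timer, and that writing 'V' just before closing
asks the driver to disarm it (which drivers ignore when nowayout is set).

```python
import os
import time


def keep_alive(device_path="/dev/watchdog", interval=10.0, pings=None,
               sleep=time.sleep):
    """Open the watchdog device and poke it every `interval` seconds.

    Hypothetical helper. `pings=None` means run until interrupted;
    a number limits the loop, which is mainly useful for testing.
    Returns the number of pokes sent.
    """
    # Opening the device arms the watchdog on most drivers.
    fd = os.open(device_path, os.O_WRONLY)
    sent = 0
    try:
        while pings is None or sent < pings:
            os.write(fd, b"\0")  # any write restarts the countdown
            sent += 1
            sleep(interval)      # must stay below the driver's timeout
    finally:
        # "Magic close": ask the driver to disarm on an orderly exit.
        os.write(fd, b"V")
        os.close(fd)
    return sent
```

Run as root, keep_alive() would keep the machine alive until the process
hangs or is killed, at which point the pokes stop and the hardware resets
the box. With systemd the same thing is configured via RuntimeWatchdogSec=
in /etc/systemd/system.conf, without any extra daemon.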

> I had tried the iTCO watchdog of my Skylake computer here, but while it reset
> the machine, the boot would hang forever in POST.  Seems not too uncommon :(
>

You could try playing with the turn_SMI_watchdog_clear_off parameter of
the iTCO_wdt driver.


Re: watchdog, anyone?

Peter Suetterlin-2
Andrei Borzenkov wrote:
> 03.02.2018 17:22, Peter Suetterlin wrote:
> >
> >   Hi list,
> >
> > is anyone of you using a (hardware) watchdog to reset the computer if the OS gets
> > stuck?  If so, what HW do you use, and what SW on the Linux side?  I saw TW
>
> It's not like you really have a choice. If you mean a hardware
> watchdog, it is whatever your hardware implements. Or you can use softdog.

I've seen small external counter devices that you can connect to the reset
line of the MB.  So if I understand correctly, for those I'd need my own
ping daemon, or rather a driver that exposes the device as some
/dev/watchdog<n>?
The softdog doesn't help if the system really freezes...

> > (only) has freeipmi-bmc-watchdog, which seems a bit overkill(?) - the 'simple'
> > watchdog daemon is only available via home: repos...
> >
>
> You probably misunderstand how a watchdog works. You need something to
> periodically poke it. This "something" needs to know how to talk to the
> watchdog.

Yes, sure.  But the whole IPMI stuff is rather meant for (real) server boards
with built-in hardware that does (much) more, like (HW) health monitoring etc.

> Either you have a kernel driver that implements the standardized
> interface, in which case you can really use a "simple" daemon - although
> then you could simply enable the watchdog in systemd, which is always
> running anyway - or you need a dedicated program that knows how to access
> the watchdog. bmc-watchdog is obviously useful only if your system actually
> has a BMC (under whatever name) with watchdog support. Do you have one?

Nope :(
So indeed the ipmi stuff is not for me, unless I buy appropriate hardware.
One thing answered, good!

> > I had tried the iTCO watchdog of my Skylake computer here, but while it reset
> > the machine, the boot would hang forever in POST.  Seems not too uncommon :(

> You could try playing with the turn_SMI_watchdog_clear_off parameter of
> the iTCO_wdt driver.

Aah!  Thanks! Yes, I'm going to play with that :)


Re: watchdog, anyone?

Andrei Borzenkov
03.02.2018 20:50, Peter Suetterlin wrote:
>
> I've seen small external counter devices that you can connect to the reset
> line of the MB.  So if I understand correctly, for those I'd need my own
> ping daemon, or rather a driver that exposes the device as some
> /dev/watchdog<n>?

Correct. Or a custom program that knows how to talk to the device (but you
will likely need some kernel driver anyway, in which case it would be
easier if that driver also implemented standard access via /dev/watchdog).

