lockups with gcc 4.1.0?


stephan beal
Hiya!

A quick question to the software developers out there:

Since i've been using Suse 10.1, i haven't been able to keep an uptime
of more than 24 hours. i continually experience lockups while compiling
my software (gcc 4.1.0, which comes with Suse). i just rebooted for the
3rd time in the past 4 hours and it's *getting on my nerves*.

Have any other developers been experiencing lockups or mysterious
compiler segfaults/internal compiler errors when using gcc under 10.1?

*aaaarrrrgggghhhh*

stephan@owl:~/cvs/s11n.net/SpiderApe/src/ape> uname -a
Linux owl 2.6.16.13-4-default #1 Wed May 3 04:53:23 UTC 2006 i686 athlon i386 GNU/Linux

The lockups have caused corruption on one of my drives: i've got files
which i can neither see nor delete, and i can't delete their containing
directories:

root@owl:/home/stephan/cvs/s11n.net/SpiderApe/src # rm -fr js
rm: cannot lstat `js/jsxdrapi.c': Permission denied

*AAAARRRRGGGGHHHH*

And now i can't even 'cd' to some directories. Ah, Christ, i'm
screwed...

--
----- [hidden email]   http://s11n.net
"...pleasure is a grace and is not obedient to the commands
of the will." -- Alan W. Watts

Re: lockups with gcc 4.1.0?

Carl Hartung
On Thursday 10 August 2006 22:01, stephan beal wrote:
> And now i can't even 'cd' to some directories. Ah, Christ, i'm
> screwed...

Hi Stephan,

You probably know how to tackle the consequences of the lockups using the
rescue system but, if not, feel free to ask (please describe your hardware,
too.)

I'm curious to know whether the machine has experienced any similar lockups
while running other memory- and/or disk-I/O-intensive applications. Examples
that come to mind are AV recording/editing, multimedia production, or CAD.
Have you otherwise stress-tested and ruled out marginal hardware?

When you installed 10.1, did you do so on one or more *clean* partitions (i.e.
all contents pre-erased with the installer allowed to format it/them before
installing?) Was this a 'fresh' installation or did you upgrade a previous
version?

regards,

Carl

--
Check the headers for your unsubscription address
For additional commands send e-mail to [hidden email]
Also check the archives at http://lists.suse.com
Please read the FAQs: [hidden email]


Re: lockups with gcc 4.1.0?

stephan beal
On Friday 11 August 2006 04:26, Carl Hartung wrote:
> On Thursday 10 August 2006 22:01, stephan beal wrote:
> > And now i can't even 'cd' to some directories. Ah, Christ, i'm
> > screwed...
>
> Hi Stephan,
>
> You probably know how to tackle the consequences of the lockups using
> the rescue system but, if not, feel free to ask (please describe your
> hardware, too.)

i just got back from doing the dreaded:

fsck.reiserfs --rebuild-tree /dev/hda3

It mostly worked. i had to run with --fix-fixable a couple more times to
correct my problems.

It would *appear* that my lockups were related to corruption on my /home
partition (reiserfs). i'm considering switching it to XFS (because
reiser has done this to me once before), but in retrospect, reiserfs's
utilities have always recovered fairly gracefully from filesystem
corruption, and i don't yet have enough experience with XFS to know if
i would be so lucky with it.

> I'm curious to know if it's experienced any similar lockups while
> running other memory and/or disk IO intensive applications?

So far, no. Last week it was crashing when i was running gcc (about
5-10% of the time it would crash), but always immediately before it
crashed i got a cryptic NVIDIA error message in the syslog with the
pointer addresses. i replaced the NVIDIA X driver with the standard
(non-accelerated) driver and assumed the problem was gone. (This was
actually better for me, anyway, as it keeps me from wasting all my time
playing games like Tremulous.) Tonight, though, it was crash after
crash.

> production or CAD? Have you otherwise stress-tested and ruled out
> marginal hardware?

i have considered it, but haven't done it. In my experience, gcc is
about as good a stress test as any. In one case i discovered i had a
bad RAM chip because gcc kept failing with odd assembly-level errors,
even when memcheck didn't pick up the problem.

> When you installed 10.1, did you do so on one or more *clean*
> partitions (i.e. all contents pre-erased with the installer allowed
> to format it/them before installing?) Was this a 'fresh' installation
> or did you upgrade a previous version?

i *always* do a fresh install, because i don't trust any OS upgrade
process (not biased against Suse, just against upgrades in general).
However, my /home partition was of course *not* reformatted, and if it
was corrupted before, then of course the 10.1 install would have
inherited that corruption.

For now i'm going to assume that the reiserfs corruption was the problem
and hope/pray that it doesn't happen again. i was, luckily enough, able
to tar up my /home directories, rescuing my evening's worth of C++
code. The --rebuild-tree process only deleted 8 files, none of which i
use, and most of which were old HTML and PDF files generated by Lyx
more than 2 years ago.

Thanks a lot for your feedback!

--
----- [hidden email]   http://s11n.net
"...pleasure is a grace and is not obedient to the commands
of the will." -- Alan W. Watts

Re: lockups with gcc 4.1.0?

Carl Hartung
On Thursday 10 August 2006 23:45, stephan beal wrote:
> i *always* do a fresh install, because i don't trust any OS upgrade
> process (not biased against Suse, just against upgrades in general).

That's called 'experienced user' syndrome ;-)

> However, my /home partition was of course *not* reformatted, and if it
> was corrupted before, then of course the 10.1 install would have
> inherited that corruption.

This would be my guess, too.

> For now i'm going to assume that the reiserfs corruption was the problem
> and hope/pray that it doesn't happen again. i was, luckily enough, able
> to tar up my /home directories, rescuing my evening's worth of C++
> code. The --rebuild-tree process only deleted 8 files, none of which i
> use, and most of which were old HTML and PDF files generated by Lyx
> more than 2 years ago.
>
> Thanks a lot for your feedback!

Glad you were able to 'land' this one gracefully.

Carl

Re: lockups with gcc 4.1.0?

M Harris-2
In reply to this post by stephan beal
On Thursday 10 August 2006 22:45, stephan beal wrote:
> It would *appear* that my lockups were related to corruption on my /home
> partition (reiserfs). i'm considering switching it to XFS (because
> reiser has done this to me once before), but in retrospect, reiserfs's
> utilities have always recovered fairly gracefully from filesystem
> corruption, and i don't yet have enough experience with XFS to know if
> i would be so lucky with it.
     Stick with ReiserFS.  Your own recent recovery is the reason why.

     I do think you have the diagnosis backwards, however. I believe your
lockups are causing the drive corruption and the lockups are hardware
related--- very slim chance this would be a gcc problem... in fact, I would
say no way.

     Set your machine up for a memory check... let it cycle several times
(couple of hours) and see what turns up.
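
A memory check like the one suggested here is normally run from a boot-time tester such as memtest86, outside the installed system. Purely to illustrate the principle, here is a minimal user-space sketch in Python. It is not from the thread: the function name `pattern_check`, the chosen bit patterns, and the buffer size are all assumptions. It fills a buffer with fixed bit patterns and counts the bytes that do not read back as written. A user-space process only touches the pages the kernel happens to hand it, so a clean result here says very little, and this is no substitute for a real memory tester.

```python
def pattern_check(size_bytes, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each byte pattern across a buffer and read it back.

    Returns a list of (pattern, mismatched_byte_count) tuples; an empty
    list means every byte read back as written. Names and patterns are
    illustrative assumptions, not part of any real memory tester.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        # Fill the whole buffer with the current pattern.
        buf[:] = bytes([pattern]) * size_bytes
        # Count how many bytes still hold the pattern after the write.
        bad = size_bytes - buf.count(bytes([pattern]))
        if bad:
            failures.append((pattern, bad))
    return failures


if __name__ == "__main__":
    # Several passes over a 64 MiB buffer; intermittent faults may only
    # show up after many cycles, if at all, in a test like this.
    for cycle in range(3):
        result = pattern_check(64 * 1024 * 1024)
        print("cycle", cycle, "failures:", result)
```

On healthy hardware each cycle should report an empty failure list. Any non-empty list would be a reason to run a proper boot-time memory test, not a diagnosis in itself.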




--
Kind regards,

M Harris     <><

Re: lockups with gcc 4.1.0?

stephan beal
In reply to this post by stephan beal
On Friday 11 August 2006 05:45, stephan beal wrote:
> For now i'm going to assume that the reiserfs corruption was the
> problem and hope/pray that it doesn't happen again.

And not 2 minutes after i sent that, the filesystem went haywire again.
i've moved to a new partition (XFS), restored from a backup, and am
praying yet again...

--
----- [hidden email]   http://s11n.net
"...pleasure is a grace and is not obedient to the commands
of the will." -- Alan W. Watts

Re: lockups with gcc 4.1.0?

stephan beal
In reply to this post by M Harris-2
On Friday 11 August 2006 06:26, M Harris wrote:
>      I do think you have the diagnosis backwards, however. I believe
> your lockups are causing the drive corruption and the lockups are
> hardware related--- very slim chance this would be a gcc problem...
> in fact, I would say no way.

i tend to agree - i think gcc was somehow triggering/demonstrating the
problem. The filesystem on that partition is well over 2 years old,
without a reformat in that time, so maybe it's just got a lot of cruft
in the filesystem internals (i spend most of my time
programming/compiling, which generates tons of files). The rest of the
partitions on that drive aren't demonstrating any problems (so far).


>      Set your machine up for a memory check... let it cycle several
> times (couple of hours) and see what turns up.

That's the next course of action. i've had this box 2.5 years, so i
don't have any reason to believe my RAM is bad. In my experience, if HW
is going to die, it does so when it's very young or very old.

--
----- [hidden email]   http://s11n.net
"...pleasure is a grace and is not obedient to the commands
of the will." -- Alan W. Watts

Re: lockups with gcc 4.1.0?

M Harris-2
On Thursday 10 August 2006 23:53, stephan beal wrote:
> In my experience, if HW
> is going to die, it does so when it's very young or very old.
        Yup. ... if it is going to die it will usually die in the first 30 hours of
use... if it lasts 30 hours it will last several years...

     You might want to simply reseat your connectors... use an esd strap...
and reseat the memory dimms, hd cable plugs, and power supply plugs...
sometimes flaky hardware issues crop up in connectors, esp if the environment
is smoky or humid.


--
Kind regards,

M Harris     <><

Re: lockups with gcc 4.1.0?

stephan beal
In reply to this post by stephan beal
On Friday 11 August 2006 06:53, stephan beal wrote:
> On Friday 11 August 2006 06:26, M Harris wrote:
> >      Set your machine up for a memory check... let it cycle several
> > times (couple of hours) and see what turns up.
>
> That's the next course of action. i've had this box 2.5 years, so i
> don't have any reason to believe my RAM is bad. In my experience, if
> HW is going to die, it does so when it's very young or very old.

It was indeed a bad RAM chip. i believe i've got the bad guy singled
out, and am now running gcc to ma... <dropped carrier>

just kidding... as soon as i safely close kmail i'll be running gcc to
make sure i got the right chip (i don't want to wait another 7 hours on
the test program when gcc can find it in 4 seconds). It's a shame to
lose 512MB, but the machine is still usable.

Once again, gcc turns out to be the best app for finding bad memory.
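
The idea of using a compiler as a quick memory test can be made repeatable: run the same deterministic command several times and compare the results. The sketch below is an illustration under stated assumptions, not something posted in the thread. The helper names `run_digest` and `repeat_check` are hypothetical, and `test.c` in the usage comment is a placeholder for any source file. A run that crashes or produces different output from the first run hints at flaky hardware, although it cannot say which component is at fault.

```python
import hashlib
import subprocess
import sys


def run_digest(cmd):
    """Run cmd and return (exit status, SHA-256 of its standard output)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return proc.returncode, hashlib.sha256(proc.stdout).hexdigest()


def repeat_check(cmd, runs=5):
    """Run cmd `runs` times; return the indices of runs that differ from run 0.

    An empty list means every run exited with the same status and
    produced byte-identical output.
    """
    baseline = run_digest(cmd)
    return [i for i in range(1, runs) if run_digest(cmd) != baseline]


if __name__ == "__main__":
    # Hypothetical usage with a compiler (assumes gcc is installed and a
    # file named test.c exists in the current directory):
    #   repeat_check(["gcc", "-O2", "-S", "-o", "-", "test.c"], runs=20)
    # The stand-in below only needs the Python interpreter itself.
    odd_runs = repeat_check([sys.executable, "-c", "print(sum(range(100000)))"])
    print("runs that differed from the first:", odd_runs)
```

Comparing against the first run assumes that run was itself correct. If every run fails in the same way, this check will not notice.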

Thanks again for your feedback!

--
----- [hidden email]   http://s11n.net
"...pleasure is a grace and is not obedient to the commands
of the will." -- Alan W. Watts

Re: lockups with gcc 4.1.0?

stephan beal
In reply to this post by M Harris-2
On Friday 11 August 2006 07:21, M Harris wrote:
>      You might want to simply reseat your connectors... use an esd
> strap... and reseat the memory dimms, hd cable plugs, and power
> supply plugs... sometimes flaky hardware issues crop up in
> connectors, esp if the environment is smoky or humid.

That's a good idea. The suspect RAM chip was indeed very dusty and had
some "gook" on one of the connectors, so i'm going to give it another
try and let memcheck run on it over the weekend (while i'm out of town).

Thank goodness i had an old Suse 8 DVD with a memcheck boot option...


--
----- [hidden email]   http://s11n.net
"...pleasure is a grace and is not obedient to the commands
of the will." -- Alan W. Watts

Re: lockups with gcc 4.1.0?

Jan Engelhardt
>>      You might want to simply reseat your connectors... use an esd
>> strap... and reseat the memory dimms, hd cable plugs, and power
>> supply plugs... sometimes flaky hardware issues crop up in
>> connectors, esp if the environment is smoky or humid.
>
>That's a good idea. The suspect RAM chip was indeed very dusty and had
>some "gook" on one of the connectors, so i'm going to give it another
>try and let memcheck run on it over the weekend (while i'm out of town).

Hm, I think memtest86 should again be installed as an additional entry in
the bootloader, as it was before, IIRC.

>Thank goodness i had an old Suse 8 DVD with a memcheck boot option...


Jan Engelhardt