System stops responding

Bug #518427 reported by Hal
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

Presumably this is a kernel problem, but it has some curious aspects
to it and is being reported here for lack of a better place to turn.

lsb release: Ubuntu 8.04.4 LTS
uname -a: Linux www 2.6.24-26-server #1 SMP Tue Dec 1 19:19:20 UTC 2009 i686
GNU/Linux
Summary: server crash
Current Hardware: Dell PowerEdge 2650
Profile: Web server with vhosted clients, and basic LAMP functionality.
Typical load: less than .20, rarely above .50 (currently .03)

Symptom summary: System fails to fully respond. System is "running" and
answers pings quite normally, but ALL services fail to respond (apache, sshd,
etc.), requiring a reboot to restore "normal" functionality.

Related log data: Nothing, that I could find.

I've run into a troubling situation that has followed me from one hardware
profile to something radically different, with the same nasty results. As
mentioned above this system supports several client web sites. Its main
purpose is Apache/php. Mysql is running on a separate system. ftp is installed
but firewalled and really not used. Mail is only there to relay out mail from
the vhosted web clients. No incoming mail.

What is most troubling is that 2 months ago we moved everything from a
completely different 8.04 system (an IBM x330 server) because of the same
problem, i.e. the system dies mysteriously with no log data, pings normally, nmap
shows all services running, but none of those services respond fully. I had
assumed we had some obscure hardware related problem, and moved all the
clients over to the current system. But something else is going on since the
problem has followed me to the current system, which would rule out faulty
hardware, I would think.

The best I can get from the logs is that the last Apache request was served at
16:40 (looked normal). Syslogd left its ---MARK--- entry in syslog for the last
time at 16:56, which is the last entry that I can find in any log until the
reboot at 17:33. There is absolutely nothing unusual in syslog, kern.log, or any
other log during this timeframe, and nothing really unusual in any Apache log
just prior to this either.
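
For anyone wanting to pin down the silent window more systematically, here is a minimal sketch that scans classic syslog lines and reports the longest gap between consecutive entries. It assumes the traditional "Mon DD HH:MM:SS" prefix on every line and that the caller supplies the year (the stamps carry none). The function names are illustrative only and are not part of any Ubuntu tool.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line."""
    # Assumption: the first 15 characters hold the stamp, e.g. 'Feb  3 16:56:00'.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year):
    """Return (gap, before, after) for the longest silence between entries.

    Returns None when fewer than two timestamped lines are given.
    """
    stamps = [parse_syslog_time(line, year) for line in lines if line.strip()]
    best = None
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best
```

Given the 16:56 MARK entry and the first entry after the 17:33 reboot, this would report a silence of about 37 minutes. With the default 20-minute MARK interval, any gap much longer than that suggests syslogd itself had stopped writing.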

I have reported a strange php/suhosin related error to the Ubuntu php team,
that is memory related
(https://bugs.launchpad.net/ubuntu/+source/php5/+bug/503396), and could be
related to this somehow. Possibly something happened there, and it was not
able to be logged. Hard to say.

As another note, I have several systems running 8.04 now with very similar
configurations and these issues have not been a problem (except the previous
incarnation of this particular system).

Remote diagnostics after the problem started at approx 17:10:

$ ping www.example.net
PING www.example.net (212.253.111.163) 56(84) bytes of data.
64 bytes from www.example.net (212.253.111.163): icmp_seq=1 ttl=63 time=4.67 ms
64 bytes from www.example.net (212.253.111.163): icmp_seq=2 ttl=63 time=4.61 ms
64 bytes from www.example.net (212.253.111.163): icmp_seq=3 ttl=63 time=4.39 ms
64 bytes from www.example.net (212.253.111.163): icmp_seq=4 ttl=63 time=3.99 ms
64 bytes from www.example.net (212.253.111.163): icmp_seq=5 ttl=63 time=3.78 ms
64 bytes from www.example.net (212.253.111.163): icmp_seq=6 ttl=63 time=4.77 ms
64 bytes from www.example.net (212.253.111.163): icmp_seq=7 ttl=63 time=4.57 ms
64 bytes from www.example.net (212.253.111.163): icmp_seq=8 ttl=63 time=4.42 ms
^C
--- www.example.net ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7007ms

Starting Nmap 4.76 ( http://nmap.org ) at 2010-02-03 17:14 EST
Interesting ports on www.example.net (212.253.111.163):
Not shown: 994 closed ports
PORT STATE SERVICE
21/tcp open ftp
22/tcp open ssh
25/tcp open smtp
80/tcp open http
443/tcp open https
1720/tcp filtered H.323/Q.931

Everything *looks* very normal at this point. But none of those services fully
responds, and I can't open a usable connection. There is not even any indication
of attempted logins despite multiple attempts at new ssh connections. A
pre-existing ssh connection that had been open for weeks was likewise
totally unresponsive. The patient looks alive, but is quite dead.

wget -S www.example.net
--2010-02-03 17:16:08-- http://www.example.net/
Resolving www.example.net... 212.253.111.163
Connecting to www.example.net|212.253.111.163|:80... connected.
HTTP request sent, awaiting response... ^C

Hangs at that point. Same with ssh. All other systems in the same rack,
connected to the same switch, are 100% normal at this time too.
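
The symptom above (the TCP handshake completes, but the application never answers) can be told apart from a closed or filtered port with a small probe. The following is only a sketch under stated assumptions: the function name, the result labels, and the 5-second default timeout are all made up for illustration. It relies on the fact that the kernel can complete a TCP handshake on a listening socket even when the owning process is stuck and never reads from it.

```python
import socket


def probe_service(host, port, payload=b"", timeout=5.0):
    """Classify a TCP service by how far a connection attempt gets.

    Returns one of: 'refused', 'unreachable', 'hung', 'reset',
    'closed', or 'responding'. The labels are illustrative only.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"          # nothing is listening on the port
    except OSError:
        return "unreachable"      # timeout, DNS failure, no route, etc.
    with sock:
        try:
            if payload:
                sock.sendall(payload)
            data = sock.recv(1024)
        except socket.timeout:
            return "hung"         # handshake succeeded, no application data
        except OSError:
            return "reset"
        return "responding" if data else "closed"
```

For HTTP one might call `probe_service("www.example.net", 80, b"HEAD / HTTP/1.0\r\n\r\n")`. For ssh on port 22 no payload is needed, since the server normally sends its version banner first. In the state described here, both would be expected to come back as "hung" rather than "refused", which matches what nmap and wget showed.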

This system is remote from my location, so diagnostics had to be run remotely.

Tags: kj-expired
Hal (hal-foobox) wrote :

Attaching files I forgot about.

As another note, possibly relevant and possibly not: on the previous system, I had observed what seemed to be very weird clock behavior during these episodes where the system "was not responding". For instance, there were Apache log files where it looked like the clock had jumped backwards. And when I logged onto the console, repeated 'date' commands showed the clock standing still. Strange. I have not noticed this on the current system (but there has been only one episode and not much in the way of diagnostics).
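
One way to check for the backwards clock jumps described above is to scan an Apache access log for timestamps that go back in time. This is a rough sketch, assuming the common/combined log format with a "[dd/Mon/yyyy:HH:MM:SS +zzzz]" stamp. The function name and the tolerance parameter are illustrative. Some tolerance is advisable because Apache stamps a request with the time it was received but writes the line when the request completes, so small reorderings are normal.

```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp of the common/combined log format.
STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def backward_jumps(lines, tolerance=timedelta(seconds=0)):
    """Return (line_number, previous, current) wherever time goes backwards.

    A jump is reported only if the step back exceeds `tolerance`.
    Lines without a recognisable timestamp are skipped.
    """
    jumps = []
    prev = None
    for number, line in enumerate(lines, start=1):
        match = STAMP.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if prev is not None and prev - stamp > tolerance:
            jumps.append((number, prev, stamp))
        prev = stamp
    return jumps
```

Running it with a tolerance of a minute or so over the access logs from an affected period would show whether the jumps are larger than ordinary request reordering can explain.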

Jeremy Foshee (jeremyfoshee) wrote :

Hal,
   Would it be possible for us to get 'apport-collect -p linux 518427' run so that other relevant logging could be attached to this bug?

Thanks!

-JFo

Changed in linux (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Hal Burgiss (hal-dbsinteractive) wrote : Re: [Bug 518427] Re: System stops responding

On Tue, Feb 09, 2010 at 09:44:47PM -0000, Jeremy Foshee wrote:
> Hal,
> Would it be possible for us to get 'apport-collect -p linux 518427' run so
> that other relevant logging could be attached to this bug?
>
> Thanks!

Hey Jeremy, I'd love to! I don't see a package with that command in it though.
We've got apport-cli, is that close? My 9.10 home system has apport-collect, but
this 8.04 doesn't seem to.

apport-cli just says 'no pending crash reports'.

Thanks! Let me know.

--
Hal Burgiss
DBS>Interactive
Manager Technical Services

Hal (hal-foobox) wrote :

I'd love to but I don't see a package with that command in it though. We've got apport-cli, is that close?

apport-cli just says 'no pending crash reports'.

Thanks.

Jeremy Foshee (jeremyfoshee) wrote :

Hal,
You are absolutely correct. I misspoke in my earlier message. Unfortunately, I've gotten into the habit of requesting that for desktop installs.

Jeremy Foshee (jeremyfoshee) wrote :

Hal,
     Please run 'apport-cli -p linux'. This should collect all logging for the linux kernel. I'd recommend looking through those logs for any sensitive information before attaching them to this bug ticket.

Apologies for any confusion earlier. :-)

Thanks!

-JFo

Hal (hal-foobox) wrote :

Hey, no problem. But this is what I get:

  # apport-cli -p linux
  No pending crash reports. Try --help for more information.

Thank you.

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired