timeout while inserting large free-space message

Bug #218388 reported by Andreas Hasenack
10
Affects Status Importance Assigned to Milestone
Landscape Client
Fix Released
High
Kevin McDermott
landscape-client (Ubuntu)
Fix Released
Undecided
Unassigned
Intrepid
Fix Released
Undecided
Unassigned
Jaunty
Fix Released
Undecided
Unassigned
Karmic
Fix Released
Undecided
Unassigned
Lucid
Fix Released
Undecided
Unassigned

Bug Description

Computer 236 today decided to send a large free-space message:

len(msg['free-space']) = 14733

I'm attaching it to the ticket.

This results in a backtrace on the server:
Traceback (most recent call last):
  File "src/canonical/landscape/message/messageapis.py", line 209, in _run
    self.handle(message["type"], message)
  File "src/canonical/message/api.py", line 55, in handle
    return handler(type, body)
  File "src/canonical/message/handler.py", line 27, in wrapper
    return function(message_api, type, body)
  File "src/canonical/lib/arguments.py", line 77, in replacement
    return original(*new_args, **new_kwargs)
  File "src/canonical/landscape/message/handlers/resources.py", line 507, in handle_free_space
    timestamp)
  File "src/canonical/landscape/model/main/computer.py", line 428, in get_mount_info_for_timestamp
    resource.MountInfo.mount_point == mount_point)
  File "src/canonical/landscape/model/main/computer.py", line 263, in _find
    return self.store.find(cls, cls.computer_id == self.id,
  File "/srv/landscape.canonical.com/storm/storm/store.py", line 192, in find
    self.flush()
  File "/srv/landscape.canonical.com/storm/storm/store.py", line 443, in flush
    self._flush_one(obj_info)
  File "/srv/landscape.canonical.com/storm/storm/store.py", line 477, in _flush_one
    result = self._connection.execute(expr)
  File "/srv/landscape.canonical.com/storm/storm/databases/postgres.py", line 250, in execute
    result = Connection.execute(self, Returning(statement), params)
  File "/srv/landscape.canonical.com/storm/storm/database.py", line 200, in execute
    raw_cursor = self.raw_execute(statement, params)
  File "/srv/landscape.canonical.com/storm/storm/databases/postgres.py", line 266, in raw_execute
    return Connection.raw_execute(self, statement, params)
  File "/srv/landscape.canonical.com/storm/storm/database.py", line 274, in raw_execute
    statement, params or ())
  File "/srv/landscape.canonical.com/storm/storm/tracer.py", line 75, in trace
    attr(*args, **kwargs)
  File "/srv/landscape.canonical.com/storm/storm/tracer.py", line 29, in connection_raw_execute
    raise TimeoutError(statement, params)
TimeoutError: 'INSERT INTO free_space (free_space, mount_info_id, "timestamp") VALUES (%s, %s, %s) RETURNING free_space.id', [<storm.variables.IntVariable ob
ject at 0x2aaaad12b650>, <storm.variables.IntVariable object at 0x2aaaad12b510>, <storm.variables.DateTimeVariable object at 0x2aaaad12b5d0>]

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Changed in landscape:
assignee: nobody → kevin-mcdermott
importance: Undecided → High
milestone: none → mthood-pre-4
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

The logs are filling up. Each error line is 461kbytes, and we have 429 such lines so far.

Changed in landscape:
importance: High → Critical
Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Download full text (5.0 KiB)

Client version is 0.16.0 and is gutsy

Some things that I don't know if are related or not, but look weird anyway:

- repeated mount points
landscape-test-layer-resource-1=# select * from mount_info where computer_id=236;
  id | computer_id | timestamp | device | mount_point | filesystem | total_space
------+-------------+---------------------+--------------------------------------------------------+---------------+------------+-------------
 1118 | 236 | 2007-10-29 18:21:10 | /dev/disk/by-uuid/0e60c4b6-2b1d-4dc3-81f7-a2f7a084a7f3 | / | ext3 | 18777
 1119 | 236 | 2007-10-29 18:21:10 | /dev/sda5 | /home | ext3 | 51630
 1120 | 236 | 2007-10-29 18:21:10 | /dev/sdb1 | /media/BACKUP | fuseblk | 238472
 2818 | 236 | 2007-12-11 17:22:45 | /dev/disk/by-uuid/0e60c4b6-2b1d-4dc3-81f7-a2f7a084a7f3 | / | ext3 | 18777
 2819 | 236 | 2007-12-11 17:22:45 | /dev/sda5 | /home | ext3 | 51630
 2820 | 236 | 2007-12-11 17:22:45 | /dev/sdb1 | /media/BACKUP | fuseblk | 238472

- some general statistics about those:
landscape-test-layer-resource-1=# select computer_id,mount_point,count(mount_point) as how_many from mount_info group by computer_id,mount_point having count(mount_point) > 1 order by computer_id desc limit 20;
 computer_id | mount_point | how_many
-------------+---------------------+----------
         986 | / | 3
         977 | / | 2
         972 | /backup | 2
         972 | /var | 2
         972 | / | 2
         972 | /mnt | 2
         965 | / | 4
         965 | /home/medias/Medias | 4
         964 | /home | 2
         964 | / | 2
         962 | /media/kvf-storage_ | 3
         962 | /media/kvf-storage | 7
         962 | /media/disk | 3
         962 | /boot | 2
         962 | / | 2
         961 | /home | 2
         961 | / | 2
         957 | / | 373
         955 | /boot | 228
         955 | / | 228
(20 rows)

landscape-test-layer-resource-1=# select computer_id,mount_point,count(mount_point) as how_many from mount_info group by computer_id,mount_point having count(mount_point) > 1 order by how_many desc limit 10;
 computer_id | mount_point | how_many
-------------+-------------+----------
         741 | / | 7369
         741 | /boot | 7369
         909 | / | 1393
         909 | /media/sda3 | 1393
         909 | /media/sda1 | 1393
         957 | / | 373
         955 | / | 228
         95...

Read more...

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Actually, client version is not clear.

https://landscape.canonical.com/account/camhs/computer/236/packages/list?q=landscape-client&filter=
suggests both 0.16.0 and 1.0.1 are installed

Apache access logs for the same timestamp as the backtrace suggest it was 1.0.1:
127.0.0.1 - - [15/Apr/2008:23:03:06 +0000] "POST /++vh++https:landscape.canonical.com:443/++/message-system HTTP/1.1" 500 29 "-" "landscape-client/1.0.1-gutsy1-landscape1"

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Maybe/probably the recent lowering of the storm timeout value from 300s to 120s triggered this.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

After increasing the timeout to 600s, the message was accepted:

2008-04-17T17:57:29 INFO canonical.landscape.message Processed messages for computer 236 (size=697.69kb, duration=464.65s, load=0.24, ignored=5, processed=1, dropped=94, remaining=2194, client=1.0.1-gutsy1-landscape1)

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Lowering importance to high since the immediate problem was resolved by temporarily increasing the timeout setting on the server.

Changed in landscape:
importance: Critical → High
Changed in landscape:
milestone: mthood-pre-4 → mthood
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Note to self: check if we have proper indexes in place for this.

Changed in landscape:
milestone: mthood → later
status: New → Confirmed
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Got another one today, from computer 1245:
len(msg['free-space']) = 10542

Changed in landscape:
milestone: later → thames-pre-3
Changed in landscape:
milestone: thames-pre-3 → thames-pre-4
Changed in landscape:
milestone: thames-pre-4 → thames
Changed in landscape:
milestone: thames → thames+1
Changed in landscape:
milestone: thames+1 → later
Revision history for this message
Jamu Kakar (jkakar) wrote :

I'm working on this now.

Changed in landscape:
assignee: bigkevmcd → jkakar
milestone: later → mountainview-pre-4
status: Confirmed → In Progress
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Jamu, this branch doesn't fix the issue brought up in this bug. In fact, it probably makes it even worse, since it's more expensive to insert new rows if there's an index to maintain. The issue described in this bug is somewhat of an edge case which needs a fix on the client side, since it should never be possible to queue so many individual free space messages in a single message.

Specifically for the branch, it looks good. Only one small detail, and a question related to the above:

[1]

+CREATE INDEX custom_graph_data_point_custom_graph_data_id_idx
+ ON custom_graph_data_point (custom_graph_id)

The naming pattern seems to be off here (notice the second _data_). There are two instances of this.

[2]

I'm hoping it won't be a big hit on the database to insert the several data points with the new indexes, but we should keep our eyes open about the impact of this change on the deployment.

Revision history for this message
Jamu Kakar (jkakar) wrote :

The branch I originally linked here doesn't fix this bug.

Changed in landscape:
assignee: jkakar → nobody
milestone: mountainview-pre-4 → later
status: In Progress → Confirmed
Revision history for this message
Jamu Kakar (jkakar) wrote :

I've filed bug #321671 to track the issue I've dealt with in my
branch. Please add any comments about it over there.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Happened again with computer 4938: 15466 free-space items.

2009-07-01T11:00:58 ERROR canonical.landscape.message Error handling message 'free-space' for computer 4938: {'free-space': [(1244125500, u'/', 12653), .....
Traceback (most recent call last):
  File "/srv/landscape.canonical.com/production/landscape/src/canonical/landscape/message/messageapis.py", line 245, in _run
    self.handle(message["type"], message)
  File "/srv/landscape.canonical.com/production/landscape/src/canonical/message/api.py", line 55, in handle
    return handler(type, body)
  File "/srv/landscape.canonical.com/production/landscape/src/canonical/message/handler.py", line 27, in wrapper
    return function(message_api, type, body)
  File "/srv/landscape.canonical.com/production/landscape/src/canonical/lib/arguments.py", line 77, in replacement
    return original(*new_args, **new_kwargs)
  File "/srv/landscape.canonical.com/production/landscape/src/canonical/landscape/message/handlers/resources.py", line 594, in handle_free_space
    timestamp)
  File "/srv/landscape.canonical.com/production/landscape/src/canonical/landscape/model/main/computer.py", line 486, in get_mount_info_for_timestamp
    resource.MountInfo.mount_point == mount_point)
  File "/srv/landscape.canonical.com/production/landscape/src/canonical/landscape/model/main/computer.py", line 289, in _find
    *args, **kwargs)
  File "/srv/landscape.canonical.com/production/storm/storm/store.py", line 206, in find
    self.flush()
  File "/srv/landscape.canonical.com/production/storm/storm/store.py", line 486, in flush
    self._flush_one(obj_info)
  File "/srv/landscape.canonical.com/production/storm/storm/store.py", line 523, in _flush_one
    result = self._connection.execute(expr)
  File "/srv/landscape.canonical.com/production/storm/storm/databases/postgres.py", line 251, in execute
    result = Connection.execute(self, Returning(statement), params)
  File "/srv/landscape.canonical.com/production/storm/storm/database.py", line 203, in execute
    raw_cursor = self.raw_execute(statement, params)
  File "/srv/landscape.canonical.com/production/storm/storm/databases/postgres.py", line 267, in raw_execute
    return Connection.raw_execute(self, statement, params)
  File "/srv/landscape.canonical.com/production/storm/storm/database.py", line 279, in raw_execute
    statement, params or ())
  File "/srv/landscape.canonical.com/production/storm/storm/database.py", line 328, in _check_disconnect
    return function(*args, **kwargs)
  File "/srv/landscape.canonical.com/production/storm/storm/tracer.py", line 95, in trace
    attr(*args, **kwargs)
  File "/srv/landscape.canonical.com/production/storm/storm/tracer.py", line 49, in connection_raw_execute
    raise TimeoutError(statement, params)
TimeoutError: 'INSERT INTO free_space (free_space, mount_info_id, "timestamp") VALUES (%s, %s, %s) RETURNING free_space.id', [<storm.variables.IntVariable object ......

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

It's happening again

tags: added: sooner-than-later
Changed in landscape:
status: Confirmed → In Progress
assignee: nobody → Kevin McDermott (bigkevmcd)
affects: landscape → landscape-client
Changed in landscape-client:
milestone: later → none
tags: added: fix-it-friday
Changed in landscape-client:
status: In Progress → Fix Committed
milestone: none → 1.4.4
visibility: private → public
tags: removed: sooner-than-later
tags: added: needs-testing
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 1.4.4-0ubuntu0.10.04

---------------
landscape-client (1.4.4-0ubuntu0.10.04) lucid; urgency=low

  * New upstream release (LP: #519200):
    - Add a message for creating package locks (LP: #514334)
    - Add support for auto-approved change-packages messages (LP: #517175)
    - Add support for installing server-generated debian packages (LP: #509752)
    - Add support for reporting Eucalyptus topology information (LP: #518501)
    - Fix timeout while inserting large free-space message (LP: #218388)
    - Fix wrong log path in motd (LP: #517454)
    - Fix race condition in process excecution (LP: #517453)
 -- Free Ekanayaka <email address hidden> Wed, 10 Feb 2010 18:50:53 +0100

Changed in landscape-client (Ubuntu Lucid):
status: New → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Accepted landscape-client into intrepid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in landscape-client (Ubuntu Intrepid):
status: New → Fix Committed
tags: added: verification-needed
Changed in landscape-client (Ubuntu Jaunty):
status: New → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted landscape-client into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in landscape-client (Ubuntu Karmic):
status: New → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted landscape-client into karmic-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in landscape-client:
status: Fix Committed → Fix Released
Changed in landscape-client:
status: Fix Released → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 1.4.4-0ubuntu0.9.10

---------------
landscape-client (1.4.4-0ubuntu0.9.10) karmic-proposed; urgency=low

  * New upstream release (LP: #519200):
    - Add a message for creating package locks (LP: #514334)
    - Add support for auto-approved change-packages messages (LP: #517175)
    - Add support for installing server-generated debian packages (LP: #509752)
    - Add support for reporting Eucalyptus topology information (LP: #518501)
    - Fix timeout while inserting large free-space message (LP: #218388)
    - Fix wrong log path in motd (LP: #517454)
    - Fix race condition in process excecution (LP: #517453)
 -- Free Ekanayaka <email address hidden> Wed, 10 Feb 2010 18:49:25 +0100

Changed in landscape-client (Ubuntu Karmic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 1.4.4-0ubuntu0.9.04

---------------
landscape-client (1.4.4-0ubuntu0.9.04) jaunty-proposed; urgency=low

  * New upstream release (LP: #519200):
    - Add a message for creating package locks (LP: #514334)
    - Add support for auto-approved change-packages messages (LP: #517175)
    - Add support for installing server-generated debian packages (LP: #509752)
    - Add support for reporting Eucalyptus topology information (LP: #518501)
    - Fix timeout while inserting large free-space message (LP: #218388)
    - Fix wrong log path in motd (LP: #517454)
    - Fix race condition in process excecution (LP: #517453)
 -- Free Ekanayaka <email address hidden> Wed, 10 Feb 2010 18:41:52 +0100

Changed in landscape-client (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 1.4.4-0ubuntu0.8.10

---------------
landscape-client (1.4.4-0ubuntu0.8.10) intrepid-proposed; urgency=low

  * New upstream release (LP: #519200):
    - Add a message for creating package locks (LP: #514334)
    - Add support for auto-approved change-packages messages (LP: #517175)
    - Add support for installing server-generated debian packages (LP: #509752)
    - Add support for reporting Eucalyptus topology information (LP: #518501)
    - Fix timeout while inserting large free-space message (LP: #218388)
    - Fix wrong log path in motd (LP: #517454)
    - Fix race condition in process excecution (LP: #517453)
 -- Free Ekanayaka <email address hidden> Wed, 10 Feb 2010 18:40:45 +0100

Changed in landscape-client (Ubuntu Intrepid):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Code confirmed in the staging client. I'm not just leaving a client running for some time, accumulating data and without network access, to see what happens once I open the flood gates.

tags: removed: needs-testing
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Flood gates opened, no issues.

Tom Haddon (mthaddon)
tags: added: canonical-losa-ls
Changed in landscape-client:
status: Fix Committed → Fix Released
tags: removed: verification-needed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.