High RAM use during initial full-signatures download

Bug #1320832 reported by John Leach
This bug affects 3 people
Affects: Duplicity
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

This is duplicity 0.6.23-1 on Debian Wheezy (Python 2.7.6). The backup target is an OpenStack Swift cluster.

I ran a first full backup of about 60,000 files, resulting in a ~100GB backup and a ~1GB full-signatures file.

I then reran the backup from another server that had the same file set, so duplicity naturally had to download the full-signatures file to the .cache directory.

During that download, duplicity's RAM usage soars, in the end to roughly the size of the full-signatures file, which suggests that duplicity is buffering the entire download in RAM. That's ~1GB of RAM!

Duplicity doesn't seem to require that much RAM during normal backup operation (only around 22MB), so holding the full-signatures data entirely in RAM doesn't seem necessary.
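
(To illustrate the suspected pattern, here is a hedged sketch, not duplicity's actual code: reading the remote object's body in a single call keeps the whole payload in memory, while streaming it in fixed-size chunks keeps the footprint bounded. The helper names below are made up for the example.)

def download_all_at_once(remote, local_path):
    data = remote.read()  # the entire ~1GB body is materialised in RAM
    with open(local_path, 'wb') as f:
        f.write(data)

def download_in_chunks(remote, local_path, chunk_size=128 * 1024):
    with open(local_path, 'wb') as f:
        while True:
            chunk = remote.read(chunk_size)  # only one chunk held in RAM at a time
            if not chunk:
                break
            f.write(chunk)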

Revision history for this message
Philip Raschke (p.raschke) wrote :

I can confirm that this bug is still present in duplicity 0.7.08. Are there any plans to fix this?

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

The sizes you are seeing come from Python not garbage collecting until the last moment. The actual read/write is buffered, with large buffers, yes, but not 1GB.

Changed in duplicity:
status: New → Invalid
Revision history for this message
John Leach (johnleach) wrote :

This still makes using duplicity on small virtual machines quite difficult.

Can duplicity explicitly call garbage collection to help out here? Or hint to Python that it should?
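
(For what it's worth, an explicit hint is a one-liner with the standard gc module; where such a call would belong in duplicity's code is only an assumption, so treat this as a sketch.)

import gc

sig_data = None  # drop the reference to the large buffer (variable name is hypothetical)
gc.collect()     # ask Python to collect unreachable objects now rather than later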

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

Even on small machines memory pressure should cause Python to garbage collect early. Has anyone had actual memory problems when duplicity was running?

Revision history for this message
Philip Raschke (p.raschke) wrote :

I run duplicity on a Raspberry Pi 2 (with 1GB of memory) and experienced memory problems when running an initial full backup of ~100GB of data. The backup process crashed during the transmission phase due to network problems (I initially thought memory problems caused the crash).

However, I simply tried to restart the process. Duplicity figured out where to restart the transmission, followed by a phase of heavy CPU and memory consumption. During this phase I ran out of memory and the process was repeatedly killed by the OS.

I solved this problem by copying all files (including the archive dir) to another machine with 8GB of memory and rerunning the backup with the same destination. Duplicity was able to start from the point where it last crashed and eventually completed the full backup. I observed that the process needed 1.3GB of memory at its peak. Afterwards the memory was freed and the process finished normally after the transmission.

All incremental backups now run on the Raspberry Pi 2 without any further problems with respect to memory consumption. I'm sure the full backup would have completed as well if the transmission hadn't been interrupted.

So my problem occurs when recovering from a crash during transmission, which is slightly different from John's bug description. Note that my first comment was made before I had fully analyzed the situation.

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

This will most likely be corrected with the sigfile rework being done for 0.8. I've marked it confirmed and will reevaluate after 0.8 is released.

Changed in duplicity:
status: Invalid → Confirmed
importance: Undecided → Medium
milestone: none → 0.8.00
Revision history for this message
Oskar Wycislak (oskarek) wrote :

Hi,

This also affected me yesterday.
I've found that there's a very simple fix: instead of reading the whole file at once, it should be read in chunks. Looking at the code, I can tell that someone meant to do that but, in the end, didn't.

Just change

headers, body = self.conn.get_object(self.container, self.prefix + remote_filename)

to

headers, body = self.conn.get_object(self.container, self.prefix + remote_filename, resp_chunk_size=1024)

in

backends/swiftbackend.py
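
With resp_chunk_size set, python-swiftclient returns the body as an iterator of chunks rather than one big string, so the caller can stream it straight to disk. A rough sketch of that usage (the local variable names are illustrative, not the exact duplicity code):

headers, body = self.conn.get_object(self.container,
                                     self.prefix + remote_filename,
                                     resp_chunk_size=1024)
with open(local_path, 'wb') as f:  # local_path: wherever duplicity caches the file
    for chunk in body:
        f.write(chunk)  # only one small chunk is held in memory at a time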

Sorry for not providing this in a proper manner (branch, etc.), but I'm quite new to bzr and don't really have the time and energy right now to fight with it ;-)

I've patched 0.7.10-1~bpo8+1 (with python-swiftclient 1:2.3.1-1+deb8u1) with this fix, and it works when fetching a 1GB+ file with less than 500MB of RAM available.
Before this fix it was crashing even with more than 1GB of RAM available (which makes no sense, I know).

Changed in duplicity:
status: Confirmed → Fix Committed
Changed in duplicity:
status: Fix Committed → Fix Released