Data corruption when resuming

Bug #1091269 reported by Pascual Abellan
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Duplicity
Fix Released
Medium
Unassigned
duplicity (Fedora)
Fix Released
Undecided
duplicity (Ubuntu)
Fix Released
Critical
Michael Terry
Lucid
Fix Released
Undecided
Unassigned
Oneiric
Won't Fix
Undecided
Unassigned
Precise
Fix Released
Undecided
Unassigned
Quantal
Fix Released
Undecided
Unassigned

Bug Description

[Impact]
It's very easy to accidentally corrupt the backup copy of a file when resuming an interrupted backup.

Specifically, there are several ways to end up backing up corrupted data:

A) If resuming after a volume that ended in a one-block file, we would skip the first block of the next file.
B) If resuming after a volume that ended in a multi-block file, we would skip the first block of the next file.
C) If resuming a non-encrypted backup after a volume that spanned a multi-block file, we would skip some data inside the file.

These are all very similar but have slightly different code fixes. Together they amount to "if you resume a backup, it's very likely you will have corrupted data somewhere". The only situation that doesn't is resuming in the middle of a file while using encryption.

[Test Case]
Download the script 'auto-ctrl-c-test.sh' from this bug, and run it:
sudo /bin/bash ./auto-ctrl-c-test.sh -s

It should end with "***** Diff between /lib and /tmp/restore" and no lines following if the bug is fixed. If the bug is still there, there will be several lines explaining the difference between the original files and the restored ones.

[Regression Potential]
This patch changes (1) how much data we read from source files when backing up and (2) how we pick where to resume a backup.

If a regression did appear because of this branch, I would expect it to be a similar data corruption problem (not starting in the right place or reading too much/too little from source files).

[Original Report]
I have found a bug similar to #613244.

It seems that some files are corrupted when the backup is interrupted and resumed when using --no-encryption option.

The make-ctrl-c-test.sh test fails if I add the --no-encryption option to duplicity. Without this option it works fine.

I have added an option to make-ctrl-c-test.sh test to run it with or without encryption (patch attached).

I have seen this bug in the current duplicity quantal version (0.6.19-0ubuntu2) and in the last bazaar revision (897).

I also attached a patch that seems to fix this issue. I don't know anything about duplicity internals, so I don't know if this patch is correct or it will break something. Actually, I have ran the run-test script and it fails in test_GzipWriteFile test. But hopefully this can help you to fix the problem.

These are the final log lines of the make-ctrl-c-test.sh script, with and without encryption.

make-ctrl-c-test.sh with encryption:
...
Restoring backups...
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Mon Dec 17 14:07:59 2012
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Mon Dec 17 14:08:13 2012
Diff between /lib and /tmp/restore1
Diff between /tmp/restore1 and /tmp/restore2

make-ctrl-c-test.sh with --no-encryption and without fix applied:
...
Restoring backups...
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Mon Dec 17 14:08:55 2012
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Mon Dec 17 14:09:16 2012
Diff between /lib and /tmp/restore1
Diff between /tmp/restore1 and /tmp/restore2
Files /tmp/restore1/modules/3.5.0-19-generic/kernel/drivers/infiniband/hw/qib/ib_qib.ko and /tmp/restore2/modules/3.5.0-19-generic/kernel/drivers/infiniband/hw/qib/ib_qib.ko differ

make-ctrl-c-test.sh with --no-encryption and fix applied:
...
Restoring backups...
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Mon Dec 17 14:19:43 2012
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Mon Dec 17 14:20:14 2012
Diff between /lib and /tmp/restore1
Diff between /tmp/restore1 and /tmp/restore2

The test_GzipWriteFile fails with this error:
======================================================================
FAIL: test_GzipWriteFile (__main__.GPGTest)
Test GzipWriteFile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/gpgtest.py", line 135, in test_GzipWriteFile
    assert size - 64 * 1024 <= os.stat("testfiles/output/gzwrite.gz").st_size <= size + 64 * 1024
AssertionError

----------------------------------------------------------------------
Ran 8 tests in 1.326s

FAILED (failures=1)
Test failed

Related branches

Revision history for this message
Pascual Abellan (pabellan-nc) wrote :
Revision history for this message
Pascual Abellan (pabellan-nc) wrote :
Changed in duplicity:
status: New → Fix Committed
importance: Undecided → Medium
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "no-encryption-corruption-fix.patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-reviewers team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Revision history for this message
Michael Terry (mterry) wrote :

I don't understand why this patch would fix the bug. It looks like it just means that we always call next() with exactly 128*1024, whereas before this patch, we might call it with anything from 32*1024 to 128*1024.

Revision history for this message
Michael Terry (mterry) wrote :

It looks like all the files that have problems are files split across volumes. So I posit that since the amount to read from the input files is based on how close the gzipped volume is to our target size (see GzipWriteFile()), that amount to read could be different if the gzip compression is different across runs. And it might be different if it includes the timestamp? Or uses a random number generator seeded from the timestamp or some such.

That would also explain why this patch fixes the issue. With this patch, we always read exactly 128*1024 bytes (though the code could be clearer about that).

I'm looking into this further, trying to add an automatic test for this.

In slightly unrelated news, why does GzipWriteFile() not call "gzip_file.write(block_iter.get_footer())"?

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote : Re: [Bug 1091269] Re: Data corruption when resuming with --no-encryption option
Download full text (4.6 KiB)

Not a clue. What's get_footer()? We did not encounter this problem
because duplicity has not been used much in no-encryption mode before now.
 I'm still unclear why anyone would want to do so. Kind of like using a
wrench to drive in a nail.

On Sat, Jan 5, 2013 at 10:35 AM, Michael Terry
<email address hidden>wrote:

> It looks like all the files that have problems are files split across
> volumes. So I posit that since the amount to read from the input files
> is based on how close the gzipped volume is to our target size (see
> GzipWriteFile()), that amount to read could be different if the gzip
> compression is different across runs. And it might be different if it
> includes the timestamp? Or uses a random number generator seeded from
> the timestamp or some such.
>
> That would also explain why this patch fixes the issue. With this
> patch, we always read exactly 128*1024 bytes (though the code could be
> clearer about that).
>
> I'm looking into this further, trying to add an automatic test for this.
>
> In slightly unrelated news, why does GzipWriteFile() not call
> "gzip_file.write(block_iter.get_footer())"?
>
> --
> You received this bug notification because you are subscribed to
> duplicity in Ubuntu.
> https://bugs.launchpad.net/bugs/1091269
>
> Title:
> Data corruption when resuming with --no-encryption option
>
> Status in Duplicity - Bandwidth Efficient Encrypted Backup:
> Fix Committed
> Status in “duplicity” package in Ubuntu:
> New
>
> Bug description:
> I have found a bug similar to #613244.
>
> It seems that some files are corrupted when the backup is interrupted
> and resumed when using --no-encryption option.
>
> The make-ctrl-c-test.sh test fails if I add the --no-encryption option
> to duplicity. Without this option it works fine.
>
> I have added an option to make-ctrl-c-test.sh test to run it with or
> without encryption (patch attached).
>
> I have seen this bug in the current duplicity quantal version
> (0.6.19-0ubuntu2) and in the last bazaar revision (897).
>
> I also attached a patch that seems to fix this issue. I don't know
> anything about duplicity internals, so I don't know if this patch is
> correct or it will break something. Actually, I have ran the run-test
> script and it fails in test_GzipWriteFile test. But hopefully this can
> help you to fix the problem.
>
> These are the final log lines of the make-ctrl-c-test.sh script, with
> and without encryption.
>
> make-ctrl-c-test.sh with encryption:
> ...
> Restoring backups...
> Local and Remote metadata are synchronized, no sync needed.
> Last full backup date: Mon Dec 17 14:07:59 2012
> Local and Remote metadata are synchronized, no sync needed.
> Last full backup date: Mon Dec 17 14:08:13 2012
> Diff between /lib and /tmp/restore1
> Diff between /tmp/restore1 and /tmp/restore2
>
> make-ctrl-c-test.sh with --no-encryption and without fix applied:
> ...
> Restoring backups...
> Local and Remote metadata are synchronized, no sync needed.
> Last full backup date: Mon Dec 17 14:08:55 2012
> Local and Remote metadata are synchronized, no sync needed.
> Last full backup date: M...

Read more...

Revision history for this message
Michael Terry (mterry) wrote : Re: Data corruption when resuming with --no-encryption option
Revision history for this message
Michael Terry (mterry) wrote :

Oh sorry, didn't see your comment. get_footer() seems to be some a function to add padding to a tar block. Not sure how important it is, since we seem to be doing fine without it.

As for how common no-encryption is, I'm confident it's relatively common among Deja Dup users. Encryption is the default, but turning it off is a simple check when starting your backup for the first time. It seems sensible if you already trust the destination and are using https or ssh or some such.

I also found other corruption issues that are not specific to no-encryption mode, addressed in the linked branch above.

Changed in duplicity (Ubuntu):
assignee: nobody → Michael Terry (mterry)
importance: Undecided → Critical
status: New → Triaged
Michael Terry (mterry)
summary: - Data corruption when resuming with --no-encryption option
+ Data corruption when resuming
Michael Terry (mterry)
description: updated
Changed in duplicity:
status: Fix Committed → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package duplicity - 0.6.20-0ubuntu4

---------------
duplicity (0.6.20-0ubuntu4) raring; urgency=low

  * debian/patches/08volcorruption.dpatch:
    - Fix some data-corruption issues when resuming an interrupted
      backup (LP: #1091269)
 -- Michael Terry <email address hidden> Mon, 07 Jan 2013 10:28:41 -0500

Changed in duplicity (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in duplicity (Ubuntu Lucid):
status: New → Confirmed
Changed in duplicity (Ubuntu Oneiric):
status: New → Confirmed
Changed in duplicity (Ubuntu Precise):
status: New → Confirmed
Changed in duplicity (Ubuntu Quantal):
status: New → Confirmed
Michael Terry (mterry)
Changed in duplicity:
status: Triaged → Fix Committed
Revision history for this message
Michael Terry (mterry) wrote :

Here's a test script, run like so to test the system duplicity:
sudo /bin/bash ./auto-ctrl-c-test.sh -s

description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Pascual, or anyone else affected,

Accepted duplicity into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/duplicity/0.6.19-0ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in duplicity (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Pascual, or anyone else affected,

Accepted duplicity into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/duplicity/0.6.18-0ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in duplicity (Ubuntu Precise):
status: Confirmed → Fix Committed
Revision history for this message
Miklos Juhasz (mjuhasz) wrote :

The proposed package fixes the data corruption issue.
I ran the test script with the version in the normal repository first and got corruption for the first run. I updated to the proposed package and ran the test script several times and there was no corruption at all. Thank you for the fix!

Revision history for this message
Miklos Juhasz (mjuhasz) wrote :

I forgot to mention that I was testing on Precise, duplicity version 0.6.18-0ubuntu3.1

tags: added: verification-done
removed: verification-needed
Revision history for this message
Colin Watson (cjwatson) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package duplicity - 0.6.18-0ubuntu3.1

---------------
duplicity (0.6.18-0ubuntu3.1) precise-proposed; urgency=low

  [ Chris J Arges ]
  * debian/patches/07caching.dpatch:
    - Backport caching work done by Steve Atwell. (LP: #1013446)

  [ Michael Terry ]
  * debian/patches/08volcorruption.dpatch:
    - Fix some data-corruption issues when resuming an interrupted
      backup (LP: #1091269)
 -- Michael Terry <email address hidden> Fri, 11 Jan 2013 11:05:16 -0500

Changed in duplicity (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Colin Watson (cjwatson) wrote :

duplicity (0.6.19-0ubuntu2.2) quantal-proposed; urgency=low

  * debian/patches/09volcorruption.dpatch:
    - Backport a fix for a restart test that caused FTBFS on armhf

 -- Michael Terry <email address hidden> Fri, 11 Jan 2013 13:59:19 -0500

duplicity (0.6.19-0ubuntu2.1) quantal-proposed; urgency=low

  [ Michael Terry ]
  * debian/patches/07u1utf8.dpatch:
    - Update with a fix for a UnicodeDecodeError crash (LP: #1080423)
  * debian/patches/09volcorruption.dpatch:
    - Fix some data-corruption issues when resuming an interrupted
      backup (LP: #1091269)

  [ Chris J Arges ]
  * debian/patches/08caching.dpatch:
    - Backport caching work done by Steve Atwell. (LP: #1013446)

 -- Michael Terry <email address hidden> Wed, 05 Dec 2012 16:39:01 -0500

Changed in duplicity (Ubuntu Quantal):
status: Fix Committed → Fix Released
Changed in duplicity:
milestone: none → 0.6.21
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Pascual, or anyone else affected,

Accepted duplicity into lucid-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/duplicity/0.6.08b-0ubuntu2.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in duplicity (Ubuntu Lucid):
status: Confirmed → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Changed in duplicity (Ubuntu Oneiric):
status: Confirmed → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Pascual, or anyone else affected,

Accepted duplicity into oneiric-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/duplicity/0.6.15-0ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
In , Michael (michael-redhat-bugs) wrote :

Created attachment 711658
test script

Description of problem:

It's very easy to accidentally corrupt the backup copy of a file when resuming an interrupted backup.

Specifically, there are several ways to end up backing up corrupted data:

A) If resuming after a volume that ended in a one-block file, we would skip the first block of the next file.
B) If resuming after a volume that ended in a multi-block file, we would skip the first block of the next file.
C) If resuming a non-encrypted backup after a volume that spanned a multi-block file, we would skip some data inside the file.

These are all very similar but have slightly different code fixes. Together they amount to "if you resume a backup, it's very likely you will have corrupted data somewhere". The only situation that doesn't is resuming in the middle of a file while using encryption.

(This is upstream bug https://bugs.launchpad.net/duplicity/+bug/1091269 )

Version-Release number of selected component (if applicable):

This is fixed in upstream 0.6.21. It is present in probably any version of duplicity >= 0.6.0.

How reproducible:

Reliably.

Steps to Reproduce:

1. Download the script 'auto-ctrl-c-test.sh' from this bug
2
. sudo /bin/bash ./auto-ctrl-c-test.sh -s

Actual results:

There will be several lines at the end of the output explaining the difference between the original files and the restored ones.

Expected results:

It should end with "***** Diff between /lib and /tmp/restore" and no lines following.

Revision history for this message
In , Michael (michael-redhat-bugs) wrote :

Created attachment 711659
Patch against 0.6.20

Here's a patch against 0.6.20 for F18.

Revision history for this message
In , Michael (michael-redhat-bugs) wrote :

Created attachment 711660
Patch against 0.6.18

Here's a patch against 0.6.18 for F17.

Revision history for this message
In , Rahul (rahul-redhat-bugs) wrote :

I have reviewed the upstream changelog and I think it is just better to push the latest upstream release to all active branches instead of cherry picking fixes. I am going to attempt to do that now.

@robert, holler if you have a problem with that!

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/duplicity-0.6.21-1.fc18

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/duplicity-0.6.21-1.fc17

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/duplicity-0.6.21-1.el6

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/duplicity-0.6.21-1.el5

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

Package duplicity-0.6.21-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing duplicity-0.6.21-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-0709/duplicity-0.6.21-1.el6
then log in and leave karma (feedback).

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

duplicity-0.6.21-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.

Revision history for this message
Brian Murray (brian-murray) wrote : [duplicity/lucid] verification still needed

The fix for this bug has been awaiting testing feedback in the -proposed repository for lucid for more than 90 days. Please test this fix and update the bug appropriately with the results. In the event that the fix for this bug is still not verified 15 days from now, the package will be removed from the -proposed repository.

tags: added: removal-candidate
Changed in duplicity (Ubuntu Oneiric):
status: Fix Committed → Won't Fix
Revision history for this message
Michael Terry (mterry) wrote :

I know it's gauche to verify your own SRU, but since no one else was doing it... I tested on lucid and verified the fix.

tags: added: verification-done
removed: removal-candidate verification-needed
Revision history for this message
Chris Halse Rogers (raof) wrote :

Not gauche at all. Much better to verify your own SRUs than to leave them unloved!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package duplicity - 0.6.08b-0ubuntu2.2

---------------
duplicity (0.6.08b-0ubuntu2.2) lucid-proposed; urgency=low

  * debian/patches/07volcorruption.dpatch:
    - Fix some data-corruption issues when resuming an interrupted
      backup (LP: #1091269)
 -- Michael Terry <email address hidden> Fri, 11 Jan 2013 12:28:43 -0500

Changed in duplicity (Ubuntu Lucid):
status: Fix Committed → Fix Released
Changed in duplicity (Fedora):
importance: Unknown → Undecided
status: Unknown → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.