Hang on first collection-status
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Duplicity | Fix Released | High | Unassigned |
Bug Description
Duplicity: 0.6.01
Python: 2.6.2+
OS: Ubuntu 9.04
When running duplicity 0.6.01 with "collection-status" for the first time, it hangs after displaying the status of the collection. There is no CPU or network activity while it hangs. If interrupted, "collection-status" works normally the second time.
Cristian Klein (cristiklein) wrote : | #1 |
Kenneth Loafman (kenneth-loafman) wrote : | #2 |
Is there any way to reproduce this? Does it happen all the time, or just sometimes?
I'm having a hard time replicating the problem.
Cristian Klein (cristiklein) wrote : Re: [Bug 394629] Re: Hang on first collection-status | #3 |
Kenneth Loafman wrote:
> Is there any way to reproduce this? Does it happen all the time, or
> just sometimes?
>
> I'm having a hard time replicating the problem.
Try "rm -rf ~/.cache/duplicity", then run duplicity without
"--archive-dir" or "--name".
Kenneth Loafman (kenneth-loafman) wrote : | #4 |
No problem with that. It just recreates the dirs, does the resync, then
the collection status, no delay anywhere.
...Ken
Cristian Klein (cristiklein) wrote : | #5 |
Strange. I can reproduce it every time. At the end, I have three GnuPG
processes which are:
gpg --status-fd 4 --passphrase-fd 8 --logger-fd 3 --batch --no-tty
--no-secmem-warning --use-agent --decrypt
gpg --status-fd 4 --passphrase-fd 9 --logger-fd 3 --batch --no-tty
--no-secmem-warning --use-agent --decrypt
gpg --status-fd 4 --passphrase-fd 10 --logger-fd 3 --batch --no-tty
--no-secmem-warning --use-agent --decrypt
If I kill them by hand, duplicity resumes.
I also tried removing everything duplicity-related and installing it
from the PPA, but the problem persists.
Kenneth Loafman (kenneth-loafman) wrote : | #6 |
If you start off with a working archive, does it do the same thing? In
other words, if the first run fails and you rerun, does the second run
fail as well? Or, is it just the first run with an empty archive?
Cristian Klein (cristiklein) wrote : | #7 |
It's just the first run with an empty archive. The second time it runs
flawlessly.
Kenneth Loafman (kenneth-loafman) wrote : | #8 |
- duplicity-coll-status-hang.patch Edit (1.2 KiB, text/x-patch; name="duplicity-coll-status-hang.patch")
Here's a patch, no guarantees, but it will change the order of the
initial collection status and sync. It's possible you've encountered
some sort of race condition, but who knows.
Apply this to /usr/bin/duplicity, or wherever you have duplicity.
Cristian Klein (cristiklein) wrote : | #9 |
Sorry, it doesn't solve the problem. Same effect: 3 GPG processes which,
if killed, duplicity resumes.
Kenneth Loafman (kenneth-loafman) wrote : | #10 |
OK, I will chase it some more. The 3 GPG processes are normal and pretty
much stay around through the whole run. When there are many more than
that, it's a problem.
Do you have any idea what the '+' represents on the end of Python 2.6.2+
that you use? Is it normal Python?
Cristian Klein (cristiklein) wrote : | #11 |
I used a Python version from the Debian repository which corrected some
bugs in 2.6.2, so the version returned was "2.6.2+". Now I am using
Ubuntu-shipped 9.04 Python, with version "2.6.2", so it can't be
Python's fault.
Could it perhaps be due to some backup-host-specific properties? I run a
"restricted" SFTP server on it. However, all other duplicity operations
run flawlessly.
Peter Schuller (scode) wrote : | #12 |
I saw this earlier today too the first time I ran collection status after upgrading from 0.5.18 to 0.6.01 on Python 2.5 and FreeBSD.
For some reason I am unable to reproduce it again, even after nuking ~/.cache. Possibly because I have tacked a couple of 0.6.01-produced backup increments onto the backups.
Kenneth Loafman (kenneth-loafman) wrote : | #13 |
It looks like something is waiting: an I/O operation, a thread,
something. If we could reproduce it, we could find it.
In a cron environment, a hang like this could be a real problem. It
could hang for days without getting noticed.
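A simple guard against exactly this failure mode in cron is to cap the job's runtime. This is a minimal sketch using Python's standard subprocess module, not part of duplicity; the helper name and the timeout value are hypothetical:

```python
import subprocess

def run_with_timeout(cmd, timeout_secs):
    """Run cmd, killing it if it is still running after timeout_secs.

    Returns the exit code, or -1 if the command had to be killed.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_secs).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so a hung
        # backup cannot linger for days unnoticed; the cron mail then
        # shows a failure instead of silence.
        return -1
```

A cron wrapper could then call, say, `run_with_timeout(["duplicity", "collection-status", ...], 3600)` and alert on a -1 result.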
Peter Schuller (scode) wrote : | #14 |
For the record, I was able to trigger it again with my backup to rsync.net (the other one I tried was my S3 backup). So whatever the problem is, it definitely crosses backends.
As per the previous discussion, I too saw a GPG child process. Hitting ctrl-c yielded:
gpg: signal Interrupt caught ... exiting
Exception exceptions.
I'll see if I can indeed reproduce it reliably and hopefully debug it.
Peter Schuller (scode) wrote : | #15 |
After inserting a printout on launching gpg (the '!!!....') and one on the wait() in GPGFile.close(), and comparing with pstree output, it looks like it is the GPG process from the *FIRST* downloaded file (in my case) that hangs around; the BEGIN WAIT/END WAIT pair is never seen for that one:
Synchronizing remote metadata to local cache...
Copying duplicity-
Using temporary directory /tmp/duplicity-
Running 'sftp -oServerAliveIn
sftp command: 'get home_scode/
!!!!!!!
Copying duplicity-
Running 'sftp -oServerAliveIn
sftp command: 'get home_scode/
!!!!!!!
BEGIN WAIT
END WAIT
Deleting /tmp/duplicity-
Copying duplicity-
Running 'sftp -oServerAliveIn
sftp command: 'get home_scode/
!!!!!!!
BEGIN WAIT
END WAIT
.....
The first file is also never deleted from temp. This may be related. The close of the gpg process is in this case supposed to happen as a result of SrcIter's next() method in copy_to_local() in duplicity-bin. It gets closed when you reach EOF on the file being decrypted, right before StopIteration is raised.
The file is probably not being read to EOF, which might also explain why the temp file is not deleted. The would-be "Deleted ..." message comes from path.py and, when actually called, is reached via this path:
Traceback (most recent call last):
File "./duplicity-bin", line 1146, in <module>
with_
File "./duplicity-bin", line 1139, in with_tempdir
fn()
File "./duplicity-bin", line 1044, in main
sync_
File "./duplicity-bin", line 895, in sync_archive
copy_
File "./duplicity-bin", line 865, in copy_to_local
copy_
File "./duplicity-bin", line 811, in copy_raw
data = src_iter.
File "./duplicity-bin", line 843, in next
self.
File "/usr/home/
hook()
File "/usr/home/
path.
File "/usr/home/
raise AssertionError(
(Moving to debugging another bug for a while, so wanted to dump this in the mean time before I forget the details)
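The close-on-EOF pattern described in comment #15 can be sketched like this. It is a simplified stand-in for duplicity's actual SrcIter, with a plain in-memory file in place of the gpg pipe:

```python
import io

class SrcIter:
    """Toy version of the source iterator in copy_to_local():
    the underlying file is only closed once EOF has been reached."""
    BLOCKSIZE = 4

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        data = self.fileobj.read(self.BLOCKSIZE)
        if not data:
            # In the real code this is where GPGFile.close() runs
            # and wait()s on the gpg child.
            self.fileobj.close()
            self.closed = True
            raise StopIteration
        return data

drained = SrcIter(io.BytesIO(b"ciphertext"))
list(drained)              # read to EOF, so close() runs
partial = SrcIter(io.BytesIO(b"ciphertext"))
next(partial)              # stops early: close() never runs
```

If the caller never drives the iterator to EOF, the close (and hence the wait() on gpg) never happens, which matches the lingering gpg process for the first downloaded file.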
Kenneth Loafman (kenneth-loafman) wrote : | #16 |
Hangs also on cleanup, remove-older-than, essentially any first use of a new archive dir.
Changed in duplicity: | |
importance: | Undecided → High |
status: | New → Confirmed |
milestone: | none → 0.6.02 |
Kenneth Loafman (kenneth-loafman) wrote : | #17 |
I just pushed the fix to this bug to the trunk. Thanks Peter for the initial analysis. The bug was caused by the early out in GPG/GZIPWriteFile() that stops at a certain volume size. I had removed the size = sys.maxint args in to_remote() and to_final() after the change to block size. My mistake. GZipWriteFile() was taking an early out at 5MB and never finishing the iteration.
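The early-out described above can be illustrated with a toy writer. This is not the real GPG/GZipWriteFile code, just a sketch of the failure: the volume-size cutoff becomes limit_bytes, and the missing size = sys.maxint corresponds to passing an effectively unlimited limit:

```python
def write_file(src_iter, limit_bytes):
    """Copy blocks from src_iter until limit_bytes have been written.

    The limit exists so output can be split into volumes. Callers that
    want the whole stream must pass an effectively unlimited size;
    otherwise the source is never drained, and whatever is feeding
    src_iter (here, gpg) blocks forever.
    """
    written = 0
    out = []
    for block in src_iter:
        out.append(block)
        written += len(block)
        if written >= limit_bytes:
            break  # early out: the iteration is never finished
    return b"".join(out)

blocks = [b"x" * 10] * 5                        # a 50-byte "stream"
short = write_file(iter(blocks), 25)            # stops early, source not drained
whole = write_file(iter(blocks), float("inf"))  # drains all 50 bytes
```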
Changed in duplicity: | |
status: | Confirmed → Fix Committed |
Cristian Klein (cristiklein) wrote : | #18 |
Hello,
I'm sorry to tell you, but I still experience this bug even with the
latest duplicity from bzr #553. I made sure that all previous traces of
duplicity were erased and that the patch in dup_temp.py was correctly
applied.
Kenneth Loafman (kenneth-loafman) wrote : | #19 |
I just pushed another release with, I hope, the final fix. It turns out
that this one was not findable with a debugger; in fact, it would pass
completely when run under a debugger. Aargh!
If you could please test this out in your environment, I could get a
release out soon.
Cristian Klein (cristiklein) wrote : | #20 |
Good work! I downloaded the latest bzr and the hang does not occur any
more. Thank you for your patience!
Changed in duplicity: | |
status: | Fix Committed → Fix Released |