hardlinking doesn't work with schedule per included folder enabled

Bug #412470 reported by Mark Baas
This bug affects 5 people
Affects: Back In Time
Status: Fix Released
Importance: High
Assigned to: Bart de Koning
Milestone: (none)

Bug Description

Schedule per included folder is BUGGY. Do NOT use it unless you want to do some debugging or have plenty of space (and computing power)!

The snapshots are too big: even without changing anything, a snapshot takes up the full size of the folder, so hardlinking is not working. For the tasks with higher frequency the hardlinking fails, consuming the full space of all the folders that are scheduled for backup (including the ones with lower frequencies). The time to make a snapshot is extremely long (more than 10 times the time to create a normal complete snapshot, which is logical since it recreates every single bit).
The lowest-frequency tasks and manual backups behave normally.


Revision history for this message
Mark Baas (mark-baas123) wrote :

Tried it on ext3, same result. I guess the command backintime --backup-job is not hardlinking while backintime -b is.
My workaround for now was to change the crontab in /var/spool/cron/...

Revision history for this message
Mark Baas (mark-baas123) wrote :

My apologies, it has nothing to do with the commands; it is just the cronjob. It is not making any hardlinks with backintime -b either.

I have no clue why the hardlinking won't work from a cron job.

Revision history for this message
Mark Baas (mark-baas123) wrote :

Okay, I have located the problem: I have "schedule per included folder" enabled. Unfortunately I cannot change the bug name, but I guess the problem is related to the diff that backintime runs.

So, new summary: hardlinking doesn't work from cronjob with schedule per included folder enabled

Revision history for this message
Borph (borph) wrote : Re: hardlinking doesn't work from cronjob with schedule per included folder enabled

I changed your summary.

I actually have a similar phenomenon. When the cronjob does the backup, it takes ages and makes full copies, even of unchanged files.
As I reported in my question under 'answers', I also have every file duplicated as a *~ file.

I have schedule per included folder enabled, too.

summary: - hardlinking not working in cronned snapshots
+ hardlinking doesn't work from cronjob with schedule per included folder
+ enabled
Revision history for this message
Bart de Koning (bratdaking) wrote :

Hey Mark and Borph,

I tried to reproduce your phenomenon, but I can't. I use schedules per included folder, but the size of my snapshot dir stays the same.
Mark: the size of each snapshot should equal the size of the originating folder, but the folder where the snapshot folders are located (backintime) should not equal the cumulative size of the separate snapshot folders; that would indicate that hardlinks are being used. Could you verify that?
Borph: your problem is, I think, a different one, because you apparently see *~ files. On what kind of partition do you save the snapshots (ext3, fat32, ...?), and is it located on a hard disk or a memory card? What distribution and which version of Back In Time do you use?

Cheers,
Bart

Changed in backintime:
status: New → Incomplete
Revision history for this message
Borph (borph) wrote :

Bart,

I use Kubuntu 9.04 with an ext4 root partition and a 1 TB external (USB) drive with ext4, encrypted with LUKS (if that matters).

But don't worry about reproducing this bug; as I mentioned under 'answers', I have to check my configuration but have had no time to experiment so far.

I should make sure that the scheduled folders are disjoint from each other; for example, this is not:
/home: weekly
/home/peter/documents: hourly

I will test again. Can I simulate a cronjob? backintime --backup-job did nothing when the last snapshot had been taken just recently.

Thanks,
Peter

Revision history for this message
Bart de Koning (bratdaking) wrote : Re: [Bug 412470] Re: hardlinking doesn't work from cronjob with schedule per included folder enabled

You cannot simulate a cronjob other than the way you already did, but you can
start a manual backup by executing backintime -b.
Scheduled folders do not necessarily have to be disjoint from each other: it
will always make a copy of the whole previous snapshot (so it will copy +
hardlink both), but it will only update the folder (and subfolders) of the one
that is scheduled; if you include a folder that is already included, it will
not process that folder twice. backintime -b will update both, by the way.
If it says that there is nothing to do, you can always create a little test
file or something; then it will notice that something has changed...

Cheers,
Bart


Revision history for this message
Borph (borph) wrote : Re: hardlinking doesn't work from cronjob with schedule per included folder enabled

I have now reproduced the issue with the *~ files!

I made a test folder with two files and a subfolder with two more files:
peter@borphtux:~$ ls /home/peter/BiTtest/
file1.txt file2.txt subfolder
peter@borphtux:~$ ls /home/peter/BiTtest/subfolder/
file3.txt file4.txt

The schedule was:
/home/peter/BiTtest: every 10min
/home/peter/BiTtest/subfolder: every 5min

The config file (part):
snapshots.exclude_patterns=*.backup*:/media:/lost+found:**/.thumbnails/**
snapshots.expert.per_directory_schedule=true
snapshots.include_folders=/home/peter/BiTtest|4:/home/peter/BiTtest/subfolder|2

I made a forced backup and got a "20090831-203106" folder with the expected files.

I changed file1 and file3 and waited. 20090831-204000 got created, including both folders as expected.

I changed both files again. 20090831-204500 got created with these files:
peter@borphtux:~$ ls -l /media/BigBilly/System/BackInTime/backintime/20090831-204500/backup/home/peter/BiTtest/
total 12
-r--r--r-- 2 peter peter 11 2009-08-31 20:35 file1.txt
-r--r--r-- 4 peter peter 10 2009-08-31 20:28 file2.txt
drwxr-xr-x 2 peter peter 4096 2009-08-31 20:36 subfolder
(this is as expected: unchanged, since this folder was not yet due)

peter@borphtux:~$ ls -l /media/BigBilly/System/BackInTime/backintime/20090831-204500/backup/home/peter/BiTtest/subfolder/
total 16
-r--r--r-- 2 peter peter 11 2009-08-31 20:36 file3.txt
-r--r--r-- 1 peter peter 12 2009-08-31 20:41 file3.txt~
-r--r--r-- 4 peter peter 10 2009-08-31 20:28 file4.txt
-r--r--r-- 1 peter peter 10 2009-08-31 20:28 file4.txt~

Hm, these *~ files got created! They disappear during the next 10-minute backup, OK, but this means they appear every time a subfolder is backed up while a folder above it is not!

Let's look at the log file:
INFO: Create hard-links
INFO: Command "cp -al "/media/BigBilly/System/BackInTime/backintime/20090831-204000/backup/home/peter/BiTtest/subfolder"* "/media/BigBilly/System/BackInTime/backintime/new_snapshot/backup/home/peter/BiTtest/subfolder"" returns 0
INFO: Call rsync to take the snapshot
INFO: Command "rsync -aEAX -v --delete-excluded --chmod=Fa-w,D+w --whole-file --delete --exclude="/media/BigBilly/System/BackInTime" --exclude="/root/.local/share/backintime" --include="/home/peter/BiTtest/subfolder/" --include="/home/peter/BiTtest/" --include="/home/peter/" --include="/home/" --exclude="*.backup*" --exclude="/media" --exclude="/lost+found" --exclude="**/.thumbnails/**" --exclude="/home/peter/BiTtest" --include="/home/peter/BiTtest/subfolder/**" --exclude="*" / "/media/BigBilly/System/BackInTime/backintime/new_snapshot/backup/"" returns 0
INFO: Save permissions
INFO: Command "cp -alb "/media/BigBilly/System/BackInTime/backintime/20090831-204000/backup/home/peter/BiTtest/"* "/media/BigBilly/System/BackInTime/backintime/new_snapshot/backup/home/peter/BiTtest"" returns 0

So I guess the second cp -alb copies the old version (the previous snapshot) over new_snapshot again. This explains the file dates!

Of course the cp -alb after the rsync is meant to hard-link-copy the folders that have not been synced. But if the folders are not disjoint, this command recursively includes already-synced files and attempts to overwrite them.
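A minimal sketch of this cp -alb effect in isolation, using hypothetical throwaway directories rather than the real snapshot paths:

# "old" stands in for the previous snapshot, "new" for new_snapshot after the rsync
mkdir -p old/subfolder new/subfolder
echo "old content" > old/subfolder/file3.txt
echo "freshly rsynced content" > new/subfolder/file3.txt

# the equivalent of the second copy step after the rsync:
cp -alb old/subfolder/* new/subfolder/

ls -l new/subfolder/
# file3.txt   -> now a hardlink to the old version (the rsynced file was overwritten)
# file3.txt~  -> backup (-b) of the freshly rsynced version, link count 1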

Last but not least:
pe...


Revision history for this message
Bart de Koning (bratdaking) wrote : Re: [Bug 412470] Re: hardlinking doesn't work from cronjob with schedule per included folder enabled

Some strange things I notice: your includes and excludes look a bit odd, like:
--exclude="/home/peter/BiTtest" --include="/home/peter/BiTtest/subfolder/**"
It should not mention the exclude... oh wait, it should: if a folder that is only
supposed to be backed up once a year or so is a subfolder of a folder that is
backed up every 10 minutes, it has to be excluded. And apparently the exclude does
not override the includes -> it produces the snapshot with the files anyway.
So that part makes sense.

What does not make sense, on the other hand, is the cp -alb call for a folder it
has already copied. Anyway, what is the whole reason for this command, if ignored
folders are copied anyway during the hardlinking?

At the moment I am testing a fairly complicated backup scheme to see whether this
behaviour is reproducible here. Probably some of the commands for the
schedule-per-folder backup do the wrong thing.
OK, it is now making a snapshot and indeed calls the cp -alb command, and it
consumes loads of time. It even runs over the 10 minutes (the rsync took no more
than 2 minutes), so it delays my next (hourly) backup. However, it does do the
right thing if the directories are disjoint. I will try to make an exclude that
overlaps...
The backup dir has grown enormously in size, although I do not see the ~
behaviour yet. What I do see is time consumption and space consumption. The
hardlinking definitely fails when it does the small job (I have two schedules:
one every 10 minutes, one every hour). The hourly job runs perfectly fine and as
normal, but the 10-minute job does not behave as it should (actually, right at
this point it has consumed all the space there was, filling up my logs with
warnings, while there was plenty of space two hours ago when I started this test;
it also no longer renames or removes the new_snapshot folder -> another bug...)
So far my main conclusion is that you should not use the schedule per included
folder option, as it is still buggy!

Cheers,
Bart


Changed in backintime:
status: Incomplete → Confirmed
summary: - hardlinking doesn't work from cronjob with schedule per included folder
- enabled
+ hardlinking doesn't work with schedule per included folder enabled
Revision history for this message
Bart de Koning (bratdaking) wrote :

By the way, I changed the name and the description a bit.


description: updated
Revision history for this message
Borph (borph) wrote :

Hi,

> Some strange things I notice: your includes and excludes are a bit weird
> like:
> --exclude="/home/peter/BiTtest" --include="/home/peter/BiTtest/subfolder/**"
> Should not mention the exclude... oh wait it should mention it if the folder
> that should only be backupped once every year or something is a subfolder of
> a folder that should be backupped every 10 minutes, it should exclude it.

I think you confused it, didn't you? The subfolder has a higher
frequency than the parent folder in my example. If they are disjoint,
it of course makes sense to exclude the one with the lower frequency,
so it doesn't get rsync'ed. But here they are not disjoint.

> What does not on the other hand is the cp -alb function for a folder it
> already copied, anyway, actually what is the whole reason for this command,
> ignored folders are copied anyway during the hardlinking?

I think it tries to hardlink in two steps: one before the rsync and
one after. Before, the folders that are going to be rsync'ed are
hardlinked; after, the other folders are, the ones that should not be
included in the rsync (because of their low frequency).

> The backup dir has grown incredibly in size, however I do not see the ~
> behaviour yet. What I do see is time consumption, and size consumption.

This fits. I think the *~ files are a symptom on my machine. Your cp
-alb command behaves differently. Mine detects that file3.txt already
exists, makes a backup copy to file3.txt~ (link count 1), and then
hard-links file3.txt from the source again.

>> -r--r--r-- 2 peter peter 11 2009-08-31 20:36 file3.txt <== linked from 20090831-204500, old! wrong!
>> -r--r--r-- 1 peter peter 12 2009-08-31 20:41 file3.txt~ <== that was rsync'ed, but cp made a backup after

In fact, now at work I tried to reproduce the behaviour of cp -alb and
I couldn't! It's a Fedora box, and it makes the ~ files if the
destination exists and is unrelated, but if source and destination
are already links to the same file, it does nothing. I think at home
this was different.

Anyway, since for a changed and rsync'ed file the source and destination
are not the same file, the intended behaviour of cp -alb is wrong anyway
(on my box it is even wrong for every file).

==> schedules per include-folder are broken indeed.

Revision history for this message
Bart de Koning (bratdaking) wrote :

> I think you confused it, didn't you? The subfolder has the higher
> frequency than the parent folder in my example. If they are disjoint,
> it makes sense to exclude the one with the lower frequency of course,
> so it doesn't get rsync'ed. But here they are not disjoint.
It is not necessary to exclude something if you don't include it either, except when a subfolder has a lower frequency than the parent folder (otherwise you would rsync the subfolder with the higher frequency as well). That is what I meant...

> I think it tries to hardlink in two steps: one before the rsync and
> one after. Before, the folders are hardlinked, which are going to be
> rsync'ed, after the other folders are, which should not be included in
> the rsync (because of their low frequency).
And that is exactly where it goes wrong. The copying afterwards is (a) not hardlinked, and (b) overwrites the complete snapshot if it happens to be a parent folder. If you hardlink the complete older snapshot, that copying afterwards is not necessary, except that rsync gets the --delete-excluded option. Why delete something that you copied in the first place and then copy it again afterwards? I just don't get that part...
So I removed and changed that part, and the behaviour is far better now, only not ideal for FAT disks. I will do some testing in the meantime and report the changes if they are successful...

> This fits. I think the *~ files are a symptom on my machine. Your cp
> -alb command behaves different. Mine detects that file3.txt already
> exists and makes a copy to file3.txt~ (link-count 1), then it
> hard-links file3.txt from source again.

>> -r--r--r-- 2 peter peter 11 2009-08-31 20:36 file3.txt <== linked from 20090831-204500, old! wrong!
>> -r--r--r-- 1 peter peter 12 2009-08-31 20:41 file3.txt~ <== that was rsync'ed, but cp made a backup after

OK, interesting: where does that difference in behaviour come from (mine is an Ubuntu box, but we most probably use the same cp)?
Anyway, I think we should get rid of this deleting first and copying afterwards anyway...

Revision history for this message
Bart de Koning (bratdaking) wrote :

I think I am close to a temporary solution that does the trick, however it costs considerable space on FAT32 disks.

Changed in backintime:
assignee: nobody → Bart de Koning (bratdaking)
Revision history for this message
Borph (borph) wrote : Re: [Bug 412470] Re: hardlinking doesn't work with schedule per included folder enabled

> Not necessary to exclude something if you don't include it either, except when a subfolder has a lower frequency than the parent folder (otherwise you rsync the subfolder also with the higher freq). That is what I meant...

Ah OK, yes, I agree; that would make sense. So what is the current
behaviour given two folders with different frequencies?

> And that is exactly where it gets wrong. The copying afterwards is A not hardlinked, and B writes over the complete snapshot if it happens to be a parent folder. If you hardlink the complete older snapshot that copying afterwards is not necessary, except that rsync gets the --delete-excluded option. Why deleting something that you have copied in the first place and then afterwards copy it again. I just don't get that part...

As far as I can see from the logs, the first "cp" copies only the
'included' folders, whereas the second copies the rest, the folders
with the lower frequency. I think the reason is the --delete-excluded
option, but this approach doesn't work with overlapping folders like
the ones I had.

So the best solution would be, as you said, to copy _all_ files as
hardlinks, then do the rsync but without --delete-excluded. This way,
only the files we want are synced, but every file is present.

The question is: do we want to link _all_ files in each snapshot? This
is bad on FAT32, but it also costs performance on ext4. Maybe I will
write more thoughts in the 'answers' section.

> Ok interesting, where comes that difference in behaviour from (mine is a Ubuntu box, but we use the same cp most probably)?
> Anyway I think we should get rid of this deleting first and then copying afterwards anyway....

I have Ubuntu at home and Fedora at work. But (being a developer myself)
I cannot 100% confirm this behaviour at the moment, as long as I haven't
done deeper tests at home... so don't worry.

Peter

Revision history for this message
Bart de Koning (bratdaking) wrote :

> Ah ok, yes I agree. This would make sense. So what is the current
behaviour given two folders with different frequency?

1) It copies only the included folders (not the ignored ones, i.e. the folders scheduled for later) using cp -alb, which has unexpected behaviour.
2) It syncs the included folders, excluding the excluded folders and the ignored folders, but also deletes them in the snapshot (although they were not copied in the first place...).
3) It copies the ignored folders afterwards using cp -alb (see the sketch below).
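Roughly, that sequence corresponds to the commands in the log above (a sketch with placeholder paths and an abbreviated include/exclude list, not the actual code):

PREV=/path/to/backintime/20090831-204000/backup   # previous snapshot (placeholder path)
NEW=/path/to/backintime/new_snapshot/backup       # snapshot being built (placeholder path)

# 1) hardlink-copy only the folders scheduled in this run
cp -alb "$PREV/home/peter/BiTtest/subfolder"* "$NEW/home/peter/BiTtest/subfolder"
# 2) rsync the scheduled folders; --delete-excluded also removes the ignored
#    (lower-frequency) folders from the new snapshot
rsync -aEAX --delete --delete-excluded \
    --include="/home/peter/BiTtest/subfolder/" --include="/home/peter/BiTtest/subfolder/**" \
    --include="/home/peter/BiTtest/" --include="/home/peter/" --include="/home/" \
    --exclude="/home/peter/BiTtest" --exclude="*" / "$NEW/"
# 3) hardlink-copy the ignored folders afterwards; when this source is a parent
#    of an already-synced folder, it overwrites the freshly rsynced files
cp -alb "$PREV/home/peter/BiTtest/"* "$NEW/home/peter/BiTtest"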

> So best would be - as you said - to copy hardlinked _all_ files, then
do the rsync but without --delete-excluded. This way, only files we
want are synced, but every file is existing.

That is how I solved it
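A sketch of that order of operations, reusing the placeholder PREV/NEW paths from the sketch above (an illustration, not the committed code):

# hardlink the complete previous snapshot first...
cp -al "$PREV/." "$NEW/"
# ...then rsync only the folders due in this run, without --delete-excluded,
# so the untouched (hardlinked) folders are simply left in place
rsync -aEAX --delete \
    --include="/home/" --include="/home/peter/" --include="/home/peter/BiTtest/" \
    --include="/home/peter/BiTtest/subfolder/" --include="/home/peter/BiTtest/subfolder/**" \
    --exclude="*" / "$NEW/"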

> The question is: do we want to link _all_ files in each snapshot. This
is bad on fat32, but costs also performance on ext4. Maybe I write
more thoughts in the 'answer' section.

Indeed that is also my main remark, however now it consumes overall already less space and time because of the fix.
ext4 preformance drop is I think minimal as hardlinking the whole is faster than hardlinking every single included folder if there are many...
Probably we need some more sophisticated cp -al behaviour (copying them from different sources???)

Cheers,
Bart

Changed in backintime:
status: Confirmed → Fix Committed
Revision history for this message
Borph (borph) wrote :

I think we have to be clear about what BackInTime should be able to do and what kind of program it is.

It's nice that BiT can be run by any user, but as root you have more possibilities: you could use anacron or back up the whole system. Maybe this could be detected and BiT could behave differently?

Should BiT support FAT32 and be (reasonably) efficient on it? That would mean having only the folders that were actually rsynced in each snapshot directory. But in the GUI it's nice to see each snapshot as a full folder set, even if only one of the folders was rsynced. A good example is: back up (almost) the whole system monthly, but a specific folder like /home/peter/documents every hour.

On FAT32 this takes a lot of space that is wasted completely senselessly. On ext3/ext4, hardlinking solves this problem, but it is still an overhead when there are a lot of files.

If you say goodbye to the a-snapshot-has-all-files approach, an hourly snapshot would contain only /home/peter/documents, while the monthly one would contain all files, as would "latest_snapshot" (like I wrote under answers). But of course it can become much less convenient to look for a specific file, or extra logic is needed to support the user. Actually, this extra logic could be a soft link! I will explain my idea:

More than one folder is configured, and they have different frequencies. Let's say "/" monthly and "/home/peter/documents" hourly.

In the backintime directory there is a folder "latest_snapshot" which always holds the latest state of the files that were rsync'ed. Next to it are the snapshots like 20090831-204000 and so on. A cronjob would work like this:

If it's the monthly run, rsync both configured folders, as root, into "latest_snapshot". Then copy it (cp -al) to the snapshot with the date in the name.

If it's the hourly run, rsync only "/home/peter/documents", again into "latest_snapshot". Then copy (cp -al) only _this_ folder into the named snapshot.

It knows which snapshot had the next-lower frequency (here, the monthly one) and creates soft links for the ignored folders in the current snapshot. Soft links have a big disadvantage: we have to be careful when removing snapshots! If we don't create the links, the new snapshot contains only 'documents', which is OK for me and saves space on FAT32.

About the problem of non-disjoint folders: if the subfolder has a lower frequency, it can be excluded with "--exclude" and won't be rsynced. Don't create the soft links in that case. If the subfolder has a higher frequency, this shouldn't be a problem, as only the subfolder gets rsynced and copied into the new snapshot.
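A rough sketch of that cronjob logic (hypothetical paths, variable names and schedule check, just to illustrate the idea; the real include/exclude handling is omitted):

BIT=/media/BigBilly/System/BackInTime/backintime   # snapshot root as in the logs above
LATEST="$BIT/latest_snapshot"
SNAP="$BIT/$(date +%Y%m%d-%H%M%S)"                 # snapshot-id style folder name

if [ "$RUN" = monthly ]; then                      # $RUN: hypothetical marker for which schedule fired
    # monthly run: sync everything (with the usual include/exclude list) into
    # latest_snapshot, then hardlink-copy all of it into the dated snapshot
    mkdir -p "$LATEST/backup" "$SNAP"
    rsync -a --delete / "$LATEST/backup/"
    cp -al "$LATEST/." "$SNAP/"
else
    # hourly run: sync only the high-frequency folder into latest_snapshot,
    # then hardlink-copy only that folder into the dated snapshot
    mkdir -p "$LATEST/backup/home/peter/documents" "$SNAP/backup/home/peter"
    rsync -a --delete /home/peter/documents/ "$LATEST/backup/home/peter/documents/"
    cp -al "$LATEST/backup/home/peter/documents" "$SNAP/backup/home/peter/"
fi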

Revision history for this message
Bart de Koning (bratdaking) wrote :

Sounds like a solution.
The latest_snapshot will always contain all the information of the last
snapshot taken, and we store only the part that is included, via hardlinks,
in the folder with the snapshot-id.

Advantages:

   1. we use less space than by copying everything each time
   2. we only need one rsync command (the --dry-run is unnecessary)
   3. we copy (hardlink) only the included folders

Disadvantages:

   1. currently every file is visible in every snapshot; it might not have
   been updated in that round, but it is visible, which makes scrolling through
   the snapshot folders to find the particular version you want easy. If we
   use your proposed trick, a lot of the snapshot folders will show "file not
   present in this snapshot" (for less frequently scheduled files)
   2. we use one complete snapshot more than before (the latest_snapshot)

We could avoid the last one, though, by first making the copy (hardlinking),
giving it the name of the previous run, and then doing the rsync for the next
round (so you create the actual snapshot folder during the next scheduled
round). This gets pretty complicated, so I actually vote against that option
and for using the extra space for an extra snapshot...


Revision history for this message
Borph (borph) wrote :

Hi,

> Sounds like a solution.
> The latest_snapshot will always contain all the information of the last
> taken snapshot and we store only the part that is included via hardlinks in
> the folder with the snapshot-id.

This is exactly what I meant.

> Advantages:
>   1. we use less space than by copying everything each time
No, space should be the same.

>   2. we need to have only one rsync command (the --dry-run is unnecessary)
Yes.

>   3. we copy (hardlink) only the included folders
Yes, and fewer hardlinks mean less filesystem metadata and better performance.

> Disadvantage:
>   1. now we have every file visible in every snapshot, it might not have
>   been updated in that round but it is visible, making scrolling through the
>   snapshot folders to search for that particular version you want easy. if we
>   use your proposed trick a lot of the snapshot folders will show: file not
>   present in this snapshot (for less frequent files)

Yes, as I said: a big disadvantage. What I proposed to solve this,
soft links, is really not suitable, because you can delete a snapshot
and end up with orphaned links. So maybe we do indeed have to hardlink
all the files.

>   2. we use one complete snapshot more than before (the latest_snapshot)

Yes and no. There is one more directory: latest_snapshot, but there is
no loss of space.

The very first start after configuring BiT would create the
latest_snapshot by copying all the files, sure. Then it creates its
first snapshot-id folder by hardlinking. So after the first start, two
snapshots exist, but only the space of one snapshot is used.

Of course, rsync has to unlink a destination file that has changed
instead of modifying its content in place; otherwise it would be a mess,
as there are two links to the same file! If you think about it, this is
actually the current behaviour; BackInTime would not work correctly
otherwise. I mean the current code!
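A tiny demonstration of that behaviour with throwaway directories (default rsync, i.e. without --inplace, which writes a new file and renames it over the destination):

mkdir -p src snapA snapB
echo "version 1" > snapA/f
cp -al snapA/f snapB/f        # the newer snapshot starts as a hardlink (link count 2)
echo "version 2" > src/f
rsync -a src/ snapB/          # rsync replaces snapB/f with a new inode...
cat snapA/f                   # ...so the older snapshot still reads "version 1"
ls -l snapA/f snapB/f         # both files are back to link count 1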

> We could avoid the last though, by first making the copy (hardlinking) and
> give it the name of last time, and then do the rsync for the next round. (so
> you make the actual snapshot folder during the next scheduled round), this
> gets pretty complicated so I vote actually against that option and use the
> extra space for an extra snapshot...

This I don't get. It sounds complicated, and which extra space? Are we
talking about hardlinks or FAT32? On FAT32 each snapshot occupies space
according to the files in it, but the algorithm should work correctly on
both filesystems.

Peter

Revision history for this message
kanub (gwd0fqy02) wrote :

Oops... why do I have public access anyway? :)

Changed in backintime:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Revision history for this message
Bart de Koning (bratdaking) wrote :

Because we can undo it easily :)

No, because if you like, you can also propose a fix and release your fix yourself. That is why...
Although I try to reserve the Fix Released status for fixes that are included in the official releases...

But thanks for undoing your status change.

Changed in backintime:
importance: Undecided → High
description: updated
Dan (danleweb)
Changed in backintime:
status: Fix Committed → Fix Released