Merge lp:~weyrick/bzr/54624-warn-on-large-files into lp:bzr

Proposed by Shannon Weyrick
Status: Merged
Approved by: Jonathan Riddell
Approved revision: no longer in the source branch.
Merged at revision: 6086
Proposed branch: lp:~weyrick/bzr/54624-warn-on-large-files
Merge into: lp:bzr
Diff against target: 336 lines (+169/-10)
9 files modified
bzrlib/add.py (+44/-0)
bzrlib/builtins.py (+5/-1)
bzrlib/config.py (+40/-0)
bzrlib/help_topics/en/configuration.txt (+8/-0)
bzrlib/mutabletree.py (+13/-6)
bzrlib/osutils.py (+6/-3)
bzrlib/tests/blackbox/test_add.py (+24/-0)
bzrlib/tests/test_config.py (+20/-0)
doc/en/release-notes/bzr-2.5.txt (+9/-0)
To merge this branch: bzr merge lp:~weyrick/bzr/54624-warn-on-large-files
Reviewer Review Type Date Requested Status
martin suchanek (community) Needs Fixing
Jonathan Riddell (community) Needs Fixing
Jelmer Vernooij (community) Needs Information
Review via email: mp+70691@code.launchpad.net

Commit message

bzr add now skips large files in recursive mode. The default "large"
size is 20MB, and is configurable via the add.maximum_file_size
option. A value of 0 disables skipping. Named items passed to add are
never skipped. (Shannon Weyrick, #54624)

Description of the change

This branch proposes to add the feature requested in bug #54624: to skip (with warning) on large file addition, with configurable "large" threshold.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

1MB might be a little small :)

Revision history for this message
Shannon Weyrick (weyrick) wrote :

I thought about this, and in the end I did 1 MB simply because that's
what the original bug mentioned. I wonder, what's the best way to get
a feel about a reasonable default from the current user base? Mailing
list?

On Sun, Aug 7, 2011 at 8:37 PM, Robert Collins
<email address hidden> wrote:
> 1MB might be a little small :)
>
> --
> https://code.launchpad.net/~weyrick/bzr/54624-warn-on-large-files/+merge/70691
> You are the owner of lp:~weyrick/bzr/54624-warn-on-large-files.
>

Revision history for this message
Shannon Weyrick (weyrick) wrote :

A nice way to figure out the default would be to get a histogram of file sizes in the bzr repos hosted on launchpad. Then we could pick something towards the top end of the decline in the curve. I don't suppose that info is readily available....

Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, Aug 8, 2011 at 1:37 PM, Shannon Weyrick <email address hidden> wrote:
> A nice way to figure out the default would be to get a histogram of file sizes in the bzr repos hosted on launchpad. Then we could pick something towards the top end of the decline in the curve. I don't suppose that info is readily available....

Not preprocessed, but the vast majority of branches are public, so you
should be able to use bzrlib's API to read the inventories of those
branches - just the tip revision might be sufficient.

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

Thanks for this Shannon, I'm sure it will save some users some grief.

I think the default 'large' should probably be set at the kind of size
where there's a fair chance people would regret adding it, because of
the impact on disk/memory usage/transfer time. 1MB shouldn't be a
problem for any of them and I think it's fairly likely people will
have assets that big (if not actual source files.) Computers are
bigger than they were when Vesta was new...

I think say 50MB would be nearer the mark.

I don't think it's especially worth scanning Launchpad branches unless
you really want to. You might just as well look in your own home
directory or perhaps ask on the list.

As far as implementation:

+ _DEFAULT_LARGE_FILE_THRESHOLD = 1<<20; # 1 MB

[nit] we don't normally have semicolons at the end of statements

[optional] A lot of _large_file_threshhold really could be generic
configuration code for "a non-negative integer with a default". You
could at least hoist this out; eventually it might be good if the
default was declared along with the option name.

[fix] Our convention now is that option names are grouped into the
area they relate to; I think this is only for add so it should be
probably add.large_file_threshhold; perhaps we can also make it more
obvious as something like add.maximum_file_size. (What do you think?)

+ os.path.getsize(abspath) > large_threshold):

[optional] This will do another system call to get the size and we
probably already have it and can use the cached value. But it will be
a cheap call, since the inode should already in cache.

+ "large_file_threshold of %i bytes)",

[tweak] For some reason we always say %d not %i.

[fix] This ought to be described in help_topics/en/configuration.txt

[fix] This probably also needs to be in the user documentation; I'm
not sure off hand where would be most appropriate.

Martin

Revision history for this message
Shannon Weyrick (weyrick) wrote :

> Thanks for this Shannon, I'm sure it will save some users some grief.
>
> I think the default 'large' should probably be set at the kind of size
> where there's a fair chance people would regret adding it, because of
> the impact on disk/memory usage/transfer time. 1MB shouldn't be a
> problem for any of them and I think it's fairly likely people will
> have assets that big (if not actual source files.) Computers are
> bigger than they were when Vesta was new...
>
> I think say 50MB would be nearer the mark.
>
> I don't think it's especially worth scanning Launchpad branches unless
> you really want to. You might just as well look in your own home
> directory or perhaps ask on the list.
>

I agree 1 MB is almost definitely too low, otoh 50 sounds high to me. I was thinking something on the order of 10-20MB - that almost definitely covers all source files and most normal resources from web projects as well. But, I'm not stuck on that number.

What would be real handy here is the first time this limit was reached, it could prompt the user if they wanted to raise/set it. Is there infrastructure in place in the ui for that kind of thing?

> As far as implementation:
>
> + _DEFAULT_LARGE_FILE_THRESHOLD = 1<<20; # 1 MB
>
> [nit] we don't normally have semicolons at the end of statements
>

np. obviously my c-like lang background showing here :)

> [optional] A lot of _large_file_threshhold really could be generic
> configuration code for "a non-negative integer with a default". You
> could at least hoist this out; eventually it might be good if the
> default was declared along with the option name.
>

Actually, I think what we really want is to not force the user to enter an integer for this, but rather "20M" or "1G". I could whip something up for this and stick it in Config.

Is there a central place defaults/option names are defined? I didn't see one.

> [fix] Our convention now is that option names are grouped into the
> area they relate to; I think this is only for add so it should be
> probably add.large_file_threshhold; perhaps we can also make it more
> obvious as something like add.maximum_file_size. (What do you think?)
>
> + os.path.getsize(abspath) > large_threshold):
>

Yes, maximum_file_size sounds better.

> [optional] This will do another system call to get the size and we
> probably already have it and can use the cached value. But it will be
> a cheap call, since the inode should already in cache.
>

I thought of this as well, and thought it should be cached on some level as the size is/will be used in the add. But I'll investigate using the bzr level cache - which I think I saw in the tree? I'll check.

> + "large_file_threshold of %i bytes)",
>
> [tweak] For some reason we always say %d not %i.
>
> [fix] This ought to be described in help_topics/en/configuration.txt
>
> [fix] This probably also needs to be in the user documentation; I'm
> not sure off hand where would be most appropriate.
>

No prob on these items.

Thanks for the help! I'll work on these changes as I have time over the next few days.

Shannon

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Should this be this close to the core? I wonder if it makes more sense as a UI-level thing that's just part of "bzr add".

With the current approach, we'd have to introduce workarounds for things like the UDD importer, but I presume there will also be a fair number of packages for which "bzr merge-upstream" will now fail.

It also means all custom tree implementations will have to explicitly add support.

Revision history for this message
Jelmer Vernooij (jelmer) :
review: Needs Information
Revision history for this message
Martin Pool (mbp) wrote :

@jelmer Of course you're right, i'm glad you caught this. Perhaps the interface by which add reports back to the ui (I think there is a result or something) would be better.

Revision history for this message
Martin Pool (mbp) wrote :

On 9 August 2011 01:01, Shannon Weyrick <email address hidden> wrote:
>> Thanks for this Shannon, I'm sure it will save some users some grief.
>>
>> I think the default 'large' should probably be set at the kind of size
>> where there's a fair chance people would regret adding it, because of
>> the impact on disk/memory usage/transfer time.  1MB shouldn't be a
>> problem for any of them and I think it's fairly likely people will
>> have assets that big (if not actual source files.)  Computers are
>> bigger than they were when Vesta was new...
>>
>> I think say 50MB would be nearer the mark.
>>
>> I don't think it's especially worth scanning Launchpad branches unless
>> you really want to.  You might just as well look in your own home
>> directory or perhaps ask on the list.
>>
>
> I agree 1 MB is almost definitely too low, otoh 50 sounds high to me. I was thinking something on the order of 10-20MB - that almost definitely covers all source files and most normal resources from web projects as well. But, I'm not stuck on that number.

10-20MB is also fine with me.

> What would be real handy here is the first time this limit was reached, it could prompt the user if they wanted to raise/set it. Is there infrastructure in place in the ui for that kind of thing?

There is some infrastructure towards that - actually this reminds me
of something I should have said before, which is that rather than
trace.warning you should use ui_factory.warning - that takes a
structure identifier of the type of warning, with the idea that
eventually we can give the user an option to not show the warning any
more, or perhaps to change it.

> Actually, I think what we really want is to not force the user to enter an integer for this, but rather "20M" or "1G". I could whip something up for this and stick it in Config.

That would be great.

> Is there a central place defaults/option names are defined? I didn't see one.

There is option_registry in config.py.

Revision history for this message
Robert Collins (lifeless) wrote :

On performance: we've found add to be pretty sensitive to syscall
volumes these days; I would personally want a test on a 50K file tree
before discounting the impact of additional syscalls.

Revision history for this message
Shannon Weyrick (weyrick) wrote :

Ok, thanks for the feedback everyone. To summarize then, here are my goals for completing this:

1) Ensure that skipping a large file doesn't happen directly in MutableTree.smart_add, but rather find a way to have the skip take place in the ui. This part may need more discussion - clearly we don't want to scan files in the ui _and_ in smart_add, but the latter is where all the scanning takes place now. AFAICT, by the time control returns to the ui, the adds have taken place. Only a count of files added is returned right now. Even if it returned the full add list, it would have to make calls to unadd/revert them at that point, I believe. Doesn't seem ideal, but let me know if that's desirable. Otherwise, I'm thinking either an optional callback of some sort (maybe similar to the existing AddAction - i.e. a new SkipAction?), or possibly making some code from _SmartAddHelper more central. Other ideas from those more knowledgeable with the codebase obviously welcome here.

2) Ensure that the stat for filesize is cached. Benchmark on large import.

3) Add a simple parser for specifying values in "human readable" sizes (i.e. "500M") to Config

4) Default limit to 20M

5) Possibly prompt user if the limit is hit and they haven't overridden the default

6) Use config name "add.maximum_file_size"

7) Document in configuration.txt and user docs

Revision history for this message
Shannon Weyrick (weyrick) wrote :

All the goals have been met in the current patch, except 5 and 2. 5 I'm not doing for now. 2 is a larger problem - some informal tests importing large trees show about 23% slowdown with the extra stat. Unfortunately I don't see a simple path towards sharing stat calls - they seem to be spread throughout the code now with no central way to cache. I'm looking into this further now.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/12/2011 4:55 PM, Shannon Weyrick wrote:
> All the goals have been met in the current patch, except 5 and 2. 5
> I'm not doing for now. 2 is a larger problem - some informal tests
> importing large trees show about 23% slowdown with the extra stat.
> Unfortunately I don't see a simple path towards sharing stat calls -
> they seem to be spread throughout the code now with no central way to
> cache. I'm looking into this further now.
>

Is it possible to put it in as part of get_file_with_stat or something
along those lines? That uses 'fstat' to make sure to stat the object
that is already in memory.

I do realize that add does things slightly differently, however we
already have the stat object around there, because we need to know if
the object is a file or a directory.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5FQbAACgkQJdeBCYSNAAOUoQCfXHL1daxOCO0opAic7nLjiMu8
GwQAn0xfkHedGsN3/DXM8/0w8kK5oxfa
=S28e
-----END PGP SIGNATURE-----

Revision history for this message
Shannon Weyrick (weyrick) wrote :

>
> Is it possible to put it in as part of get_file_with_stat or something
> along those lines? That uses 'fstat' to make sure to stat the object
> that is already in memory.
>
> I do realize that add does things slightly differently, however we
> already have the stat object around there, because we need to know if
> the object is a file or a directory.
>

Yes currently there's a stat to determine the file kind, which is using osutils.file_kind. Actually, it only does this if the inventory entry didn't have the kind, but I haven't hit a case where that's true in my tests. But I don't see a trivial way to have the file size check share the stat with the one from file_kind ... So it seems add would have to use something other than osutils.file_kind. If get_file_with_stat will work for both, that may do it. I'll take a look, thanks.

Shannon

Revision history for this message
Shannon Weyrick (weyrick) wrote :

get_file_with_stat did not seem applicable, because it required a file_id which was not yet available since the file isn't added yet. Instead, I added a file_stat function (which may potentially cache results, but doesn't currently), and in turn made file_kind use this. _SmartAddHelper.add now uses file_stat which can pass the results on to the new AddAction. Therefore, there is a single stat call and my tests on a large tree (linux kernel, ~39k files) show these changes to be only about .01 - .03 seconds slower than bzr.dev on my system.

So, I believe all goals have been met, please review.

Shannon

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Hi Shannon,

Thanks for you patience with this. I think this is broadly ok, though I have a few more minor nitpicks:

We don't use camel casing for method names, so skipFile should be skip_file.

I'm not sure if it's part of the syntax guide, but we generally don't add parentheses where they aren't necessary, e.g. in if statements.

Please don't use relative imports - i.e. "from bzrlib import ui" rather than "import ui".

except without arguments can easily hide bugs (it also catches e.g. NameError) - or "eat" things like KeyboardInterrupts, making it impossible to use Ctrl+C. Please catch specific exceptions instead.

Revision history for this message
Shannon Weyrick (weyrick) wrote :

Jelmer, thanks I've made those changes. Let me know if there's something else.
Shannon

Revision history for this message
Jonathan Riddell (jr) wrote :

Unfortunately this doesn't work if you specify the file name on the bzr add command, in this case the code path seems to go through mutabletree.py _SmartAddHelper._add_one_and_parent().

Also I think there should be a command line way to force adding large files (although that don't need to be part of this patch).

review: Needs Fixing
Revision history for this message
Shannon Weyrick (weyrick) wrote :

Hi Jonathan,

I think your second comment is solved by the first. By that I mean, I
believe if you specify the file on the command line, it should never
be skipped because it's large because that's never what you'd want
(actually maybe a tab completion mistake, but that's not likely
enough). Therefore, an effective way to force adding a large file is
simply to specify it on the command line.

So if you had said "bzr add ." and it told you it skipped a few files,
you could follow it up with "bzr add foo bar" to get the large ones
you really wanted. Seems like a decent work flow to me, and solves the
problem of accidentally adding large files during a recursive import,
while still allowing flexibility with both the config option and the
explicit adds.

Shannon

On Wed, Aug 17, 2011 at 7:41 AM, Jonathan Riddell <email address hidden> wrote:
> Review: Needs Fixing
> Unfortunately this doesn't work if you specify the file name on the bzr add command, in this case the code path seems to go through mutabletree.py  _SmartAddHelper._add_one_and_parent().
>
> Also I think there should be a command line way to force adding large files (although that don't need to be part of this patch).
>
> --
> https://code.launchpad.net/~weyrick/bzr/54624-warn-on-large-files/+merge/70691
> You are the owner of lp:~weyrick/bzr/54624-warn-on-large-files.
>

Revision history for this message
Jonathan Riddell (jr) wrote :

sent to pqm by email

Revision history for this message
Jonathan Riddell (jr) wrote :

sent to pqm by email

Revision history for this message
kesten broughton (dathomir) wrote :

Hi all,
I got bit by this bug, so thanks a lot for fixing it. A couple of thoughts from a newbie:
On file size, the problem might not be one large file - 50 Mb, but 100 1Mb files, say in a photo album.

As i thought about it, i wonder if the large multi-media files which might be assets for a code project belong in a code revision repo. You don't really change photos or movies the way you would code and none of the merging, etc works for video. Perhaps a tutorial suggesting different ways to back up non-code components of a project would be useful.

On the other hand, documentation and video tutorials might belong in a code dev repository. I guess I'm wondering if anyone has thought this through and come up with an elegant solution.

thanks again all,

kesten

Revision history for this message
martin suchanek (martin-suc) wrote :

Hi,

I am newbie. Could please anybody tell me how/where to setup add.maximum_file_size
because I have too many errors like following:
bzr: warning: skipping /etc/apache2/core.17269 (larger than add.maximum_file_size of 20000000 bytes)
files are bigger than 65 Mbytes.

thank you,
kind regards,
M.

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

'bzr add -Oadd.maximum_file_size=66MB' should do it.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/add.py'
2--- bzrlib/add.py 2011-06-14 02:21:41 +0000
3+++ bzrlib/add.py 2011-08-19 22:28:34 +0000
4@@ -17,9 +17,11 @@
5 """Helper functions for adding files to working trees."""
6
7 import sys
8+import os
9
10 from bzrlib import (
11 osutils,
12+ ui,
13 )
14
15
16@@ -53,6 +55,48 @@
17 self._to_file.write('adding %s\n' % _quote(path))
18 return None
19
20+ def skip_file(self, tree, path, kind, stat_value = None):
21+ """Test whether the given file should be skipped or not.
22+
23+ The default action never skips. Note this is only called during
24+ recursive adds
25+
26+ :param tree: The tree we are working in
27+ :param path: The path being added
28+ :param kind: The kind of object being added.
29+ :param stat: Stat result for this file, if available already
30+ :return bool. True if the file should be skipped (not added)
31+ """
32+ return False
33+
34+
35+class AddWithSkipLargeAction(AddAction):
36+ """A class that can decide to skip a file if it's considered too large"""
37+
38+ # default 20 MB
39+ _DEFAULT_MAX_FILE_SIZE = 20000000
40+ _optionName = 'add.maximum_file_size'
41+ _maxSize = None
42+
43+ def skip_file(self, tree, path, kind, stat_value = None):
44+ if kind != 'file':
45+ return False
46+ if self._maxSize is None:
47+ config = tree.branch.get_config()
48+ self._maxSize = config.get_user_option_as_int_from_SI(
49+ self._optionName,
50+ self._DEFAULT_MAX_FILE_SIZE)
51+ if stat_value is None:
52+ file_size = os.path.getsize(path);
53+ else:
54+ file_size = stat_value.st_size;
55+ if self._maxSize > 0 and file_size > self._maxSize:
56+ ui.ui_factory.show_warning(
57+ "skipping %s (larger than %s of %d bytes)" %
58+ (path, self._optionName, self._maxSize))
59+ return True
60+ return False
61+
62
63 class AddFromBaseAction(AddAction):
64 """This class will try to extract file ids from another tree."""
65
66=== modified file 'bzrlib/builtins.py'
67--- bzrlib/builtins.py 2011-08-18 04:23:06 +0000
68+++ bzrlib/builtins.py 2011-08-19 22:28:34 +0000
69@@ -674,6 +674,10 @@
70
71 Any files matching patterns in the ignore list will not be added
72 unless they are explicitly mentioned.
73+
74+ In recursive mode, files larger than the configuration option
75+ add.maximum_file_size will be skipped. Named items are never skipped due
76+ to file size.
77 """
78 takes_args = ['file*']
79 takes_options = [
80@@ -706,7 +710,7 @@
81 action = bzrlib.add.AddFromBaseAction(base_tree, base_path,
82 to_file=self.outf, should_print=(not is_quiet()))
83 else:
84- action = bzrlib.add.AddAction(to_file=self.outf,
85+ action = bzrlib.add.AddWithSkipLargeAction(to_file=self.outf,
86 should_print=(not is_quiet()))
87
88 if base_tree:
89
90=== modified file 'bzrlib/config.py'
91--- bzrlib/config.py 2011-08-16 15:12:39 +0000
92+++ bzrlib/config.py 2011-08-19 22:28:34 +0000
93@@ -75,6 +75,7 @@
94 import os
95 import string
96 import sys
97+import re
98
99
100 from bzrlib.decorators import needs_write_lock
101@@ -413,6 +414,45 @@
102 # add) the final ','
103 l = [l]
104 return l
105+
106+ def get_user_option_as_int_from_SI(self, option_name, default=None):
107+ """Get a generic option from a human readable size in SI units, e.g 10MB
108+
109+ Accepted suffixes are K,M,G. It is case-insensitive and may be followed
110+ by a trailing b (i.e. Kb, MB). This is intended to be practical and not
111+ pedantic.
112+
113+ :return Integer, expanded to its base-10 value if a proper SI unit is
114+ found. If the option doesn't exist, or isn't a value in
115+ SI units, return default (which defaults to None)
116+ """
117+ val = self.get_user_option(option_name)
118+ if isinstance(val, list):
119+ val = val[0]
120+ if val is None:
121+ val = default
122+ else:
123+ p = re.compile("^(\d+)([kmg])*b*$", re.IGNORECASE)
124+ try:
125+ m = p.match(val)
126+ if m is not None:
127+ val = int(m.group(1))
128+ if m.group(2) is not None:
129+ if m.group(2).lower() == 'k':
130+ val *= 10**3
131+ elif m.group(2).lower() == 'm':
132+ val *= 10**6
133+ elif m.group(2).lower() == 'g':
134+ val *= 10**9
135+ else:
136+ ui.ui_factory.show_warning('Invalid config value for "%s" '
137+ ' value %r is not an SI unit.'
138+ % (option_name, val))
139+ val = default
140+ except TypeError:
141+ val = default
142+ return val
143+
144
145 def gpg_signing_command(self):
146 """What program should be used to sign signatures?"""
147
148=== modified file 'bzrlib/help_topics/en/configuration.txt'
149--- bzrlib/help_topics/en/configuration.txt 2011-08-16 15:12:39 +0000
150+++ bzrlib/help_topics/en/configuration.txt 2011-08-19 22:28:34 +0000
151@@ -628,6 +628,14 @@
152 If present, defines the ``--strict`` option default value for checking
153 uncommitted changes before sending a merge directive.
154
155+add.maximum_file_size
156+~~~~~~~~~~~~~~~~~~~~~
157+
158+Defines the maximum file size the command line "add" operation will allow
159+in recursive mode, with files larger than this value being skipped. You may
160+specify this value as an integer (in which case it is interpreted as bytes),
161+or you may specify the value using SI units, i.e. 10KB, 20MB, 1G. A value of 0
162+will disable skipping.
163
164 External Merge Tools
165 --------------------
166
167=== modified file 'bzrlib/mutabletree.py'
168--- bzrlib/mutabletree.py 2011-07-23 16:33:38 +0000
169+++ bzrlib/mutabletree.py 2011-08-19 22:28:34 +0000
170@@ -582,8 +582,9 @@
171 :param parent_ie: Parent inventory entry if known, or None. If
172 None, the parent is looked up by name and used if present, otherwise it
173 is recursively added.
174+ :param path:
175 :param kind: Kind of new entry (file, directory, etc)
176- :param action: callback(tree, parent_ie, path, kind); can return file_id
177+ :param inv_path:
178 :return: Inventory entry for path and a list of paths which have been added.
179 """
180 # Nothing to do if path is already versioned.
181@@ -628,7 +629,7 @@
182 if (prev_dir is None or not is_inside([prev_dir], path)):
183 yield (path, inv_path, this_ie, None)
184 prev_dir = path
185-
186+
187 def __init__(self, tree, action, conflicts_related=None):
188 self.tree = tree
189 if action is None:
190@@ -695,12 +696,18 @@
191
192 # get the contents of this directory.
193
194- # find the kind of the path being added.
195+ # find the kind of the path being added, and save stat_value
196+ # for reuse
197+ stat_value = None
198 if this_ie is None:
199- kind = osutils.file_kind(abspath)
200+ stat_value = osutils.file_stat(abspath)
201+ kind = osutils.file_kind_from_stat_mode(stat_value.st_mode)
202 else:
203 kind = this_ie.kind
204-
205+
206+ # allow AddAction to skip this file
207+ if self.action.skip_file(self.tree, abspath, kind, stat_value):
208+ continue
209 if not InventoryEntry.versionable_kind(kind):
210 trace.warning("skipping %s (can't add file of kind '%s')",
211 abspath, kind)
212@@ -769,7 +776,7 @@
213 # recurse into this already versioned subdir.
214 things_to_add.append((subp, sub_invp, sub_ie, this_ie))
215 else:
216- # user selection overrides ignoes
217+ # user selection overrides ignores
218 # ignore while selecting files - if we globbed in the
219 # outer loop we would ignore user files.
220 ignore_glob = self.tree.is_ignored(subp)
221
222=== modified file 'bzrlib/osutils.py'
223--- bzrlib/osutils.py 2011-08-12 12:18:34 +0000
224+++ bzrlib/osutils.py 2011-08-19 22:28:34 +0000
225@@ -2178,15 +2178,18 @@
226 return file_kind_from_stat_mode(mode)
227 file_kind_from_stat_mode = file_kind_from_stat_mode_thunk
228
229-
230-def file_kind(f, _lstat=os.lstat):
231+def file_stat(f, _lstat=os.lstat):
232 try:
233- return file_kind_from_stat_mode(_lstat(f).st_mode)
234+ # XXX cache?
235+ return _lstat(f)
236 except OSError, e:
237 if getattr(e, 'errno', None) in (errno.ENOENT, errno.ENOTDIR):
238 raise errors.NoSuchFile(f)
239 raise
240
241+def file_kind(f, _lstat=os.lstat):
242+ stat_value = file_stat(f, _lstat)
243+ return file_kind_from_stat_mode(stat_value.st_mode)
244
245 def until_no_eintr(f, *a, **kw):
246 """Run f(*a, **kw), retrying if an EINTR error occurs.
247
248=== modified file 'bzrlib/tests/blackbox/test_add.py'
249--- bzrlib/tests/blackbox/test_add.py 2011-07-11 06:47:32 +0000
250+++ bzrlib/tests/blackbox/test_add.py 2011-08-19 22:28:34 +0000
251@@ -239,3 +239,27 @@
252 out, err = self.run_bzr(["add", "a", "b"], working_dir=u"\xA7")
253 self.assertEquals(out, "adding a\n" "adding b\n")
254 self.assertEquals(err, "")
255+
256+ def test_add_skip_large_files(self):
257+ """Test skipping files larger than add.maximum_file_size"""
258+ tree = self.make_branch_and_tree('.')
259+ self.build_tree(['small.txt', 'big.txt', 'big2.txt'])
260+ self.build_tree_contents([('small.txt', '0\n')])
261+ self.build_tree_contents([('big.txt', '01234567890123456789\n')])
262+ self.build_tree_contents([('big2.txt', '01234567890123456789\n')])
263+ tree.branch.get_config().set_user_option('add.maximum_file_size', 5)
264+ out = self.run_bzr('add')[0]
265+ results = sorted(out.rstrip('\n').split('\n'))
266+ self.assertEquals(['adding small.txt'],
267+ results)
268+ # named items never skipped, even if over max
269+ out, err = self.run_bzr(["add", "big2.txt"])
270+ results = sorted(out.rstrip('\n').split('\n'))
271+ self.assertEquals(['adding big2.txt'],
272+ results)
273+ self.assertEquals(err, "")
274+ tree.branch.get_config().set_user_option('add.maximum_file_size', 30)
275+ out = self.run_bzr('add')[0]
276+ results = sorted(out.rstrip('\n').split('\n'))
277+ self.assertEquals(['adding big.txt'],
278+ results)
279
280=== modified file 'bzrlib/tests/test_config.py'
281--- bzrlib/tests/test_config.py 2011-08-12 14:07:21 +0000
282+++ bzrlib/tests/test_config.py 2011-08-19 22:28:34 +0000
283@@ -1035,6 +1035,26 @@
284 # automatically cast to list
285 self.assertEqual(['x'], get_list('one_item'))
286
287+ def test_get_user_option_as_int_from_SI(self):
288+ conf, parser = self.make_config_parser("""
289+plain = 100
290+si_k = 5k,
291+si_kb = 5kb,
292+si_m = 5M,
293+si_mb = 5MB,
294+si_g = 5g,
295+si_gb = 5gB,
296+""")
297+ get_si = conf.get_user_option_as_int_from_SI
298+ self.assertEqual(100, get_si('plain'))
299+ self.assertEqual(5000, get_si('si_k'))
300+ self.assertEqual(5000, get_si('si_kb'))
301+ self.assertEqual(5000000, get_si('si_m'))
302+ self.assertEqual(5000000, get_si('si_mb'))
303+ self.assertEqual(5000000000, get_si('si_g'))
304+ self.assertEqual(5000000000, get_si('si_gb'))
305+ self.assertEqual(None, get_si('non-exist'))
306+ self.assertEqual(42, get_si('non-exist-with-default', 42))
307
308 class TestSupressWarning(TestIniConfig):
309
310
311=== modified file 'doc/en/release-notes/bzr-2.5.txt'
312--- doc/en/release-notes/bzr-2.5.txt 2011-08-18 04:23:06 +0000
313+++ doc/en/release-notes/bzr-2.5.txt 2011-08-19 22:28:34 +0000
314@@ -78,6 +78,11 @@
315 * Relative local paths can now be specified in URL syntax by using the
316 "file:" prefix. (Jelmer Vernooij)
317
318+* bzr add now skips large files in recursive mode. The default "large"
319+ size is 20MB, and is configurable via the add.maximum_file_size
320+ option. A value of 0 disables skipping. Named items passed to add are
321+ never skipped. (Shannon Weyrick, #54624)
322+
323 Improvements
324 ************
325
326@@ -169,6 +174,10 @@
327 no longer support it.
328 (Martin Pool)
329
330+* New method ``Config.get_user_option_as_int_from_SI`` added for expanding a
331+ value in SI format (i.e. "20MB", "1GB") into its integer equivalent.
332+ (Shannon Weyrick)
333+
334 * ``Transport`` now has a ``_parsed_url`` attribute instead of
335 separate ``_user``, ``_password``, ``_port``, ``_scheme``, ``_host``
336 and ``_path`` attributes. Proxies are provided for the moment but