Merge lp:~fo0bar/swoffsite/generator into lp:swoffsite

Proposed by Ryan Finnie
Status: Merged
Approved by: Stuart Bishop
Approved revision: 43
Merged at revision: 43
Proposed branch: lp:~fo0bar/swoffsite/generator
Merge into: lp:swoffsite
Diff against target: 39 lines (+7/-4)
1 file modified
swoffsite/mirror.py (+7/-4)
To merge this branch: bzr merge lp:~fo0bar/swoffsite/generator
Reviewer Review Type Date Requested Status
Stuart Bishop (community) Approve
Canonical IS Reviewers Pending
Review via email: mp+367817@code.launchpad.net

Commit message

Set full_listing=False, revert r16, change SWIFT_BATCHSIZE to 1000

Description of the change

- full_listing=False allows the rest of swift_walk's generator
  functionality to work: full_listing=True internally walks the entire
  container before returning the full data set, causing memory exhaustion
  on large containers (see the sketch after this list).
- Change SWIFT_BATCHSIZE to 1000 to avoid Swift server timeouts. Comments
  updated to point out why >10000 is futile.
- Reverting r16 allows s3_walk to be used as a generator, fixing crashes
  on large S3 buckets.
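
For context, a minimal sketch of the marker-paged loop this enables. This
is hedged: it assumes python-swiftclient's Connection.get_container();
walk_container and con are illustrative names, not swoffsite's actual API.

    # Sketch only: lazy, marker-based paging of a Swift container
    # listing, in the style of swift_walk. Assumes python-swiftclient.
    SWIFT_BATCHSIZE = 1000  # matches the value set in this branch

    def walk_container(con, container):
        marker = ''
        while True:
            # full_listing=False returns at most `limit` objects per
            # request; full_listing=True would loop internally and buffer
            # the whole container listing in memory before returning.
            headers, listing = con.get_container(container,
                                                 marker=marker,
                                                 limit=SWIFT_BATCHSIZE,
                                                 full_listing=False)
            if not listing:
                break
            marker = listing[-1]['name']
            for obj in listing:
                yield obj

Because the caller only ever holds one batch of names at a time, memory
use stays bounded regardless of container size.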

Revision history for this message
Stuart Bishop (stub) wrote :

Yup.

Reducing SWIFT_BATCHSIZE probably does nothing: it makes an order of magnitude more calls that each return an order of magnitude less information. I think that if pauses between batches cause timeouts, those pauses come from waiting for large file uploads to S3 to complete (at least on the librarian deploy, it is not uncommon to be blocked on a 3 or 4 GB file uploading, waiting to free enough disk space to continue, because I was unable to stream them directly from PGP). But we should certainly go with what you have been testing.

review: Approve

Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

Change successfully merged at revision 43

Preview Diff

=== modified file 'swoffsite/mirror.py'
--- swoffsite/mirror.py 2016-11-18 01:48:41 +0000
+++ swoffsite/mirror.py 2019-05-23 03:25:03 +0000
@@ -37,7 +37,10 @@
 from .streams import SwiftStream, HashStream
 
 
-SWIFT_BATCHSIZE = 10000  # Swift container listing batch size.
+# Swift container listing batch size.
+# Note that setting this to >10000 is futile, as Swift has
+# a hard internal limit of 10000.
+SWIFT_BATCHSIZE = 1000
 
 CHUNK_SIZE = 512*1024
 
@@ -103,7 +106,7 @@
         headers, listing = con.get_container(container,
                                              marker=marker,
                                              limit=SWIFT_BATCHSIZE,
-                                             full_listing=True)
+                                             full_listing=False)
         if not listing:
             break
         marker = listing[-1]['name']
@@ -124,12 +127,12 @@
     if b is None:
         con.create_bucket(bucket)
         return
-    for prefix in list(b.list(delimiter='/')):
+    for prefix in b.list(delimiter='/'):
        if not match(include_globs, exclude_globs, prefix.name[:-1]):
            log.debug('Skipping S3 prefix {}'.format(prefix.name))
        else:
            log.info('Listing {} in S3'.format(prefix.name[:-1]))
-           for s3_key in list(b.list(prefix=prefix.name)):
+           for s3_key in b.list(prefix=prefix.name):
                container, name, _ = parse_s3_name(s3_key.name)
                if match(include_globs, exclude_globs, container, name):
                    log.debug('S3 {} matched'.format(s3_key.name))
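
A usage note on the s3_walk hunks above: with boto 2, Bucket.list()
returns a lazy BucketListResultSet that fetches keys page by page as it
is iterated, so wrapping it in list() forces the entire listing into
memory first. A hedged sketch of the difference (process() is an
illustrative placeholder, not a swoffsite function):

    # Assumes boto 2; b is a boto.s3.bucket.Bucket.

    # Before the revert: list() drains the BucketListResultSet up front,
    # materializing every key in the bucket before the loop starts.
    for s3_key in list(b.list(prefix=prefix.name)):
        process(s3_key)

    # After the revert: keys are fetched a page at a time (1000 keys per
    # request, the S3 API default) as the loop consumes them.
    for s3_key in b.list(prefix=prefix.name):
        process(s3_key)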
