Merge lp:~fo0bar/swoffsite/generator into lp:swoffsite

Proposed by Ryan Finnie
Status: Merged
Approved by: Stuart Bishop
Approved revision: 43
Merged at revision: 43
Proposed branch: lp:~fo0bar/swoffsite/generator
Merge into: lp:swoffsite
Diff against target: 39 lines (+7/-4)
1 file modified
swoffsite/mirror.py (+7/-4)
To merge this branch: bzr merge lp:~fo0bar/swoffsite/generator
Reviewer Review Type Date Requested Status
Stuart Bishop (community) Approve
Canonical IS Reviewers Pending
Review via email: mp+367817@code.launchpad.net

Commit message

Set full_listing=False, revert r16, change SWIFT_BATCHSIZE to 1000

Description of the change

- full_listing=False allows the rest of swift_walk's generator
  functionality to work: full_listing=True internally walks the entire
  container before returning the full data set, causing memory exhaustion
  on large containers (see the sketch after this list).
- Change SWIFT_BATCHSIZE to 1000 to avoid Swift server timeouts. Comments
  updated to point out why >10000 is futile.
- Reverting r16 allows s3_walk to be used as a generator, fixing crashes
  on large S3 buckets.
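
For context, a minimal sketch of the marker-paged loop this enables. This
is hedged: it assumes python-swiftclient's Connection.get_container();
walk_container and con are illustrative names, not swoffsite's actual API.

    # Sketch only: lazy, marker-based paging of a Swift container
    # listing, in the style of swift_walk. Assumes python-swiftclient.
    SWIFT_BATCHSIZE = 1000  # matches the value set in this branch

    def walk_container(con, container):
        marker = ''
        while True:
            # full_listing=False returns at most `limit` objects per
            # request; full_listing=True would loop internally and buffer
            # the whole container listing in memory before returning.
            headers, listing = con.get_container(container,
                                                 marker=marker,
                                                 limit=SWIFT_BATCHSIZE,
                                                 full_listing=False)
            if not listing:
                break
            marker = listing[-1]['name']
            for obj in listing:
                yield obj

Because the caller only ever holds one batch of names at a time, memory
use stays bounded regardless of container size.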

Revision history for this message
Stuart Bishop (stub) wrote :

Yup.

Reducing SWIFT_BATCHSIZE probably does nothing: it makes an order of magnitude more calls that each return an order of magnitude less information. I think that if pauses between batches cause timeouts, those pauses come from waiting for large file uploads to S3 to complete (at least on the librarian deploy, it is not uncommon to be blocked on a 3 or 4 GB file uploading, waiting to free enough disk space to continue, because I was unable to stream them directly from PGP). But we should certainly go with what you have been testing.

review: Approve

Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

Change successfully merged at revision 43

Preview Diff

=== modified file 'swoffsite/mirror.py'
--- swoffsite/mirror.py 2016-11-18 01:48:41 +0000
+++ swoffsite/mirror.py 2019-05-23 03:25:03 +0000
@@ -37,7 +37,10 @@
 from .streams import SwiftStream, HashStream
 
 
-SWIFT_BATCHSIZE = 10000  # Swift container listing batch size.
+# Swift container listing batch size.
+# Note that setting this to >10000 is futile, as Swift has
+# a hard internal limit of 10000.
+SWIFT_BATCHSIZE = 1000
 
 CHUNK_SIZE = 512*1024
 
@@ -103,7 +106,7 @@
         headers, listing = con.get_container(container,
                                              marker=marker,
                                              limit=SWIFT_BATCHSIZE,
-                                             full_listing=True)
+                                             full_listing=False)
         if not listing:
             break
         marker = listing[-1]['name']
@@ -124,12 +127,12 @@
     if b is None:
         con.create_bucket(bucket)
         return
-    for prefix in list(b.list(delimiter='/')):
+    for prefix in b.list(delimiter='/'):
        if not match(include_globs, exclude_globs, prefix.name[:-1]):
            log.debug('Skipping S3 prefix {}'.format(prefix.name))
        else:
            log.info('Listing {} in S3'.format(prefix.name[:-1]))
-           for s3_key in list(b.list(prefix=prefix.name)):
+           for s3_key in b.list(prefix=prefix.name):
                container, name, _ = parse_s3_name(s3_key.name)
                if match(include_globs, exclude_globs, container, name):
                    log.debug('S3 {} matched'.format(s3_key.name))
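
A usage note on the s3_walk hunks above: with boto 2, Bucket.list()
returns a lazy BucketListResultSet that fetches keys page by page as it
is iterated, so wrapping it in list() forces the entire listing into
memory first. A hedged sketch of the difference (process() is an
illustrative placeholder, not a swoffsite function):

    # Assumes boto 2; b is a boto.s3.bucket.Bucket.

    # Before the revert: list() drains the BucketListResultSet up front,
    # materializing every key in the bucket before the loop starts.
    for s3_key in list(b.list(prefix=prefix.name)):
        process(s3_key)

    # After the revert: keys are fetched a page at a time (1000 keys per
    # request, the S3 API default) as the loop consumes them.
    for s3_key in b.list(prefix=prefix.name):
        process(s3_key)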
