Merge lp:~adeuring/juju-ci-tools/backup-to-s3 into lp:juju-ci-tools

Proposed by Abel Deuring
Status: Merged
Merged at revision: 588
Proposed branch: lp:~adeuring/juju-ci-tools/backup-to-s3
Merge into: lp:juju-ci-tools
Diff against target: 76 lines (+72/-0)
1 file modified
backup-to-s3.py (+72/-0)
To merge this branch: bzr merge lp:~adeuring/juju-ci-tools/backup-to-s3
Reviewer: Curtis Hovey (community) | Review type: code | Status: Approve
Review via email: mp+228296@code.launchpad.net

Description of the change

A new script to back up Jenkins data to S3.

The main idea is to archive the whole home directory to S3, except for files and directories that are known to be unimportant or that are already preserved elsewhere as Bazaar branches.

Job workspace and build data are also excluded: most build information is already stored on S3.

There is a special rule for "jobs/disabled-repository"; as I understand it, this is not a real job.

The first rule, which excludes hidden files and directories in the home directory, may omit some important files. On the other hand, archiving a directory such as .ssh/ is questionable anyway, since it contains sensitive data.
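For illustration, the exclusion rules boil down to a single s3cmd sync invocation per day, roughly like the following sketch. The config path, bucket URL, and patterns are taken from the diff below; the date is illustrative and the pattern list is abridged.

    # Sketch of the effective command for one daily backup run.
    import os

    s3cfg = os.path.join(os.environ['HOME'], 'cloud-city/juju-qa.s3cfg')
    cmd = ['s3cmd', '-c', s3cfg, 'sync', '.',
           's3://juju-qa-data/juju-ci/backups/2014-07-25/',
           r'--rexclude=^\.',                   # hidden files/dirs
           '--rexclude=^jobs/.*?/workspace/']   # ...plus the other patterns
    print ' '.join(cmd)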

Curtis Hovey (sinzui) wrote:

Thank you Abel.

I don't want .ssh backed up. The keys in .ssh came from cloud-city. Maybe we want to move .ssh/config to cloud-city too.

review: Approve (code)

Preview Diff

=== added file 'backup-to-s3.py'
--- backup-to-s3.py 1970-01-01 00:00:00 +0000
+++ backup-to-s3.py 2014-07-25 13:02:30 +0000
@@ -0,0 +1,72 @@
+#!/usr/bin/env python
+"""Backup Jenkins data to S3 and remove old backups."""
+
+from datetime import datetime
+import os
+import re
+import subprocess
+
+
+MAX_BACKUPS = 10
+BACKUP_URL = 's3://juju-qa-data/juju-ci/backups/'
+# Exclude hidden files in the home directory, workspace and build data
+# of jobs, caches and Bazaar repositories.
+BACKUP_PARAMS = [
+    r'--rexclude=^\.',
+    '--rexclude=^jobs/.*?/workspace/',
+    '--rexclude=^jobs/.*?/builds/',
+    '--rexclude=^jobs/disabled-repository',
+    '--rexclude=^local-tools-cache/',
+    '--rexclude=^ci-director/',
+    '--rexclude=^cloud-city/',
+    '--rexclude=^failure-emails/',
+    '--rexclude=^juju-ci-tools/',
+    '--rexclude=^juju-release-tools/',
+    '--rexclude=^repository',
+    ]
+
+
+def s3_cmd(params, drop_output=False):
+    s3cfg_path = os.path.join(
+        os.environ['HOME'], 'cloud-city/juju-qa.s3cfg')
+    if drop_output:
+        return subprocess.check_call(
+            ['s3cmd', '-c', s3cfg_path] + params, stdout=open('/dev/null', 'w'))
+    else:
+        return subprocess.check_output(
+            ['s3cmd', '-c', s3cfg_path] + params)
+
+
+def current_backups():
+    """Return a list of S3 URLs of existing backups."""
+    # We expect lines like
+    # "   DIR s3://juju-qa-data/juju-ci/backups/2014-07-25/"
+    result = []
+    for line in s3_cmd(['ls', BACKUP_URL]).split('\n'):
+        mo = re.search(r'^\s+DIR\s+(%s\d\d\d\d-\d\d-\d\d/)$' % BACKUP_URL, line)
+        if mo is None:
+            continue
+        url = mo.group(1)
+        result.append(url)
+    return sorted(result)
+
+
+def run_backup(url):
+    s3_cmd(['sync', '.', url] + BACKUP_PARAMS, drop_output=True)
+
+
+def remove_backups(urls):
+    if urls:
+        s3_cmd(['del', '-r'] + urls, drop_output=True)
+
+
+if __name__ == '__main__':
+    all_backups = current_backups()
+    today = datetime.now().strftime('%Y-%m-%d')
+    todays_url = '%s%s/' % (BACKUP_URL, today)
+    if todays_url in all_backups:
+        print "backup for %s already exists." % today
+    else:
+        run_backup(todays_url)
+        all_backups.append(todays_url)
+    remove_backups(all_backups[:-MAX_BACKUPS])
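To make the retention logic concrete: current_backups() returns the date-named directory URLs in sorted order, so with MAX_BACKUPS = 10 the slice all_backups[:-MAX_BACKUPS] contains exactly the entries older than the ten most recent. An illustrative snippet, not part of the proposed script:

    # Illustrative only: eleven sorted backup URLs, keep the newest ten.
    MAX_BACKUPS = 10
    all_backups = ['s3://juju-qa-data/juju-ci/backups/2014-07-%02d/' % day
                   for day in range(15, 26)]  # 2014-07-15 .. 2014-07-25
    stale = all_backups[:-MAX_BACKUPS]
    print stale  # ['s3://juju-qa-data/juju-ci/backups/2014-07-15/']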
