Merge lp:~stub/charms/precise/postgresql/wal-e into lp:charms/postgresql

Proposed by Stuart Bishop
Status: Merged
Merged at revision: 105
Proposed branch: lp:~stub/charms/precise/postgresql/wal-e
Merge into: lp:charms/postgresql
Prerequisite: lp:~stub/charms/precise/postgresql/bug-1276024-fix-log-shipping
Diff against target: 462 lines (+288/-16) (has conflicts)
5 files modified
README.md (+39/-0)
config.yaml (+57/-8)
hooks/hooks.py (+125/-7)
templates/postgres.cron.tmpl (+5/-0)
test.py (+62/-1)
Text conflict in hooks/hooks.py
To merge this branch: bzr merge lp:~stub/charms/precise/postgresql/wal-e
Reviewer Review Type Date Requested Status
Whit Morriss (community) charm Needs Information
charmers Pending
Review via email: mp+225807@code.launchpad.net

Description of the change

WAL-E support.

WAL-E gives us log shipping and point-in-time backups to OpenStack Swift, Windows Azure Blob Storage, and Amazon S3.

This branch adds EXPERIMENTAL support for all three.
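
For reviewers who want to try it out, the README addition in this branch documents enabling the Swift backend with service settings along the following lines (the container name, credentials and auth URL here are placeholders):

    postgresql:
        wal_e_storage_uri: swift://mycontainer
        os_username: my_swift_username
        os_password: my_swift_password
        os_auth_url: https://keystone.auth.url.example.com:8080/v2/
        os_tenant_name: my_tenant_name
        install_sources: |
            - ppa:stub/pgcharm
            - cloud:icehouse

Deploying with 'juju deploy --config mypostgres.yaml postgresql' (or setting the same keys on an existing service) turns on WAL archiving to the configured store and the scheduled base backups.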

It adds tests for Swift log shipping. S3 and WABS have not been tested (not even manually).

There are no tests yet for PITR backup or restore.
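
Point-in-time recovery itself remains a manual process (see the README addition in the diff below). A minimal sketch of the recovery.conf it describes, assuming a hypothetical envdir path and recovery target, would be:

    restore_command = 'envdir /etc/postgresql/9.1/main/wal-e.env wal-e wal-fetch "%f" "%p"'
    recovery_target_time = '2014-07-01 12:00:00 UTC'   # omit to replay all archived WAL

The restore_command matches what the charm already generates for hot standbys; recovery_target_time is standard recovery.conf syntax and is only needed when recovering to a specific point rather than to the end of the archive.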

Revision history for this message
Whit Morriss (whitmo) wrote :

This is an important feature, thanks Stuart!

My main concern is that some documentation be added to the README about wal-e and this implementation: a basic overview of the feature, how to use the config values, and caveats about its experimental nature.

With that added, I'd suggest it for merging.

-w

review: Needs Information (charm)
Revision history for this message
Stuart Bishop (stub) wrote :

Thanks. I have added some documentation for the experimental wal-e support. I have not bothered documenting SwiftWAL, as I expect my wal-e tests will confirm that wal-e is a suitable and more capable replacement.

Revision history for this message
Stuart Bishop (stub) wrote :

Not actually merged.

There is a second MP made on request for people who want to review several branches as a big unrelated mess of changes (https://code.launchpad.net/~stub/charms/precise/postgresql/integration/+merge/233666), but this one should be much easier to review.

Revision history for this message
Whit Morriss (whitmo) wrote :

thanks Stuart!

-w

On Sun, Sep 21, 2014 at 1:39 AM, Stuart Bishop <email address hidden>
wrote:

> Not actually merged.
>
> There is a second MP made on request for people who want to review several
> branches as a big unrelated mess of changes (
> https://code.launchpad.net/~stub/charms/precise/postgresql/integration/+merge/233666),
> but this one should be much easier to review.
> --
>
> https://code.launchpad.net/~stub/charms/precise/postgresql/wal-e/+merge/225807
> You are reviewing the proposed merge of
> lp:~stub/charms/precise/postgresql/wal-e into lp:charms/postgresql.
>

--
---------------
D. Whit Morriss
Developer, Juju Ecosystem
Canonical USA

160. By Stuart Bishop

Merged bug-1276024-fix-log-shipping into wal-e.

161. By Stuart Bishop

Merged bug-1276024-fix-log-shipping into wal-e.

Preview Diff

1=== modified file 'README.md'
2--- README.md 2014-06-27 05:27:12 +0000
3+++ README.md 2014-10-08 12:49:24 +0000
4@@ -255,6 +255,44 @@
5 6. juju add-relation postgresql storage
6
7
8+# Point In Time Recovery
9+
10+The PostgreSQL charm has experimental support for log shipping and point
11+in time recovery. This feature uses the wal-e[2] tool, and requires access
12+to Amazon S3, Microsoft Azure Blob Storage or Swift. This feature is
13+flagged as experimental because it has only been tested with Swift, and
14+not yet been tested under load. It may also require some API changes,
15+particularly in how authentication credentials are accessed, once a standard
16+emerges. The charm can be configured to perform regular filesystem backups
17+and ship WAL files to the object store. Hot standbys will make use of the
18+archived WAL files, allowing them to resync after extended netsplits or
19+even letting you turn off streaming replication entirely.
20+
21+With a base backup and the WAL archive you can perform point in time
22+recovery, but this is still a manual process and the charm does not
23+yet help you do it. The simplest approach would be to create a new
24+PostgreSQL service containing a single unit, 'juju ssh' in and use
25+wal-e to replace the database after shutting it down, create a
26+recovery.conf to replay the archived WAL files using wal-e, restart the
27+database and wait for it to recover. Once recovered, new hot standby
28+units can be added and client services related to the new database
29+service.
30+
31+To enable the experimental wal-e support with Swift, you will need to
32+use Ubuntu 14.04 (Trusty), and set the service configuration settings
33+similar to the following::
34+
35+ postgresql:
36+ wal_e_storage_uri: swift://mycontainer
37+ os_username: my_swift_username
38+ os_password: my_swift_password
39+ os_auth_url: https://keystone.auth.url.example.com:8080/v2/
40+ os_tenant_name: my_tenant_name
41+ install_sources: |
42+ - ppa:stub/pgcharm
43+ - cloud:icehouse
44+
45+
46 # Contact Information
47
48 ## PostgreSQL
49@@ -265,3 +303,4 @@
50 - [PostgreSQL Mailing List](http://www.postgresql.org/list/)
51
52 [1]: https://bugs.launchpad.net/charms/+source/postgresql/+bug/1258485
53+ [2]: https://github.com/wal-e/wal-e
54
55=== modified file 'config.yaml'
56--- config.yaml 2014-10-08 12:49:24 +0000
57+++ config.yaml 2014-10-08 12:49:24 +0000
58@@ -389,7 +389,7 @@
59 # Swift backups and PITR via SwiftWAL
60 swiftwal_container_prefix:
61 type: string
62- default: ""
63+ default: null
64 description: |
65 EXPERIMENTAL.
66 Swift container prefix for SwiftWAL to use. Must be set if any
67@@ -398,7 +398,7 @@
68 implemented in juju.
69 swiftwal_backup_schedule:
70 type: string
71- default: ""
72+ default: null
73 description: |
74 EXPERIMENTAL.
75 Cron-formatted schedule for SwiftWAL database backups.
76@@ -416,34 +416,83 @@
77 description: |
78 EXPERIMENTAL.
79 Archive WAL files into Swift. If swiftwal_backup_schedule is set,
80- this allows point-in-time recovery and WAL files are removed
81+ allows point-in-time recovery and WAL files are removed
82 automatically with old backups. If swiftwal_backup_schedule is not set
83 then WAL files are never removed. Enabling this option will override
84 the archive_mode and archive_command settings.
85+ wal_e_storage_uri:
86+ type: string
87+ default: null
88+ description: |
89+ EXPERIMENTAL.
90+ Specify storage to be used by WAL-E. Every PostgreSQL service must use
91+ a unique URI. Backups will be unrecoverable if it is not unique. The
92+ URI's scheme must be one of 'swift' (OpenStack Swift), 's3' (Amazon AWS)
93+ or 'wabs' (Windows Azure). For example:
94+ 'swift://some-container/directory/or/whatever'
95+ 's3://some-bucket/directory/or/whatever'
96+ 'wabs://some-bucket/directory/or/whatever'
97+ Setting the wal_e_storage_uri enables regular WAL-E filesystem level
98+ backups (per wal_e_backup_schedule), and log shipping to the configured
99+ storage. Point-in-time recovery becomes possible, as is disabling the
100+ streaming_replication configuration item and relying solely on
101+ log shipping for replication.
102+ wal_e_backup_schedule:
103+ type: string
104+ default: "13 0 * * *"
105+ description: |
106+ EXPERIMENTAL.
107+ Cron-formatted schedule for WAL-E database backups. If
108+ wal_e_backup_schedule is unset, WAL files will never be removed from
109+ WAL-E storage.
110+ wal_e_backup_retention:
111+ type: int
112+ default: 2
113+ description: |
114+ EXPERIMENTAL.
115+ Number of recent base backups and WAL files to retain.
116+ You need enough space for this many backups plus one more, as
117+ an old backup will only be removed after a new one has been
118+ successfully made to replace it.
119 streaming_replication:
120 type: boolean
121 default: true
122 description: |
123- EXPERIMENTAL.
124 Enable streaming replication. Normally, streaming replication is
125 always used, and any log shipping configured is used as a fallback.
126 Turning this off without configuring log shipping is an error.
127 os_username:
128 type: string
129- default: ""
130+ default: null
131 description: EXPERIMENTAL. OpenStack Swift username.
132 os_password:
133 type: string
134- default: ""
135+ default: null
136 description: EXPERIMENTAL. OpenStack Swift password.
137 os_auth_url:
138 type: string
139- default: ""
140+ default: null
141 description: EXPERIMENTAL. OpenStack Swift authentication URL.
142 os_tenant_name:
143 type: string
144- default: ""
145+ default: null
146 description: EXPERIMENTAL. OpenStack Swift tenant name.
147+ aws_access_key_id:
148+ type: string
149+ default: null
150+ description: EXPERIMENTAL. Amazon AWS access key id.
151+ aws_secret_access_key:
152+ type: string
153+ default: null
154+ description: EXPERIMENTAL. Amazon AWS secret access key.
155+ wabs_account_name:
156+ type: string
157+ default: null
158+ description: EXPERIMENTAL. Windows Azure account name.
159+ wabs_access_key:
160+ type: string
161+ default: null
162+ description: EXPERIMENTAL. Windows Azure access key.
163 package_status:
164 default: "install"
165 type: string
166
167=== modified file 'hooks/hooks.py'
168--- hooks/hooks.py 2014-10-08 12:49:24 +0000
169+++ hooks/hooks.py 2014-10-08 12:49:24 +0000
170@@ -16,6 +16,7 @@
171 from tempfile import NamedTemporaryFile
172 from textwrap import dedent
173 import time
174+import urlparse
175
176 from charmhelpers import fetch
177 from charmhelpers.core import hookenv, host
178@@ -407,11 +408,16 @@
179
180 # Log shipping to Swift using SwiftWAL. This could be for
181 # non-streaming replication, or for PITR.
182- if config_data['swiftwal_log_shipping']:
183+ if config_data.get('swiftwal_log_shipping', None):
184 config_data['archive_mode'] = True
185 config_data['wal_level'] = 'hot_standby'
186 config_data['archive_command'] = swiftwal_archive_command()
187
188+ if config_data.get('wal_e_storage_uri', None):
189+ config_data['archive_mode'] = True
190+ config_data['wal_level'] = 'hot_standby'
191+ config_data['archive_command'] = wal_e_archive_command()
192+
193 # Send config data to the template
194 # Return it as pg_config
195 charm_dir = hookenv.charm_dir()
196@@ -610,6 +616,8 @@
197 config_data['scripts_dir'] = postgresql_scripts_dir
198 config_data['swiftwal_backup_command'] = swiftwal_backup_command()
199 config_data['swiftwal_prune_command'] = swiftwal_prune_command()
200+ config_data['wal_e_backup_command'] = wal_e_backup_command()
201+ config_data['wal_e_prune_command'] = wal_e_prune_command()
202
203 charm_dir = hookenv.charm_dir()
204 template_file = "{}/templates/postgres.cron.tmpl".format(charm_dir)
205@@ -633,12 +641,15 @@
206 charm_dir = hookenv.charm_dir()
207 streaming_replication = hookenv.config('streaming_replication')
208 template_file = "{}/templates/recovery.conf.tmpl".format(charm_dir)
209- recovery_conf = Template(open(template_file).read()).render({
210- 'host': master_host,
211- 'port': master_port,
212- 'password': local_state['replication_password'],
213- 'streaming_replication': streaming_replication,
214- 'restore_command': swiftwal_restore_command()})
215+ params = dict(
216+ host=master_host, port=master_port,
217+ password=local_state['replication_password'],
218+ streaming_replication=streaming_replication)
219+ if hookenv.config('wal_e_storage_uri'):
220+ params['restore_command'] = wal_e_restore_command()
221+ elif hookenv.config('swiftwal_log_shipping'):
222+ params['restore_command'] = swiftwal_restore_command()
223+ recovery_conf = Template(open(template_file).read()).render(params)
224 log(recovery_conf, DEBUG)
225 host.write_file(
226 os.path.join(postgresql_cluster_dir, 'recovery.conf'),
227@@ -660,6 +671,108 @@
228 if os.path.isfile(config_file):
229 return(open(config_file).read())
230 =======
231+def ensure_swift_container(container):
232+ from swiftclient import client as swiftclient
233+ config = hookenv.config()
234+ con = swiftclient.Connection(
235+ authurl=config.get('os_auth_url', ''),
236+ user=config.get('os_username', ''),
237+ key=config.get('os_password', ''),
238+ tenant_name=config.get('os_tenant_name', ''),
239+ auth_version='2.0',
240+ retries=0)
241+ try:
242+ con.head_container(container)
243+ except swiftclient.ClientException:
244+ con.put_container(container)
245+
246+
247+def wal_e_envdir():
248+ '''The envdir(1) environment location used to drive WAL-E.'''
249+ return os.path.join(_get_postgresql_config_dir(), 'wal-e.env')
250+
251+
252+def create_wal_e_envdir():
253+ '''Regenerate the envdir(1) environment used to drive WAL-E.'''
254+ config = hookenv.config()
255+ env = dict(
256+ SWIFT_AUTHURL=config.get('os_auth_url', ''),
257+ SWIFT_TENANT=config.get('os_tenant_name', ''),
258+ SWIFT_USER=config.get('os_username', ''),
259+ SWIFT_PASSWORD=config.get('os_password', ''),
260+ AWS_ACCESS_KEY_ID=config.get('aws_access_key_id', ''),
261+ AWS_SECRET_ACCESS_KEY=config.get('aws_secret_access_key', ''),
262+ WABS_ACCOUNT_NAME=config.get('wabs_account_name', ''),
263+ WABS_ACCESS_KEY=config.get('wabs_access_key', ''),
264+ WALE_SWIFT_PREFIX='',
265+ WALE_S3_PREFIX='',
266+ WALE_WABS_PREFIX='')
267+
268+ uri = config.get('wal_e_storage_uri', None)
269+
270+ if uri:
271+ # Until juju provides us with proper leader election, we have a
272+ # state where units do not know if they are alone or part of a
273+ # cluster. To avoid units stomping on each others WAL and backups,
274+ # we use a unique container for each unit when they are not
275+ # part of the peer relation. Once they are part of the peer
276+ # relation, they share a container.
277+ if local_state.get('state', 'standalone') == 'standalone':
278+ if not uri.endswith('/'):
279+ uri += '/'
280+ uri += hookenv.local_unit().split('/')[-1]
281+
282+ parsed_uri = urlparse.urlparse(uri)
283+
284+ required_env = []
285+ if parsed_uri.scheme == 'swift':
286+ env['WALE_SWIFT_PREFIX'] = uri
287+ required_env = ['SWIFT_AUTHURL', 'SWIFT_TENANT',
288+ 'SWIFT_USER', 'SWIFT_PASSWORD']
289+ ensure_swift_container(parsed_uri.netloc)
290+ elif parsed_uri.scheme == 's3':
291+ env['WALE_S3_PREFIX'] = uri
292+ required_env = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']
293+ elif parsed_uri.scheme == 'wabs':
294+ env['WALE_WABS_PREFIX'] = uri
295+ required_env = ['WABS_ACCOUNT_NAME', 'WABS_ACCESS_KEY']
296+ else:
297+ log('Invalid wal_e_storage_uri {}'.format(uri), ERROR)
298+
299+ for env_key in required_env:
300+ if not env[env_key].strip():
301+ log('Missing {}'.format(env_key), ERROR)
302+
303+ # Regenerate the envdir(1) environment recommended by WAL-E.
304+ # All possible keys are rewritten to ensure we remove old secrets.
305+ host.mkdir(wal_e_envdir(), 'postgres', 'postgres', 0o750)
306+ for k, v in env.items():
307+ host.write_file(
308+ os.path.join(wal_e_envdir(), k), v.strip(),
309+ 'postgres', 'postgres', 0o640)
310+
311+
312+def wal_e_archive_command():
313+ '''Return the archive_command needed in postgresql.conf.'''
314+ return 'envdir {} wal-e wal-push %p'.format(wal_e_envdir())
315+
316+
317+def wal_e_restore_command():
318+ return 'envdir {} wal-e wal-fetch "%f" "%p"'.format(wal_e_envdir())
319+
320+
321+def wal_e_backup_command():
322+ postgresql_cluster_dir = os.path.join(
323+ postgresql_data_dir, pg_version(), hookenv.config('cluster_name'))
324+ return 'envdir {} wal-e backup-push {}'.format(
325+ wal_e_envdir(), postgresql_cluster_dir)
326+
327+
328+def wal_e_prune_command():
329+ return 'envdir {} wal-e delete --confirm retain {}'.format(
330+ wal_e_envdir(), hookenv.config('wal_e_backup_retention'))
331+
332+
333 def swiftwal_config():
334 postgresql_config_dir = _get_postgresql_config_dir()
335 return os.path.join(postgresql_config_dir, "swiftwal.conf")
336@@ -1000,6 +1113,7 @@
337 create_ssl_cert(os.path.join(
338 postgresql_data_dir, pg_version(), config_data['cluster_name']))
339 create_swiftwal_config()
340+ create_wal_e_envdir()
341 update_service_port()
342 update_nrpe_checks()
343
344@@ -1671,6 +1785,9 @@
345 if hookenv.config('swiftwal_container_prefix'):
346 packages.append('swiftwal')
347
348+ if hookenv.config('wal_e_storage_uri'):
349+ packages.extend(['wal-e', 'daemontools'])
350+
351 packages.extend((hookenv.config('extra-packages') or '').split())
352 packages = fetch.filter_installed_packages(packages)
353 # Set package state for main postgresql package if installed
354@@ -1938,6 +2055,7 @@
355 # config. This can go away when we have real leader election and can
356 # safely share a single container.
357 create_swiftwal_config()
358+ create_wal_e_envdir()
359
360 local_state.publish()
361
362
363=== modified file 'templates/postgres.cron.tmpl'
364--- templates/postgres.cron.tmpl 2014-10-08 12:49:24 +0000
365+++ templates/postgres.cron.tmpl 2014-10-08 12:49:24 +0000
366@@ -9,3 +9,8 @@
367 {{swiftwal_backup_schedule}} postgres \
368 {{swiftwal_backup_command}} && {{swiftwal_prune_command}}
369 {% endif -%}
370+
371+{% if wal_e_backup_schedule -%}
372+{{wal_e_backup_schedule}} postgres \
373+ {{wal_e_backup_command}} && {{wal_e_prune_command}}
374+{% endif -%}
375
376=== modified file 'test.py'
377--- test.py 2014-10-08 12:49:24 +0000
378+++ test.py 2014-10-08 12:49:24 +0000
379@@ -25,7 +25,7 @@
380 from testing.jujufixture import JujuFixture, run
381
382
383-SERIES = os.environ.get('SERIES', 'precise').strip()
384+SERIES = os.environ.get('SERIES', 'trusty').strip()
385 TEST_CHARM = 'local:{}/postgresql'.format(SERIES)
386 PSQL_CHARM = 'local:{}/postgresql-psql'.format(SERIES)
387
388@@ -43,6 +43,22 @@
389 return lambda x: x
390
391
392+def skip_if_s3_is_unavailable():
393+ os_keys = set(['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'])
394+ for os_key in os_keys:
395+ if os_key not in os.environ:
396+ return unittest.skip('S3 is unavailable')
397+ return lambda x: x
398+
399+
400+def skip_if_wabs_is_unavailable():
401+ os_keys = set(['WABS_ACCOUNT_NAME', 'WABS_ACCESS_KEY'])
402+ for os_key in os_keys:
403+ if os_key not in os.environ:
404+ return unittest.skip('WABS is unavailable')
405+ return lambda x: x
406+
407+
408 class PostgreSQLCharmBaseTestCase(object):
409
410 # Override these in subclasses to run these tests multiple times
411@@ -429,6 +445,51 @@
412 table_found = True
413 self.assertTrue(table_found, "Replication not replicating")
414
415+ @skip_if_swift_is_unavailable()
416+ def test_wal_e_swift_logshipping(self):
417+ os_keys = set(['OS_TENANT_NAME', 'OS_AUTH_URL',
418+ 'OS_USERNAME', 'OS_PASSWORD'])
419+ container = '_juju_pg_tests'
420+ for os_key in os_keys:
421+ self.pg_config[os_key.lower()] = os.environ[os_key]
422+ self.pg_config['streaming_replication'] = False
423+ self.pg_config['wal_e_storage_uri'] = 'swift://{}/{}'.format(
424+ container, datetime.utcnow().strftime('%Y%m%dT%H%M%SZ'))
425+ self.pg_config['install_sources'] = 'ppa:stub/pgcharm'
426+
427+ def swift_cleanup():
428+ subprocess.call(['swift', 'delete', container],
429+ stdout=open(os.devnull, 'wb'),
430+ stderr=subprocess.STDOUT)
431+ self.addCleanup(swift_cleanup)
432+
433+ self.juju.deploy(
434+ TEST_CHARM, 'postgresql', num_units=2, config=self.pg_config)
435+ self.juju.deploy(PSQL_CHARM, 'psql')
436+ self.juju.do(['add-relation', 'postgresql:db-admin', 'psql:db-admin'])
437+ self.wait_until_ready(['postgresql/0', 'postgresql/1'])
438+
439+ # Confirm that the slave has not opened a streaming
440+ # replication connection.
441+ num_slaves = self.sql('SELECT COUNT(*) FROM pg_stat_replication',
442+ 'master', dbname='postgres')[0][0]
443+ self.assertEqual(num_slaves, 0, 'Streaming connection found')
444+
445+ # Confirm that replication is actually happening.
446+ # Create a table and force a WAL change.
447+ self.sql('CREATE TABLE foo AS SELECT generate_series(0,100)',
448+ 'master', dbname='postgres')
449+ self.sql('SELECT pg_switch_xlog()',
450+ 'master', dbname='postgres')
451+ timeout = time.time() + 120
452+ table_found = False
453+ while time.time() < timeout and not table_found:
454+ time.sleep(1)
455+ if self.sql("SELECT TRUE from pg_class WHERE relname='foo'",
456+ 'hot standby', dbname='postgres'):
457+ table_found = True
458+ self.assertTrue(table_found, "Replication not replicating")
459+
460 def test_basic_admin(self):
461 '''Connect to a single unit service via the db-admin relationship.'''
462 self.juju.deploy(TEST_CHARM, 'postgresql', config=self.pg_config)
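
For quick reference when reading the hooks.py changes above: every WAL-E invocation the charm wires up runs through the same envdir(1) environment directory, so credentials never appear in postgresql.conf or the cron file. A sketch of the four generated commands, assuming a hypothetical envdir path and cluster directory and the default retention of 2:

    # archive_command in postgresql.conf (ship each WAL segment)
    envdir /etc/postgresql/9.1/main/wal-e.env wal-e wal-push %p
    # restore_command in recovery.conf on hot standbys (fetch archived WAL)
    envdir /etc/postgresql/9.1/main/wal-e.env wal-e wal-fetch "%f" "%p"
    # cron, per wal_e_backup_schedule (base backup of the cluster directory)
    envdir /etc/postgresql/9.1/main/wal-e.env wal-e backup-push /var/lib/postgresql/9.1/main
    # cron, run only after a successful backup (prune old backups and their WAL)
    envdir /etc/postgresql/9.1/main/wal-e.env wal-e delete --confirm retain 2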
