Merge lp:~stub/charms/precise/postgresql/bug-1276024-fix-log-shipping into lp:charms/postgresql

Proposed by Stuart Bishop
Status: Merged
Merge reported by: Marco Ceppi
Merged at revision: not available
Proposed branch: lp:~stub/charms/precise/postgresql/bug-1276024-fix-log-shipping
Merge into: lp:charms/postgresql
Prerequisite: lp:~stub/charms/precise/postgresql/bug-1281600-log_temp_files
Diff against target: 333 lines (+167/-30)
5 files modified
config.yaml (+6/-3)
hooks/hooks.py (+89/-9)
templates/postgres.cron.tmpl (+9/-12)
templates/swiftwal.conf.tmpl (+5/-5)
test.py (+58/-1)
To merge this branch: bzr merge lp:~stub/charms/precise/postgresql/bug-1276024-fix-log-shipping
Reviewer: charmers (status: Pending)
Review via email: mp+225324@code.launchpad.net

Description of the change

Wire up the STILL EXPERIMENTAL Swift log shipping, per Bug #1276024.

SwiftWAL is a tool I wrote to allow PostgreSQL to push filesystem-level
backups and log-ship WAL files to Swift. It is similar to WAL-E. In fact,
it is so similar that I will probably scrap it in favor of WAL-E, since
WAL-E now supports Swift in addition to Amazon S3. Switching to or adding
support for WAL-E should be mostly mechanical work, although IIRC it
still needed bidirectional SSH setup too.
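
For context, SwiftWAL reads its Swift credentials and target container
from a small YAML config file; the charm renders one per unit from
templates/swiftwal.conf.tmpl into the cluster's config directory. The
result looks roughly like this (the path and all values are illustrative):

    # e.g. /etc/postgresql/9.1/main/swiftwal.conf (rendered by the charm)
    OS_USERNAME: someuser
    OS_TENANT_NAME: sometenant
    OS_PASSWORD: secret
    OS_AUTH_URL: https://keystone.example.com:5000/v2.0
    CONTAINER: juju_mydb_0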

Anyway, back to the branch at hand: this branch is concerned with log
shipping to Swift. Log shipping is one component of configuring
point-in-time recovery, and it allows us to replicate without a streaming
replication connection.
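
Concretely, when swiftwal_log_shipping is enabled the charm ends up
setting roughly the following (see swiftwal_archive_command() and
swiftwal_restore_command() in hooks.py; the config path shown here is
illustrative):

    # postgresql.conf on the master
    wal_level = hot_standby
    archive_mode = on
    archive_command = 'swiftwal --config=/path/to/swiftwal.conf archive-wal %p'

    # recovery.conf on a standby
    restore_command = 'swiftwal --config=/path/to/swiftwal.conf restore-wal %f %p'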

This branch also fixes up filesystem backups to Swift, but finalizing
backups, recovery and tests is for a later branch and may require juju
actions to be implemented first.
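
For reference, the periodic base backup lands in the postgres user's
crontab via templates/postgres.cron.tmpl, roughly like this (schedule,
path and retention values are illustrative; see swiftwal_backup_command()
and swiftwal_prune_command() in hooks.py). When swiftwal_log_shipping is
disabled, --xlog is appended to the backup command, presumably so the
backup includes the WAL needed to restore it.

    # crontab entry rendered by the charm (illustrative values)
    0 2 * * * postgres swiftwal --config=/path/to/swiftwal.conf backup --port=5432 && swiftwal --config=/path/to/swiftwal.conf prune --keep-backups=2 --keep-wals=5000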

I'm keeping the EXPERIMENTAL status since I'm expecting changes to the
design. For instance, should all the OpenStack configuration be in the
config, or should we be creating a relation to some service to retrieve
the credentials? Or perhaps we need a generic object storage service that
proxies to the provider's object storage system? It would be great if I
could write this once and have it work on all providers.

The Swift tests are skipped unless a valid set of OpenStack credentials
are found in the environment (OS_TENANT_NAME, OS_AUTH_URL etc.).
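
For example, to exercise the Swift tests you would export the credentials
before running the suite, along these lines (the values and the exact test
invocation are illustrative):

    export OS_USERNAME=someuser
    export OS_TENANT_NAME=sometenant
    export OS_PASSWORD=secret
    export OS_AUTH_URL=https://keystone.example.com:5000/v2.0
    python test.py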

Revision history for this message
Charles Butler (lazypower) wrote :

Stub,

I've taken a look through this, and with you openly stating it's a stopgap prior to WAL-E, do we want to pursue this branch in lieu of that integration? Or would it be prudent to pin this and review when WAL-E lands?

I'm not seeing anything terribly heinous in this, but I also haven't run deployment tests on it as of yet.

Revision history for this message
Stuart Bishop (stub) wrote :

@lazypower The next branch in the pipeline builds on this one and adds wal-e support. I then get to test both under real conditions, and will probably drop SwiftWAL when this stops being experimental.

Revision history for this message
Stuart Bishop (stub) wrote :

btw. if you test https://code.launchpad.net/~stub/charms/precise/postgresql/test-upgrade-charm/+merge/229914 you will be testing all 4 MPs in the queue (they form a development pipeline, each dependent on the previous to avoid merge conflicts)

150. By Stuart Bishop

Resolve conflicts

151. By Stuart Bishop

Merged bug-1281600-log_temp_files into bug-1276024-fix-log-shipping.

152. By Stuart Bishop

Merged bug-1281600-log_temp_files into bug-1276024-fix-log-shipping.

153. By Stuart Bishop

Merged bug-1281600-log_temp_files into bug-1276024-fix-log-shipping.

Revision history for this message
Whit Morriss (whitmo) wrote :

Preview Diff

1=== modified file 'config.yaml'
2--- config.yaml 2014-08-27 12:56:33 +0000
3+++ config.yaml 2014-08-27 12:56:33 +0000
4@@ -172,9 +172,10 @@
5 default: False
6 type: boolean
7 description: |
8+ DEPRECATED.
9 Hot standby or warm standby. When True, queries can be run against
10 the database when in recovery or standby mode (ie. replicated).
11- Overridden by juju when master/slave relations are used.
12+ Overridden when service contains multiple units.
13 hot_standby_feedback:
14 default: False
15 type: boolean
16@@ -189,7 +190,7 @@
17 'minimal', 'archive' or 'hot_standby'. Defines how much information
18 is written to the WAL. Set to 'minimal' for stand alone databases
19 and 'hot_standby' for replicated setups. Overridden by juju when
20- replication s used.
21+ replication is used.
22 max_wal_senders:
23 default: 0
24 type: int
25@@ -392,7 +393,9 @@
26 description: |
27 EXPERIMENTAL.
28 Swift container prefix for SwiftWAL to use. Must be set if any
29- SwiftWAL features are enabled.
30+ SwiftWAL features are enabled. This will become a simple
31+ swiftwal_container config item when proper leader election is
32+ implemented in juju.
33 swiftwal_backup_schedule:
34 type: string
35 default: ""
36
37=== modified file 'hooks/hooks.py'
38--- hooks/hooks.py 2014-08-27 12:56:33 +0000
39+++ hooks/hooks.py 2014-08-27 12:56:33 +0000
40@@ -395,11 +395,21 @@
41 log('Ensuring minimal replication settings')
42 config_data['hot_standby'] = True
43 config_data['wal_level'] = 'hot_standby'
44- config_data['max_wal_senders'] = max(
45- num_slaves, config_data['max_wal_senders'])
46 config_data['wal_keep_segments'] = max(
47 config_data['wal_keep_segments'],
48 config_data['replicated_wal_keep_segments'])
49+ # We need this set even if config_data['streaming_replication']
50+ # is False, because the replication connection is still needed
51+ # by pg_basebackup to build a hot standby.
52+ config_data['max_wal_senders'] = max(
53+ num_slaves, config_data['max_wal_senders'])
54+
55+ # Log shipping to Swift using SwiftWAL. This could be for
56+ # non-streaming replication, or for PITR.
57+ if config_data['swiftwal_log_shipping']:
58+ config_data['archive_mode'] = True
59+ config_data['wal_level'] = 'hot_standby'
60+ config_data['archive_command'] = swiftwal_archive_command()
61
62 # Send config data to the template
63 # Return it as pg_config
64@@ -596,15 +606,14 @@
65 def install_postgresql_crontab(output_file):
66 '''Create the postgres user's crontab'''
67 config_data = hookenv.config()
68- crontab_data = {
69- 'backup_schedule': config_data["backup_schedule"],
70- 'scripts_dir': postgresql_scripts_dir,
71- 'backup_days': config_data["backup_retention_count"],
72- }
73+ config_data['scripts_dir'] = postgresql_scripts_dir
74+ config_data['swiftwal_backup_command'] = swiftwal_backup_command()
75+ config_data['swiftwal_prune_command'] = swiftwal_prune_command()
76+
77 charm_dir = hookenv.charm_dir()
78 template_file = "{}/templates/postgres.cron.tmpl".format(charm_dir)
79 crontab_template = Template(
80- open(template_file).read()).render(crontab_data)
81+ open(template_file).read()).render(config_data)
82 host.write_file(output_file, crontab_template, perms=0600)
83
84
85@@ -627,7 +636,8 @@
86 'host': master_host,
87 'port': master_port,
88 'password': local_state['replication_password'],
89- 'streaming_replication': streaming_replication})
90+ 'streaming_replication': streaming_replication,
91+ 'restore_command': swiftwal_restore_command()})
92 log(recovery_conf, DEBUG)
93 host.write_file(
94 os.path.join(postgresql_cluster_dir, 'recovery.conf'),
95@@ -638,6 +648,64 @@
96 postgresql_restart()
97
98
99+def swiftwal_config():
100+ postgresql_config_dir = _get_postgresql_config_dir()
101+ return os.path.join(postgresql_config_dir, "swiftwal.conf")
102+
103+
104+def create_swiftwal_config():
105+ if not hookenv.config('swiftwal_container_prefix'):
106+ return
107+
108+ # Until juju provides us with proper leader election, we have a
109+ # state where units do not know if they are alone or part of a
110+ # cluster. To avoid units stomping on each other's WAL and backups,
111+ # we use a unique Swift container for each unit when they are not
112+ # part of the peer relation. Once they are part of the peer
113+ # relation, they share a container.
114+ if local_state.get('state', 'standalone') == 'standalone':
115+ container = '{}_{}'.format(hookenv.config('swiftwal_container_prefix'),
116+ hookenv.local_unit().split('/')[-1])
117+ else:
118+ container = hookenv.config('swiftwal_container_prefix')
119+
120+ template_file = os.path.join(hookenv.charm_dir(),
121+ 'templates', 'swiftwal.conf.tmpl')
122+ params = dict(hookenv.config())
123+ params['swiftwal_container'] = container
124+ content = Template(open(template_file).read()).render(params)
125+ host.write_file(swiftwal_config(), content, "postgres", "postgres", 0o600)
126+
127+
128+def swiftwal_archive_command():
129+ '''Return the archive_command needed in postgresql.conf'''
130+ return 'swiftwal --config={} archive-wal %p'.format(swiftwal_config())
131+
132+
133+def swiftwal_restore_command():
134+ '''Return the restore_command needed in recovery.conf'''
135+ return 'swiftwal --config={} restore-wal %f %p'.format(swiftwal_config())
136+
137+
138+def swiftwal_backup_command():
139+ '''Return the backup command needed in postgres' crontab'''
140+ cmd = 'swiftwal --config={} backup --port={}'.format(swiftwal_config(),
141+ get_service_port())
142+ if not hookenv.config('swiftwal_log_shipping'):
143+ cmd += ' --xlog'
144+ return cmd
145+
146+
147+def swiftwal_prune_command():
148+ '''Return the backup & wal pruning command needed in postgres' crontab'''
149+ config = hookenv.config()
150+ args = '--keep-backups={} --keep-wals={}'.format(
151+ config.get('swiftwal_backup_retention', 0),
152+ max(config['wal_keep_segments'],
153+ config['replicated_wal_keep_segments']))
154+ return 'swiftwal --config={} prune {}'.format(swiftwal_config(), args)
155+
156+
157 def update_service_port():
158 old_port = local_state.get('listen_port', None)
159 new_port = get_service_port()
160@@ -918,6 +986,7 @@
161 generate_postgresql_hba(postgresql_hba)
162 create_ssl_cert(os.path.join(
163 postgresql_data_dir, pg_version(), config_data['cluster_name']))
164+ create_swiftwal_config()
165 update_service_port()
166 update_nrpe_checks()
167 if force_restart:
168@@ -1556,12 +1625,18 @@
169 "postgresql-contrib-{}".format(version),
170 "postgresql-plpython-{}".format(version),
171 "python-jinja2", "python-psycopg2"]
172+
173 # PGDG currently doesn't have debversion for 9.3 & 9.4. Put this back
174 # when it does.
175 if not (hookenv.config('pgdg') and version in ('9.3', '9.4')):
176 packages.append("postgresql-{}-debversion".format(version))
177+
178 if hookenv.config('performance_tuning').lower() != 'manual':
179 packages.append('pgtune')
180+
181+ if hookenv.config('swiftwal_container_prefix'):
182+ packages.append('swiftwal')
183+
184 packages.extend((hookenv.config('extra-packages') or '').split())
185 packages = fetch.filter_installed_packages(packages)
186 # Set package state for main postgresql package if installed
187@@ -1825,6 +1900,11 @@
188 _get_postgresql_config_dir(), "pg_hba.conf")
189 generate_postgresql_hba(postgresql_hba)
190
191+ # Swift container name may have changed, so regenerate the SwiftWAL
192+ # config. This can go away when we have real leader election and can
193+ # safely share a single container.
194+ create_swiftwal_config()
195+
196 local_state.publish()
197
198
199
200=== modified file 'templates/postgres.cron.tmpl'
201--- templates/postgres.cron.tmpl 2013-11-19 11:38:45 +0000
202+++ templates/postgres.cron.tmpl 2014-08-27 12:56:33 +0000
203@@ -1,14 +1,11 @@
204-{{backup_schedule}} postgres {{scripts_dir}}/pg_backup_job {{backup_days}}
205-{% if swiftwal_container -%}
206+# Maintained by juju
207+#
208+{% if backup_schedule -%}
209+{{backup_schedule}} postgres \
210+ {{scripts_dir}}/pg_backup_job {{backup_retention_count}}
211+{% endif -%}
212+
213 {% if swiftwal_backup_schedule -%}
214-{% if swiftwal_log_shipping -%}
215-{{swiftwal_backup_schedule}} postgres \
216- swiftwal --config={{swiftwal_config}} backup && \
217- swiftwal --config={{swiftwal_config}} prune -n {{swiftwal_backup_retention}}
218-{% else -%}
219-{{swiftwal_backup_schedule}} postgres \
220- swiftwal --config={{swiftwal_config}} backup --xlog && \
221- swiftwal --config={{swiftwal_config}} prune -n {{swiftwal_backup_retention}}
222-{% endif -%}
223-{% endif -%}
224+{{swiftwal_backup_schedule}} postgres \
225+ {{swiftwal_backup_command}} && {{swiftwal_prune_command}}
226 {% endif -%}
227
228=== modified file 'templates/swiftwal.conf.tmpl'
229--- templates/swiftwal.conf.tmpl 2013-11-22 13:41:01 +0000
230+++ templates/swiftwal.conf.tmpl 2014-08-27 12:56:33 +0000
231@@ -1,6 +1,6 @@
232 # Generated and maintained by juju
233-OS_USERNAME: {{config.os_username}}
234-OS_TENANT_NAME: {{config.os_tenant_name}}
235-OS_PASSWORD: {{config.os_password}}
236-OS_AUTH_URL: {{config.os_auth_url}}
237-CONTAINER: {{vars.container}}
238+OS_USERNAME: {{os_username}}
239+OS_TENANT_NAME: {{os_tenant_name}}
240+OS_PASSWORD: {{os_password}}
241+OS_AUTH_URL: {{os_auth_url}}
242+CONTAINER: {{swiftwal_container}}
243
244=== modified file 'test.py'
245--- test.py 2014-06-10 11:36:17 +0000
246+++ test.py 2014-08-27 12:56:33 +0000
247@@ -9,6 +9,7 @@
248 juju destroy-environment
249 """
250
251+from datetime import datetime
252 import os.path
253 import signal
254 import socket
255@@ -33,6 +34,15 @@
256 pass
257
258
259+def skip_if_swift_is_unavailable():
260+ os_keys = set(['OS_TENANT_NAME', 'OS_AUTH_URL',
261+ 'OS_USERNAME', 'OS_PASSWORD'])
262+ for os_key in os_keys:
263+ if os_key not in os.environ:
264+ return unittest.skip('Swift is unavailable')
265+ return lambda x: x
266+
267+
268 class PostgreSQLCharmBaseTestCase(object):
269
270 # Override these in subclasses to run these tests multiple times
271@@ -371,6 +381,54 @@
272
273 self.assertEqual(num_slaves, 1, 'Slave not connected')
274
275+ @skip_if_swift_is_unavailable()
276+ def test_swiftwal_logshipping_replication(self):
277+ os_keys = set(['OS_TENANT_NAME', 'OS_AUTH_URL',
278+ 'OS_USERNAME', 'OS_PASSWORD'])
279+ for os_key in os_keys:
280+ self.pg_config[os_key.lower()] = os.environ[os_key]
281+ self.pg_config['streaming_replication'] = False
282+ self.pg_config['swiftwal_log_shipping'] = True
283+ self.pg_config['swiftwal_container_prefix'] = '{}_{}'.format(
284+ '_juju_pg_tests', datetime.utcnow().strftime('%Y%m%dT%H%M%SZ'))
285+ self.pg_config['install_sources'] = 'ppa:stub/pgcharm'
286+
287+ def swift_cleanup():
288+ prefix = self.pg_config['swiftwal_container_prefix']
289+ for container in [prefix, prefix + '_1', prefix + '_2']:
290+ # Ignore errors and output
291+ subprocess.call(['swift', 'delete', container],
292+ stdout=subprocess.PIPE,
293+ stderr=subprocess.STDOUT)
294+ self.addCleanup(swift_cleanup)
295+
296+ self.juju.deploy(
297+ TEST_CHARM, 'postgresql', num_units=2, config=self.pg_config)
298+ self.juju.deploy(PSQL_CHARM, 'psql')
299+ self.juju.do(['add-relation', 'postgresql:db-admin', 'psql:db-admin'])
300+ self.wait_until_ready(['postgresql/0', 'postgresql/1'])
301+
302+ # Confirm that the slave has not opened a streaming
303+ # replication connection.
304+ num_slaves = self.sql('SELECT COUNT(*) FROM pg_stat_replication',
305+ 'master', dbname='postgres')[0][0]
306+ self.assertEqual(num_slaves, 0, 'Streaming connection found')
307+
308+ # Confirm that replication is actually happening.
309+ # Create a table and force a WAL change.
310+ self.sql('CREATE TABLE foo AS SELECT generate_series(0,100)',
311+ 'master', dbname='postgres')
312+ self.sql('SELECT pg_switch_xlog()',
313+ 'master', dbname='postgres')
314+ timeout = time.time() + 120
315+ table_found = False
316+ while time.time() < timeout and not table_found:
317+ time.sleep(1)
318+ if self.sql("SELECT TRUE from pg_class WHERE relname='foo'",
319+ 'hot standby', dbname='postgres'):
320+ table_found = True
321+ self.assertTrue(table_found, "Replication not replicating")
322+
323 def test_basic_admin(self):
324 '''Connect to a single unit service via the db-admin relationship.'''
325 self.juju.deploy(TEST_CHARM, 'postgresql', config=self.pg_config)
326@@ -546,7 +604,6 @@
327 self.assertIs(False, self.is_master(standby_unit_1, 'postgres'))
328
329 def test_admin_addresses(self):
330-
331 # This test also tests explicit port assignment. We need
332 # a different port for each PostgreSQL version we might be
333 # testing, because clusters from previous tests of different
