Merge lp:~free.ekanayaka/storm/schema-sharding into lp:storm

Proposed by Free Ekanayaka
Status: Merged
Merged at revision: 469
Proposed branch: lp:~free.ekanayaka/storm/schema-sharding
Merge into: lp:storm
Prerequisite: lp:~free.ekanayaka/storm/schema-advance
Diff against target: 0 lines
To merge this branch: bzr merge lp:~free.ekanayaka/storm/schema-sharding
Reviewer Review Type Date Requested Status
Björn Tillenius (community) Approve
Review via email: mp+242133@code.launchpad.net

Description of the change

Add a new storm.schema.sharding.Sharding class that can be used for performing patch upgrades across a set of stores. The patches can be applied either "vertically" (i.e. all pending patches for the first store, then all pending patches for the second store, etc) or "horizontally" (i.e. all patches with version N across all stores, than all patches with version N + 1 across all stores etc).

To post a comment you must log in.
471. By Free Ekanayaka

Add docstrings

Revision history for this message
Björn Tillenius (bjornt) wrote :

Before reviewing the code itself, I'd like to have a discussion the approach it self.

Looking at the code, it looks like everytime I want to patch a single store, I would have to add an empty patch file for all the other stores? That seems cumbersome.

I'd like to have a single patch file that can patch all the stores. That will make it much easier to track what's going on when you have multiple-store patches. With different files it's hard to get an overview and you still have the problem that you have to know the order the stores get patched in.

For example, let's say that you have two stores, A and B. You want to patch both A and B, and migrate data using data from both A and B. If you select from tables you patched in A and B, you have to consider whether the table have been patched or not. It'd be much easier if you had everything in a single file, e.g.:

  def patch(stores):
     patch_something(stores["a"])
     migrate_data(stores["a"], stores["b"])
     patch_something_else(stores["b])

What would be the downsides of having a single file for all the stores?

review: Needs Information
Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote :

> Before reviewing the code itself, I'd like to have a discussion the approach
> it self.
>
> Looking at the code, it looks like everytime I want to patch a single store, I
> would have to add an empty patch file for all the other stores? That seems
> cumbersome.
>
> I'd like to have a single patch file that can patch all the stores. That will
> make it much easier to track what's going on when you have multiple-store
> patches. With different files it's hard to get an overview and you still have
> the problem that you have to know the order the stores get patched in.
>
> For example, let's say that you have two stores, A and B. You want to patch
> both A and B, and migrate data using data from both A and B. If you select
> from tables you patched in A and B, you have to consider whether the table
> have been patched or not. It'd be much easier if you had everything in a
> single file, e.g.:
>
> def patch(stores):
> patch_something(stores["a"])
> migrate_data(stores["a"], stores["b"])
> patch_something_else(stores["b])
>
> What would be the downsides of having a single file for all the stores?

The downside I see is that you might some stores where you want to apply the same schema changes (think account-1 and account-2) in Landscape.

In that case you have to manually loop over those stores (whose number might be dynamic). For example let's say you have "b-1" and "b-2" instead of "b":

   def patch(stores):
      patch_something(stores["a"])
      b_stores = [store for name, store in stores.items() if name.startswith("b-")]
      for b_store in b_stores:
          migrate_data(stores["a"], b_store)
          patch_something_else(b_store)

At least for Landscape the use case of cross-store data migration is much more infrequent than the use case of an isolated schema change against a certain store or set of stores.

Also, migrating from the current "vertical" patch system to the one you propose is more involved than migrating to the system proposed in the branch.

Regarding downsides of the system proposed in the branch, yes, you have to know the order, but it doesn't seem too create big headaches in real life. As for code living in different files, depending on how you see it the other flip of the coin is that could be also be a plus, because you know more easily what happens in a certain store (without the "noise" of changes in other stores), but I think either way the differences are minor.

Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote :

> > Before reviewing the code itself, I'd like to have a discussion the approach
> > it self.
> >
> > Looking at the code, it looks like everytime I want to patch a single store,
> I
> > would have to add an empty patch file for all the other stores? That seems
> > cumbersome.
> >
> > I'd like to have a single patch file that can patch all the stores. That
> will
> > make it much easier to track what's going on when you have multiple-store
> > patches. With different files it's hard to get an overview and you still
> have
> > the problem that you have to know the order the stores get patched in.
> >
> > For example, let's say that you have two stores, A and B. You want to patch
> > both A and B, and migrate data using data from both A and B. If you select
> > from tables you patched in A and B, you have to consider whether the table
> > have been patched or not. It'd be much easier if you had everything in a
> > single file, e.g.:
> >
> > def patch(stores):
> > patch_something(stores["a"])
> > migrate_data(stores["a"], stores["b"])
> > patch_something_else(stores["b])
> >
> > What would be the downsides of having a single file for all the stores?
>
> The downside I see is that you might some stores where you want to apply the
> same schema changes (think account-1 and account-2) in Landscape.
>
> In that case you have to manually loop over those stores (whose number might
> be dynamic). For example let's say you have "b-1" and "b-2" instead of "b":
>
> def patch(stores):
> patch_something(stores["a"])
> b_stores = [store for name, store in stores.items() if
> name.startswith("b-")]
> for b_store in b_stores:
> migrate_data(stores["a"], b_store)
> patch_something_else(b_store)
>
> At least for Landscape the use case of cross-store data migration is much more
> infrequent than the use case of an isolated schema change against a certain
> store or set of stores.
>
> Also, migrating from the current "vertical" patch system to the one you
> propose is more involved than migrating to the system proposed in the branch.
>
> Regarding downsides of the system proposed in the branch, yes, you have to
> know the order, but it doesn't seem too create big headaches in real life. As
> for code living in different files, depending on how you see it the other flip
> of the coin is that could be also be a plus, because you know more easily what
> happens in a certain store (without the "noise" of changes in other stores),
> but I think either way the differences are minor.

Another downside of having a single patch file/function is that when there's a failure you don't immediately know which store failed, so it can be a tad harder to debug the problem.

472. By Free Ekanayaka

Merge from schema-advance

473. By Free Ekanayaka

Merge from schema-advance

Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote :

I've changed the code to support a directory structure like this one:

myschema/
    __init__.py
    patch_1/
        __init__.py
        a.py
    patch_2/
        __init__.py
        b.py
    patch_3/
        __init__.py
        a.py
        b.py

This solves the issue of needing empty patch files (if you don't put a file for a certain store/schema in the patch directory, it will be automatically considered an empty patch for that store/schema), and makes it a bit easier all patch files for a certain patch level.

It still assumes a fixed stores order when applying patches, but I think the additional complexity of per-patch ordering isn't worth the effort.

The only branch that actually needed to be changed is the other one, schema-advance, this one is mostly untouched.

Revision history for this message
Björn Tillenius (bjornt) wrote :

Looks good now. I added a comment with a suggestion of a slightly alternative approach, but I still think this branch can be landed as it is.

review: Approve
Revision history for this message
Free Ekanayaka (free.ekanayaka) :
474. By Free Ekanayaka

Address review comments

475. By Free Ekanayaka

Improve documentation

Preview Diff

Empty

Subscribers

People subscribed via source and target branches

to status/vote changes: