Proposal to use data_migrate gem for API

Evan Rolfe

Hi all, I would like to propose that we use the data_migrate gem
<https://github.com/ilyakatz/data-migrate> to handle changes to db data. Here is a PR
<https://github.com/openSUSE/open-build-service/pull/3701> I have opened where I need to update each
notification's event_payload from YAML to JSON serialisation. This doesn't make sense to do in a normal
migration because it has nothing to do with the database structure, only the content of the database. This
gem allows us to handle such changes in the same way we handle normal migrations.

The alternative is to use rake tasks for data changes, which is what I initially did in the PR in this commit
<https://github.com/openSUSE/open-build-service/pull/3701/commits/f2c9ef6a65c2b9930976b25cff7c07e0002a440a>.

However, the downside of that is that we always need to update the READMEs to let updaters know exactly
which rake tasks need to be run to update their db. The gem also makes it easier to sync data changes with
other developers: all that's needed to get your database up to date is to run `rake db:migrate:with_data`.
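
As a rough illustration of the kind of conversion discussed above, here is a minimal plain-Ruby sketch that re-serialises a single payload from YAML to JSON. It is not the code from the PR: the helper name `yaml_payload_to_json` is hypothetical, and it assumes the stored payload is a simple hash with string keys and plain values.

```ruby
require 'yaml'
require 'json'

# Hypothetical helper (not taken from the PR): takes the YAML text stored in
# a column such as event_payload and returns the same data as JSON text.
# Assumption: the payload only contains strings, numbers, arrays and hashes.
def yaml_payload_to_json(yaml_text)
  payload = YAML.safe_load(yaml_text) || {} # empty YAML text becomes an empty hash
  JSON.generate(payload)
end

puts yaml_payload_to_json("project: openSUSE\npackage: vim\n")
```

In a real data migration this helper would be applied to every existing row once, which is exactly the "run once, everywhere" property the proposal is after.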

If you have any opinions please post on the PR to continue this discussion.

Thanks


-- 
Evan Rolfe
Full Stack Web Developer
SUSE Linux GmbH, Maxfeldstr. 5, D-90409 Nürnberg
Tel: +49-911-74053-0; Fax: +49-911-7417755;  https://www.suse.com/
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard,
Graham Norton, HRB 21284 (AG Nürnberg)

Re: Proposal to use data_migrate gem for API

Christian Bruckmayer-2

Hi,


I'm fine with it and like the idea! Go ahead :)

Christian

Re: Proposal to use data_migrate gem for API

Adrian Schröter
On Tuesday, 5 September 2017, 16:18:37 CEST, Christian Bruckmayer wrote:

> Hi,
>
>
> On 09/04/2017 12:13 PM, Evan Rolfe wrote:
> >
> > Proposal to use data_migrate gem for API
> >
> > Hi all, I would like to propose that we use the data_migrate
> > <https://github.com/ilyakatz/data-migrate> gem to handle changes to db
> > data. Here is a PR
> > <https://github.com/openSUSE/open-build-service/pull/3701>
> > I have opened where I need to update each notification's event_payload
> > from YAML to JSON serialisation. This
> > doesn't make sense to do in a normal migration because it has nothing
> > to do with database structure only the
> > content of the database. This gem allows us to handle such changes in
> > the same way we handle normal
> > migrations.

Well, we have been doing that with our current migrations as well.

E.g. check the migrations for new issue tracker entries or new attributes.

> > The alternative is to use rake tasks for data changes which is what I
> > initially did in the PR in this commit
> > <https://github.com/openSUSE/open-build-service/pull/3701/commits/f2c9ef6a65c2b9930976b25cff7c07e0002a440a>.
> >
> > However the downsides of that are that we need to always update the
> > README's to let updaters know exactly
> > which rake tasks need to be run to update their db and it also makes
> > it easier to sync data changes with
> > other developers. All thats needed to get your database up to date
> > with the gem is to run `rake db:migrate:with_data`.

I don't get this exactly. Does this mean that there is no single
command to update all data to the current state?

Do you always have to know lots of special commands, documented in
the README files, when updating from 2.8 to 2.9?

Does this also mean that it breaks our update tests in CI and auto
deployment?

I am strongly against this in that case...

Or am I missing something here?

bye
adrian



--

Adrian Schroeter
email: [hidden email]

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
 
Maxfeldstraße 5                        
90409 Nürnberg
Germany


--
To unsubscribe, e-mail: [hidden email]
To contact the owner, e-mail: [hidden email]


Re: Proposal to use data_migrate gem for API

Evan Rolfe

Adrian, thanks for raising these concerns. Let me just explain a use case of data migrations:

We have an event_payload column in the notifications table which is serialised in YAML. We need to change this to be serialised in JSON to fix issue #3638 <https://github.com/openSUSE/open-build-service/issues/3638>.

To do this we need a script to convert the serialisation of existing notifications from YAML to JSON, but we only want to run this once. We could create a rake task to do it, but then we would need to add that rake task to the update guide, so upgrading becomes more complicated, and all other developers would also need to run this rake task to get their database up to date.

The other option is to use a normal database migration to handle the conversion from YAML to JSON. However, then we would have a database migration which does not change the database structure at all. It also means that we have to bundle the database structure changes (which require downtime) with the data changes (which might not require downtime). A data change which takes a long time (e.g. more than 1 hour) would then be difficult to deploy if there were also database migrations that needed running. This is precisely the case when changing the serialisation of project_log_entries: the script to do that will take at least a couple of hours because there are ~3 million rows in the project_log_entries table.
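
To make the "long-running data change without downtime" point more concrete, here is a hedged plain-Ruby sketch of processing rows in small, resumable batches. The thread does not say how the actual script iterates over project_log_entries; `Row`, `process_in_batches` and the `after_id` checkpoint are all hypothetical names used only for illustration, and the rows live in memory instead of a database.

```ruby
# Stand-in for a table row; a real script would load rows from the database.
Row = Struct.new(:id, :payload)

# Yields every row with id > after_id, one batch at a time, and returns the
# id of the last row handled. Passing that id back in as after_id lets an
# interrupted run continue where it stopped instead of starting over.
def process_in_batches(rows, after_id: 0, batch_size: 1000)
  last_id = after_id
  remaining = rows.select { |row| row.id > after_id }.sort_by(&:id)
  remaining.each_slice(batch_size) do |batch|
    batch.each { |row| yield row }
    last_id = batch.last.id
  end
  last_id
end

rows = (1..5).map { |i| Row.new(i, "payload #{i}") }
checkpoint = process_in_batches(rows, batch_size: 2) { |row| row.payload = row.payload.upcase }
puts checkpoint
```

In a Rails application the same idea is usually expressed with Active Record's `find_each` or `in_batches`, so that only one batch of rows is held in memory at a time.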

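See this Stack Overflow question: Rails migration: only for schema change or also for updating data?
<https://stackoverflow.com/questions/19387440/rails-migration-only-for-schema-change-or-also-for-updating-data>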

On 05/09/17 16:42, Adrian Schröter wrote:

> I don't get this exactly, does this mean that there is no single
> command to update all data to current state?

Please keep in mind the distinction between database and data. In this context, by "database" I mean the database structure, and "data" means the content of the rows in tables. So the command to get the database up to date is (as it always has been) `rake db:migrate`. This gem now gives us a new command to get the data up to date (which we didn't have before), which is `rake data:migrate`.

> You always have to know lots of special commands when updating
> from 2.8 to 2.9 documented in the README files?

No, that's one of the main reasons to use this gem: it gives you `rake data:migrate` so that you don't have to know lots of special commands when updating. It even gives you this rake task, which performs both the database and data migrate commands:

`rake db:migrate:with_data`

> Does this also mean that it breaks our update tests in CI and auto
> deployment?

It won't break any CI stuff because this is only for existing OBS instances with populated databases. If you're creating a new OBS instance from scratch then this rake task is not necessary. I'm not sure what "auto deployment" is, but the deployment process in the wiki will need to be changed to make sure that we run `rake db:migrate:with_data` instead of just running `rake db:migrate`.
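
The relationship between the three commands can be pictured with a toy model. This is only a sketch of the idea, not the gem's implementation: `ToyMigrator`, `Migration` and the method names are hypothetical, and the assumption that the combined command applies pending schema and data migrations in version order should be checked against the gem's documentation.

```ruby
# Toy model only. Each migration has a version, a kind (:schema or :data)
# and a name; the migrator remembers which ones were already applied.
Migration = Struct.new(:version, :kind, :name)

class ToyMigrator
  def initialize(migrations)
    @migrations = migrations.sort_by(&:version)
    @applied = []
  end

  # Models `rake db:migrate`: structure changes only.
  def migrate_schema
    run(%i[schema])
  end

  # Models `rake data:migrate`: content changes only.
  def migrate_data
    run(%i[data])
  end

  # Models `rake db:migrate:with_data`: both kinds, in version order (assumed).
  def migrate_with_data
    run(%i[schema data])
  end

  private

  # Applies the pending migrations of the given kinds and returns their names.
  def run(kinds)
    pending = @migrations.reject { |m| @applied.include?(m) }
    newly_applied = pending.select { |m| kinds.include?(m.kind) }
    @applied.concat(newly_applied)
    newly_applied.map(&:name)
  end
end

migrator = ToyMigrator.new([
  Migration.new(1, :schema, 'add_column'),
  Migration.new(2, :data, 'convert_payload'),
  Migration.new(3, :schema, 'add_index')
])
p migrator.migrate_schema # structure first, e.g. during a downtime window
p migrator.migrate_data   # the long-running content change afterwards
```

Keeping the two kinds apart is what allows the slow content change to be scheduled independently of the structure changes.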

Re: Proposal to use data_migrate gem for API

Adrian Schröter
On Wednesday, 6 September 2017, 10:02:25 CEST, Evan Rolfe wrote:

> Adrian, thanks for raising these concerns. Let me just explain a use
> case of data migrations:
>
> We have an event_payload column which is serialised in YAML in the
> notifications table, we need to change this to be serialised in JSON to
> fix issue #3638
> <https://github.com/openSUSE/open-build-service/issues/3638> .
>
> To do this we need a script to convert the serialisation of existing
> notifications from YAML to JSON, but we only want to run this once. We
> could create a rake task to do it, but then there is the problem that we
> need to add that rake task to the update guide so upgrading becomes more
> complicated and also all other developers will need to run this rake
> task to get their database up to date too.
>
> The other option is to use a normal database migration to handle the
> conversion from yaml to json. However then we would have a database
> migration which does not change the database structure at all. Also it
> means that we have to include the database structure changes (which
> require downtime) with the data changes (which might not require
> downtime) so for some data changes which might take a long time (i.e. >
> 1hour) that would be difficult to deploy if there were also database
> migrations that needed running. (This is precisely the case when
> changing the serialisation of project_log_entires, the script to do that
> will take at least a couple hours because there are ~3million rows in
> the project_log_entries table).

Okay, but I don't see this mixture as a big problem, because

* Updaters from OBS 2.8.x to 2.9.x need a downtime anyway

* Updaters like us, who run on git master, should follow the migrations
  and understand their nature. We can still run this migration in parallel
  and without downtime. We just need to ensure that there isn't another
  migration afterwards, which requires a downtime, right?

Btw, we have even done structural changes without downtime, if we knew
that the old code wouldn't cause problems (e.g. when just adding a new
column, which does no harm).

On the other hand, I would like to keep the steps for updating as few
as possible for the user.

So I am still very much in favor of doing this with our standard migrations,
if you don't see a problem with my points above.

good morning :)
adrian

> See this stackoverflow: Rails migration: only for schema change or also
> for updating data?
> <https://stackoverflow.com/questions/19387440/rails-migration-only-for-schema-change-or-also-for-updating-data>


Re: Proposal to use data_migrate gem for API

Evan Rolfe
On 08/09/17 08:24, Adrian Schröter wrote:

> We just need to ensure that there isn't another
>    migration afterwards, which requires a downtime, right?

This is the part that concerns me: how do we ensure that? If you have a
data migration which takes 3 hours but is lumped in with the other
database migrations, how are we going to ensure that the 3-hour
data migration is run without downtime but the other migrations are run
with downtime?

I would also be open to using a rake task for the data migration, if you
would prefer that over using a third-party gem to handle this.


Re: Proposal to use data_migrate gem for API

Adrian Schröter
On Monday, 11 September 2017, 11:53:33 CEST, Evan Rolfe wrote:

> On 08/09/17 08:24, Adrian Schröter wrote:
>
> > We just need to ensure that there isn't another
> >    migration afterwards, which requires a downtime, right?
>
> This is the part that concerns me, how do we ensure that? If you have a
> data-migration which takes 3 hours but is lumped in with the other
> database migrations how are we going to ensure that the 3 hour
> data-migration is run without downtime but the other migrations are run
> with downtime?
>
> I would also be open to using a rake task for the data migration if you
> would prefer over using a third party gem to handle this?

I don't mind the rubygem, just the additional step that is needed.

Could we run the data modifications also always when "db:migrate"
is called?

That way you can opt in to do data changes only, but we don't need
to teach people yet another command to run on the next OBS version update.


Re: Proposal to use data_migrate gem for API

Evan Rolfe
On 11/09/17 12:43, Adrian Schröter wrote:

> Could we run the data modifications also always when "db:migrate"
> is called?

Yes, I'm sure that's possible, but it would defeat the purpose of having
data migrations separate from database migrations.

> That way you can opt-in to do data changes only, but we don't need
> to teach people yet another command to run on next OBS version update.

I don't think we can avoid adding another command to the OBS version
update process. If we want to update a table which has ~3 million rows
(project_log_entries) then the script to do that will take at least an hour.
So as I see it we have these three options (maybe there are alternatives?):

1. Include the script in a normal Rails migration
     => Downside: the updaters will have the extra step of making sure
        that this particular migration is run without downtime (I don't
        know how that would even work).

2. We use the data_migrate gem
     => Downside: the updaters have to run a second command,
        `rake data:migrate`

3. We use a rake task
     => Downside: the updaters have to run a rake task, and any future
        data changes will also require new rake tasks, so there are
        potentially many more steps involved in updating as opposed to
        just one step: `rake data:migrate`.

I don't see how #1 is going to work, but if you have an idea then I would
be open to that too.


Re: Proposal to use data_migrate gem for API

Henne Vogelsang-2
Hey,

On 11.09.2017 13:57, Evan Rolfe wrote:
> On 11/09/17 12:43, Adrian Schröter wrote:
>
>> Could we run the data modifications also always when "db:migrate"
>> is called?
> Yes I'm sure thats possible but it would defeat the purpose of having
> data migrations separate from database migrations.

The point is defaults. So far the default was that all kinds of
migrations were lumped together. So we have two options:

1. Change the default, make our users aware of `rake db:migrate:with_data`

2. Don't change the default, make `db:migrate` do what
   `db:migrate:with_data` does. Make another rake task that does the
   same as `db:migrate` does today, for people who want to untangle
   migration kinds.

I'm sure you can already guess which one I'd really like to avoid ;-)
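
One way option 2 could be approached is by appending an extra action to an existing task with Rake's `Task#enhance`. The sketch below is a hedged illustration using hypothetical task names (`toy:schema_migrate`, `toy:data_migrate`) and a log array in place of real work; it does not touch the real `db:migrate` task and is not taken from the OBS code base.

```ruby
require 'rake'
include Rake::DSL # makes `task` and `namespace` available outside a Rakefile

RUN_LOG = [] # records which toy tasks ran, in order

# Hypothetical stand-ins for the schema and data migration tasks.
namespace :toy do
  task(:schema_migrate) { RUN_LOG << :schema }
  task(:data_migrate)   { RUN_LOG << :data }
end

# Keep a single default command: after the schema step has run,
# also invoke the data step.
Rake::Task['toy:schema_migrate'].enhance do
  Rake::Task['toy:data_migrate'].invoke
end

Rake::Task['toy:schema_migrate'].invoke
p RUN_LOG
```

With this shape, people who follow the old instructions still get both kinds of migration, while a separate task can remain available for running only one kind.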

Henne

--
Henne Vogelsang
http://www.opensuse.org
Everybody has a plan, until they get hit.
        - Mike Tyson

Re: Proposal to use data_migrate gem for API

Björn Geuken
In reply to this post by Evan Rolfe
Sounds good to me. Let's try it out!

Björn

--
Björn Geuken - Rails Developer - Open Build Service
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)

Re: Proposal to use data_migrate gem for API

Björn Geuken
In reply to this post by Evan Rolfe
On 09/11/2017 11:53 AM, Evan Rolfe wrote:

> On 08/09/17 08:24, Adrian Schröter wrote:
>
>> We just need to ensure that there isn't another
>>    migration afterwards, which requires a downtime, right?
>
> This is the part that concerns me, how do we ensure that? If you have a
> data-migration which takes 3 hours but is lumped in with the other
> database migrations how are we going to ensure that the 3 hour
> data-migration is run without downtime but the other migrations are run
> with downtime?

Well, since we have to run the migrations manually, we could just run
them independently, assuming the migrations themselves don't depend on
each other.

This might not be an option for other people that host OBS, though they
are probably not affected by this problem as much as we are, since we
host a public service.

Björn


Re: Proposal to use data_migrate gem for API

Evan Rolfe
In reply to this post by Evan Rolfe

Just for the record, we've agreed in a meeting that we will keep database structure migrations separate from database content migrations, the latter of which will be handled by the data_migrate gem. This means that we will need to run `rake db:migrate:with_data` instead of just `rake db:migrate`, so we will need to update the appliance to run that command too.

