Mistral

Cache of workflow spec

Bug #1738769 reported by Vitalii Solodilov on 2017-12-18

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Mistral	Fix Released	Medium	Vitalii Solodilov	Mistral rocky-3 "r3"

Bug Description

Hi, Mistral team.

There are some problems with the caching of the workflow spec when we run multiple instances of the Mistral Engine.
For example, say we execute a workflow which has 25 serial tasks. If we have only one Mistral Engine, the workflow spec is transformed into a Python object only once. But if we have, for example, 10 Mistral Engines, the workflow spec is transformed into a Python object 10 times! https://github.com/openstack/mistral/blob/master/mistral/lang/parser.py#L214
Every transformation takes from 1 to 1.5 seconds, and it adds 12 seconds to the execution time of every workflow.
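
To make the effect concrete, here is a minimal sketch of a process-local cache keyed by execution id. The Engine class, its methods, and the round-robin dispatch are hypothetical stand-ins for illustration, not Mistral's real code; the sketch only assumes that each engine keeps its own cache and that the tasks of one execution can land on any engine.

```python
# Hypothetical stand-ins for illustration; these are not Mistral's classes.
class Engine:
    def __init__(self, name):
        self.name = name
        self._spec_cache = {}  # process-local: not shared with other engines
        self.parse_count = 0

    def _parse(self, definition):
        # Stands in for the expensive "text -> Python object" transformation.
        self.parse_count += 1
        return {"tasks": definition.split(",")}

    def get_spec(self, execution_id, definition):
        # Parse only on a cache miss in THIS engine's cache.
        if execution_id not in self._spec_cache:
            self._spec_cache[execution_id] = self._parse(definition)
        return self._spec_cache[execution_id]


def run_workflow(engines, execution_id, definition, task_count):
    # Assumption: tasks are spread over the engines round-robin.
    for i in range(task_count):
        engines[i % len(engines)].get_spec(execution_id, definition)
    return sum(engine.parse_count for engine in engines)


definition = ",".join(f"task{i}" for i in range(25))

single = run_workflow([Engine("e0")], "exec-1", definition, 25)
many = run_workflow([Engine(f"e{i}") for i in range(10)], "exec-1", definition, 25)
print(single, many)  # 1 10

# With the reported 1 to 1.5 seconds per transformation:
low, high = many * 1.0, many * 1.5
print(low, high)  # 10.0 15.0
```

Under these assumptions, 10 engines cost between 10 and 15 seconds of parsing per workflow, which is in line with the 12 seconds reported above.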

There are some possible solutions:
* To use a distributed cache
* To serialize wf_v2.WorkflowSpec to the database as a BLOB
* To bind the execution of a workflow to one Mistral Engine
* Don't cache by execution id, cache only by workflow id. But it doesn't help in my case. I generate a workflow for every execution.

Dougal Matthews (d0ugal) on 2018-04-06

Changed in mistral:
status:	New → Triaged
importance:	Undecided → Medium
milestone:	none → rocky-1

Vitalii Solodilov (mcdoker18) on 2018-04-16

Changed in mistral:
assignee:	nobody → Vitalii Solodilov (mcdoker18)

OpenStack Infra (hudson-openstack) on 2018-04-16

Changed in mistral:
status:	Triaged → In Progress

Dougal Matthews (d0ugal) on 2018-04-20

Changed in mistral:
milestone:	rocky-1 → rocky-2

Renat Akhmerov (rakhmerov) wrote on 2018-05-17:

Vitalii, this is not a bug. It's a trade-off that we arrived at: a trade-off between using a distributed cache (which has a lot of downsides too) and not caching at all.

A spec will be parsed 10 times, yes, but only once per engine, which is OK. For big workflows it's helpful anyway, because caches warm up quickly on any operation that an engine performs (such as "on_action_complete") and the rest of the workflow runs faster.

Renat Akhmerov (rakhmerov) wrote on 2018-05-17:

Answering your points:

* To use a distributed cache

[renat]: Maybe, if it's proven to work better (more reliably and faster). Keep in mind that a distributed cache requires interprocess communication and other things.

* To serialize wf_v2.WorkflowSpec to the database as a BLOB

[renat]: How is it going to help? When we deserialize, we'll still need to build a Python object from it. Fetching from the DB and building an object may be more expensive on average over the workflow life cycle than building an object once (per engine) at the beginning and spending zero time after that. But again, if we can check that idea, it could help.

* To bind the execution of a workflow to one Mistral Engine

[renat]: No. We can't do that, for a huge number of reasons (can discuss separately).

* Don't cache by execution id, cache only by workflow id. But it doesn't help in my case. I generate a workflow for every execution.

[renat]: this won't be enough. The idea is to be able to change a workflow definition any time without affecting those executions that are already running. One thing we can do here is to implement workflow definition versioning (keep all changes). In this case yes, we can cache by workflow definitions only (id + version) because execution will have a version of the definition.
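
The versioning idea can be sketched as a cache keyed by (definition id, version). This is a minimal illustration only: VersionedSpecCache, the toy parse function, and the sample data are all hypothetical, and Mistral had no definition versioning at the time of this discussion.

```python
# Hypothetical sketch; none of these names come from Mistral.
class VersionedSpecCache:
    def __init__(self, parse):
        self._parse = parse
        self._specs = {}
        self.parse_count = 0

    def get(self, definition_id, version, text):
        # Executions pinned to the same (id, version) share one parsed spec.
        key = (definition_id, version)
        if key not in self._specs:
            self.parse_count += 1
            self._specs[key] = self._parse(text)
        return self._specs[key]


def parse(text):
    # Toy parser: a "spec" is just a tuple of task names.
    return tuple(text.split(","))


cache = VersionedSpecCache(parse)

# Version 2 was created while executions of version 1 were still running.
definitions = {("wf", 1): "a,b", ("wf", 2): "a,b,c"}
executions = {"exec-1": ("wf", 1), "exec-2": ("wf", 1), "exec-3": ("wf", 2)}

specs = {
    execution_id: cache.get(def_id, version, definitions[(def_id, version)])
    for execution_id, (def_id, version) in executions.items()
}

print(cache.parse_count)                   # 2
print(specs["exec-1"] is specs["exec-2"])  # True
print(specs["exec-1"], specs["exec-3"])    # ('a', 'b') ('a', 'b', 'c')
```

Because each execution records the version it started with, changing the definition does not affect executions that are already running. As the reporter notes, this would not help when a new workflow is generated for every execution, since every key would then be unique.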

Dougal Matthews (d0ugal) on 2018-06-07

Changed in mistral:
milestone:	rocky-2 → rocky-3

Vitalii Solodilov (mcdoker18) wrote on 2018-07-29:

mistral-engine-before.svg (3.4 MiB, image/svg+xml)

I used a simple sampling profiler to check the performance benefit. Flame graphs are attached. You can search them for instantiate_spec; for an example of searching a flame graph, see http://www.brendangregg.com/blog/2015-08-11/flame-graph-search.html
In addition, I ran a simple test: I started 10 workflows with 40 sleep (0.01 seconds) tasks which start one after another (1 Mistral engine).
Before fix: 29 seconds.
After fix: 16 seconds.

Vitalii Solodilov (mcdoker18) wrote on 2018-07-29:

mistral-engine-after.svg (2.7 MiB, image/svg+xml)

OpenStack Infra (hudson-openstack) wrote on 2018-07-31: Fix merged to mistral (master)

Reviewed: https://review.openstack.org/586883
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=4bf03d8e7df0ecc418b7069819cf3b59898c1eb8
Submitter: Zuul
Branch: master

commit 4bf03d8e7df0ecc418b7069819cf3b59898c1eb8
Author: Vitalii Solodilov <email address hidden>
Date: Sun Jul 29 18:17:51 2018 +0400

Remove extra a specification validation

    Currently when we get a specification using the instantiate_spec function,
    we always validate their schema and semantics over and over again.
    To prevent it we add new validate parameter to a Spec class.
    The validate parameter must be True when we create a workflow, workbook
    or action using a mistral-api. In all other cases, it must be False.

    Change-Id: Ia450ea9635bc75c204fe031cfeeab154f1d03862
    Closes-Bug: #1738769
    Signed-off-by: Vitalii Solodilov <email address hidden>
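
The approach described in the commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the merged code: SketchSpec, instantiate_spec_sketch, and the toy schema check are hypothetical names, and the only thing taken from the commit message is the idea of a validate parameter that is True on the API path and False everywhere else.

```python
# Hypothetical sketch of the "validate" flag; not the code from the commit.
class SpecValidationError(Exception):
    """Raised when a definition fails the toy schema check."""


class SketchSpec:
    # Counts how often validation runs, so the effect of the flag is visible.
    validation_runs = 0

    def __init__(self, data, validate):
        self._data = data
        if validate:
            self._validate_schema()

    def _validate_schema(self):
        SketchSpec.validation_runs += 1
        if "tasks" not in self._data:
            raise SpecValidationError("workflow must define 'tasks'")


def instantiate_spec_sketch(data, validate=False):
    # Validation is opt-in: only callers that accept new definitions ask for it.
    return SketchSpec(data, validate)


definition = {"tasks": {"t1": {"action": "std.noop"}}}

# API path: a user creates the workflow, so validate once.
instantiate_spec_sketch(definition, validate=True)

# Engine path: the stored definition was already validated, so skip it.
for _ in range(40):
    instantiate_spec_sketch(definition)

print(SketchSpec.validation_runs)  # 1
```

In this sketch the cost of validation is paid once at creation time instead of on every instantiation, which is the saving the commit message describes.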

Changed in mistral:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2018-08-06: Fix included in openstack/mistral 7.0.0.0b3

This issue was fixed in the openstack/mistral 7.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote on 2018-10-09: Change abandoned on mistral (master)

Change abandoned by Vitalii Solodilov (<email address hidden>) on branch: master
Review: https://review.openstack.org/555606
