lp:~rakhmerov/mistral/master

Created by Renat Akhmerov on 2013-11-06 and last modified on 2017-10-16
Get this branch:
bzr branch lp:~rakhmerov/mistral/master

Branch information

Owner:
Renat Akhmerov
Project:
Mistral
Status:
Development

Import details

Import Status: Reviewed

This branch is an import of the HEAD branch of the Git repository at https://github.com/stackforge/mistral.git.

The next import is scheduled to run in 3 hours.

Last successful import was 2 hours ago.

Import started 2 hours ago on pear and finished 2 hours ago taking 25 seconds
Import started 8 hours ago on pear and finished 8 hours ago taking 25 seconds
Import started 14 hours ago on pear and finished 14 hours ago taking 25 seconds
Import started 20 hours ago on pear and finished 20 hours ago taking 30 seconds
Import started on 2017-10-16 on pear and finished on 2017-10-16 taking 30 seconds
Import started on 2017-10-16 on pear and finished on 2017-10-16 taking 20 seconds
Import started on 2017-10-16 on pear and finished on 2017-10-16 taking 30 seconds
Import started on 2017-10-15 on pear and finished on 2017-10-15 taking 25 seconds
Import started on 2017-10-15 on pear and finished on 2017-10-15 taking 25 seconds
Import started on 2017-10-15 on pear and finished on 2017-10-15 taking 30 seconds

Recent revisions

1847. By Zuul <email address hidden> on 2017-10-16

Merge "Revert "Enable eventlet monkey patching for MySQLdb driver""

1846. By Zuul <email address hidden> on 2017-10-16

Merge "Add yaml and json parsing functions"

1845. By Mike Fedosin <email address hidden> on 2017-10-16

Optimize mistral queries for 'get_task_executions'

During workflow execution there are a lot of calls to the function
'find_task_executions_by_name' from lookup_utils. This function
internally calls 'get_task_executions' with two filters: name and
workflow_execution_id.

To optimize this call it's reasonable to add a composite db index
on these two fields.

Another optimization is to explicitly disable sorting by 'created_at',
because it's unnecessary in this particular case.

Change-Id: Ife150a384575cc4ad2fe45dc093816b48268df6f
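
The effect of a composite index on a two-column lookup can be illustrated with a small sketch. It uses the standard-library sqlite3 module rather than Mistral's own database layer, and the table, column, and index names are assumptions chosen for illustration, not taken from Mistral's models or migrations. The query also omits ORDER BY, mirroring the decision to skip sorting by 'created_at'.

```python
import sqlite3

# Hypothetical schema: table, column, and index names are illustrative only
# and are not copied from Mistral's models or migrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task_executions (
        id TEXT PRIMARY KEY,
        name TEXT,
        workflow_execution_id TEXT,
        state TEXT,
        created_at TEXT
    )
    """
)

# Composite index covering both filter columns used by the lookup.
INDEX_NAME = "task_executions_name_wf_ex_id"
conn.execute(
    "CREATE INDEX " + INDEX_NAME +
    " ON task_executions (name, workflow_execution_id)"
)

# No ORDER BY clause: the caller does not need rows sorted by created_at.
LOOKUP_SQL = (
    "SELECT id, state FROM task_executions "
    "WHERE name = ? AND workflow_execution_id = ?"
)


def lookup_task_executions(conn, wf_ex_id, name):
    """Return (id, state) rows matching both filters."""
    return conn.execute(LOOKUP_SQL, (name, wf_ex_id)).fetchall()


def query_plan(conn):
    """Return SQLite's textual query plan for the lookup statement."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + LOOKUP_SQL, ("task1", "wf1")
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


conn.execute(
    "INSERT INTO task_executions VALUES "
    "('1', 'task1', 'wf1', 'SUCCESS', '2017-10-16')"
)
conn.execute(
    "INSERT INTO task_executions VALUES "
    "('2', 'task1', 'wf2', 'RUNNING', '2017-10-16')"
)
```

Calling query_plan(conn) should report that the lookup is served from the composite index instead of a full table scan. The real change targets the databases Mistral supports, where the planner output looks different.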

1844. By Jenkins <email address hidden> on 2017-10-14

Merge "Enable eventlet monkey patching for MySQLdb driver"

1843. By Jenkins <email address hidden> on 2017-10-13

Merge "[Event-engine] Make listener pool name configurable"

1842. By Jenkins <email address hidden> on 2017-10-13

Merge "Updated from global requirements"

1841. By Jenkins <email address hidden> on 2017-10-13

Merge "Use get_rpc_transport instead of get_transport"

1840. By Renat Akhmerov on 2017-10-09

Make scheduler delay configurable

* Made scheduler delay configurable. It now consists of a fixed
  part configured with the 'fixed_delay' property and a random
  addition limited by the 'random_delay' config property.
  Because of this, loopingcall from oslo was replaced with
  a regular loop in a separate thread because loopingcall
  supports only fixed delays.

Closes-Bug: #1721733
Change-Id: I8f6a15be339e208755323afb18e4b58f886770c1
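
A loop with a fixed delay plus a bounded random addition might look roughly like the sketch below. Only 'fixed_delay' and 'random_delay' come from the commit message. The class and function names are hypothetical, and the choice of a uniform distribution and of threading.Event for sleeping are assumptions, not a description of Mistral's scheduler code.

```python
import random
import threading


def next_delay(fixed_delay, random_delay):
    """Fixed part plus a random addition in the range [0, random_delay]."""
    # Assumption: the random part is drawn uniformly.
    return fixed_delay + random.uniform(0, random_delay)


class PeriodicRunner:
    """Hypothetical helper: calls func repeatedly in a separate thread."""

    def __init__(self, func, fixed_delay, random_delay):
        self._func = func
        self._fixed_delay = fixed_delay
        self._random_delay = random_delay
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Wake the loop up if it is sleeping and wait for it to exit.
        self._stopped.set()
        self._thread.join()

    def _loop(self):
        while not self._stopped.is_set():
            self._func()
            # Event.wait doubles as an interruptible sleep.
            delay = next_delay(self._fixed_delay, self._random_delay)
            self._stopped.wait(delay)
```

Randomizing the delay spreads out the moments at which several scheduler instances poll, so they are less likely to all wake up at the same time.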

1839. By Renat Akhmerov on 2017-10-09

Optimize sending result to parent workflow

* When a subworkflow completes, it sends its result to the parent
  workflow through the scheduler (a delayed call), which operates
  through the database and has a delay between iterations.
  This patch optimizes this by reusing the already existing
  decorator @action_queue.process to make the RPC calls that convey
  subworkflow results outside of a DB transaction, in the same
  way as we schedule action runs after completion of a task.
  The main reason for making this change is how Scheduler now
  works in HA mode. In fact, it doesn't scale well because
  every Scheduler instance keeps querying the DB for delayed calls
  eligible for processing, and hence in an HA setup many Schedulers
  often take the same delayed calls and clash with each other,
  causing DB deadlocks in MySQL. They are caused just by the MySQL
  locking model (it's documented in their docs) so we have
  means to handle them. However, Scheduler still remains a
  bottleneck in the system and it's better to reduce the load
  on it as much as possible.
  One more reason to make this change is that we don't solve
  the problem of eliminating the possibility of losing RPC
  messages (when a DB TX is committed and the RPC call is not made
  yet) with Scheduler anyway. If we use Scheduler for scheduling
  RPC calls we just shift the place where DB and MQ can get out
  of sync to the Scheduler. So, in other words, it is a fundamental
  problem of syncing two external data sources which can't be
  naturally enrolled into one distributed transaction.
  Based on our experience of running big workflows we concluded
  that simplification of network protocols gives better results,
  meaning that the fewer components we use for network
  communications the better. Eventually it increases performance
  and reduces the load on the system and also reduces the
  probability of having DB and MQ out of sync.
  We used to use Scheduler for running actions on executors too by
  scheduling RPC calls but at some point we saw that it reduced
  performance by 40-50% without bringing any real benefits at
  this expense. On the contrary, Scheduler became an even worse
  bottleneck because of this. So we decided to eliminate the
  Scheduler from this chain and the system became practically
  much more performant and reliable. So now I did the same
  with delivering a subworkflow result.
  I believe when it comes to recovering from situations of
  DB and MQ being out of sync we need to come up with special
  tools that will assume some minimal human intervention
  (although I think we can recover some things automatically).
  Such a tool should just make it very obvious what's broken
  and how to fix it, and make it convenient to fix it (restart
  a task/action etc.).
* Processing the action queue now happens within a new greenthread
  because otherwise the Mistral engine can get into a deadlock
  by sending a request to itself while processing another one.
  It can happen if we use blocking RPC which is the only option
  for now.
* Other small fixes

Change-Id: Ic3cf6c47bba215dc6a13944b0585cce59e4e88f9
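
The idea of collecting RPC calls while a transaction is open and sending them only after it finishes can be sketched as follows. This is not Mistral's action_queue module: the 'schedule' and 'process' names, the thread-local queue, and the demo function are all hypothetical. The sketch also dispatches synchronously, whereas the commit describes processing the queue in a new greenthread.

```python
import functools
import threading

# Hypothetical per-thread storage for calls queued during one operation.
_local = threading.local()


def schedule(call):
    """Queue a zero-argument callable to run after the operation ends."""
    queue = getattr(_local, "queue", None)
    if queue is None:
        raise RuntimeError("schedule() used outside a @process function")
    queue.append(call)


def process(func):
    """Run func, then dispatch the calls it queued (hypothetical decorator)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _local.queue = []
        try:
            # Stands in for the code that runs inside the DB transaction.
            result = func(*args, **kwargs)
            queued = _local.queue
        finally:
            # Clearing here also drops queued calls if func raised.
            _local.queue = None
        # Dispatch only after the "transaction" part has finished.
        for call in queued:
            call()
        return result

    return wrapper


events = []


@process
def complete_subworkflow():
    """Demo: the queued 'RPC call' runs after the function body returns."""
    events.append("tx-start")
    schedule(lambda: events.append("rpc-sent"))
    events.append("tx-commit")
    return "ok"
```

If the process dies after the function body finishes but before the queued calls are dispatched, the message is lost. That is the DB and MQ mismatch the commit message calls a fundamental problem, and the sketch does not attempt to solve it.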

1838. By Jenkins <email address hidden> on 2017-10-06

Merge "Mistral fails on RabbitMQ restart"

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)
This branch contains Public information 
Everyone can see this information.