~maas-committers/maas/+git/temporal:ppv/wfcache

Last commit made on 2023-12-05
Get this branch:
git clone -b ppv/wfcache https://git.launchpad.net/~maas-committers/maas/+git/temporal

Branch merges

Branch information

Name:
ppv/wfcache
Repository:
lp:~maas-committers/maas/+git/temporal

Recent commits

440832d... by Prathyush PV

Fixing test

ba7e75f... by Prathyush PV

Fixing race

384e24e... by Prathyush PV

minor fixes

a61e7fb... by Prathyush PV

Min-max cache for MutableStates

881b4b4... by pdoerner <email address hidden>

Always set RPS on dynamic rate limiter refresh (#5180)

## What changed?
Always set the RPS of the dynamic rate limiter when it is refreshed

## Why?
Previously, changes made to `rateFn` were applied only when the dynamic
multiplier needed to be adjusted. The rate function should always be
checked to pick up any changes to the maximum RPS dynamic config.
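
A minimal Go sketch of the idea, assuming hypothetical names
(`dynamicRateLimiter`, `rateFn`, and `refresh` are illustrative, not the
actual server identifiers): the refresh path re-evaluates the rate
function unconditionally instead of only when the dynamic multiplier
changes.

```go
// Illustrative sketch only; identifiers are assumptions, not the actual
// Temporal dynamic rate limiter implementation.
package ratelimitsketch

import "golang.org/x/time/rate"

type dynamicRateLimiter struct {
	limiter *rate.Limiter
	rateFn  func() float64 // reads the current max RPS from dynamic config
}

// refresh is called periodically. The fix described above is to always
// re-evaluate rateFn and apply the result, rather than only when the
// dynamic multiplier changes, so changes to the max RPS dynamic config
// are always picked up.
func (d *dynamicRateLimiter) refresh() {
	d.limiter.SetLimit(rate.Limit(d.rateFn()))
}
```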

## How did you test it?
Existing tests

## Potential risks
Extra rate limiting

## Is hotfix candidate?

aa34d01... by Yichao Yang <email address hidden>

Fix default persistence request priority (#5177)

f602a59... by Shahab Tajik <email address hidden>

Prevent forwarding and sync match when there is a backlog (#4982)

**What changed?**
- Added a config to define a threshold for *negligible* backlog based on
the age of the oldest task in the backlog. The value is configurable per
TQ, but planning to set it to 5 secs by default in prod. A backlog older
than this threshold is considered non-negligible.
- Stop forwarding tasks when there is a non-negligible backlog (see the
sketch after this list).
- This in turn stops pollers from being forwarded in the presence of a
non-negligible backlog.
- This ensures that all the partitions accumulate and process backlog at
the same rate. Effectively, the backlog age difference between partitions
should not grow significantly larger than the negligibility threshold
(e.g. 5 secs).
- There is an exception to this: forwarding tasks is allowed when the
partition has not received a poll recently (configurable, default
200ms). This ensures that in a TQ with few pollers, tasks and pollers
do not get stuck in different partitions for a long time.
- Prevent sync match when there is a non-negligible backlog. This is
already mostly the case (i.e. history tasks are all spooled and only
sync matched in race conditions when there is a backlog). However,
forwarded tasks are also sync matched in the root partition. In effect,
this change makes sure that the root partition does not dispatch
forwarded tasks when it has a backlog of its own. This helps further
with keeping the partitions balanced.
- Also added the following metrics to be able to monitor poll and task
dispatch behavior at the TQ partition level:
    - `poll_latency` with `forwarded=false/true`
    - `task_dispatch_latency` with `forwarded=false/true` and
      `source=History/DbBacklog`
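
A minimal Go sketch of the forwarding rule described above, under stated
assumptions: the type, field, and function names (`partitionState`,
`forwardConfig`, `shouldForwardTask`, etc.) are hypothetical and not the
actual matching-service identifiers.

```go
// Illustrative sketch of the forwarding decision described in this
// commit; identifiers are hypothetical, not the real matching code.
package matchingsketch

import "time"

type partitionState struct {
	oldestBacklogTaskCreateTime time.Time // zero if the backlog is empty
	lastPollTime                time.Time // last time a poller hit this partition
}

type forwardConfig struct {
	negligibleBacklogThreshold time.Duration // e.g. 5s per task queue
	maxPollWaitBeforeForward   time.Duration // e.g. 200ms
}

// shouldForwardTask returns true when forwarding (and hence sync match)
// is allowed: either the backlog is negligible, or the partition has not
// seen a poller recently, in which case forwarding anyway avoids tasks
// and pollers being stuck on different partitions.
func shouldForwardTask(p partitionState, cfg forwardConfig, now time.Time) bool {
	backlogNegligible := p.oldestBacklogTaskCreateTime.IsZero() ||
		now.Sub(p.oldestBacklogTaskCreateTime) < cfg.negligibleBacklogThreshold
	if backlogNegligible {
		return true
	}
	// Exception: no recent poll on this partition.
	return now.Sub(p.lastPollTime) > cfg.maxPollWaitBeforeForward
}
```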

**Why?**
With the previous forwarding logic, when a backlog started to form, the
root partition effectively received more pollers and was able to process
its backlog faster. This mixed up the order of dispatching tasks: a new
task could be dispatched faster by going to the root partition while
older tasks were backlogged in other partitions. This change prevents
that situation.

**How did you test it?**
- Unit tests
- Manually verified against 9 workload+worker configurations as defined
here:
https://github.com/ShahabT/temporal-tests/blob/main/balanced-partitions/experiments.go

**Potential risks**
The change is config-protected and disabled by default (by setting a
very high threshold). It will be enabled by default in a future release
after successful verification in prod workloads.

**Is hotfix candidate?**
No.

bbb5b2d... by ast2023 <email address hidden>

reimplement nextBackoffInterval tests to cover functional requirements (#5161)

## What changed?
Reimplemented the unit tests for `nextBackoffInterval`.

## Why?
The original tests' descriptions and functionality did not match, so it
was hard to tell what they were testing. This is the most readable
approach I could think of today. I tried a table-based approach, but I
think it was less readable because the call to the function under test
was hidden.
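
For illustration, a hedged sketch of the explicit (non-table-driven)
test style described above; the `nextBackoffInterval` stand-in, its
signature, and the test names are assumptions, not the actual change.

```go
// Illustrative sketch of the explicit test style; the function signature
// and behavior here are stand-ins, not the real retry code.
package retrysketch

import (
	"testing"
	"time"
)

// nextBackoffInterval is a stand-in for the function under test: it
// grows the initial interval by the coefficient on each attempt.
func nextBackoffInterval(attempt int, initial time.Duration, coefficient float64) time.Duration {
	d := float64(initial)
	for i := 1; i < attempt; i++ {
		d *= coefficient
	}
	return time.Duration(d)
}

// Each test spells out the call and the expectation inline, so the reader
// can see exactly what is exercised without decoding a table row.
func TestNextBackoffInterval_FirstAttemptUsesInitialInterval(t *testing.T) {
	got := nextBackoffInterval(1, time.Second, 2.0)
	if got != time.Second {
		t.Fatalf("expected 1s, got %v", got)
	}
}

func TestNextBackoffInterval_GrowsByCoefficient(t *testing.T) {
	got := nextBackoffInterval(3, time.Second, 2.0)
	if got != 4*time.Second {
		t.Fatalf("expected 4s, got %v", got)
	}
}
```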

## How did you test it?
Ran the tests.

## Potential risks
Some edge cases were not covered by the tests.

## Is hotfix candidate?
No

c6aef53... by Alex Shtin <email address hidden>

ScheduleToStart timeout for speculative WT on normal task queue (#5146)

**What changed?**
Add `ScheduleToStart` timeout for speculative WT on normal task queue.

**Why?**
`UpdateWorkflowExecution` API handler tries to add the WT to matching
directly. This call must be made outside of the workflow lock and might
fail (if matching is temporarily unavailable, for example). There is no
good way to handle this error after the lock has been released. This is
why I introduced this `ScheduleToStart` timeout for speculative WT on
the normal task queue. This timeout ensures that the WT will eventually
be added to matching and the workflow won't get stuck.

Also removed the special flag that was used to skip transfer task
creation when a speculative WT is converted to normal before the MS
with the speculative WT is persisted.

Both of these speculative WT logic changes (the timeout on a normal
task queue and skipping transfer task creation) are extracted into a
dedicated `GenerateScheduleSpeculativeWorkflowTaskTasks` method.
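
A minimal sketch of the idea, assuming hypothetical names (the real
logic lives in `GenerateScheduleSpeculativeWorkflowTaskTasks`, which
this does not reproduce): when a speculative WT targets a normal task
queue, also emit a ScheduleToStart timeout timer so the WT is
re-added to matching even if the direct add-to-matching call failed.

```go
// Illustrative sketch only; names and structure are assumptions, not the
// real history-service implementation.
package speculativewt

import "time"

type workflowTask struct {
	ScheduledEventID int64
	TaskQueue        string
	Speculative      bool
}

type timerTask struct {
	FireAt           time.Time
	ScheduledEventID int64
	Kind             string
}

// generateSpeculativeWTTasks mirrors the described behavior: for a
// speculative WT on a normal (named) task queue, emit a ScheduleToStart
// timeout timer so the WT is eventually re-added to matching even if the
// initial direct add-to-matching call (made outside the workflow lock)
// failed.
func generateSpeculativeWTTasks(wt workflowTask, scheduleToStartTimeout time.Duration, now time.Time) []timerTask {
	if !wt.Speculative {
		return nil
	}
	return []timerTask{{
		FireAt:           now.Add(scheduleToStartTimeout),
		ScheduledEventID: wt.ScheduledEventID,
		Kind:             "WORKFLOW_TASK_SCHEDULE_TO_START_TIMEOUT",
	}}
}
```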

**How did you test it?**
Added functional test.

**Potential risks**
Speculative WT times out once if there is no worker available.

**Is hotfix candidate?**
No.

691bd20... by ast2023 <email address hidden>

Mutable state approximate size should not change if we do not retry the activity (#5175)

## What changed?
Keep the approximate size of the mutable state unchanged when an
activity cannot be retried because of a deadline exceeded or another
error returned by `nextBackoffInterval`.

## Why?
The previous behavior was wrong: the mutable state size was reduced
before the activity was replaced.
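
A hedged Go sketch of the corrected invariant (field and function names
are illustrative, not the actual mutable state code): the approximate
size is only adjusted once it is certain the activity info will be
replaced; if the retry is rejected, the size stays unchanged.

```go
// Illustrative sketch; not the actual mutable state implementation.
package mssketch

import "errors"

type activityInfo struct {
	payloadSize int
}

type mutableState struct {
	approximateSize int
}

var errRetryNotAllowed = errors.New("retry not allowed: deadline exceeded or non-retryable error")

// retryActivity only touches approximateSize once it is certain the old
// activity info will be replaced; on an early error the size is untouched.
func (ms *mutableState) retryActivity(old, updated *activityInfo, allowRetry bool) error {
	if !allowRetry {
		// The fix: previously the size was reduced before this point,
		// i.e. before knowing whether the activity would be replaced.
		return errRetryNotAllowed
	}
	ms.approximateSize -= old.payloadSize
	ms.approximateSize += updated.payloadSize
	return nil
}
```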

## How did you test it?
Unit tests

## Potential risks
Not sure what persistence was doing with the wrong size.

## Is hotfix candidate?
No