~maas-committers/maas/+git/temporal:main

Last commit made on 2024-05-21
Get this branch:
git clone -b main https://git.launchpad.net/~maas-committers/maas/+git/temporal

Recent commits

21b04a1... by Yichao Yang <email address hidden>

Account for state only changes in LastWriteVersion (#5872)

## What changed?
- Account for state-only changes in LastWriteVersion.
- As a consequence, LastWriteVersion can change after the workflow is closed.
- NOTE: this PR is stacked on top of
https://github.com/temporalio/temporal/pull/5860

## Why?
- Account for state only changes

## How did you test it?
- Existing tests & new unit tests

f2b6497... by Yichao Yang <email address hidden>

Refresh sub state machine tasks (#5906)

## What changed?
- Refresh sub state machine tasks

## Why?
- Refresh sub state machine tasks

## How did you test it?
- Added unit test

81e65a8... by Yichao Yang <email address hidden>

Allow updating workflow current version after close (#5860)

## What changed?
- Allow updating workflow current version after close

## Why?
- The workflow state machine can still be updated after the workflow has
closed from the user's point of view.

## How did you test it?
- Existing tests & added new unit tests

1d9289b... by Yichao Yang <email address hidden>

Move update proto to persistence proto package (#5940)

## What changed?
- Move update proto to persistence proto package

## Why?
- Avoid a cyclic dependency when later adding proto messages defined in the
persistence package (specifically, `VersionedTransition` in this case)
to UpdateInfo.

## How did you test it?
- Existing test

711eccb... by David Reiss <email address hidden>

Move most dynamicconfig tests to external package (#5957)

## What changed?
Move most unit tests to external package, except two for an internal
helper function.

## Why?
To write tests the way a client of the package would use it, which usually
produces better tests.

## How did you test it?
The change itself consists of tests.

0bf37a0... by David Reiss <email address hidden>

Fix schedule_action_delay metric for buffered actions (#5884)

## What changed?
- Calculate "delay" for buffered actions from previous action close.
- Update a few logs.

## Why?
If an action is held up by waiting for a previous action to finish (e.g.
BufferOne, BufferAll, CancelOther overlap policies), it's not fair to
count the waiting time as the "delay" for that action.

## How did you test it?
Tested locally by building up a backlog and checking that the metric
didn't increase (and did before this change). Also tested upgrade and
downgrade.

## Potential risks
This is touching workflow code, but is compatible because:
- If it sees nil for the close time or desired time, that's fine, it
just falls back to actual time.
- All the behavior changes are in logs and metrics only.

d81c953... by Shahab Tajik <email address hidden>

Fix some flaky versioning tests (#5954)

## What changed?
Fixing some flaky tests.

797bbdf... by Tim Deeb-Swihart <email address hidden>

Reconnect to SQL databases when connections fail (#5926)

## What changed?

Both our PostgreSQL and MySQL database backends will now automatically
reconnect to the database when certain errors occur: all errors chosen
have been experienced when testing this behavior through an AWS Aurora
RDS failover of either MySQL or PostgreSQL.

For both backends we will reconnect when we see:
- `ECONNRESET`
- `ECONNABORTED`
- `ECONNREFUSED`
- `io.EOF`
- `io.ErrUnexpectedEOF`
- `database/sql/driver.ErrBadConn`

For PostgreSQL we will also reconnect on the following SQLSTATEs:
- `25006` read-only transaction
- `57P03` cannot connect now
- `0A000` feature not supported, but ONLY when the message is `cannot
set transaction read-write mode during recovery`

For MySQL we will also reconnect when we see the following error codes:
- `1040` too many connections
- `1792` read-only transaction (SQLstate `25006`)
- `1836` running in read-only mode

This logic is easily extensible should we discover more failure modes
over time.

## Why?

We've had multiple community reports of Temporal problems during RDS
failover. One part of this is the fact that we wouldn't necessarily
reconnect; we were at the whims of our chosen SQL abstraction's
connection pooling logic.

## How did you test it?

I manually tested this functionality in the presence of repeated RDS
failovers:

- [x] postgres12 plugin with pq driver
- [x] postgres12 plugin with pgx driver
- [x] mysql plugin

Automated testing will be added to our regular testing pipelines once
our infrastructure friends have added the support I need (it's in
progress).

## Potential risks

We're concerned there's a correctness issue in our PostgreSQL backend
that's related to our behavior during an RDS failover. If we merge this
before I figure out what's going on, we could hide the issue and make it
harder to reproduce.

## Documentation
N/A

## Is hotfix candidate?
Yes?

ab8bf5e... by Rob Holland <email address hidden>

Allow DescribeHistoryHost requests without namespace set (#5949)

## What changed?
Exclude `DescribeHistoryHost` requests from namespace interceptor. Fixes
#5933.

## Why?
`Namespace` field is optional on the request, and it should be processed
regardless of presence or namespace state.
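
One way to picture the exclusion is an allow-list consulted before namespace validation runs. The sketch below is illustrative only: `namespaceCheckExempt` and `skipNamespaceCheck` are hypothetical names, the service path in the usage note is made up, and the real interceptor may be structured differently.

```go
package main

import "strings"

// namespaceCheckExempt holds RPC method names that bypass namespace
// validation (hypothetical; not the interceptor's real data structure).
var namespaceCheckExempt = map[string]struct{}{
	"DescribeHistoryHost": {},
}

// skipNamespaceCheck takes a gRPC full method name of the form
// "/package.Service/Method" and reports whether the method is exempt.
func skipNamespaceCheck(fullMethod string) bool {
	// Keep only the part after the last slash; if there is no slash,
	// LastIndex returns -1 and the whole string is used.
	method := fullMethod[strings.LastIndex(fullMethod, "/")+1:]
	_, ok := namespaceCheckExempt[method]
	return ok
}
```

Under these assumptions, `skipNamespaceCheck("/example.AdminService/DescribeHistoryHost")` returns true, so the request is processed whether or not `Namespace` is set.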

## How did you test it?
Tested with `tdbg history-host describe --history-address
<history-host-ip:port>`

## Is hotfix candidate?
Probably not. I don't know how commonly or how urgently the history host
describe command is needed.

931cbb8... by Rodrigo Zhou <email address hidden>

Run tests in cloud branch (#5955)

## What changed?
Run tests in cloud branch

## Why?
Verify cherry picked commits in cloud branch are good.
