~maas-committers/maas/+git/temporal:history-service-interceptor

Last commit made on 2023-10-27
Get this branch:
git clone -b history-service-interceptor https://git.launchpad.net/~maas-committers/maas/+git/temporal

Branch information

Name:
history-service-interceptor
Repository:
lp:~maas-committers/maas/+git/temporal

Recent commits

03e5d65... by Stephan Behnke <email address hidden>

Merge branch 'main' into history-service-interceptor

222bdc0... by David Reiss <email address hidden>

Share built images in CI and more sharding for functional tests (#4589)

**What changed?**
- In CI, build most of the tests and dependencies and reuse them in
subsequent steps.
- Additionally pre-build dependencies separately on a weekly schedule
(in buildkite configuration) and reuse them when possible.
- Add more test sharding for functional tests (split the main functional
test suite into three).

**Why?**
Increase parallelism, reduce latency of getting test results, reduce
granularity of retries

**How did you test it?**
lots of testing on buildkite

**Potential risks**

**Is hotfix candidate?**

1706389... by Yimin Chen <email address hidden>

Adjust action metrics (#5043)

<!-- Describe what has changed in this PR -->
**What changed?**
Adjust action metrics

<!-- Tell your future self why have you made these changes -->
**Why?**
To reflect pricing change.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
Ran locally and verified metrics.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**
No

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**

1b0ec47... by Wenquan Xing <email address hidden>

Bugfix: ImportWorkflow should always clear mutable state cache (#5039)

**What changed?**
* ImportWorkflow should always clear "cached" mutable state since
  * shared mutable state will at most be used once
  * shared mutable state may contain changes not persisted to DB

**Why?**
N/A

**How did you test it?**
N/A

**Potential risks**
N/A

**Is hotfix candidate?**
N/A

949b0f9... by pdoerner <email address hidden>

Refactor frontend poll wf tq (#4992)

**What changed?**
- Removed direct calls to persistence for `PollWorkflowTaskQueue` from the
  frontend service.
- The logic to get histories was moved to the matching service.
- The matching service now includes history as part of its
  `PollWorkflowTaskQueueResponse`.
- Actual behavior should be unchanged. The existing logic is kept as part of
  `workflow_handler_deprecated`, since it is needed if frontend gets a
  response from an older matching instance without a history.
- Uses updated protos from
  https://github.com/temporalio/temporal/pull/4968

**Why?**
Frontend shouldn't be responsible for doing database queries.
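
The fallback described under "What changed?" can be pictured with a minimal Go sketch. Everything here is a simplified assumption for illustration: `pollResponse`, `historyLoader`, and `resolveHistory` are hypothetical names, not the types or functions used in the PR. The idea is only that frontend uses the history supplied by matching when it is present and falls back to the legacy path when an older matching instance returns none.

```go
package main

import "fmt"

// pollResponse is a hypothetical, simplified stand-in for the response that
// matching returns to frontend. History is nil when the response comes from
// an older matching instance that does not attach history.
type pollResponse struct {
	TaskToken string
	History   []string
}

// historyLoader represents the legacy (deprecated) path in which frontend
// loads the history itself.
type historyLoader func(taskToken string) ([]string, error)

// resolveHistory prefers the history attached by matching and only falls
// back to the legacy loader when no history was attached.
func resolveHistory(resp pollResponse, legacy historyLoader) ([]string, error) {
	if resp.History != nil {
		return resp.History, nil
	}
	return legacy(resp.TaskToken)
}

func main() {
	legacy := func(taskToken string) ([]string, error) {
		return []string{"loaded-by-frontend:" + taskToken}, nil
	}
	newer := pollResponse{TaskToken: "t1", History: []string{"WorkflowExecutionStarted"}}
	older := pollResponse{TaskToken: "t2"}
	for _, resp := range []pollResponse{newer, older} {
		h, err := resolveHistory(resp, legacy)
		fmt.Println(h, err)
	}
}
```

Keeping the legacy loader behind a single function makes it straightforward to delete once no older matching instances remain.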

**How did you test it?**
Existing tests and new test for error returned to matching from
persistence

**Potential risks**
Inconsistent handling of errors between old logic and new

**Is hotfix candidate?**
No

a14d0ef... by Michael Snowden <email address hidden>

Add tdbg dlq merge for v2 (#5025)

**What changed?**
This PR adds support for `--dlq-version v2` to the `tdbg dlq merge`
command.

**Why?**
Admins need this to be able to re-enqueue tasks.

**How did you test it?**
I added an end-to-end test with a real workflow which does this:
1. Start a workflow whose WFT gets DLQ'd
2. Merge that WFT back
3. Verify the workflow now succeeds.
4. Verify that the DLQ task is deleted.

I also added various unit tests for parameter validation and control
flow to get 100% test coverage.

Note that this won't work for cross-cluster tasks yet. I will add that
and update the history replication DLQ integration test as well in a
follow-up.

**Potential risks**

**Is hotfix candidate?**

9c14ce1... by Rodrigo Zhou <email address hidden>

Parse ExecutionDuration for SQL queries (#5027)

**What changed?**
Support duration expression (eg: "1m", "20h") in SQL queries involving
`ExecutionDuration`.

**Why?**
Better UX in advanced visibility with SQL (already supported with ES).
Raw field is in nanoseconds.
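
As an illustration of the conversion involved, here is a minimal Go sketch that turns a duration expression into the nanosecond value stored in the raw field. The helper name `parseExecutionDuration` is hypothetical, and treating a bare integer as nanoseconds is an assumption; the PR's actual parsing code may differ. The sketch relies only on the standard library's `time.ParseDuration`.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseExecutionDuration is a hypothetical helper that converts a query
// value such as "1m" or "20h" into nanoseconds.
func parseExecutionDuration(expr string) (int64, error) {
	// Assumption: a bare integer is already a nanosecond count.
	if n, err := strconv.ParseInt(expr, 10, 64); err == nil {
		return n, nil
	}
	d, err := time.ParseDuration(expr)
	if err != nil {
		return 0, fmt.Errorf("invalid ExecutionDuration value %q: %w", expr, err)
	}
	return d.Nanoseconds(), nil
}

func main() {
	for _, expr := range []string{"1m", "20h", "1500000000"} {
		ns, err := parseExecutionDuration(expr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d ns\n", expr, ns)
	}
}
```

With this conversion, a comparison against "1m" is evaluated against 60000000000 in the underlying column.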

**How did you test it?**
Created some workflows and sent some queries using those expressions.

**Potential risks**
No.

**Is hotfix candidate?**
No.

6a82c54... by Michael Snowden <email address hidden>

Add tdbg dlq purge for v2 (#5023)

**What changed?**
I added the ability to purge DLQ tasks from `tdbg`.

**Why?**

**How did you test it?**
I added an end-to-end test which starts a workflow that produces
terminally failing tasks, verifies that they're added to the DLQ, and then
uses the new tdbg purge command to delete the DLQ tasks.

**Potential risks**

**Is hotfix candidate?**

c3d53ba... by Michael Snowden <email address hidden>

Remove ExecutableWrapper (#5033)

**What changed?**
I discovered an issue with the `ExecutableWrapper`
[here](https://github.com/temporalio/temporal/blob/df4705d6488485ae12e27f4cb1c719b62c980304/service/history/queues/executable.go#L451).
Because we don't wrap the `Reschedule` method, it adds the unwrapped
version to the rescheduler, which loses all of the wrapped behavior. I
implemented a fix using the "delegate" pattern, where the base instance
holds a reference to the wrapped instance, but it is fraught with peril:
you need to remember to use the delegate everywhere it's applicable, and
you need to remember to set it--also only one delegate can be set, so
you can't have different wrappers for the same object. As a result, I
decided to get rid of the wrapper-based approach, and instead added the
code directly to the base executable implementation.
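
The pitfall described above can be reproduced in a few lines of Go. This is a simplified sketch, not the code from `executable.go`: the `Executable` interface, `rescheduler`, `baseExecutable`, and `wrapper` types below are hypothetical stand-ins. It shows that when a wrapper overrides only some methods, a promoted `Reschedule` runs with the inner value as its receiver, so the rescheduler ends up holding the unwrapped instance.

```go
package main

import "fmt"

// Executable is a simplified stand-in for the real interface.
type Executable interface {
	Execute() string
	Reschedule(r *rescheduler)
}

// rescheduler collects executables that should be run again later.
type rescheduler struct{ queued []Executable }

func (r *rescheduler) Add(e Executable) { r.queued = append(r.queued, e) }

type baseExecutable struct{}

func (b *baseExecutable) Execute() string { return "base" }

// Reschedule hands its own receiver to the rescheduler. The base has no
// knowledge of any wrapper around it.
func (b *baseExecutable) Reschedule(r *rescheduler) { r.Add(b) }

// wrapper embeds an Executable and overrides only Execute.
type wrapper struct{ Executable }

func (w *wrapper) Execute() string { return "wrapped(" + w.Executable.Execute() + ")" }

func main() {
	r := &rescheduler{}
	var e Executable = &wrapper{Executable: &baseExecutable{}}

	fmt.Println(e.Execute()) // prints "wrapped(base)"

	// Reschedule is promoted from the embedded value, so the base adds itself.
	e.Reschedule(r)
	fmt.Println(r.queued[0].Execute()) // prints "base": the wrapped behavior is lost
}
```

Moving the extra behavior into the base implementation, as this PR does, avoids the problem because there is only one object that can be rescheduled.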

**Why?**
The huge refactoring is necessary because the `ExecutableWrapper`
approach is too complicated and risky.

**How did you test it?**
There's still 100% test coverage for the DLQ code. Most importantly,
there is now an end-to-end integration test which starts a workflow that
produces terminally failing tasks, verifies that they're added to the DLQ,
and then uses the tdbg read command to delete the DLQ tasks. This wasn't
possible before due to the bug in the `ExecutableWrapper`.

**Potential risks**

**Is hotfix candidate?**

276d610... by David Reiss <email address hidden>

Use ContinueAsNewSuggested in scheduler workflow (#4990)

**What changed?**
Scheduler workflow uses server-sent suggestion for when to
continue-as-new instead of fixed iteration count.

Note this is not enabled in this PR yet.
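
To make the difference between the two stopping rules concrete, here is a minimal Go sketch. It does not use the Temporal SDK: `runInfo`, `fakeInfo`, and `runIterations` are hypothetical names, and the server's suggestion is simulated with a simple event counter. The `useSuggestion` flag mirrors the fact that the new behavior is gated and not yet enabled.

```go
package main

import "fmt"

// runInfo is a hypothetical stand-in for whatever exposes the server's
// continue-as-new suggestion to the workflow.
type runInfo interface {
	ContinueAsNewSuggested() bool
}

// fakeInfo simulates the server: it suggests continue-as-new once the
// number of recorded events reaches a limit.
type fakeInfo struct {
	events, limit int
}

func (f *fakeInfo) ContinueAsNewSuggested() bool { return f.events >= f.limit }

// runIterations runs step until the stopping rule fires and returns the
// number of iterations executed. With useSuggestion it stops when the
// server suggests continue-as-new; otherwise it stops after a fixed count.
func runIterations(info runInfo, useSuggestion bool, fixedIterations int, step func()) int {
	n := 0
	for {
		if useSuggestion {
			if info.ContinueAsNewSuggested() {
				break
			}
		} else if n >= fixedIterations {
			break
		}
		step()
		n++
	}
	return n
}

func main() {
	info := &fakeInfo{limit: 10}
	// Each iteration adds 4 events, so the suggestion fires after 3 iterations.
	n := runIterations(info, true, 5, func() { info.events += 4 })
	fmt.Println("suggestion-based iterations:", n)

	info = &fakeInfo{limit: 10}
	n = runIterations(info, false, 5, func() { info.events += 4 })
	fmt.Println("fixed-count iterations:", n)
}
```

In this simulation the suggestion-based loop stops earlier than the fixed count when iterations are heavier than expected, which is the situation the change is meant to handle.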

**Why?**
Automatically handle history size/event count too large conditions (or
any future conditions added by the server), which we might get if we do
more work than expected per iteration.

**How did you test it?**
new unit tests, also replaced for loop with previous version to verify
actual iteration count didn't change

**Potential risks**
The default history size suggestion is at 4MB, which we could hit after
just a few large payload responses, and then we'd do continue-as-new
more often than we might like.