~maas-committers/maas/+git/temporal:remove-shard-lock-v2

Last commit made on 2023-11-22
Get this branch:
git clone -b remove-shard-lock-v2 https://git.launchpad.net/~maas-committers/maas/+git/temporal

Branch merges

Branch information

Name:
remove-shard-lock-v2
Repository:
lp:~maas-committers/maas/+git/temporal

Recent commits

8953f6c... by Yichao Yang <email address hidden>

Merge branch 'main' into remove-shard-lock-v2

18c5daf... by Anton Romanovich <email address hidden>

Add the ability to implement and test custom advanced visibility stores (#4871)

**What changed?**

Two things:
* `common/persistence/tests/visibility_persistence_suite_test.go`
renamed to `visibility_persistence_suite.go`, so that its
`VisibilityPersistenceSuite` could be imported and run by the
third-party package;
* `VisibilityStoreFactory` interface introduced. The server now accepts
`customVisibilityStoreFactory` and invokes it if visibility store
configuration points to `CustomDataStoreConfig`.

**Why?**

To be have an ability to implement and test custom advanced visibility
stores.

**How did you test it?**
Ran Temporal tests.
Also built and ran a Temporal server with
`temporal.WithCustomVisibilityStoreFactory(mypackage.NewMyVisibilityStoreFactory())`
option passed.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**
None.

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
No.

a34d820... by Norbert Hu <email address hidden>

Include WorkflowID & RunID when initializing branch tokens (#5152)

<!-- Describe what has changed in this PR -->
**What changed?**

<!-- Tell your future self why have you made these changes -->
**Why?**

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**

c80258c... by Michael Snowden <email address hidden>

Fix race conditions in DLQ integration tests (#5150)

<!-- Describe what has changed in this PR -->
**What changed?**
I made all the variables that are accessed by multiple goroutines atomic
in all the DLQ functional tests.

<!-- Tell your future self why have you made these changes -->
**Why?**
Initially, I left these as non-atomic types because I assumed that there
would be no workflows running before the tests were started, but that's
not true because, as soon as the server starts, there could be system
workflows running like the history scanner, and that means they could
read variables that our overriden task executor uses while the test
itself is writing them.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
Initially, the only race condition detected was in `add_tasks_test`, but
I went through the other DLQ integration tests because they have similar
implementations. I didn't run something like `go test -count 1000 ...`
because that doesn't work with our functional tests for some reason.
Something about the rapid restarts fails because there's another race
condition with shutdown.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**

84b5b43... by Norbert Hu <email address hidden>

Propagate fork request raw branch token to persistence layer (#5151)

<!-- Describe what has changed in this PR -->
**What changed?**

<!-- Tell your future self why have you made these changes -->
**Why?**

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**

d0c3435... by Michael Snowden <email address hidden>

Propagate ctx throughout tdbg commands (#5149)

<!-- Describe what has changed in this PR -->
**What changed?**
This PR makes us use the `context.Context` provided to the CLI command,
if one is provided, and it also makes our tests provide one.

<!-- Tell your future self why have you made these changes -->
**Why?**
This way, tests can drive timeouts and cancellations when they run
`tdbg` commands.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
I just looked for references to `app.Run` in our code, and I switched
them to `RunContext` wherever it was applicable (not needed for things
that aren't I/O related like `tdbg decode`.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**

1aa9e85... by Michael Snowden <email address hidden>

Don't enable system scanners in functional tests (#5148)

<!-- Describe what has changed in this PR -->
**What changed?**
I turned off the history and task queue scanners for all the tests that
use `FunctionalTestBase`.

<!-- Tell your future self why have you made these changes -->
**Why?**
None of the tests we run rely on them, they make extra overhead, and
they also trigger some race conditions in several tests that override
our task executors. Those race conditions will be fixed separately
because those tests shouldn't depend on running in an environment with
no system workflows, though.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
I reran the tests/dlq_test which was triggering a race condition, slept
for 5 seconds, and verified that no other workflow tasks were sent to
the task executor except for the ones for the DLQ test itself.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**
This will mean we're testing the scanners themselves less, but they're
already run as part of the `tests/server_test`, and they have their own
unit tests, so I'm not too concerned.

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**

4567c56... by Michael Snowden <email address hidden>

Fix matching read-level race condition (#5142)

<!-- Describe what has changed in this PR -->
**What changed?**
I fixed a small and extremely rare race condition in matching.

<!-- Tell your future self why have you made these changes -->
**Why?**
This was causing test failures e.g.
https://buildkite.com/temporal/temporal-public/builds/13342#018bb7f8-c04f-4f91-b12b-ca4ce6336c1b

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
```
go test ./service/matching -run TestMatchingEngineSuite/TestAddTaskAfterStartFailure -count 100000
```

Rationale for why this happens is that there is a goroutine reading from
`w.maxReadLevel`, and it may or may not read from it before we write to
it if we wake it up with the w.sendWriteResponse call before setting the
read level.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**

---------

Co-authored-by: David Reiss <email address hidden>

50789df... by Rodrigo Zhou <email address hidden>

Fix set up search attributes in secondary SQL visibility (#5143)

<!-- Describe what has changed in this PR -->
**What changed?**
Fix setting up pre-allocated search attributes when a secondary SQL
visibility is added.

<!-- Tell your future self why have you made these changes -->
**Why?**
PR https://github.com/temporalio/temporal/pull/4531 removed code that
did this.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
Started single SQL visibility, and then added secondary SQL visibility.
Checked cluster metadata, it looks right.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**
No.

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
No.

bea663d... by Rodrigo Zhou <email address hidden>

Add parent execution search attributes to ES mappings (#5137)

<!-- Describe what has changed in this PR -->
**What changed?**
Add parent execution search attributes to ES mappings

<!-- Tell your future self why have you made these changes -->
**Why?**
PR https://github.com/temporalio/temporal/pull/5054 modified the SQL
schemas to add parent execution to visibility. This one does for ES.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
Unit Tests.

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**
No.

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
No.