~maas-committers/maas/+git/temporal:shivam/backlog-count-updated

Last commit made on 2024-05-21
Get this branch:
git clone -b shivam/backlog-count-updated https://git.launchpad.net/~maas-committers/maas/+git/temporal

Branch merges

Branch information

Name:
shivam/backlog-count-updated
Repository:
lp:~maas-committers/maas/+git/temporal

Recent commits

c8ce8ec... by Shivam <email address hidden>

Made approximateBacklogCounter available via legacy API (#5934)

## What changed?
- `approximateBacklogCounter` to be available via legacy
`DescribeTaskQueue` API
- Legacy `BacklogCountHint` is removed. I propose that the new
`approximateBacklogCounter` is a better estimate for task backlog and
has been replaced.
- Functional test checking if the in-memory backlog counter is updated
as expected
- Updated unit tests which were dealing with the legacy API

## Why?
- It is better to emit the more accurate backlog counter, which in this
case, would be `approximateBacklogCounter`.

## How did you test it?
- Current CI/CD tests
- Functional test added

## Potential risks
None since this is merging into my own feature branch.

## Is hotfix candidate?
No

4c03b85... by Shivam <email address hidden>

SQL Support for Backlog Counter (#5915)

## What changed?
Support for the backlog counter, which was added in #5593, results in
Cassandra persisting the backlog counts when tasks are created. However,
the same is not being done by SQL databases as we don't persist task
queue related information while creating tasks. Thus, if we fetch
information immediately after we have created tasks, the backlog counter
for SQL databases would read `0` even when we do have tasks in our
backlog.

To fix this, a Count(*) query has been implemented whenever we fetch
task queues.

## Why?
This is required for the backlog counter accurate for both Postgres and
SQL databases.

## How did you test it?
- Integration tests testing the functionality have been added. These
verify if things on the postgres end also work as expected.
- Existing suite of test cases.

## Potential risks
None, since the backlog counter work has not been released yet.

## Is hotfix candidate?
No

9c924c6... by Shivam <email address hidden>

Unit test for approximateBacklogCounter with mocked taskManager (#5773)

## What changed?
Added unit tests validating the backlog counter under various scenarios:

- Adding tasks by a single worker
- Completing tasks by a single worker
- Workers polling and adding tasks simultaneously (concurrency)
- Resetting the backlog counter if it does undercount (particularly
useful when Cassandra TTL's a task and the backlog Manager has no idea
about it)
- DB failures which lead to TaskQueuePartitionManagers being unloaded.

- **NOTE**: Base branch is not updated since the the previously open
[PR](https://github.com/temporalio/temporal/pull/5767) has not been
merged. For the purpose of this PR, please only review the following
files:

1. `service/matching/backlog_manager_test.go`
2. `service/matching/matching_engine_test.go`

## Why?
To ensure that the counter works as expected.

## How did you test it?
Tests itself were added.

## Potential risks
None

## Is hotfix candidate?
No

d1af1c2... by Shivam <email address hidden>

Added GetBacklogInfo to allow frontend to view the Backlog counts. (#5767)

## What changed?
- Added `GetBacklogInfo` to allow backlog count to be present for each
partition when `DescribeTaskQueue` is called.
- Added logic to aggregate backlog counts from different partitions of
the same task queue.
- Added functional test for `DescribeTaskQueue` to verify if backlog
values are being returned.

- **Note**: The base feature PR for `approximateBacklogCounter`, prior
to this PR, was
[shivam/backlog-count](https://github.com/temporalio/temporal/tree/shivam/backlog-count).
However, after the most recent push of `versioning-2` to `main`, a lot
of conflicts exist between itself and `main`. Moreover, the current
branch in consideration is branched of the latest changes in main. Thus,
in the interest of time, conflicts were resolved by creating a new base
feature branch (now called `shivam/backlog-count-updated`) which is
branched off the latest main and consists of all the changes that were
present in the previous feature branch (shivam/backlog-count).

- **NOTE**: The functional tests for `sql` are expected to fail since
this feature will not currently work for `sql` databases as the backlog
counter is not persisted when tasks are created in sql. Work for that
will follow.

## Why?
Previous PR's went over the addition of the `approximateBacklogCounter`
as an in-memory variable. While unit and integration tests are present
to check its functionality, functional tests are required to check if it
is present accurately in request/response structures.

## How did you test it?
Functional tests were added.

67792fc... by Shivam <email address hidden>

Adding support for approximateBacklogCounter (#5593)

This PR intends on adding an in-memory variable, called
`approximateBacklogCounter`, which shall be used to approximate the size
of the backlog at a given moment.

The `approximateBacklogCounter` follows a tasks-to-be-read semantics, in
the sense that if a task exists in the db it is counted as part of the
backlog.

As of now, it does not take care about fanning out this variable to
child partitions.

**NOTE**: This PR shall ensure that the `approximateBacklogCounter` will
be an overestimate of the backlog **only** in the case of Cassandra.
This is because creation of tasks in Cassandra results in updating the
`taskQueueInfo` which in turn updates the backlog counter. More work has
to be done in the case of `SQL` and shall be addressed in a later PR.

Unit testing with more to come.
Functional testing in the near future.

No.

---------

Co-authored-by: ShahabT <email address hidden>

3bb9570... by Shivam <email address hidden>

Moved max_read_level inside of db (#5675)

This PR moves `max_read_level` inside of db so that `db` becomes the
single source of truth for this variable.

While working on #5593, we realized there were concurrency issues, such
as deadlock, which were spinning up when `db` tried to read
`max_read_level` for resetting the `approximateBacklogCounter`. Thus,
moving `max_read_level` inside of db will solve that issue and shall
also serve towards the purpose of making `db` the single source of truth
for most shared resources in the near future.

Current unit tests.

None

No

33e4c6b... by Carly de Frondeville <email address hidden>

redo #5707: define versioning rule errors and improve unit test coverage (#5721)

## What changed?
- Define named version rule errors
- Add unit tests for errors that were previously untested

## Why?
So that the error messages can be standardized when they are used
repeatedly, and so that it is easier to tell whether all errors are
covered by the unit tests

## How did you test it?
Unit tests

75b8cc9... by Stephan Behnke <email address hidden>

Remove WorkflowIdConflictPolicy feature flag (#5715)

## What changed?
<!-- Describe what has changed in this PR -->

Removed feature flag for WorkflowIdConflictPolicy.

## Why?
<!-- Tell your future self why have you made these changes -->

The feature flag only existed to make sure we address the internal
billing changes first; which are now complete.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

Existing tests.

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

The feature is fully tested; but there is no SDK support yet.

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->

d45f098... by Stephan Behnke <email address hidden>

WorkflowIdConflictPolicy for Batch operation (#5709)

## What changed?
<!-- Describe what has changed in this PR -->

Set the WorkflowIdConflictPolicy for batch operation.

## Why?
<!-- Tell your future self why have you made these changes -->

To be consistent with the StartWorkflow operation handler and ensure
that the policy is always set.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->

This only makes the implicit behavior explicit.

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->

No, because the History Service is setting the default if it's not set
(but this will be removed eventually):
https://github.com/temporalio/temporal/pull/5701

0f7c9f7... by Rodrigo Zhou <email address hidden>

Add root workflow execution to visibility (#5697)

## What changed?
<!-- Describe what has changed in this PR -->
Add root workflow execution to Visibility

## Why?
<!-- Tell your future self why have you made these changes -->
Follow up of https://github.com/temporalio/temporal/pull/5520.
Be able to list all workflows based on the root execution.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
Modified existing tests to check the root workflow execution.

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
No.

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->
No.