CloudWatch: fix `test_put_metric_alarm` flakiness by bentsku · Pull Request #13851 · localstack/localstack

bentsku · 2026-02-26T12:28:57Z

Motivation

We have been seeing test_put_metric_alarm flake for a long while now. This is made worse by the fact that we run the test 3 times, once for each protocol CloudWatch supports (query, json and cbor).

I tracked the issue down to the way we collected SQS messages, combined with a rare race condition:

If you put the metrics at the same moment the alarm's scheduled task fetches them, the task may pick up only half the metrics and leave the alarm in the "OK" state. This is fine: it effectively means the alarm was evaluated before those metrics were put. When CloudWatch re-runs the alarm scheduler task, it sees all the metrics and properly triggers the alarm. But our SQS snapshot helper was only fetching the first message and not deleting it, so it stayed stuck on the "OK" notification and failed, even though the right ALARM message was in the queue, as shown by the logs:

Here is the full failing log, from this run:

This file contains only the relevant part, detailed below:
sqs-cloudwatch.txt

The flow is:
-> PutMetricData with 2 metrics
-> Scheduler fetches the metrics, but gets only the first one (it ran at the same time as PutMetricData; this is fine)
-> Alarm state is then OK
-> Publish OK notification
-> Helper calls SQS ReceiveMessage continuously
-> Scheduler fetches the metrics, triggers the Alarm
-> Alarm state is now ALARM
-> Publish ALARM notification
-> The helper only ever fetches the first message, never deletes it, and so never sees the ALARM notification
-> Test fails
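
The flow above can be reproduced with a small in-memory simulation. This is only a sketch: `FakeQueue`, `peek_only` and `receive_and_delete` are hypothetical names made up for illustration, not LocalStack code, and the queue is modelled under the assumption that an undeleted message stays at the head and is returned again on the next poll.

```python
from collections import deque


class FakeQueue:
    """In-memory stand-in for an SQS queue (hypothetical, illustration only)."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive(self):
        # Return the oldest message without removing it, or None if empty.
        return self._messages[0] if self._messages else None

    def delete(self, body):
        self._messages.remove(body)


def peek_only(queue, polls):
    """Behaviour described in the flow: read the head on every poll, never delete."""
    seen = []
    for _ in range(polls):
        msg = queue.receive()
        if msg is not None:
            seen.append(msg)
    return seen


def receive_and_delete(queue, polls):
    """Delete each message after reading it, so the next one becomes visible."""
    seen = []
    for _ in range(polls):
        msg = queue.receive()
        if msg is not None:
            seen.append(msg)
            queue.delete(msg)
    return seen


def make_queue():
    q = FakeQueue()
    q.send("OK")     # first scheduler run saw only part of the metrics
    q.send("ALARM")  # second scheduler run saw all of them
    return q


stuck = peek_only(make_queue(), polls=3)
drained = receive_and_delete(make_queue(), polls=3)
print(stuck)    # ['OK', 'OK', 'OK']
print(drained)  # ['OK', 'ALARM']
```

With the same two notifications in the queue, the first helper never gets past "OK", while the second reaches "ALARM" on its second poll.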

Changes

update the cloudwatch snapshot helper to clean up SQS messages
add some comments with one case I didn't see in the snapshots
remove some parametrization for long running tests, those areas are already covered by other tests so no need to run all 3 protocols for them
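
For the first change, a cleaned-up polling helper could look roughly like the sketch below. It is not the helper from this PR: `wait_for_alarm_state` is a hypothetical name, and the code assumes each SQS body is an SNS envelope whose "Message" field is a JSON string containing "NewStateValue". Only the boto3-style `receive_message` and `delete_message` calls are relied on.

```python
import json
import time


def wait_for_alarm_state(sqs_client, queue_url, expected_state, timeout=30.0, interval=1.0):
    """Poll the queue, deleting every message read, until expected_state arrives.

    Assumption: each body is an SNS envelope whose "Message" field is a JSON
    string with a "NewStateValue" key. Adjust the parsing if the setup differs.
    """
    deadline = time.monotonic() + timeout
    seen_states = []
    while True:
        response = sqs_client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=0
        )
        # An empty queue yields a response without a "Messages" key.
        for message in response.get("Messages", []):
            # Delete first so a stale notification cannot be received again.
            sqs_client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            envelope = json.loads(message["Body"])
            alarm = json.loads(envelope["Message"])
            seen_states.append(alarm["NewStateValue"])
            if alarm["NewStateValue"] == expected_state:
                return alarm
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{expected_state} not received; saw {seen_states}")
        time.sleep(interval)
```

In a test this would be called with a real client, for example `wait_for_alarm_state(boto3.client("sqs"), queue_url, "ALARM")`. An early "OK" notification is then consumed and discarded instead of blocking the queue.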

Tests

Test Results - Preflight, Unit

23 070 tests 21 179 ✅ 6m 9s ⏱️
1 suites 1 891 💤
1 files 0 ❌

Results for commit 565a93f.

github-actions · 2026-02-26T12:47:06Z

Test Results - Alternative Providers

176 tests 39 ✅ 2m 30s ⏱️
1 suites 137 💤
1 files 0 ❌

Results for commit 565a93f.

github-actions · 2026-02-26T12:47:20Z

Test Results (amd64) - Acceptance

7 tests 5 ✅ 3m 2s ⏱️
1 suites 2 💤
1 files 0 ❌

Results for commit 565a93f.

github-actions · 2026-02-26T12:59:25Z

LocalStack Community integration with Pro

2 files 2 suites 49m 9s ⏱️
1 239 tests 1 161 ✅ 78 💤 0 ❌
1 241 runs 1 161 ✅ 80 💤 0 ❌

Results for commit 565a93f.

github-actions · 2026-02-26T13:01:34Z

Test Results (amd64) - Integration, Bootstrap

5 files 5 suites 1h 5m 1s ⏱️
1 263 tests 1 187 ✅ 76 💤 0 ❌
1 269 runs 1 187 ✅ 82 💤 0 ❌

Results for commit 565a93f.

pinzon

I wasn't aware of this issue. I can see it took a lot of effort to catch this.
Thank you for fixing it 👍

remove some parametrization for long running tests, those areas are already covered by other tests so no need to run all 3 protocols for them

I agree, let's not parametrize the client in long running tests like for the alarms.

fix test_put_metric_alarm flakiness

565a93f

bentsku added this to the 2026.03 milestone Feb 26, 2026

bentsku self-assigned this Feb 26, 2026

bentsku added the labels aws:cloudwatch (Amazon CloudWatch), semver: patch (Non-breaking changes which can be included in patch releases), docs: skip (Pull request does not require documentation changes) and notes: skip (Pull request does not have to be mentioned in the release notes) Feb 26, 2026

bentsku marked this pull request as ready for review February 26, 2026 14:03

bentsku requested review from pinzon and steffyP as code owners February 26, 2026 14:03

pinzon approved these changes Feb 26, 2026

bentsku wants to merge 1 commit into main from fix-cloudwatch
