Do not retry on certification errors by ashwgit · Pull Request #211 · release-engineering/cloudpub

ashwgit · 2026-06-18T04:59:23Z

@lslebodn @JAVGan PTAL. This one prevents cloudpub from retrying when there's a certification error.

Summary by Sourcery

Handle Azure certification failures as non-retriable errors and adjust error modeling accordingly.

Bug Fixes:

Prevent Azure job polling and publish operations from retrying when failures are caused by certification errors.

Enhancements:

Model Azure configure job errors as structured dictionaries and introduce helpers to detect certification-related failures.
Introduce a dedicated CertificationError type and integrate it into Azure job status and publish flows to surface clearer failure reasons.

Tests:

Extend Azure service and utils tests with fixtures and cases covering certification error detection, non-retry behavior, and updated error formats.

sourcery-ai · 2026-06-18T04:59:32Z

Reviewer's Guide

Introduce explicit handling of Azure Marketplace certification failures by detecting structured certification errors in job responses, preventing retries for these permanent failures, and updating models and tests to work with structured error objects instead of plain strings.

Sequence diagram for Azure publish flow without retries on certification errors

sequenceDiagram
    actor Caller
    participant RetryWrapper
    participant AzureService as MsAzureService
    participant AzureAPI as AzureMarketplace

    Caller->>RetryWrapper: _publish_live(product, product_name)
    RetryWrapper->>AzureService: _publish_live(product, product_name)
    AzureService->>AzureAPI: submit_to_status(product_id, status='live')
    AzureAPI-->>AzureService: ConfigureStatus(job_result='failed', errors)
    AzureService->>AzureService: is_certification_error(errors)
    alt [errors indicate certification failure]
        AzureService->>RetryWrapper: raise CertificationError
        RetryWrapper-->>Caller: propagate CertificationError (no retry)
    else [non certification failure]
        AzureService->>RetryWrapper: raise RuntimeError
        RetryWrapper->>AzureService: retry _publish_live(product, product_name)
    end

File-Level Changes

Change	Details	Files
Detect Azure certification errors from structured job error payloads and expose a helper API for callers.	Add a recursive helper _contains_certification_error to inspect error dicts and nested details for invalidState certification messages. Add is_certification_error(errors) utility that returns True when any error indicates a certification failure. Add unit test coverage for is_certification_error with certification and non-certification error inputs.	`cloudpub/ms_azure/utils.py` `tests/ms_azure/test_utils.py`
Prevent retries and raise a dedicated CertificationError when Azure jobs fail due to certification issues during status polling and publish operations.	Extend query_job_status to raise CertificationError instead of InvalidStateError when job_details.errors indicate certification failure. Update _publish_preview and _publish_live to stop joining errors as strings, use the structured errors list in failure messages, and raise CertificationError on certification failures. Configure tenacity retry decorators on _publish_preview and _publish_live to not retry when CertificationError is raised. Adjust publish preview/live failure tests to work with structured error objects and to assert behavior with and without certification errors, including no-retry semantics and expected exception types.	`cloudpub/ms_azure/service.py` `tests/ms_azure/test_service.py`
Represent Azure configure job errors as structured objects and introduce a dedicated CertificationError exception type.	Change ConfigureStatus.errors from List[str] to List[Dict[str, Any]] to align with Azure error payloads and support nested error details. Introduce CertificationError as a subclass of InvalidStateError to model permanent certification failures. Add fixtures for certification-style error payloads used across tests.	`cloudpub/models/ms_azure.py` `cloudpub/error.py` `tests/ms_azure/conftest.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 4 issues, and left some high level feedback:

There is a leftover print("errors : ", res.errors) in _publish_preview; consider removing or replacing it with proper logging to avoid noisy stdout in production.
Now that ConfigureStatus.errors is a List[Dict[str, Any]], error messages in _publish_preview and _publish_live rely on the raw list str() representation; consider adding a small formatter to produce clearer, stable error strings instead of exposing the raw structure.
In _publish_preview you use self._raise_error(CertificationError, ...) while _publish_live directly raises CertificationError; consider using a consistent error-raising path in both methods for easier handling and logging.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- There is a leftover `print("errors : ", res.errors)` in `_publish_preview`; consider removing or replacing it with proper logging to avoid noisy stdout in production.
- Now that `ConfigureStatus.errors` is a `List[Dict[str, Any]]`, error messages in `_publish_preview` and `_publish_live` rely on the raw list `str()` representation; consider adding a small formatter to produce clearer, stable error strings instead of exposing the raw structure.
- In `_publish_preview` you use `self._raise_error(CertificationError, ...)` while `_publish_live` directly raises `CertificationError`; consider using a consistent error-raising path in both methods for easier handling and logging.

## Individual Comments

### Comment 1
<location path="cloudpub/ms_azure/service.py" line_range="201-202" />
<code_context>
         """
         job_details = self._query_job_details(job_id=job_id)
         if job_details.job_result == "failed":
             error_message = f"Job {job_id} failed: \n{job_details.errors}"
+            if is_certification_error(job_details.errors):
+                self._raise_error(CertificationError, error_message)
             self._raise_error(InvalidStateError, error_message)
</code_context>
<issue_to_address>
**issue:** Using the raw list of error dicts in the failure message may hurt readability and regress prior behavior.

`job_details.errors` is now a list of dicts, so interpolating it directly will show the raw Python dicts instead of a readable message and may change existing logs/UX. Consider converting the list to a human-readable string (e.g., extracting the `message` fields) before formatting, and align this behavior with any other call sites that include `errors` in messages.
</issue_to_address>

### Comment 2
<location path="cloudpub/ms_azure/service.py" line_range="729-735" />
<code_context>
             product.id, state="preview"
         ):
-            errors = "\n".join(res.errors)
+            errors = res.errors
+            print("errors : ", res.errors)
             failure_msg = (
                 f"Failed to submit the product {product_name} ({product.id}) to preview. "
                 f"Status: {res.job_result} Errors: {errors}"
             )
+            if is_certification_error(res.errors):
+                self._raise_error(CertificationError, failure_msg)
             raise RuntimeError(failure_msg)
</code_context>
<issue_to_address>
**issue:** The plain `print` call and raw `errors` formatting are inconsistent with the surrounding logging behavior.

This block now uses `print("errors : ", res.errors)` and includes the raw `errors` list (list of dicts) directly in the message. Please avoid `print` in this context and use the existing logging approach (e.g., `log.*`) or rely on `failure_msg`. If you need to surface structured error details, consider logging a JSON-serialized version or selected key fields instead of the full list of dicts.
</issue_to_address>

### Comment 3
<location path="cloudpub/ms_azure/service.py" line_range="735-744" />
<code_context>
                 f"Failed to submit the product {product_name} ({product.id}) to preview. "
                 f"Status: {res.job_result} Errors: {errors}"
             )
+            if is_certification_error(res.errors):
+                self._raise_error(CertificationError, failure_msg)
             raise RuntimeError(failure_msg)
</code_context>
<issue_to_address>
**suggestion:** Use of `self._raise_error` for `CertificationError` would align behavior with other failure paths.

`query_job_status` uses `self._raise_error(CertificationError, error_message)`, but here `_publish_live` calls `CertificationError` via `_raise_error` as well. To keep error handling consistent and ensure any shared logic in `_raise_error` (logging, metrics, error mapping) always applies, use `self._raise_error(CertificationError, failure_msg)` instead of raising `CertificationError` directly.

Suggested implementation:

```python
            if is_certification_error(res.errors):
                self._raise_error(CertificationError, failure_msg)
            raise RuntimeError(failure_msg)

```

```python
    def _publish_live(self, product: Product, product_name: str) -> None:
        res = self.submit_to_status(product_id=product.id, status='live')

        # Example structure of the failure handling logic that should
        # now be using `self._raise_error` for CertificationError:
        if res.job_result != 'succeeded' or not self.get_submission_state(
            product.id, state="live"
        ):
            errors = res.errors
            failure_msg = (
                f"Failed to submit the product {product_name} ({product.id}) to live. "
                f"Status: {res.job_result} Errors: {errors}"
            )
            if is_certification_error(res.errors):
                self._raise_error(CertificationError, failure_msg)
            raise RuntimeError(failure_msg)

```

I only see the beginning of `_publish_live` in your snippet, so you will need to adjust the replacement block to match your existing implementation. Specifically:

1. Locate the existing error-handling code inside `_publish_live` (likely similar to the preview path) that currently does something like:
   ```python
   if is_certification_error(res.errors):
       raise CertificationError(failure_msg)
   ```
2. Replace that `raise CertificationError(...)` with:
   ```python
   self._raise_error(CertificationError, failure_msg)
   ```
3. Ensure the `failure_msg` variable name and the condition (`res.job_result`, `get_submission_state`, etc.) match your actual implementation; the block I provided is a template to show how to integrate `_raise_error`, not a strict structural change to your business logic.

The key behavioral change is: all `CertificationError` raises in `_publish_live` should now go through `self._raise_error`.
</issue_to_address>

### Comment 4
<location path="tests/ms_azure/test_service.py" line_range="989-994" />
<code_context>
         azure_service._publish_preview.retry.sleep = mock.Mock()  # type: ignore
         expected_err = (
             f"Failed to submit the product test-product \\({product_obj.id}\\) to preview. "
-            "Status: failed Errors: failure1\nfailure2"
</code_context>
<issue_to_address>
**suggestion (testing):** Tighten the regex assertion to verify the actual error payload rather than using a broad wildcard

In `test_publish_live_fail_on_retry`, the regex `Status: failed Errors: *` is too permissive and will match almost any error content. Since `ConfigureStatus.errors` is now a structured list of dicts, make the assertion stricter by matching the serialized errors (e.g., using `re.escape(str(err_resp.errors))`, as in the preview test) or by including a specific portion of the error structure in the pattern so regressions in formatting or type are caught.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-06-18T05:00:51Z

+            errors = res.errors
+            print("errors : ", res.errors)
            failure_msg = (
                f"Failed to submit the product {product_name} ({product.id}) to preview. "
                f"Status: {res.job_result} Errors: {errors}"
            )
+            if is_certification_error(res.errors):


issue: The plain print call and raw errors formatting are inconsistent with the surrounding logging behavior.

This block now uses print("errors : ", res.errors) and includes the raw errors list (list of dicts) directly in the message. Please avoid print in this context and use the existing logging approach (e.g., log.*) or rely on failure_msg. If you need to surface structured error details, consider logging a JSON-serialized version or selected key fields instead of the full list of dicts.

sourcery-ai · 2026-06-18T05:00:51Z

        expected_err = (
            f"Failed to submit the product test-product \\({product_obj.id}\\) to preview. "
-            "Status: failed Errors: failure1\nfailure2"
+            f"Status: failed Errors: {re.escape(str(err_resp.errors))}"
        )

        # Test


suggestion (testing): Tighten the regex assertion to verify the actual error payload rather than using a broad wildcard

In test_publish_live_fail_on_retry, the regex Status: failed Errors: * is too permissive and will match almost any error content. Since ConfigureStatus.errors is now a structured list of dicts, make the assertion stricter by matching the serialized errors (e.g., using re.escape(str(err_resp.errors)), as in the preview test) or by including a specific portion of the error structure in the pattern so regressions in formatting or type are caught.

JAVGan

Some minor changes please, but they will make the code future proof

JAVGan · 2026-06-22T17:13:23Z

            product.id, state="preview"
        ):
-            errors = "\n".join(res.errors)
+            errors = res.errors


You can drop this variable errors as you're not using it anywhere. I see you're directly passing res.errors to the lower functions. Or replace res.errors with errors but I do prefer to have less variables.

JAVGan · 2026-06-22T17:16:53Z

                f"Failed to submit the product {product_name} ({product.id}) to preview. "
                f"Status: {res.job_result} Errors: {errors}"
            )
+            if is_certification_error(res.errors):


@ashwgit Since we're having this if two times, here and in L202 what about encapsulating it in some private function within this own module instead? Then we will reduce the nested code:

def __check_raise_certification_errors(errors): if is_certification_error(errors): self._raise_error(CertificationError, error_message) [...] # On line 202 self.__check_raise_certification_errors(job_details.errors) [...] # On this line self.__check_raise_certification_errors(res.errors)

JAVGan · 2026-06-22T17:18:49Z

+            if is_certification_error(res.errors):
+                raise CertificationError(failure_msg)


Please use here what Sourcery-AI pointed out:

Suggested change

if is_certification_error(res.errors):

raise CertificationError(failure_msg)

if is_certification_error(res.errors):

self._raise_error(CertificationError, failure_msg)

Or even better, call the private function I mentioned in the other thread:

Suggested change

if is_certification_error(res.errors):

raise CertificationError(failure_msg)

self.__check_raise_certification_errors(res.errors)

JAVGan · 2026-06-22T17:20:01Z

+def _contains_certification_error(item: Any) -> bool:
+    code: str = item.get("code", "")


Item here points to Any, can you please change it to Dict[str, Any]

OR

Verify inside whether it's a dictionary, like:

if not isinstance(item, dict): raise [...]

JAVGan · 2026-06-22T17:20:55Z

+    message: str = item.get("message", "")
+    if code == "invalidState" and "certification" in message.lower():
+        return True
+    for detail in item.get("details") or []:


Also please validate somehow that item["details"] is a list.

I know we should receive a list from MS but if they change somehow their format the code shouldn't break.

JAVGan · 2026-06-22T17:22:14Z

+            "details": [
+                {
+                    "code": "invalidState",
+                    "message": "Certification",
+                    "details": [
+                        {
+                            "code": "invalidState",
+                            "message": "Issues found during Certification.",
+                        }
+                    ],


Nice!

Please also test some invalid error messages, like details not being present or present as a dict or something unexpected.

Make sure we can parse anything and just retrieve the certification error when we're sure the format is this one

ashwgit requested review from JAVGan, jajreidy and lslebodn as code owners June 18, 2026 04:59

sourcery-ai Bot reviewed Jun 18, 2026

View reviewed changes

JAVGan requested changes Jun 22, 2026

View reviewed changes

Do not retry on certification errors

ab75c04

ashwgit force-pushed the fail-on-cert-error branch from 70d18be to ab75c04 Compare June 26, 2026 07:54

		if is_certification_error(res.errors):
		raise CertificationError(failure_msg)

	if is_certification_error(res.errors):
	raise CertificationError(failure_msg)
	self.__check_raise_certification_errors(res.errors)

		def _contains_certification_error(item: Any) -> bool:
		code: str = item.get("code", "")

Uh oh!

Conversation

ashwgit commented Jun 18, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for Azure publish flow without retries on certification errors

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sourcery-ai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sourcery-ai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

JAVGan left a comment

Choose a reason for hiding this comment

Uh oh!

JAVGan Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

JAVGan Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

JAVGan Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

JAVGan Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

JAVGan Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

JAVGan Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ashwgit commented Jun 18, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented Jun 18, 2026 •

edited

Loading