feat(workflows): add continue_on_error step field for non-halting failures#2663
feat(workflows): add continue_on_error step field for non-halting failures#2663doquanghuy wants to merge 5 commits into
Conversation
f34ab4c to
da8ed4d
Compare
| By default, a non-zero exit code from any step halts the entire run. | ||
| Set `continue_on_error: true` on a step to record its result and | ||
| continue to the next sibling step instead. The exit code remains | ||
| available on `steps.<id>.output.exit_code` so downstream `if`, | ||
| `switch`, or `gate` steps can branch on it: |
| assert not any( | ||
| "continue_on_error" in e for e in errors | ||
| ), errors |
| monkeypatch.setattr(gate_module.sys.stdin, "isatty", lambda: True) | ||
| monkeypatch.setattr( |
|
@mnriem — addressed all three Copilot findings in 1. README wording. Rewrote the "Error Handling" intro in terms of 2. Validation test contract tightened. 3. Gate stdin patching made runner-robust. In Full suite still passes (continue_on_error: 7/7), no regressions. Branch is AI disclosure: drafted with Claude Opus, human-reviewed. |
e9a4871 to
a0e78ee
Compare
| run — most commonly a `shell` or `command` step exiting non-zero, but | ||
| also things like a step type that fails validation. Set |
| # is declared, the engine records the result (exit_code, stderr, status) | ||
| # and continues to the next sibling step instead of halting the run. |
- README "Error Handling": drop the validation-failure example. Structural validation runs before the workflow run starts, so it never produces a `StepStatus.FAILED` mid-run and was misleading as an example for `continue_on_error`. Reword to "any runtime error raised during step execution" with a parenthetical clarifying the validation-time path. - Tests block comment: fix the misleading "(exit_code, stderr, status)" wording. `status` is stored as `steps.<id>.status`, not nested under `steps.<id>.output`. Clarify that `output` carries the exit-code / stderr from the failure and that `status` is a sibling key so future readers writing expressions against run state aren't led astray. Addresses Copilot review feedback on github#2663.
|
@mnriem — addressed both new Copilot findings in 1. README runtime-vs-validation framing. Dropped the "step type that fails validation" example from the "Error Handling" intro — structural validation runs at 2. Test-block comment about Continue_on_error suite still 7/7, no regressions. AI disclosure: drafted with Claude Opus, human-reviewed. |
| # steps can branch on it. Log a single, unambiguous | ||
| # event per failure resolution — either the run | ||
| # continued past it, or it halted. | ||
| if step_config.get("continue_on_error"): |
Engine was checking `continue_on_error` via truthiness
(`if step_config.get("continue_on_error"):`), which let truthy
non-bool values like the string `"true"` activate the flag at
runtime if validation was skipped. `execute()` does not auto-validate
(documented on the `load_definition` docstring), so a caller bypassing
`validate_workflow()` could see the contract silently violated.
Tighten the runtime check to identity comparison (`is True`) so only
a literal boolean enables the behaviour. Validation continues to
catch the authoring mistake at parse time; this is defense in depth
for the bypass-validation path. Add
`test_engine_ignores_truthy_non_bool_continue_on_error` locking the
contract — feeds `continue_on_error: "true"` (string) through
`execute()` directly and asserts the run halts with
`RunStatus.FAILED`.
Addresses Copilot review feedback on github#2663.
|
@mnriem — addressed the new Copilot finding in The gap. Copilot caught a real defense-in-depth hole: The fix. Tightened the runtime check at - if step_config.get("continue_on_error"):
+ if step_config.get("continue_on_error") is True:Only a literal New test. Continue_on_error suite: 8/8 pass (was 7/7). Full suite: 3002 passed, 40 skipped, no regressions. AI disclosure: drafted with Claude Opus, human-reviewed. |
| # the behaviour, even if validation was skipped. | ||
| # Validation rejects non-bool values at parse time, | ||
| # but `execute()` does not auto-validate (see the | ||
| # `load_definition` docstring), so a caller passing |
| behaviour. `execute()` does not auto-validate (the | ||
| `load_definition` docstring documents this), so the engine |
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback and resolve conflicts
Adds an optional `continue_on_error: bool` field on every step.
When set to `true` and the step fails, the engine records the
result (`exit_code`, `stderr` on `steps.<id>.output` plus `status`
as a sibling key on `steps.<id>`) and continues to the next sibling
step instead of halting the run. Downstream `if`, `switch`, or
`gate` steps can then branch on
`{{ steps.<id>.output.exit_code }}` to route the recovery path.
Engine details
--------------
`WorkflowEngine._execute_steps` now consults the step config when a
step returns `StepStatus.FAILED`:
- Gate aborts (`output.aborted`) always halt the run — operator
decisions take precedence over the flag.
- Otherwise, if `continue_on_error` is the literal `True`, log a
`step_continue_on_error` event and proceed to the next sibling.
The runtime check uses identity comparison (`is True`) rather
than truthiness, so truthy non-bool values like the string
`"true"` cannot silently change run semantics even if a caller
bypasses `validate_workflow()`.
- Otherwise, behave as before: log `step_failed`, set
`RunStatus.FAILED`, and return.
Validation
----------
`_validate_steps` rejects non-bool values for `continue_on_error`.
Coerced strings like `"true"` are not accepted so authoring
mistakes surface at validation time rather than silently changing
run semantics.
Tests
-----
`TestContinueOnError` in `tests/test_workflows.py` (8 tests):
- `test_undeclared_failure_halts_run` — default halt behaviour.
- `test_declared_and_fired_continues_run` — flag + fail → continue.
- `test_declared_but_step_succeeded_is_noop` — flag + success → no-op.
- `test_if_branch_routes_around_failure` — end-to-end recovery.
- `test_gate_abort_still_halts_with_continue_on_error` — abort
always halts.
- `test_validation_rejects_non_bool_continue_on_error` — `"true"`
rejected at validation.
- `test_validation_accepts_bool_continue_on_error` — `true`/`false`
pass cleanly.
- `test_engine_ignores_truthy_non_bool_continue_on_error` —
defense-in-depth: engine ignores string `"true"` even when
validation is bypassed.
Rebased onto current upstream/main (post github#2664 merge); the new
`TestContinueOnError` class sits immediately after upstream's
`TestContextRunId` so the two feature suites coexist cleanly.
Closes github#2591.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
838e7e4 to
d0b9e00
Compare
|
@mnriem — addressed both Copilot findings and resolved the conflicts in 1. Wrong method name reference. Both the engine inline comment ( 2. Conflict resolution after PR #2664 landed. Upstream I rebased the branch onto current Full suite: 3013 passed (was 3009 pre-rebase; +4 from upstream's AI disclosure: drafted with Claude Opus, human-reviewed. |
| Supported filters: `default`, `join`, `contains`, `map`. | ||
|
|
||
| ### Runtime Context | ||
|
|
||
| `{{ context.* }}` exposes engine-managed runtime metadata for the | ||
| current run: | ||
|
|
||
| | Variable | Description | | ||
| |----------|-------------| | ||
| | `context.run_id` | The current workflow run id (the same value Spec Kit prints as `Run ID:` at the end of `workflow run`). Auto-generated runs are 8-character hex from `uuid4`; operator-supplied ids may be any alphanumeric string with hyphens or underscores. Empty string outside a run context. | | ||
|
|
||
| ```yaml | ||
| # Stamp telemetry events with the run id for cross-system join. | ||
| - id: emit-event | ||
| type: shell | ||
| run: 'echo "{\"run_id\":\"{{ context.run_id }}\",\"event\":\"started\"}" >> events.jsonl' | ||
|
|
||
| # Per-run scratch directory. | ||
| - id: prep-scratch | ||
| type: shell | ||
| run: 'mkdir -p /tmp/run-{{ context.run_id }}' | ||
|
|
||
| # Pass run id into a command for artifact metadata. | ||
| - id: tag-artifact | ||
| command: speckit.specify | ||
| input: | ||
| args: "{{ context.run_id }}" | ||
| ``` | ||
|
|
||
| ## Input Types | ||
|
|
||
| Workflow inputs are type-checked and coerced from CLI string values: |
| then: | ||
| - id: review | ||
| type: gate | ||
| message: "Step failed (exit {{ steps.heavy-thing.output.exit_code }}). Retry or skip?" |
|
Please address Copilot feedback |
Two Copilot findings on d0b9e00: 1. The `### Runtime Context` documentation for `{{ context.* }}` was lost during the rebase onto current main (the squash dropped the anchor where github#2664 had added it). Restored under `## Expressions` so users can find `context.run_id` semantics and examples. 2. The continue_on_error example gate had message "Retry or skip?" but used the default `options: [approve, reject]` with `on_reject: skip`, which implied an automatic retry path that gates do not provide. Reworded the message to match the actual approve/reject semantics and added an explicit note that retry requires either custom gate options + downstream branching or a wrapper loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@mnriem — addressed both Copilot findings in 1. Runtime Context section restored. The 2. Gate prompt clarified. Reworded the example's CI was green on the prior commit (11/11 passing). This change is README-only — no Python tests touched. AI disclosure: drafted with Claude Opus, human-reviewed. |
| By default, any step that ends in `StepStatus.FAILED` at runtime halts | ||
| the entire run — most commonly a `shell` or `command` step exiting | ||
| non-zero, but also any other runtime error raised during step | ||
| execution. (Invalid workflow definitions are rejected up-front by | ||
| `specify workflow run` before the run even starts, so structural | ||
| validation failures never reach this code path.) Set | ||
| `continue_on_error: true` on a step to record its result and continue | ||
| to the next sibling step instead. When the failure was a non-zero | ||
| exit, the exit code remains available on |
Copilot finding on d0b9e00: The README's "Error Handling" intro implied `continue_on_error` covers "any other runtime error raised during step execution", but the engine only consults the flag when a step returns `StepResult(status=FAILED, ...)`. Exceptions raised out of `step_impl.execute()` propagate to `WorkflowEngine.execute()`, where the catch-all logs `workflow_failed` and re-raises — the step result is never recorded, and the flag is never consulted. Audited the whole PR diff for the same overclaim: 1. workflows/README.md — main fix. Reworded the Error Handling intro to "any step that returns StepResult(status=FAILED, ...)" and promoted the parenthetical structural-validation note into the Notes block. Added a new "Scope: returned failures only" note that names the exception path explicitly and tells step authors how to bring the flag into scope for exceptional code (catch internally and return FAILED with the failure encoded in `output`). 2. tests/test_workflows.py — section comment used "when an executable step fails", same ambiguity. Tightened to "when a step returns StepResult(status=FAILED, ...)" and added a sentence calling out that unhandled exceptions are out of scope. 3. src/specify_cli/workflows/engine.py — already correct ("any step that returns FAILED" in the validator comment; "lets the pipeline route around the failure" in the execute path). No change. Engine semantics and test bodies are unchanged. Docs-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@mnriem — addressed the new Copilot finding in The finding. Copilot caught a real overclaim in the README "Error Handling" intro: the text implied Audit (per "fix all similar issues"). Grepped the entire PR diff for the same pattern:
Engine semantics, validation, and test bodies are unchanged. CI: 161/161 Branch is clean against |
| The gate's default `options: [approve, reject]` map directly to "continue | ||
| the run" vs. "skip the rest of this branch" — gates do not automatically | ||
| re-run the failed step. To express a retry path, either define custom | ||
| gate options and branch on the choice downstream, or wrap the failing | ||
| step in your own loop. |
…MPLETED, not auto-skip Copilot finding on b8982a7: The README example's gate message said "reject to skip the rest of this branch", and the explanatory paragraph claimed [approve, reject] map to "continue" vs "skip the rest of this branch". The engine does not implement automatic branch-skipping. `on_reject: skip` returns `StepStatus.COMPLETED` (gate/__init__.py:65-66); the next sibling step runs unconditionally unless the author wires a downstream `if` reading `{{ steps.<gate-id>.output.choice }}`. Two fixes: 1. Restructured the YAML example so it actually demonstrates the manual-branching pattern: added a `recover` if-step after the gate that conditions on `steps.review.output.choice == 'approve'`. Now the example shows the real workflow author's responsibility instead of implying the engine does it. 2. Replaced the trailing paragraph with three precise notes: - both gate options return COMPLETED; `on_reject: skip` controls abort behaviour only, not sibling-skipping - all three `on_reject` values enumerated with their actual engine semantics (FAILED+aborted / COMPLETED / PAUSED) - the original retry-loop guidance retained as the third bullet Updated the gate message in the example to match — "reject to leave the failure recorded and move on" instead of "reject to skip the rest of this branch". Audited the whole PR diff for the same overclaim: no other instance. Engine semantics, validation, and test bodies are unchanged. Docs-only. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@mnriem — addressed the new Copilot finding in The finding. Copilot caught a real overclaim in the example: the gate message said "reject to skip the rest of this branch" and the explanatory paragraph claimed Two fixes:
Updated the example gate message to match — "reject to leave the failure recorded and move on" instead of the old overclaim. Audit (per "fix all similar issues"). Grepped the entire PR diff for the same pattern. No other instance — tests don't carry similar phrasing, and Engine semantics, validation, and test bodies are unchanged. CI: 161/161 |
…ally branch Audit follow-up to 393ac6b — three sites repeated the same minor overclaim about gates being one of the "branch on it" step types alongside `if` and `switch`: 1. workflows/README.md (the "downstream `if`, `switch`, or `gate` steps can branch on it" sentence introducing the example) 2. engine.py:236 (validator inline comment) 3. engine.py:657 (execute-path inline comment) A `gate` step does not have a `condition` or `expression` field — it only evaluates expressions for `message` and `show_file` (gate/__init__.py:29,36). Programmatic branching happens in `if`/`switch`; a gate surfaces the value to a human operator via message interpolation, and the operator's choice is recorded in `output.choice` for a *subsequent* `if`/`switch` to route on. Reworded all three sites consistently: "a downstream `if` or `switch` can branch on it (or a `gate` can surface it to the operator via message interpolation)". The README example already demonstrates this distinction — the gate carries `{{ }}` template variables in its message and the `recover` if-step downstream is what actually branches on the choice. Engine semantics, validation, and test bodies are unchanged. Docs-only on the README; comment-only on engine.py. 161/161 tests/test_workflows.py pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@mnriem — proactive audit follow-up in After fixing the Copilot finding in 393ac6b I audited the rest of the PR diff for the same class of issue (docs claiming engine behaviour that the code doesn't actually implement). Found three sites with the same minor overclaim:
A Reworded all three sites consistently to: "a downstream Engine semantics, validation, and test bodies are unchanged. CI: 161/161 Branch is clean against |
Description
Closes #2591.
Adds an optional
continue_on_error: boolfield on every step.When set to
trueand the step fails, the engine records theresult (exit_code, stderr, status) into
steps.<id>.outputandcontinues to the next sibling step instead of halting the run.
Downstream
if,switch, orgatesteps can then branch on{{ steps.<id>.output.exit_code }}to route the recovery path.This is the shape @mnriem proposed in the issue discussion —
it composes with primitives that already exist (the exit code
is already captured, the expression engine already resolves it,
and
if/switch/gateare already available). The only gapwas that a non-zero exit hard-stopped the pipeline before any
downstream step could evaluate it.
Canonical usage
Engine
WorkflowEngine._execute_stepsnow consults the step config whena step returns
StepStatus.FAILED:output.aborted) always halt the run — operatordecisions take precedence over the flag.
continue_on_error: true, log astep_continue_on_errorevent and proceed to the next sibling.step_failed, setRunStatus.FAILED, and return.Exactly one event per failure-resolution path is logged so the
log timeline is unambiguous: either the run continued past the
failure or it halted.
Validation
_validate_stepsrejects non-bool values forcontinue_on_error.Coerced strings like
"true"are not accepted so authoringmistakes surface at validation time rather than silently
changing run semantics.
Default behaviour preserved
When
continue_on_erroris omitted, every code path isbyte-equivalent to before this change. Existing workflows see no
difference.
Verdict coverage (from the issue discussion)
continue_on_error: true+ifbranches around the failurecontinue_on_error: true+gate→ operator approves →resumere-runs from gateFully unattended retry-on-transient (e.g. retry a 429 at 3 AM
without operator attendance) is intentionally out of scope here.
The skip and abort verdicts work without a human; the
retry verdict still pauses for one at the gate. A future
loop/retry-count primitive or an auto-approving gate type could
close that gap on top of this mechanism without further engine
changes — happy to follow up on that in a separate issue if
useful.
Testing
uv run specify --helpuv sync && uv run pytest→ 2967 passed, 35 skipped (was 2960 before; +7 new
tests added in this PR).
the middle step exits non-zero. Without
continue_on_error, run halts at the failing step (asbefore). With
continue_on_error: true, the failing steprecords
exit_codeand the third step executes. Adownstream
ifbranching on{{ steps.flaky.output.exit_code != 0 }}routes into arecovery
gatecleanly.New test coverage
TestContinueOnErrorintests/test_workflows.py:test_undeclared_failure_halts_runtest_declared_and_fired_continues_runtest_declared_but_step_succeeded_is_nooptest_if_branch_routes_around_failuretest_gate_abort_still_halts_with_continue_on_errortest_validation_rejects_non_bool_continue_on_error"true"(string) fails validation.test_validation_accepts_bool_continue_on_errortrueandfalsepass validation cleanly.AI Disclosure
Used Claude Opus to draft the engine change, the test suite, the
docs section, and this PR body. The shape (
continue_on_errorproposed by @mnriem on the issue thread; this PR implements that
proposal. Code, tests, and design decisions were human-reviewed
before submission.