Close guest process stdin to avoid TTY hang on macOS#17562
Conversation
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17562Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17562" |
There was a problem hiding this comment.
Pull request overview
Fixes a macOS/Linux hang in Aspire CLI guest process execution (notably TypeScript scaffolding flows) by ensuring spawned child processes do not inherit the parent TTY for stdin, preventing lifecycle scripts or prompts from blocking indefinitely.
Changes:
- Redirect and immediately close stdin for guest processes launched via
ProcessGuestLauncherso child reads observe EOF instead of blocking on the terminal. - Apply the same stdin redirect+close behavior to
NpmRunnersubprocess invocations. - Add a regression test that launches a command which attempts to read stdin and asserts it exits promptly with EOF observed.
Show a summary per file
| File | Description |
|---|---|
| tests/Aspire.Cli.Tests/Projects/GuestRuntimeTests.cs | Adds regression coverage ensuring child stdin is closed so reads return EOF (prevents macOS TTY hangs). |
| src/Aspire.Cli/Projects/ProcessGuestLauncher.cs | Redirects stdin and closes it immediately after start to avoid inheriting the CLI’s TTY. |
| src/Aspire.Cli/Npm/NpmRunner.cs | Redirects stdin for npm processes and closes it right after start to prevent interactive hangs. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 0
|
The \Cli.EndToEnd-EmptyAppHostTemplateTests\ failure (C# empty AppHost) looks like a flaky E2E test rather than a regression from this PR:
Will request a re-run once the current CI run completes. |
dddc727 to
9c38ba5
Compare
|
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
|
ProcessGuestLauncher and NpmRunner spawn child processes (npm/pnpm/yarn/bun install, plus the guest AppHost itself) with stdout/stderr redirected but left stdin inheriting the parent CLI's TTY. On macOS/Linux, if any child (e.g. an npm postinstall script, husky, or a package-manager permission prompt) reads from stdin, it blocks indefinitely waiting on the terminal, making 'aspire new' for the TypeScript starter (and 'aspire init/add/ restore') appear to stall with no output and ~0% CPU. Redirect stdin and close it immediately after Process.Start() so any child read surfaces as EOF instead of blocking. We never write to the guest process or npm stdin, so closing is safe. dotnet-based invocations already redirect stdin via ProcessExecutionFactory. Add a regression test in GuestRuntimeTests that launches a shell script which reads stdin and asserts it observes EOF and exits within 10s. Fixes #16791 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extend the TTY-hang fix to the two AppHost server launch paths used by BuildAndGenerateSdkAsync during 'aspire new'/'init'/'add'/'restore': - DotNetBasedAppHostServerProject.Run (dev/source-based AppHost server) - PrebuiltAppHostServer (shipped AppHost server) Both previously redirected stdout/stderr but left stdin inheriting the parent CLI's TTY. The CLI communicates with the server over a Unix socket (REMOTE_APP_HOST_SOCKET_PATH), not stdin, so closing the redirected stdin pipe immediately after Process.Start() is safe and ensures any stdin read in the server (or a library it loads) surfaces as EOF instead of blocking. Combined with the earlier ProcessGuestLauncher / NpmRunner changes, this covers every child process spawned during the TypeScript starter scaffolding flow that previously inherited the parent TTY. Refs #16791 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The E2E helper AspireStopAsync was calling WaitForSuccessPromptAsync, which waits up to 500s (default) for [N OK] $. When aspire stop returns non-zero (for example the documented FailedToDotnetRunAppHost flake in #16643), the prompt arrives as [N ERR:2] $ and the test then sits idle for ~8:20 before failing with a useless 'didn't see OK' timeout. The recent failure on this PR's CI was exactly this shape: aspire stop exited within seconds with ERR:2, but the test wasted 8m20s waiting for an OK that would never come. Switch the helper to WaitForSuccessPromptFailFastAsync so any ERR prompt fails the test immediately with the captured error context. All 20 callers are happy-path tests that expect aspire stop to succeed, so this is a pure test-diagnostic improvement — no product behavior change. Refs #16643 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ProcessCaptureRunner bounded post-exit stdout/stderr capture at 250ms. On loaded Windows CI, short-lived cmd.exe wrappers can exit before the async pipe readers get enough CPU to observe EOF, causing callers to receive an empty capture even though the process wrote output. The PeerInstallProbe failure on this PR had that shape: the fake peer.cmd printed the expected --version output, but the probe reported no usable output. Increase the bounded post-exit capture window to 2s. This remains far below the full process timeout, but gives enough scheduling slack for Windows pipe readers under CI load. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fc4bf93 to
9e54954
Compare
WaitForSuccessPromptFailFastAsync was renamed to WaitForSuccessPromptAsync in b2df78e. Update the call site in AspireStopAsync to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
|
|
❓ CLI E2E Tests unknown — 107 passed, 0 failed, 2 unknown (commit View all recordings
📹 Recordings uploaded automatically from CI run #26613241630 |
Description
aspire newfor the TypeScript starter (andaspire init/add/restore) hangs silently on macOS, producing no further output at ~0% CPU. Issue #16791 documents the symptom and a working< /dev/nullworkaround, which is strong evidence that some spawned child inherits the parent CLI's TTY and blocks on a stdin read.Root cause
Several subprocess launch paths in
Aspire.ClisetRedirectStandardOutput/Error = trueandUseShellExecute = false, but leave stdin inheriting the parent TTY. On macOS/Linux any read from stdin in those children then blocks forever:ProcessGuestLaunchernpm/pnpm/yarn/buninstall (and the guest AppHost itself)NpmRunnernpm view/pack/audit signatures/install -gDotNetBasedAppHostServerProject.RunPrebuiltAppHostServer.CreateStartInfoThe TypeScript starter flow exercises all of these via
BuildAndGenerateSdkAsync: prepare → start AppHost server → RPC codegen →npm install. By contrast,dotnet new installalready goes throughProcessExecutionFactory, which setsRedirectStandardInput = true— that's why C#-only template scaffolding doesn't hit this.Fix
In each of the four launch paths above:
RedirectStandardInput = trueon theProcessStartInfo.process.Start(), callprocess.StandardInput.Close()(wrapped intry/catch (IOException)) so any child read surfaces as EOF instead of blocking on the inherited TTY.The CLI controls these processes via Ctrl+C / backchannel cancellation and Unix-socket IPC (
REMOTE_APP_HOST_SOCKET_PATH), never via stdin, so closing the pipe is safe.Tests
Added
ProcessGuestLauncher_ClosesChildStdinSoReadsObserveEofinGuestRuntimeTests. It launches a short shell snippet that reads stdin and asserts the launcher returns within 10s with the child reporting EOF. Without the fix the test would block on the inherited TTY (or be killed by the 10s CTS).All
GuestRuntimeTests,AppHostServerProjectTests,AppHostServerSessionTests, andNpmRunnerTestspass locally (49 succeeded, 4 Windows-only skipped).Fixes #16791