docs: Document idle/unhealthy endpoint auto scale-down and image name validation by promptless[bot] · Pull Request #674 · runpod/docs

promptless · 2026-06-24T20:20:29Z

Documents three recent Serverless platform changes from DOCS-452: (1) idle endpoint auto-scaling lifecycle (3 days → 2 workers + email, 7 days → 0 workers); (2) automatic scale-down on repeated unhealthy workers; (3) image name validation now enforced at endpoint creation (runpod/*:latest rejected). Updates endpoint settings, worker states, troubleshooting, deploy, and the changelog.

Trigger Events

Message in Slack channel #topic-documentation: Handle this doc request with the following suggestions - <https://linear.app/runpod/issue/DOCS-452/fip-document-i...

…ation

- Add idle endpoint scale-down policy (3d->2 workers+email, 7d->0) to endpoint settings
- Note long-term idle and unhealthy auto scale-down in worker states table
- Update Unhealthy worker state row to reflect auto scale-down behavior
- Add troubleshooting entry for unexpected endpoint scale-down
- Document image name validation at endpoint creation (runpod/*:latest rejected)
- Add June 2026 release note for automatic endpoint scale-down

Refs DOCS-452 (SLS-7, SLS-121, SLS-8, SLS-238)

promptless · 2026-06-24T20:20:30Z


 How long a worker stays active after completing a request before shutting down. You're billed during idle time, but the worker remains warm for immediate processing. Default: 5 seconds.

+### Idle endpoint scale-down


Added the Idle endpoint scale-down subsection from DOCS-452 (SLS-7): endpoints idle for 3 days have max_workers auto-reduced to 2 with an email sent, and idle for 7 days have max_workers set to 0; this is system-driven and users must raise max workers manually to resume.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down
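
To make the idle lifecycle described above concrete, here is a minimal Python sketch of the two thresholds (3 days and 7 days). It is illustrative only and is not Runpod's implementation. The function name `max_workers_after_idle`, the constants, the inclusive threshold comparisons, and the choice to never raise an endpoint that already has fewer than 2 max workers are all assumptions that do not come from DOCS-452.

```python
# Hypothetical model of the idle endpoint scale-down policy.
# The thresholds (3 days, 7 days) and target values (2, 0) come from the PR
# description; everything else here is an assumption for illustration.

IDLE_REDUCE_AFTER_DAYS = 3   # after this, max workers is reduced and an email is sent
IDLE_ZERO_AFTER_DAYS = 7     # after this, max workers is set to 0
REDUCED_MAX_WORKERS = 2


def max_workers_after_idle(idle_days: float, current_max_workers: int) -> int:
    """Return the max workers value an endpoint would have after `idle_days`."""
    if idle_days >= IDLE_ZERO_AFTER_DAYS:
        return 0
    if idle_days >= IDLE_REDUCE_AFTER_DAYS:
        # Assumption: the platform only lowers the value, never raises it.
        return min(current_max_workers, REDUCED_MAX_WORKERS)
    return current_max_workers


if __name__ == "__main__":
    for days in (1, 3, 7):
        print(days, max_workers_after_idle(days, current_max_workers=10))
```

Because the change is system-driven, nothing in this model raises the value again: per the suggestion, users must raise max workers manually to resume the endpoint.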

promptless · 2026-06-24T20:20:31Z

 | **Throttled** | Temporarily unable to run due to host <MachineTooltip /> resource constraints | No |
 | **Outdated** | Marked for replacement after update | Yes (while processing) |
-| **Unhealthy** | Crashed; auto-retries for up to 7 days | No |
+| **Unhealthy** | Crashed; auto-retries, then scales the endpoint down after repeated failures (email sent) | No |


Updated the Unhealthy worker-state row per DOCS-452 (SLS-121): the platform now auto-scales an endpoint down after it consistently produces unhealthy workers (to stop billing and reduce thrashing) and sends an email, replacing the prior "auto-retries for up to 7 days" description.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down

@Promptless Don't replace the text in the table. Add as a note after the table "automatically scales down an endpoint that is consistently producing unhealthy workers, to stop billing and reduce thrashing. "

promptless · 2026-06-24T20:20:32Z


+## Endpoint scaling issues
+
+### My endpoint was scaled down unexpectedly


Added the My endpoint was scaled down unexpectedly troubleshooting entry covering both DOCS-452 scale-down causes (SLS-7 prolonged inactivity and SLS-121 repeated unhealthy workers), each of which triggers an email, and how to resume the endpoint.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down

promptless · 2026-06-24T20:20:33Z

 * Document the specific image version or SHA in your deployment documentation.
 * Keep images as small as possible for faster startup times.

+Runpod validates your image name when you create an endpoint, and a reference that doesn't resolve to a published image is rejected at creation time. For example, `runpod/pytorch:latest` is rejected because `:latest` is not a published tag for Runpod's base images. When you deploy a Runpod base image, specify a tag that actually exists. You can browse the available tags for each image on [Docker Hub](https://hub.docker.com/u/runpod). For your own worker images, use a specific version or SHA tag as described above rather than `:latest`.


Documented image-name validation per DOCS-452 (SLS-8): Layer 2 image verification flipped from observe to enforce, so image names are now validated at endpoint creation and runpod/pytorch:latest (and other runpod/*:latest references) are rejected because :latest is not a published tag for these base images.

Source: https://linear.app/runpod/issue/DOCS-452/fip-document-idle-endpoint-lifecycle-unhealthy-worker-auto-scale-down
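
As a rough illustration of the rule above, the following Python sketch flags image references that would likely hit the `runpod/*:latest` rejection before you submit them. It is a local heuristic only, not Runpod's validator: the real check resolves the reference against published images, which this sketch does not attempt. The function names are hypothetical, and treating an untagged `runpod/...` reference as an implicit `:latest` (Docker's default tag) is an assumption about how the platform handles it.

```python
from typing import Optional, Tuple


def split_image_reference(image: str) -> Tuple[str, Optional[str]]:
    """Split 'repo[:tag]' into (repo, tag); the tag is None when absent."""
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    # A colon before the last slash belongs to a registry port, not a tag.
    if colon > last_slash:
        return image[:colon], image[colon + 1:]
    return image, None


def looks_like_runpod_latest(image: str) -> bool:
    """Return True for references of the form runpod/<name>:latest."""
    if "@" in image:
        # Pinned by digest, so the tag (if any) is not what gets resolved.
        return False
    repo, tag = split_image_reference(image)
    if repo.startswith("docker.io/"):
        repo = repo[len("docker.io/"):]
    # Assumption: a missing tag is treated like Docker's default, 'latest'.
    return repo.startswith("runpod/") and tag in (None, "latest")


if __name__ == "__main__":
    for ref in ("runpod/pytorch:latest", "myuser/worker:v1.2.0"):
        print(ref, looks_like_runpod_latest(ref))
```

A check like this can only catch the `:latest` pattern called out in the docs. Any other reference that does not resolve to a published image would still be rejected at endpoint creation.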

mintlify · 2026-06-24T20:29:12Z

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| runpod-docs | 🟢 Ready | View Preview | Jun 24, 2026, 8:30 PM |

lavanya-gunreddi · 2026-06-24T20:30:11Z

+- **Repeated unhealthy workers**: An endpoint that consistently produces unhealthy workers is scaled down automatically to stop billing and reduce thrashing, and Runpod sends you an email. See [troubleshooting](/serverless/troubleshooting#my-endpoint-was-scaled-down-unexpectedly).
+
+</Update>
+


@Promptless Don't add anything to the release notes for now

promptless · 2026-06-24T20:36:33Z

Thanks for the review! I've updated the suggestion on both points:

- Unhealthy worker table: Restored the original table cell text and instead added a note after the table describing that the platform automatically scales down an endpoint that is consistently producing unhealthy workers, to stop billing and reduce thrashing.
- Release notes: Removed the changelog/release-notes addition for now, per your request. The remaining documentation updates (endpoint settings, worker states, troubleshooting, and deploy) are unchanged.

The PR has been updated with these changes.

promptless Bot commented Jun 24, 2026

docs: restore Unhealthy worker state table text per reviewer feedback

afaee1a

Revert the table-text replacement for the Unhealthy worker state and keep the unhealthy scale-down behavior described in the note after the table.

mintlify Bot deployed to staging June 24, 2026 20:30 View deployment

lavanya-gunreddi reviewed Jun 24, 2026

docs: remove release-notes/changelog addition per reviewer feedback

331fd5d

Reviewer requested no changelog entry for now; keep all other Serverless documentation updates intact.

mintlify Bot deployed to staging June 24, 2026 20:31 View deployment

mintlify Bot deployed to staging June 24, 2026 20:32 View deployment

Update endpoint-configurations.mdx

ff84fb4

mintlify Bot deployed to staging June 24, 2026 20:40 View deployment

Update overview.mdx

9ccf79c

mintlify Bot deployed to staging June 24, 2026 20:45 View deployment

promptless[bot] wants to merge 5 commits into main from promptless/serverless-idle-unhealthy-scaling-image-validation
