
forge.yaml Platform Schema (v3)

Internal — Conservice

This reference is internal to Conservice, for Conservice GitHub users building apps on the greenfield platform.

URL major version v3 is the current forge.yaml schema contract. The headline change from v1 is multi-service apps with mixed exposure: each service gets its own expose, image, and dns shape, plus gateway-level authentication and authorization. Single-service apps render identically to v1 (backwards compatible). Old majors stay live forever, so apps pinned to v1 keep validating. Machine-readable JSON Schema: forge.yaml.schema.json. Reference it under yaml.schemas in .vscode/settings.json for IDE autocomplete and validation.




What this is

forge.yaml is the declarative spec for an app on Conservice's greenfield platform. It lives at infra/forge.yaml in your app repo. Forge reads it and renders all the underlying infrastructure (Terraform for AWS resources, Kustomize for K8s manifests, GitHub Actions workflows, ArgoCD apps, Kargo pipelines, Workspace + Identity Center bindings).

You don't write Terraform. You write forge.yaml. Forge takes it from there.

This document is the complete schema reference for forge.yaml — every field, allowed value, naming constraint, and validation rule the schema enforces.


Quick start — minimal forge.yaml

forge_version: 3.0.0
app_name: my-app
team: sre
language: typescript
services:
  - name: api
    port: 8080
    # no `expose:` → ClusterIP-only (in-cluster HTTP, sister-service callable). Add
    # `expose: internal` for VPN-only routing or `expose: public` for internet-facing.
    health_path: /health
    dockerfile: Dockerfile
    context: .

That's enough to scaffold an app: an in-cluster API service (no external or VPN exposure — sister services can call it via mesh) deploying to all platform envs (prev/stg/prod). Add expose: internal to the service for VPN-only routing or expose: public for internet-facing. Add resource declarations under resources: to get S3 buckets, SQS queues, databases, etc. Use disabled_envs: to opt an app out of specific envs (see § Environment opt-out).

Resource-only apps (no runtime code — just S3 + DDB + queues) are allowed: set services: [] and omit language. See § Language.

Multi-service example (v3)

forge_version: 3.0.0
app_name: status-page
team: sre
language: typescript
dns:
  zone: conservice.ai
  required: true
services:
  - name: web # primary (services[0]) → status-page.conservice.ai
    port: 3000
    expose: public
    health_path: /health
  - name: dashboard # → status-page-dashboard.conservice.ai
    port: 3001
    expose: internal
    health_path: /health
  - name: api # → status-page-api.conservice.ai
    port: 8080
    expose: internal
    health_path: /health
  - name: mcp # → status-page-mcp.conservice.ai
    port: 8081
    expose: internal
    health_path: /health
authz:
  initial_grants:
    - alice@conservice.com

Each service gets its own Deployment + Service + HTTPRoute. The primary service (services[0]) routes at the bare apex hostname; non-primary exposed services suffix as {app}-{service}. See § Per-service hostname resolution.

How to deploy

  1. Save your forge.yaml at infra/forge.yaml in your app repo.
  2. Push a branch and open a PR.
  3. Platform CI validates the schema, renders manifests, and commits them back to the PR.
  4. Review the PR diff — it shows both your source change and the rendered infrastructure effect.
  5. Merge to main. ArgoCD syncs the rendered manifests; Kargo promotes through prev → stg → prod.

That's it. You don't run Terraform, kubectl, or any CLI — the platform handles provisioning, DNS, TLS, secrets, and promotion.

Migrating from v1

v3 is backwards compatible — single-service apps render identically. The headline changes:

| What's new | Summary |
| --- | --- |
| Multi-service apps | services[] may now contain multiple entries with mixed expose: values. See § Services. |
| Per-service image | services[].image block replaces flat dockerfile/context. See § Per-service image. |
| Authorization (authz) | Gateway-level authorization with per-app roles. See § Authorization (authz). |
| Per-service DNS | services[].dns.hostname overrides the default hostname per service. See § Per-service hostname resolution. |

To upgrade: change forge_version: 1.0.0 to forge_version: 3.0.0 in your forge.yaml. Or run forge_migrate_yaml to auto-upgrade the file shape. The flat dockerfile and context fields still work but are soft-deprecated — migrate to the image block when convenient.


Common patterns

Complete examples for the most common app shapes. Copy the one closest to your use case and adjust.

Web API with PostgreSQL database

forge_version: 3.0.0
app_name: billing-api
team: billing
language: typescript
dns:
  zone: conservice.ai
  required: true
services:
  - name: api
    port: 8080
    expose: internal
    health_path: /health
resources:
  database:
    main:
      extensions: [uuid-ossp]

Public web app with SQS + DynamoDB

forge_version: 3.0.0
app_name: status-page
team: sre
language: typescript
dns:
  zone: conservice.ai
  required: true
services:
  - name: web
    port: 3000
    expose: public
    health_path: /health
resources:
  sqs:
    events:
      dlq: true
  dynamodb:
    sessions:
      hash_key: id
      ttl_attribute: expires_at

Multi-service app (API + worker)

forge_version: 3.0.0
app_name: ingest-pipeline
team: data
language: python
dns:
  zone: conservice.ai
  required: true
services:
  - name: api
    port: 8080
    expose: internal
    health_path: /health
  - name: worker
    image:
      dockerfile: Dockerfile
      target: worker
resources:
  sqs:
    jobs:
      visibility_timeout: 120
  s3:
    raw-data:
      versioning: true

Resource-only app (no services)

forge_version: 3.0.0
app_name: shared-data
team: data
services: []
resources:
  s3:
    exports: {}
  dynamodb:
    config:
      hash_key: key

AI app with Bedrock + vector DB

forge_version: 3.0.0
app_name: rates-agent
team: ai
language: typescript
dns:
  zone: conservice.ai
  required: true
services:
  - name: api
    port: 8080
    expose: internal
    health_path: /health
resources:
  database:
    main:
      extensions: [vector, uuid-ossp]
  bedrock:
    model_ids:
      - us.anthropic.claude-sonnet-4-20250514-v1:0
      - amazon.titan-embed-text-v2:0
- amazon.titan-embed-text-v2:0

App repo layout

forge.yaml is the source of truth, but the rendered output of the platform lives alongside it in your app repo. Knowing which paths are dev-editable vs CI-generated matters when you read a PR diff or wonder why your hand-edit "disappeared."

forge.yaml does not configure GitHub repository visibility or team access. Forge-created app repos are always private; repo creation and access grants are handled by Forge outside this schema. Do not add a visibility key to forge.yaml.

my-app/
└── infra/
    ├── forge.yaml                  # SOURCE OF TRUTH: dev edits this
    └── deploy/
        ├── patches/{env}/          # dev-editable escape hatch (kustomize patches)
        │   └── *.yaml              # strategic merge or JSON6902
        ├── rendered/{env}/         # CI-generated, do NOT edit
        │   ├── kustomization.yaml  # plain rendered manifests
        │   └── *.yaml              # ArgoCD reads from here
        └── overlays/{env}/         # legacy, being deprecated

| Path | Who edits | What it does |
| --- | --- | --- |
| infra/forge.yaml | dev (source of truth) | The declarative spec this document describes. Everything below is rendered from it. |
| infra/deploy/patches/{env}/*.yaml | dev (escape hatch) | Kustomize patches layered on top of the base render. Use when you need a hand-edit not yet expressible in the schema (env vars / labels / annotations the platform hasn't surfaced as a field yet). Each patch represents a tracked schema-gap. |
| infra/deploy/rendered/{env}/ | CI (platform render workflow) | Plain rendered Kubernetes manifests committed back to the PR branch by the platform's GitHub App on every push. ArgoCD reads from here. Dev hand-edits to this directory get overwritten on the next push. |
| infra/deploy/overlays/{env}/ | legacy, being deleted | Pre-2026-05 scaffold output. Apps that haven't migrated still have it; the directory disappears once all apps cut over to the rendered-manifests layout. |

The schema itself doesn't change under this layout — the platform renderer still consumes forge.yaml from infra/forge.yaml, validates it against the JSON Schema linked at the top of this page, and emits the rendered output. The only thing that's new is where rendered files land in the repo and who maintains them. Direct forge.yaml edits in a feature branch + PR are the supported flow: CI re-renders on every push, so the PR diff shows BOTH the source change AND the rendered effect before merge.


Top-level fields

The root object accepts these keys (and rejects unknowns — forge_version: 1.0.0 typo'd as forgeVersion will fail at validation, not silently default). Strict-key validation applies recursively: every nested object below also rejects unknown keys.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| forge_version | string | yes | Schema version. X.Y or X.Y.Z semver. |
| app_name | string | yes | Kebab-case (lowercase + digits + hyphens, start with letter, end with letter/digit), 3-22 chars. The 22 cap keeps every derived identifier (IAM role names, S3 bucket names, the PostgreSQL login role aws-{app}-db-{tier}) under its platform limit. Reserved prefixes are rejected; see Reserved app-name prefixes. |
| team | string | yes | Owning team's kebab-case slug. Resolves to team-{team}@conservice.com for membership and is stamped on every resource as the team AWS tag. AWS access is keyed by team via aws-team-{team}-{tier} groups; there is no per-app aws-{app}-admin/readonly group. Must match an entry in forge's allowed-teams list. |
| domain | string | no | Business domain (e.g., billing, identity, platform). Used for AWS resource tagging only; does not affect provisioning shape. |
| portfolio | string | no | Portfolio grouping for finance/cost-allocation rollups. Used for AWS resource tagging. Falls back to team when unset. |
| services | array | no | Containers Forge builds and deploys. Defaults to [] (resource-only app). When non-empty, language is required. See § Services. |
| language | enum | conditional | Runtime/language for the app's services. Required when services is non-empty. Closed enum: typescript, javascript, python, go, csharp, java, rust. See § Language. |
| dns | object | no | DNS exposure config (primary zone + hostname + optional sister aliases). See § DNS. |
| app_config_keys | array of strings | no | UPPER_SNAKE_CASE env var names for dev-managed secrets. See § App config keys. |
| resources | object | no | AWS resources to provision. See § Resources. |
| consumes | object | no | Cross-app resource and service consume declarations. See § Cross-app access. |
| auth | "none" or object | no | Authentication. A union: the literal "none" (public app, no authentication, served via a private ALB), OR an auth: block declaring the access mode (authorization itself is AVP-grant-based; see authz). See § Authentication (auth). |
| authz | object | no | Gateway-level authorization. See § Authorization (authz). |
| github | object | no | Opt-in runtime GitHub access via short-lived, Pod-Identity-minted tokens (no stored credential). { read?: bool, write?: [code\|issues] (exactly one), projects?: read\|write }. Targets/breadth/permissions are server-side and platform-managed, never declared here. See § GitHub access (github). |
| env_vars | object | no | Static per-env config injected into the app ConfigMap. See § Env vars. |
| disabled_envs | array of enums | no | Opt the app out of specific platform envs. See § Environment opt-out. |
| preview | object | no | Preview-environment opt-in. See § Preview. |
| image_tags | object | no | Per-env image tags written by Kargo after promotion. See § Image tags. |
| replicas | object | no | Per-env fixed replica counts for non-autoscaled services (keys stg/prod only, no prev). For autoscaling, use the per-service scaling block instead. See § Replicas. |
| render_channel | enum | no | One of general (default) or canary. Selects which renderer pin channel the app's CI workflows track. Supersedes the deprecated canary boolean. See § Render channel. |
| canary | boolean | no | DEPRECATED: use render_channel: canary instead. Backwards-compatible alias for render_channel: canary. When both are set, render_channel wins. See § Canary (deprecated). |
| app_kind | enum | no | One of web-service, worker, cron, batch. Drives the Datadog monitor catalog gate: selects which kind-specific monitors emit. See § App kind. |
| slo_tier | enum | no | One of tier-1, tier-2, tier-3. Catalog-membership lever: how much of the monitor catalog this app gets. See § SLO tier. |
| monitors | object | no | Datadog monitor routing + paging config (PagerDuty service, Google Chat webhook, per-env routing, extras). See § Monitors. |
| observability | object | no | Per-app observability tuning (e.g. Datadog APM trace sample rate). See § Observability. |

environments: is rejected at parse. The root object is strict and has no environments key — a file carrying that block fails validation. Remove the block; use disabled_envs: instead. See § Environment opt-out.
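As a rough illustration of the forge_version and app_name constraints listed in the table above, the sketch below checks both with regular expressions. The function names are hypothetical and not part of Forge, and the patterns are assumptions derived from the prose rules rather than copied from the JSON Schema. The reserved-prefix and allowed-teams checks are left out because their lists are not given in this section.

```python
import re

# Assumed patterns derived from the field table; the real schema may differ.
FORGE_VERSION_RE = re.compile(r"\d+\.\d+(\.\d+)?")    # X.Y or X.Y.Z
APP_NAME_RE = re.compile(r"[a-z][a-z0-9-]*[a-z0-9]")  # kebab-case shape


def is_valid_forge_version(value: str) -> bool:
    """True when value looks like X.Y or X.Y.Z."""
    return FORGE_VERSION_RE.fullmatch(value) is not None


def is_valid_app_name(value: str) -> bool:
    """True for 3-22 chars of lowercase letters, digits, and hyphens that
    start with a letter and end with a letter or digit.
    Reserved prefixes are NOT checked here."""
    return 3 <= len(value) <= 22 and APP_NAME_RE.fullmatch(value) is not None
```

For example, is_valid_app_name("status-page") passes, while "My-App", "ab", and "my-app-" are rejected.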


Language

The language field declares the runtime/ecosystem the app's services are written in. Forge uses it to pick the right Dockerfile base image and emit the matching per-language env contract.

language: typescript

| Value | Notes |
| --- | --- |
| typescript | Per-language emitter shipped. |
| javascript | Per-language emitter shipped. |
| python | Per-language emitter shipped. |
| go | Per-language emitter shipped. |
| csharp | Per-language emitter shipped. |
| java | Per-language emitter shipped. |
| rust | Per-language emitter shipped. |

Lowercase, no version suffix — the value identifies the ecosystem, not the specific runtime version. language is required when services is non-empty and accepted-but-ignored when services: [] (resource-only apps have no code to language-tag, but round-tripping the field is fine).

Adding a new language requires both a schema-enum bump and a corresponding per-language emitter in the platform renderer — file a platform request.


Services

Every app may declare zero or more services. Forge emits one of three structurally distinct shapes per service, depending on port: and expose: (the two exposed variants share a shape and differ only in routing):

| port: set? | expose: value | Forge emits |
| --- | --- | --- |
| no | (n/a) | Just a Deployment (worker) |
| yes | (absent) | Deployment + ClusterIP Service (in-cluster HTTP, mesh-callable) |
| yes | internal | Deployment + ClusterIP + HTTPRoute, VPN-only (private gateway, auto-derived from auth) |
| yes | public | Deployment + ClusterIP + HTTPRoute, internet-facing (served from a private origin, with no public-subnet load balancer). Authenticated public apps are supported today; no-auth public apps are not yet supported. |

services:
  - name: api
    port: 8080
    expose: public # internet-facing (public)
    health_path: /health
    dockerfile: services/api/Dockerfile
    context: services/api/
    replicas: 2
    env:
      LOG_LEVEL: info
    resources:
      cpu_request: "100m"
      memory_request: "256Mi"
      memory_limit: "512Mi"
  - name: worker # no port + no expose → worker-only
    dockerfile: services/worker/Dockerfile
    context: services/worker/
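The port/expose combinations above can be read as a small decision function. The sketch below is only an illustration of that mapping, with a hypothetical function name; it is not how the platform renderer is implemented.

```python
from typing import Optional


def emitted_shape(port: Optional[int], expose: Optional[str]) -> list[str]:
    """Return the Kubernetes objects listed for a service with this port/expose."""
    if expose is not None and expose not in ("public", "internal"):
        raise ValueError(f"expose must be 'public' or 'internal', got {expose!r}")
    if port is None:
        if expose is not None:
            # port is required whenever expose is set
            raise ValueError("port is required when expose is set")
        return ["Deployment"]  # worker
    if not 1 <= port <= 65535:
        raise ValueError("port must be in 1-65535")
    if expose is None:
        return ["Deployment", "ClusterIP Service"]  # in-cluster HTTP only
    # internal and public share a shape; they differ in which gateway routes them
    return ["Deployment", "ClusterIP Service", "HTTPRoute"]
```

Applied to the example above, the api service (port 8080, expose: public) yields all three objects and the worker yields only a Deployment.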

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | yes | 2-40 chars, kebab-case (^[a-z][a-z0-9-]*[a-z0-9]$, meaning lowercase letters, digits, dashes; start with letter, end with letter/digit). Used as the K8s service name + ECR repo suffix. Must be unique within services[]. |
| port | integer | conditional | 1-65535. Required when expose: is set. Omitted = worker (Deployment only). |
| expose | enum | no | "public" (internet-facing, served from a private origin with no public-subnet load balancer; authenticated public apps are supported today, no-auth public apps are not yet supported) or "internal" (VPN-only). Omitted = ClusterIP-only (in-cluster HTTP, no gateway route). The routing tier is platform-derived from auth, so there is nothing to configure. v3: multiple services may have expose: set (mixed public + internal). At most ONE service may be public; that service must be services[0] (the primary). Any number of services may be internal. |
| dns | object | no | v3 NEW. Per-service DNS override. { hostname?: string }; hostname must match ^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$. When set, overrides the default hostname for this service's HTTPRoute. App-level dns.zone and dns.required are shared. See § Per-service hostname resolution. |
| image | object | no | v3 NEW. Per-service build configuration. { dockerfile?: string, target?: string }. dockerfile replaces the flat dockerfile field (which is soft-deprecated). target enables multi-target Dockerfile builds (docker build --target {target}). See § Per-service image. |
| auth | object or "none" | no | Per-service auth declaration. kind: "bearer" flags a service whose callers can't follow OIDC redirects (MCP, CLI, programmatic clients) and routes it through the OAuth 2.1 bearer path (see § Authenticating an MCP server); the literal "none" declares an unauthenticated service surface. Omitted = the service inherits the app-level auth posture (OIDC-cookie for browser-facing apps). Per-user/role access is governed by authz grants, not per-service fields. See § Authentication. |
| health_path | string | no | HTTP path the K8s liveness/readiness probes hit. Probes are only emitted when BOTH health_path and port are set. |
| dockerfile | string | no | Soft-deprecated in v3; use image.dockerfile instead. Path to Dockerfile, relative to context. Default: Dockerfile. Still accepted for backwards compatibility. |
| context | string | no | Soft-deprecated in v3; use the image block instead. Docker build context, relative to repo root. Default: . (the repo root). When image.dockerfile is set, this field is ignored (the Dockerfile path is repo-root-relative, making a separate context unnecessary). |
| replicas | integer | no | Fixed replica count for the Deployment. Non-negative. Default: 2 when port is set, 1 otherwise (applied by the platform renderer at emit time, not by the schema). Set 0 to suspend deployment without removing the resources. Mutually exclusive with scaling: a service is either fixed-count (replicas) or autoscaled (scaling), never both. (Not to be confused with the top-level per-env replicas: map; see § Replicas.) |
| scaling | object | no | v3 NEW. Per-service horizontal autoscaling (HPA). { min_replicas, max_replicas, target_cpu }. When set, the service is managed by a HorizontalPodAutoscaler in stg + prod instead of a fixed replica count. Requires resources.cpu_request and is mutually exclusive with replicas. See § Per-service autoscaling. |
| schedule | object | no | Run this service on a schedule (a Kubernetes CronJob) instead of as an always-on Deployment. { cron, timezone, concurrency?, active_deadline_seconds? }. Mutually exclusive with everything routed/long-running (port, expose, health_path, dns, scaling, replicas). See § Scheduled services. |
| env | object (string→string) | no | Static env vars baked into the Deployment manifest (UPPER_SNAKE_CASE keys). Reserved prefixes (see Reserved env-var name prefixes) cannot be shadowed here. |
| resources | object | no | K8s resource requests/limits. Keys: cpu_request, memory_request, memory_limit. All three must be set together when the block is present; per-field fallback isn't supported. Omit the block entirely for platform defaults (100m / 256Mi / 512Mi). |

Service-to-service communication

Services within the same app share a Kubernetes namespace. Call a sibling service at http://{service-name}:{port} (e.g. http://api:8080). No DNS suffix or service mesh config needed — Istio ambient handles mTLS transparently.

Resource ownership

All AWS resources declared in resources: are app-scoped, shared across every service in the app. Every service runs under the same pod IAM role and can access the same S3 buckets, SQS queues, databases, etc. There is no per-service resource isolation within an app.


DNS

Controls the app's public hostname, which Gateway-API listener routes to it, and any sister hostnames on additional zones.

dns:
  required: true
  zone: conservice.ai
  hostname: my-app
  aliases:
    - zone: conservice.cloud
      hostname: my-app.conservice.cloud

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| required | bool | no | Whether DNS records get created. Default: false. Only a service with expose: set can satisfy required: true. |
| zone | enum | no | Primary zone. One of conservice.ai, conservice.cloud, capturis.ai, svc.conservice.ai. conservice.ai is the documented default for new internet-facing apps; the others are peer primaries for apps with audience/brand reason to live there. svc.conservice.ai is reserved for AWS infra CNAMEs and is rejected for forge apps by the scaffold-input refine; use conservice.ai with services[].expose: internal for VPN-only apps. |
| hostname | string | no | Optional explicit hostname override (e.g. rates-prod.conservice.ai). Must match ^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$ (lowercase letters, digits, dots, dashes). Default: {app_name}.{zone}. |
| aliases | array of objects | no | Sister hostnames on additional zones for the same workload (max 5). Each entry: { zone, hostname }; hostname must match ^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$. Each alias renders an additional HTTPRoute on the same Gateway pointing at the same backend Service (TLS terminates at the NLB via the wildcard cert per TLD). Used for multi-TLD apps like the auth front-door: primary on conservice.ai, sisters on conservice.cloud and capturis.ai. |

Per-entry alias rules (enforced at parse time):

  • aliases[].zone must differ from the primary dns.zone (aliases are sister hostnames on a different TLD).
  • aliases[].zone cannot be svc.conservice.ai.
  • aliases[].zone values must be distinct across all entries — two aliases on the same zone would collide on the same Gateway listener.
  • aliases[].hostname must end with its declared zone (e.g. auth.conservice.cloud for zone: conservice.cloud).
  • aliases requires zone to be set (you can't alias-only without a primary).
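The alias rules above can be sketched as a single validation pass. This is an illustration only: alias_errors is a hypothetical helper, the error strings are invented, and the suffix check uses "." + zone, which is a slightly stricter reading of "must end with its declared zone".

```python
def alias_errors(dns: dict) -> list[str]:
    """Collect violations of the documented per-entry alias rules."""
    errors = []
    aliases = dns.get("aliases", [])
    primary = dns.get("zone")
    if aliases and primary is None:
        errors.append("aliases requires zone to be set")
    if len(aliases) > 5:
        errors.append("at most 5 aliases are allowed")
    seen_zones = set()
    for i, alias in enumerate(aliases):
        zone, hostname = alias["zone"], alias["hostname"]
        if zone == primary:
            errors.append(f"aliases[{i}].zone must differ from dns.zone")
        if zone == "svc.conservice.ai":
            errors.append(f"aliases[{i}].zone cannot be svc.conservice.ai")
        if zone in seen_zones:
            errors.append(f"aliases[{i}].zone duplicates an earlier alias zone")
        seen_zones.add(zone)
        # Assumption: require a dot before the zone so "xconservice.cloud" fails.
        if not hostname.endswith("." + zone):
            errors.append(f"aliases[{i}].hostname must end with {zone}")
    return errors
```

The dns: example at the top of this section passes with no errors; an alias on the primary zone, or two aliases on the same zone, each produce one.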

Public vs. internal exposure is per-service — set expose: public or expose: internal on the service that should accept external/VPN traffic. See § Services.

Per-service hostname resolution (v3)

When a service has expose: set, Forge resolves its hostname using this precedence (first match wins):

| Priority | Source | Example |
| --- | --- | --- |
| 1 | services[i].dns.hostname (explicit per-service override) | auth.conservice.ai |
| 2 | dns.hostname (app-level, only when this is the sole exposed service; backwards-compat) | rates.conservice.ai |
| 3 | Primary service (services[0]): {app_name}.{dns.zone} | status-page.conservice.ai |
| 4 | Non-primary service: {app_name}-{service}.{dns.zone} | status-page-dashboard.conservice.ai |

The primary-service convention is position-based: services[0] gets the bare apex hostname. Non-primary exposed services append -{service} to the app name, yielding {app_name}-{service}.{dns.zone}. This means the order of services in services[] matters for public apps.
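The precedence described in this section can be sketched as a short function. This is a minimal illustration, assuming services are plain dicts shaped like the forge.yaml entries; resolve_hostname and its parameters are hypothetical names, not a Forge API.

```python
from typing import Optional


def resolve_hostname(app_name: str, zone: str, services: list[dict],
                     index: int, app_hostname: Optional[str] = None) -> str:
    """Resolve the hostname of services[index]; first matching rule wins."""
    service = services[index]
    if not service.get("expose"):
        raise ValueError("only services with expose: set get a hostname")
    # 1. Explicit per-service override.
    override = service.get("dns", {}).get("hostname")
    if override:
        return override
    # 2. App-level dns.hostname, only when this is the sole exposed service.
    exposed = [s for s in services if s.get("expose")]
    if app_hostname and len(exposed) == 1:
        return app_hostname
    # 3. Primary service (position 0) gets the bare app hostname.
    if index == 0:
        return f"{app_name}.{zone}"
    # 4. Non-primary services are suffixed with the service name.
    return f"{app_name}-{service['name']}.{zone}"
```

With the status-page example, index 0 (web) resolves to status-page.conservice.ai and index 1 (dashboard) to status-page-dashboard.conservice.ai. Reordering services[] changes which service gets the bare hostname.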

Tip: If you need two public-facing services (e.g. a web UI + a public API), either use two separate apps or route through a single public service that reverse-proxies to internal siblings.

Constraint: at most one public service, and it must be services[0]. Any number of internal services are allowed at any position. An all-internal app (no public) has no position constraint.

Per-service image (v3)

The image block on each service controls Docker build inputs:

services:
  - name: web
    image:
      dockerfile: Dockerfile # path relative to repo root
      target: web # multi-stage build target
  - name: worker
    image:
      dockerfile: Dockerfile
      target: worker

| Field | Type | Notes |
| --- | --- | --- |
| dockerfile | string | Path to Dockerfile. Default: Dockerfile. Replaces the flat services[].dockerfile field (soft-deprecated). |
| target | string | Docker --target stage name. Enables a single multi-stage Dockerfile serving multiple services. Default: none (the whole Dockerfile builds). |
| base | enum | Base-image variant: musl (default, Alpine-class) or glibc. Use glibc when your service loads glibc-linked native addons (e.g. Temporal's core bridge); on the musl base those fail at startup with ERR_DLOPEN_FAILED. |

When image.dockerfile is set, the flat dockerfile and context fields are ignored. The CI build workflow emits a matrix entry per service, each with its own --target.

Per-service autoscaling (scaling)

The scaling block puts a service under a Kubernetes HorizontalPodAutoscaler (HPA) instead of a fixed replica count. The autoscaler holds the fleet at a target average CPU utilization, adding pods as load rises and removing them as it falls.

services:
  - name: api
    port: 8080
    expose: public
    health_path: /health
    resources:
      cpu_request: "250m" # required when scaling is set
      memory_request: "256Mi"
      memory_limit: "512Mi"
    scaling:
      min_replicas: 2
      max_replicas: 10
      target_cpu: 70

| Field | Type | Notes |
| --- | --- | --- |
| min_replicas | integer | Positive. The floor the autoscaler scales down to: the replica count the Deployment holds under no load. Set ≥ 2 for high availability across rolling updates and single-pod failures. |
| max_replicas | integer | Positive, must be ≥ min_replicas. The ceiling the autoscaler scales up to under load. Size it to peak expected traffic divided by per-pod throughput. |
| target_cpu | integer | 1-100. Target average CPU utilization as a percent of the pod's CPU request. The autoscaler adds pods when average CPU exceeds this and removes them when it falls below. 70 is a sensible default for CPU-bound services. |

The block is strict — unknown keys are rejected.

Validation rules (enforced at parse time):

  • scaling requires resources.cpu_request on the same service. target_cpu is a percentage of the CPU request, so without a request the autoscaler has no denominator to compute desired replicas against.
  • scaling is mutually exclusive with the per-service replicas field. A service is either autoscaled or fixed-count — declaring both is rejected.
  • min_replicas must be ≤ max_replicas.
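The validation rules above, together with the field ranges from the table, can be sketched as one check over a service entry. The helper name and error strings are hypothetical, and treating all three scaling keys as required is an assumption based on the { min_replicas, max_replicas, target_cpu } shape.

```python
def scaling_errors(service: dict) -> list[str]:
    """Collect violations of the documented scaling rules for one service."""
    scaling = service.get("scaling")
    if scaling is None:
        return []
    errors = []
    unknown = set(scaling) - {"min_replicas", "max_replicas", "target_cpu"}
    if unknown:
        errors.append(f"unknown scaling keys: {sorted(unknown)}")  # strict block
    lo, hi = scaling.get("min_replicas"), scaling.get("max_replicas")
    for key, value in (("min_replicas", lo), ("max_replicas", hi)):
        if not isinstance(value, int) or value < 1:
            errors.append(f"{key} must be a positive integer")
    if isinstance(lo, int) and isinstance(hi, int) and lo > hi:
        errors.append("min_replicas must be <= max_replicas")
    target = scaling.get("target_cpu")
    if not isinstance(target, int) or not 1 <= target <= 100:
        errors.append("target_cpu must be an integer in 1-100")
    if "cpu_request" not in service.get("resources", {}):
        errors.append("scaling requires resources.cpu_request")
    if "replicas" in service:
        errors.append("scaling is mutually exclusive with replicas")
    return errors
```

The api example above produces no errors; adding replicas: 2 or dropping the resources block each adds one.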

Behavior: Autoscaling is active in stg and prod only. In those envs Forge emits an autoscaling/v2 HorizontalPodAutoscaler and omits the Deployment's spec.replicas so the HPA owns the count; ArgoCD is configured not to reconcile that field, so it won't fight the autoscaler. Preview environments are not autoscaled — an autoscaled service's Deployment runs at the Kubernetes default of 1 pod in preview. For a fully-autoscaled app the kustomize per-env replicas: overlay disappears; for a mixed app it stays in place for the non-autoscaled services only.
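To build intuition for how min_replicas, max_replicas, and target_cpu interact, the sketch below applies the core scaling formula from the Kubernetes HorizontalPodAutoscaler documentation, desired = ceil(current × currentUtilization / targetUtilization), and clamps the result to the declared bounds. It deliberately omits the HPA's tolerance band, stabilization windows, and pod-readiness handling, so treat it as an approximation for sizing discussions, not a model of the controller.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu: int, min_replicas: int, max_replicas: int) -> int:
    """Approximate the replica count the HPA converges toward."""
    # Core HPA ratio: scale the fleet by observed/target utilization.
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu)
    # The autoscaler never goes below the floor or above the ceiling.
    return max(min_replicas, min(max_replicas, raw))
```

With the example block above (2 to 10 replicas, target 70), four pods averaging 105% of their CPU request would scale toward six, and an idle fleet settles at the floor of two.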

Recently added field

The scaling block is a recent addition. Apps still on an older renderer build will fail schema validation at CI render — the older build's strict schema rejects scaling as an unknown key. If your render step rejects scaling as unknown, your app's renderer hasn't picked up the field yet; wait for the platform render version to advance (or opt into the canary channel) before adding the block.


App config keys

Dev-supplied secrets (API tokens, OAuth credentials, third-party keys) flow through app_config_keys:

app_config_keys:
  - STRIPE_API_KEY
  - SENTRY_DSN
  - DATADOG_API_KEY

  • UPPER_SNAKE_CASE, must start with a letter.
  • These get REPLACE_ME placeholders in {app}/config Secrets Manager on first apply. Devs populate values via GitHub Environment Secrets → External Secrets Operator (ESO) sync (NOT by editing AWS Secrets Manager directly).
  • PR sync semantics: app_config_keys changes on a PR (additions / removals) are reflected in the PR's preview environment before merge. On merge to main, the same diff flows through to stg/prod placeholders.
  • Reserved prefixes (rejected at scaffold time, before infrastructure rendering): see Reserved env-var name prefixes below for the full list. These flow through the platform-managed ConfigMap, not through dev-supplied secrets. Putting them in app_config_keys is a mistake.
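The naming rules in the list above can be sketched as a simple check. The regular expression is an assumption derived from "UPPER_SNAKE_CASE, must start with a letter", and the reserved list is deliberately partial: it contains only the three examples this page names (DATABASE_*, S3_BUCKET_*, AWS_REGION), not the full list from the Reserved env-var name prefixes section. config_key_errors is a hypothetical helper.

```python
import re

# Assumed shape for UPPER_SNAKE_CASE names that start with a letter.
KEY_RE = re.compile(r"[A-Z][A-Z0-9_]*")

# PARTIAL list for illustration only; the platform's real list is longer.
RESERVED_EXAMPLES = ("DATABASE_", "S3_BUCKET_", "AWS_REGION")


def config_key_errors(keys: list[str]) -> list[str]:
    """Return one message per key that breaks a documented naming rule."""
    errors = []
    for key in keys:
        if KEY_RE.fullmatch(key) is None:
            errors.append(f"{key}: not UPPER_SNAKE_CASE starting with a letter")
        elif key.startswith(RESERVED_EXAMPLES):
            errors.append(f"{key}: uses a reserved platform prefix")
    return errors
```

The three keys in the example above pass. A key such as DATABASE_URL is flagged because that prefix belongs to the platform-managed ConfigMap.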

Env vars

Static per-environment config variables injected directly into the app's platform ConfigMap. Use for non-secret values that differ per environment (e.g., the app's own public URL). Secrets should go through app_config_keys + GitHub Environment Secrets instead.

env_vars:
  EXTERNAL_HOSTNAME:
    prev: "https://auth.prev.conservice.ai"
    stg: "https://auth.stg.conservice.ai"
    prod: "https://auth.conservice.ai"

Each key is an UPPER_SNAKE_CASE env var name; the value is a map of env name (prev / stg / prod) to string. Only the declared envs receive the value — you may declare a subset (e.g., just stg and prod) if the variable isn't needed in every environment.

  • Delivery path: the platform renderer emits the values into the app's {app}-env ConfigMap at kustomize-render time. The pod picks them up via envFrom.
  • Reserved prefixes rejected: The same prefixes reserved for platform-managed vars (DATABASE_*, S3_BUCKET_*, AWS_REGION, etc.) are rejected at schema validation. See Reserved env-var name prefixes.
  • Typed env contract: Keys declared in env_vars appear in the generated src/env.d.ts and src/lib/env.ts files, so TypeScript apps get compile-time type checking.
  • Preview: The prev value is used for all preview environments (per-PR envs inherit the prev entry).
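The per-environment lookup described above, including the rule that preview environments inherit the prev entry, can be sketched as follows. resolve_env_var and the is_preview flag are hypothetical; how the platform actually identifies a per-PR environment is not specified in this section.

```python
from typing import Optional

PLATFORM_ENVS = ("prev", "stg", "prod")


def resolve_env_var(env_vars: dict, name: str, env: str,
                    is_preview: bool = False) -> Optional[str]:
    """Return the value a pod in `env` would receive, or None if undeclared."""
    # Per-PR preview environments read the prev entry.
    lookup = "prev" if is_preview else env
    if lookup not in PLATFORM_ENVS:
        raise ValueError(f"unknown env {env!r}")
    # Only declared envs receive a value; a missing entry means "not set".
    return env_vars.get(name, {}).get(lookup)
```

Given the EXTERNAL_HOSTNAME example, stg resolves to "https://auth.stg.conservice.ai" and any preview environment resolves to the prev URL. A variable declared only for stg and prod resolves to None in prev.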

Resources

Optional top-level block declaring AWS resources the app needs. Every resource type is optional. Unknown top-level keys (e.g., a typo'd resources.bedrocks) are rejected.

Naming convention: account-scoped resources (SQS, SNS, EventBridge, Step Functions, DynamoDB, Firehose, KMS, IAM) use {env}-{region}-{app}-{key} — no prefix, since the account ID in every ARN already disambiguates. S3 buckets are the one exception: they retain the conservice- prefix because S3 bucket names are globally namespaced and need a company prefix to prevent collisions across all of AWS.

resources:
  s3:
    chat-history:
      versioning: true
  sqs:
    jobs: {}
    notifications:
      dlq: true
      dlq_retention_seconds: 1209600
  database:
    main:
      extensions: [vector, uuid-ossp]
  bedrock:
    model_ids:
      - us.anthropic.claude-sonnet-4-20250514-v1:0
      - amazon.titan-embed-text-v2:0
  kms:
    token-envelope:
      description: "Envelope-encrypt OAuth tokens before writing to DDB"
      actions: [Encrypt, Decrypt, GenerateDataKey]

Resource key naming (applies to ALL resource types)

Map keys must be:

  • 1-64 chars (resource-specific tighter caps noted per type below)
  • Lowercase, start with a letter
  • Contain only [a-z0-9_-]
  • NOT start with pr- — reserved for per-PR ephemeral resources

The key becomes the resource suffix. Example: s3.chat-history → bucket conservice-{env}-{app}-chat-history.
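The key rules above translate into a short check. This is a sketch with a hypothetical function name; the max_len parameter is there so a resource type with a tighter cap can pass its own limit.

```python
import re

# Lowercase, starts with a letter, then only [a-z0-9_-].
RESOURCE_KEY_RE = re.compile(r"[a-z][a-z0-9_-]*")


def is_valid_resource_key(key: str, max_len: int = 64) -> bool:
    """Check a resources.* map key against the shared naming rules."""
    return (
        1 <= len(key) <= max_len
        and RESOURCE_KEY_RE.fullmatch(key) is not None
        and not key.startswith("pr-")  # reserved for per-PR ephemeral resources
    )
```

For instance, chat-history is accepted, while pr-cache and Uploads are rejected.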

Per-resource grant fields (s3 / sqs / dynamodb / eventbridge / database)

These resource types may carry optional grant arrays. They name principals that should get tiered access to this resource only — narrower than a full app-level grant. (sns, stepfunctions, firehoses, and kms don't currently take the array-shaped grants; they participate only in the cross-app access: policy described below. KMS has a parallel-but-nested access.team_grants shape — see § KMS.)

  • team_grants: [{ team, tier }] — give a team's per-team Permission Set tiered access to this resource. team is the kebab slug (resolves to team-{team}@conservice.com); tier is admin or readonly.
  • user_grants: [{ email, tier }] — direct-add a single @conservice.com user (use sparingly; team_grants is preferred for code-review auditability).
  • group_grants: [{ group, tier }] — give a non-team Google group (e.g. conservice-finance@conservice.com) tiered access.

Status of materialization:

  • database.{name}.team_grants / user_grants — fully wired end-to-end. The platform materializes the per-app PostgreSQL login role and the cross-team rds-db:connect grant on the team's AWS Permission Set. Recipient can psql into the DB and has NO access to the app's other AWS resources.
  • database.{name}.group_grants — accepted at the schema layer; materializes via the per-team Permission Set enumeration.
  • s3 / sqs / dynamodb / eventbridge team_grants / user_grants / group_grants — accepted at the schema layer; platform-side consumption ships in a follow-on phase. Declaring entries today has no runtime effect on these resource types yet, but the declaration shape is stable.

Per-resource cross-app policy (access / allowed_teams / allowed_apps / tags)

The eight resource kinds that participate in cross-app consumes: (s3, sqs, sns, dynamodb, eventbridge, stepfunctions, firehoses, database) each accept an optional cross-app consume policy:

resources:
  s3:
    embeddings-cache:
      access: team
      allowed_teams: [ai]
      tags:
        sensitivity: pii

| Field | Type | Notes |
| --- | --- | --- |
| access | enum | open (any app in the org may declare consumes: against this resource), team (only apps owned by a team in allowed_teams), app (only apps in allowed_apps; most restrictive). Auto-defaults to team for database and for any resource carrying tags.sensitivity ∈ {pii, pci, hipaa, soc2}; declare explicitly to override. |
| allowed_teams | array of team slugs | Required non-empty when access: team is set explicitly. Empty array ([]) is a parse error: either omit, add an entry, or pick a different access level. |
| allowed_apps | array of app names | Required non-empty when access: app is set explicitly. Same kebab-case rules as app_name. |
| tags.sensitivity | enum | pii, pci, hipaa, soc2 (auto-default access: team), or public / internal (positive-intent labels, no auto-default). Closed enum; typos like confidential fail at parse time. |

Consumer-side declarations go in the top-level consumes: block.
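The producer-side rules in the table above can be read as two small checks. The sketch below is only an illustration of those rules, not the platform's resolver: the type names and the functions `effectiveAccess` and `policyErrors` are hypothetical, and the default for resources that are neither a database nor sensitivity-tagged is left undefined because this reference does not state it.

```typescript
// Hypothetical model of the per-resource policy fields; names are illustrative.
type Access = "open" | "team" | "app";
type Sensitivity = "pii" | "pci" | "hipaa" | "soc2" | "public" | "internal";

interface ResourcePolicy {
  access?: Access;
  allowed_teams?: string[];
  allowed_apps?: string[];
  tags?: { sensitivity?: Sensitivity };
}

// Sensitivity labels that auto-default access to "team".
const AUTO_TEAM: Sensitivity[] = ["pii", "pci", "hipaa", "soc2"];

function effectiveAccess(kind: string, policy: ResourcePolicy): Access | undefined {
  if (policy.access !== undefined) return policy.access; // explicit value wins
  const s = policy.tags?.sensitivity;
  if (kind === "database") return "team";
  if (s !== undefined && AUTO_TEAM.indexOf(s) !== -1) return "team";
  return undefined; // default for other cases is not stated in this reference
}

function policyErrors(policy: ResourcePolicy): string[] {
  const errors: string[] = [];
  if (policy.allowed_teams !== undefined && policy.allowed_teams.length === 0) {
    errors.push("allowed_teams must not be an empty array");
  }
  if (policy.access === "team" && policy.allowed_teams === undefined) {
    errors.push("access: team requires allowed_teams");
  }
  if (policy.access === "app" && !(policy.allowed_apps && policy.allowed_apps.length > 0)) {
    errors.push("access: app requires a non-empty allowed_apps");
  }
  return errors;
}
```

For the `embeddings-cache` example above, `policyErrors` returns an empty list, and `effectiveAccess("s3", ...)` returns `"team"` whether or not `access` is written out, because the `pii` tag triggers the auto-default.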

S3 (resources.s3)

s3:
  history:
    versioning: true
  uploads: {}

Field | Type | Notes
versioning | bool | Enable S3 versioning. Default: true. All platform buckets default versioning ON; set false only when you don't want object history.

KMS encryption + public-access-block are always on. Per-bucket team_grants / user_grants / group_grants are accepted (renderer-side consumption is in flight — see above).

Bucket-key cap: 20 chars (enforced at the provisioning layer, not at schema parse). S3's 63-char bucket-name limit minus conservice-{env}-{app}- prefix leaves ~24 chars; cap at 20 for safety.

Resolved bucket name: conservice-{env}-{app}-{key} (e.g., conservice-prod-my-app-history). S3 retains the conservice- prefix because S3 bucket names are globally namespaced.

Emitted env var: S3_BUCKET_{KEY} → e.g. S3_BUCKET_HISTORY containing conservice-prod-my-app-history.
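The naming rules above reduce to simple string arithmetic. The helpers below are a minimal sketch with hypothetical names (`resolvedBucketName`, `s3EnvVar`, `charsLeftForKey`); the hyphen-to-underscore conversion in the env var is an assumption made by analogy with other upper-snake env var examples in this reference, and the platform's renderer remains the source of truth.

```typescript
// Illustrative helpers only; none of these names come from the platform.
const S3_NAME_LIMIT = 63; // AWS maximum bucket-name length

// conservice-{env}-{app}-{key}
function resolvedBucketName(env: string, app: string, key: string): string {
  return `conservice-${env}-${app}-${key}`;
}

// Assumed upper-snake form of the bucket key, prefixed with S3_BUCKET_.
function s3EnvVar(key: string): string {
  return `S3_BUCKET_${key.toUpperCase().replace(/-/g, "_")}`;
}

// Characters left for the bucket key once the fixed prefix is accounted for.
function charsLeftForKey(env: string, app: string): number {
  return S3_NAME_LIMIT - `conservice-${env}-${app}-`.length;
}
```

A longer app name shrinks the budget returned by `charsLeftForKey`, which is why the platform cap sits below the theoretical maximum.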

SQS (resources.sqs)

sqs:
  jobs:
    visibility_timeout: 30
    retention_seconds: 1209600
    dlq: true
    max_receive_count: 5

Field | Type | Notes
dlq | bool | Provision a Dead-Letter Queue and wire the redrive policy on the main queue. Default: true.
dlq_retention_seconds | int | DLQ retention in seconds. Default: 1209600 (14 days, AWS max).
visibility_timeout | int | Seconds. Default: 30. Set higher when the consumer's per-message processing time can exceed 30s; otherwise the same message is redelivered while still being processed.
retention_seconds | int | Main-queue retention in seconds. Default: 345600 (4 days; AWS max 14 days).
max_receive_count | int | DLQ-trigger threshold. Default: 5. Lower for fail-fast; higher when transient retries are normal.

Server-side encryption via SQS-managed keys is always on. The pod role gets Send/Receive/Delete on the queue and its DLQ.

Resolved name: {env}-use1-{app}-{key}-queue.

Emitted env vars: SQS_QUEUE_{KEY} (queue URL) and SQS_QUEUE_{KEY}_ARN → e.g. SQS_QUEUE_JOBS.

SNS (resources.sns)

sns:
  events: {}

No type-specific knobs declared today — empty object opts in. The pod role gets sns:Publish on the topic ARN.

Resolved name: {env}-use1-{app}-{key}-topic.

Emitted env var: SNS_TOPIC_{KEY}_ARN → e.g. SNS_TOPIC_EVENTS_ARN.

DynamoDB (resources.dynamodb)

dynamodb:
  sessions:
    hash_key: id
    hash_key_type: S
    ttl_attribute: expires_at
    point_in_time_recovery: true
  events:
    hash_key: stream_id
    range_key: event_id
    range_key_type: S
    billing_mode: PAY_PER_REQUEST
    gsi:
      by-status:
        hash_key: status
        range_key: created_at
        projection_type: ALL

Field | Type | Notes
hash_key | string | Required. Partition key attribute name.
hash_key_type | enum | S (string), N (number), B (binary). Default: S.
range_key | string | Sort key attribute name.
range_key_type | enum | Same set as hash_key_type. Default: S.
billing_mode | enum | PAY_PER_REQUEST (default) or PROVISIONED.
gsi | object map | Global Secondary Indexes. Each entry: hash_key (required), range_key?, projection_type? (ALL / KEYS_ONLY / INCLUDE, default: ALL). Key types on GSI attributes inherit from the parent table's hash_key_type / range_key_type; there are no per-GSI type overrides.
ttl_attribute | string | Attribute holding TTL epoch seconds. Enables TTL when set.
point_in_time_recovery | bool | Default: true.

KMS-encrypted by default. The auto-emitted pod-role policy covers data-plane read/write/query/scan plus dynamodb:DescribeTable. DescribeTable is the canonical no-op control-plane probe for readiness checks (it verifies that IAM is wired and the resource exists without leaking item data), so app /readyz handlers can call it without hitting AccessDenied.

Resolved name: {env}-use1-{app}-{key}.

Emitted env var: DYNAMODB_TABLE_{KEY} → e.g. DYNAMODB_TABLE_SESSIONS containing prod-use1-my-app-sessions.

EventBridge (resources.eventbridge)

eventbridge:
  domain:
    rules:
      order-placed:
        pattern:
          source: ["my-app.orders"]
          detail-type: ["OrderPlaced"]
        description: "Fire on new orders"

Field | Type | Notes
rules | object map | Each rule: pattern (object; an EventBridge event pattern, validated server-side at apply time), description (string).

Resolved bus name: {env}-use1-{app}-{key}.

Emitted env var: EVENTBRIDGE_BUS_{KEY} → e.g. EVENTBRIDGE_BUS_DOMAIN.

Step Functions (resources.stepfunctions)

stepfunctions:
  flow:
    type: STANDARD
    definition: |
      {
        "StartAt": "Hello",
        "States": { "Hello": { "Type": "Pass", "End": true } }
      }
    log_level: ALL
    log_retention_days: 30

Field | Type | Notes
type | enum | STANDARD (default) or EXPRESS.
definition | string | Required. ASL JSON definition.
log_level | enum | ALL / ERROR / FATAL / OFF.
log_retention_days | int | CloudWatch log retention.

Key cap: 16 chars (enforced at the provisioning layer — IAM role name {prefix}-sfn-{key}-role hits AWS's 64-char ceiling; not enforced at schema parse).

Resolved name: {env}-use1-{app}-{key}.

Emitted env var: SFN_ARN_{KEY} → e.g. SFN_ARN_FLOW.

Bedrock (resources.bedrock)

bedrock:
  model_ids:
    - us.anthropic.claude-sonnet-4-20250514-v1:0
    - amazon.titan-embed-text-v2:0

Field | Type | Notes
model_ids | array of strings | Required. Non-empty. AWS Bedrock model IDs. Adds bedrock:InvokeModel to the pod role.
knowledge_bases | bool | Adds Knowledge Base API permissions. Default: false.
guardrails | bool | Adds bedrock:ApplyGuardrail permission. Default: false. The per-app guardrail resource itself is not provisioned yet; this is IAM scope only.

Anthropic models need a region prefix

Bare anthropic.* model IDs are rejected. Use the regional inference profile form: us.anthropic.claude-sonnet-4-20250514-v1:0, not anthropic.claude-sonnet-4-20250514-v1:0.

Model ID validation: Each entry must be a valid Bedrock invocation target — either a versioned foundation-model ID ending in :N (e.g. amazon.titan-embed-text-v2:0) or a cross-region inference profile starting with a region prefix (us. / eu. / apac. / global. / ap.). Anthropic models REQUIRE a regional inference profile prefix — bare anthropic.* IDs are rejected at parse time because AWS Bedrock fails them at invocation with "on-demand throughput isn't supported" (validated 2026-05-09).
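The validation rule above can be approximated in a few lines. This is a sketch of the stated rule only, with a hypothetical function name (`isAcceptedModelId`); the schema's actual pattern may be stricter or differ in detail.

```typescript
// Region prefixes listed in the rule above.
const REGION_PREFIXES = ["us.", "eu.", "apac.", "global.", "ap."];

// Hypothetical check mirroring the documented rule, not the schema's own code.
function isAcceptedModelId(id: string): boolean {
  // Bare Anthropic IDs are rejected even though they end in ":N".
  if (id.startsWith("anthropic.")) return false;
  const hasRegionPrefix = REGION_PREFIXES.some((p) => id.startsWith(p));
  const isVersioned = /:\d+$/.test(id); // versioned foundation-model ID
  return hasRegionPrefix || isVersioned;
}
```

Running a check like this in a pre-commit hook catches a bare `anthropic.*` ID before it reaches schema validation, but the schema remains the authority.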

Common gotcha: the schema accepts model_ids (plural, underscore). The block uses strict-key validation, so any unknown key fails — including the common typos enabled, models, and singular model_id. Presence of the block (non-null) is the opt-in; there's no enabled: true.

Database (resources.database)

The simplest database declaration — just a PostgreSQL database with no extras:

database:
  main: {}

Full example with all optional fields:

database:
  main:
    extensions: [vector, uuid-ossp]
    schemas: [app, audit] # extra schemas owned by the migration role
    connection_limit: 100
    team_grants:
      - team: data
        tier: readonly
    user_grants:
      - user: alice
        tier: admin
    # migrations default ON (managed Liquibase); omit the field entirely to keep it.
    # Override with a custom command, or set `migrations: false` to opt out:
    migrations:
      command: ["npm", "run", "migrate"]
      runs_on: [prev, stg, prod]
    seed: # preview-only fixture seeding (optional)
      command: ["npm", "run", "seed:preview"]

Field | Type | Notes
extensions | array of strings | PostgreSQL extensions to enable. Allowlist: vector (NOT pgvector: the schema rejects pgvector; pgvector is the project name, vector is the extension name), uuid-ossp, pg_trgm, hstore, citext, postgis, btree_gist, btree_gin, unaccent, fuzzystrmatch.
connection_limit | int | Per-role PostgreSQL CONNECTION LIMIT. Default: unlimited at the module.
team_grants | array | Per-DB team grants: { team: <slug>, tier: admin or readonly }. On apply, the team's per-team Permission Set gains rds-db:connect on arn:aws:rds-db:*:*:dbuser:*/aws-{app}-db-{tier}. Recipient gets DB-ONLY AWS access (no S3, queues, secrets-other-than-the-DB-secret, console for any other app resource).
user_grants | array | Per-DB user grants: { user: <google-username>, tier: admin or readonly }. Narrow per-user exception adding rds-db:connect on aws-{app}-db-{tier} for a single individual. user is the LEFT side of @conservice.com (e.g. alice, bob.smith), with no @conservice.com suffix.
group_grants | array | Per-DB Google-group grants: { group: <conservice.com group email>, tier }. For non-team groups (e.g. conservice-finance@conservice.com). Materializes via the per-team Permission Set enumeration.
schemas | array of strings | Additional PostgreSQL schema names to pre-create in this database (default: none; the app uses public). Each name is created by the platform and owned by the migration role, so the managed migration job can create and own objects in it while the runtime service role gets USAGE only (no DDL). Use for frameworks that default to a named schema (e.g. EF Core HasDefaultSchema("app")). The same schemas are pre-created in per-PR (preview) databases. Each entry must match ^[a-z][a-z0-9_]*$, be ≤ 63 chars, and must not be public, information_schema, or any pg_* name; those are reserved/system schemas and are rejected at parse.
migrations | false or object | Managed database migrations. Default ON (the field is optional and defaults to enabled when omitted): forge bakes Liquibase into the app image and runs a migrate Job (a migration-job.yaml ArgoCD Sync hook) in every env overlay before app pods start, so schema is bootstrapped before traffic. Set migrations: false to turn it OFF; forge then skips both the migrate Job and the Liquibase/JRE image layer (use this when the app manages its own schema or has none). Provide an object { command: [string], runs_on?: [env] } to customize. See § Database migrations below.
seed | object | Preview-only fixture-data seed runner. { command: [string], runs_on?: [prev] }. When set, forge emits a seed-job.yaml ArgoCD Sync hook into the preview overlay only, running command as the app's runtime role after the Deployment is healthy. Preview-only by design: stg / prod are rejected in runs_on. See § Preview-only seed below.

Database tenancy: Aurora is a SHARED cluster across all apps in an env. Each database.{key} declaration creates a logical PostgreSQL database inside the shared cluster.

Resolved DB name: {app_underscored} (hyphens become underscores). Service user: {app_underscored}_svc.
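The name derivation above, together with the schemas rule from the field table, can be sketched as follows. The function names (`dbName`, `serviceUser`, `isAllowedSchemaName`) are hypothetical and exist only for illustration; the platform performs the real derivation and validation.

```typescript
// Hyphens in the app name become underscores: "my-app" -> "my_app".
function dbName(app: string): string {
  return app.replace(/-/g, "_");
}

// Service user is the underscored app name plus "_svc".
function serviceUser(app: string): string {
  return `${dbName(app)}_svc`;
}

// Mirrors the documented rule for entries under `schemas`.
function isAllowedSchemaName(name: string): boolean {
  if (!/^[a-z][a-z0-9_]*$/.test(name)) return false; // lowercase identifier
  if (name.length > 63) return false; // PostgreSQL identifier limit quoted above
  if (name === "public" || name === "information_schema") return false;
  return !name.startsWith("pg_"); // pg_* names are reserved
}
```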

engine is NOT a valid key. Always aurora-postgresql (set at the platform layer). Strict-key validation rejects any unknown key — engine is a particularly common one to accidentally include because it's standard in raw RDS Terraform. Don't.

Emitted env vars: DATABASE_HOST, DATABASE_PORT, DATABASE_NAME, DATABASE_USER — your app reads these from the pod environment. IAM auth (rds_iam) is enabled automatically; the platform provisions the PG roles and IAM bindings. You don't create DB users in forge.yaml.

Granting another team DB access: add a team_grants entry. That's the entire dev-facing surface. The platform wires the IAM role, Permission Set, and PG login role automatically.

How DB access works under the hood

Two grant surfaces apply, layering from broad → narrow. Both wire to the same per-app PostgreSQL login role aws-{app}-db-{tier}.

1. Team-keyed full AWS access (configured outside forge.yaml) — the owning team and any team granted DB access get full AWS access via per-team Permission Sets (team-{team}-{env_short}-admin / team-{team}-{env_short}-readonly). AWS access is keyed by team, not by app — there is no aws-{app}-admin/readonly group.

2. Per-DB grants on database.{name} — narrow scope, DB-only. These are the declarations developers write in forge.yaml:

  • tier: admin gives full read/write via login role aws-{app}-db-admin. No access to S3, queues, or other AWS resources.
  • tier: readonly gives SELECT-only access via login role aws-{app}-db-readonly. Same DB-only narrow scope.

Per-DB grants are the right surface for cross-team data access (e.g. team-data needs read-only access to the billing app's databases but should NOT see S3 or queues). Multiple teams sharing a tier share one cluster-level login role; per-team identity gating happens at the Permission Set layer.

Warning: the fields admin_groups, readonly_groups, admin_users, and readonly_users have been removed and are no longer accepted. Use team_grants / user_grants instead.

Database migrations

Managed migrations are ON by default. When you declare a database and omit the migrations field, forge bakes Liquibase (plus a JRE layer) into the app image and emits a migrate Job — migration-job.yaml, an ArgoCD Sync hook — in every env overlay. The Job re-uses the app image, mints an RDS IAM token, and runs the migration against the per-env (or per-PR) database before app pods start. Sync-wave ordering guarantees the Deployment only rolls once the migrate Job completes, so app code never sees an un-migrated schema. The default command for the bundled Liquibase wrapper is ["infra/database/migrations/migrate.sh"] (mints the IAM token, then runs liquibase update). You don't write any of this — declaring the database is enough to get migrations.

The field is a union of three shapes:

migrations: value | Behavior
omitted (default) | Managed migrations ON: forge runs the Liquibase migrate Job and bakes the Liquibase/JRE layer into the image. Nothing to declare.
false | Managed migrations OFF: forge skips both the migrate Job and the Liquibase/JRE image layer. Use when the app manages its own schema, or has none.
{ command, runs_on? } | Custom runner: replace the default command and/or scope which env overlays run it.

When you supply an object:

Field | Type | Notes
command | array of strings | Required, non-empty. Argv array the Job's container runs; each entry is a literal string, no shell expansion. Tool-agnostic: point it at node-pg-migrate, Prisma, Atlas, a psql script, etc. (also update the install layer in your Dockerfile to match).
runs_on | array of enums | Which env overlays emit the migrate Job. Each entry is prev, stg, or prod. Default (omitted): all three. Scope to a subset (e.g. [prev, stg]) to skip prod migrations during a schema-stable window.

# Opt out of managed migrations entirely:
database:
  main:
    migrations: false

# Or supply a custom runner:
database:
  main:
    migrations:
      command: ["npm", "run", "migrate"]
      runs_on: [prev, stg, prod]
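One way to picture the three shapes is as a TypeScript union resolved into a single plan. This is a hypothetical model (the names `Migrations`, `MigrationPlan`, and `resolveMigrations` are not part of forge) that applies the defaults described in this section.

```typescript
type Env = "prev" | "stg" | "prod";

// The three shapes: omitted (undefined), false, or a custom runner object.
type Migrations = false | { command: string[]; runs_on?: Env[] };

interface MigrationPlan {
  enabled: boolean;
  command: string[];
  envs: Env[];
}

// Defaults quoted in this section.
const DEFAULT_COMMAND = ["infra/database/migrations/migrate.sh"];
const ALL_ENVS: Env[] = ["prev", "stg", "prod"];

function resolveMigrations(m?: Migrations): MigrationPlan {
  if (m === false) return { enabled: false, command: [], envs: [] };
  if (m === undefined) {
    return { enabled: true, command: DEFAULT_COMMAND, envs: ALL_ENVS };
  }
  if (m.command.length === 0) {
    throw new Error("migrations.command must be non-empty");
  }
  // runs_on defaults to all three envs when omitted.
  return { enabled: true, command: m.command, envs: m.runs_on ?? ALL_ENVS };
}
```

Modelling the field this way makes the "omitted means ON" default explicit, which is the behavior most likely to surprise someone expecting an opt-in flag.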

Preview-only seed

The optional seed block loads fixture data into preview databases only — fake data must never reach staging or production, so the runner is hard-scoped to prev. When set, forge emits a seed-job.yaml ArgoCD Sync hook into the preview overlay only, running command as the app's runtime service role (fixtures are DML inserts, not DDL — they don't use the migration identity) after the app Deployment is healthy.

Field | Type | Notes
command | array of strings | Required, non-empty. Argv array the seed Job runs (e.g. ["npm", "run", "seed:preview"]). Literal strings, no shell expansion. Make it idempotent (e.g. INSERT ... ON CONFLICT DO NOTHING), because ArgoCD re-syncs re-run it.
runs_on | array of enums | Only prev is accepted; stg / prod are rejected at parse. Optional, defaults to [prev]; the field mostly exists to document intent.

database:
  main:
    seed:
      command: ["npm", "run", "seed:preview"]

Temporal (resources.temporal)

temporal:
  retention_days: 30
  api_key_expiry: "2027-05-03T00:00:00Z"

Field | Type | Notes
retention_days | int | Workflow history retention. Default: 30.
api_key_expiry | string | Required. ISO 8601 timestamp. Pinned at scaffold; runtime re-reads from forge.yaml (deterministic re-render: byte-identical input produces byte-identical output). Rotate by editing this value and re-rendering.

Common gotchas: rejected keys include enabled, namespace, regions, search_attributes, enable_delete_protection. Presence of the block is the opt-in. Namespace is derived from app_name.

Resolved namespace: {app}-{env}.<your-temporal-cloud-namespace>.

Firehose (resources.firehoses)

firehoses:
  webhook-archive:
    destination: s3
    bucket: webhook-events # MUST match a key in resources.s3
    prefix: "events/"
    buffer_size_mb: 5
    buffer_interval_seconds: 300
    compression: GZIP

Field | Type | Notes
destination | string | Required, only s3 today. Redshift / OpenSearch / Splunk are deferred.
bucket | string | Required. Must match a key declared in resources.s3 (cross-validated at parse time).
prefix | string | S3 key prefix for delivered records. Default: "".
buffer_size_mb | int | 1-128. Default: 5.
buffer_interval_seconds | int | 60-900. Default: 300.
compression | enum | UNCOMPRESSED, GZIP (default), SNAPPY, ZIP, HADOOP_SNAPPY.

Key cap: 16 chars (enforced at the provisioning layer — IAM role {prefix}-fh-{key}-role hits 64-char ceiling; not enforced at schema parse).

Resolved name: {env}-use1-{app}-{key}.

Emitted env var: FIREHOSE_STREAM_{KEY} → e.g. FIREHOSE_STREAM_WEBHOOK_ARCHIVE.

KMS (resources.kms)

Per-app customer-managed KMS keys (CMKs) for app-initiated envelope encryption — e.g., an auth service that envelope-encrypts upstream IdP tokens before writing them to DynamoDB. Distinct from the AWS-managed SSE-KMS that already covers DDB and S3 at rest (alias/aws/dynamodb / alias/aws/s3); that's transparent to the app. The case here is app code calling kms:Encrypt / Decrypt / GenerateDataKey directly against a CMK the app controls.

resources:
  kms:
    token-envelope:
      description: "Envelope-encrypt upstream IdP tokens before DDB writes"
      actions:
        - Encrypt
        - Decrypt
        - GenerateDataKey
        - DescribeKey
      rotation: enabled

Key name (the map key) is kebab-case, 2-20 chars, must start with a letter and end alphanumeric. Same reserved-prefix list as app_name. Pick a name that describes the encryption purpose (token-envelope, secrets), not the resource type.

Field | Type | Notes
description | string | Optional human-readable description (shown in the KMS console). When rotation: disabled is set, the description should mention the compliance reason, which is auditor-friendly. Max 8192 chars.
key_spec | enum | SYMMETRIC_DEFAULT (default; only value accepted today). Asymmetric specs (RSA_*, ECC_*) are deferred: they need a different action allowlist (Sign/Verify/GetPublicKey) and will arrive in a future release.
actions | array of strings | Required, non-empty. Data-plane KMS actions to grant to the pod role on this key. Allowlist: Encrypt, Decrypt, GenerateDataKey, GenerateDataKeyWithoutPlaintext, ReEncryptFrom, ReEncryptTo, DescribeKey. Unknown verbs fail at parse time. Key administration verbs (CreateKey, ScheduleKeyDeletion, PutKeyPolicy, ...) are deliberately omitted: key lifecycle is managed by the platform, not the app.
rotation | enum | enabled (default; annual KMS rotation) or disabled. Set disabled only with a compliance reason in description.
tags | object (string→string) | Optional pass-through tags applied to the KMS key. Standard k/v string map; no platform-side validation beyond the type. Use for cost-allocation (cost_center: ...) or compliance markers.
access.team_grants | array | Per-key team grants (accepted at the schema level; platform-side consumption deferred to a follow-on release; declaring entries today has no runtime effect).

Resolved name: {env}-{region_code}-{app}-{key_name}-key (e.g., prod-use1-auth-service-token-envelope-key). Resolved alias: alias/{env}-{region_code}-{app}-{key_name}-key.

Auto-emitted env vars (KMS_KEY_ is a reserved env-var prefix):

  • KMS_KEY_{KEY_NAME_UPPER_SNAKE}_ID (e.g. KMS_KEY_TOKEN_ENVELOPE_ID)
  • KMS_KEY_{KEY_NAME_UPPER_SNAKE}_ARN

App code reads these from the pod environment — never construct a KMS key ID or alias in app code.
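The key-name rule and the derived names in this section can be sketched as below. The helper names are hypothetical, the regular expression is an approximation of "kebab-case, starts with a letter, ends alphanumeric", and the reserved-prefix check is omitted because that list is defined elsewhere in the schema.

```typescript
// Approximate check for the KMS map key: kebab-case, 2-20 chars,
// starts with a letter, ends with a letter or digit.
function isValidKmsKeyName(name: string): boolean {
  if (name.length < 2 || name.length > 20) return false;
  return /^[a-z][a-z0-9-]*[a-z0-9]$/.test(name);
}

// KMS_KEY_{KEY_NAME_UPPER_SNAKE}_ID and _ARN.
function kmsEnvVars(name: string): { id: string; arn: string } {
  const upper = name.toUpperCase().replace(/-/g, "_");
  return { id: `KMS_KEY_${upper}_ID`, arn: `KMS_KEY_${upper}_ARN` };
}

// {env}-{region_code}-{app}-{key_name}-key
function kmsResolvedName(env: string, regionCode: string, app: string, name: string): string {
  return `${env}-${regionCode}-${app}-${name}-key`;
}
```

These helpers are useful for tests and tooling; application code should still read the emitted env vars rather than rebuild the names.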


Cross-app access (per-resource policy + consumes)

Cross-app resource access is a two-sided contract: producers declare who is allowed to consume each resource (the per-resource access / allowed_teams / allowed_apps / tags fields documented above); consumers declare which producer resources and services they want to use, via the top-level consumes: block.

# Consumer side: my-app declares it wants to read rates' embeddings cache
consumes:
  resources:
    - producer_app: rates
      resource_kind: s3
      resource_key: embeddings-cache
      actions: [read]
  services:
    - producer_app: rates
      service_name: api

Field | Type | Notes
consumes.resources | array | Cross-app resource declarations. Each entry: { producer_app, resource_kind, resource_key, actions: [string] }. The resolver matches each entry against the producer's resources.{kind}.{key} and the producer's access / allowed_teams / allowed_apps policy at scaffold/modify time.
consumes.services | array | Cross-app service declarations. Each entry: { producer_app, service_name }. Data-only today: no Istio AuthorizationPolicy is emitted yet (deferred to a future release). Records intent so future wiring lands without a schema migration.

producer_app follows the same kebab-case rules as app_name (reserved prefixes rejected). resource_kind is one of: s3, sqs, sns, dynamodb, eventbridge, stepfunctions, firehoses, database.

actions: is high-level, not raw IAM verbs — entries like read, write, consume, produce, admin. The action-expansion utility maps each high-level action to per-kind IAM actions at emit time; unknown high-level actions for a kind throw at scaffold/modify time (NOT at schema parse — the schema treats actions as opaque strings to keep the kind/action coupling in one place).
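To make the expansion step concrete, here is a minimal sketch of a per-kind lookup that throws on an unknown action. Everything in it is an assumption for illustration: the `EXPANSION` table, the function name `expandActions`, and the specific IAM actions chosen for each high-level action are not taken from the platform's action-expansion utility, whose real mapping is not documented here.

```typescript
// Only two kinds are modelled to keep the sketch short.
type Kind = "s3" | "sqs";

// ASSUMED mapping for illustration; the platform's real table may differ.
const EXPANSION: Record<Kind, Record<string, string[]>> = {
  s3: {
    read: ["s3:GetObject", "s3:ListBucket"],
    write: ["s3:PutObject"],
  },
  sqs: {
    consume: ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
    produce: ["sqs:SendMessage"],
  },
};

function expandActions(kind: Kind, actions: string[]): string[] {
  const out: string[] = [];
  for (const action of actions) {
    const iam = EXPANSION[kind][action];
    if (iam === undefined) {
      // Mirrors the documented behavior: unknown actions for a kind throw.
      throw new Error(`unknown action "${action}" for kind "${kind}"`);
    }
    out.push(...iam);
  }
  return out;
}
```

Keeping the kind/action coupling in one table is the design choice the text describes: the schema can treat `actions` as opaque strings, and a mismatch such as `consume` on an S3 bucket surfaces only when the table is consulted.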

Same-account only today. Cross-account consumes raise cross_account_not_supported at resolve time rather than at parse time.

Preview-environment plumbing: when the consumer is itself running in a per-PR preview environment, the resolver injects the consumer's pr_number into non-S3 ARNs to disambiguate per-PR resources (S3 is per-app, not per-PR, so its ARN doesn't carry the segment).


Authentication (auth)

The auth field controls who can reach the app at the identity layer — whether the app requires a logged-in @conservice.com user. It pairs with authz: auth is authentication (WHO can reach the app), authz is AVP Cedar authorization (WHO is permitted, and with which roles, once authenticated). See How auth and authz fit together below.

auth is a union — either the literal string "none" or an auth: block.

auth: "none" — public, no authentication

auth: "none"

A no-auth app. Forge skips tier-group creation, Auth Service registration, and AVP setup. The app is served via a private ALB (not the Istio gateway) — there is no authentication enforcement. No-auth internal apps are VPN-only; no-auth public apps are not yet supported.

Two rules apply at validation when auth: "none" is set:

  • authz: cannot be set. A public app has no authenticated principal, so AVP grants are meaningless.
  • Routing is platform-derived. A no-auth app is automatically kept OFF the authenticated (ext_authz-enforcing) routing tier and served via a private ALB — there is nothing to configure. (A legacy gateway: key is accepted for back-compat and stripped; never set it.)

auth: block — authenticated

auth:
  access_mode: restricted
  strict: false
  hidden: false

Field | Type | Required | Notes
access_mode | enum | no | "restricted" (default) allows only users granted access via AVP Cedar policies (your team is seeded at scaffold; everything after that is managed in auth-portal or via forge's grant tools). "all" allows any authenticated @conservice.com user; authentication is still enforced, per-user authorization is not.
strict | boolean | no | Default false. When true, the Auth admin UI cannot grant access beyond what's declared; forge.yaml is the sole authority. Recommended for prod-critical or Aurora-touching apps.
hidden | boolean | no | Default false. Hides the app's tile from the Portal dashboard. The app stays reachable at its hostname; it just doesn't appear in the user's tile grid.

Migrating from auth.tiers? The per-app tier-group model (tiers:, self_register:, per-service auth.tier) has been retired — those fields now fail validation. Role and access management moved to authz grants (AVP Cedar): declare authz.initial_grants for scaffold-time seeds and manage everything else in auth-portal. Your app reads the caller's identity and roles from the x-verified-* headers.

How auth and authz fit together

These are two distinct layers, both enforced at the Istio gateway before the request reaches your app:

  • auth — authentication. Validates the user's Google session. Answers WHO can reach the app at the identity layer.
  • authz — AVP (AWS Verified Permissions) Cedar authorization. Checks whether the authenticated user holds a grant for this app. Answers WHO is permitted once authentication succeeds. Fail-closed: no grant means 403.

Your app reads the authenticated caller's identity and roles from the injected x-verified-* headers (see Reading user identity and roles in your app) — role logic lives in your code against those headers, not in per-service config.

MCP servers (and any caller that can't follow a browser login redirect) use the per-service auth.kind: bearer field instead of the cookie front door. See § Authenticating an MCP server.


Authorization (authz)

All authentication and authorization for apps on the greenfield platform flows through the Istio gateway:

  1. Authentication — validates the user's Google session. Any @conservice.com employee is authenticated. No app code needed.
  2. Authorization — checks if the authenticated user has a grant for this app. Fail-closed: no grant = 403.

You don't write auth code. The platform handles both layers at the gateway before the request reaches your app. If your app needs to know who the user is or what roles they have, read the x-verified-* headers from the request.

How it works

User → Google login → Authorization check → Your app

ALLOW → request forwarded with x-verified-* headers (identity + roles)
DENY → 403 Forbidden (user never reaches your app)

forge.yaml setup

The simplest setup — your team automatically gets access:

services:
  - name: api
    port: 8080
    expose: internal
    # gateway auto-derived: istio (authenticated apps get ext_authz enforcement)
    health_path: /health
dns:
  zone: conservice.ai
authz: {}

That's it. At scaffold time, forge automatically creates a group-based grant for Group::"team-{team}@conservice.com" on Application::"{app_name}" — the Admin role in preview, the Access role in staging/production. Every member of your team can access the app immediately — no individual grants needed.

With additional individual grants (optional — for people outside your team):

authz:
  initial_grants:
    - alice@conservice.com
    - bob@conservice.com

Field | Type | Required | Notes
initial_grants | array or per-env map | no | Default: []. Grants seeded at scaffold time, beyond the automatic team grant. Two accepted shapes, described below.

initial_grants accepts either of two shapes:

  1. Legacy flat list — an array of @conservice.com emails. Each listed user is granted the access role in every env (prev / stg / prod).

    authz:
      initial_grants:
        - alice@conservice.com
        - bob@conservice.com

  2. Per-env, per-role map — keys are env names (prev / stg / prod), each mapping a role name to a list of principals. A principal is either a team slug in team-<slug> form or a @conservice.com email.

    authz:
      initial_grants:
        prev:
          admin: [team-ai]
          access: [alice@conservice.com]
        stg:
          access: [team-ai]
        prod:
          access: [team-ai]

    Every role key used in the map must be a built-in (access / admin) or a name declared under authz.roles[] — an undeclared-role grant is rejected at parse (it would otherwise be silently dropped and the principal would never get access). Each env is optional; omit envs you don't want to seed.
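The two shapes can be normalized into one. The sketch below is a hypothetical illustration (the names `GrantMap`, `normalizeGrants`, and `undeclaredRoles` are not forge APIs): it expands the flat list into the access role for every env, as described above, and lists role keys that are neither built-in nor declared under authz.roles[].

```typescript
type Env = "prev" | "stg" | "prod";
type GrantMap = Partial<Record<Env, Record<string, string[]>>>;
type InitialGrants = string[] | GrantMap;

const ENVS: Env[] = ["prev", "stg", "prod"];

// Flat list -> access role in every env; map shape passes through unchanged.
function normalizeGrants(g: InitialGrants): GrantMap {
  if (!Array.isArray(g)) return g;
  const out: GrantMap = {};
  for (const env of ENVS) out[env] = { access: [...g] };
  return out;
}

// Role keys that would be rejected at parse under the documented rule.
function undeclaredRoles(g: GrantMap, customRoles: string[]): string[] {
  const known = ["access", "admin", ...customRoles];
  const bad: string[] = [];
  for (const env of ENVS) {
    for (const role of Object.keys(g[env] ?? {})) {
      if (known.indexOf(role) === -1 && bad.indexOf(role) === -1) bad.push(role);
    }
  }
  return bad;
}
```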

Key points:

  • Auth is automatic. Authenticated apps (the default) auto-derive to the Istio gateway, which runs oauth2-proxy (authentication) + AVP validator (authorization) via ext_authz. Users must log in with Google and have an AVP grant to access the app. No gateway: field needed.
  • Your team always has access. Forge creates a team-group grant automatically at scaffold time. You don't need initial_grants for your own team members.
  • initial_grants is scaffold-time only. Entries are seeded once for people outside your team. After that, auth-portal manages all grants (create, revoke, access requests).
  • No-auth apps use auth: "none". This auto-derives off Istio to a private ALB (no authentication). No-auth internal apps are VPN-only; no-auth public apps are not yet supported.

How authorization works

The platform handles authorization automatically. You declare what you need in forge.yaml; the platform creates the authorization policies and enforces them at the gateway.

Roles

Every app starts with two built-in roles:

Role | Description
Access | Can use the application
Admin | Can manage grants and settings for the app via auth-portal or forge's grant tools

Your team automatically gets the Admin role in preview and the Access role in staging/production at scaffold time. Use auth-portal to grant roles to additional people or groups.

Custom roles

Apps can declare custom roles beyond the built-in Access and Admin:

authz:
  roles:
    - name: editor
      description: "Can edit content but not manage access"
    - name: viewer
      description: "Read-only access to dashboards"

Custom roles appear automatically in auth-portal's grant management UI. Admins can assign them to individuals or groups — no code changes needed.

Reserved role names. The names access, admin, and ping are reserved platform actions and cannot be used as a custom authz.roles[].name — declaring one is rejected at parse. access is the default individually-grantable role; admin is group-scoped (used for auth-portal admin gating) and is not individually grantable; ping is a platform health action. To give a single user elevated access, declare a custom role (e.g. editor) and grant that.

Roles are additive — add a new role to authz.roles and it's available in auth-portal immediately. No re-scaffold needed.

Reading user identity and roles in your app

The gateway sets these headers on every authenticated + authorized request:

Header | Value | Example
x-verified-email | User's email | alice@conservice.com
x-verified-name | Display name (may be empty) | Alice Smith
x-verified-groups | Comma-separated group memberships | team-sre@conservice.com
x-verified-roles | Comma-separated roles the user has on this app | access,editor

Example — display user info (TypeScript/Express):

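import express from "express";

const app = express();

// Echo the gateway-verified identity back to the caller.
app.get("/api/me", (req, res) => {
  res.json({
    email: req.get("x-verified-email"),
    name: req.get("x-verified-name"),
    // req.get() returns string | undefined, so split() type-checks.
    roles: req.get("x-verified-roles")?.split(",") ?? [],
  });
});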

Example — gate a feature by role:

app.put("/api/content/:id", (req, res) => {
  // Roles arrive as a comma-separated list set by the gateway.
  const roles = req.get("x-verified-roles")?.split(",") ?? [];
  if (!roles.includes("editor")) {
    res.status(403).json({ error: "Editor role required" });
    return;
  }
  // ... update content
});

That's it. No auth libraries, no JWT verification, no session management. The platform handles authentication, authorization, and role resolution before the request reaches your app.
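When several handlers need these values, the parsing can live in one framework-agnostic helper. This is a minimal sketch with hypothetical names (`VerifiedCaller`, `parseVerifiedHeaders`); it assumes header names arrive lower-cased, as they do on Node's incoming request headers, and it trims whitespace around list entries as a defensive choice that the platform does not require.

```typescript
type HeaderValue = string | string[] | undefined;

interface VerifiedCaller {
  email: string;
  name: string; // may be empty, per the header table
  groups: string[];
  roles: string[];
}

// Take the first value if a header was repeated; empty string if missing.
function firstValue(v: HeaderValue): string {
  if (Array.isArray(v)) return v[0] ?? "";
  return v ?? "";
}

// Split a comma-separated header into trimmed, non-empty entries.
function splitList(v: HeaderValue): string[] {
  return firstValue(v)
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function parseVerifiedHeaders(headers: Record<string, HeaderValue>): VerifiedCaller {
  return {
    email: firstValue(headers["x-verified-email"]),
    name: firstValue(headers["x-verified-name"]),
    groups: splitList(headers["x-verified-groups"]),
    roles: splitList(headers["x-verified-roles"]),
  };
}
```

In an Express handler this would be called as `parseVerifiedHeaders(req.headers)`, after which a role check is a plain `caller.roles.includes("editor")`.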

Managing grants after scaffold

Use auth-portal — or forge's grant tools (forge_grant_role / forge_revoke_role / forge_list_grants) — to:

  • View grants — see who has access to each app and their role (Access / Admin)
  • Create grants — grant an individual or group access to an app
  • Revoke grants — remove access
  • Request access — employees can request access; admins approve (auth-portal only)

Who can manage grants. Grants are authored by app admins (anyone holding the Admin role on the app in the target environment) or platform admins — the same rule in auth-portal and the forge grant tools. Team membership alone grants access to the app, not grant-authoring: your team is seeded Admin in preview (self-service — manage your own preview grants) but Access-only in staging/production, so creating or revoking staging/production grants requires an app admin or a platform admin. Viewing an app's grant list has the same bar as authoring. If a grant call is denied, you don't hold Admin on the app in that environment — ask an app admin or a platform admin, or file an access request in auth-portal.

Per-environment grants

Grants are per-environment: a grant in staging doesn't automatically exist in production. The team-group grant and any initial_grants are created in all environments at scaffold time, so your team has access everywhere from day one. Additional grants created via auth-portal are environment-scoped, so a grant added in staging must be added separately in production if it's needed there.


Authenticating an MCP server

If you're building an MCP server (a tool surface that Claude, an IDE, or an agent connects to), declare one field and the platform handles auth. You write zero auth code — no OAuth flow, no JWT verification, no token store. Same contract as a web app, different front door.

The web front door (auth / authz) is browser-shaped: oauth2-proxy sets a session cookie and bounces the user through a Google login redirect. An MCP client is a programmatic HTTP client with no cookie jar and nowhere to follow a redirect. So MCP servers authenticate over the OAuth 2.1 bearer path instead, per the platform's MCP auth model. The mechanics differ, but what reaches your handler is identical: the same x-verified-* headers the cookie path injects.

What to declare

Set auth.kind: bearer on the MCP service. That's the entire dev-facing surface.

forge_version: 3.0.0
app_name: google-sheets-mcp
team: ai
language: typescript
dns:
  zone: conservice.ai
  required: true
services:
  - name: mcp
    port: 8080
    expose: internal # the bearer gateway is the public entry; the service itself stays off the public edge
    health_path: /health
    auth:
      kind: bearer # OAuth 2.1 bearer path, no auth code in your app
authz:
  initial_grants:
    - alice@conservice.com

kind: bearer flags the service as one whose callers can't follow OIDC redirects. Forge routes it through the dedicated bearer gateway instead of the cookie chain. Everything else in your forge.yaml (resources, DNS, monitors) works the same as any other app.

What forge emits

Declaring auth.kind: bearer drives the scaffold to wire the full OAuth 2.1 resource-server contract on your behalf:

  • An RFC 9728 discovery document at /.well-known/oauth-protected-resource, the file a compliant MCP client fetches to find the authorization server.
  • A route through the dedicated bearer gateway to avp-validator Path B (JWT verification), kept off the human-web cookie chain. The gateway strips any client-supplied x-auth-request-* and x-verified-* headers so a caller can't spoof an identity past the validator.
  • A per-app Cedar authorization policy in AWS Verified Permissions (AVP), the same Application::"{app}" resource model the web authz block uses. Your team is seeded automatically at scaffold time, exactly like a web app (Admin in preview, Access in staging and production).

How a client authenticates

You don't implement any of this — it's what happens at the gateway when an MCP client connects. Knowing the shape helps when you read a 401 or wonder where the identity header comes from:

  1. The client calls your server with no token. avp-validator returns 401 with WWW-Authenticate: Bearer resource_metadata="https://<your-host>/.well-known/oauth-protected-resource".
  2. The client fetches that discovery document and finds the central platform authorization server.
  3. The client does Dynamic Client Registration plus a PKCE authorization-code flow against the platform AS. It receives a token scoped to your server's audience, carrying the caller's Google-group claims.
  4. The client retries with Authorization: Bearer <token>.
  5. avp-validator verifies the JWT signature and audience, evaluates your app's Cedar policy, and on success injects the identity headers before the request reaches your handler. On failure it returns 401 with the WWW-Authenticate challenge again.
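Step 1 of the flow above hinges on the client reading the discovery URL out of the 401 challenge. A minimal sketch of that extraction follows, for debugging purposes only; the function name and the example hostname are hypothetical, and a production client would use a complete challenge parser rather than a single regular expression.

```typescript
// Sketch only: pulls the resource_metadata URL out of a Bearer challenge.
// Returns undefined for non-Bearer challenges or when the parameter is absent.
function parseResourceMetadata(wwwAuthenticate: string): string | undefined {
  if (!/^\s*Bearer\b/i.test(wwwAuthenticate)) return undefined;
  const match = /resource_metadata="([^"]+)"/i.exec(wwwAuthenticate);
  return match ? match[1] : undefined;
}

// Hypothetical host, shaped like the challenge in step 1.
const challenge =
  'Bearer resource_metadata="https://mcp.example.internal/.well-known/oauth-protected-resource"';
const metadataUrl = parseResourceMetadata(challenge);
console.log(metadataUrl);
```

If a client reports a 401 loop, checking that this URL resolves from the client's network is a reasonable first step.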

Tokens are deliberately short-lived: the access token expires after 15 minutes, and the client holds a 30-day refresh token to renew it silently — a compliant MCP client refreshes before expiry, so your app never handles token renewal.

Caveat — brand-new bearer MCP previews and token exchange

Before a recent platform fix, a brand-new auth.kind: bearer service exercised in a per-PR preview could fail step 3 with invalid_target ("resource does not name a registered MCP server"). The preview ingress raised the bearer wall from your PR branch, but the authorization server's registration was emitted only from main, so the AS refused to mint tokens for the preview hostname until the auth.kind: bearer declaration merged. The fix, which makes the two emits symmetric, has landed; previews hydrated before it deployed may need a fresh hydrate (push a commit to the PR) to pick it up. Steady state (stg/prod, and previews of services whose bearer declaration is already on main) is unaffected. If you hit invalid_target on a new bearer MCP preview, this gap is the cause, not your client.

What your handler reads

On an authorized request, your handler reads the same x-verified-* headers documented in Reading user identity and roles in your app — the bearer path and the cookie path converge on the identical header contract:

| Header | Value |
| --- | --- |
| x-verified-email | Authenticated caller's email |
| x-verified-groups | Comma-separated Google-group memberships |
| x-verified-roles | Comma-separated roles the caller holds on this app |

(This is the subset most MCP handlers need — the full header set, including x-verified-name, is in the linked canonical table.)

Read the header. Write tool logic. Don't verify the JWT yourself — the platform already did, and re-verifying in-app re-introduces exactly the per-app auth code this path eliminates.
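As an illustration of "read the header", the sketch below turns the three headers into a typed identity object. The helper names and the choice to return undefined when x-verified-email is absent are assumptions made for this example, not platform behavior.

```typescript
type HeaderValue = string | string[] | undefined;

interface VerifiedIdentity {
  email: string;
  groups: string[];
  roles: string[];
}

// Node's header type allows string[]; this sketch takes the first value.
function firstValue(value: HeaderValue): string {
  return (Array.isArray(value) ? value[0] : value) ?? "";
}

// Splits a comma-separated header into trimmed, non-empty entries.
function splitList(value: HeaderValue): string[] {
  return firstValue(value)
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Assumption: a request without x-verified-email carries no identity.
function readVerifiedIdentity(
  headers: Record<string, HeaderValue>,
): VerifiedIdentity | undefined {
  const email = firstValue(headers["x-verified-email"]);
  if (email === "") return undefined;
  return {
    email,
    groups: splitList(headers["x-verified-groups"]),
    roles: splitList(headers["x-verified-roles"]),
  };
}

const identity = readVerifiedIdentity({
  "x-verified-email": "alice@conservice.com",
  "x-verified-groups": "eng, ai",
  "x-verified-roles": "editor",
});
console.log(identity);
```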

What you must NOT do

  • Don't put an MCP server behind auth: "none". That serves it off a private ALB with no authentication. An unauthenticated tool surface is exactly the failure mode the platform's MCP auth model exists to close.
  • Don't try to use the cookie front door for MCP. A programmatic client can't complete the Google login redirect — it gets a 302 it can't follow. auth.kind: bearer is the only supported MCP auth path.
  • Don't verify the bearer token in your app. avp-validator does it at the gateway. Read x-verified-* and trust it.
  • Don't hold MCP session state in a per-pod in-memory map. It split-brains at replicas > 1, which forces the service down to a single replica. Use a shared store or a stateless transport.

forge's own MCP is different

forge's own /mcp runs a standalone authorization server as platform infrastructure — that's not the pattern here. A dev scaffolding a new workload MCP server (google-chat-mcp, google-sheets-mcp, and the like) uses auth.kind: bearer and gets the central platform AS described above. You never stand up your own authorization server.

Machine / agent callers

The flow above is the human-in-the-loop path: a person's Claude or IDE connecting on their behalf, resolving to their Google identity. Laptop agents (dev Claude Code, agent cockpits) are human-attended and use this same path — they authenticate as the dev, with no machine credentials. A pure machine caller — an EKS workload calling an MCP with no human in the loop — does not use client credentials or any stored key: the pod proves its identity with its existing AWS Pod Identity (no stored credential), and the platform mints a short-lived bearer scoped to the target MCP's audience, authorized as a machine principal for the pod's team. Same-team machine access is seeded automatically at scaffold; there is no forge.yaml field to set — cross-team machine access needs a platform-team grant.


GitHub access (github)

Your app can read (and, in one lane, write) GitHub at runtime without holding any GitHub credential. You declare the capability in forge.yaml; at runtime your pod mints a short-lived, narrowly-scoped GitHub App installation token from forge — authenticated purely by its Pod Identity AWS role. There is no PAT, no App private key, and nothing in a secret to rotate.

The declaration is a top-level github field; this section is the how-to for using the access once you've declared it — the mint call, what comes back, and how to consume it.

Declaring access

The github block is a top-level, opt-in field. Every sub-field is optional; omit the block entirely for no GitHub access.

github:
  read: true # READ your own team's forge-created repos
  # write: [issues] # ONE write lane (see below), exactly one element
  # projects: read # org GitHub Projects: read is self-serve; write is SRE-gated

| Field | Type | Notes |
| --- | --- | --- |
| read | boolean | true grants a read capability over your own team's forge-created repos. Breadth + permission scope are server-side and platform-managed; you can't name a target. |
| write | array (exactly 1) | [code] or [issues]: exactly one lane per app (.min(1).max(1); write: [] is a loud parse error, omit the field to grant nothing). code = contents + pull-requests, issues = issues only. Targets/breadth/permissions are server-derived from the verified caller, never from this file. |
| projects | enum | read = org-wide Projects-V2 read (self-serve; read can't mutate a board). write = org-wide board write (records intent only; activates only on an explicit SRE grant). This is a sibling of the write lanes above, not one of them. |

write includes read for its own repos — GitHub permissions are leveled, so a write token can also read. metadata:read is always included. To grant no write, omit write (an empty write: [] is rejected).

Using a read token

Declaring github: { read: true } provisions a mint endpoint your pod calls at runtime. Your pod authenticates with a presigned sts:GetCallerIdentity request (the standard AWS SigV4 mechanism its Pod-Identity role already gives it) — forge replays it, reads the caller ARN from the STS response, maps it to your team, and mints a token clamped to your team's repos. You never send a secret and you never name a repo.

# From inside your pod. The mint base is https://forge.conservice.ai
# (internal mesh — not on the public edge). The body carries a presigned
# GetCallerIdentity blob (SigV4). IMPORTANT: the presign must include AND
# SIGN the header `x-forge-mint-serverid: forge-github-read` — a default
# STS presign does not, and without it the mint returns 401. Full envelope
# recipe + copy-paste signers: ask the platform team for the forge mint-APIs guide.
curl -sS -X POST "https://forge.conservice.ai/ci/mint-github-read-token" \
  -H 'content-type: application/json' \
  -d "$PRESIGNED_STS_IDENTITY_JSON"

The response is the token and its expiry:

{ "token": "ghs_xxxxxxxxxxxxxxxxxxxx", "expires_at": "2026-07-02T18:30:00Z" }
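A pod that calls the mint endpoint from TypeScript may want to validate the response before using it. The sketch below checks the documented { token, expires_at } shape; the function and type names are hypothetical, and the error messages are this example's own, not forge's.

```typescript
interface MintedToken {
  token: string;
  expiresAt: Date;
}

// Validates the { token, expires_at } body and parses the expiry timestamp.
function parseMintResponse(body: unknown): MintedToken {
  if (typeof body !== "object" || body === null) {
    throw new Error("mint response is not a JSON object");
  }
  const { token, expires_at } = body as { token?: unknown; expires_at?: unknown };
  if (typeof token !== "string" || token === "") {
    throw new Error("mint response has no token");
  }
  if (typeof expires_at !== "string") {
    throw new Error("mint response has no expires_at");
  }
  const expiresAt = new Date(expires_at);
  if (Number.isNaN(expiresAt.getTime())) {
    throw new Error("expires_at is not a valid timestamp");
  }
  return { token, expiresAt };
}

const minted = parseMintResponse(
  JSON.parse('{ "token": "ghs_xxxxxxxxxxxxxxxxxxxx", "expires_at": "2026-07-02T18:30:00Z" }'),
);
console.log(minted.expiresAt.toISOString());
```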

Consume it like any GitHub installation token — set it as the bearer for the REST API, or as the credential for a clone:

# Enumerate exactly what this token can reach:
curl -sS -H "Authorization: token $TOKEN" \
  https://api.github.com/installation/repositories

# Clone one of them:
git clone "https://x-access-token:$TOKEN@github.com/conservice-ai/<your-team-repo>.git"

import { Octokit } from "@octokit/rest";
const octokit = new Octokit({ auth: token }); // token from the mint call
const { data } = await octokit.request("GET /installation/repositories");
// data.repositories == the repos this token can read (your team's forge repos)

GET /installation/repositories is the authoritative "what can I reach" call — the token is clamped server-side, so this is how you discover its actual reach rather than guessing. The read token carries contents, metadata, pull_requests, checks, and statuses — all read-only — which covers cloning and PR/CI-status polling (e.g. reading a PR or its check-runs).

Using a write token

Same authentication (the presigned sts:GetCallerIdentity request), a different endpoint:

curl -sS -X POST "https://forge.conservice.ai/ci/mint-github-write-token" \
  -H 'content-type: application/json' \
  -d "$PRESIGNED_STS_IDENTITY_JSON"

The response shape is identical — { token, expires_at } — and the token is scoped to the single lane you declared, clamped to your own team's repos:

  • issues — mints issues:write only. Open, comment on, and label issues in your team's repos. Live and self-serve today.
  • code — mints contents:write + pull_requests:write. Push branches and open PRs; merge stays human-gated (branch protection on the target repos — the platform never auto-merges app-token PRs). Live and self-serve — the platform GitHub App's permission ceiling now includes pull_requests:write, closing the earlier mint-422 gap.

The request body may also carry an optional lane field ("code" | "issues" | "projects_read"). Omitted, it defaults to your single declared lane; if your app holds more than one lane (e.g. write: [issues] plus projects: read), the field is required — omitting it returns a 400 ambiguous — name one via the "lane" field. Note that projects: read tokens mint from this same write endpoint, with lane: "projects_read".

Consume the token exactly like the read token (Authorization: token <minted>, or as the git credential). Example — open an issue with octokit:

import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: token }); // token from mint-github-write-token
await octokit.rest.issues.create({
  owner: "conservice-ai",
  repo: "<your-team-repo>",
  title: "Automated report",
  body: "Filed by the app at runtime.",
});

Scope, lanes, and environments

What read reaches: your own team's forge-created repos — not just the calling app's repo, and not fleet-wide. The clamp is derived server-side from your pod-role ARN → team; you cannot name a target repo, and the restricted infra-* control-plane repos are always excluded. (Fleet-wide read is a separate, SRE-gated cross-fleet tier granted case-by-case — not something a forge.yaml field turns on.)

What write reaches: your own team's repos, team-scoped, for the one declared lane. Merge stays human-gated — branch protection on the target repos requires human review, and the platform never auto-merges app-token PRs.

Lane status at a glance:

| Lane | Self-serve | Status |
| --- | --- | --- |
| read | Yes | Live |
| write: [issues] | Yes | Live |
| write: [code] | Yes | Live |
| projects: read | Yes | Live (org-wide Projects read; can't mutate a board) |
| projects: write | No | SRE-gated |

Environments: both mint endpoints (read and write) admit preview (prev), staging (stg), and production (prod) pods. (Preview access is a deliberate, ratified tradeoff: a preview pod runs pre-merge code and can mint an own-team-scoped token, so review what your PR code does with it like any other change.) A pod whose role ARN doesn't resolve to a known platform env is refused fail-closed.

TTL: minted tokens are short-lived GitHub installation tokens (~1 hour). Mint on demand, cache within the returned expires_at, and re-mint when it nears expiry — never persist a minted token.
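The caching guidance above can be sketched as a small in-memory cache. Everything here is illustrative: the five-minute safety margin, the class, and the injected mint function are assumptions of this example, and the mint function stands in for the real call to the mint endpoint.

```typescript
interface CachedToken {
  token: string;
  expiresAtMs: number; // expires_at converted to epoch milliseconds
}

// Hypothetical safety margin: re-mint this long before the token expires.
const REMINT_SKEW_MS = 5 * 60 * 1000;

// True when there is no cached token or it is within the margin of expiry.
function needsRemint(
  cached: CachedToken | undefined,
  nowMs: number,
  skewMs: number = REMINT_SKEW_MS,
): boolean {
  if (cached === undefined) return true;
  return nowMs >= cached.expiresAtMs - skewMs;
}

class TokenCache {
  // Held in memory only; a minted token is never written to disk.
  private cached: CachedToken | undefined = undefined;
  private readonly mint: () => Promise<CachedToken>;

  constructor(mint: () => Promise<CachedToken>) {
    this.mint = mint;
  }

  async get(nowMs: number = Date.now()): Promise<string> {
    let current = this.cached;
    if (current === undefined || needsRemint(current, nowMs)) {
      current = await this.mint();
      this.cached = current;
    }
    return current.token;
  }
}
```

Callers would ask the cache for a token on every GitHub request; the cache only reaches the mint endpoint when the held token is missing or close to expiry, which also keeps the app clear of the 429 rate limit.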

Common errors

| Response | Meaning | Fix |
| --- | --- | --- |
| 403: github read-mint is available only to preview, staging, and production workloads (or the write analog) | Your pod's role ARN didn't resolve to a known platform env (prev/stg/prod); the env gate is fail-closed on unrecognized identities. | Mint from a platform pod running under its standard Pod-Identity role. |
| 401: server-ID header missing or invalid | Your presigned STS blob didn't include and sign the x-forge-mint-serverid: forge-github-read header; a default STS presign omits it. | Presign with the header in SignedHeaders. The platform team's mint-APIs guide has the full envelope recipe and copy-paste signers. |
| 403: this app has not declared github: { read: true } / has not declared a github: { write: ... } profile | The capability isn't declared in your forge.yaml, or the change hasn't been re-registered. | Add the github block and re-scaffold / re-register the deployment so the capability lands. |
| 403: no repos resolved for caller | The server resolved zero repos for your team, e.g. your team has no forge-created repos yet, so there's nothing to scope to (empty clamps are refused, never widened). | Ensure your team owns at least one forge-created repo before minting. |
| 400: ambiguous (app holds multiple lanes) | You hold more than one lane (e.g. write: [issues] + projects: read) and omitted the lane body field. | Name the lane explicitly in the request body. |
| 422 on the code lane | Should no longer occur: the platform GitHub App's permission ceiling includes pull_requests:write. Seeing it means the ceiling regressed below the lane's requested permissions. | Escalate to the platform team. |
| 403: the 'projects_write' lane is SRE-gated | projects: write is not self-serve. | Request an explicit SRE grant. |
| 429 | Rate-limited; mints are budgeted per app/team. | Cache the token and re-mint near expires_at, not per-request. |

Scheduled services (schedule)

Any service can run on a schedule instead of as an always-on Deployment: declare services[].schedule and forge renders that service as a Kubernetes CronJob. The container's CMD is the job — it runs to completion and exits 0 on success (nonzero = the run failed). Everything else about the service is normal: its own Dockerfile and ECR repo, the same env vars and secrets (envFrom the same ConfigMap/Secret), and the same pod IAM role — a scheduled service reaches the app's S3 buckets, queues, and database exactly like its always-on siblings.

services:
  - name: nightly-sync
    schedule:
      cron: "0 2 * * *" # 02:00 daily, evaluated in `timezone`
      timezone: America/Denver # REQUIRED: no default, wall-clock intent is always explicit
      # concurrency: forbid # default: skip the tick if the previous run is still going
      # active_deadline_seconds: 1800 # default: kill a hung run after 30 min

app_kind: cron does not do this. That top-level field only classifies which Datadog monitor pack the app gets — it schedules nothing. services[].schedule is what creates the CronJob.

Fields

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| cron | string | yes | Standard 5-field cron expression (minute hour day-of-month month day-of-week, e.g. 0 2 * * *) or one of @hourly / @daily / @weekly / @monthly. Lists (1,15), ranges (1-5), and range steps (*/15, 0-59/15) are supported. Rejected by design: seconds/Quartz forms (? L W #), @every, @yearly/@reboot, bare value steps (5/2; write 5-59/2), and day-of-week 7 (use 0 for Sunday). |
| timezone | string | yes | UTC or a Region/City IANA name (America/Denver). No default: the intended wall-clock time is always explicit. Raw offsets (+05:00), Local, and abbreviations (EST) are rejected. Zones with daylight-saving time shift the job's UTC firing time twice a year; use UTC for a DST-free schedule. |
| concurrency | enum | no | What happens when the previous run is still active at the next tick: forbid (default) skips the new run, allow runs them concurrently, replace cancels the running job and starts fresh. |
| active_deadline_seconds | integer | no | Wall-clock kill switch for a hung run. Default 1800 (30 min); range 60–86400. A run exceeding it is terminated and counted as failed. Size it to the job's worst-case healthy runtime plus headroom. |
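To make the cron grammar concrete, here is a rough local pre-check that approximates the accept/reject rules in the table. It is not the schema's validator: the function names are hypothetical, and it assumes numeric fields only (whether month or weekday names are accepted is not stated here).

```typescript
const CRON_MACROS = new Set(["@hourly", "@daily", "@weekly", "@monthly"]);

// [min, max] for minute, hour, day-of-month, month, day-of-week (0 = Sunday).
const FIELD_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0, 59], [0, 23], [1, 31], [1, 12], [0, 6],
];

// One list item: *, N, A-B, */S, or A-B/S. Bare value steps (N/S) are rejected.
function isValidItem(item: string, min: number, max: number): boolean {
  const match = /^(\*|\d+(?:-\d+)?)(?:\/(\d+))?$/.exec(item);
  if (match === null) return false;
  const base = match[1] ?? "";
  const step = match[2];
  if (step !== undefined) {
    if (Number(step) < 1) return false;
    if (base !== "*" && !base.includes("-")) return false; // e.g. 5/2
  }
  if (base === "*") return true;
  const parts = base.split("-").map(Number);
  const lo = parts[0] ?? NaN;
  const hi = parts[1] ?? lo;
  return lo >= min && hi <= max && lo <= hi;
}

function isValidCron(expr: string): boolean {
  const trimmed = expr.trim();
  if (trimmed.startsWith("@")) return CRON_MACROS.has(trimmed);
  const fields = trimmed.split(/\s+/);
  if (fields.length !== 5) return false; // rejects 6-field seconds forms
  return fields.every((field, i) => {
    const [min, max] = FIELD_RANGES[i] ?? [0, -1];
    return field.split(",").every((item) => isValidItem(item, min, max));
  });
}

console.log(isValidCron("0 2 * * *"), isValidCron("5/2 * * * *"));
```

The final check belongs to forge's parser; a sketch like this is only useful for catching the common rejected forms before a push.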

What a scheduled service cannot declare

A scheduled service has no long-running or routed shape — each of these is a loud parse error on it: port, expose, health_path, dns, scaling, replicas.
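A quick way to see which keys would trip that parse error is to list the forbidden ones present on a service that also declares schedule. The sketch below does only that; the function name is hypothetical and the input is assumed to be an already-parsed service object.

```typescript
const FORBIDDEN_ON_SCHEDULED = [
  "port", "expose", "health_path", "dns", "scaling", "replicas",
] as const;

// Returns the forbidden keys found on a scheduled service, or [] otherwise.
function forbiddenScheduledKeys(service: Record<string, unknown>): string[] {
  if (!("schedule" in service)) return []; // always-on services are unaffected
  return FORBIDDEN_ON_SCHEDULED.filter((key) => key in service);
}

const offending = forbiddenScheduledKeys({
  name: "nightly-sync",
  schedule: { cron: "0 2 * * *", timezone: "America/Denver" },
  port: 8080,
  health_path: "/health",
});
console.log(offending);
```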

When it fires (and when it deliberately doesn't)

  • Per environment, the schedule is OFF until that env has a promoted image AND a nonzero top-level replicas.{env} count — the same lever that activates Deployments on first promotion. Set replicas.{env}: 0 to pause an env's scheduled runs without touching the config.
  • Per-PR preview environments never fire scheduled runs — the CronJob ships suspended there, by design. Test the job logic by running the container directly; the schedule only ticks in stg/prod.
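The two rules above reduce to a small predicate. This is a sketch of the documented behavior, not forge's implementation; the function and type names are hypothetical, and prev here stands for per-PR preview environments.

```typescript
type ScheduleEnv = "prev" | "stg" | "prod";

// True when the CronJob would tick in the given environment.
function scheduleFires(
  env: ScheduleEnv,
  hasPromotedImage: boolean,
  replicas: number,
): boolean {
  if (env === "prev") return false; // previews ship the CronJob suspended
  return hasPromotedImage && replicas > 0;
}

console.log(scheduleFires("stg", true, 2));
```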

Common errors

| Symptom | Meaning | Fix |
| --- | --- | --- |
| Parse error on schedule.cron | The expression uses a rejected form (seconds field, Quartz token, @every, bare 5/2 step, day-of-week 7). | Rewrite in the standard 5-field grammar (5-59/2, Sunday = 0) or use an @ macro. |
| Parse error on schedule.timezone | Offset/abbreviation/Local given, or the field omitted. | Use UTC or a Region/City IANA name. The field is required. |
| Job never fires in an env | That env has no promoted image yet, or replicas.{env} is 0. | Promote once and set a nonzero replicas.{env}. |
| Job never fires in a preview | Previews always ship the CronJob suspended. | Expected; run the container manually to test job logic. |
| Runs killed at 30 minutes | Default active_deadline_seconds (1800) hit. | Raise it (up to 86400) to worst-case healthy runtime + headroom. |
| A tick was skipped | Previous run still active and concurrency: forbid (default). | Expected under the default; use allow/replace if overlap is safe/desired. |

Environment opt-out (disabled_envs)

Apps deploy to all platform envs by default (prev, stg, prod). Use disabled_envs: to opt an app out of one or more — useful for tooling-only apps that don't need a preview env, or single-env utilities.

disabled_envs: [prev]

| Field | Type | Notes |
| --- | --- | --- |
| disabled_envs | array of enums | Each entry must be prev, stg, or prod. Omit the field entirely (or set []) to deploy to all envs. |

Per-env resource overrides (different SQS retention in prod vs stg, etc.) aren't supported in this schema today. Tune at the resource level instead.

environments: is rejected at parse — remove the block; use disabled_envs: instead. Earlier schema versions accepted a top-level environments: { prev: {...}, stg: {...}, prod: {...} } block carrying per-env account_id. The current schema is strict and has no environments key, so a file that still carries the block fails validation outright. Account IDs are platform constants the renderer supplies, and the enabled-env list is derived from the platform default minus disabled_envs: — there was never anything dev-meaningful in the old block. Delete it and express env opt-out with disabled_envs:.
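The derivation described above (platform default minus disabled_envs) is simple enough to sketch directly. The names below are hypothetical; only the three env identifiers come from this reference.

```typescript
const PLATFORM_ENVS = ["prev", "stg", "prod"] as const;
type PlatformEnv = (typeof PLATFORM_ENVS)[number];

// Enabled envs = platform default minus whatever disabled_envs lists.
function enabledEnvs(disabled: readonly PlatformEnv[] = []): PlatformEnv[] {
  return PLATFORM_ENVS.filter((env) => !disabled.includes(env));
}

console.log(enabledEnvs(["prev"]));
```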


Preview

Per-PR ephemeral environment opt-in.

preview:
  enabled: true

| Field | Type | Notes |
| --- | --- | --- |
| enabled | bool | Default: false. |
| stale_days | int | Positive integer. Days of PR inactivity before the scaffolded stale-PR workflow applies the stale label. Default: 60 when omitted. To turn the workflow off entirely, omit preview or set preview.enabled: false. |
| close_grace_days | int | Positive integer. Days after the stale label is applied before the PR is auto-closed (which tears down the preview env via the PR-closed trigger). Default: 7 when omitted, i.e. close at 67 days inactive. |

When true, every PR gets a preview env at pr-{N}-{app}.prev.conservice.ai (VPN-only). When dns.hostname is set, the leading label honors the override (zone-stripped) — e.g. dns.hostname: demo.conservice.ai → previews at pr-{N}-demo.prev.conservice.ai. Per-PR ephemeral resources (DBs, SQS queues, S3 buckets keyed pr-{N}-{name}) are provisioned automatically per PR. The pod IAM role gains anchored-wildcard ARNs for pr-* resources.

An abandoned PR holds its preview environment indefinitely, so forge scaffolds a stale-PR workflow: after stale_days of inactivity it labels the PR stale, and after a further close_grace_days it auto-closes the PR — which fires the per-PR teardown for free. The two values default to 60 and 7 (stale at 60 days, close at 67).
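The preview hostname pattern and the stale-PR arithmetic can be worked through in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and "zone-stripped" is read here as dropping the ".conservice.ai" suffix from dns.hostname, which matches the documented demo example but is otherwise a guess for multi-label hostnames.

```typescript
// Builds the per-PR preview hostname, honoring a dns.hostname override.
function previewHost(
  pr: number,
  app: string,
  dnsHostname?: string,
  zone: string = "conservice.ai", // assumed zone for the override
): string {
  const suffix = `.${zone}`;
  let label = app;
  if (dnsHostname !== undefined) {
    label = dnsHostname.endsWith(suffix)
      ? dnsHostname.slice(0, -suffix.length)
      : dnsHostname;
  }
  return `pr-${pr}-${label}.prev.conservice.ai`;
}

// Days of inactivity at which the stale label and the auto-close happen.
function stalePrTimeline(staleDays: number = 60, closeGraceDays: number = 7) {
  return { labelAtDay: staleDays, closeAtDay: staleDays + closeGraceDays };
}

console.log(previewHost(42, "my-app"), stalePrTimeline());
```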


Image tags

Per-env image tags managed by Kargo after each promotion. You don't set these manually — Kargo's argocd-update step writes the promoted commit SHA into forge.yaml after each successful promotion.

image_tags:
  stg: bd55eef2ffe08f1c7ad67b7ac14f1b2d69e1fc9a
  prod: PROMOTION_PENDING-prod

| Key | Description |
| --- | --- |
| stg | Image tag for staging. Written by Kargo after stg promotion. |
| prod | Image tag for production. Written by Kargo after prod promotion. |

Before the first Kargo promotion, the value is PROMOTION_PENDING-{env} — a sentinel that renders an obviously-unhealthy state rather than silently pulling latest.


Replicas

Per-env fixed replica counts for the Deployment. This sets a static pod count per environment — it is not autoscaling. For services that should scale with load, use the per-service scaling block instead; an autoscaled service is excluded from this block entirely (its HPA owns the count).

Valid keys are stg and prod only; prev is not accepted (the block is rejected at validation if you add other keys). A new app scaffolds with { stg: 0, prod: 0 }; Kargo bumps the value 0 → 2 on the first promotion to each env, so the app stays provisioned-but-idle until a real image is promoted.

replicas:
  stg: "2"
  prod: 0

Values are integers or quoted strings (YAML scalars — both 2 and "2" work). 0 means the env is provisioned but not running pods (e.g., prod before the first promotion). The platform renderer emits these into the kustomize overlay's Deployment patch.
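The accepted value shapes can be illustrated with a small normalizer that maps both 2 and "2" to the same integer and rejects keys other than stg and prod. The function name and error messages are hypothetical, and defaulting a missing env to 0 is an assumption that mirrors the scaffold default.

```typescript
type ReplicaEnv = "stg" | "prod";

// Normalizes a replicas block: integers or digit-only strings, stg/prod keys only.
function normalizeReplicas(
  raw: Record<string, number | string>,
): Record<ReplicaEnv, number> {
  const out: Record<ReplicaEnv, number> = { stg: 0, prod: 0 };
  for (const [env, value] of Object.entries(raw)) {
    if (env !== "stg" && env !== "prod") {
      throw new Error(`replicas: unknown key "${env}"`);
    }
    if (typeof value === "string" && !/^\d+$/.test(value)) {
      throw new Error(`replicas.${env}: "${value}" is not an integer`);
    }
    const count = Number(value);
    if (!Number.isInteger(count) || count < 0) {
      throw new Error(`replicas.${env}: expected a non-negative integer`);
    }
    out[env as ReplicaEnv] = count;
  }
  return out;
}

console.log(normalizeReplicas({ stg: "2", prod: 0 }));
```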


Render channel (render_channel)

Selects which renderer pin channel the app's CI workflows track. Supersedes the deprecated canary boolean.

render_channel: canary

| Value | Notes |
| --- | --- |
| general | Default. The app's CI tracks the post-bake-promoted (stable) renderer version. New renderer releases reach general apps only after the bake window. |
| canary | The app's CI tracks the advance-on-every-publish renderer version. Reserved for SRE-owned canary apps (forge-canary-*); general apps should omit the field or set general explicitly. Useful for catching template-rendering regressions before fleet-wide rollout. |

Omit for general apps — the default is general. The deprecated canary: true boolean is still accepted as a backwards-compatible alias for render_channel: canary; when both are set, render_channel wins.
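The precedence between the two fields can be written out as a short resolver. This is only a sketch of the documented rule; the function and type names are hypothetical.

```typescript
type RenderChannel = "general" | "canary";

// render_channel wins when set; otherwise the deprecated boolean is honored.
function resolveRenderChannel(cfg: {
  render_channel?: RenderChannel;
  canary?: boolean;
}): RenderChannel {
  if (cfg.render_channel !== undefined) return cfg.render_channel;
  return cfg.canary === true ? "canary" : "general";
}

console.log(resolveRenderChannel({ canary: true, render_channel: "general" }));
```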


Canary (deprecated)

Deprecated

Use render_channel: canary instead. The canary boolean is retained as a backwards-compatible alias only; new apps and edits should use render_channel. When both fields are set, render_channel wins.

canary: true # equivalent to: render_channel: canary

When true, the app's CI workflows track the canary renderer channel instead of the general channel. Same semantics as render_channel: canary — kept solely so existing forge.yaml files don't have to be rewritten in lockstep with the schema change.


Monitoring

Three related top-level fields configure the app's Datadog monitor pack. app_kind and slo_tier gate which monitors emit from the default catalog; monitors configures where they route and page. All three are optional at the schema level — an app that targets stg/prod without a monitors: block gets a non-blocking preflight warning rather than a hard failure.

App kind (app_kind)

app_kind: web-service

| Value | Notes |
| --- | --- |
| web-service | HTTP-serving app: gets 5xx, latency, and request-rate monitors. |
| worker | Background processor: gets queue-lag and throughput monitors. |
| cron | Scheduled job: gets run-success / missed-run monitors. |
| batch | Batch job: gets completion / duration monitors. |

Optional. Drives the Datadog monitor catalog gate — the value selects which kind-specific monitors emit. A typo here means those monitors silently fail to emit, so set it deliberately when the app targets stg/prod.

SLO tier (slo_tier)

slo_tier: tier-2

| Value | Notes |
| --- | --- |
| tier-1 | Full catalog, including the P3 monitors. |
| tier-2 | Default. Drops the most-sensitive P3s. |
| tier-3 | P1/P2 only (the internal-tool profile). |

Optional. This is a catalog-membership lever: it controls how much of the monitor catalog the app receives, not where monitors route. Apps that need paging behavior on a non-prod env declare that in monitors.routing, not here. When unset, the downstream module defaults to tier-2.

Monitors (monitors)

monitors:
  pagerduty_service: billing
  google_chat:
    name: billing-alerts
    space_id: AAAAabcdefg
    token: xyz123_token-value
  dev_channel: alice-debug

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| pagerduty_service | string | no | Lowercase alphanumeric / underscore / hyphen, 1-64 chars (e.g. billing, forge-runtime). Routes prod monitors via @pagerduty-${name}. Optional: omit only with an explicit routing.prod override (e.g. Datadog-event-only routing). |
| pagerduty_create_new | boolean | no | When true, Forge auto-PRs a new PagerDuty service; when false or omitted, it reuses an existing one. Requires pagerduty_escalation_policy when true. |
| pagerduty_escalation_policy | string | no | Escalation-policy name, 1-64 chars. Required when pagerduty_create_new: true, because the auto-PR references the policy to wire the new service. Forge does not create escalation policies; they stay team-owned in the PagerDuty UI. |
| google_chat | object | no | Google Chat webhook config: name + space_id + token. See the shape below. Optional: omit only with an explicit routing.stg override (e.g. routing: { stg: ["datadog-event"] }). |
| dev_channel | string | no | Kebab-case prev-env opt-in. When set, prev monitors route to @webhook-${dev_channel}. When omitted, prev emits zero monitors (prev is silent by default). |
| routing | object | no | Per-env routing override. A map keyed by prev / stg / prod; each value is a non-empty array of @-handles (e.g. @pagerduty-billing). When set for an env, it replaces the default routing for that env (prod → PagerDuty, stg → Google Chat, prev → dev_channel or silent). Envs not listed fall back to the default. This is the only surface for multi-handle destinations. |
| extra | array | no | Team-supplied extra monitors, unioned with the default catalog at emit time. See the entry shape below. |

The google_chat block:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | yes | Kebab-case webhook name. Datadog routes to it as @webhook-${name}. |
| space_id | string | yes | Google Chat space ID: alphanumeric / underscore / hyphen, 1-64 chars. Plaintext (useless without the org-level API key held in AWS Secrets Manager). |
| token | string | yes | Per-webhook token: alphanumeric / underscore / hyphen, 1-128 chars. Plaintext (defense-in-depth: the token + space pair is useless without the org-level key). |

Each entry in extra:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | yes | Human-readable monitor name, 1-200 chars. Shown in the Datadog UI and alert previews. |
| type | enum | yes | A Datadog monitor type literal: metric alert, query alert, log alert, service check, slo alert, etc. Composite monitors are not permitted. |
| query | string | yes | Raw Datadog query string. Authors own the env: / service: scoping; the module emits the query verbatim. |
| priority | integer | yes | 1-5 (1 = highest, page-worthy; 5 = lowest, FYI). Stamped as the native Datadog priority and duplicated into a priority:P{n} tag. |
| category | string | yes | Kebab-case category slug. Tagged as category:${value} for filter-view grouping. Reuse a default-catalog category name when the extra is a conceptual peer; otherwise pick an app-specific slug. |
| thresholds | object | yes | { critical?, warning?, critical_recovery? }: numeric threshold values for the monitor's monitor_thresholds {} block. |
| tags | array | no | Additional tags appended to the module's common tag set (e.g. subsystem:invoice-render). |
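The default routing and the per-env override rule from the monitors table can be sketched as a resolver. This is an illustration of the documented defaults, not the module's code; the interface and function names are hypothetical, and an empty array stands for "emits nothing for this env".

```typescript
type MonitorEnv = "prev" | "stg" | "prod";

interface MonitorsConfig {
  pagerduty_service?: string;
  google_chat?: { name: string; space_id: string; token: string };
  dev_channel?: string;
  routing?: Partial<Record<MonitorEnv, string[]>>;
}

// Defaults: prod → PagerDuty, stg → Google Chat webhook, prev → dev_channel or silent.
// A routing entry for an env replaces that env's default entirely.
function resolveRouting(m: MonitorsConfig): Record<MonitorEnv, string[]> {
  const defaults: Record<MonitorEnv, string[]> = {
    prod: m.pagerduty_service ? [`@pagerduty-${m.pagerduty_service}`] : [],
    stg: m.google_chat ? [`@webhook-${m.google_chat.name}`] : [],
    prev: m.dev_channel ? [`@webhook-${m.dev_channel}`] : [],
  };
  return {
    prev: m.routing?.prev ?? defaults.prev,
    stg: m.routing?.stg ?? defaults.stg,
    prod: m.routing?.prod ?? defaults.prod,
  };
}

const routes = resolveRouting({
  pagerduty_service: "billing",
  google_chat: { name: "billing-alerts", space_id: "AAAAabcdefg", token: "xyz123_token-value" },
  dev_channel: "alice-debug",
});
console.log(routes);
```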

Observability (observability)

Optional per-app observability tuning. Dev-owned; not machine-written. The block is strict-keyed (unknown keys are rejected at parse).

observability:
  trace_sample_rate_prod: 0.1

| Field | Type | Notes |
| --- | --- | --- |
| trace_sample_rate_prod | number (0.0–1.0) | Datadog APM trace sample rate for the production overlay only. Optional opt-in cost control; omit to use the platform / Datadog default. prev and stg always sample at 1.0 (full-fidelity traces in lower envs). Sets DD_TRACE_SAMPLE_RATE on the prod overlay. |

Use this when prod trace volume is creating non-trivial Datadog cost and the app is willing to give up per-request trace fidelity for sampled aggregates. Lower envs are unaffected — debugging-time visibility stays full-fidelity.


Reserved naming

Reserved key prefixes (resource map keys)

  • pr- — reserved for per-PR ephemeral resources. A static bucket keyed pr-foo would produce {prefix}-pr-foo, colliding with the per-PR wildcard {prefix}-pr-*-*. Schema rejects all pr-* keys in s3, sqs, dynamodb, etc.

Reserved env-var name prefixes

These are platform-managed; do NOT include them in app_config_keys or shadow them in services[].env. They're injected automatically:

  • DATABASE_* — Aurora connection metadata
  • S3_BUCKET_* — bucket names
  • SQS_QUEUE_* — queue URLs
  • SNS_TOPIC_* — topic ARNs
  • EVENTBRIDGE_BUS_* — bus names
  • DYNAMODB_TABLE_* — table names
  • SFN_ARN_* — state machine ARNs
  • FIREHOSE_STREAM_* — firehose stream names
  • BEDROCK_* — Bedrock config
  • KMS_KEY_* — per-app CMK ID + ARN env vars

And the exact-match reserved names (whole name reserved, not a prefix):

  • AWS_REGION
  • TEMPORAL_ADDRESS
  • TEMPORAL_NAMESPACE

Reserved app-name prefixes (forge schema rejection list)

Forge rejects app names starting with these:

  • csvc- — legacy prefix; retained in the rejection list to prevent resurrection
  • conservice- — reserved for global-namespace S3 buckets only
  • aws- — group prefix
  • infra- — repo prefix for SRE-owned repos
  • forge- — forge platform itself (carve-out: forge-canary-* allowed for SRE canary fleet)
  • k8s- — Kubernetes-managed

app- is no longer a reserved app-name prefix. Dev-owned repos use bare names (rates-agent, not app-rates-agent); classification is via a GitHub repo custom property rather than the name. Note that the auth tier-group prefix app-{app}-admins@conservice.com is a separate concept (a Google-group identifier) and is unaffected.

Reserved exact app names (rejected regardless of prefix): forge, argocd, kargo, platform, admin, infrastructure, terraform, staff, workloads, gateway. (auth-service is deliberately not on the list — it is a live forge-managed app with that exact slug.)

The same list is reused to validate resources.kms.<key_name> — pick a logical name that describes the encryption purpose, not a platform prefix.
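The app-name rules above, including the forge-canary-* carve-out, might be checked along these lines. This is a minimal sketch under the lists in this section; `app_name_error` is a hypothetical helper, not a Forge API.

```python
from typing import Optional

RESERVED_PREFIXES = ("csvc-", "conservice-", "aws-", "infra-", "forge-", "k8s-")
# Carve-out: names under forge-canary- are allowed despite the forge- prefix.
ALLOWED_CARVE_OUTS = ("forge-canary-",)
RESERVED_EXACT = frozenset({
    "forge", "argocd", "kargo", "platform", "admin", "infrastructure",
    "terraform", "staff", "workloads", "gateway",
})


def app_name_error(name: str) -> Optional[str]:
    """Return a rejection reason for `name`, or None if the name is acceptable."""
    if name in RESERVED_EXACT:
        return f"'{name}' is a reserved app name"
    if name.startswith(ALLOWED_CARVE_OUTS):
        return None
    for prefix in RESERVED_PREFIXES:
        if name.startswith(prefix):
            return f"'{name}' uses the reserved prefix '{prefix}'"
    return None
```

In this sketch, rates-agent, app-rates-agent, auth-service, and forge-canary-use1 pass, while forge-tools, csvc-billing, and gateway are rejected.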


Naming patterns (resolved at scaffold)

| Resource | Pattern | Example |
| --- | --- | --- |
| Repo | conservice-ai/{app} | conservice-ai/my-app (bare names) |
| Namespace | {app} | my-app |
| ECR repo | apps/{app}-{service} | apps/my-app-api |
| Aurora DB name | {app_underscored} | my_app |
| Aurora service user | {app_underscored}_svc | my_app_svc |
| S3 bucket | conservice-{env}-{app}-{key} | conservice-prod-my-app-history |
| SQS queue | {env}-use1-{app}-{key}-queue | prod-use1-my-app-jobs-queue |
| SNS topic | {env}-use1-{app}-{key}-topic | prod-use1-my-app-events-topic |
| EventBridge bus | {env}-use1-{app}-{key} | prod-use1-my-app-domain |
| Step Function | {env}-use1-{app}-{key} | prod-use1-my-app-flow |
| DynamoDB table | {env}-use1-{app}-{key} | prod-use1-my-app-sessions |
| Firehose stream | {env}-use1-{app}-{key} | prod-use1-my-app-events |
| KMS key | {env}-use1-{app}-{key}-key | prod-use1-auth-service-token-envelope-key |
| KMS alias | alias/{env}-use1-{app}-{key}-key | alias/prod-use1-auth-service-token-envelope-key |
| Pod IAM role | {env}-use1-{app}-pod-role | prod-use1-my-app-pod-role |
| Secrets path | {app}/{key} | my-app/api-token |
| Temporal namespace | {app}-{env}.<temporal-cloud-namespace> | my-app-prod.<temporal-cloud-namespace> |
| Per-team Permission Set (Identity Center) | team-{team}-{env_short}-{tier} | team-ai-prod-admin (kebab-case). Drives team AWS Console access AND per-app DB IAM access (via rds-db:connect ARNs in the inline policy, enumerated from the SRE-managed team-to-app mapping). |
| PG login role (per-app, cluster-scoped) | aws-{app}-db-{tier} | aws-rates-db-admin. Created by the platform when database.{name}.team_grants is declared. One role per (app, tier) at the Aurora cluster level; multiple teams sharing a tier share one login role. The -db- infix marks it as DB-scoped (the role gates ONLY PostgreSQL access; every other AWS service uses per-team Permission Sets). |
| PG tier role (per-app, NOLOGIN, holds object grants) | {app}_admin / {app}_readonly | rates_admin, rates_readonly: table/sequence + DEFAULT PRIVILEGES on each DB the app owns. The login role above inherits from these. |
| Authorization entity | {app} | rates-agent. Registered in each env's authorization store at scaffold time. Grants managed via auth-portal. |
| Kargo project | {app} | my-app (literal, no env suffix) |
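To sanity-check an expected name during review, a few of the patterns above can be expanded with plain string formatting. This is a sketch for illustration only (app code should still read names from env vars). The `PATTERNS` table covers only a subset of rows, `expected_name` is a hypothetical helper, and deriving {app_underscored} by replacing hyphens with underscores is an assumption based on the my-app → my_app example.

```python
# Subset of the naming patterns from the table above.
PATTERNS = {
    "ecr_repo": "apps/{app}-{service}",
    "aurora_db": "{app_underscored}",
    "s3_bucket": "conservice-{env}-{app}-{key}",
    "sqs_queue": "{env}-use1-{app}-{key}-queue",
    "kms_alias": "alias/{env}-use1-{app}-{key}-key",
    "pod_iam_role": "{env}-use1-{app}-pod-role",
}


def expected_name(resource: str, app: str, **fields: str) -> str:
    """Expand the pattern for `resource`; raises KeyError if a field is missing."""
    return PATTERNS[resource].format(
        app=app,
        app_underscored=app.replace("-", "_"),  # assumed derivation
        **fields,
    )
```

For example, `expected_name("sqs_queue", "my-app", env="prod", key="jobs")` expands to the SQS example row in the table.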

App env vars follow a uniform pattern: {TYPE}_{KEY}, where {KEY} is the same {key} used in the AWS resource name, upper-cased with hyphens converted to underscores. A bucket keyed history becomes the env var S3_BUCKET_HISTORY, which in prod contains conservice-prod-my-app-history. A KMS key named token-envelope becomes the env vars KMS_KEY_TOKEN_ENVELOPE_ID and KMS_KEY_TOKEN_ENVELOPE_ARN. Always read names from env vars; never construct them in app code.
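In app code, that advice amounts to looking resource names up from the environment instead of assembling them. The helper below is a minimal sketch: `resource_env_var` and `read_resource_name` are hypothetical names, and the upper-casing and hyphen-to-underscore conversion is inferred from the two examples in this section.

```python
import os
from typing import Mapping


def resource_env_var(type_prefix: str, key: str, suffix: str = "") -> str:
    """Build the env var name for a resource key, e.g. S3_BUCKET + history."""
    parts = [type_prefix, key.upper().replace("-", "_")]
    if suffix:
        parts.append(suffix)
    return "_".join(parts)


def read_resource_name(type_prefix: str, key: str, suffix: str = "",
                       environ: Mapping[str, str] = os.environ) -> str:
    """Read the platform-injected value rather than constructing the AWS name."""
    var = resource_env_var(type_prefix, key, suffix)
    try:
        return environ[var]
    except KeyError:
        raise RuntimeError(f"{var} is not set; is the resource declared in forge.yaml?") from None
```

Calling `read_resource_name("S3_BUCKET", "history")` inside a pod would return whatever bucket name the platform injected for that env, so the same code works unchanged across prev, stg, and prod.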


Schema versioning

Two distinct version lines apply to forge.yaml:

  1. forge_version is the YAML schema-contract value users put in their file. Current: 3.0.0. It declares which schema contract the file expects. Forge carries N and N-1 renderers in parallel for a deprecation window; apps on the old version see a deprecation warning on forge_status until they migrate. Use forge_migrate_yaml to auto-upgrade your forge.yaml to the latest shape.
  2. The URL major version (v3) is the schema's published-contract major. It is independent of the platform's internal implementation versions — the URL versions the schema contract, not the implementation. The published JSON Schema artifact on this site is regenerated from the same validator the platform runs at parse time, so the file you point your editor at always matches what CI enforces.

Versioning policy for this doc

  • Each major schema version (v1, v2, v3, ...) gets its own URL path: schemas.conservice.ai/forge/v1/, schemas.conservice.ai/forge/v2/, schemas.conservice.ai/forge/v3/.
  • Old majors stay live forever. An app pinned to v1 keeps validating against the v1 schema even after v2 ships.
  • Within a major (v1.0 → v1.x), this doc updates in place. Additive changes (new fields, relaxed constraints) don't bump the major.
  • Breaking changes (renamed fields, removed fields, tightened constraints) bump the major.
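The policy above can be summarised in a small sketch that decides which URL path a set of schema changes would be published under. The change-kind labels and the function names `schema_path` and `path_after_changes` are hypothetical, chosen only to mirror the additive and breaking categories listed in the policy.

```python
from typing import Iterable

# Change kinds named in the versioning policy (labels are illustrative).
ADDITIVE = {"new_field", "relaxed_constraint"}
BREAKING = {"renamed_field", "removed_field", "tightened_constraint"}


def schema_path(major: int) -> str:
    """URL path for a schema major, as listed in the policy."""
    return f"schemas.conservice.ai/forge/v{major}/"


def path_after_changes(current_major: int, changes: Iterable[str]) -> str:
    """Additive changes update the current major in place; breaking ones bump it."""
    kinds = set(changes)
    unknown = kinds - ADDITIVE - BREAKING
    if unknown:
        raise ValueError(f"unclassified change kinds: {sorted(unknown)}")
    if kinds & BREAKING:
        return schema_path(current_major + 1)
    return schema_path(current_major)
```

Under this sketch, adding a field to v3 keeps the v3 path, while removing a field would publish under a new v4 path and leave v3 live.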

Where to file requests

  • Need a new resource type (DocumentDB, OpenSearch, etc.): file a platform request. Don't add raw resource blocks to your repo; the IaC guardrail rejects PRs that contain them.
  • Need an existing knob exposed (S3 lifecycle, DynamoDB autoscaling, etc.): file a platform request as well. The platform team adds it to the renderer and this schema doc.
  • Need to tune a per-env value that's currently uniform: file a request; per-env override support is a roadmap item.
  • Found a bug in this doc: let the Conservice platform team know.

More information

  • JSON Schema for IDE auto-validation: forge.yaml.schema.json — the machine-readable contract. Auto-published on every schema change; drop it into .vscode/settings.json yaml.schemas for autocomplete + validation.
  • Need to read the validator source? The schema is generated from a runtime validator; the Conservice platform team can point you at it. The JSON Schema linked above is the same shape and is what your editor and CI validate against.
  • Questions or access requests — reach the Conservice platform team.