GitOps · Ledger
The desired state of the platform
Every commit to main is reconciled by Argo CD. No kubectl apply — everything goes through Git.
1 · Three principles
Adding a service = dropping a YAML file. Updating a version = changing image.tag. Removing a service = deleting the file (but prune: false protects against accidents).
Each environment has a Root App that watches its clusters/<env>/ directory. Any *.yaml in that directory becomes an Argo CD Application. Zero manual registration.
Four standardised charts: api-workload, grpc-workload, worker-workload, stateful-workload. Services reference them — they don't copy them.
Open Ledger in Azure DevOps ↗
2 · Repository structure
```
ledger/
├── clusters/                          # Per-environment Argo CD Application manifests
│   ├── dev/                           # 65 services, organised by namespace
│   │   ├── root-app.yaml              # App of Apps — discovers all YAMLs (recurse: true)
│   │   ├── business-services/         # Core APIs + gRPC (28 YAMLs)
│   │   ├── business-workers/          # Background workers (14 YAMLs)
│   │   ├── practice-trading/          # Practice trading (9 YAMLs)
│   │   ├── iam/                       # Keycloak, OpenFGA
│   │   ├── intelligence/              # RAG AI
│   │   ├── mfe/                       # Micro-frontends (6 YAMLs)
│   │   └── web/                       # Web portals (5 YAMLs)
│   ├── qa/                            # rds-iam-mvp + Kyverno + NetworkPolicies
│   └── prod-wealth/                   # Ready, waiting for cluster deployment
│       └── root-app.yaml              # syncPolicy: {} (manual only)
├── components/                        # Reusable Helm charts & Kustomize bases
│   ├── platform/
│   │   ├── charts/
│   │   │   ├── api-workload/          # REST API: deployment, service, ingress, HPA, NetworkPolicy
│   │   │   ├── grpc-workload/         # gRPC: deployment, service, HPA
│   │   │   ├── worker-workload/       # Background: deployment, KEDA ScaledObject
│   │   │   └── stateful-workload/     # StatefulSet: PVC, headless service, PDB
│   │   ├── forge-workload-lib/        # Shared Helm library
│   │   ├── kyverno-policies/base/     # Kustomize: 6 ClusterPolicies
│   │   ├── network-policies/base/     # Kustomize: baseline NetworkPolicies
│   │   └── argocd/                    # Argo CD Kustomize overlay
│   └── business-services/
│       └── rds-iam-mvp/chart/         # Service-specific chart (custom NetworkPolicy)
└── docs/
```
3 · Root App · App of Apps
Each environment has a root-app.yaml that tells Argo CD: "watch this directory, treat every YAML as an Application."
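A sketch of what a dev root App plausibly looks like, assuming the standard Argo CD directory-source App-of-Apps pattern (the repoURL matches the Application example later in this page; everything else here is illustrative, not the committed manifest):

```yaml
# Illustrative sketch of clusters/dev/root-app.yaml — field values
# are assumptions except where stated elsewhere in this page.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@ssh.dev.azure.com:v3/TaPP-Engine/Forge/ledger
    targetRevision: main
    path: clusters/dev
    directory:
      recurse: true        # every *.yaml under clusters/dev/ becomes an Application
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: false         # deleting a file never auto-deletes the Application
      selfHeal: true
```

The prod-wealth variant drops the `automated` block entirely (an empty `syncPolicy: {}`), which is how manual-only sync is expressed.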
| Environment | Path watched | Auto-sync | Prune | Self-heal |
|---|---|---|---|---|
| dev | clusters/dev/ | Yes | No (safety) | Yes |
| qa | clusters/qa/ | Yes | No (safety) | Yes |
| prod-wealth | clusters/prod-wealth/ | No · manual only | No (safety) | No |
prune: false is deliberate across all environments. Deleting a YAML file from Git does NOT auto-delete the Application from the cluster. Deletion must be a deliberate action (Argo CD UI or kubectl delete), never an accidental commit.
Two-phase handoff
- Terraform (bootstrap) — the EKS module creates the initial Root App when the cluster is first deployed. ignore_changes = [yaml_body, yaml_incluster] ensures Terraform doesn't overwrite later changes.
- Ledger (steady-state) — after bootstrap, the root-app.yaml in the Ledger is the source of truth. Both copies must stay in sync.
4 · Application manifest anatomy
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rds-iam-mvp-dev                             # Unique name (service + env)
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@ssh.dev.azure.com:v3/TaPP-Engine/Forge/ledger
    targetRevision: main
    path: components/platform/charts/api-workload   # Shared chart
    helm:
      values: |
        forge:
          environment: "dev"
          component: "rds-iam-mvp"
        image:
          repository: 303026955634.dkr.ecr.us-east-1.amazonaws.com/dev/rds-iam-mvp-app
          tag: "38076"                              # ← Pipeline updates this
        serviceAccount:
          create: false                             # Blueprint manages the SA
          name: "app-service-account"
        networkPolicy:
          enabled: false                            # No Kyverno on dev
  destination:
    server: https://kubernetes.default.svc
    namespace: business-services
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
Key conventions
- image.tag is the only field the release pipeline changes. Immutable Build ID, never latest.
- serviceAccount.create: false — Blueprint creates the ServiceAccount via eks-identity. The chart references it by name.
- networkPolicy.enabled — false on dev (no Kyverno baseline), true on QA / Prod.
- path points to a shared chart. Only use a service-specific chart if you need custom templates (e.g. custom NetworkPolicy rules).
5 · Platform Helm charts
| Chart | Use case | Resources created |
|---|---|---|
| api-workload | REST APIs, web backends | Deployment, Service, Ingress, HPA, ServiceAccount, NetworkPolicy |
| grpc-workload | gRPC services | Deployment, Service (ClusterIP), HPA, ServiceAccount |
| worker-workload | Background workers, queue consumers | Deployment, ServiceAccount, KEDA ScaledObject |
| stateful-workload | Stateful services (Keycloak, OpenFGA) | StatefulSet, Service, Headless Service, PVC, HPA, PDB, ServiceAccount |
All charts inherit from forge-workload-lib — a shared Helm library that provides common templates for deployments, services, HPA and resource tiers.
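Helm expresses this inheritance through library charts. A workload chart's Chart.yaml would declare the dependency roughly as follows; the version and repository path are illustrative assumptions, not the committed values:

```yaml
# Illustrative Chart.yaml for api-workload — version and
# repository path are assumptions.
apiVersion: v2
name: api-workload
version: 1.0.0
dependencies:
  - name: forge-workload-lib        # type: library chart
    version: ">=0.1.0"
    repository: "file://../forge-workload-lib"
```

The workload chart's templates then render shared definitions via `{{ include ... }}` helpers exported by the library, so a template fix in forge-workload-lib propagates to all four charts at once.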
Extension points · what devs can customise
| Extension | Values key |
|---|---|
| Pod annotations / labels | podAnnotations, podLabels |
| Environment variables | additionalEnvVars |
| Init containers | initContainers |
| Extra volumes / mounts | extraVolumes, extraVolumeMounts |
| Scheduling | nodeSelector, topologySpreadConstraints |
| Network egress | networkPolicy: {rds: true, redis: true} |
| Resource tier | resourceTier: "small" |
| Autoscaling | autoscaling: {minReplicas, maxReplicas, targetCPU} |
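Put together, a service's Helm values override might combine several extension points like this. Every key comes from the table above; the concrete values are made up for illustration:

```yaml
# Illustrative values snippet — keys are the documented extension
# points, values are invented.
podLabels:
  team: "payments"
additionalEnvVars:
  - name: LOG_LEVEL
    value: "info"
resourceTier: "small"
autoscaling:
  minReplicas: 2
  maxReplicas: 6
  targetCPU: 70
networkPolicy:
  rds: true        # allow egress to RDS
  redis: true      # allow egress to Redis
```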
Forbidden · platform-managed
- securityContext — non-root enforced by Kyverno (UID 1000)
- extraContainers — sidecars managed by platform
- hostPath — blocked by Kyverno
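For context, a non-root rule of the kind described above takes roughly this shape in Kyverno. This is a generic sketch, not one of the six committed ClusterPolicies in components/platform/kyverno-policies/base/, which may differ in detail:

```yaml
# Generic sketch of a non-root ClusterPolicy — the committed
# policies may be stricter or structured differently.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-non-root
spec:
  validationFailureAction: Enforce
  rules:
    - name: run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Containers must run as non-root."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```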
6 · Environment differences
| | Dev | QA | Prod-Wealth |
|---|---|---|---|
| Services | 65 (8 namespaces) | 1 (rds-iam-mvp) | 1 (ready, not deployed) |
| Kyverno | No (open for devs) | Yes (6 policies) | Yes (6 policies) |
| NetworkPolicy | Disabled | Enabled | Enabled |
| Argo CD sync | Auto + selfHeal | Auto + selfHeal | Manual only |
| ECR path | dev/<name> | qa/<name> | prod-wealth/<name> |
| Replicas | 1 | 1 | 2 minimum |
7 · How to add a service
- Blueprint first — deploy infrastructure via blueprint/accounts/<env>/us-east-1/services/eks/<namespace>/<name>/ (ecr, eks-identity, config, optionally secrets)
- Create Application YAML — copy an existing manifest into clusters/dev/<namespace>/, update name, image.repository, forge.component
- Add namespace — if the namespace is new, add it to services/eks/namespaces/terragrunt.hcl
- Push to main — Root App auto-discovers and deploys within 30 seconds
- Verify — check Argo CD UI for sync status and pod health
Common failure: ServiceAccount not found means the Blueprint step was skipped. Run terragrunt run-all apply on the service's Blueprint directory first.