GitOps · Ledger

The desired state of the platform

Every commit to main is reconciled by Argo CD. No kubectl apply — everything goes through Git.

1 · Three principles

Git is the API
Adding a service = dropping a YAML file. Updating a version = changing image.tag. Removing a service = deleting the file (prune: false protects against accidents).

App of Apps
Each environment has a Root App that watches its clusters/<env>/ directory. Every *.yaml in that directory becomes an Argo CD Application. Zero manual registration.

Shared Helm charts
Four standardised charts: api-workload, grpc-workload, worker-workload, stateful-workload. Services reference them — they don't copy them.

Open Ledger in Azure DevOps ↗

2 · Repository structure

ledger/
├── clusters/                              # Per-environment Argo CD Application manifests
│   ├── dev/                               # 65 services, organised by namespace
│   │   ├── root-app.yaml                  # App of Apps — discovers all YAMLs (recurse: true)
│   │   ├── business-services/             # Core APIs + gRPC (28 YAMLs)
│   │   ├── business-workers/              # Background workers (14 YAMLs)
│   │   ├── practice-trading/              # Practice trading (9 YAMLs)
│   │   ├── iam/                           # Keycloak, OpenFGA
│   │   ├── intelligence/                  # RAG AI
│   │   ├── mfe/                           # Micro-frontends (6 YAMLs)
│   │   └── web/                           # Web portals (5 YAMLs)
│   ├── qa/                                # rds-iam-mvp + Kyverno + NetworkPolicies
│   └── prod-wealth/                       # Ready, waiting for cluster deployment
│       └── root-app.yaml                  # syncPolicy: {} (manual only)
├── components/                            # Reusable Helm charts & Kustomize bases
│   ├── platform/
│   │   ├── charts/
│   │   │   ├── api-workload/              # REST API: deployment, service, ingress, HPA, NetworkPolicy
│   │   │   ├── grpc-workload/             # gRPC: deployment, service, HPA
│   │   │   ├── worker-workload/           # Background: deployment, KEDA ScaledObject
│   │   │   └── stateful-workload/         # StatefulSet: PVC, headless service, PDB
│   │   ├── forge-workload-lib/            # Shared Helm library
│   │   ├── kyverno-policies/base/         # Kustomize: 6 ClusterPolicies
│   │   ├── network-policies/base/         # Kustomize: baseline NetworkPolicies
│   │   └── argocd/                        # Argo CD Kustomize overlay
│   └── business-services/
│       └── rds-iam-mvp/chart/             # Service-specific chart (custom NetworkPolicy)
└── docs/

3 · Root App · App of Apps

Each environment has a root-app.yaml that tells Argo CD: "watch this directory, treat every YAML as an Application."

Environment    Path watched             Auto-sync          Prune         Self-heal
dev            clusters/dev/            Yes                No (safety)   Yes
qa             clusters/qa/             Yes                No (safety)   Yes
prod-wealth    clusters/prod-wealth/    No · manual only   No (safety)   No
prune: false is deliberate across all environments. Deleting a YAML file from Git does NOT auto-delete the Application from the cluster. Deletion must be a deliberate action (Argo CD UI or kubectl delete), never an accidental commit.
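Put together, a Root App looks roughly like this. This is a sketch reconstructed for dev from the behaviour described above: the path, recurse and sync flags are the documented ones, while metadata.name and the destination namespace are illustrative.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app-dev                # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@ssh.dev.azure.com:v3/TaPP-Engine/Forge/ledger
    targetRevision: main
    path: clusters/dev
    directory:
      recurse: true                 # every *.yaml under clusters/dev/ becomes an Application
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: false                  # deleting a file from Git never auto-deletes the Application
      selfHeal: true
```

For prod-wealth, the same manifest carries syncPolicy: {} instead, so nothing syncs without a human action.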

Two-phase handoff

  1. Terraform (bootstrap) — the EKS module creates the initial Root App when the cluster is first deployed. ignore_changes = [yaml_body, yaml_incluster] ensures Terraform doesn't overwrite later changes.
  2. Ledger (steady-state) — after bootstrap, the root-app.yaml in the Ledger is the source of truth. Both copies must stay in sync.

4 · Application manifest anatomy

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rds-iam-mvp-dev              # Unique name (service + env)
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@ssh.dev.azure.com:v3/TaPP-Engine/Forge/ledger
    targetRevision: main
    path: components/platform/charts/api-workload    # Shared chart
    helm:
      values: |
        forge:
          environment: "dev"
          component: "rds-iam-mvp"
        image:
          repository: 303026955634.dkr.ecr.us-east-1.amazonaws.com/dev/rds-iam-mvp-app
          tag: "38076"                # ← Pipeline updates this
        serviceAccount:
          create: false               # Blueprint manages the SA
          name: "app-service-account"
        networkPolicy:
          enabled: false              # No Kyverno on dev
  destination:
    server: https://kubernetes.default.svc
    namespace: business-services
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Key conventions

  1. metadata.name is <service>-<env> (e.g. rds-iam-mvp-dev) and must be unique.
  2. path points at a shared chart under components/platform/charts/ — services don't carry their own chart.
  3. image.tag is owned by the CI pipeline; never edit it by hand.
  4. serviceAccount.create: false — the ServiceAccount comes from the Blueprint (eks-identity); the chart only references it.
  5. networkPolicy.enabled tracks the environment: false on dev, true on qa and prod-wealth.

5 · Platform Helm charts

Chart                Use case                                 Resources created
api-workload         REST APIs, web backends                  Deployment, Service, Ingress, HPA, ServiceAccount, NetworkPolicy
grpc-workload        gRPC services                            Deployment, Service (ClusterIP), HPA, ServiceAccount
worker-workload      Background workers, queue consumers      Deployment, ServiceAccount, KEDA ScaledObject
stateful-workload    Stateful services (Keycloak, OpenFGA)    StatefulSet, Service, Headless Service, PVC, HPA, PDB, ServiceAccount

All charts inherit from forge-workload-lib — a shared Helm library that provides common templates for deployments, services, HPA and resource tiers.
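In standard Helm terms, "inherit" means each chart declares forge-workload-lib as a library-chart dependency in its Chart.yaml. A sketch, assuming the library sits alongside the charts in the repo; version numbers and the file:// path are illustrative:

```yaml
# components/platform/charts/api-workload/Chart.yaml (sketch)
apiVersion: v2
name: api-workload
version: 1.0.0
dependencies:
  - name: forge-workload-lib           # shared Helm library (type: library)
    version: 1.0.0
    repository: file://../../forge-workload-lib
```

A library chart contributes named templates only; it renders nothing on its own, which is what keeps the four workload charts consistent without copy-paste.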

Extension points · what devs can customise

Extension                   Values key
Pod annotations / labels    podAnnotations, podLabels
Environment variables       additionalEnvVars
Init containers             initContainers
Extra volumes / mounts      extraVolumes, extraVolumeMounts
Scheduling                  nodeSelector, topologySpreadConstraints
Network egress              networkPolicy: {rds: true, redis: true}
Resource tier               resourceTier: "small"
Autoscaling                 autoscaling: {minReplicas, maxReplicas, targetCPU}
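A service's helm.values block might exercise several of these extension points at once. The keys are the ones listed above; every value (team label, env var, tier, thresholds) is illustrative:

```yaml
podLabels:
  team: trading                # illustrative label
additionalEnvVars:
  - name: LOG_LEVEL
    value: "info"
extraVolumes:
  - name: tmp
    emptyDir: {}
extraVolumeMounts:
  - name: tmp
    mountPath: /tmp
resourceTier: "small"          # platform-defined CPU/memory tier
networkPolicy:
  rds: true                    # allow egress to RDS
  redis: true                  # allow egress to Redis
autoscaling:
  minReplicas: 1
  maxReplicas: 4
  targetCPU: 70
```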

Forbidden · platform-managed

6 · Environment differences

                 Dev                   QA                  Prod-Wealth
Services         65 (8 namespaces)     1 (rds-iam-mvp)     1 (ready, not deployed)
Kyverno          No (open for devs)    Yes (6 policies)    Yes (6 policies)
NetworkPolicy    Disabled              Enabled             Enabled
Argo CD sync     Auto + selfHeal       Auto + selfHeal     Manual only
ECR path         dev/<name>            qa/<name>           prod-wealth/<name>
Replicas         1                     1                   2 minimum
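The "Argo CD sync" row translates directly into the syncPolicy block of each environment's manifests: automated on dev and qa, empty on prod-wealth.

```yaml
# dev / qa — reconciled automatically, drift is reverted
syncPolicy:
  automated:
    prune: true
    selfHeal: true

# prod-wealth — every sync is a deliberate action in the Argo CD UI or CLI
syncPolicy: {}
```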

7 · How to add a service

  1. Blueprint first — deploy infrastructure via blueprint/accounts/<env>/us-east-1/services/eks/<namespace>/<name>/ (ecr, eks-identity, config, optionally secrets)
  2. Create Application YAML — copy an existing manifest into clusters/dev/<namespace>/, update name, image.repository, forge.component
  3. Add namespace — if the namespace is new, add it to services/eks/namespaces/terragrunt.hcl
  4. Push to main — Root App auto-discovers and deploys within 30 seconds
  5. Verify — check Argo CD UI for sync status and pod health
Common failure: a "ServiceAccount not found" error means the Blueprint step was skipped. Run terragrunt run-all apply in the service's Blueprint directory first.
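Step 2 usually amounts to changing three things in the copied manifest. The fragment below shows only the fields that differ from the anatomy in §4; the service name my-new-api and its manifest path are hypothetical:

```yaml
# clusters/dev/business-services/my-new-api.yaml (hypothetical service)
metadata:
  name: my-new-api-dev             # unique: <service>-<env>
spec:
  source:
    helm:
      values: |
        forge:
          component: "my-new-api"
        image:
          repository: 303026955634.dkr.ecr.us-east-1.amazonaws.com/dev/my-new-api-app
          tag: "1"                 # seed value; the pipeline takes over from here
```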