Access · Break-Glass

SSO, emergency access & audit trail

No one has standing access to production. All changes go through pipelines. Break-glass is for active incidents only — automatically tracked, time-bound, and immutably audited.

1 · How to connect

SSO login

# One-time setup
aws configure sso

# Login
aws sso login --profile Dev-tapp
export AWS_PROFILE=Dev-tapp

# Verify
aws sts get-caller-identity
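To confirm which permission set the current credentials came from (useful again when verifying break-glass access), you can decode the assumed-role ARN that get-caller-identity returns. IAM Identity Center names its roles AWSReservedSSO_<PermissionSetName>_<suffix>. This is a minimal sketch, and the function names are hypothetical rather than part of any existing tooling:

```shell
# Hypothetical helpers: report which IAM Identity Center permission set the
# current AWS credentials belong to.

# Identity Center roles are named AWSReservedSSO_<PermissionSetName>_<suffix>, so
#   arn:aws:sts::111122223333:assumed-role/AWSReservedSSO_Forge-Prod-ReadOnly_0123456789abcdef/jane
# yields "Forge-Prod-ReadOnly". Returns 1 for any other kind of ARN.
permission_set_from_arn() {
  local arn="$1" role
  role="${arn#*:assumed-role/}"
  [ "$role" != "$arn" ] || return 1   # not an assumed-role ARN
  role="${role%%/*}"                  # drop the session name
  case "$role" in
    AWSReservedSSO_*) ;;
    *) return 1 ;;                    # assumed role, but not from Identity Center
  esac
  role="${role#AWSReservedSSO_}"      # <PermissionSetName>_<suffix>
  printf '%s\n' "${role%_*}"          # drop the random suffix
}

# Ask STS who we are and print the permission set name.
whoami_permission_set() {
  local arn
  arn=$(aws sts get-caller-identity --query Arn --output text) || return 1
  permission_set_from_arn "$arn"
}
```

After logging in with a break-glass permission set, whoami_permission_set should print that set's name rather than your day-to-day one.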

Connecting to EKS

aws eks update-kubeconfig --name dev-eks-cluster --region us-east-1
kubectl get nodes
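Without --alias, update-kubeconfig names the new context after the cluster ARN. A guard can use that name to stop kubectl from running against the wrong cluster once dev and prod contexts share a kubeconfig. This is a sketch that assumes those default ARN-style context names, and cluster_from_context and require_cluster are hypothetical names:

```shell
# Hypothetical guard: refuse to continue unless kubectl points at the expected
# EKS cluster. Assumes the default context names written by
# "aws eks update-kubeconfig" (the cluster ARN, no --alias).

# arn:aws:eks:us-east-1:111122223333:cluster/dev-eks-cluster -> dev-eks-cluster
cluster_from_context() {
  case "$1" in
    arn:*:eks:*:cluster/*) printf '%s\n' "${1##*:cluster/}" ;;
    *) return 1 ;;
  esac
}

require_cluster() {
  local expected="$1" ctx actual
  ctx=$(kubectl config current-context) || return 1
  if ! actual=$(cluster_from_context "$ctx"); then
    echo "context '$ctx' is not an EKS cluster ARN; refusing" >&2
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "kubectl targets '$actual', expected '$expected'; refusing" >&2
    return 1
  fi
}
```

Usage: require_cluster dev-eks-cluster && kubectl get nodes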

Database access (IAM auth)

Use the connect.sh pattern, which authenticates with short-lived IAM tokens instead of static passwords:

# 1. Assume the DB access role (temporary credentials, valid 1h by default)
TEMP_CREDS=$(aws sts assume-role \
  --role-arn "arn:aws:iam::<ACCOUNT_ID>:role/developer-rds-access" \
  --role-session-name "dev-psql-$(date +%s)")
export AWS_ACCESS_KEY_ID=$(echo "$TEMP_CREDS" | jq -r '.Credentials.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$TEMP_CREDS" | jq -r '.Credentials.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$TEMP_CREDS" | jq -r '.Credentials.SessionToken')

# 2. Generate an RDS auth token; it stands in for the password and expires after 15 minutes
DB_TOKEN=$(aws rds generate-db-auth-token \
  --hostname "<RDS_ENDPOINT>" --port 5432 \
  --username "<DB_USER>" --region us-east-1)

# 3. Connect; PGSSLMODE=require refuses to send the token over an unencrypted connection
PGPASSWORD="$DB_TOKEN" PGSSLMODE=require psql --host="<RDS_ENDPOINT>" --port=5432 \
  --username="<DB_USER>" --dbname="<DB_NAME>"

2 · IAM groups & VPN access

Access is managed via IaC (Terraform) in the Management account. Users are added to groups via PR. No click-ops.

Console access groups

Group                       Console access                          Members
grp-forge-platform-admins   Full · all accounts                     Platform engineers
grp-forge-developers        Full on Dev + QA                        Software engineers
grp-forge-qa-engineers      Full on QA, ReadOnly on Dev             QA team
grp-forge-9squid-team       Full on Dev                             9Squid product team
grp-forge-sre-oncall        PowerUser + break-glass on Prod         On-call engineers
grp-forge-wealth-prod       ReadOnly on Prod-Wealth                 CTO, key stakeholders
grp-forge-data-analysts     S3 Data Access on Prod (Confidential)   Business / BI team

VPN access groups (per account)

VPN access is separate from console access. Not every engineer needs console, but all need VPN to reach internal services.

Group                       Network access    Members
grp-forge-vpn-dev           Dev VPC           All engineers
grp-forge-vpn-qa            QA VPC            QA + relevant devs
grp-forge-vpn-prod-wealth   Prod-Wealth VPC   Wealth team + SRE only
grp-forge-vpn-prod-cash     Prod-Cash VPC     Cash team + SRE only
grp-forge-vpn-prod-9squid   Prod-9Squid VPC   9Squid team + SRE only

Principle: a wealth dev cannot VPN into the cash management network. VPN groups are scoped per account. Platform admins are in all VPN groups.
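The per-account scoping reduces to a membership test: to reach an account's VPC, you need the matching grp-forge-vpn-<account> group. The sketch below models that rule using the group names from the table. vpn_allowed is a hypothetical helper, and this document does not describe how the VPN itself evaluates membership:

```shell
# Hypothetical model of the per-account VPN rule: reaching an account's VPC
# requires membership in grp-forge-vpn-<account>. Platform admins pass for
# every account only because they belong to every VPN group.
vpn_allowed() {
  local account="$1" g
  shift
  for g in "$@"; do
    [ "$g" = "grp-forge-vpn-$account" ] && return 0
  done
  return 1
}
```

For example, vpn_allowed prod-cash grp-forge-vpn-dev grp-forge-vpn-prod-wealth returns 1: a wealth developer's groups do not open the cash network.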

3 · Break-glass model

Break-glass access is tiered by incident severity. Each tier maps to a permission set, an approval path, and a maximum session duration.

Tier    Permission set          When                      Approval                  Duration
Tier 0  Forge-Prod-Emergency    P1 (site down)            Self-serve + auto-alert   1h
Tier 1  Scenario-specific       P2/P3 (investigation)     Manager approval          2-4h
Tier 2  Forge-Prod-ReadOnly     Investigation only        Senior engineer           4h
Tier 3  IAM user in Bitwarden   Identity Center is down   Physical, 2-person        Until rotated

4 · Tier 1 scenarios

A · EC2 shell access

Open an SSM session (aws ssm start-session --target <instance-id>) to check logs, processes, and environment variables. 2-hour max.
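A thin wrapper can reject malformed IDs before it opens the session. EC2 instance IDs are i- followed by 8 or 17 hex characters. In this sketch, bg_ec2_shell and valid_instance_id are hypothetical names, and AWS_CLI is an assumed override: set it to echo to preview the command without running it.

```shell
# Hypothetical wrapper for scenario A. Set AWS_CLI=echo to preview the command
# instead of running it.

# EC2 instance IDs are "i-" followed by 8 (older) or 17 lowercase hex characters.
valid_instance_id() {
  printf '%s\n' "$1" | grep -Eq '^i-([0-9a-f]{8}|[0-9a-f]{17})$'
}

bg_ec2_shell() {
  local id="$1"
  if ! valid_instance_id "$id"; then
    echo "not an EC2 instance id: '$id'" >&2
    return 1
  fi
  "${AWS_CLI:-aws}" ssm start-session --target "$id"
}
```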

B · Database access

IAM-authenticated psql connection via connect.sh, or an SSM port-forward as a fallback. 2-hour max.
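For the SSM fallback, the AWS-StartPortForwardingSessionToRemoteHost document tunnels a local port through a managed instance to the RDS endpoint. This sketch assumes an SSM-managed instance in the same VPC as the database. The helper names and the default local port 15432 are hypothetical.

```shell
# Hypothetical helpers for the scenario B fallback. Assumes an SSM-managed
# instance in the database's VPC; set AWS_CLI=echo to preview the command.

# --parameters JSON for the AWS-StartPortForwardingSessionToRemoteHost document.
ssm_rds_params() {
  printf '{"host":["%s"],"portNumber":["%s"],"localPortNumber":["%s"]}\n' \
    "$1" "${2:-5432}" "${3:-5432}"
}

# Usage: bg_rds_tunnel <instance-id> <rds-endpoint> [local-port, default 15432]
bg_rds_tunnel() {
  local target="$1" rds_host="$2" local_port="${3:-15432}"
  "${AWS_CLI:-aws}" ssm start-session --target "$target" \
    --document-name AWS-StartPortForwardingSessionToRemoteHost \
    --parameters "$(ssm_rds_params "$rds_host" 5432 "$local_port")"
}
```

With the tunnel open, generate the auth token as in connect.sh, still for the real endpoint (--hostname "<RDS_ENDPOINT>" --port 5432), because the token is signed for that hostname. Then connect with PGSSLMODE=require psql --host=localhost --port=15432.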

C · EKS pod inspection

Inspect pods with kubectl logs, kubectl exec, and kubectl describe. 2-hour max.
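Tier 1 asks you to document everything, so it can help to capture the non-interactive diagnostics for a pod into one file and attach that file to the ticket. This is a sketch: pod_diagnostics and the file naming are hypothetical, and KUBECTL is an assumed override (set it to echo to preview the commands).

```shell
# Hypothetical helper for scenario C: capture the standard diagnostics for one
# pod into a timestamped file to attach to the ticket. Set KUBECTL=echo to
# preview the commands. (kubectl exec stays interactive and is not included.)
pod_diagnostics() {
  local ns="$1" pod="$2" k="${KUBECTL:-kubectl}" out
  out="diag-${ns}-${pod}-$(date +%Y%m%dT%H%M%S).txt"
  # "|| true": keep going if a command fails; its error is captured in the file.
  {
    echo "## describe"
    "$k" describe pod "$pod" -n "$ns" || true
    echo "## logs (current)"
    "$k" logs "$pod" -n "$ns" --all-containers --tail=500 || true
    echo "## logs (previous container, if it restarted)"
    "$k" logs "$pod" -n "$ns" --all-containers --previous --tail=500 || true
  } >"$out" 2>&1
  printf '%s\n' "$out"
}
```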

D · AWS Console

ReadOnly access to the CloudWatch, EC2, and RDS dashboards. 4-hour max.
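IAM Identity Center expresses a permission set's maximum session length as an ISO 8601 duration such as PT1H. It is set with --session-duration on aws sso-admin create-permission-set or update-permission-set. A lookup that mirrors the tiers and scenarios above could look like the sketch below, where session_duration and its keys are hypothetical:

```shell
# Hypothetical lookup mirroring the duration tables: maximum session length as
# an ISO 8601 duration, the format IAM Identity Center uses for permission sets.
session_duration() {
  case "$1" in
    tier0)         echo PT1H ;;   # Forge-Prod-Emergency
    ec2|db|eks)    echo PT2H ;;   # Tier 1 scenarios A-C
    console|tier2) echo PT4H ;;   # scenario D, Forge-Prod-ReadOnly
    *)
      echo "no fixed session duration for '$1'" >&2   # e.g. tier3: until rotated
      return 1
      ;;
  esac
}
```

For example: aws sso-admin update-permission-set --instance-arn <INSTANCE_ARN> --permission-set-arn <PERMISSION_SET_ARN> --session-duration "$(session_duration db)". Both ARNs are placeholders.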

5 · Automated audit trail (deployed)

Every break-glass session is automatically archived to an immutable S3 bucket:

Flow: AssumeRole (BreakGlass / Emergency) → EventBridge (detects pattern) → Lambda (S3 folder + Slack alert) → SSM logs (stream automatically) → Step Function (CloudTrail export) → S3 (WORM, Object Lock COMPLIANCE)

How it works: EventBridge detects AssumeRole for *BreakGlass* / *Emergency* roles → Lambda creates S3 folder + writes metadata + sends Slack alert → SSM logs stream automatically → Step Function exports CloudTrail after session expires.
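Before you name a new emergency role, check that the audit rule will catch it. This sketch applies the same *BreakGlass* / *Emergency* wildcards with a shell glob. is_break_glass_role is hypothetical, and treating the match as case-sensitive is an assumption about the deployed rule.

```shell
# Hypothetical pre-flight check: would a role with this name or ARN be picked up
# by the audit rule's *BreakGlass* / *Emergency* wildcards? Glob matching here
# is case-sensitive.
is_break_glass_role() {
  case "$1" in
    *BreakGlass*|*Emergency*) return 0 ;;
    *) return 1 ;;
  esac
}
```

is_break_glass_role Forge-Prod-Emergency succeeds; is_break_glass_role Forge-Prod-ReadOnly does not.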

Bucket settings: WORM via Object Lock in COMPLIANCE mode (365-day retention), KMS encryption, transition to Glacier after 90 days, deletion after 7 years.
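The retention settings map onto two S3 configuration documents. This sketch prints them in the JSON shapes taken by the s3api put commands, for example to compare against what is deployed. The rule ID break-glass-archive is hypothetical, and 7 years is approximated as 2555 days.

```shell
# Hypothetical generators for the retention settings above, in the JSON shapes
# taken by "aws s3api put-object-lock-configuration --object-lock-configuration"
# and "aws s3api put-bucket-lifecycle-configuration --lifecycle-configuration".
# 7 years is approximated as 7 x 365 = 2555 days.

object_lock_json() {   # [retention-days, default 365]
  printf '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":%d}}}\n' \
    "${1:-365}"
}

lifecycle_json() {     # [glacier-after-days, default 90] [expire-after-days, default 2555]
  printf '{"Rules":[{"ID":"break-glass-archive","Status":"Enabled","Filter":{"Prefix":""},"Transitions":[{"Days":%d,"StorageClass":"GLACIER"}],"Expiration":{"Days":%d}}]}\n' \
    "${1:-90}" "${2:-2555}"
}
```

To see what is actually deployed, run aws s3api get-object-lock-configuration --bucket <BUCKET> and aws s3api get-bucket-lifecycle-configuration --bucket <BUCKET>. The bucket name is not given in this document. COMPLIANCE is the stricter Object Lock mode: until an object version's retention period ends, no user, including the root user, can delete it or shorten its retention.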

6 · Quick reference

TIER 0 (P1 — SITE DOWN):
  Assume Forge-Prod-Emergency → MFA → work → auto-expires 1h
  Auto-alerts #platform-incidents. Open a ticket within 1h.

TIER 1 (P2/P3 — INVESTIGATION):
  1. Create ticket
  2. Post request in #platform-support
  3. Wait for manager approval
  4. Platform Admin provisions access in Identity Center
  5. Log in via SSO → verify with sts:GetCallerIdentity
  6. Diagnose and fix — document everything
  7. Log out → access auto-revokes

EC2:     aws ssm start-session --target <instance-id>
RDS:     connect.sh (IAM auth) or SSM port-forward (fallback)
EKS:     aws eks update-kubeconfig → kubectl
Console: SSO portal → Forge-Prod-ReadOnly

DURATIONS: Tier 0 = 1h · EC2/DB/EKS = 2h · Console = 4h
APPROVERS: Platform Leads, Eng Managers
CHANNELS:  #platform-support (requests), #platform-incidents