Secrets management
Production secrets management: from .env files to KMS-backed secret managers, IAM auth, automated rotation, leak response, and CI/CD pipeline security.
What Counts As A Secret
A secret is anything that grants access. The list is longer than you think.
- API keys for third-party services (Stripe, SendGrid, OpenAI).
- Database passwords.
- Signing keys (JWT secrets, webhook secrets, VAPID private keys, SSH keys).
- OAuth client secrets.
- Encryption keys (data-at-rest keys, KMS master keys).
- Service-to-service authentication tokens.
- Admin tokens, root tokens.
- TLS private keys for internal services.
- Cookie signing keys.
- Recovery codes and seed phrases.
Anything that, if leaked, would let someone do something on your behalf. If in doubt, treat it as a secret.
The Threat Model
Secrets get exposed through:
- Accidental commit to git. Found by GitHub's secret scanner or a curious researcher.
- Logs. App logs the secret directly, or echoes it in error messages.
- Crash dumps and core dumps. Memory contents written to disk include env vars.
- CI/CD pipeline leaks. Build logs, artifact uploads, PR previews from outside contributors.
- Insider access. Engineer with database read access reads the config table.
- Misconfigured IAM. S3 bucket containing config files set to public.
- Backups. Database backups include the secrets table, backups stored unencrypted.
- Dependency compromise. A malicious npm package reads process.env and exfiltrates it.
- Heap dumps. Production debugging tools write heap snapshots that contain whatever secrets are in memory.
- Subprocess inheritance. Spawned processes inherit parent env vars including secrets unrelated to them.
Defense in depth. No single technique stops all of these.
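Subprocess inheritance is one item on that list that can be closed directly in code. Below is a minimal Node.js sketch. It assumes the service shells out to helper commands and builds the child's environment from an explicit allowlist instead of passing process.env through. The helper names and the allowlisted variables are illustrative.

```typescript
import { spawnSync } from "node:child_process";

// Copy only allowlisted variables into a fresh object. Anything not listed
// (DB passwords, API keys loaded into the parent) never reaches the child.
function pickEnv(
  env: NodeJS.ProcessEnv,
  allowlist: readonly string[],
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of allowlist) {
    const value = env[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// Hypothetical helper: passing `env` replaces the inherited environment
// instead of extending it, so the child sees only what pickEnv returned.
function runIsolated(command: string, args: string[]) {
  return spawnSync(command, args, {
    env: pickEnv(process.env, ["PATH", "HOME", "LANG"]),
    encoding: "utf8",
  });
}
```

Node's spawn family falls back to process.env only when no env option is given, so the allowlist has to be passed on every call.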
Storage Hierarchy
Worst to best.
Level 0: hardcoded in source
Catastrophic. Git history is forever. Even after you remove it and force-push, copies exist in forks, in clone caches, in GitHub's archive. Treat as compromised the moment it's committed, even if the repo is private.
Level 1: .env in git
Same as level 0. Some teams convince themselves "the repo is private" makes this safe. It does not. Repo settings change, contractors get added, the repo gets forked into someone's personal account.
Level 2: .env in .gitignore
Better but still flawed. The file lives on every developer's laptop, every CI runner, every Docker build context. A COPY . . instruction copies it into the Docker image unless a .dockerignore excludes it. Logs that echo env vars leak the contents. Use only for dev secrets that don't match production.
Level 3: deploy-time env vars
Set by your deploy tool (Vercel, Heroku, Kubernetes secrets, ECS task definitions). The standard for most teams. Drawbacks.
- Visible in /proc/PID/environ on the machine.
- Inherited by child processes.
- Often readable by anyone with deploy permission, not just SREs.
- No rotation story unless you redeploy.
- No audit log of who read which secret when.
For small teams this is acceptable. For anything regulated or larger than 10 engineers, move to level 4.
Level 4: secret manager
AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, HashiCorp Vault, Doppler. The app fetches secrets at runtime from a central service.
Benefits.
- No static credentials in app config. The app authenticates with its cloud workload identity (EC2 instance profile, EKS service account, Cloud Run service identity).
- Audit log of every read.
- Automatic rotation with helper Lambdas/functions for common services.
- Versioning. Roll back a bad secret update.
- Fine-grained IAM. The billing service can read the Stripe key but not the database password.
Cost is real. AWS Secrets Manager charges $0.40 per secret per month plus $0.05 per 10k API calls. For an app with 50 secrets you spend $20/month plus API costs. Not nothing, but trivial compared to one breach.
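The same arithmetic can be written as a small estimator. This sketch uses only the two list prices quoted above. It ignores free tiers, replica secrets, and any later price changes, so treat the output as a rough estimate. The function name is hypothetical.

```typescript
// Prices from the paragraph above, kept in cents to avoid float drift.
const CENTS_PER_SECRET_MONTH = 40; // $0.40 per secret per month
const CENTS_PER_10K_CALLS = 5; // $0.05 per 10,000 API calls

// Estimated monthly bill in dollars for a secret count and call volume.
function monthlySecretsCost(secrets: number, apiCallsPerMonth: number): number {
  const storageCents = secrets * CENTS_PER_SECRET_MONTH;
  const callCents = (apiCallsPerMonth / 10_000) * CENTS_PER_10K_CALLS;
  return (storageCents + callCents) / 100;
}
```

monthlySecretsCost(50, 0) returns 20, matching the figure above. Adding a million API calls a month brings it to 25.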
Level 5: HSM
Hardware Security Module. The key never leaves the device. You send data to be signed, the HSM signs it, sends back the signature. Used for root CA keys, code signing certs, cryptocurrency cold storage.
Cloud HSMs (AWS CloudHSM, GCP Cloud HSM, Azure Dedicated HSM) cost hundreds of dollars per month. Worth it for the keys that, if leaked, would end your company. Overkill for app-level API keys.
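The HSM contract is easiest to see as an interface: callers get a sign operation and a public key, never the private key. This sketch assumes Ed25519 and Node's built-in crypto module. createLocalSigner is a hypothetical in-process stand-in for tests. A real deployment would back the same interface with an HSM, for example through PKCS#11 or a cloud KMS sign call.

```typescript
import { generateKeyPairSync, sign as cryptoSign, KeyObject } from "node:crypto";

// What an HSM exposes to callers: a sign operation and the public key.
// There is deliberately no method that returns the private key.
interface Signer {
  readonly publicKey: KeyObject;
  sign(data: Buffer): Promise<Buffer>;
}

// Test-only stand-in: the private key lives in process memory (inside this
// closure), which is exactly what a real HSM avoids.
function createLocalSigner(): Signer {
  const { publicKey, privateKey } = generateKeyPairSync("ed25519");
  return {
    publicKey,
    async sign(data: Buffer): Promise<Buffer> {
      // Ed25519 takes no separate digest algorithm, hence `null`.
      return cryptoSign(null, data, privateKey);
    },
  };
}
```

Code written against Signer does not change when the stand-in is swapped for an HSM-backed implementation. Keeping the private key behind the interface is what makes that swap possible.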
IAM and Workload Identity
The right pattern is workload identity. Your service has an identity assigned by the cloud (EC2 instance profile, EKS pod identity, Cloud Run service account). It uses that identity to authenticate to the secret manager. No static credentials anywhere.
Anti-pattern: an AWS access key in env vars used to fetch from Secrets Manager. You replaced one static secret with two. The access key has the same problems as any other env var.
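A startup guard is a cheap way to keep that anti-pattern from creeping back. This sketch assumes an AWS deployment. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the variables the AWS SDKs read for static keys, and in the default credential chain they take precedence over the instance or pod identity. The function names and the isProduction flag are hypothetical.

```typescript
// Static-key variables that the AWS SDK default credential chain would use
// ahead of the workload identity.
const STATIC_CREDENTIAL_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] as const;

// Names of static-credential variables that are set (non-empty) in `env`.
function findStaticCredentials(env: NodeJS.ProcessEnv): string[] {
  return STATIC_CREDENTIAL_VARS.filter((name) => Boolean(env[name]));
}

// Refuse to boot in production if static keys are configured, so the app can
// only authenticate through its workload identity.
function assertNoStaticCredentials(env: NodeJS.ProcessEnv, isProduction: boolean): void {
  const found = findStaticCredentials(env);
  if (isProduction && found.length > 0) {
    throw new Error(`static cloud credentials set in production: ${found.join(", ")}`);
  }
}
```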
Cross-cloud and on-prem: use OIDC federation. The workload has an OIDC identity (Kubernetes service account, GitHub Actions OIDC token), trades it for short-lived cloud credentials via STS, and fetches secrets. No long-lived cloud credentials exist anywhere in the chain. What you lock down is the OIDC issuer and the role trust policy that decides which identities may assume which role.
Rotation
Secrets that never change are only marginally safer than secrets in env vars. The leak window is "forever."
Rotation cadence.
- Database passwords: 30-90 days.
- API keys to third parties: when team members leave, or quarterly.
- Signing keys: yearly (longer because rotation breaks existing tokens).
- Root credentials: yearly minimum, with break-glass procedures.
- Service-to-service tokens: short-lived (hours), automated.
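Cadences are easier to enforce when they live in code or config rather than in a wiki. This sketch encodes the list above as maximum ages, using the upper end of each range. The SecretKind names are hypothetical, and the 12-hour service-token lifetime is an assumed value for "hours".

```typescript
type SecretKind =
  | "dbPassword"
  | "thirdPartyApiKey"
  | "signingKey"
  | "rootCredential"
  | "serviceToken";

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Maximum age per kind, from the upper end of each cadence above.
const MAX_AGE_MS: Record<SecretKind, number> = {
  dbPassword: 90 * DAY_MS,
  thirdPartyApiKey: 90 * DAY_MS, // quarterly; also rotate when someone leaves
  signingKey: 365 * DAY_MS,
  rootCredential: 365 * DAY_MS,
  serviceToken: 12 * HOUR_MS, // assumed value for "hours"
};

// True once a secret of this kind has reached its maximum age.
function isRotationDue(kind: SecretKind, lastRotated: Date, now: Date = new Date()): boolean {
  return now.getTime() - lastRotated.getTime() >= MAX_AGE_MS[kind];
}
```

A scheduled job could run isRotationDue over an inventory of secrets and their last-rotation timestamps and alert on anything overdue.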
AWS Secrets Manager has built-in rotation for RDS, Aurora, DocumentDB, Redshift. You enable rotation, set a cadence, AWS runs a Lambda that creates new credentials, tests them, updates the secret. Your app picks up the new value on next refresh.
For custom services, write your own rotation function. The pattern.
- Create new credential alongside the old one (dual-active period).
- Update the secret value to the new credential.
- Wait for cache TTLs to expire and clients to pick up new value.
- Verify new credential works.
- Revoke old credential.
The dual-active period is critical. Without it, the moment you change the secret, all running app instances with cached old credentials fail until they restart.
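The five steps above can be combined into a single orchestration function. CredentialIssuer and SecretStore are hypothetical stand-ins. The first represents whatever issues the credential (a database, a third-party API). The second represents the secret manager. waitForCacheTtl is injected so tests don't have to sleep. The sketch adds one thing beyond the listed steps: if verification fails, it points the secret back at the old credential. That credential is still valid because nothing has been revoked yet.

```typescript
// Hypothetical interfaces for the credential issuer and the secret manager.
interface CredentialIssuer {
  create(): Promise<string>; // issues a new credential, old one stays valid
  verify(credential: string): Promise<boolean>;
  revoke(credential: string): Promise<void>;
}
interface SecretStore {
  get(name: string): Promise<string>;
  put(name: string, value: string): Promise<void>;
}

async function rotate(
  name: string,
  issuer: CredentialIssuer,
  store: SecretStore,
  waitForCacheTtl: () => Promise<void>,
): Promise<void> {
  const oldCredential = await store.get(name);
  // 1. Create the new credential alongside the old one (dual-active).
  const newCredential = await issuer.create();
  // 2. Point the secret at the new credential.
  await store.put(name, newCredential);
  // 3. Give cached clients time to refresh (e.g. one cache TTL).
  await waitForCacheTtl();
  // 4. Verify the new credential before touching the old one.
  if (!(await issuer.verify(newCredential))) {
    await store.put(name, oldCredential); // roll back; old one still works
    throw new Error(`rotation of ${name} failed verification; rolled back`);
  }
  // 5. Only now revoke the old credential.
  await issuer.revoke(oldCredential);
}
```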
Caching and Refresh
Fetching from the secret manager on every request is too slow. Cache in memory with a TTL.
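The cache below takes the fetch function as a constructor argument, so it is not tied to one secret manager client.

```typescript
// Fetches the current value of a secret, e.g. a wrapper around the AWS SDK's
// GetSecretValueCommand that returns SecretString.
type SecretFetcher = (name: string) => Promise<string>;

class SecretCache {
  private cache = new Map<string, { value: string; expires: number }>();

  constructor(
    private fetchSecret: SecretFetcher,
    private ttlMs = 5 * 60 * 1000, // 5 minutes
  ) {}

  async get(name: string): Promise<string> {
    const cached = this.cache.get(name);
    if (cached && cached.expires > Date.now()) return cached.value;
    // Miss or expired: fetch from the secret manager and restart the TTL.
    const value = await this.fetchSecret(name);
    this.cache.set(name, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```

A 5-minute TTL is a reasonable default. It trades freshness (a rotation takes up to 5 minutes to propagate) for cost (5x fewer API calls than a 1-minute TTL).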
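The cost side of that trade-off is simple arithmetic. This sketch assumes a 30-day month and that every instance reads every cached secret at least once per TTL window, so each entry is refetched once per TTL. The function name is hypothetical.

```typescript
const MINUTES_PER_MONTH = 30 * 24 * 60; // assumes a 30-day month

// Secret-manager calls per month when every instance refreshes every
// secret once per TTL window.
function refreshCallsPerMonth(instances: number, secrets: number, ttlMinutes: number): number {
  return instances * secrets * (MINUTES_PER_MONTH / ttlMinutes);
}
```

With 10 instances and 50 secrets, a 5-minute TTL means 4,320,000 calls a month, about $21.60 at $0.05 per 10,000 calls. A 1-minute TTL means 21,600,000 calls, about $108.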
For long-lived processes, listen for rotation events via SNS or EventBridge and invalidate the cache as soon as a secret changes. This propagates changes faster than the TTL alone.
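Wiring those events to a cache can be sketched independently of the transport. The sketch makes three assumptions. First, the cache exposes an invalidate(name) method. Second, the SNS or EventBridge payload has already been mapped to the simplified SecretChangedEvent shape; real payload field names differ. Third, a Node EventEmitter stands in for the subscription. All names here are hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Assumed, simplified event shape after mapping the real payload.
interface SecretChangedEvent {
  secretName: string;
}

// Any cache that can drop a single entry.
interface InvalidatableCache {
  invalidate(name: string): void;
}

// Subscribe a cache to rotation events. Dropping the entry forces the next
// read to fetch the rotated value instead of waiting for the TTL.
// Returns an unsubscribe function.
function subscribeToRotations(source: EventEmitter, cache: InvalidatableCache): () => void {
  const handler = (event: SecretChangedEvent) => cache.invalidate(event.secretName);
  source.on("secret-changed", handler);
  return () => {
    source.off("secret-changed", handler);
  };
}
```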
Dev Workflow
Production secrets must never reach dev machines. But devs need credentials to run the app locally. Three patterns.
- Separate dev tier credentials. Stripe test keys, dev database, mock email service. These can sit in .env files because their leak doesn't compromise prod.
- Personal credentials. Each dev has their own AWS profile with read-only access to a dev-only set of secrets. Their CLI fetches them on demand.
- Secret broker. Tools like Doppler, 1Password CLI, Infisical inject secrets into the shell on demand. Centralized, audited, no .env files needed.
The rule: prod secrets exist only in prod. The dev environment has its own credentials with limited blast radius.
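Some providers make the prod/dev boundary checkable at boot. Stripe, for example, prefixes live-mode secret keys with sk_live_ (restricted keys with rk_live_) and test keys with sk_test_. A local environment can therefore refuse a production key outright. The function name and the environment labels in this sketch are hypothetical.

```typescript
// Stripe marks live-mode secret and restricted keys with these prefixes.
const LIVE_KEY_PREFIXES = ["sk_live_", "rk_live_"];

// Fail fast if a non-production environment is configured with a live key.
function assertNotLiveKeyOutsideProd(stripeKey: string, environment: string): void {
  const isLive = LIVE_KEY_PREFIXES.some((prefix) => stripeKey.startsWith(prefix));
  if (isLive && environment !== "production") {
    throw new Error(`live Stripe key configured in "${environment}"`);
  }
}
```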
CI/CD Security
Pipelines are a major leak vector. Best practices.
- Use the platform's secret store: GitHub Encrypted Secrets, GitLab CI variables, CircleCI contexts. Never put secrets in plain env: blocks in YAML.
- Mask secrets in logs. GitHub does this automatically for secrets but not for derived values (don't construct strings that include secrets).
- Restrict secrets to specific environments and branches. Production secrets only available to deploys from main, not PR builds.
- Don't let PRs from outside contributors run with internal secrets. GitHub Actions supports permissions: read-all for the workflow token, and you can require approval before workflows run for first-time contributors.
- Use OIDC for cloud auth from CI. GitHub Actions can federate to AWS/GCP without storing long-lived cloud credentials.
The big risk: a compromised CI step (malicious npm install, prompt injection in an action) can exfiltrate secrets from the environment. Limit secret scope to the minimum, use ephemeral credentials.
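GitHub Actions covers the masking gap for derived values with the ::add-mask:: workflow command. Once a value is registered, the runner redacts it from subsequent log output. The @actions/core package wraps the same command as setSecret. This sketch registers a derived credential before it is used. maskInGitHubActions and basicAuthHeader are hypothetical helpers.

```typescript
// Register a value with the GitHub Actions runner so it is redacted from any
// log output that follows. Outside Actions this is a no-op.
function maskInGitHubActions(
  value: string,
  write: (line: string) => void = (line) => console.log(line),
): void {
  if (process.env.GITHUB_ACTIONS !== "true") return;
  write(`::add-mask::${value}`);
}

// Hypothetical helper: the base64 token is derived from a secret, so the
// runner would not mask it on its own. Register it before it can be logged.
function basicAuthHeader(user: string, password: string): string {
  const token = Buffer.from(`${user}:${password}`).toString("base64");
  maskInGitHubActions(token);
  return `Basic ${token}`;
}
```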
Leak Response
When a secret leaks: rotate immediately, audit what was accessed during the leak window, and notify if required.
Triage order.
- Rotate the leaked secret. Most important. Stops further damage.
- Identify scope. Where else might this secret have been? In Docker images? In CI logs? In backups?
- Review access logs. What did this credential access during the exposure window? Was anything unusual touched?
- Notify. Customers if their data was exposed. Compliance if regulations require disclosure.
- Post-mortem. How did it leak? How do we prevent the pattern?
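Reviewing access during the exposure window mostly comes down to filtering logs by credential and time. This sketch works over a hypothetical, already-parsed AccessLogEntry shape. In practice the entries would come from something like CloudTrail or your provider's audit log. exposedAt is the earliest moment the secret could have been seen; for a git leak, that is the push time.

```typescript
// Hypothetical, already-parsed audit log entry.
interface AccessLogEntry {
  credentialId: string;
  timestamp: Date;
  action: string;
  resource: string;
}

// Entries made with the leaked credential between exposure and revocation,
// oldest first.
function accessDuringExposure(
  entries: readonly AccessLogEntry[],
  credentialId: string,
  exposedAt: Date,
  revokedAt: Date,
): AccessLogEntry[] {
  const start = exposedAt.getTime();
  const end = revokedAt.getTime();
  return entries
    .filter((e) => {
      const t = e.timestamp.getTime();
      return e.credentialId === credentialId && t >= start && t <= end;
    })
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
}
```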
Don't count on scrubbing the commit from git history. Once a secret hits git, treat it as fully public. GitHub's secret scanning alerts on common secret formats. Providers such as Stripe, AWS and GitHub itself watch for their credentials in public repos and can revoke or restrict them automatically.
Secret Scanning In CI
Pre-commit and pre-push hooks. Tools like gitleaks, trufflehog, ggshield (GitGuardian) scan for high-entropy strings and known patterns. Catch leaks before they hit the remote.
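The high-entropy heuristic these tools use can be sketched with Shannon entropy, H = -Σ p(c) log2 p(c) over the characters of a token. The 20-character minimum and the 3.5 bits-per-character threshold below are assumptions for illustration, not any particular tool's defaults. Thresholds this low also flag ordinary long identifiers, which is why real scanners combine entropy with pattern rules.

```typescript
// Shannon entropy in bits per character over the token's code points.
function shannonEntropy(s: string): number {
  const chars = [...s];
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Assumed thresholds: long enough and random-looking enough to be a key.
function looksLikeSecret(token: string, minLength = 20, minBits = 3.5): boolean {
  return token.length >= minLength && shannonEntropy(token) >= minBits;
}
```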
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
```

CI-side scanning runs on every PR. GitHub Advanced Security includes secret scanning. Snyk and Semgrep have secret rules.
Custom secret formats (your internal API keys) won't be detected by generic scanners. Configure your custom prefixes in the scanner config.
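Scanners such as gitleaks accept custom regex rules in their config, so an internal format only needs a precise pattern. The TypeScript sketch below shows what such a rule would match. It assumes a hypothetical acme_live_ or acme_test_ prefix followed by 32 alphanumeric characters. The prefix, the length and the function name are all invented for illustration.

```typescript
// Hypothetical internal key format: acme_(live|test)_ + 32 alphanumerics.
const CUSTOM_SECRET_PATTERNS: RegExp[] = [/\bacme_(?:live|test)_[A-Za-z0-9]{32}\b/g];

interface Finding {
  line: number; // 1-based line number
  match: string;
}

// Report every custom-format secret found, with its line number.
function scanForCustomSecrets(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const pattern of CUSTOM_SECRET_PATTERNS) {
      // matchAll needs the g flag and does not mutate pattern.lastIndex.
      for (const m of lineText.matchAll(pattern)) {
        findings.push({ line: i + 1, match: m[0] });
      }
    }
  });
  return findings;
}
```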
Common Pitfalls
- Secrets in Docker layers. RUN export SECRET=foo records the secret in the image's layer history, visible to anyone who runs docker history on the image. Use BuildKit's --mount=type=secret for build-time secrets.
- Secrets in environment variables logged on startup. Some frameworks log all env vars at debug level. Audit your boot logs.
- Hardcoding the secret manager region or account ID, then sharing the config across environments. A dev config deployed to prod fetches dev secrets while running against prod data.
- Letting one service have access to all secrets. Scope by service, least privilege.
- No rotation. Secrets older than a year should be assumed compromised.
- Static .env files for production. Even on a "secure server" they get backed up, copied during incident response, read by ops.
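For the startup-logging pitfall, an allowlist of printable values is safer than trying to recognize secret-looking names. In this sketch the allowlisted variable names are examples. Every other variable is logged only as <set>, so the boot log still shows what is configured.

```typescript
// Variables whose values are safe to print; everything else is hidden.
const LOGGABLE_VALUES = new Set(["NODE_ENV", "PORT", "AWS_REGION", "LOG_LEVEL"]);

// Render the environment for a startup log: all names, allowlisted values only.
function describeEnvForLog(env: NodeJS.ProcessEnv): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(env).sort()) {
    const value = env[name];
    if (value === undefined) continue;
    out[name] = LOGGABLE_VALUES.has(name) ? value : "<set>";
  }
  return out;
}
```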
Interview Soundbites
- "Secrets manager with workload identity for prod. No static credentials anywhere."
- "Rotation cadence: DB passwords quarterly, signing keys yearly, third-party API keys when people leave."
- "Cache fetched secrets in-memory with a 5-minute TTL. Listen for rotation events to invalidate faster."
- "CI secrets use OIDC federation, scoped to the minimum needed. Forks from outside contributors don't run with internal secrets."