The Problem CI/CD Solves
Imagine a team without CI/CD. Developers write code on their laptops. When they think a feature is done, they manually run tests (sometimes). They merge into the main branch (sometimes after review). Once a week, someone "does a release": copies files to a server, runs migrations by hand, restarts services. Things break. Rollback means SSHing into production and undoing changes.
This was normal in 2005. It is malpractice in 2026. Every meaningful improvement in how software ships in the last two decades reduces to "automate this, run it on every change, fail loudly when something is wrong."
That automation is CI/CD. It is the assembly line that takes raw code commits and produces running production software with tests, security scans, deployments, and verification all happening automatically. Done well, it lets teams ship dozens of times per day with confidence. Done poorly, it becomes a slow, flaky obstacle that everyone resents.
This article walks through what CI and CD actually mean, what a real pipeline looks like, and what separates good pipelines from bad ones.
Step 1: The Three Letters
The acronym is overloaded. Three concepts hide behind two abbreviations:
CI: Continuous Integration
Every code change is automatically built and tested. The goal is fast feedback: catch problems within minutes of writing them, not weeks later when context is lost.
The original meaning, going back to the 1990s. The hard part is not running tests; it's making the tests reliable, fast, and comprehensive enough that passing them actually means something.
CD: Continuous Delivery
Every change that passes tests is automatically prepared for release. The artifact is built, signed, deployed to staging, validated. A human still clicks a button to deploy to production.
The "release" is decoupled from the deployment. You can release new code anytime; deploying is a separate, explicit action.
CD: Continuous Deployment
Every change that passes tests is automatically deployed to production. No human button. The pipeline is the only path to production.
This is the more advanced practice. It requires high test confidence, feature flags for incomplete features, monitoring to catch regressions, and the ability to roll back automatically.
What Teams Actually Mean by "CI/CD"
Usually CI plus one of the two CDs. Most teams sit somewhere on this spectrum:
Level 1: CI only. Tests run on every PR.
Level 2: CI plus continuous delivery to staging. Production deploys are manual.
Level 3: CI plus continuous deployment to production. Many deploys per day.
Level 4: Continuous deployment with feature flags and progressive rollouts. Multiple changes per hour.
Higher levels give faster feedback and smaller changes per deploy (which means easier debugging). Most companies should aim for at least Level 2; ambitious ones target Level 3 or 4.
Step 2: The Pipeline Stages
A typical pipeline runs these stages in order, and it fails fast: if any stage fails, subsequent stages don't run. (A minimal workflow sketch follows the list.)
1. Trigger. A push, a pull request, a schedule, a manual button. The pipeline starts.
2. Checkout. Clone the repo at the right commit. Set up the workspace.
3. Install dependencies. Restore from cache when possible. Fresh install when not.
4. Lint and static analysis. Catch obvious issues fast (style violations, syntax errors, unused imports).
5. Unit tests. The bulk of testing. Fast and isolated.
6. Build. Compile, bundle, create a container image, package the artifact.
7. Integration tests. Test against real(ish) dependencies (database in a container, mock APIs).
8. Security scans. Dependency vulnerabilities, secrets in code, license violations.
9. Push artifact. Upload the container image to a registry. Tag with commit SHA.
10. Deploy to staging. Automated. The artifact is now running in a real environment.
11. End-to-end tests. Exercise complete user flows against staging.
12. Deploy to production. Automatic (continuous deployment) or manual approval (continuous delivery).
13. Smoke tests in production. Verify the deploy didn't break anything obvious.
14. Monitor and rollback if needed. Automated rollback on metric regressions.
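In pipeline-as-code terms, that ordering becomes chained jobs. A minimal GitHub Actions sketch, assuming a Node project (job names and commands are illustrative):

```yaml
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci              # install dependencies (cache-backed in a real setup)
      - run: npm run lint        # cheapest check runs first

  unit-tests:
    needs: lint                  # never starts if lint failed: fail fast
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test

  build:
    needs: unit-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp:${{ github.sha }} .   # artifact tagged with the commit SHA
```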
Stage Order Matters
Cheap, fast checks early. Expensive checks later. If the lint stage takes 10 seconds and unit tests take 5 minutes, run lint first. Failing fast means developers see issues sooner.
Tests should be ordered by speed within their category. The 100ms tests run before the 30-second tests.
Step 3: Architecture of a Pipeline
[Pipeline architecture diagram: commits flow through the CI stages to produce an artifact tagged by SHA, which is pushed to a registry and promoted through environments.]
Step 4: Tools
The CI/CD tooling space is crowded. Major players:
GitHub Actions. Integrated with GitHub. The default for many small/medium projects. Free for public repos. Easy YAML-based configuration. Strong ecosystem of pre-built actions.
GitLab CI/CD. Integrated with GitLab. Powerful, well-designed. The DSL is more capable than GitHub Actions in some ways. Strong on monorepos.
CircleCI. Classic third-party CI service. Fast. Strong macOS support (rare). Used by many tech companies.
Jenkins. The venerable self-hosted option. Maximum flexibility. Plugin ecosystem is enormous. Operational overhead is real; Jenkins servers need care and feeding. Still the right answer for some enterprise environments.
Argo CD / Flux. GitOps-style continuous deployment specifically for Kubernetes. The pipeline finishes by updating a Git repo; Argo or Flux watches that repo and applies changes to the cluster. Pull-based instead of push-based. (A minimal manifest follows this list.)
Spinnaker. Netflix-grade deployment platform. Multi-cloud, sophisticated rollout strategies. Heavyweight but powerful.
Buildkite. Hybrid: hosted control plane, self-hosted runners. Good cost characteristics for high-volume teams.
Tekton. Cloud-native CI/CD on Kubernetes. Built around Kubernetes primitives.
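For concreteness, here is a minimal Argo CD Application manifest; the repo URL and paths are hypothetical. Argo keeps the cluster in sync with whatever this Git path contains:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-configs   # hypothetical config repo
    targetRevision: main
    path: myapp/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true        # delete resources removed from Git
      selfHeal: true     # revert manual drift back to the Git state
```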
Picking a Tool
For a new project on GitHub: GitHub Actions. Easy and good enough.
For a Kubernetes-heavy org: Argo CD plus another tool for CI. GitOps is real.
For monorepo with sophisticated needs: GitLab CI or Buildkite.
For maximum control with operational capacity: Jenkins.
Step 5: What Makes a Good Pipeline
Some pipelines feel like accelerators. Some feel like obstacles. The difference comes from a few habits.
Fast Feedback
If your CI takes an hour, developers stop paying attention. They merge anyway, hoping for the best. They lose context by the time results come back. The whole point of CI is undermined.
Targets: lint and unit tests under 5 minutes. Full pipeline under 20 minutes. If you hit those numbers, developers actually wait for the result before context-switching.
Reliability
Tests pass when code is good, fail when code is bad. Flaky tests destroy trust. Once developers see "oh, that test fails sometimes for no reason," they start ignoring failures. Then a real failure slips by. Disaster.
Treat flakiness like a bug. Either fix the test or quarantine it (run separately, don't block builds). Track flake rates; ratchet down over time.
Reproducibility
The same commit always produces the same artifact. Pin dependencies (lock files), pin base images by hash (not by tag), pin tool versions. If your build varies by what time it ran, debugging will eventually become impossible.
Hermetic builds (no internet access during build) are the gold standard. Most teams settle for "near-hermetic": cached dependencies, version pinning, deterministic outputs.
Parallelism
Independent stages run concurrently. Tests should be parallelizable across multiple machines (some test runners do this automatically; some require explicit sharding).
A pipeline that takes 20 minutes serial might run in 5 minutes parallel.
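A sketch of explicit sharding in GitHub Actions, assuming a Jest suite (Jest supports a --shard flag; other runners have equivalents):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false         # let all shards finish so you see every failure
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx jest --shard=${{ matrix.shard }}/4   # each job runs a quarter of the suite
```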
Caching
Dependency installation is incremental, not from scratch every time. Test fixtures cached when possible. Docker layer caching enabled.
The first build of a fresh repo might take 10 minutes; subsequent builds should take 2-3 minutes due to caching.
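A minimal dependency cache for a Node project using actions/cache; paths and keys differ by ecosystem:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/cache@v4
    with:
      path: ~/.npm                                            # npm's download cache
      key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
      restore-keys: |
        npm-${{ runner.os }}-                                 # fall back to a stale cache
  - run: npm ci   # fast when the cache key hits
```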
Same Environment
CI environment matches production as closely as possible. Same OS, same database version, same library versions. Containers help here: build once, test in a container, deploy that container.
Visibility
Failures are obvious. Logs are easy to access. Test reports show what failed and why. Developers can debug without digging through cryptic build output.
Cost-Awareness
CI is expensive. Compute time, parallel jobs, large dependency caches. At scale, optimizing cost matters: skip running on doc-only changes, cache aggressively, run only relevant tests for the changes.
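For example, GitHub Actions can skip doc-only changes with path filters:

```yaml
on:
  push:
    paths-ignore:
      - '**.md'       # don't burn compute on markdown-only changes
      - 'docs/**'
```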
Step 6: Build Once, Promote
One of the most important patterns. Build a single artifact (e.g., a container image) once. Promote that exact same artifact through environments: dev, staging, production. Don't rebuild between environments.
Why This Matters
Rebuilding might produce slightly different artifacts due to dependency changes or non-determinism. The thing you tested in staging is not the thing you deployed to production. Subtle differences cause production-only bugs.
Build once, promote means "what passed staging is exactly what runs in production." Bug-for-bug identical.
How It Works
The CI step builds a container image and tags it with the commit SHA. The image is pushed to a registry.
Each environment's deployment pulls the same image by SHA. Staging gets myapp:abc123. Production also gets myapp:abc123. Identical bytes.
Configuration differs per environment (database URLs, secrets, feature flags) and is injected at deploy time, not baked into the image.
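A sketch of the pattern in GitHub Actions; the registry URL, image name, and deploy.sh script are hypothetical stand-ins:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          docker build -t registry.example.com/myapp:${{ github.sha }} .
          docker push registry.example.com/myapp:${{ github.sha }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      # Staging and production receive the same image; only injected config differs.
      - run: ./deploy.sh staging registry.example.com/myapp:${{ github.sha }}

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production registry.example.com/myapp:${{ github.sha }}
```

The `environment: production` line hooks into GitHub's environment protection rules, which is where a manual approval gate lives if you want continuous delivery rather than continuous deployment.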
Step 7: Branch Strategy
How code flows from feature to production matters as much as the pipeline itself.
Trunk-Based Development
Short-lived feature branches merge frequently into main (or trunk). Days, not weeks. Code is reviewed and merged. CI runs on every push.
Production is always close to main. Releases happen by tagging a commit on main and deploying.
This is the modern web team default. Combined with feature flags (covered next), it enables continuous deployment.
Git Flow
Long-lived develop branch. Feature branches off develop. Release branches for releases. Hotfix branches for emergencies.
Heavier process. Suits software shipped on a fixed release cadence (mobile apps, on-prem products). Mostly out of fashion for web.
Release Flow
Hybrid, popularized by Microsoft (distinct from the lighter-weight GitHub Flow). Main is always stable. Release branches are cut for shipped versions. Hotfixes land on main and are cherry-picked into the affected release branches.
Common at companies that ship web continuously but also need to support older versions for some users.
Modern Default
Trunk-based with feature flags. Most modern web teams use this. The code is always shippable; flags hide incomplete features. Decoupling deployment from release is the key insight.
Step 8: Feature Flags
Continuous deployment means code ships even when features aren't done. Feature flags let you ship the code disabled, then enable it for specific users (internal first, then beta, then everyone).
What Flags Decouple
Without flags, deploying a feature means making it visible to all users at the same time. With flags, deploys are separate from releases:
Deploy means "the code is running in production."
Release means "users see the feature."
You can deploy 20 times a day; release a feature once, when ready.
What Flags Enable
Trunk-based development at scale. Engineers merge incomplete code daily without breaking users.
Gradual rollouts. 1% of users, then 5%, then 50%, then everyone. Watch metrics at each step (see the flag sketch after this list).
A/B testing. Half of users see version A, half see version B. Compare engagement.
Kill switches. If a feature breaks in production, disable the flag without deploying. Recovery in seconds.
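What a flag definition looks like depends on the tool; as a purely hypothetical in-house example, a config-service entry might encode segments and rollout percentages like this (the schema is invented for illustration):

```yaml
flags:
  new-checkout-flow:
    enabled: true
    kill_switch: true            # can be flipped off without a deploy
    rollout:
      - segment: internal-users
        percentage: 100          # internal first
      - segment: everyone
        percentage: 5            # then a small slice; raise while watching metrics
```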
Tools
LaunchDarkly, Split, GrowthBook, Statsig. Or in-house systems backed by a config service.
Costs
Code complexity. Every feature behind a flag has two paths. After release, the flag should be removed; teams forget. Old flags accumulate.
Treat flag removal as part of feature completion. Have a process to audit and clean up old flags.
Step 9: Security in the Pipeline
The CI/CD pipeline has god-mode access: it can deploy anything. Compromise the pipeline and an attacker owns production. The Codecov (2021), SolarWinds (2020), and 3CX (2023) incidents all involved supply-chain attacks through build or CI/CD systems.
Practices
Limit who can modify the pipeline. Pipeline changes require review. Branch protection on the workflow files.
Use OIDC for cloud credentials, not long-lived secrets. Short-lived tokens issued per job. Compromise of a single job doesn't expose long-term keys. (A sketch follows this list.)
Sign artifacts cryptographically. Sigstore, Notary, in-toto. Verifiable provenance.
Scan dependencies for known vulnerabilities. Snyk, Dependabot, Trivy, npm audit. Block deploys with high-severity issues.
Use SAST tools to detect security issues in code. Static analysis catches some classes of bugs before runtime.
Audit logs for every deploy. Who deployed what, when, with what config.
Approval gates for production. At minimum, require two-human approval for production-changing deploys.
Secrets in a vault, not in env files. HashiCorp Vault, AWS Secrets Manager, Doppler. Pipelines fetch secrets at run time.
Least-privilege service accounts. The deploy account can deploy; it cannot read your customer database.
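As an illustration of the OIDC practice above, a GitHub Actions job can assume an AWS role without any stored secret; the role ARN is a placeholder:

```yaml
permissions:
  id-token: write    # lets the job request a short-lived OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy   # placeholder role
          aws-region: us-east-1
      # Subsequent steps use temporary credentials scoped to that role.
```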
Supply Chain Awareness
Modern apps depend on hundreds of packages, each with its own dependencies. Each is a potential attack surface. Supply-chain security is now a first-class concern.
Track an SBOM (Software Bill of Materials). Know what's in your build. Detect unauthorized changes to your dependencies.
Step 10: Testing Strategy
The Test Pyramid
Many cheap tests at the bottom (unit), fewer slower tests in the middle (integration), few slow tests on top (end-to-end).
Unit tests: tens of thousands. Run in milliseconds each.
Integration tests: hundreds. Run in seconds each.
E2E tests: dozens. Run in minutes each.
The pyramid means most failures get caught quickly by cheap tests. E2E catches the few classes of bugs that escape lower levels.
Common Anti-Patterns
The ice cream cone. Lots of E2E tests, few unit tests. Tests are slow, flaky, hard to debug. The opposite of what you want.
The test trophy. Mostly integration tests. Modern alternative to the pyramid that some teams adopt; can work if integration tests are fast enough.
No tests on the "happy path." Tests cover edge cases but the main flow has no test. A change to the happy path breaks production.
Beyond Functional Tests
Performance tests, load tests, chaos tests, security tests. Each catches different categories of bugs. Larger teams run these on schedules (nightly load tests, weekly chaos tests) outside the main CI pipeline.
Step 11: Rollback and Recovery
Every deploy will eventually go wrong. The pipeline must support quick recovery.
Rollback Strategies
Re-deploy the previous artifact. If you deployed v123 and it broke, re-deploy v122. Build-once-promote makes this clean: v122 is already built; just point production at it.
Database migrations are tricky. If v123 changed the schema, rolling back to v122 requires either reverse migrations or making schemas backward-compatible. The Deployment Strategies article goes deeper.
Feature flag rollback. If the bad code is behind a flag, disable the flag. No redeploy needed. Recovery in seconds.
Automated Rollback
Mature setups monitor production metrics after deploys (error rates, latency, key business KPIs). If metrics regress beyond threshold, rollback automatically. The deploy is reverted before humans even notice.
This requires careful threshold tuning (false positives interrupt good deploys; loose thresholds miss real issues) and a fast rollback mechanism.
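A minimal sketch of the idea, assuming a Kubernetes deploy and a health endpoint; a real setup would watch error-rate and latency metrics over a window rather than a single probe:

```yaml
jobs:
  verify:
    needs: deploy-production
    runs-on: ubuntu-latest
    steps:
      - name: Smoke test
        id: smoke
        continue-on-error: true                                # record failure, keep going
        run: curl --fail https://myapp.example.com/healthz     # placeholder health endpoint
      - name: Roll back on failure
        if: steps.smoke.outcome == 'failure'
        run: kubectl rollout undo deployment/myapp             # assumes cluster access is configured
```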
Forward Fix vs Rollback
Sometimes rollback is the wrong move. If the bug is small and the fix is fast, rolling forward (deploying a fix) might be quicker than rolling back. Especially for additive features where rollback would cause UX glitches.
Mature teams choose case by case.
Step 12: Monitoring and Observability
The pipeline only ends when the deployed code is verified working. Monitoring is the closing step of CI/CD.
What to Monitor
Error rates. Spike after deploy = bad deploy.
Latency. Regression in p99 latency.
Business metrics. Sign-ups, conversions, transaction counts.
Resource usage. CPU, memory, disk on production hosts.
External dependencies. Are downstream APIs returning errors?
Deployment Annotations
When a deploy happens, annotate dashboards with the time and version. Easy to correlate "this metric regressed at 3:42 PM, exactly when v123 deployed."
Most modern monitoring (Datadog, Grafana, Honeycomb) supports this.
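Grafana, for example, exposes an HTTP annotations API; a deploy step might post to it like this (the variable names are illustrative):

```yaml
- name: Annotate dashboards with the deploy
  run: |
    curl -s -X POST "$GRAFANA_URL/api/annotations" \
      -H "Authorization: Bearer $GRAFANA_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"text": "Deployed ${{ github.sha }}", "tags": ["deploy", "production"]}'
  env:
    GRAFANA_URL: ${{ vars.GRAFANA_URL }}
    GRAFANA_TOKEN: ${{ secrets.GRAFANA_TOKEN }}
```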
Synthetic Monitoring
Automated probes hit production endpoints continuously. If a probe fails, alert. Catches outages even when no users are around.
Real User Monitoring
JavaScript in the user's browser reports performance and errors. Captures actual user experience, not just synthetic.
Step 13: Operational Concerns
The Long-Lived Branch Problem
If feature branches live for weeks, integrating becomes painful. Conflicts. Tests that broke during the branch's lifetime. Resolution is painful and risky.
Mitigation: trunk-based development. Merge frequently. Branches that live more than a few days are a smell.
Pipeline as Code
Pipeline definitions live in the repo, not in a separate UI. Versioned alongside the code. Reviewable. Changes follow the same process as code changes.
Modern tools (GitHub Actions YAML, Jenkinsfile, GitLab CI YAML, Tekton) all support this.
Self-Hosted Runners
For sensitive workloads, run the pipeline on your own infrastructure rather than the CI provider's shared runners. More secure, more control. More operational work.
Build Cache Sharing
For monorepos and large projects, sharing build artifacts across runs and developers cuts build time dramatically. Bazel and similar tools enable this.
Database Migrations in CI/CD
Migrations are part of the deploy. Common pattern: migrations run before code rolls out, code is backward-compatible with both old and new schema, then the next deploy can use the new schema fully.
Backward-incompatible migrations require careful multi-step rollouts. Many bugs in deploy pipelines come from migration failures.
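In job terms, the ordering can be enforced with a dependency; both scripts here are hypothetical wrappers around your migration and deploy tooling:

```yaml
jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-migrations.sh     # hypothetical; e.g. a Flyway or Alembic invocation

  deploy:
    needs: migrate        # code rolls out only after the schema change has landed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production myapp:${{ github.sha }}   # hypothetical deploy script
```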
Pipeline Sprawl
Over time, pipelines accumulate complexity: steps that once made sense but no longer do, tests that flake, tools nobody understands anymore. Eventually you do a cleanup pass: remove dead steps, retire flaky tests, modernize.
Treat pipeline maintenance as a real engineering activity. It accumulates technical debt like any code.
Onboarding
A new engineer should be able to push code and see CI run on their first day. If onboarding requires "talk to Bob about the build system," your CI is too complex.
Step 14: Recap of Key Decisions
CI catches problems within minutes. Fast feedback is the entire point.
CD ships changes automatically (delivery) or all the way (deployment). Pick based on test confidence.
Build once, promote. Same artifact moves through environments.
Trunk-based with feature flags. The modern default for web teams.
Test pyramid: many cheap tests at the bottom. Pyramids beat ice cream cones.
Reliable tests are non-negotiable. Flakiness kills trust.
Pipeline as code. Versioned, reviewed, in the repo.
Pipeline security is supply chain security. Compromise here owns everything.
Automated rollback closes the loop. Bad deploys reverted automatically.
Monitor production. Deploys aren't done until production confirms healthy.
The One Thing to Remember
A good CI/CD pipeline is a feedback machine. Push code, get truth in minutes. Failed tests, broken builds, security issues, performance regressions. The faster the feedback, the smaller the issues that need fixing. The slower it is, the worse the bugs that pile up. Investing in pipeline speed and reliability is investing in your team's velocity. Most teams have CI/CD that works fine and could be twice as fast with two weeks of optimization. The dividends are immediate: faster ships, fewer bugs in production, happier engineers. The pipeline is infrastructure; treat it like infrastructure, not an afterthought.