The Problem
We had a CD workflow, but it wasn’t effective — the Next.js build process is memory-heavy, and our EC2 runs multiple services on the same machine. So in practice, deployments were manual. SSH into the server, stop services that could interfere, pull code, install dependencies, build the app, restart PM2. It worked, but it wasn’t fun. And with AI-assisted development increasing our velocity, the number of feature branches grew fast, making this manual routine harder to keep up with.
This setup had three core issues:
- Server crashes during builds. Our staging EC2 runs the Django API, Celery workers, Gunicorn, and Prefect (our orchestration engine, which is memory-hungry). The Next.js build would consume so much CPU and RAM that we had to manually stop these services before every deploy. Forget one? The staging machine would crash and need a restart from the AWS console. Production never crashed, but we had to be careful — always checking that Prefect had no active flows and that RAM looked reasonable before kicking off a build.
- No feature branch testing. Testing a branch on staging meant SSHing in, `git checkout`, install, build, restart. Since there was no coordination, you'd sometimes deploy your branch while the team was still reviewing someone else's. The usual confusion: "wait, whose branch is on staging right now?"
- CI didn't gate CD. CI and CD were separate parallel workflows, a race condition by design. Broken code could deploy to staging before tests even finished running.
The Solution: Build Off-Server, Ship Images
One principle: the server never builds anything. The build runs on GitHub Actions runners, produces a Docker image, and pushes it to GitHub Container Registry (GHCR). The server only pulls and runs pre-built images — a container swap that takes a couple of minutes and doesn’t touch other services.
Dockerizing the Next.js App
We use a multi-stage Dockerfile to keep the final image lean:
- Dependencies stage — installs `node_modules` in an isolated layer, cached until `package-lock.json` changes.
- Builder stage — copies source code, injects `NEXT_PUBLIC_*` environment variables as build args (Next.js bakes these into the JavaScript bundle at compile time), and runs the build.
- Runner stage — starts from a clean Alpine image and copies only the Next.js standalone output and static assets. Next.js standalone mode bundles the app and only the `node_modules` it actually needs into a self-contained folder: no full `node_modules`, no source code. The final image just runs `node server.js`.
The first two stages run on the GitHub runner’s native architecture for speed. Only the final stage targets linux/arm64 (our EC2 architecture), avoiding cross-compilation overhead during install and build.
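A condensed sketch of such a Dockerfile, assuming Node 20, Alpine bases, and a representative `NEXT_PUBLIC_*` build arg (the real file may differ in details):

```dockerfile
# --- Dependencies stage: cached until package-lock.json changes ---
# --platform=$BUILDPLATFORM keeps install/build on the runner's native arch
FROM --platform=$BUILDPLATFORM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# --- Builder stage: bake NEXT_PUBLIC_* vars into the bundle ---
FROM --platform=$BUILDPLATFORM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ARG NEXT_PUBLIC_BACKEND_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
# Requires `output: "standalone"` in next.config.js
RUN npm run build

# --- Runner stage: only the standalone output, targets linux/arm64 ---
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]
```

Building with `docker buildx build --platform linux/arm64` then only emulates the lightweight final stage, not the expensive install and build steps.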
On the server, a docker-compose.yml reads the image tag from a .env file. Deploying means updating this .env, pulling the image, and restarting the container. That’s it — no build step on the server, ever.
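The compose file on the server can be as small as this (the org name, service name, and variable name are assumptions for illustration):

```yaml
# docker-compose.yml — tag supplied by .env, e.g. WEBAPP_IMAGE_TAG=staging-main-5e5bee6
services:
  webapp:
    image: ghcr.io/ORG/webapp:${WEBAPP_IMAGE_TAG}
    restart: unless-stopped
    ports:
      - "3000:3000"
```

A deploy then reduces to rewriting `WEBAPP_IMAGE_TAG` in `.env`, `docker compose pull webapp`, and `docker compose up -d webapp`.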
Why GHCR and Not Docker Hub
Two reasons:
Security. The built images contain NEXT_PUBLIC_* API keys (Pendo, Amplitude, etc.) baked into the JavaScript bundle. A public image would expose all of them — anyone could pull our image and start sending data to our analytics, intentionally or not. GHCR lets us store images as private packages.
Zero credential management. GHCR integrates natively with GitHub Actions — the auto-generated GITHUB_TOKEN handles push and pull without any PATs or stored credentials. During deployment, this short-lived token is passed to the server via SSH, used for the pull, and the server logs out immediately. Nothing stays on disk.
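On the push side, the workflow fragment is roughly this (action versions illustrative; the key detail is the `packages: write` permission that lets the ephemeral `GITHUB_TOKEN` push to GHCR):

```yaml
permissions:
  contents: read
  packages: write          # allows GITHUB_TOKEN to push images to GHCR

steps:
  - uses: docker/login-action@v3
    with:
      registry: ghcr.io
      username: ${{ github.actor }}
      password: ${{ secrets.GITHUB_TOKEN }}
```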
Secrets with GitHub Environments
We use GitHub Environments (staging and production) to scope secrets. The same secret name — NEXT_PUBLIC_BACKEND_URL, SSH_HOST — holds different values per environment. The environment: field on each workflow job determines which set is used. Same workflow file, same Dockerfile — different secrets injected at build time.
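In workflow terms, the scoping is just the `environment:` key; both jobs below reference the same secret name and get different values (job names and the build command are illustrative):

```yaml
jobs:
  deploy-staging:
    environment: staging        # resolves secrets from the staging environment
    runs-on: ubuntu-latest
    steps:
      - run: docker build --build-arg NEXT_PUBLIC_BACKEND_URL="${{ secrets.NEXT_PUBLIC_BACKEND_URL }}" .

  deploy-production:
    environment: production     # same secret names, production values
    runs-on: ubuntu-latest
    steps:
      - run: docker build --build-arg NEXT_PUBLIC_BACKEND_URL="${{ secrets.NEXT_PUBLIC_BACKEND_URL }}" .
```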
The Workflow Architecture
Three workflow files cover our deployment scenarios.
Flow 1: Pull Request — CI Checks
Trigger: PR opened against any branch.
The dalgo-ci-cd.yml workflow runs formatting (Prettier), linting (ESLint), tests (Jest + Codecov), and a Docker build verification (build but don’t push). Any failure blocks the PR.
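A sketch of that CI job, with step order and exact commands being assumptions (the real workflow may invoke package scripts instead):

```yaml
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npx prettier --check .
      - run: npx eslint .
      - run: npx jest --coverage
      - uses: codecov/codecov-action@v4
      - run: docker build .      # verify the image builds; never pushed on PRs
```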
Flow 2: Merge to Main — Auto Deploy to Staging
Trigger: Push to main.
The same dalgo-ci-cd.yml file, but now CD activates after CI. Three jobs run in sequence:
checks — runs the full CI suite. Only if it passes do the next jobs run. CI finally gates CD — no more race conditions.
staging-guard — queries the GitHub Deployments API to check what branch is currently on staging. If a feature branch is deployed (someone is testing), the guard blocks the auto-deploy instead of swapping out the branch under them.
deploy-staging — builds the image with staging secrets, pushes to GHCR (tagged staging-main-<sha>), deploys to the server, and records the deployment.
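The sequencing can be expressed with `needs:`, and the guard with a Deployments API lookup. This is a sketch under assumptions; the real guard logic may differ:

```yaml
jobs:
  checks:
    runs-on: ubuntu-latest
    steps: [...]                     # full CI suite (see Flow 1)

  staging-guard:
    needs: checks                    # CI gates CD
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // Latest staging deployment; block if it's a feature branch
            const { data } = await github.rest.repos.listDeployments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              environment: 'staging',
              per_page: 1,
            });
            if (data.length && data[0].ref !== 'main') {
              core.setFailed(`staging is occupied by ${data[0].ref}`);
            }

  deploy-staging:
    needs: staging-guard             # runs only if the guard passes
    runs-on: ubuntu-latest
    steps: [...]                     # build, push staging-main-<sha>, deploy
```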
Flow 3: Feature Branch to Staging
Trigger: Manual via GitHub Actions UI.
From the dalgo-manual-deploy.yml workflow, the operator selects “deploy-staging,” picks a branch, and types deploy-staging to confirm. The image is built from that branch, tagged staging-<branch>-<sha>, and deployed.
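The trigger plumbing might look like this (input names and the guard job are assumptions; the branch itself comes from the "Run workflow" ref dropdown, so it needs no input):

```yaml
on:
  workflow_dispatch:
    inputs:
      action:
        type: choice
        options: [deploy-staging, deploy-production, rollback-production]
      confirmation:
        description: Type the action name again to confirm
        required: true

jobs:
  guard:
    runs-on: ubuntu-latest
    steps:
      - name: Abort unless the typed confirmation matches the action
        if: inputs.action != inputs.confirmation
        run: exit 1
```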
Once a feature branch is on staging, the staging-guard in Flow 2 blocks all auto-deploys. Merges to main still pass CI but skip CD — the feature branch stays undisturbed until someone manually deploys main back. Testing a branch went from “SSH, checkout, build, restart, hope nobody pushes over you” to picking it from a dropdown.
See it in action: https://jam.dev/c/abef3b43-fc76-4fb9-a6e9-803b3618bd25
Flow 4: Deploy to Production
Trigger: Manual via the GitHub Actions UI; deploys from the main branch only, enforced by the workflow.
Same workflow, “deploy-production” action. Builds with production secrets (scoped via the production GitHub Environment), tags as prod-main-<sha>, deploys to the production server. The previous deployment’s tag is preserved for rollback.
No auto-deploy, no surprises. Production is always deployed manually from the UI.
Flow 5: Rollback and Releases
Rollback — the “rollback-production” action. The operator can specify an image tag, or leave it empty to auto-rollback. The workflow queries the GitHub Deployments API filtered to environment=production only, picks the second-most-recent production deployment record, and pulls that image from GHCR. No rebuild, no guessing — the kind of recovery speed you appreciate at 11 PM on a Friday.
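The auto-rollback lookup could be a single `gh api` call in a workflow step; this is a hedged sketch, not the exact implementation:

```yaml
- name: Resolve the previous production image tag
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # Deployments come back newest-first, so .[1] is the previous one
    PREV_SHA=$(gh api \
      "repos/${{ github.repository }}/deployments?environment=production&per_page=2" \
      --jq '.[1].sha')
    echo "IMAGE_TAG=prod-main-${PREV_SHA:0:7}" >> "$GITHUB_ENV"
```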
Release tagging — when a GitHub Release is published, dalgo-release.yml finds the production image built from that release’s commit and tags it with the version number (e.g., v1.2.0). No rebuild — just an alias so releases have clean, findable image tags in the registry.
The Deploy Action
All deploy paths (staging, production, rollback) use one shared composite action. It SSHs into the server, updates the .env with the new image tag, logs into GHCR with the ephemeral token, pulls the image, restarts the container, and logs out. Every environment deploys the exact same way — only the image tag differs.
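In spirit, the server-side script the action runs looks like this (the path, variable names, and `webapp` service name are assumptions):

```shell
#!/usr/bin/env bash
# Executed on the server over SSH; IMAGE_TAG, GHCR_TOKEN, and GH_ACTOR
# are passed in by the workflow for this run only.
set -euo pipefail
cd /opt/webapp

# Point docker-compose at the new image tag
sed -i "s|^WEBAPP_IMAGE_TAG=.*|WEBAPP_IMAGE_TAG=${IMAGE_TAG}|" .env

# Ephemeral GHCR login: pull, swap the container, log out immediately
echo "${GHCR_TOKEN}" | docker login ghcr.io -u "${GH_ACTOR}" --password-stdin
docker compose pull webapp
docker compose up -d webapp
docker logout ghcr.io
```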
Images follow the pattern <env>-<branch>-<sha> (e.g., staging-main-5e5bee6, prod-main-def5678). The SHA comes from the triggering commit, giving full traceability from any running container back to the exact code.
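A hypothetical helper mirroring that tag scheme (not from the actual workflow, just to make the convention concrete):

```shell
# Builds a tag of the form <env>-<branch>-<sha>
image_tag() {
  local env="$1" branch="$2" sha="$3"
  # Docker tags can't contain slashes, so feature/login becomes feature-login;
  # the SHA is shortened to 7 characters.
  printf '%s-%s-%s\n' "$env" "${branch//\//-}" "${sha:0:7}"
}

image_tag staging main 5e5bee6c0ffee   # → staging-main-5e5bee6
```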
What Changed
| Before | After |
|---|---|
| The full webapp repo with node_modules took ~1.7 GB on the machine | The Docker image that actually runs the webapp is 225 MB |
| Build on EC2, server crashes | Build on GitHub runners, server only pulls images |
| Stop Prefect/Gunicorn before deploying | No need to stop Prefect |
| SSH and manual checkout for feature branches | One-click feature branch deploy from GitHub UI |
| CI and CD race each other | CI gates CD — deploy only after tests pass |
| No rollback mechanism | Instant rollback to previous image, no rebuild |
| No visibility into what’s deployed | Deployment state tracked in GitHub Environments |
The server no longer builds anything — it just pulls and runs containers. Deployments are traceable, rollbacks are instant, feature branch testing is a dropdown menu away, and nobody has to SSH in and stop Prefect before hitting build. Merging to main and seeing it live on staging in minutes — finally.