The Problem
We had a CD workflow, but it wasn’t effective — the Next.js build process is memory-heavy, and our EC2 runs multiple services on the same machine. So in practice, deployments were manual. SSH into the server, stop services that could interfere, pull code, install dependencies, build the app, restart PM2. It worked, but it wasn’t fun. And with AI-assisted development increasing our velocity, the number of feature branches grew fast, making this manual routine harder to keep up with.
This setup had three core issues:
- Server crashes during builds. Our staging EC2 runs the Django API, Celery workers, Gunicorn, and Prefect (our orchestration engine, which is memory-hungry). The Next.js build would consume so much CPU and RAM that we had to manually stop these services before every deploy. Forget one? The staging machine would crash and need a restart from the AWS console. Production never crashed, but we had to be careful — always checking that Prefect had no active flows and that RAM looked reasonable before kicking off a build.
- No feature branch testing. Testing a branch on staging meant SSHing in, `git checkout`, install, build, restart. Since there was no coordination, you'd sometimes deploy your branch while the team was still reviewing someone else's. The usual confusion: "wait, whose branch is on staging right now?"
- CI didn't gate CD. CI and CD were separate parallel workflows, a race condition by design. Broken code could deploy to staging before tests even finished running.
The Solution: Build Off-Server, Ship Images
One principle: the server never builds anything. The build runs on GitHub Actions runners, produces a Docker image, and pushes it to GitHub Container Registry (GHCR). The server only pulls and runs pre-built images — a container swap that takes a couple of minutes and doesn’t touch other services.
Dockerizing the Next.js App
We use a multi-stage Dockerfile to keep the final image lean:
- Dependencies stage — installs `node_modules` in an isolated layer, cached until `package-lock.json` changes.
- Builder stage — copies source code, injects `NEXT_PUBLIC_*` environment variables as build args (Next.js bakes these into the JavaScript bundle at compile time), and runs the build.
- Runner stage — starts from a clean Alpine image and copies only the Next.js standalone output and static assets. Next.js standalone mode bundles the app and only the `node_modules` it actually needs into a self-contained folder: no full `node_modules`, no source code. The final image just runs `node server.js`.
The first two stages run on the GitHub runner’s native architecture for speed. Only the final stage targets linux/arm64 (our EC2 architecture), avoiding cross-compilation overhead during install and build.
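A condensed sketch of such a Dockerfile, assuming Node 20, Alpine bases, and a representative `NEXT_PUBLIC_*` build arg (the real file may differ in details):

```dockerfile
# --- Dependencies stage: cached until package-lock.json changes ---
# --platform=$BUILDPLATFORM keeps install/build on the runner's native arch
FROM --platform=$BUILDPLATFORM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# --- Builder stage: bake NEXT_PUBLIC_* vars into the bundle ---
FROM --platform=$BUILDPLATFORM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ARG NEXT_PUBLIC_BACKEND_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL
# Requires `output: "standalone"` in next.config.js
RUN npm run build

# --- Runner stage: only the standalone output, targets linux/arm64 ---
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]
```

Building with `docker buildx build --platform linux/arm64` then only emulates the lightweight final stage, not the expensive install and build steps.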
On the server, a docker-compose.yml reads the image tag from a .env file. Deploying means updating this .env, pulling the image, and restarting the container. That’s it — no build step on the server, ever.
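The compose file on the server can be as small as this (the org name, service name, and variable name are assumptions for illustration):

```yaml
# docker-compose.yml — tag supplied by .env, e.g. WEBAPP_IMAGE_TAG=staging-main-5e5bee6
services:
  webapp:
    image: ghcr.io/ORG/webapp:${WEBAPP_IMAGE_TAG}
    restart: unless-stopped
    ports:
      - "3000:3000"
```

A deploy then reduces to rewriting `WEBAPP_IMAGE_TAG` in `.env`, `docker compose pull webapp`, and `docker compose up -d webapp`.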
Why GHCR and Not Docker Hub
Two reasons:
Security. The built images contain NEXT_PUBLIC_* API keys (Pendo, Amplitude, etc.) baked into the JavaScript bundle. A public image would expose all of them — anyone could pull our image and start sending data to our analytics, intentionally or not. GHCR lets us store images as private packages.
Zero credential management. GHCR integrates natively with GitHub Actions — the auto-generated GITHUB_TOKEN handles push and pull without any PATs or stored credentials. During deployment, this short-lived token is passed to the server via SSH, used for the pull, and the server logs out immediately. Nothing stays on disk.
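On the push side, the workflow fragment is roughly this (action versions illustrative; the key detail is the `packages: write` permission that lets the ephemeral `GITHUB_TOKEN` push to GHCR):

```yaml
permissions:
  contents: read
  packages: write          # allows GITHUB_TOKEN to push images to GHCR

steps:
  - uses: docker/login-action@v3
    with:
      registry: ghcr.io
      username: ${{ github.actor }}
      password: ${{ secrets.GITHUB_TOKEN }}
```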
Secrets with GitHub Environments
We use GitHub Environments (staging and production) to scope secrets. The same secret name — NEXT_PUBLIC_BACKEND_URL, SSH_HOST — holds different values per environment. The environment: field on each workflow job determines which set is used. Same workflow file, same Dockerfile — different secrets injected at build time.
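In workflow terms, the scoping is just the `environment:` key; both jobs below reference the same secret name and get different values (job names and the build command are illustrative):

```yaml
jobs:
  deploy-staging:
    environment: staging        # resolves secrets from the staging environment
    runs-on: ubuntu-latest
    steps:
      - run: docker build --build-arg NEXT_PUBLIC_BACKEND_URL="${{ secrets.NEXT_PUBLIC_BACKEND_URL }}" .

  deploy-production:
    environment: production     # same secret names, production values
    runs-on: ubuntu-latest
    steps:
      - run: docker build --build-arg NEXT_PUBLIC_BACKEND_URL="${{ secrets.NEXT_PUBLIC_BACKEND_URL }}" .
```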
The Workflow Architecture
Three workflow files cover our deployment scenarios.
Flow 1: Pull Request — CI Checks
Trigger: PR opened against any branch.
The dalgo-ci-cd.yml workflow runs formatting (Prettier), linting (ESLint), tests (Jest + Codecov), and a Docker build verification (build but don’t push). Any failure blocks the PR.
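A sketch of that CI job, with step order and exact commands being assumptions (the real workflow may invoke package scripts instead):

```yaml
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npx prettier --check .
      - run: npx eslint .
      - run: npx jest --coverage
      - uses: codecov/codecov-action@v4
      - run: docker build .      # verify the image builds; never pushed on PRs
```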
Flow 2: Merge to Main — Auto Deploy to Staging
Trigger: Push to main.
The same dalgo-ci-cd.yml file, but now CD activates after CI. Three jobs run in sequence:
checks — runs the full CI suite. Only if it passes do the next jobs run. CI finally gates CD — no more race conditions.
staging-guard — queries the GitHub Deployments API to check what branch is currently on staging. If a feature branch is deployed (someone is testing), the guard blocks the auto-deploy instead of swapping out the branch under them.
deploy-staging — builds the image with staging secrets, pushes to GHCR (tagged staging-main-<sha>), deploys to the server, and records the deployment.
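The sequencing can be expressed with `needs:`, and the guard with a Deployments API lookup. This is a sketch under assumptions; the real guard logic may differ:

```yaml
jobs:
  checks:
    runs-on: ubuntu-latest
    steps: [...]                     # full CI suite (see Flow 1)

  staging-guard:
    needs: checks                    # CI gates CD
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // Latest staging deployment; block if it's a feature branch
            const { data } = await github.rest.repos.listDeployments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              environment: 'staging',
              per_page: 1,
            });
            if (data.length && data[0].ref !== 'main') {
              core.setFailed(`staging is occupied by ${data[0].ref}`);
            }

  deploy-staging:
    needs: staging-guard             # runs only if the guard passes
    runs-on: ubuntu-latest
    steps: [...]                     # build, push staging-main-<sha>, deploy
```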
Flow 3: Feature Branch to Staging
Trigger: Manual via GitHub Actions UI.
From the dalgo-manual-deploy.yml workflow, the operator selects “deploy-staging,” picks a branch, and types deploy-staging to confirm. The image is built from that branch, tagged staging-<branch>-<sha>, and deployed.
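The trigger plumbing might look like this (input names and the guard job are assumptions; the branch itself comes from the "Run workflow" ref dropdown, so it needs no input):

```yaml
on:
  workflow_dispatch:
    inputs:
      action:
        type: choice
        options: [deploy-staging, deploy-production, rollback-production]
      confirmation:
        description: Type the action name again to confirm
        required: true

jobs:
  guard:
    runs-on: ubuntu-latest
    steps:
      - name: Abort unless the typed confirmation matches the action
        if: inputs.action != inputs.confirmation
        run: exit 1
```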
Once a feature branch is on staging, the staging-guard in Flow 2 blocks all auto-deploys. Merges to main still pass CI but skip CD — the feature branch stays undisturbed until someone manually deploys main back. Testing a branch went from “SSH, checkout, build, restart, hope nobody pushes over you” to picking it from a dropdown.
See it in action: https://jam.dev/c/abef3b43-fc76-4fb9-a6e9-803b3618bd25
Flow 4: Deploy to Production
Trigger: Manual via the GitHub Actions UI; deploys from the main branch only, enforced by the workflow.
Same workflow, “deploy-production” action. Builds with production secrets (scoped via the production GitHub Environment), tags as prod-main-<sha>, deploys to the production server. The previous deployment’s tag is preserved for rollback.
No auto-deploy, no surprises. Production is always deployed manually from the UI.
Flow 5: Rollback and Releases
Rollback — the “rollback-production” action. The operator can specify an image tag, or leave it empty to auto-rollback. The workflow queries the GitHub Deployments API filtered to environment=production only, picks the second-most-recent production deployment record, and pulls that image from GHCR. No rebuild, no guessing — the kind of recovery speed you appreciate at 11 PM on a Friday.
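The auto-rollback lookup could be a single `gh api` call in a workflow step; this is a hedged sketch, not the exact implementation:

```yaml
- name: Resolve the previous production image tag
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # Deployments come back newest-first, so .[1] is the previous one
    PREV_SHA=$(gh api \
      "repos/${{ github.repository }}/deployments?environment=production&per_page=2" \
      --jq '.[1].sha')
    echo "IMAGE_TAG=prod-main-${PREV_SHA:0:7}" >> "$GITHUB_ENV"
```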
Release tagging — when a GitHub Release is published, dalgo-release.yml finds the production image built from that release’s commit and tags it with the version number (e.g., v1.2.0). No rebuild — just an alias so releases have clean, findable image tags in the registry.
The Deploy Action
All deploy paths (staging, production, rollback) use one shared composite action. It SSHs into the server, updates the .env with the new image tag, logs into GHCR with the ephemeral token, pulls the image, restarts the container, and logs out. Every environment deploys the exact same way — only the image tag differs.
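In spirit, the server-side script the action runs looks like this (the path, variable names, and `webapp` service name are assumptions):

```shell
#!/usr/bin/env bash
# Executed on the server over SSH; IMAGE_TAG, GHCR_TOKEN, and GH_ACTOR
# are passed in by the workflow for this run only.
set -euo pipefail
cd /opt/webapp

# Point docker-compose at the new image tag
sed -i "s|^WEBAPP_IMAGE_TAG=.*|WEBAPP_IMAGE_TAG=${IMAGE_TAG}|" .env

# Ephemeral GHCR login: pull, swap the container, log out immediately
echo "${GHCR_TOKEN}" | docker login ghcr.io -u "${GH_ACTOR}" --password-stdin
docker compose pull webapp
docker compose up -d webapp
docker logout ghcr.io
```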
Images follow the pattern <env>-<branch>-<sha> (e.g., staging-main-5e5bee6, prod-main-def5678). The SHA comes from the triggering commit, giving full traceability from any running container back to the exact code.
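A hypothetical helper mirroring that tag scheme (not from the actual workflow, just to make the convention concrete):

```shell
# Builds a tag of the form <env>-<branch>-<sha>
image_tag() {
  local env="$1" branch="$2" sha="$3"
  # Docker tags can't contain slashes, so feature/login becomes feature-login;
  # the SHA is shortened to 7 characters.
  printf '%s-%s-%s\n' "$env" "${branch//\//-}" "${sha:0:7}"
}

image_tag staging main 5e5bee6c0ffee   # → staging-main-5e5bee6
```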
What Changed
| Before | After |
|---|---|
| The full webapp repo with node_modules took ~1.7 GB on the machine | The Docker image that actually runs the webapp is 225 MB |
| Build on EC2, server crashes | Build on GitHub runners, server only pulls images |
| Stop Prefect/Gunicorn before deploying | No need to stop Prefect |
| SSH and manual checkout for feature branches | One-click feature branch deploy from GitHub UI |
| CI and CD race each other | CI gates CD — deploy only after tests pass |
| No rollback mechanism | Instant rollback to previous image, no rebuild |
| No visibility into what’s deployed | Deployment state tracked in GitHub Environments |
The server no longer builds anything — it just pulls and runs containers. Deployments are traceable, rollbacks are instant, feature branch testing is a dropdown menu away, and nobody has to SSH in and stop Prefect before hitting build. Merging to main and seeing it live on staging in minutes — finally.