Deploying Containers with SST

2025-06-18


Before switching to SST, our startup was running Kubernetes on Google Cloud. None of us are Kubernetes experts, so whenever we needed to do something new, like add monitoring or custom scaling logic, it was a real challenge and slowed us down. We also weren’t using infrastructure as code, so making and testing changes was extremely tedious.

Since switching to SST, testing infrastructure changes has become much easier because we can spin up test environments on demand. We also reduced our complexity by scrapping Kubernetes and using Elastic Container Service (ECS), the AWS service that SST provisions for its Service component. ECS is essentially managed containers with autoscaling and rolling deploys.

Silent Deployment Failures

By default, sst deploy builds the Docker image, pushes it to the container registry, and then moves on. However, that image isn’t actually live until the deployment finishes in ECS. If you pass wait: true to the Service component, sst deploy will wait for the ECS deployment to complete. But the deploy command will still succeed even if ECS automatically rolls back the image you’re trying to deploy because it fails to start or fails the health check you’ve specified.
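For reference, enabling this looks roughly like the following in sst.config.ts. This is a sketch based on SST v3’s Cluster and Service components; the component names and the Dockerfile path are illustrative.

```typescript
// sst.config.ts (fragment) — illustrative names, assuming SST v3's
// sst.aws.Cluster / sst.aws.Service components.
const vpc = new sst.aws.Vpc("MyVpc");
const cluster = new sst.aws.Cluster("MyCluster", { vpc });

new sst.aws.Service("MyService", {
  cluster,
  image: { dockerfile: "Dockerfile" },
  // Block `sst deploy` until the ECS deployment settles. Note: the deploy
  // still exits 0 even if ECS rolls the new image back.
  wait: true,
});
```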

In ECS, the deployment status for the service will be ROLLBACK_SUCCESSFUL. Automatic rollbacks are nice, but in an ideal world, if the code in our new deployment isn’t rolled out, sst deploy would fail. That way, we’d be notified in CI / Slack by default.

Adding a Deployment Status Check to CI

Since we want CI to fail if the “Service Deployment” status is anything other than SUCCESSFUL, we added a check in CI to catch this.

The check does the following:

  • Get the “Cluster” containing our ECS services, filtering by SST stage.
  • Get the “Services” we’re trying to deploy from that cluster.
  • Loop until the “Deployment” status for each service reaches a terminal state.
    • If that state is anything other than SUCCESSFUL, exit with an error so that CI fails.
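The waiting step above can be sketched like this. The names here are hypothetical, not lifted from our actual script: the status fetcher is passed in as a callback so the loop reads independently of the AWS SDK. In a real script you would back it with the ECS service-deployment APIs (e.g. ListServiceDeployments in @aws-sdk/client-ecs), and the terminal-state list should be checked against your SDK version.

```typescript
// Terminal states for an ECS service deployment (check these against the
// ECS documentation for your SDK version — this list is an assumption).
const TERMINAL_STATUSES = new Set([
  "SUCCESSFUL",
  "STOPPED",
  "ROLLBACK_SUCCESSFUL",
  "ROLLBACK_FAILED",
]);

// Poll `fetchStatus` until it returns a terminal state or we time out.
// `fetchStatus` is injected; in the real script it wraps an ECS API call.
async function waitForDeployment(
  fetchStatus: () => Promise<string>,
  intervalMs = 10_000,
  timeoutMs = 15 * 60 * 1000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await fetchStatus();
    if (TERMINAL_STATUSES.has(status)) {
      console.log(`Deployment status: ${status}`);
      return status;
    }
    console.log(`Waiting for ${status} state.`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for a terminal deployment status");
}

// Anything other than SUCCESSFUL (including a successful rollback) fails CI.
function exitCodeFor(status: string): number {
  return status === "SUCCESSFUL" ? 0 : 1;
}
```

The real check runs this loop once per service in the cluster and exits with a non-zero code if any service lands on a non-SUCCESSFUL state.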

Status Check Script

Here is a link to the full script that we run after sst deploy finishes in our GitHub action: check-ecs-deployment-status.js

Example Output:

Checking cluster: org-production-Cluster
----------------------------------------

  Service: ServiceA

    Deployment status: SUCCESSFUL
    ✅ Deployment succeeded

  Service: ServiceB

    Waiting for IN_PROGRESS state.
    Waiting for IN_PROGRESS state.
    Waiting for IN_PROGRESS state.
    Deployment status: SUCCESSFUL
    ✅ Deployment succeeded

==================================================
✅ All deployments completed successfully!
Finished!

Setting up a GitHub Action for `sst deploy`

If you haven’t set up GitHub Actions yet with SST, here’s a great article about how to configure it: https://craig.madethis.co.uk/2024/sst-github-actions

Troubleshooting Failed Deploys

The two main reasons we’ve seen failed deploys so far are:

  1. Differences in the production environment, like connecting to ElastiCache instead of local Redis.
  2. Differences in our production build, like a library not playing nice with ESM, or forgetting to add a new monorepo package to our Dockerfile.

When they fail, you’ll see a “Rollback successful” status in the “Deployments” tab of your service in ECS. However, the UI doesn’t surface the error that caused the deployment to fail; it just says something nebulous about health checks failing, and you’ll see a rollback event in the “Events” tab.

The only way I’ve found to troubleshoot these is to track down the error logs, and I’m not aware of a way to see logs scoped specifically to a failed deploy. To get around this, I added a log line to each service at startup, before it does anything else. I then search for that message and work forward from it to find the error that prevented the service from starting. If you also log the hostname or other identifying information for the machine, you can query by that too.
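That startup log is just a one-liner at the very top of the service’s entrypoint, before any connections are opened. The message text below is a convention, not anything SST- or ECS-specific:

```typescript
import os from "node:os";

// Very first line of the service entrypoint, before Redis/DB connections etc.
// Searching for this message in your log store finds the tasks from a failed
// deploy; the hostname and pid let you filter down to one machine's logs.
console.log(`service starting on host=${os.hostname()} pid=${process.pid}`);
```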