Deploying Containers with SST
2025-06-18
Before switching to SST, our startup was running Kubernetes on Google Cloud. None of us are Kubernetes experts, so whenever we needed to do something new, like add monitoring or custom scaling logic, it was a real challenge and slowed us down. We also weren’t using infrastructure as code, so making or testing any change was extremely tedious.
Since switching to SST, testing infrastructure changes has become easier because we can spin up test environments with ease. We also reduced our complexity by scrapping Kubernetes in favor of Elastic Container Service (ECS), the AWS service that SST provisions for its Service component. ECS is essentially managed containers with autoscaling and rolling deploys.
Silent Deployment Failures
By default, `sst deploy` builds the Docker image, uploads it to ECS, and then moves on. However, that image isn’t actually live until the deployment finishes in ECS. If you pass `wait: true` to the Service component, `sst deploy` will wait for the ECS deployment to complete. But the deploy command will still succeed even if ECS automatically rolls back the image you’re trying to deploy because it failed to start up or failed the health check you’ve specified.
In ECS, the deployment status for the service will be `ROLLBACK_SUCCESSFUL`. Automatic rollbacks are nice, but in an ideal world, if the code in our new deployment isn’t rolled out, `sst deploy` would fail. This way, we are notified in CI / Slack by default.
Adding a Deployment Status Check to CI
Since we want CI to fail if the “Service Deployment” status is anything other than successful, we added a post-deploy check to catch this.
The check does the following:
- Get the “Cluster” containing our ECS services, filtering by SST stage.
- Get the “Services” we’re trying to deploy from the cluster.
- Poll until the “Deployment” status for each service reaches a terminal state.
- If that state is anything other than `SUCCESSFUL`, exit with an error so that CI fails.
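The polling step above can be sketched as a small helper. Here the actual ECS lookup is abstracted behind a `getStatus` callback — a hypothetical signature for illustration, not the real script’s API — so the loop itself is independent of the AWS SDK:

```typescript
// Sketch of the polling loop. In the real script, `getStatus` would wrap
// an ECS API call that fetches the latest deployment status for a service.
const TERMINAL_STATES = new Set([
  "SUCCESSFUL",
  "STOPPED",
  "ROLLBACK_SUCCESSFUL",
  "ROLLBACK_FAILED",
]);

async function waitForDeployment(
  service: string,
  getStatus: (service: string) => Promise<string>,
  pollMs = 5000,
): Promise<string> {
  for (;;) {
    const status = await getStatus(service);
    if (TERMINAL_STATES.has(status)) return status;
    console.log(`Waiting for ${status} state.`);
    await new Promise<void>((r) => setTimeout(r, pollMs));
  }
}

// Fail CI unless every service ends in SUCCESSFUL.
async function checkAll(
  services: string[],
  getStatus: (service: string) => Promise<string>,
  pollMs = 5000,
): Promise<void> {
  for (const service of services) {
    const status = await waitForDeployment(service, getStatus, pollMs);
    console.log(`Service: ${service}`);
    console.log(`Deployment status: ${status}`);
    if (status !== "SUCCESSFUL") {
      throw new Error(`Deployment for ${service} ended in ${status}`);
    }
  }
}
```

Abstracting the status lookup also makes the loop easy to test with a stubbed sequence of statuses, without touching AWS.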
Status Check Script
Here is a link to the full script that we run after `sst deploy` finishes in our GitHub Action: check-ecs-deployment-status.js
Example Output:

```
Checking cluster: org-production-Cluster
----------------------------------------
Service: ServiceA
Deployment status: SUCCESSFUL
✅ Deployment succeeded
Service: ServiceB
Waiting for IN_PROGRESS state.
Waiting for IN_PROGRESS state.
Waiting for IN_PROGRESS state.
Deployment status: SUCCESSFUL
✅ Deployment succeeded
==================================================
✅ All deployments completed successfully!
Finished!
```
Setting up a GitHub Action for `sst deploy`
If you haven’t set up GitHub Actions yet with SST, here’s a great article about how to configure it: https://craig.madethis.co.uk/2024/sst-github-actions
Troubleshooting Failed Deploys
The two main reasons we’ve seen failed deploys so far are:
- Differences in the production environment, like connecting to ElastiCache instead of local Redis.
- Differences in our production build, like a library not playing nice with ESM, or forgetting to add a new monorepo package to our Dockerfile.
When a deploy fails, you’ll see a “Rollback successful” status in the “Deployments” tab of your service in ECS. However, the UI doesn’t surface the error that caused the deployment to fail. It just says something nebulous about health checks failing, and you’ll see a rollback event in the “Events” tab.
The only way I’ve found to troubleshoot these is to find the error logs. I’m not aware of a way to see logs specifically for a failed deploy. To work around this, I added a log line to our services that runs at startup, before anything else. I then search for that log and work backward from there to find the error preventing the service from starting. And if you log the hostname or some other identifying information for the machine at that point, you can query by that as well.
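A startup log along those lines might look like this. It’s a sketch — the `[startup]` format and the `startupLogLine` helper are made up for illustration, not what our services actually emit:

```typescript
import { hostname } from "node:os";

// Emit one identifiable line before anything else can fail, so the logs
// from a crashed deploy are easy to find and can be filtered by host.
export function startupLogLine(service: string): string {
  return `[startup] ${service} booting host=${hostname()} pid=${process.pid}`;
}

console.log(startupLogLine("web"));
```

Because the line carries the hostname and PID, you can search for `[startup]` around the time of the failed deploy, then query for everything that same instance logged afterward.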