Saving Money on the AWS platform
Note: This article is a work in progress and is updated regularly.
Maybe your company is already on AWS and the monthly bill is giving you a stomach ache. Or you want to move from on-prem to the cloud but are afraid of surprises.
Moving from on-premises to the cloud can save a lot of money and headaches. You need fewer people for infrastructure, security is (more or less) handled by the cloud provider instead of some overworked operations person, and you can simply turn off whatever you don’t need anymore.
Still, the cloud can become expensive fast if you’re not careful. If you hand everyone in your organisation the keys to commission servers or databases or what have you, you might be in for a surprise at the end of the month.
But even when you and your teams are careful about using only what you need, there are some hidden cost drivers you might not have thought about:
S3 storage is super cheap. Data transfer from S3 not so much.
EC2 servers have clear pricing per instance, but don’t forget about all the additional costs: load balancing, elastic IPs, data transfer out, block storage, monitoring.
Here I’m going to collect learnings, separated by service.
General considerations
Shut down everything that is not needed
- Can you turn off dev and staging at night and on weekends?
- Can you scale down prod at night?
- Can you remove stored data, or archive it? Glacier Deep Archive is super cheap and you can still get the data back within 12 to 48 hours.
- Purge backups that are no longer needed
- Are there servers running but doing nothing (maybe a dev environment assigned to someone who already left the company)?
- Use Infrastructure as Code. Makes it easy to shut things down and re-create them when you actually need them.
- Look into setting up a cloud saving KPI dashboard.
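To get a feeling for what turning off dev and staging is worth, here is a rough sketch of the arithmetic. The monthly cost and the schedule are made-up numbers, and it only covers resources billed by running time (storage keeps costing money while instances are stopped).

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_runtime_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the week an environment is actually running."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


def monthly_saving(always_on_cost: float, hours_per_day: float, days_per_week: int) -> float:
    """Money saved per month compared with running 24/7."""
    return always_on_cost * (1 - weekly_runtime_fraction(hours_per_day, days_per_week))


if __name__ == "__main__":
    # Hypothetical: dev + staging cost 1,000 per month when always on,
    # but are only needed 12 hours a day, Monday to Friday.
    saving = monthly_saving(1000.0, hours_per_day=12, days_per_week=5)
    print(f"Running 60 of 168 hours a week saves about {saving:.0f} per month")
```

With that schedule the environments run a bit more than a third of the week, so close to two thirds of the compute bill goes away.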
Tag your resources
There are two types of tags: AWS-generated and user-defined. AWS-generated tags are created automatically; for example, a resource created by CloudFormation gets a corresponding aws:cloudformation:.. tag. They are not great for this task, though, since they differ from service to service.
User-defined tags are set by you, either when the resource is created or later. You can require newly created resources to carry a specified tag via Service Control Policies.
Now, which tags do we want? It depends on your company. Examples are Environment (dev, staging, qa, prod), Owner (the creator’s email address or a team name), and Product or Service.
Remember to activate the tag as a cost allocation tag.
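As an illustration of the Service Control Policy idea, here is a minimal sketch that builds a policy document denying EC2 instance launches when a tag is missing. It follows the common "deny if the request tag is null" pattern; the function name and the choice of ec2:RunInstances as the only guarded action are assumptions for the example, so adapt it to the services you care about.

```python
import json


def require_tag_on_ec2_launch(tag_key: str) -> dict:
    """Build an SCP document that denies ec2:RunInstances when `tag_key` is missing.

    `tag_key` ends up in the Sid, so it should be alphanumeric.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"DenyRunInstancesWithout{tag_key}",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # The "Null" condition is true when the tag is absent from the request.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(require_tag_on_ec2_launch("Environment"), indent=2))
```

The printed JSON can be pasted into an SCP and attached to an organizational unit. Test it on a sandbox account first, since a deny that is too broad blocks everyone.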
Expose your devs and PMs to the cost
Maybe they didn’t know that playing around with the newest AI toy costs 5k per month.
There are third-party services that can send regular emails with a cost breakdown to interested parties and might even trigger their competitive spirit to get the numbers down.
Talk to AWS
- When you hit high six figures in monthly spend, AWS might be willing to cut prices such as egress rates, often by a significant amount.
- Ask your account manager for a conversation with a cost optimization expert. It won’t cost you anything.
Turn off premium AWS Support
- If you don’t need it. It can be turned back on later.
Taxes
- You might be eligible to get tax credits for R&D in pre-production environments
EC2
Use general-purpose SSD volumes (gp2, or the newer and cheaper gp3), not io1 or io2, for almost everything. Provisioned IOPS are expensive.
Look into Graviton instances. If you run interpreted languages and your workload is compatible with ARM, you can save around 30%.
Use Spot if the workload can withstand interruption
- It’s cheaper than even reserved instances and there is no commitment.
- Don’t run a spot ASG with just a single instance type, which increases the chance of nothing being available. Use a list of candidates that includes smaller instance types, which can run in parallel with capacity weights (instead of one 2xlarge, run two xlarge).
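Capacity weights are easier to see in an example. The sketch below derives weights from vCPU counts and produces entries shaped like the Overrides list of an Auto Scaling group's mixed instances policy. The candidate list is hypothetical, and weighting by vCPU is just one reasonable choice (memory would work too).

```python
# Hypothetical candidates: instance type -> vCPU count.
CANDIDATES = {
    "c5.xlarge": 4,
    "c5.2xlarge": 8,
    "c5a.xlarge": 4,
    "c5a.2xlarge": 8,
}


def weighted_overrides(vcpus_by_type: dict[str, int]) -> list[dict[str, str]]:
    """Give the smallest type weight 1 and scale the others by vCPU count."""
    smallest = min(vcpus_by_type.values())
    overrides = []
    for instance_type, vcpus in sorted(vcpus_by_type.items()):
        overrides.append(
            {
                "InstanceType": instance_type,
                # The Auto Scaling API expects the weight as a string.
                "WeightedCapacity": str(vcpus // smallest),
            }
        )
    return overrides


if __name__ == "__main__":
    for override in weighted_overrides(CANDIDATES):
        print(override)
```

With these weights a desired capacity of 2 can be met by one 2xlarge or by two xlarge instances, whichever Spot pool has room.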
Savings Plan
- More flexible than reserved instances: a Compute Savings Plan is not tied to a specific region or instance family
- Also applies to Fargate
- Reserved instances can have a higher discount, though
Reserved instances
- Don’t overbuy. Buy one for 20% of needed capacity first, then adjust after a few weeks
- Reach out to support if you overbought. They might help.
- Look into convertible reserved instances
Right-sizing your EC2 instances
- Might not be so easy.
Elastic Block Store (EBS)
Do you have unused EBS volumes? Oversized ones? Unneeded snapshots?
Load-Balancers
Consider using one ALB for multiple, low-traffic routes
RDS
- Read optimizing costs in RDS.
CloudWatch
Logging can be seriously expensive.
Are external tools pulling CloudWatch data unnecessarily?
Are there outdated logs that can be deleted or moved to cheaper storage?
Is there excessive logging in applications? (Maybe prod doesn’t need to log at info level.)
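One cheap way to cut log volume is to pick the log level per environment. A minimal sketch with Python's standard logging module follows; the APP_ENV variable and the level mapping are assumptions for the example, not something your setup necessarily has.

```python
import logging
import os

# Hypothetical mapping: the noisier levels stay out of production.
LEVEL_BY_ENV = {
    "dev": logging.DEBUG,
    "staging": logging.INFO,
    "prod": logging.WARNING,
}


def level_for(environment: str) -> int:
    """Return the log level for an environment, defaulting to INFO."""
    return LEVEL_BY_ENV.get(environment, logging.INFO)


def configure_logging() -> None:
    """Configure the root logger from the (assumed) APP_ENV variable."""
    logging.basicConfig(level=level_for(os.environ.get("APP_ENV", "dev")))


if __name__ == "__main__":
    configure_logging()
    log = logging.getLogger("billing-demo")
    log.info("only emitted (and billed) outside of prod")
    log.warning("emitted everywhere")
```

Every line that is never emitted is a line you don't pay to ingest and store.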
Lambda
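Increasing RAM can reduce costs, because more memory also brings more CPU and the function may finish faster. Check out AWS Lambda Power Tuning.
Could low-volume EC2 services be rewritten as Lambda functions?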
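Why more RAM can be cheaper is easiest to see with numbers. This sketch assumes the x86 list prices I remember for us-east-1 (check the current pricing page) and uses made-up durations; the free tier is ignored.

```python
# Assumed prices, not authoritative: verify against the Lambda pricing page.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Cost of one invocation: billed GB-seconds plus the request fee."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


if __name__ == "__main__":
    # Hypothetical measurements of the same CPU-bound function.
    small = invocation_cost(memory_mb=512, duration_ms=1200)
    large = invocation_cost(memory_mb=1024, duration_ms=500)
    print(f"512 MB:  {small * 1_000_000:.2f} per million invocations")
    print(f"1024 MB: {large * 1_000_000:.2f} per million invocations")
```

Doubling the memory doubles the price per second, so it pays off whenever the duration drops by more than half. Whether it does depends on the workload, which is why measuring is worth it.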
S3
Here’s a great post about S3 cost optimization.
Turn on S3 analytics for your largest S3 buckets.
Use lifecycle policies. Move stuff to cheaper storage tiers.
Use private (VPC) endpoints to reduce data charges.
Remember that request charges can be multiple times the storage charges.
Use CloudFront if you access S3 from outside of AWS. Requests are much cheaper.
Use S3 Storage Lens. Check out this article about 5 ways to reduce costs.
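For the lifecycle policies, here is a sketch of a single rule in the shape the S3 lifecycle configuration API uses. The rule ID, prefix and day counts are placeholders, and the two checks reflect the transition constraints as I understand them, so treat them as assumptions.

```python
def lifecycle_rule(rule_id: str, prefix: str, ia_after_days: int,
                   glacier_after_days: int, expire_after_days: int) -> dict:
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier -> delete."""
    # Assumption: objects need at least 30 days in Standard before moving to
    # Standard-IA, and another 30 days there before moving on to Glacier.
    if ia_after_days < 30:
        raise ValueError("transition to STANDARD_IA needs at least 30 days")
    if glacier_after_days < ia_after_days + 30:
        raise ValueError("GLACIER transition must come 30+ days after STANDARD_IA")
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


if __name__ == "__main__":
    # Hypothetical: tier down log files and delete them after a year.
    print({"Rules": [lifecycle_rule("tier-logs", "logs/", 30, 90, 365)]})
```

The resulting dictionary is what you would hand to your tooling of choice as the bucket's lifecycle configuration. Keep in mind that transitions themselves are billed per request, so tiering millions of tiny objects can cost more than it saves.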
Elastic Container Registry (ECR)
- If you are using ECR, and you’re running inside a private subnet, then you should definitely look into setting up VPC endpoints for ECR (and S3 for the image layers). Pulling images via a NAT gateway is pricy.
DynamoDB
- On-demand capacity is cheaper for less-frequently accessed tables. Make sure you are not using provisioned capacity for these (provisioned capacity is often the default when using Terraform or CDK).
- Look into VPC endpoints
API Gateway
- If you’re just proxying to Lambda, you’re overpaying. Look into HTTP APIs instead of REST APIs for up to 70% cost reduction.
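Where the roughly 70% comes from can be sketched with the per-request prices. The two prices below are the first-tier us-east-1 list prices as I remember them, so treat them as assumptions and check the pricing page; the request volume is made up.

```python
# Assumed first-tier prices per million requests; verify before relying on them.
REST_PRICE_PER_MILLION = 3.50
HTTP_PRICE_PER_MILLION = 1.00


def monthly_request_cost(requests: int, price_per_million: float) -> float:
    """Request charges only; data transfer and caching are not included."""
    return requests / 1_000_000 * price_per_million


def relative_saving(requests: int) -> float:
    """Fraction of the REST API bill saved by switching to an HTTP API."""
    rest = monthly_request_cost(requests, REST_PRICE_PER_MILLION)
    http = monthly_request_cost(requests, HTTP_PRICE_PER_MILLION)
    return (rest - http) / rest


if __name__ == "__main__":
    requests = 50_000_000  # hypothetical monthly volume
    print(f"REST: {monthly_request_cost(requests, REST_PRICE_PER_MILLION):.2f}")
    print(f"HTTP: {monthly_request_cost(requests, HTTP_PRICE_PER_MILLION):.2f}")
    print(f"Saving: {relative_saving(requests):.0%}")
```

HTTP APIs have fewer features than REST APIs, so check that you don't depend on something that is missing before you switch.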
Network Address Translation (NAT)
Set up a NAT gateway per AZ to reduce cross-AZ data transfer.
NAT gateway at scale is terribly expensive ($0.045 per GB in data processing plus between $0 and $0.09 per GB depending on where the traffic is going). Look into NAT instances and fck-nat.
Can you use gateway endpoints, interface endpoints or public subnets instead of NAT?
Consider consolidating your NAT gateways by using an egress VPC and transit gateways.
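To put the NAT numbers into perspective, here is the arithmetic for the per-GB charges quoted above. The traffic volume is hypothetical and the hourly charge for the gateway itself is left out.

```python
NAT_PROCESSING_PER_GB = 0.045  # data processing charge quoted above


def nat_data_cost(gigabytes: float, egress_per_gb: float) -> float:
    """Per-GB NAT charges: data processing plus whatever the destination costs."""
    return gigabytes * (NAT_PROCESSING_PER_GB + egress_per_gb)


if __name__ == "__main__":
    monthly_gb = 10_000  # hypothetical: roughly 10 TB per month
    # Traffic to the internet pays processing plus the highest egress rate.
    print(f"To the internet:    {nat_data_cost(monthly_gb, 0.09):.2f}")
    # Traffic to S3 in the same region has no egress fee but still pays processing.
    print(f"To S3 through NAT:  {nat_data_cost(monthly_gb, 0.0):.2f}")
```

The second line is the one that hurts, because an S3 gateway endpoint carries the same traffic without the processing charge.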
Step Functions
Express workflows can be cheaper by orders of magnitude. Watch this video.
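A back-of-the-envelope comparison shows where the orders of magnitude come from: Standard workflows are billed per state transition, Express workflows per request and duration. The prices are the us-east-1 list prices as I remember them and the workload is made up, so treat all numbers as assumptions.

```python
# Assumed prices; verify against the Step Functions pricing page.
STANDARD_PRICE_PER_TRANSITION = 0.025 / 1000
EXPRESS_PRICE_PER_REQUEST = 1.00 / 1_000_000
EXPRESS_PRICE_PER_GB_SECOND = 0.00001667


def standard_cost(executions: int, transitions_per_execution: int) -> float:
    """Standard workflows: pay for every state transition."""
    return executions * transitions_per_execution * STANDARD_PRICE_PER_TRANSITION


def express_cost(executions: int, duration_seconds: float, memory_mb: int = 64) -> float:
    """Express workflows: pay per request plus memory used over the run time."""
    gb_seconds = executions * duration_seconds * (memory_mb / 1024)
    return executions * EXPRESS_PRICE_PER_REQUEST + gb_seconds * EXPRESS_PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # Hypothetical: one million short executions with ten steps each.
    print(f"Standard: {standard_cost(1_000_000, 10):.2f}")
    print(f"Express:  {express_cost(1_000_000, duration_seconds=1.0):.2f}")
```

Express workflows come with their own limits (for example on how long one execution may run), so this only helps for short, high-volume workflows.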
Data Transfer
Get an overview of all data transfer costs.
For a visual overview, take a look at this graphic from the Duckbill Group.
Data transfer shows up under EC2-Other in Cost Explorer.
Avoid public network routes for internal resources.
Set up VPC flow logs and analyse them with CloudWatch Logs Insights.
Regions
- Keep your workloads in one or two regions unless you have reasons not to.
- Some regions are cheaper than others. If latency and regulations are no problem, consider moving to a cheaper one, e.g., Mumbai.