Mark Pollmann's blog

Saving Money on the AWS platform

Note: This article is a work in progress and is updated regularly.

Maybe your company is already on AWS and the monthly bill is giving you a stomach ache. Or you want to move from on-prem to the cloud but are afraid of surprises.

Moving from on-premises to the cloud can save a lot of money and headaches: fewer people are needed for infrastructure, security is (more or less) handled by the cloud provider rather than by some overworked operations person, and you can simply turn off what you no longer need.

Still, the cloud can become expensive fast if you’re not careful. If you hand everyone in your organisation the keys to commission servers, databases or what have you, you might be in for a surprise at the end of the month.

But even when you and your teams are careful about using only what you need, there are some hidden cost drivers you might not have thought about:

  • S3 storage is super cheap. Data transfer from S3 not so much.
  • EC2 servers have clear pricing per instance but don’t forget about all the additional costs: load balancing, Elastic IPs, data transfer out, block storage, monitoring.

Here I’m going to collect learnings, separated by service.

General considerations

Shut down everything that is not needed

  • Can you turn off dev and staging at night, weekends?
  • Can you scale down prod at night?
  • Can you remove stored data, or archive it? Glacier Deep Archive is super cheap, and you can still get the data back within 12 to 48 hours.
  • Purge backups that are no longer needed
  • Are there servers running and doing nothing (maybe a dev environment assigned to someone who has already left the company)?
  • Use Infrastructure as Code. Makes it easy to shut things down and re-create them when you actually need them.
  • Look into setting up a cloud savings KPI dashboard.
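
To get a feel for what turning off dev and staging outside working hours is worth, here is a minimal sketch of the arithmetic. The schedule (12 hours a day, Monday to Friday) is a hypothetical example, and the result only applies to charges billed per running hour, not to storage that stays allocated while an instance is stopped.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in an always-on week


def weekly_savings_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of an always-on, per-hour bill saved by running on a schedule."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# Hypothetical schedule: 12 hours a day, Monday to Friday (60 of 168 hours).
print(f"{weekly_savings_fraction(12, 5):.0%}")  # prints "64%"
```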

Tag your resources

There are two types of tags: AWS-generated and user-defined. The first kind is created automatically: for example, a resource created by CloudFormation gets a corresponding aws:cloudformation:.. tag. These are not great for cost allocation, though, since they differ from service to service.

User-defined tags are set by the user, either at creation time or later. You can require newly created resources to carry a specified tag via Service Control Policies.

Now, which tags do we want? It depends on your company. Examples are: Environment (dev, staging, qa, prod), Owner (the creator’s email address or a team name), Product or Service. Remember to activate each tag as a cost allocation tag.
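
As an illustration of enforcing a tag with a Service Control Policy, here is a sketch that builds a policy document denying `ec2:RunInstances` when the request carries no value for a given tag. The helper name `require_tag_scp` and the choice of the `Owner` tag are assumptions for this example; a real policy would likely cover more actions and resource types, and should be tested on a non-production account first.

```python
import json


def require_tag_scp(tag_key: str) -> dict:
    """Build an SCP that denies launching EC2 instances without `tag_key`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # "Null": "true" matches requests where the tag is absent.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


# Hypothetical usage: print the JSON to paste into an SCP.
print(json.dumps(require_tag_scp("Owner"), indent=2))
```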

Expose your devs and PMs to the cost

Maybe they didn’t know that playing around with the newest AI toy costs 5k per month.

There are third-party services that can send regular emails with a cost breakdown to interested parties and might even trigger their competitive spirit to get the numbers down.

Talk to AWS

  • When you hit high six figures in monthly spend, AWS might be willing to cut prices such as egress rates, often by a significant amount.
  • Ask your account manager to set up a conversation with a cost optimization expert. It won’t cost you anything.

Turn off premium AWS Support

  • Do this if you don’t need it. It can be turned on again later.

Taxes

  • You might be eligible to get tax credits for R&D in pre-production environments

EC2

  • Use general-purpose volumes (gp2, or the newer gp3, which is usually cheaper per GB) rather than io1 or io2 for almost everything. Provisioned IOPS are expensive.

  • Look into Graviton instances. If you run interpreted languages and your workload is compatible with ARM, you can save around 30%.

Use Spot if your workload can withstand interruption

  • It’s cheaper than even Reserved Instances and there is no commitment.
  • Don’t run a Spot ASG with just a single instance type, as that increases the chance of nothing being available. Use a list of candidates that includes smaller instance types, which can run in parallel with capacity weights (instead of one 2xlarge, run two xlarge).
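
The capacity-weight idea can be sketched as plain arithmetic: express the target in units of the smallest candidate and work out how many instances of each type would cover it. The `m5` family and the weights below are assumptions chosen for illustration (one unit equals one xlarge); the actual allocation is done by the Auto Scaling group, not by code like this.

```python
import math

# Hypothetical weights: capacity units relative to one xlarge.
WEIGHTS = {"m5.xlarge": 1, "m5.2xlarge": 2, "m5.4xlarge": 4}


def instances_needed(target_units: int, instance_type: str) -> int:
    """Instances of one type needed to reach at least `target_units`."""
    return math.ceil(target_units / WEIGHTS[instance_type])


# A target of 8 units can be met by any of the candidate types.
for instance_type in WEIGHTS:
    print(instance_type, instances_needed(8, instance_type))
```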

Savings Plan

  • More flexible: not tied to a specific region or instance family
  • Also applies to Fargate
  • Reserved instances can have a higher discount, though

Reserved instances

  • Don’t overbuy. Buy enough for 20% of the needed capacity first, then adjust after a few weeks.
  • Reach out to support if you overbought; they might help.
  • Look into Convertible Reserved Instances.

Right-sizing your EC2 instances

Elastic Block Store (EBS)

Load-Balancers

Consider using one ALB for multiple low-traffic routes.

RDS

CloudWatch

Logging can be seriously expensive.

  • External tools pulling CloudWatch data unnecessarily?

  • Outdated logs that can be deleted or moved to cheaper storage?

  • Excessive logging in applications? (Maybe prod doesn’t need to log at info level)
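
One way to cut application log volume is to pick the log level per environment. The sketch below assumes the environment name comes from a hypothetical `APP_ENV` variable; in prod only warnings and above are emitted, everywhere else info and above.

```python
import logging
import os


def configure_logging(environment: str) -> None:
    """Set the root logger to WARNING in prod and INFO elsewhere."""
    level = logging.WARNING if environment == "prod" else logging.INFO
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
    logging.getLogger().setLevel(level)


# APP_ENV is an assumed variable name; default to dev when it is unset.
configure_logging(os.environ.get("APP_ENV", "dev"))
logging.info("only emitted outside prod")
logging.warning("emitted everywhere")
```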

Lambda

  • Increasing RAM can reduce costs, because more memory also means more CPU and often a much shorter run time. Check out AWS Lambda Power Tuning.

  • Rewrite low-volume EC2 services to Lambda?
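
Why more RAM can be cheaper: Lambda bills compute in GB-seconds, so doubling memory pays off whenever it more than halves the duration. The sketch below uses an assumed price per GB-second (the x86 list price in us-east-1 at the time of writing; check current pricing) and made-up durations, and it ignores the per-request charge.

```python
# Assumed list price per GB-second (x86, us-east-1); verify before relying on it.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Compute cost of one invocation, excluding the per-request charge."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical measurements for the same function at two memory sizes.
small = invocation_cost(512, 1000)   # 0.5 GB-seconds
large = invocation_cost(1024, 400)   # 0.4 GB-seconds
print(large < small)  # prints True: the larger setting is cheaper here
```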

S3

  • Here’s a great post about S3 cost optimization.

  • Turn on S3 analytics for your largest S3 buckets.

  • Use lifecycle policies. Move stuff to cheaper storage tiers.

  • Use private endpoints to reduce data charges

  • Remember that request charges can be multiple times the storage charges.

  • Use CloudFront if you access S3 from outside of AWS. Requests are much cheaper.

  • Use S3 Storage Lens. Check out this article about 5 ways to reduce costs.
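
A lifecycle policy of the kind mentioned above can be sketched as a plain configuration document. The prefix, rule ID and day counts are hypothetical; the structure follows the S3 lifecycle configuration format, which moves objects to cheaper tiers over time and finally deletes them.

```python
def lifecycle_configuration(prefix: str = "logs/") -> dict:
    """Build a lifecycle configuration that tiers down and then expires objects."""
    return {
        "Rules": [
            {
                "ID": "tier-down-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    }


config = lifecycle_configuration()
print(config["Rules"][0]["Transitions"])
```

With boto3, a document like this would be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`, or expressed in your Infrastructure as Code tool of choice.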

Elastic Container Registry (ECR)

  • If you are using ECR and running inside a private subnet, you should definitely look into setting up VPC endpoints for ECR (and S3 for the image layers). Pulling images via a NAT gateway is pricey.

DynamoDB

  • On-demand capacity is cheaper for less-frequently accessed tables. Make sure you are not using provisioned capacity for these (provisioned capacity is often the default when using Terraform or CDK).
  • Look into VPC endpoints

API Gateway

  • If you’re just proxying to Lambda, you’re overpaying. Look into HTTP APIs instead of REST APIs for up to 70% cost reduction.
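
A rough sketch of where a figure like that comes from: compare the per-request price of the two API types. The prices below are assumptions (first pricing tier, us-east-1, at the time of writing) and ignore data transfer, caching and higher-volume tiers.

```python
# Assumed list prices per million requests (first tier, us-east-1).
REST_API_PER_MILLION = 3.50
HTTP_API_PER_MILLION = 1.00


def monthly_request_cost(requests: int, price_per_million: float) -> float:
    """Request charges for one month at a flat per-million price."""
    return requests / 1_000_000 * price_per_million


def http_api_savings_fraction() -> float:
    """Fraction of REST API request charges saved by switching to HTTP API."""
    return 1 - HTTP_API_PER_MILLION / REST_API_PER_MILLION


# Hypothetical traffic: 50 million requests per month.
print(monthly_request_cost(50_000_000, REST_API_PER_MILLION))
print(monthly_request_cost(50_000_000, HTTP_API_PER_MILLION))
print(f"{http_api_savings_fraction():.0%}")
```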

Network Address Translation (NAT)

  • Set up a NAT gateway per AZ to reduce cross-AZ data transfer.

  • NAT gateways at scale are terribly expensive ($0.045 per GB in data processing plus between $0 and $0.09 per GB depending on where the traffic is going). Look into NAT instances and fck-nat.

  • Can you use gateway endpoints, interface endpoints or public subnets instead of NAT?

  • Consider consolidating your NAT gateways by using an egress VPC and transit gateways
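
To see how quickly NAT data charges add up, here is a small sketch using the per-GB figures quoted above. The traffic volume is a hypothetical example, and the hourly charge for the gateway itself is left out.

```python
NAT_PROCESSING_PER_GB = 0.045  # data processing charge quoted above


def nat_monthly_data_cost(gb_per_month: float, transfer_per_gb: float = 0.0) -> float:
    """Data charges for traffic through a NAT gateway, excluding hourly fees."""
    if not 0.0 <= transfer_per_gb <= 0.09:
        raise ValueError("transfer rate expected between $0 and $0.09 per GB")
    return gb_per_month * (NAT_PROCESSING_PER_GB + transfer_per_gb)


# Hypothetical: 10 TB per month, first to an in-region service, then to the internet.
print(round(nat_monthly_data_cost(10_000), 2))        # about 450 dollars
print(round(nat_monthly_data_cost(10_000, 0.09), 2))  # about 1350 dollars
```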

Step Functions

Express workflows can be cheaper by orders of magnitude. Watch this video.

Data Transfer

  • Avoid public network routes for internal resources.

  • Set up VPC flow logs and analyse them with CloudWatch Logs Insights.

Regions

  • Keep your workloads in one or two regions unless you have reasons not to.
  • Some regions are cheaper than others. If latency and regulations are no problem move to, e.g., Mumbai.

External Tools