Cloudthread Y Combinator
December 14, 2023

Automation and Workflow Integrations: Closing the Loop on Usage Optimization

Learn how automation and workflow integrations complement each other to conquer usage optimization while maintaining enterprise standards of security and change management processes.

Cloudthread recently launched One Click Automation, which lowers the barrier to action and lets teams easily convert Savings Opportunities into realized savings. This article highlights how automation and workflow integrations complement each other to close the loop on usage optimization while maintaining enterprise standards for security and change management.

Usage Optimization Introduction

What is cloud cost usage optimization?

Usage optimization focuses on controlling cloud costs by minimizing unnecessary or inefficient resource consumption. Think removing idle resources, rightsizing, and taking advantage of cost-saving mechanisms (e.g. S3/EFS intelligent tiering in AWS). For a deeper dive into how this is different from rate optimization, see here.

Overview of the challenges in achieving usage optimization

Acting on usage optimization is inherently more complicated than rate optimization because it requires input from engineering teams managing those workloads. Rate optimization can be centrally managed by Cloud FinOps without any engagement from engineering teams.

It’s no surprise that “Engaging Engineers” is consistently rated the toughest challenge by FinOps folks.

Typical usage optimization status quo

At many organizations, usage optimization doesn’t get active attention, and opportunities to save money go unresolved, causing ongoing waste.

Sometimes over-provisioning is an intentional choice for production workloads where confidence in stability is worth “wasting” a bit of money. Sometimes the waste is egregious and unnecessary, particularly in dev and test environments.

A typical cycle: the monthly AWS bill comes in, and the Cloud FinOps Team is frustrated by the spend increase. They look at Trusted Advisor or Compute Optimizer for savings recommendations and ping the relevant application owner in Slack about an underutilized instance. The application owner may or may not act on that ping, and the Cloud FinOps Team doesn’t have the bandwidth to manually track progress on the remediation. The opportunity to save money often goes without action, incurring wasteful spend until the next fire drill.

The Value and Risks of Automation

Definition of Automation in the Context of Usage Optimization

One Click Automation

One Click Automation is the ability to take action on a Savings Opportunity (or many of them) with a single click directly from Cloudthread’s platform. A Savings Opportunity can be anything from something as simple as releasing an unused Elastic IP to something as complex as rightsizing an RDS instance.

Predefined Automation Rules

Coming soon to the Cloudthread platform, this is the ability to define rules that take action in your environment once their conditions are met. An example would be:

CONDITION: If an unattached EBS volume remains unattached for 7 days

SCOPE: In development environments

ACTION: Snapshot and delete the unattached EBS volume
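The rule above can be sketched as a small data structure plus a matching function. This is only a hypothetical illustration: the `Volume` and `AutomationRule` types, their field names, and the `matching_volumes` helper are assumptions made for this example, not Cloudthread’s actual rule format, and no AWS calls are made.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Optional


@dataclass
class Volume:
    """Hypothetical, simplified view of an EBS volume."""
    volume_id: str
    environment: str                     # e.g. "dev", "test", "prod"
    detached_since: Optional[datetime]   # None means currently attached


@dataclass
class AutomationRule:
    """Hypothetical rule made of CONDITION, SCOPE, and ACTION."""
    min_days_unattached: int             # CONDITION
    environments: FrozenSet[str]         # SCOPE
    action: str                          # ACTION (a label only; nothing is executed)


def matching_volumes(rule: AutomationRule, volumes: List[Volume], now: datetime) -> List[Volume]:
    """Return the volumes that satisfy both the rule's condition and its scope."""
    threshold = timedelta(days=rule.min_days_unattached)
    return [
        v for v in volumes
        if v.detached_since is not None
        and v.environment in rule.environments
        and now - v.detached_since >= threshold
    ]


rule = AutomationRule(7, frozenset({"dev"}), "snapshot_and_delete")
now = datetime(2023, 12, 14)
volumes = [
    Volume("vol-a", "dev", datetime(2023, 12, 1)),    # unattached 13 days, in scope
    Volume("vol-b", "dev", datetime(2023, 12, 10)),   # unattached only 4 days
    Volume("vol-c", "prod", datetime(2023, 11, 1)),   # out of scope
    Volume("vol-d", "dev", None),                     # still attached
]
selected = [v.volume_id for v in matching_volumes(rule, volumes, now)]
print(selected)
```

Keeping the scope check separate from the condition check makes it harder for a rule to reach beyond the environments it was written for.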

The Value

In short, the value is saving time and money. But how this manifests for Cloud FinOps and Engineering Teams differs.

For Central Cloud FinOps Teams

In the status quo of many organizations described above, the Central FinOps Team doesn’t have the ability to make any actual changes to AWS infrastructure.

At organizations that are embracing rigorous usage optimization, leadership defines parts of infrastructure (typically dev and test) where FinOps has the permission to make changes through Cloudthread.

In this case, automation can be a powerful tool for the FinOps team to take direct action in accounts where they have permission to make changes and the risk of making changes without engineering sign-off is low. No more relentless follow-ups with engineering and wasted spend lingering.

For Engineering Teams

When an engineer is made aware of a Savings Opportunity they want to act on, it still requires precious bandwidth to actually go and take action. Whether via AWS CLI or ClickOps, it takes time to isolate the relevant resources and take action.

One Click Automation easily allows for bulk action across accounts and regions.
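To make the idea of bulk action across accounts and regions concrete, here is a minimal sketch of how a list of opportunities might be batched before acting on them. The `Opportunity` type and `group_by_account_region` helper are hypothetical names introduced for illustration and say nothing about how Cloudthread implements this internally.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Opportunity(NamedTuple):
    """Hypothetical Savings Opportunity record."""
    account_id: str
    region: str
    resource_id: str


def group_by_account_region(opps: List[Opportunity]) -> Dict[Tuple[str, str], List[str]]:
    """Batch resource IDs by (account, region) so each batch can be handled together."""
    batches: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for opp in opps:
        batches[(opp.account_id, opp.region)].append(opp.resource_id)
    return dict(batches)


opportunities = [
    Opportunity("111111111111", "us-east-1", "eip-1"),
    Opportunity("111111111111", "us-east-1", "vol-2"),
    Opportunity("111111111111", "eu-west-1", "vol-3"),
    Opportunity("222222222222", "us-east-1", "eip-4"),
]
batches = group_by_account_region(opportunities)
for (account, region), resources in batches.items():
    print(account, region, resources)
```

Done by hand, each of these batches would mean switching credentials and region in the CLI or console, which is the time cost described above.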

Less time required. More realized cloud savings.

The Risks

Stability and Security

A very reasonable hesitation around automation is the idea that it opens the door to accidentally automating an action that shouldn’t be taken.

As described further below, it’s essential for your organization to decide where it has appetite to risk automation. Typically enterprises choose to allow automation only in dev and test environments.

Infrastructure as Code Drift

For organizations that use Infrastructure as Code (IaC), automating infrastructure changes via One Click Automation creates drift.

Some organizations we’ve worked with still prefer to automate first: they take immediate action, realize savings right away, and then update their IaC so the change persists.

If your organization doesn’t use IaC, this isn’t a concern.

In the future, Cloudthread will have automation of usage optimization integrated with Terraform IaC.
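The drift described above can be pictured as a difference between what the IaC declares and what is actually running after an automated change. The sketch below is a simplified illustration only: `find_drift` and the attribute names are hypothetical, and real IaC tools perform this comparison themselves (for example, when they plan a change).

```python
from typing import Any, Dict, Tuple


def find_drift(declared: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (declared_value, actual_value)} for every attribute that differs."""
    drift = {}
    for key in sorted(set(declared) | set(actual)):
        if declared.get(key) != actual.get(key):
            drift[key] = (declared.get(key), actual.get(key))
    return drift


# Hypothetical example: a database was rightsized by automation,
# but the IaC definition still declares the larger instance class.
declared = {"instance_class": "db.m5.xlarge", "storage_gb": 100}
actual = {"instance_class": "db.m5.large", "storage_gb": 100}
drift = find_drift(declared, actual)
print(drift)
```

Until the IaC definition is updated to match, the next IaC apply would likely revert the rightsizing, which is why the follow-up IaC change matters.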

Workflow Integrations Explained

Integrating with engineering workflows means integrating with the tools they use to plan and execute infrastructure changes.

At a minimum, this is integrating with Jira and ServiceNow to create tickets that are used to prioritize and assign work for engineering sprints. Especially for more significant cloud cost savings efforts (e.g. database rightsizing), carving out bandwidth from engineering sprints will be essential.

A more involved version of this means integrating with Jira Service Management and ServiceNow IT Operations Management so that changes are enacted through Jira/ServiceNow and their change log is the source of truth for infrastructure changes.
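As a rough sketch of what ticket creation involves, the snippet below builds a request body in the shape expected by Jira’s REST API v2 issue-creation endpoint (`POST /rest/api/2/issue`). The project key `FINOPS`, the resource name, the savings figure, and the `build_jira_issue_payload` helper are all hypothetical; the snippet only constructs the payload and does not contact Jira or reflect how Cloudthread’s integration is built.

```python
import json
from typing import Any, Dict


def build_jira_issue_payload(
    project_key: str,
    resource_id: str,
    monthly_savings_usd: float,
    issue_type: str = "Task",
) -> Dict[str, Any]:
    """Build a JSON body for creating a Jira issue about one Savings Opportunity."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": f"Savings Opportunity: {resource_id}",
            "description": (
                f"Estimated monthly savings: ${monthly_savings_usd:.2f}. "
                "Please review and schedule in an upcoming sprint."
            ),
            "issuetype": {"name": issue_type},
        }
    }


# Hypothetical values for illustration only.
payload = build_jira_issue_payload("FINOPS", "db-orders-dev", 412.5)
print(json.dumps(payload, indent=2))
```

Putting the estimated savings in the ticket body gives the engineering team what it needs to prioritize the work against the rest of the sprint.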

How to Safely Combine Automation and Workflow Integration

For the purposes of this example, we’re discussing AWS as the cloud provider and Jira as the workflow management platform. In terms of cloud providers, Cloudthread also supports GCP and Azure. In terms of workflow management platforms, Cloudthread supports ServiceNow in beta.

Setup Steps

Step 0: Decide where your company has appetite to use automation

This is a prerequisite to diving into automation: decide where your organization is comfortable having FinOps and Engineering use automation to reduce wasted spend.

Step 1: Selectively enable automation only in accounts chosen in Step 0

Cloudthread makes it easy to augment your existing integration and opt in to automation for only a subset of accounts. You can set up the integration for the accounts you’ve chosen in Step 0.

Step 2: Integrate with your workflow management platform

You need to give Cloudthread the ability to create tickets in your workflow management platform and enable the two-way sync between workflow management and Cloudthread. To do this, go to your Integrations page to integrate with Jira. ServiceNow coming soon.

Ongoing Process

Assuming you’ve enabled automation in dev and test:

In Dev and Test

Whenever waste comes up in these accounts, the central Cloud FinOps Team can easily act on the waste as soon as they see it in their weekly Savings Opportunity Report.

Don’t let waste linger where it really shouldn’t. No more ignored pings to engineering and relentless follow-ups. Saving time. Saving money.

All automated actions will be captured in Cloudthread’s Change Log.

In Production

Whoever is responsible for cloud cost management for each business unit / application (at some companies this is a central FinOps Team, at other companies this is a distributed cloud cost owner on the engineering team) will review available savings in their weekly Savings Opportunity Report and make the decision on what to engage engineering teams with.

Action on savings opportunities will flow through engineering workflow processes and be prioritized and acted on according to bandwidth and incentives at your organization. Incentive programs and tools like Leaderboards can be particularly effective in motivating engineering teams to act on Savings Opportunities.

Automation won’t touch or impact your production workloads and all changes will flow through your production change management system.
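The split between the two paths can be summarized in a short sketch: accounts that opted in to automation are acted on directly and logged, while everything else becomes a ticket for the engineering workflow. The `Router` class, the account IDs, and the log format are hypothetical and only illustrate the decision logic described in this section.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Router:
    """Hypothetical router: automate in opted-in accounts, open a ticket everywhere else."""
    automation_accounts: FrozenSet[str]                 # accounts chosen in Step 0 / Step 1
    change_log: List[str] = field(default_factory=list)
    tickets: List[str] = field(default_factory=list)

    def handle(self, account_id: str, resource_id: str) -> str:
        if account_id in self.automation_accounts:
            # Dev/test path: act directly and record the change.
            self.change_log.append(f"automated: {resource_id} in {account_id}")
            return "automated"
        # Production path: never act directly; hand off to change management.
        self.tickets.append(f"ticket: {resource_id} in {account_id}")
        return "ticketed"


router = Router(automation_accounts=frozenset({"111111111111"}))  # hypothetical dev account
dev_result = router.handle("111111111111", "vol-unattached-1")
prod_result = router.handle("999999999999", "db-orders-prod")
print(dev_result, prod_result)
```

Because the default branch is the ticket path, an account that was never explicitly opted in is never automated.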

Conclusion

If you’re interested in implementing a usage optimization strategy at your organization, Cloudthread is excited to help, so don’t hesitate to reach out.

Make cloud costs a first class metric for your engineering organization.
Copyright © 2024 CloudThread Inc.
All rights reserved.