![](https://www.costory.io/wp-content/uploads/2024/09/1_4cBNH91kJzFg3HGO2GcFCQ-1024x204.webp)
By Mike Aidane
History
The Prototype: Quick Wins with a BigQuery SQL View
Our initial approach to managing cloud costs was to build a complex BigQuery SQL view that tracked key cost components. For example, we filtered Dataflow Beam jobs using label conditions such as beam_job_id LIKE 'beam_%'. This provided quick visibility into our costs without requiring the full involvement of the development team.
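As a rough illustration of the idea behind such a view, the sketch below applies label rules to cost rows and measures how much of the spend they cover. It is written in Python rather than SQL, and it is not the original view: the rule list, the dbt_model label, the component names, and the sample rows are all hypothetical. The glob beam_* only approximates the SQL pattern, since in SQL LIKE the underscore is itself a single-character wildcard.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (label key, glob pattern, cost component).
RULES = [
    ("beam_job_id", "beam_*", "dataflow_beam_jobs"),
    ("dbt_model", "*", "dbt_transformations"),
]


def classify(labels):
    """Return the first matching cost component, or None if no rule applies."""
    for key, pattern, component in RULES:
        value = labels.get(key)
        if value is not None and fnmatchcase(value, pattern):
            return component
    return None


def coverage(rows):
    """Share of total cost that some rule attributes to a component."""
    total = sum(row["cost"] for row in rows)
    tagged = sum(row["cost"] for row in rows if classify(row["labels"]) is not None)
    return tagged / total if total else 0.0


# Made-up billing rows, for illustration only.
rows = [
    {"cost": 40.0, "labels": {"beam_job_id": "beam_ingest"}},
    {"cost": 10.0, "labels": {"dbt_model": "orders"}},
    {"cost": 50.0, "labels": {}},
]
visibility = coverage(rows)
```

Every new component needs a new entry in the rule list, which is the maintenance burden described in the cons below.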
Pros:
- Immediate results: We could analyze historical data and begin addressing cost issues right away, without a major team effort.
- Created a cost-responsibility culture: Early on, we implemented quick daily investigations and launched cost-saving initiatives, fostering a culture of cost efficiency.
- Quick savings: We identified “low-hanging fruit” for savings and increased our cost visibility from 25% to 66%.
Cons:
- Limited visibility: It was difficult to track every component, as we lacked the dimensions needed for a full overview.
- Labor-intensive maintenance: Outdated rules needed regular updates, and queries required frequent adjustments as new components were introduced.
- Continuous manual tagging: Detecting new values and updating tags was an ongoing challenge.
Additionally, during this period, teams began adopting their own tagging methods — using DBT metadata, Beam labels, etc. — which led to fragmented systems of cost tracking. Despite everyone’s efforts, this lack of cohesion made it difficult to maintain a clear, unified view of cloud costs.
The Native Tagging Period: Overcoming Fragmentation and Delays
As our cloud usage grew, it became clear that we needed a more comprehensive solution. We launched an ambitious program to tag all costs by client and feature, including subcomponents for deeper troubleshooting.
We developed a taxonomy with two key axes:
- By client (aka Domain)
- By App / Feature / Step Name (a hierarchy)
We also aimed to track responsible teams, ensuring clear ownership when cost anomalies occurred. Our goal was to achieve 80–90% tagging coverage within six months.
In reality, the process took closer to 12–18 months to complete, with frequent checks for untagged components. Despite strong C-level support for controlling costs, progress was slower than anticipated.
The technical labels implemented during this period did enable quicker investigations, allowing us to pinpoint the root causes of cost increases more efficiently. Collaboration across teams remained limited, however, because different teams used separate dashboards, and the complexity of modifying those dashboards made it difficult to create a unified, actionable view.
Benefits of Our FinOps Investment
Despite the challenges, our overall FinOps journey — including the native tagging effort — delivered significant advantages:
- Informed budget discussions: We were able to have smarter, data-driven conversations about the next year’s budget, taking into account new client growth, data storage needs, and new features.
- Early detection of cost issues: Previously, it could take up to 90 days to detect a 20% cost increase. With weekly tracking and better visibility, we reduced detection time to just two weeks, enabling faster corrective action.
- A cost-saving culture: By introducing cost-saving sprints and integrating cost reviews into our architecture discussions, we fostered a culture where cost implications were considered alongside revenue projections for new features.
- Prepared for due diligence: During the acquisition of Tinyclues, having a detailed understanding of our costs enabled more productive discussions with our CFO and potential buyers. Even though we could track only 70–80% of our costs at that point, we were able to estimate marginal costs per client more accurately.
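The weekly tracking mentioned in the list above can be sketched as a comparison of each week's cost against a trailing baseline. This is a minimal illustration and not the process described in this article: the function name, the four-week window, and the sample figures are assumptions. Only the 20% threshold comes from the text.

```python
from statistics import mean


def flag_increases(weekly_costs, window=4, threshold=0.20):
    """Return (week_index, relative_change) for each week whose cost is at
    least `threshold` above the mean of the previous `window` weeks."""
    flagged = []
    for i in range(window, len(weekly_costs)):
        baseline = mean(weekly_costs[i - window:i])
        if baseline <= 0:
            continue  # no meaningful baseline to compare against
        change = (weekly_costs[i] - baseline) / baseline
        if change >= threshold:
            flagged.append((i, change))
    return flagged


# Made-up weekly totals: flat spend, then a jump in week 4.
costs = [1000.0, 1000.0, 1000.0, 1000.0, 1250.0, 1260.0]
alerts = flag_increases(costs)
```

A trailing baseline catches a sudden jump. A slow drift that accumulates over many weeks would need a longer window or a fixed reference period.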
What Could Have Made This Journey Faster
In hindsight, several things could have accelerated our FinOps journey:
- Tagging progress tracking: A system to monitor untagged components, detect new values, and track trends over time would have saved us from manual checks.
- Clear ownership of tags: Assigning clear responsibility for each tag, and documenting its purpose, would have reduced confusion and streamlined troubleshooting. Teams often struggled to determine who was responsible for specific tags.
- Scalable tools: A more industrialized tool for managing what we now call Virtual Dimensions (our initial SQL query), reallocating costs, and detecting unused tags would have made the process more efficient.
- Marginal cost visibility: At times, costs increased while marginal costs decreased due to higher client usage. A system to easily track marginal costs alongside real costs would have been invaluable.
- Interactive cost navigation: A tool that allowed us to explore monthly cost changes and identify contributing factors would have reduced investigation times from weeks to minutes.
- Flexible alerts: A dynamic alert system could have helped us catch anomalies and unexpected cost spikes more quickly.
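To make the marginal cost point concrete, here is a small sketch that reports cost per client next to total cost. It uses average cost per active client as a simple stand-in for marginal cost, which is an assumption, and the monthly figures are invented for illustration.

```python
def cost_per_client(total_cost, active_clients):
    """Average cost per active client, used here as a proxy for marginal cost."""
    if active_clients <= 0:
        raise ValueError("active_clients must be positive")
    return total_cost / active_clients


# Made-up monthly figures: total cost rises while cost per client falls.
months = [
    {"month": "month 1", "cost": 10000.0, "clients": 10},
    {"month": "month 2", "cost": 12000.0, "clients": 15},
]

report = [
    (m["month"], m["cost"], cost_per_client(m["cost"], m["clients"]))
    for m in months
]
for month, total, per_client in report:
    print(f"{month}: total={total:.0f} per_client={per_client:.0f}")
```

Read together, the two columns show whether a rising bill reflects growth in usage or a loss of efficiency.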
5 Key Lessons Learned
1. Don’t Expect to Fix Everything with a Holistic Native Tag Program
Implementing a full-blown native tag program takes too long and becomes outdated quickly. However, there are shortcuts that can capture 80% of the value of native tags in one go. For example:
- Capturing the dag_id if you’re using Airflow,
- Using a tagging macro in DBT,
- Capturing basic Kubernetes information like namespace and pod name.
We’ll share more in an upcoming article on the tricks we’ve discovered.
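As one possible sketch of these shortcuts, the helper below turns identifiers that already exist at runtime (an Airflow dag_id, a Kubernetes namespace and pod name) into label-friendly values. The function names and label keys are hypothetical. The character and length limits follow the common cloud-label convention of lowercase letters, digits, underscores, and dashes, up to 63 characters, so check your provider's exact rules. How the labels are attached to jobs or resources is left out on purpose.

```python
import re


def to_label_value(raw, max_length=63):
    """Lowercase the value, replace unsupported characters with '_',
    and truncate to the assumed maximum label length."""
    value = re.sub(r"[^a-z0-9_-]", "_", raw.lower())
    return value[:max_length]


def system_labels(dag_id=None, namespace=None, pod_name=None):
    """Build a label dict from whichever runtime identifiers are available."""
    candidates = {
        "dag_id": dag_id,
        "k8s_namespace": namespace,
        "k8s_pod": pod_name,
    }
    return {key: to_label_value(value) for key, value in candidates.items() if value}


labels = system_labels(dag_id="Daily.Export-V2", namespace="billing")
```

Because these values come from the orchestrator or the cluster, they stay current without anyone maintaining them by hand.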
2. Use a Hybrid Approach for Flexibility and Speed
A hybrid approach, combining native tags with system-generated tags and virtual dimensions, provides far more flexibility. It allows you to start gaining visibility immediately, without waiting for full native tag coverage, and lets you adapt to changes in real time.
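One way to picture the hybrid approach is as a fallback chain: use the native tag when it exists, otherwise a system-generated tag, otherwise a virtual-dimension rule. The sketch below assumes a hypothetical row shape, a native label called feature, a system label called dag_id, and an invented rule that maps a single-purpose service to a feature. None of these names come from a real billing schema.

```python
def attribute(row, virtual_rules):
    """Return (value, source) for a cost row, trying native tags first,
    then system-generated tags, then virtual-dimension rules."""
    labels = row.get("labels", {})
    if labels.get("feature"):
        return labels["feature"], "native"
    if labels.get("dag_id"):
        return labels["dag_id"], "system"
    for predicate, feature in virtual_rules:
        if predicate(row):
            return feature, "virtual"
    return None, "unallocated"


# Hypothetical rule: a service used by a single feature is attributed to it.
virtual_rules = [
    (lambda row: row.get("service") == "Cloud Pub/Sub", "event_ingestion"),
]

rows = [
    {"service": "BigQuery", "labels": {"feature": "scoring", "dag_id": "daily_export"}},
    {"service": "BigQuery", "labels": {"dag_id": "daily_export"}},
    {"service": "Cloud Pub/Sub", "labels": {}},
    {"service": "Cloud Storage"},
]
results = [attribute(row, virtual_rules) for row in rows]
```

Keeping the source alongside the value shows how much of the allocation still depends on rules, which helps decide where native tags are worth adding next.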
3. Building a Cost-Responsibility Culture Takes Time
Creating a culture where cost efficiency is a priority requires consistent effort. Regular cost-saving sprints and holding teams accountable for actual savings, rather than projections, are essential to embedding this mindset across teams.
4. Imperfect Visibility Now Beats Waiting for Perfection
It’s better to start with what you have, even if it’s not perfect. Early visibility will help you identify quick wins and prioritize where native tags are most needed. For example, a service that’s only used by one feature can immediately have its costs attributed to that feature.
5. Don’t Wait for Your CFO to Ask
Take control of your cloud costs before your CFO asks you to. Proactively managing cloud costs takes time, but it’s crucial for long-term success and preparedness.
How Costory Can Help
All the tools and processes I wish we had during our FinOps journey are now part of Costory’s solution. Whether it’s hybrid tagging, marginal cost tracking, interactive analytics, or scalable cost management, Costory provides everything you need to take control of your cloud costs and make smarter financial decisions.