What is Observability?
Observability is the practice of understanding a system’s internal state by examining its outputs. In software systems, this means understanding what’s happening inside your applications and infrastructure through the data they generate.
This field has evolved significantly and can be understood through two distinct generations of observability approaches.
The first generation, often called Observability 1.0, was built around the traditional “three pillars” approach of metrics, logs, and traces. This approach required multiple tools and data stores for different types of telemetry. It also often forced engineers to pre-define what they wanted to measure, and maintaining multiple systems was costly and complex.
Modern observability, or Observability 2.0, takes a fundamentally different approach. It’s based on collecting wide, structured events for each unit of work (e.g., an HTTP request and response) in our system. This approach captures high-cardinality data, such as user IDs, request IDs, Git commit hashes, instance IDs, Kubernetes pod names, specific route parameters, and vendor transaction IDs. A rule of thumb is to add a piece of metadata if it could help us understand how the system behaves.
This rich data collection enables dynamic slicing and dicing of data without pre-defining metrics. Teams can derive metrics, traces, and other visualizations from this base data, allowing them to answer complex questions about system behavior that weren’t anticipated when the instrumentation was first added.
However, implementing modern observability capabilities presents its own challenges. Organizations need reliable ways to collect, process, and export this rich telemetry data across diverse systems and technologies. While modern approaches have evolved beyond traditional boundaries, understanding the fundamental building blocks of observability remains crucial.
The three pillars of observability
To better understand how observability has evolved and works in practice, let’s examine the three pillars of observability: logs, metrics, and traces. While modern observability has moved beyond treating these as separate concerns, they remain fundamental concepts for understanding different aspects of system behavior.
- Logs - Text-based records of discrete events that occur within a system. These provide detailed context about specific occurrences, errors, and state changes.
- Metrics - Numerical measurements collected over time. These include counters, gauges, and histograms that help track system performance, resource usage, and business KPIs.
- Traces - Records that track the journey of requests as they flow through distributed systems. These help understand the relationships between services and identify performance bottlenecks.
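
To make the idea of deriving metrics from wide, structured events more concrete, here is a minimal sketch in Python. It is an illustration only: the event fields (`request_id`, `user_id`, `route`, `status`, `duration_ms`, `commit`, `pod`), their values, and the helper names `to_log_line` and `error_rate_by` are all hypothetical and not taken from any specific tool.

```python
import json
from collections import defaultdict

# Hypothetical wide events: one structured record per unit of work
# (here, an HTTP request). Field names and values are assumptions.
EVENTS = [
    {"request_id": "r1", "user_id": "u42", "route": "/checkout", "status": 200,
     "duration_ms": 120, "commit": "a1b2c3d", "pod": "web-0"},
    {"request_id": "r2", "user_id": "u7", "route": "/checkout", "status": 500,
     "duration_ms": 940, "commit": "a1b2c3d", "pod": "web-1"},
    {"request_id": "r3", "user_id": "u42", "route": "/search", "status": 200,
     "duration_ms": 35, "commit": "a1b2c3d", "pod": "web-0"},
    {"request_id": "r4", "user_id": "u9", "route": "/checkout", "status": 200,
     "duration_ms": 180, "commit": "e4f5a6b", "pod": "web-1"},
]


def to_log_line(event):
    """Serialize one event as a structured (JSON) log line."""
    return json.dumps(event, sort_keys=True)


def error_rate_by(events, field):
    """Derive an error-rate metric grouped by any field chosen at query time."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        key = event[field]
        totals[key] += 1
        if event["status"] >= 500:  # treat 5xx responses as errors
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


if __name__ == "__main__":
    print(to_log_line(EVENTS[0]))
    # The same raw events answer questions nobody pre-defined a metric for.
    print(error_rate_by(EVENTS, "route"))
    print(error_rate_by(EVENTS, "pod"))
    print(error_rate_by(EVENTS, "commit"))
```

Because the grouping field is just an argument, the same events can be sliced by route, pod, or commit without adding new instrumentation. A real system would typically run this kind of aggregation in a database rather than in application code.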
The benefits of observability
While the technical aspects of observability - logs, metrics, and traces - are well understood, the business benefits are equally important to consider. In their book “Observability Engineering” (O’Reilly, 2022), Charity Majors, Liz Fong-Jones, and George Miranda draw from industry research and anecdotal feedback to identify four key business benefits that organizations can expect from implementing proper observability practices. Let’s examine these benefits:
Higher incremental revenue
The authors note that observability tools that help teams improve uptime and performance can lead to increased incremental revenue through improved code quality. This manifests in several ways:
- Improved customer experience: Fast problem resolution and prevention of service degradation lead to higher customer satisfaction and retention
- Increased system reliability: Better uptime means more successful transactions and fewer lost business opportunities
- Enhanced performance: The ability to identify and optimize performance bottlenecks helps maintain responsive services that keep customers engaged
- Competitive advantage: Organizations that can maintain high service quality through comprehensive monitoring and quick issue resolution often gain an edge over competitors
Cost savings from faster incident response
One of the most immediate benefits of observability is reduced labor costs through faster detection and resolution of issues. This comes from:
- Reduced Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR)
- Improved query response times, enabling faster investigation
- Quicker identification of performance bottlenecks
- Reduced time spent on-call
- Fewer resources wasted on unnecessary rollbacks
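
As a rough illustration of the MTTD and MTTR figures mentioned above, the following Python sketch computes both from a list of incident timestamps. The incident records are invented sample data, and the definitions used here are assumptions: MTTD is measured from incident start to detection, and MTTR from incident start to resolution. Some teams measure MTTR from detection instead.

```python
from datetime import datetime

# Hypothetical incident records; the timestamps are made up for illustration.
INCIDENTS = [
    {"started": datetime(2024, 1, 1, 10, 0),
     "detected": datetime(2024, 1, 1, 10, 10),
     "resolved": datetime(2024, 1, 1, 11, 0)},
    {"started": datetime(2024, 1, 5, 14, 0),
     "detected": datetime(2024, 1, 5, 14, 20),
     "resolved": datetime(2024, 1, 5, 14, 50)},
]


def mean_minutes(deltas):
    """Average a collection of timedeltas and return the result in minutes."""
    deltas = list(deltas)
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60


def mttd_minutes(incidents):
    """Mean Time to Detect: average of (detected - started)."""
    return mean_minutes(i["detected"] - i["started"] for i in incidents)


def mttr_minutes(incidents):
    """Mean Time to Resolve: average of (resolved - started), by assumption."""
    return mean_minutes(i["resolved"] - i["started"] for i in incidents)


if __name__ == "__main__":
    print(f"MTTD: {mttd_minutes(INCIDENTS):.1f} min")  # 15.0 min
    print(f"MTTR: {mttr_minutes(INCIDENTS):.1f} min")  # 55.0 min
```

Tracking these two numbers over time is one simple way to check whether an observability investment is actually shortening incidents.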
Cost savings from incidents avoided
Observability doesn’t just help resolve issues faster - it helps prevent them entirely. The authors emphasize how teams can prevent critical issues by:
- Identifying potential problems before they become critical
- Analyzing patterns to prevent recurring issues
- Understanding system behavior under different conditions
- Proactively addressing performance bottlenecks
- Making data-driven decisions about system improvements
Cost savings from decreased employee churn
One of the most overlooked benefits is the impact on team satisfaction and retention. The authors highlight how observability leads to:
- Improved job satisfaction through better tooling
- Decreased developer burnout from fewer unresolved issues
- Reduced alert fatigue through better signal-to-noise ratio
- Lower on-call stress due to better incident management
- Increased team confidence in system reliability
“I couldn’t believe it. I actually had to go back a couple of times just to make sure that I was querying it properly… this is coming back too fast. This doesn’t make sense.”
As the authors emphasize, while the specific measures of these benefits may vary depending on the tools and implementation, these fundamental improvements can be expected across organizations that adopt robust observability practices. The key is choosing and implementing the right tools effectively to maximize these benefits.
Achieving these benefits requires overcoming several significant hurdles. Even organizations that understand the value of observability often find that implementation presents unexpected complexities and challenges that must be carefully navigated.