Observability

Observability is the ability to understand the internal state of a system from its external outputs. The three pillars are logs, metrics, and traces – together enabling deep insights into distributed systems.

What Is Observability?

Observability is the ability to understand and diagnose the internal state of an IT system from its external outputs (logs, metrics, traces). While traditional monitoring tells you THAT something is not working, observability helps you understand WHY it is not working.

For mid-market companies increasingly running distributed systems, microservices, and cloud infrastructure, observability is indispensable. Without it, troubleshooting in complex systems is like searching for a needle in a haystack.

The Three Pillars of Observability

Logs

Logs are timestamped records of discrete events. They tell the story of what happened in your system: an error message, a successful login, a failed database access. Structured logs (JSON format) are significantly more searchable than unstructured text logs.
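
As an illustration of structured logging, here is a minimal sketch using only Python's standard `logging` module. The logger name `orders`, the `context` attribute, and the field names `order_id` and `duration_ms` are assumptions chosen for this example, not part of any particular stack.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via extra={"context": {...}} become attributes
        # on the record; "context" is a name made up for this sketch.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str = "orders") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


logger = build_logger()
logger.error(
    "database access failed",
    extra={"context": {"order_id": "A-1001", "duration_ms": 412}},
)
```

Because every entry is a JSON object with stable keys, a log backend can filter on a field such as `order_id` directly instead of parsing free text.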

Practical example: A logistics company identifies from aggregated logs that order processing slows down every Monday at 9 AM for 15 minutes – exactly when the weekly import of legacy ERP data runs.

Metrics

Metrics are numerical values describing your system's state at a specific point in time: CPU utilisation, response time, error rate, number of active connections. Unlike logs, metrics are aggregated and require little storage, making them ideal for long-term trend analysis.

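Essential metrics for any application: request rate (traffic), response time (latency), error rate, and saturation (how close a resource is to its capacity limit). These are the "Four Golden Signals" described in Google's Site Reliability Engineering book, and they are a good starting point for detecting most user-facing problems.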
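
To make the four signals concrete, the following sketch computes them from a small batch of request records. The `Request` type, the 95th-percentile choice, the definition of an error as any status of 500 or above, and the `in_flight`/`capacity` inputs for saturation are all assumptions made for this example; in practice a metrics system would compute these from counters and histograms.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Request:
    duration_ms: float
    status: int


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def golden_signals(
    requests: list[Request],
    window_seconds: float,
    in_flight: int,
    capacity: int,
) -> dict:
    if not requests:
        raise ValueError("need at least one request in the window")
    durations = [r.duration_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "traffic_rps": len(requests) / window_seconds,
        "latency_p95_ms": percentile(durations, 95),
        "error_rate": errors / len(requests),
        "saturation": in_flight / capacity,
    }


sample = [
    Request(100, 200),
    Request(120, 200),
    Request(80, 200),
    Request(900, 500),
]
print(golden_signals(sample, window_seconds=2, in_flight=8, capacity=10))
```

A percentile is used for latency rather than the mean because a single slow request, like the 900 ms one in the sample, is easy to miss in an average.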

Traces (Distributed Tracing)

Traces follow a single request through multiple services. In a microservices system, a single API call can trigger ten or more internal service calls. A trace shows which service took how long and where bottlenecks exist.
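
A trace can be thought of as a tree of timed spans. The sketch below models that idea with a hypothetical `Span` type and made-up service names and timings; real tracing systems record more fields, but the parent/child structure is the core of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render(spans: list[Span], parent_id: Optional[str] = None, depth: int = 0) -> list[str]:
    """Return the trace as indented lines, children ordered by start time."""
    lines = []
    children = sorted(
        (s for s in spans if s.parent_id == parent_id),
        key=lambda s: s.start_ms,
    )
    for span in children:
        lines.append(f"{'  ' * depth}{span.service}: {span.duration_ms} ms")
        lines.extend(render(spans, span.span_id, depth + 1))
    return lines


def slowest_child(spans: list[Span], parent_id: str) -> Span:
    """Return the direct child span that took the longest."""
    children = [s for s in spans if s.parent_id == parent_id]
    return max(children, key=lambda s: s.duration_ms)


trace = [
    Span("1", None, "api-gateway", 0, 300),
    Span("2", "1", "auth", 5, 25),
    Span("3", "1", "catalog", 30, 90),
    Span("4", "1", "pricing", 95, 280),
]
print("\n".join(render(trace)))
print("bottleneck:", slowest_child(trace, "1").service)
```

In this invented trace, `pricing` accounts for 185 of the 300 ms, which is the kind of finding a trace view surfaces at a glance.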

Practical example: An e-commerce company discovers that the product page takes 3 seconds to load. The trace reveals that 2.5 seconds are spent on a slow recommendation service being called sequentially rather than in parallel.
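
The difference between sequential and parallel calls can be sketched with `asyncio`. The service names and delays below are invented and scaled down to fractions of a second; `asyncio.sleep` stands in for real network calls.

```python
import asyncio
import time


async def call_service(name: str, seconds: float) -> str:
    # Stand-in for a network call; the sleep simulates its latency.
    await asyncio.sleep(seconds)
    return name


async def load_sequential(calls: list[tuple[str, float]]) -> list[str]:
    # Each call waits for the previous one: total time is the sum.
    return [await call_service(name, seconds) for name, seconds in calls]


async def load_parallel(calls: list[tuple[str, float]]) -> list[str]:
    # All calls start together: total time is roughly the slowest call.
    results = await asyncio.gather(
        *(call_service(name, seconds) for name, seconds in calls)
    )
    return list(results)


def timed(coro) -> tuple[list[str], float]:
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


CALLS = [("product", 0.1), ("reviews", 0.1), ("recommendations", 0.2)]

seq_result, seq_seconds = timed(load_sequential(CALLS))
par_result, par_seconds = timed(load_parallel(CALLS))
print(f"sequential: {seq_seconds:.2f} s, parallel: {par_seconds:.2f} s")
```

Parallelising only helps when the calls are independent of each other. It also does not make the slowest service faster; it just stops the other calls from adding to the total.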

Observability vs. Monitoring

Monitoring and observability are often confused but complement each other:

  • Monitoring: You define in advance what to observe (e.g., CPU > 80% → alert). Monitoring answers known questions.
  • Observability: You can ask any question of your system, including ones you did not anticipate when setting up. Observability answers unknown questions.
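
The contrast can be sketched in a few lines. The first function is a predefined check of the kind monitoring relies on; the second slices raw events by any field, including ones nobody planned to look at. The event fields (`region`, `customer_tier`, `duration_ms`) and their values are invented for this example.

```python
from collections import defaultdict


def cpu_alert(cpu_percent: float, threshold: float = 80.0) -> bool:
    """Monitoring: a question defined in advance."""
    return cpu_percent > threshold


def mean_by(events: list[dict], field: str, value_field: str = "duration_ms") -> dict:
    """Observability: group raw events by any field chosen at query time."""
    groups = defaultdict(list)
    for event in events:
        groups[event[field]].append(event[value_field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


EVENTS = [
    {"region": "eu", "customer_tier": "free", "duration_ms": 100},
    {"region": "eu", "customer_tier": "paid", "duration_ms": 120},
    {"region": "us", "customer_tier": "free", "duration_ms": 110},
    {"region": "us", "customer_tier": "paid", "duration_ms": 900},
]

print(cpu_alert(85.0))
print(mean_by(EVENTS, "region"))
print(mean_by(EVENTS, "customer_tier"))
```

The ad hoc query only works because the events carry rich context. A dashboard that stored nothing but an overall average could not be broken down by region or customer tier after the fact.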

Monitoring is a subset of observability. Good monitoring is necessary but not sufficient for true observability.

Observability Tools and Stacks

  • Open-Source Stack: Prometheus (metrics) + Grafana (dashboards) + Loki (logs) + Tempo (traces). Cost-effective but with operational overhead.
  • ELK/EFK Stack: Elasticsearch + Logstash/Fluentd + Kibana. Strong for log analysis but resource-intensive.
  • Cloud-native: AWS CloudWatch, Azure Monitor, Google Cloud Operations. Well-integrated with the respective cloud but tied to that provider.
  • SaaS Solutions: Datadog, New Relic, Dynatrace. Comprehensive, but costs rise as data volume grows.
  • OpenTelemetry: Open standard for telemetry data. Enables implementing instrumentation once and sending to various backends.
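
The "instrument once, send to various backends" idea from the last item can be illustrated with a plain-Python sketch. This is not the OpenTelemetry API: `Exporter`, `InMemoryExporter`, `PrintExporter`, and `Telemetry` are hypothetical names used only to show the decoupling between application code and backends.

```python
from typing import Protocol


class Exporter(Protocol):
    """Anything that can receive a telemetry event (hypothetical interface)."""

    def export(self, event: dict) -> None: ...


class InMemoryExporter:
    """Collects events in a list, e.g. for tests."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def export(self, event: dict) -> None:
        self.events.append(event)


class PrintExporter:
    """Writes events to stdout; stands in for a real backend."""

    def export(self, event: dict) -> None:
        print(event)


class Telemetry:
    """Application code calls record(); exporters decide where data goes."""

    def __init__(self, exporters: list[Exporter]) -> None:
        self._exporters = exporters

    def record(self, name: str, **attributes) -> None:
        event = {"name": name, **attributes}
        for exporter in self._exporters:
            exporter.export(event)


memory = InMemoryExporter()
telemetry = Telemetry([memory, PrintExporter()])
telemetry.record("order.created", order_id="A-1001")
```

Switching or adding a backend means changing the list of exporters, while the `record` calls spread through the application stay untouched.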

Recommendation for Mid-Market Companies

For getting started, we recommend the Prometheus + Grafana + Loki stack. It is open source, well-documented, and covers metrics, logs, and dashboards. For distributed tracing, add Tempo or Jaeger. If the team is small and operational overhead needs to be minimised, a SaaS solution like Datadog can be the better choice despite higher costs.

Frequently Asked Questions About Observability

What is the difference between monitoring and observability?

Monitoring answers predefined questions (e.g., "Is CPU above 80%?"). Observability lets you ask any question of your system, including ones you had not considered. Monitoring tells you that something is broken; observability helps you understand why.

Which tools are suitable for getting started?

For getting started, we recommend Prometheus + Grafana + Loki (open source, no licence fees). For comprehensive observability without operational overhead, SaaS solutions like Datadog or New Relic are an option but cost $500–$5,000/month depending on data volume.

What are the three pillars of observability?

The three pillars are logs (discrete events as text or JSON), metrics (numerical time-series data like CPU utilisation), and traces (tracking a request through distributed services). Together, they provide a complete picture of system state.

Do monolithic applications also benefit from observability?

Yes, monoliths also benefit from observability. Metrics and logs help with performance analysis and troubleshooting. Distributed tracing becomes most important with microservices, but even a monolith has external dependencies (databases, APIs) that should be traced.

Last updated: April 2026