How to actually see, trust, and operate your Python systems in production - with Grafana, Loki, Promtail, Traefik, Docker Compose, and real-world tradeoffs

Who This Article Is For
This is written for CEOs, CTOs, and engineering leads running real businesses - not DevRel demos. If you have a Django/Python backend, you deploy with Docker Compose, and your team is small-to-mid-sized, this article is for you.
This is explicitly NOT for FAANG-scale infrastructure, teams with dedicated SRE orgs, or people who want to try Kubernetes because it sounds impressive. Observability is about reducing business risk. Full stop.
1. What "Observability" Actually Means
Observability is one of the most abused words in the industry. Let's cut through it.
Observability = Can I answer "what is broken, why, and how bad is it" in under 5 minutes?
Break it down into four simple primitives:
- Logs → What happened
- Metrics → How bad / how often
- Errors → What users are feeling
- Alerts → When a human must wake up

As a CEO or CTO, the real cost of downtime is never just the server bill. It's lost customer trust, missed revenue, and engineer burnout from firefighting instead of building.
2. The Non-Negotiables of a Robust Django Backend
2.1 Deterministic Deployments
Same code + same config must equal same behavior. "It works on my machine" is a cultural and engineering failure. Docker Compose enforces determinism at the service boundary level, making it the right tool for most teams.
2.2 Visibility Over Cleverness
Prefer boring tools that work at 2 AM. Avoid infrastructure that only one developer fully understands. If your most senior engineer gets hit by a bus, can the rest of the team keep the lights on?
2.3 Human-Readable Failure
When something breaks, the logs and errors need to be readable by the on-call engineer, the CTO, and occasionally even the CEO. If your stack requires a PhD to interpret a failure, it's the wrong stack.
3. Your Logging Stack: Promtail + Loki + Grafana

Most teams start with print() statements or ad-hoc logging. This fails at scale. The modern, lightweight answer for Docker Compose environments is the PLG stack: Promtail, Loki, and Grafana.
3.1 What Each Tool Does
Promtail - The Log Shipper
Promtail runs as a sidecar or Docker container and tails your application's log files or Docker log streams. It ships log lines to Loki with labels attached, like service name, environment, and container ID. Think of it as your log collector that runs silently in the background.
# promtail-config.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: django
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        target_label: container
Loki - The Log Aggregation Engine
Loki is Grafana Labs' answer to Elasticsearch - but designed specifically for logs, not full-text search. It indexes only the labels (metadata), not the full log content, making it dramatically cheaper and faster for small to mid-sized teams.
You do not need to run Elasticsearch, Logstash, or Kibana (the ELK stack). Loki does the same job for a fraction of the operational cost and complexity.
Key insight: Loki is to logs what Prometheus is to metrics. It speaks the same query language family (LogQL vs PromQL) and integrates natively into Grafana.
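To make the LogQL point concrete, here is a sketch of querying Loki's `query_range` HTTP API from Python. The base URL and the `container` label are assumptions that match the Promtail config shown earlier; adjust them to whatever labels your setup attaches.

```python
from urllib.parse import urlencode

def loki_error_query_url(base_url: str, container: str = "django") -> str:
    """Build a Loki query_range URL for recent log lines containing ERROR.

    The LogQL below selects streams by the "container" label and then
    filters lines with the |= operator.
    """
    logql = f'{{container="{container}"}} |= "ERROR"'
    params = urlencode({"query": logql, "limit": 100})
    return f"{base_url}/loki/api/v1/query_range?{params}"

# In a real script you would fetch this with requests.get(...):
# loki_error_query_url("http://localhost:3100")
```

The same query pasted into Grafana's Explore view works unchanged, which is exactly the "single query language family" benefit.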
Grafana - The Visualization Layer
Grafana is your single pane of glass. It connects to Loki for logs, Prometheus for metrics, and can even display Sentry error counts - all in one dashboard. For Django teams, this means you finally have one place to look when something goes wrong.
Example Grafana setup in Docker Compose:
services:
  loki:
    image: grafana/loki:2.9.0
    ports: ["3100:3100"]
    volumes:
      - ./loki-config.yaml:/etc/loki/config.yaml
  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./promtail-config.yaml:/etc/promtail/config.yaml
  grafana:
    image: grafana/grafana:10.0.0
    ports: ["3000:3000"]
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=yourpassword
3.2 What to Log in Django
Use structured JSON logging. Random print() calls and unformatted strings become noise at scale. Every log entry should include at minimum:
- timestamp - ISO 8601 format
- request_id / correlation_id - for tracing a request across services
- service name - which Django service or worker emitted this
- environment - staging vs production
- severity - DEBUG, INFO, WARNING, ERROR, CRITICAL

Never log secrets, tokens, API keys, or PII without masking. This is both a security issue and a compliance issue.
3.3 Structured Logging Setup in Django
# settings.py
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "json": {
            "()": "pythonjsonlogger.jsonlogger.JsonFormatter",
            "format": "%(asctime)s %(name)s %(levelname)s %(message)s",
        }
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "json",
        }
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}
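The config above does not yet emit a request_id. One common pattern is a logging filter backed by a contextvar that middleware sets per request; the middleware function and variable names below are our own sketch, not a Django built-in.

```python
import logging
import uuid
from contextvars import ContextVar

# Holds the current request's correlation ID for the duration of the request.
request_id_var: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIDFilter(logging.Filter):
    """Inject request_id into every record so the JSON formatter can emit it."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

# Django middleware sketch (register it in MIDDLEWARE):
# def request_id_middleware(get_response):
#     def middleware(request):
#         request_id_var.set(request.headers.get("X-Request-ID", uuid.uuid4().hex))
#         return get_response(request)
#     return middleware
```

Then add the filter to the console handler and include `%(request_id)s` in the formatter's format string, and every log line becomes traceable to one request.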
4. Reverse Proxying: Traefik vs Nginx

When you run multiple services in Docker Compose - a Django web app, a Celery worker dashboard, Grafana, etc. - you need a reverse proxy to route traffic and handle TLS. The two main options are Nginx and Traefik.
4.1 Nginx - The Reliable Veteran
Nginx has been the standard reverse proxy for over a decade. It's battle-tested, well-documented, and every developer has seen it before. For Django, it typically sits in front of Gunicorn and handles static files, SSL termination, and rate limiting.
The limitation: Nginx configuration is static. Every time you add a new service or change a port, you need to manually update the nginx.conf and reload. For small, stable setups this is fine.
# nginx.conf example for Django + Gunicorn
server {
    listen 80;
    server_name yourdomain.com;

    location /static/ {
        alias /app/static/;
    }

    location / {
        proxy_pass http://web:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
4.2 Traefik - The Container-Native Proxy
Traefik is designed specifically for dynamic containerized environments. Instead of a static config file, Traefik reads Docker labels from your containers and automatically configures routing. Add a new service, label it, and Traefik routes to it - no config reload required.
For teams using Docker Compose, Traefik offers three major advantages over Nginx:
- Automatic TLS via Let's Encrypt - zero manual certificate management
- Docker-native service discovery - routes update automatically as containers start and stop
- Built-in dashboard - a simple UI to see all active routes and health checks
# docker-compose.yml with Traefik
services:
  traefik:
    image: traefik:v2.10
    command:
      - "--providers.docker=true"
      - "--entrypoints.websecure.address=:443"
      # ACME needs a challenge type; TLS-ALPN is the simplest for a single host
      - "--certificatesresolvers.le.acme.tlschallenge=true"
      - "--certificatesresolvers.le.acme.email=you@company.com"
      - "--certificatesresolvers.le.acme.storage=/certs/acme.json"
    ports: ["80:80", "443:443"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./certs:/certs
  web:
    image: yourdjangoapp:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.web.rule=Host(`yourdomain.com`)"
      - "traefik.http.routers.web.entrypoints=websecure"
      - "traefik.http.routers.web.tls.certresolver=le"
4.3 Which Should You Choose?
Use Nginx if your service topology is stable and your team already knows it. The configuration is explicit and easy to audit.
Use Traefik if you run multiple services that change frequently, want zero-touch TLS certificate management, or want a dashboard showing your live routing configuration. For teams adopting a PLG stack (Promtail + Loki + Grafana), Traefik fits more naturally because the whole stack leans into container-native tooling.
Our recommendation: Traefik for new setups running Docker Compose with multiple services. Nginx for simple single-service Django apps where you want maximum control and familiarity.
5. Error Tracking That Engineers Actually Check
5.1 Why Silent Failures Kill Businesses
Errors that don't crash your server still kill revenue. A user who hits a broken checkout flow, a payment that fails silently, a background job that stops processing - none of these necessarily trigger an alert in a naive setup. "Users complained" is not observability.
5.2 Sentry or GlitchTip
Sentry is the industry standard for error tracking. It captures full stack traces, request context, user impact counts, and environment tags (staging vs production). GlitchTip is the open-source self-hosted alternative with a compatible API, making it a viable option for teams with data residency requirements.
For Django, setup is a few lines:
# pip install sentry-sdk
import sentry_sdk

sentry_sdk.init(
    dsn="https://your-dsn@sentry.io/project",
    environment="production",
    traces_sample_rate=0.1,
)
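Sentry's `before_send` hook lets you scrub sensitive data before events ever leave your servers, which pairs with the "never log secrets or PII" rule above. A minimal sketch; the field names are illustrative and real event payloads vary:

```python
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization", "cookie"}

def scrub_event(event, hint=None):
    """Mask sensitive request data before the event is sent to Sentry/GlitchTip."""
    request = event.get("request", {})
    for section in ("headers", "data"):
        payload = request.get(section)
        if isinstance(payload, dict):
            for key in payload:
                if key.lower() in SENSITIVE_KEYS:
                    payload[key] = "[Filtered]"
    return event

# Wire it up in sentry_sdk.init:
# sentry_sdk.init(dsn=..., environment=..., before_send=scrub_event)
```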
5.3 Alert Fatigue is Worse Than No Alerts
Most teams disable alerts within weeks because they fire too often. The solution is tuning, not silence. Set error rate thresholds rather than individual error counts. Alert on new errors that haven't been seen before. Build in regression detection so old bugs that re-emerge get flagged.
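To make "rate, not count" concrete, here is a sketch of a sliding-window error-rate check; the 5% threshold and 5-minute window are illustrative defaults, not recommendations for your traffic profile.

```python
from collections import deque
import time

class ErrorRateAlert:
    """Fire only when the error *rate* over a window exceeds a threshold,
    instead of paging on every individual error."""

    def __init__(self, threshold: float = 0.05, window_seconds: int = 300):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()  # (timestamp, is_error) pairs

    def record(self, is_error: bool, now=None) -> bool:
        """Record one request outcome; return True if an alert should fire."""
        now = time.time() if now is None else now
        self.events.append((now, is_error))
        # Drop events that have aged out of the window
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        errors = sum(1 for _, e in self.events if e)
        return errors / len(self.events) > self.threshold
```

The same shape works for "alert on new errors": key a dict by error fingerprint and fire only on first sight.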
6. Metrics That Matter
6.1 The Only Metrics That Actually Count
Vanity dashboards are a morale boost, not a business tool. The metrics that matter to a CEO or CTO are simple:
- Error rate - percentage of requests returning 5xx
- Response time - p50, p95, p99 latency
- Uptime - are users able to reach the service
- Queue backlog - are background jobs keeping up
- Failed background jobs - Celery or RQ task failure rate
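If you are not running Prometheus yet, even a rough percentile computation over recent request timings beats guessing. A stdlib-only sketch:

```python
import statistics

def latency_percentiles(samples_ms: list) -> dict:
    """p50/p95/p99 from a list of request durations in milliseconds.

    statistics.quantiles with n=100 returns the 1st..99th percentile
    cut points, so we index the 50th, 95th, and 99th.
    """
    q = statistics.quantiles(samples_ms, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}
```

p95 and p99 matter because averages hide the slow tail; a healthy p50 with a terrible p99 means a meaningful slice of users is suffering.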
6.2 Django + Celery Visibility
One of the most common blind spots for Django teams is background worker health. A growing Celery queue is a red alert that often goes unnoticed until a customer complains. Add Flower (the Celery monitoring dashboard) behind Traefik or Nginx, and route it to Grafana via a Prometheus exporter for metrics.
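With Celery's default Redis broker settings, each queue is a Redis list keyed by the queue name, so queue depth is one LLEN call. A sketch with the client injected so it is testable; in production you would pass a real `redis.Redis(...)` from redis-py:

```python
def celery_queue_depth(redis_client, queue: str = "celery") -> int:
    """Number of tasks waiting in a Celery queue backed by Redis.

    Assumes Celery's default Redis broker layout, where the pending
    queue is a Redis list named after the queue ("celery" by default).
    """
    return redis_client.llen(queue)

# Production usage (assumes redis-py is installed):
# import redis
# depth = celery_queue_depth(redis.Redis(host="redis", port=6379))
```

Sample this every minute and alert when it trends upward; a queue that only ever grows is the clearest "workers are falling behind" signal you will get.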
7. Alerts: Waking Humans Only When It Matters

7.1 Alert Channels by Severity
Not every problem deserves to wake someone up at 3 AM. Build a tiered alert system:
- Low severity → logs only, review in the morning
- Medium severity → Slack notification to the engineering channel
- High severity → Slack + email to the on-call engineer
- Critical → Slack + email + phone call (Twilio or PagerDuty)
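That tiering can live in a tiny dispatch table. The channel names below are placeholders to be wired to your actual Slack, email, and paging senders:

```python
from enum import IntEnum

class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

# Placeholder channel names; map each to a real sender (Slack webhook,
# SMTP, PagerDuty/Twilio API call) in your alerting code.
ROUTES = {
    Severity.LOW: ["log"],
    Severity.MEDIUM: ["log", "slack"],
    Severity.HIGH: ["log", "slack", "email"],
    Severity.CRITICAL: ["log", "slack", "email", "pager"],
}

def channels_for(severity: Severity) -> list:
    """Which channels an alert of this severity should fan out to."""
    return ROUTES[severity]
```

Keeping the routing in one table makes the escalation policy auditable in a single glance, which matters when you tune it after the first noisy week.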
7.2 Every Alert Must Answer Three Questions
If an alert doesn't answer all three of the following, it shouldn't be an alert:
- What broke?
- How bad is it?
- What should I do right now?

If the answer to "what should I do right now" is "nothing", that's not an alert - it's a log entry.
8. Docker Compose: Enough for Most Companies
8.1 What Docker Compose Does Well
Predictable environments, easy developer onboarding, clear service boundaries, and minimal operational overhead. For a team running a Django web service, one or two Celery workers, a scheduler, Redis, and PostgreSQL - Docker Compose handles this elegantly.
8.2 Production-Grade Compose Structure
A well-structured production Compose file separates concerns cleanly:
services:
  web:
    image: yourapp:latest
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      retries: 3
  worker:
    image: yourapp:latest
    command: celery -A yourapp worker -l info
    restart: always
  scheduler:
    image: yourapp:latest
    command: celery -A yourapp beat -l info
    restart: always

  # Tip: it's preferable to run the monitoring stack from a separate compose file.
  loki:
    image: grafana/loki:2.9.0
  promtail:
    image: grafana/promtail:2.9.0
  grafana:
    image: grafana/grafana:10.0.0
  traefik:
    image: traefik:v2.10
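The healthcheck above curls `/health/`, which your Django app must actually serve. A minimal sketch of such an endpoint, with the aggregation logic kept framework-free so it is easy to test; the check names and Django wiring are illustrative:

```python
def health_status(checks: dict) -> tuple:
    """Run named check callables; return (200, results) if all pass, else (503, results)."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    status = 200 if all(v == "ok" for v in results.values()) else 503
    return status, results

# Django wiring (sketch):
# from django.http import JsonResponse
# from django.db import connection
#
# def health(request):
#     status, results = health_status({
#         "database": lambda: connection.cursor().execute("SELECT 1"),
#     })
#     return JsonResponse(results, status=status)
```

Returning 503 on any failed check is what makes the compose `healthcheck` (and Traefik, if you enable its health checks) stop routing traffic to a broken container.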
9. Why Most Companies Do NOT Need Kubernetes
This is the section DevRel engineers don't want you to read.
9.1 Kubernetes Solves Organizational Problems, Not Code Problems
Kubernetes was designed for large teams, multiple deployment pipelines, and dedicated infrastructure roles. If you have fewer than 15 engineers and no dedicated SRE, Kubernetes will cost you more than it saves - in hiring, cognitive load, and slower delivery velocity.
9.2 The Hidden Costs CEOs Never See
- Hiring cost - SREs who can operate Kubernetes are expensive and scarce
- Cognitive load - every developer on the team now needs to understand Kubernetes concepts to debug production issues
- Debugging complexity - simple issues become multi-layer detective work
- Slower delivery - more infra to manage means less time shipping features
9.3 Resume-Driven Architecture
Developer motivation and business outcomes are not always aligned. Kubernetes is exciting to work on and looks great on a CV. Docker Compose is boring and doesn't. Boring infrastructure that works at 2 AM is what your business actually needs.
9.4 When Kubernetes Actually Makes Sense
- Multiple teams deploying independently to the same infrastructure
- High and unpredictable traffic volatility requiring auto-scaling
- Compliance constraints requiring fine-grained workload isolation
- Platform-level companies whose product IS the infrastructure
10. Observability as Business Insurance
You don't buy fire insurance hoping your office burns down. You buy it because the cost of not having it is catastrophic when things go wrong - and things always go wrong.
A good observability stack reduces Mean Time To Recovery (MTTR), eliminates the panic that compounds outages, ends hero culture (where only one person knows how to fix things), and lets engineers sleep at night.
Calm systems build calm teams. Calm teams build better products.
11. Final Checklist: Is My Django Backend Actually Observable?
- Structured JSON logs with request_id, service name, and environment
- Centralized log access via Loki + Grafana (not SSH-ing into servers)
- Error tracking with Sentry or GlitchTip, properly tuned alerts
- Background job visibility - Celery/RQ queue depth and failure rate
- Real alerts that answer What/How Bad/What To Do - not noise
- Predictable deployments via Docker Compose with health checks
- Reverse proxy (Traefik or Nginx) handling TLS and routing
- No infrastructure that only one person fully understands
12. A Note on Fractional CTO Work
If your team has Django services in production and you're not 100% sure what happens when things go wrong - how long it takes to detect, who gets notified, and how quickly you recover - that uncertainty is costing you money.
This is exactly where fractional CTO work pays for itself. Not in writing code, but in making sure the system you've built is actually observable, resilient, and owned by the whole team rather than one heroic individual.
Good infrastructure is invisible. You only notice it when it's missing.
