Log level 'error' should mean that something needs to be fixed

📝 Discussion Summary

Here is a summary of the 5 most prevalent themes from the Hacker News discussion on log levels:

1. ERROR logs indicate required human intervention. There is broad consensus that an "Error" log entry should signify a condition that requires immediate human attention or action. If no one needs to act on it, it should not be logged as an error. As jillesvangurp states, "Errors mean I get alerted. Zero tolerance on that from my side."

2. Distinguishing between program-level and operational failures. Users argue that not every failure within an operation constitutes an error for the overall program. Operational failures (like transient network timeouts) should often be warnings or metrics rather than errors, unless they represent a systemic failure of the application itself. shadowgovt explains: "Errors trigger pages. Warnings get bundled up into a daily report..."

3. The 'ownership' of the error matters. A key debate is whether the local system is responsible for the failure. If a dependency fails (e.g., a downstream service or database), some argue that this should not be logged as a local error when that dependency has its own monitoring. raldi notes that issues like database timeouts or internal server errors (ISEs) in downstream services should be handled by metrics rather than error logs. However, others argue that in practice, "downstream service issues... are observed in the error logs of consuming services long before they’re detected by the owners of the downstream service" (zbentley).

4. Libraries should tread carefully. The discussion highlights that libraries lack the context to determine whether a failure is actionable for the application. Consequently, libraries should generally avoid logging at high levels (Error/Warn) to prevent noise. ivan_gammel suggests, "Libraries should not log on levels above DEBUG, period," while Too notes that library errors should usually be returned to the caller rather than logged directly.

5. Logs should be actionable and context-rich. Regardless of the level chosen, log messages must provide sufficient context to allow for troubleshooting. This means avoiding vague messages and including relevant data (parameters, IDs) that help identify the root cause. As uniq7 puts it, "The error doesn't need to be extremely specific or point to the actual root cause... 'Failed to read file 'foo/bar.html'' would be acceptable."

🚀 Project Ideas

Error/Warning Log Classifier

Summary

A tool that analyzes application logs and automatically reclassifies log events based on a configurable policy, moving events like "database timeout" or "downstream 5xx" from ERROR to WARNING levels.
Core value proposition: Enforces a disciplined logging policy to reduce alert fatigue by ensuring only truly actionable errors trigger pages, while still recording potential issues for review.

Key	Value
Target Audience	DevOps engineers, SREs, and development teams managing microservices with centralized logging.
Core Feature	A log processor (sidecar or pipeline plugin) that parses structured logs, applies user-defined rules (e.g., "if source is DB layer, downgrade to WARN"), and outputs re-leveled logs to the destination.
Tech Stack	Go (for performance), the OpenTelemetry Collector SDK, or a log shipper such as Vector (Rust-based) or Fluentd.
Difficulty	Medium
Monetization	Revenue-ready: SaaS tier per GB processed; Enterprise self-hosted license.

Notes

HN users like raldi and shadowgovt explicitly defined rules for downgrading "non-defects" (database timeouts, network errors) from ERROR to WARNING. zbentley countered that downstream failures often need urgent attention in practice.
The tool addresses the gap between theoretical logging discipline and the reality of organizational inertia where "backend service operators don't prioritize monitoring."
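
A minimal sketch of the re-leveling rule described in this idea, assuming structured JSON logs with a `level` field. The field names `source` and `error_kind`, the `RULES` table, and the `original_level` marker are hypothetical choices for illustration, not part of any existing tool.

```python
import json

# Hypothetical policy: (field, expected value, new level).
# Only records currently at ERROR are considered for downgrade.
RULES = [
    ("source", "db", "WARNING"),
    ("error_kind", "downstream_5xx", "WARNING"),
]


def relevel(record: dict, rules=RULES) -> dict:
    """Return a copy of the record, downgraded if it is an ERROR matching a rule."""
    if record.get("level") != "ERROR":
        return dict(record)
    for field, expected, new_level in rules:
        if record.get(field) == expected:
            out = dict(record)
            # Keep the original level so reviewers can audit the policy.
            out["original_level"] = record["level"]
            out["level"] = new_level
            return out
    return dict(record)


def process_line(line: str) -> str:
    """Re-level one JSON log line, as a pipeline stage might."""
    return json.dumps(relevel(json.loads(line)), sort_keys=True)
```

Keeping `original_level` on the record is one way to accommodate zbentley's objection: a downgraded downstream failure remains searchable as something the emitting code considered an error.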

Local Context Enricher

Summary

A library wrapper or middleware that intercepts low-level library errors and enriches them with application context before logging, converting generic "Operation failed" logs into "Failed to process order ID 12345" errors.
Core value proposition: Solves the "low-level code doesn't have context" problem by injecting business logic context (e.g., user ID, transaction state) into logs, making them actionable without requiring the library author to guess the severity.

Key	Value
Target Audience	Backend developers building services that rely heavily on third-party libraries (DB clients, HTTP clients).
Core Feature	A context manager that wraps library calls, captures exceptions/codes, and constructs a structured log entry combining library metadata with application state defined by the developer.
Tech Stack	Language-specific APM agents (OpenTelemetry SDKs), middleware frameworks (Express, Gin, Django).
Difficulty	Low
Monetization	Hobby (Open Source) or included in a broader observability platform.

Notes

electroly highlighted the issue: "The low-level code might not know how the higher-level caller will classify a particular error."
layer8 argued that "only 'top-level' code" can properly classify errors, but this makes it hard to identify root causes. This tool bridges that gap by letting top-level code define how to present low-level errors.
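
One way the wrapper could look in Python is a context manager around the library call. This is a sketch under assumptions: `log_context`, the `"app"` logger name, and the keyword-argument context are invented here for illustration. The caller, not the library, picks the level.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("app")  # hypothetical application logger


@contextmanager
def log_context(operation: str, level: int = logging.ERROR, **context):
    """Wrap a library call; on failure, log it with application context and re-raise."""
    try:
        yield
    except Exception as exc:
        details = ", ".join(f"{k}={v!r}" for k, v in sorted(context.items()))
        logger.log(
            level,
            "Failed to %s (%s): %s: %s",
            operation,
            details,
            type(exc).__name__,
            exc,
        )
        raise  # the caller still decides how to handle the failure
```

Wrapping a call as `with log_context("process order", order_id=12345): ...` would turn a bare "Operation failed" from the library into a message that names the order, while passing `level=logging.WARNING` lets top-level code classify a recoverable failure differently.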

Actionable Error Context Schema

Summary

A standardized logging schema/specification (and validation tool) that requires every ERROR log to include an action field (e.g., "Check DB replication status," "Verify API key rotation").
Core value proposition: Enforces rwmj's rule that "all your error messages [should be] actionable." If a developer writes an error log without defining an action, the CI pipeline fails.

Key	Value
Target Audience	Software development teams enforcing coding standards and reducing Mean Time To Resolution (MTTR).
Core Feature	A linter or test-suite plugin that scans source code for log calls at the ERROR level and requires a corresponding `action` metadata field in the log payload.
Tech Stack	AST parsing tools (Tree-sitter), IDE plugins (VSCode, IntelliJ), CI/CD integrations.
Difficulty	Low
Monetization	Hobby (Open Source Plugin).

Notes

rwmj specifically requested: "make all your error messages actionable. By that I mean it should tell me what action to take to fix the error."
magicalhippo emphasized including specific data (filenames, config options) to make debugging possible. This tool formalizes that requirement structurally rather than relying on developer discipline.
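
The linter idea can be sketched with Python's standard `ast` module. The convention assumed here, that the action lives under an `"action"` key in a literal `extra={...}` dict on any `.error(...)` call, is a hypothetical one chosen for illustration; a real tool would need to handle more call shapes.

```python
import ast


def find_unactionable_errors(source: str) -> list[int]:
    """Return line numbers of `.error(...)` calls lacking extra={"action": ...}."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_error_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "error"
        )
        if not is_error_call:
            continue
        has_action = False
        for kw in node.keywords:
            # Only a literal dict can be checked statically.
            if kw.arg == "extra" and isinstance(kw.value, ast.Dict):
                has_action = any(
                    isinstance(key, ast.Constant) and key.value == "action"
                    for key in kw.value.keys
                )
        if not has_action:
            problems.append(node.lineno)
    return sorted(problems)
```

A CI step could run this over each source file and fail the build when the returned list is non-empty. Note that the check matches any method named `error`, so it may flag objects that are not loggers.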

Intelligent Log Triage Dashboard

Summary

A dashboard that aggregates logs and applies statistical analysis to determine the "noise floor" of specific ERROR messages, automatically downgrading high-frequency, non-actionable errors to WARNING status in the UI.
Core value proposition: Addresses the "boy who cried wolf" problem (HarHarVeryFunny) where true errors are lost in noise. It allows teams to set thresholds (e.g., "If this DB timeout happens 50 times a minute, treat it as a metric, not an error").

Key	Value
Target Audience	SRE teams managing high-throughput distributed systems.
Core Feature	Anomaly detection on log streams; dynamic UI filtering that hides or demotes "expected" errors based on historical data and current system state.
Tech Stack	ELK Stack (Elasticsearch, Logstash, Kibana) plugins, or a custom React frontend over a metrics or log store (Prometheus/Loki).
Difficulty	High
Monetization	Revenue-ready: Enterprise Observability Platform feature.

Notes

makeitdouble argued that sparse timeouts (10-20% increase) are critical and need logging, whereas raldi suggested metrics suffice. This tool provides the middle ground: logging the events but using statistical analysis to decide when they represent a real incident.
eterm suggested that warnings should be recoverable without intervention, but errors imply an assumption didn't hold. This tool helps visualize when those "unrecoverable" errors are actually just noise.
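
The threshold example from the summary ("50 times a minute") can be sketched as a sliding-window counter. This is a fixed-threshold simplification, not the statistical anomaly detection the idea proposes; the class name `NoiseFloor` and the `"metric"`/`"error"` labels are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class NoiseFloor:
    """Label a message 'metric' once it recurs `threshold` times within the window."""

    def __init__(self, threshold: int = 50, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._seen = defaultdict(deque)  # message key -> recent timestamps

    def classify(self, message_key: str, timestamp: float) -> str:
        times = self._seen[message_key]
        times.append(timestamp)
        # Drop occurrences that have aged out of the window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        return "metric" if len(times) >= self.threshold else "error"
```

A dashboard built this way would still store every event and only change how it is displayed, which leaves room for makeitdouble's point that sparse timeouts deserve attention.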

Library Logging Boundary Proxy

Summary

A developer tool/library that audits and restricts logging from third-party dependencies, ensuring libraries only output DEBUG/TRACE level logs by default, and intercepting higher-level logs to be re-evaluated by the main application.
Core value proposition: Solves the frustration expressed by ivan_gammel and Too that "libraries should not log on levels above DEBUG," preventing transitive dependencies from spamming ERROR logs with issues the main application might consider recoverable.

Key	Value
Target Audience	Developers using complex dependency trees (e.g., Java/Spring, Python/Django) where library noise is a major issue.
Core Feature	A logging facade wrapper that sits between dependencies and the standard logger, stripping or demoting log entries based on a registry of known "noisy" libraries or generic rules.
Tech Stack	Java Agent (ByteBuddy), Python monkey-patching, or Rust crate features.
Difficulty	Medium
Monetization	Hobby (Open Source) or part of a static analysis suite.

Notes

ivan_gammel stated: "Libraries should not log on levels above DEBUG, period."
Too argued that "it's almost always wrong for library functions to log anything" because the caller has context the library lacks.
This tool provides a technical enforcement mechanism for that philosophy, giving the application owner control over the library's voice.
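
In Python, one possible enforcement point is a `logging.Filter` attached to the application's handler. The sketch assumes libraries log under their own logger names; `LibraryDemotionFilter`, the prefix registry, and the `original_levelname` attribute are hypothetical names introduced here.

```python
import logging


class LibraryDemotionFilter(logging.Filter):
    """Cap the level of records emitted by named library loggers."""

    def __init__(self, noisy_prefixes, cap=logging.DEBUG):
        super().__init__()
        self.noisy_prefixes = tuple(noisy_prefixes)
        self.cap = cap

    def filter(self, record: logging.LogRecord) -> bool:
        is_library = any(
            record.name == prefix or record.name.startswith(prefix + ".")
            for prefix in self.noisy_prefixes
        )
        if is_library and record.levelno > self.cap:
            # Remember what the library claimed, then relabel the record.
            record.original_levelname = record.levelname
            record.levelno = self.cap
            record.levelname = logging.getLevelName(self.cap)
        return True  # never drop; only relabel
```

Installing it with `handler.addFilter(LibraryDemotionFilter(["noisylib"]))` relabels records from `noisylib` and its child loggers. Two caveats: the handler's own level check runs before its filters, so demoted records are still emitted (now labelled DEBUG), and the record object is mutated, so handlers that run later see the demoted level too.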
