Here is a summary of the 5 most prevalent themes from the Hacker News discussion on log levels:
1. ERROR logs indicate required human intervention. There is broad consensus that an "Error" log entry should signify a condition that requires immediate human attention or action. If no one needs to act on it, it should not be logged as an error. As jillesvangurp states, "Errors mean I get alerted. Zero tolerance on that from my side."
2. Distinguishing between program-level and operational failures. Users argue that a failure inside a single operation is not necessarily an error for the program as a whole. Operational failures (like transient network timeouts) should often be warnings or metrics rather than errors, unless they represent a systemic failure of the application itself. shadowgovt explains: "Errors trigger pages. Warnings get bundled up into a daily report..."
3. The 'ownership' of the error matters. A key debate involves whether the local system is responsible for the failure. If a dependency fails (e.g., a downstream service or database), it is often argued that this should not be logged as a local error if that dependency has its own monitoring. raldi notes that issues like database timeouts or ISEs in downstream services should be handled by metrics rather than error logs. However, others argue that in practice, "downstream service issues... are observed in the error logs of consuming services long before they're detected by the owners of the downstream service" (zbentley).
4. Libraries should tread carefully. The discussion highlights that libraries lack the context to determine whether a failure is actionable for the application. Consequently, libraries should generally avoid logging at high levels (Error/Warn) to prevent noise. ivan_gammel suggests, "Libraries should not log on levels above DEBUG, period," while Too notes that library errors should usually be returned to the caller rather than logged directly.
5. Logs should be actionable and context-rich. Regardless of the level chosen, log messages must provide sufficient context to allow for troubleshooting. This means avoiding vague messages and including relevant data (parameters, IDs) that help identify the root cause. As uniq7 puts it, "The error doesn't need to be extremely specific or point to the actual root cause... 'Failed to read file 'foo/bar.html'' would be acceptable."
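
Several of these themes can be illustrated together in a short Python sketch. It is not taken from the discussion; it is one possible reading of themes 2, 4, and 5, and every name in it (the "app" and "examplelib" loggers, TransientFailure, library_fetch, fetch_with_retry, the outcomes list that simulates a flaky dependency) is hypothetical. The library-style helper raises an exception and logs only at DEBUG, while the application code logs each transient failure as a WARNING and reserves ERROR for the point where the whole operation has failed, including the key and attempt count as context.

```python
import logging

# Hypothetical logger names: one for the application, one for a library.
log = logging.getLogger("app")
lib_log = logging.getLogger("examplelib")


class TransientFailure(Exception):
    """Hypothetical exception the library raises instead of logging an error."""


def library_fetch(key, outcomes):
    """Library-style helper: returns the failure to the caller, logs only at DEBUG.

    `outcomes` is a list of simulated results ("timeout" or a value); it stands
    in for a real network call purely for illustration.
    """
    outcome = outcomes.pop(0)
    lib_log.debug("fetch key=%r outcome=%r", key, outcome)
    if outcome == "timeout":
        raise TransientFailure(f"timed out fetching {key!r}")
    return outcome


def fetch_with_retry(key, outcomes, max_attempts=3):
    """Application code picks the level, because it knows what is actionable."""
    for attempt in range(1, max_attempts + 1):
        try:
            return library_fetch(key, outcomes)
        except TransientFailure as exc:
            # A single failed attempt is operational noise: WARNING, with context.
            log.warning(
                "attempt %d/%d failed for key=%r: %s",
                attempt, max_attempts, key, exc,
            )
    # Only the failure of the whole operation is logged as ERROR.
    log.error("giving up on key=%r after %d attempts", key, max_attempts)
    return None
```

Under this sketch, a call such as fetch_with_retry("user:42", ["timeout", "ok"]) would produce one warning and no error, so an alert wired to ERROR records would stay silent. Whether an exhausted retry deserves ERROR at all still depends on the ownership question in theme 3.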