39 KiB
Error and Event Logging
Author: Patrick Williams <stwcx>
Other contributors:
Created: May 16, 2024
Problem Description
There is currently not a consistent end-to-end error and event reporting design for the OpenBMC code stack. There are two different implementations, one primarily using phosphor-logging and one using rsyslog, both of which have gaps that a complete solution should address. This proposal is intended to be an end-to-end design handling both errors and tracing events which facilitate external management of the system in an automated and maintainable manner.
Background and References
Redfish LogEntry and Message Registry
In Redfish, the LogEntry schema is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".
The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human readable strings, either natively by a
OEM SEL design or by tools such as ipmitool, which are typically unique to
each system or manufacturer, and could hypothethically change with a BMC or
firmware update, and are thus difficult to create automated tooling around. Two
different vendors might use different strings to represent a critical
temperature threshold exceeded: "temperature threshold exceeded"
and "Temperature #0x30 Upper Critical going high". There is
also no mechanism with IPMI to ask the machine "what are all of the SELs you
might create".
In order to solve two aspects of this problem, listing of possible events and versioning, Redfish has Message Registries. A message registry is a versioned collection of all of the error events that a system could generate and hints as to how they might be parsed and displayed to a user. An informative reference from the DMTF gives this example:
{
"@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
"Id": "Alert.1.0.0",
"RegistryPrefix": "Alert",
"RegistryVersion": "1.0.0",
"Messages": {
"LanDisconnect": {
"Description": "A LAN Disconnect on %1 was detected on system %2.",
"Message": "A LAN Disconnect on %1 was detected on system %2.",
"Severity": "Warning",
"NumberOfArgs": 2,
"Resolution": "None"
}
}
}
This example defines an event, Alert.1.0.LanDisconnect, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system. When this event occurs, there might be a LogEntry recorded
containing something like:
{
"Message": "A LAN Disconnnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
"MessageId": "Alert.1.0.LanDisconnect",
"MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
The Message contains a human readable string which was created by applying the
MessageArgs to the placeholders from the Message field in the registry.
System management software can rely on the message registry (referenced from the
MessageId field in the LogEntry) and MessageArgs to avoid needing to
perform string processing for reacting to the event.
Within OpenBMC, there is currently a limited design for this
Redfish feature and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation. It has also been
observed that these strings, when used, are often out of date
with the message registry advertised by bmcweb. Some
maintainers have rejected adding new Redfish-specific logging messages to their
applications.
Existing phosphor-logging implementation
Note: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
std::exception and there is support in the sdbusplus bindings for them to be
used in exception handling.
The sdbusplus bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. phosphor-logging extended this to also add metadata associated to the
log type. See the following example error definitions and usages.
sdbusplus error binding definition (in
xyz/openbmc_project/Certs.errors.yaml):
- name: InvalidCertificate
description: Invalid certificate file.
phosphor-logging metadata definition (in
xyz/openbmc_project/Certs.metadata.yaml):
- name: InvalidCertificate
meta:
- str: "REASON=%s"
type: string
Application code reporting an error:
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
In this sample, an error named
xyz.openbmc_project.Certs.Error.InvalidCertificate has been defined, which can
be sent between applications as a DBus response. The InvalidCertificate is
expected to have additional metadata REASON which is a string. The two APIs
elog and report have slightly different behaviors: elog throws an
exception which can either result in an error DBus result or be handled
elsewhere in the application, while report sends the event directly to
phosphor-logging's daemon for recording. As a side-effect of both calls, the
metadata is inserted into the systemd journal.
When an error is sent to the phosphor-logging daemon, it will:
- Search back through the journal for recorded metadata associated with the event (this is a relative slow operation).
- Create an
xyz.openbmc_project.Logging.EntryDBus object with the associated data extracted from the journal. - Persist a serialized version of the object.
Within bmcweb there is support for translating
xyz.openbmc_project.Logging.Entry objects advertised by phosphor-logging
into Redfish LogEntries, but this support does not reference a Message
Registry. This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
perform (hand-coded) regular-expressions to extract any information from the
Message field of the LogEntry. Furthermore, these regular-expressions are
likely to become outdated over time as internal OpenBMC error reporting
structure, metadata, or message strings evolve.
Issues with the Status Quo
-
There are two different implementations of error logging, neither of which are both complete and fully accepted by maintainers. These implementations also do not cover tracing events.
-
The
REDFISH_MESSAGE_IDlog approach leads to differences between the Redfish Message Registry and the reporting application. It also requires every application to be "Redfish aware" which limits decoupling between applications and external management interfaces. This also leaves gaps for reporting errors in different management interfaces, such as inband IPMI and PLDM. The approach also does not provide comple-time assurance of appropriate metadata collection, which can lead to producing code being out-of-date with the message registry definitions. -
The
phosphor-loggingapproach does not provide compile-time assurance of appropriate metadata collection and requires expensive daemon processing of thesystemdjournal on each error report, which limits scalability. -
The
sdbusplusbindings for error reporting do not currently handle lossless transmission of errors between DBus servers and clients. -
Similar applications can result in different Redfish
LogEntryfor the same error scenario. This has been observed in sensor threshold exceeded events betweendbus-sensors,phosphor-hwmon,phosphor-virtual-sensor, andphosphor-health-monitor. One cause of this is two different error reporting approaches and disagreements amongst maintainers as to the preferred approach.
Requirements
-
Applications running on the BMC must be able to report errors and failure which are persisted and available for external system management through standards such as Redfish.
- These errors must be structured, versioned, and the complete set of errors able to be created by the BMC should be available at built-time of a BMC image.
- The set of errors, able to be created by the BMC, must be able to be
transformed into relevant data sets, such as Redfish Message Registries.
- For Redfish, the transformation must comply with the Redfish standard requirements, such as conforming to semantic versioning expectations.
- For Redfish, the transformation should allow mapping internally defined events to pre-existing Redfish Message Registries for broader compatibility.
- For Redfish, the implementation must also support the EventService mechanics for push-reporting.
- Errors reported by the BMC should contain sufficient information to allow service of the system for these failures, either by humans or automation (depending on the individual system requirements).
-
Applications running on the BMC should be able to report important tracing events relevant to system management and/or debug, such as the system successfully reaching a running state.
- All requirements relevant to errors are also applicable to tracing events.
- The implementation must have a mechanism for vendors to be able to disable specific tracing events to conform to their own system design requirements.
-
Applications running on the BMC should be able to determine when a previously reported error is no longer relevant and mark it as "resolved", while maintaining the persistent record for future usages such as debug.
-
The BMC should provide a mechanism for managed entities within the server to report their own errors and events. Examples of managed entities would be firmware, such as the BIOS, and satellite management controllers.
-
The implementation on the BMC should scale to a minimum of 10,000 error and events without impacting the BMC or managed system performance.
-
The implementation should provide a mechanism to allow OEM or vendor extensions to the error and event definitions (and generated artifacts such as the Redfish Message Registry) for usage in closed-source or non-upstreamed code. These extensions must be clearly identified, in all interfaces, as vendor-specific and not be tied to the OpenBMC project.
-
APIs to implement error and event reporting should have good ergonomics. These APIs must provide compile-time identification, for applicable programming languages, of call sites which do not conform to the BMC error and event specifications.
- The generated error classes and APIs should not require exceptions but
should also integrate with the
sdbusplusclient and server bindings, which do leverage exceptions.
- The generated error classes and APIs should not require exceptions but
should also integrate with the
Proposed Design
The proposed design has a few high-level design elements:
-
Consolidate the
sdbusplusandphosphor-loggingimplementation of error reporting; expand it to cover tracing events; improve the ergonomics of the associated APIs and add compile-time checking of missing metadata. -
Add APIs to
phosphor-loggingto enable daemons to easily look up their own previously reported events (for marking as resolved). -
Add to
phosphor-logginga compile-time mechanism to disable recording of specific tracing events for vendor-level customization. -
Generate a Redfish Message Registry for all error and events defined in
phosphor-dbus-interfaces, using binding generators fromsdbusplus. Enhancebmcwebimplementation of theLogging.EntrytoLogEventtransformation to cover the Redfish Message Registry andphosphor-loggingenhancements; Leverage the RedfishLogEntry.DiagnosticDatafield to provide a Base64-encoded JSON representation of the entireLogging.Entryfor additional diagnostics does this need to be optional?. Add support to thebmcwebEventService implementation to supportphosphor-logging-hosted events.
sdbusplus
The Foo.errors.yaml content will be combined with the content formerly in the
Foo.metadata.yaml files specified by phosphor-logging and specified by a new
file type Foo.events.yaml. This Foo.events.yaml format will cover both the
current error and metadata information as well as augment with additional
information necessary to generate external facing datasets, such as Redfish
Message Registries. The current Foo.errors.yaml and Foo.metadata.yaml files
will be deprecated as their usage is replaced by the new format.
The sdbusplus library will be enhanced to provide the following:
-
JSON serialization and de-serialization of generated exception types with their assigned metadata; assignment of the JSON serialization to the
messagefield ofsd_bus_error_setcalls when errors are returned from DBus server calls. -
A facility to register exception types, at library load time, with the
sdbuspluslibrary for automatic conversion back to C++ exception types in DBus clients.
The binding generator(s) will be expanded to do the following:
-
Generate complete C++ exception types, with compile-time checking of missing metadata and JSON serialization, for errors and events. Metadata can be of one of the following types:
- size-type and signed integer
- floating-point number
- string
- DBus object path
-
Generate a format that
bmcwebcan use to create and populate a Redfish Message Registry, and translate fromphosphor-loggingto RedfishLogEntryfor a set of errors and events
For general users of sdbusplus these changes should have no impact, except for
the availability of new generated exception types and that specialized instances
of sdbusplus::exception::generated_exception will become available in DBus
clients.
phosphor-dbus-interfaces
Refactoring will be done to migrate existing Foo.metadata.yaml and
Foo.errors.yaml content to the Foo.events.yaml as migration is done by
applications. Minor changes will take place to utilize the new binding
generators from sdbusplus. A small library enhancement will be done to
register all generated exception types with sdbusplus. Future contributors
will be able to contribute new error and tracing event definitions.
phosphor-logging
TODO: Should a tracing event be a
Logging.Entrywith severity ofInformationalor should they be a new type, such asLogging.Eventand managed separately. Thephosphor-loggingdefaultmeson.optionshaveerror_cap=200anderror_info_cap=10. If we increase the total number of events allowed to 10K, the majority of them are likely going to be information / tracing events.
The Logging.Entry interface's AdditionalData property should change to
dict[string, variant[string,int64_t,size_t,object_path]].
The Logging.Create interface will have a new method added:
- name: CreateEntry
parameters:
- name: Message
type: string
- name: Severity
type: enum[Logging.Entry.Level]
- name: AdditionalData
type: dict[string, variant[string,int64_t,size_t,object_path]]
- name: Hint
type: string
default: ""
returns:
- name: Entry
type: object_path
The Hint parameter is used for daemons to be able to query for their
previously recorded error, for marking as resolved. These strings need to be
globally unique and are suggested to be of the format "<service_name>:<key>".
A Logging.SearchHint interface will be created, which will be recorded at the
same object path as a Logging.Entry when the Hint parameter was not an empty
string:
- property: Hint
type: string
The Logging.Manager interface will be added with a single method:
- name: FindEntry
parameters:
- name: Hint
type: String
returns:
- name: Entry
type: object_path
errors:
- xyz.openbmc_project.Common.ResourceNotFound
A lg2::commit API will be added to support the new sdbusplus generated
exception types, calling the new Logging.Create.CreateEntry method proposed
earlier. This new API will support sdbusplus::bus_t for synchronous DBus
operations and both sdbusplus::async::context_t and
sdbusplus::asio::connection for asynchronous DBus operations.
There are outstanding performance concerns with the phosphor-logging
implementation that may impact the ability for scaling to 10,000 event records.
This issue is expected to be self-contained within phosphor-logging, except
for potential future changes to the log-retrieval interfaces used by bmcweb.
In order to decouple the transition to this design, by callers of the logging
APIs, from the experimentation and improvements in phosphor-logging, we will
add a compile option and Yocto DISTRO_FEATURE that can turn lg2::commit
behavior into an OPENBMC_MESSAGE_ID record in the journal, along the same
approach as the previous REDFISH_MESSAGE_ID, and corresponding rsyslog
configuration and bmcweb support to use these directly. This will allow
systems which knowingly scale to a large number of event records, using
rsyslog mechanics, the same level of performance. One caveat of this support
is that the hint and resolution behavior will not exist when that option is
enabled.
bmcweb
bmcweb already has support for build-time conversion from a Redfish Message
Registry, codified in JSON, to header files it uses to serve the registry; this
will be expanded to support Redfish Message Registries generated by sdbusplus.
bmcweb will add a Meson option for additional message registries, provided
from bitbake from phosphor-dbus-interfaces and vendor-specific event
definitions as a path to a directory of Message Registry JSONs. Support will
also be added for adding phosphor-dbus-interfaces as a Meson subproject for
stand-alone testing.
It is desirable for sdbusplus to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with bmcweb. As part of this
we would like to support mapping a Logging.Entry event to an existing
standardized Redfish event (such as those in the Base registry). The generated
information must contain the Logging.Entry::Message identifier, the
AdditionalData to MessageArgs mapping, and the translation from the
Message identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). In order to facilitate this, we will need to add OEM fields to
the Redfish Message Registry JSON, which are only used by the bmcweb
processing scripts, to generate the information necessary for this additional
mapping.
The xyz.openbmc_project.Logging.Entry to LogEvent conversion needs to be
enhanced, to utilize these Message Registries, in four ways:
-
A Base64-encoded JSON representation of the
Logging.Entrywill be assigned to theDiagnosticDataproperty. -
If the
Logging.Entry::Messagecontains an identifier corresponding to a Registry entry, theMessageIdproperty will be set to the corresponding Redfish Message ID. Otherwise, theLogging.Entry::Messagewill be used directly with no further transformation (as is done today). -
If the
Logging.Entry::Messagecontains an identifier corresponding to a Registry entry, theMessageArgsproperty will be filled in by obtaining the corresponding values from theAdditionalDatadictionary and theMessagefield will be generated from combining these values with theMessagestring from the Registry. -
A mechanism should be implemented to translate DBus
object_pathreferences to Redfish Resource URIs. When anobject_pathcannot be translated,bmcwebwill use a prefix such asobject_path:in theMessageArgsvalue.
The implementation of EventService should be enhanced to support
phosphor-logging hosted events. The implementation of LogService should be
enhanced to support log paging for phosphor-logging hosted events.
phosphor-sel-logger
The phosphor-sel-logger has a meson option send-to-logger which toggles
between using phosphor-logging or the REDFISH_MESSAGE_ID
mechanism. The phosphor-logging-utilizing paths will be
updated to utilize phosphor-dbus-interfaces specified errors and events.
YAML format
Consider an example file in phosphor-dbus-interfaces as
yaml/xyz/openbmc_project/Software/Update.events.yaml with hypothetical errors
and events:
version: 1.3.1
errors:
- name: UpdateFailure
severity: critical
metadata:
- name: TARGET
type: string
primary: true
- name: ERRNO
type: int64
- name: CALLOUT_HARDWARE
type: object_path
primary: true
en:
description: While updating the firmware on a device, the update failed.
message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
resolution: Retry update.
- name: BMCUpdateFailure
severity: critical
deprecated: 1.0.0
en:
description: Failed to update the BMC
redfish-mapping: OpenBMC.FirmwareUpdateFailed
events:
- name: UpdateProgress
metadata:
- name: TARGET
type: string
primary: true
- name: COMPLETION
type: double
primary: true
en:
description: An update is in progress and has reached a checkpoint.
message: Updating of {TARGET} is {COMPLETION}% complete.
Each foo.events.yaml file would be used to generate both the C++ classes (via
sdbusplus) for exception handling and event reporting, as well as a versioned
Redfish Message Registry for the errors and events. The YAML schema is as
follows:
$id: https://openbmc-project.xyz/sdbusplus/events.schema.yaml
$schema: https://json-schema.org/draft/2020-12/schema
title: Event and error definitions
type: object
$defs:
event:
type: array
items:
type: object
properties:
name:
type: string
description:
An identifier for the event in UpperCamelCase; used as the class and
Redfish Message ID.
en:
type: object
description: The details for English.
properties:
description:
type: string
description:
A developer-applicable description of the error reported. These
form the "description" of the Redfish message.
message:
type: string
description:
The end-user message, including placeholders for arguemnts.
resolution:
type: string
description: The end-user resolution.
severity:
enum:
- emergency
- alert
- critical
- error
- warning
- notice
- informational
- debug
description:
The `xyz.openbmc_project.Logging.Entry.Level` value for this
error. Only applicable for 'errors'.
redfish-mapping:
type: string
description:
Used when a `sdbusplus` event should map to a specific Redfish
Message rather than a generated one. This is useful when an internal
error has an analog in a standardized registry.
deprecated:
type: string
pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
description:
Indicates that the event is now deprecated and should not be created
by any OpenBMC software, but is required to still exist for
generation in the Redfish Message Registry. The version listed here
should be the first version where the error is no longer used.
metadata:
type: array
items:
type: object
properties:
name:
type: string
description: The name of the metadata field.
type:
enum:
- string
- size
- int64
- uint64
- double
- object_path
description: The type of the metadata field.
primary:
type: boolean
description:
Set to true when the metadata field is expected to be part of
the Redfish `MessageArgs` (and not only in the extended
`DiagnosticData`).
properties:
version:
type: string
pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
description:
The version of the file, which will be used as the Redfish Message
Registry version.
errors:
$ref: "#/definitions/event"
events:
$ref: ":#/definitions/event"
The above example YAML would generate C++ classes similar to:
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{
class UpdateFailure
{
template <typename... Args>
UpdateFailure(Args&&... args);
};
}
namespace sdbusplus::events::xyz::openbmc_project::software::update
{
class UpdateProgress
{
template <typename... Args>
UpdateProgress(Args&&... args);
};
}
The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an UpdateFailure a
developers might do something like:
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a dbus response (when using sdbusplus generated binding):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path);
If one of the fields, such as ERRNO were omitted, a compile failure will be
raised indicating the first missing field.
Versioning Policy
Assume the version follows semantic versioning MAJOR.MINOR.PATCH convention.
- Adjusting a description or message should result in a
PATCHincrement. - Adding a new error or event, or adding metadata to an existing error or event,
should result in a
MINORincrement. - Deprecating an error or event should result in a
MAJORincrement.
There is guidance on maintenance of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
phosphor-dbus-interfaces policy.
Generated Redfish Message Registry
DSP0266, the Redfish specification, gives requirements for Redfish Message Registries and dictates guidelines for identifiers.
The hypothetical events defined above would create a message registry similar to:
{
"Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
"Language": "en",
"Messages": {
"UpdateFailure": {
"Description": "While updating the firmware on a device, the update failed.",
"Message": "A failure occurred updating %1 on %2.",
"Resolution": "Retry update."
"NumberOfArgs": 2,
"ParamTypes": ["string", "string"],
"Severity": "Critical",
},
"UpdateProgress" : {
"Description": "An update is in progress and has reached a checkpoint."
"Message": "Updating of %1 is %2\% complete.",
"Resolution": "None",
"NumberOfArgs": 2,
"ParamTypes": ["string", "number"],
"Severity": "OK",
}
}
}
The prefix OpenBMC_Base shall be exclusively reserved for use by events from
phosphor-logging. Events defined in other repositories will be expected to use
some other prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by DSP0266.
Vendor implications
As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see DSP0266 for requirements on
identifier naming). The sdbusplus (and phosphor-logging and bmcweb)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring their
own versioning and identifiers conform to the expectations in the Redfish
specification.
One potential bad behavior on the part of vendors would be forking and modifying
phosphor-dbus-interfaces defined events. Vendors must not add their own events
to phosphor-dbus-interfaces in downstream implementations because it would
lead to their implementation advertising support for a message in an
OpenBMC-owned Registry which is not the case, but they should add them to their
own repositories with a separate identifier. Similarly, if a vendor were to
backport upstream changes into their fork, they would need to ensure that the
foo.events.yaml file for that version matches identically with the upstream
implementation.
Alternatives Considered
Many alternatives have been explored and referenced through earlier work. Within this proposal there are many minor-alternatives that have been assessed.
Exception inheritance
The original phosphor-logging error descriptions allowed inheritance between
two errors. This is not supported by the proposal for two reasons:
-
This introduces complexity in the Redfish Message Registry versioning because a change in one file should induce version changes in all dependent files.
-
It makes it difficult for a developer to clearly identify all of the fields they are expected to populate without traversing multiple files.
sdbusplus Exception APIs
There are a few possible syntaxes I came up with for constructing the generated exception types. It is important that these have good ergonomics, are easy to understand, and can provide compile-time awareness of missing metadata fields.
using Example = sdbusplus::error::xyz::openbmc_project::Example;
// 1)
throw Example().fru("Motherboard").value(42);
// 2)
throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);
// 3)
throw Example("FRU", "Motherboard", "VALUE", 42);
// 4)
throw Example([](auto e) { return e.fru("Motherboard").value(42); });
// 5)
throw Example({.fru = "Motherboard", .value = 42});
Note: These examples are all show using throw syntax, but could also be
saved in local variables, returned from functions, or immediately passed to
lg2::commit.
-
This would be my preference for ergonomics and clarity, as it would allow LSP-enabled editors to give completions for the metadata fields but unfortunately there is no mechanism in C++ to define a type which can be constructed but not thrown, which means we cannot get compile-time checking of all metadata fields.
-
This syntax uses tag-dispatch to enables compile-time checking of all metadata fields and potential LSP-completion of the tag-types, but is more verbose than option 3.
-
This syntax is less verbose than (2) and follows conventions already used in
phosphor-logging'slg2API, but does not allow LSP-completion of the metadata tags. -
This syntax is similar to option (1) but uses an indirection of a lambda to enable compile-time checking that all metadata fields have been populated by the lambda. The LSP-completion is likely not as strong as option (1), due to the use of
auto, and the lambda necessity will likely be a hang-up for unfamiliar developers. -
This syntax has similar characteristics as option (1) but similarly does not provide compile-time confirmation that all fields have been populated.
The proposal therefore suggests option (3) is most suitable.
Redfish Translation Support
The proposed YAML format allows future addition of translation but it is not
enabled at this time. Future development could enable the Redfish Message
Registry to be generated in multiple languages if the message:language exists
for those languages.
Redfish Registry Versioning
The Redfish Message Registries are required to be versioned and has 3 digit
fields (ie. XX.YY.ZZ), but only the first 2 are suppose to be used in the
Message ID. Rather than using the manually specified version we could take a few
other approaches:
-
Use a date code (ex.
2024.17.x) representing the ISO 8601 week when the registry was built.- This does not cover vendors that may choose to branch for stabilization purposes, so we can end up with two machines having the same OpenBMC-versioned message registry with different content.
-
Use the most recent
openbmc/openbmctag as the version.- This does not cover vendors that build off HEAD and may deploy multiple images between two OpenBMC releases.
-
Generate the version based on the git-history.
- This requires
phosphor-dbus-interfacesto be built from a git repository, which may not always be true for Yocto source mirrors, and requires non-trivial processing that continues to scale over time.
- This requires
Existing OpenBMC Redfish Registry
There are currently 191 messages defined in the existing Redfish Message
Registry at version OpenBMC.0.4.0. Of those, not a single one in the codebase
is emitted with the correct version. 96 of those are only emitted by
Intel-specific code that is not pulled into any upstreamed machine, 39 are
emitted by potentially common code, and 56 are not even referenced in the
codebase outside of the bmcweb registry. Of the 39 common messages half of them
have an equivalent in one of the standard registries that should be leveraged
and many of the others do not have attributes that would facilitate a multi-host
configuration, so the registry at a minimum needs to be updated. None of the
current implementation has the capability to handle Redfish Resource URIs.
The proposal therefore is to deprecate the existing registry and replace it with the new generated registries. For repositories that currently emit events in the existing format, we can maintain those call-sites for a time period of 1-2 years.
If this aspect of the proposal is rejected, the YAML format allows mapping from
phosphor-dbus-interfaces defined events to the current OpenBMC.0.4.0
registry MessageIds.
Potentially common:
- phosphor-post-code-manager
- BIOSPOSTCode (unique)
- dbus-sensors
- ChassisIntrusionDetected (unique)
- ChassisIntrusionReset (unique)
- FanInserted
- FanRedundancyLost (unique)
- FanRedudancyRegained (unique)
- FanRemoved
- LanLost
- LanRegained
- PowerSupplyConfigurationError (unique)
- PowerSupplyConfigurationErrorRecovered (unique)
- PowerSupplyFailed
- PowerSupplyFailurePredicted (unique)
- PowerSupplyFanFailed
- PowerSupplyFanRecovered
- PowerSupplyPowerLost
- PowerSupplyPowerRestored
- PowerSupplyPredictiedFailureRecovered (unique)
- PowerSupplyRecovered
- phosphor-sel-logger
- IPMIWatchdog (unique)
SensorThreshold*: 8 different events
- phosphor-net-ipmid
- InvalidLoginAttempted (unique)
- entity-manager
- InventoryAdded (unique)
- InventoryRemoved (unique)
- estoraged
- ServiceStarted
- x86-power-control
- NMIButtonPressed (unique)
- NMIDiagnosticInterrupt (unique)
- PowerButtonPressed (unique)
- PowerRestorePolicyApplied (unique)
- PowerSupplyPowerGoodFailed (unique)
- ResetButtonPressed (unique)
- SystemPowerGoodFailed (unique)
Intel-only implementations:
- intel-ipmi-oem
- ADDDCCorrectable
- BIOSPostERROR
- BIOSRecoveryComplete
- BIOSRecoveryStart
- FirmwareUpdateCompleted
- IntelUPILinkWidthReducedToHalf
- IntelUPILinkWidthReducedToQuarter
- LegacyPCIPERR
- LegacyPCISERR
ME*: 29 different eventsMemory*: 9 different events- MirroringRedundancyDegraded
- MirroringRedundancyFull
PCIeCorrectable*,PCIeFatal: 29 different events- SELEntryAdded
- SparingRedundancyDegraded
- pfr-manager
- BIOSFirmwareRecoveryReason
- BIOSFirmwarePanicReason
- BMCFirmwarePanicReason
- BMCFirmwareRecoveryReason
- BMCFirmwareResiliencyError
- CPLDFirmwarePanicReason
- CPLDFirmwareResilencyError
- FirmwareResiliencyError
- host-error-monitor
- CPUError
- CPUMismatch
- CPUThermalTrip
- ComponentOverTemperature
- SsbThermalTrip
- VoltageRegulatorOverheated
- s2600wf-misc
- DriveError
- InventoryAdded
Impacts
-
New APIs are defined for error and event logging. This will deprecate existing
phosphor-loggingAPIs, with a time to migrate, for error reporting. -
The design should improve performance by eliminating the regular parsing of the
systemdjournal. The design may decrease performance by allowing the number of error and event logs to be dramatically increased, which have an impact to file system utilization and potential for DBus impacts some services such asObjectMapper. -
Backwards compatibility and documentation should be improved by the automatic generation of the Redfish Message Registry corresponding to all error and event reports.
Organizational
- Does this repository require a new repository?
- No
- Who will be the initial maintainer(s) of this repository?
- N/A
- Which repositories are expected to be modified to execute this design?
sdbusplusphosphor-dbus-interfacesphosphor-loggingbmcweb- Any repository creating an error or event.
Testing
-
Unit tests will be written in
sdbusplusandphosphor-loggingfor the error and event generation, creation APIs, and to provide coverage on any changes to theLogging.Entryobject management. -
Unit tests will be written for
bmcwebfor basicLogging.Entrytransformation and Message Registry generation. -
Integration tests should be leveraged (and enhanced as necessary) from
openbmc-test-automationto cover the end-to-end error creation and Redfish reporting.