# Error and Event Logging

Author: [Patrick Williams][patrick-email] `<stwcx>`

[patrick-email]: mailto:patrick@stwcx.xyz

Other contributors:

Created: May 16, 2024

## Problem Description

There is currently no consistent end-to-end error and event reporting design
for the OpenBMC code stack. There are two different implementations, one
primarily using phosphor-logging and one using rsyslog, both of which have gaps
that a complete solution should address. This proposal is intended to be an
end-to-end design handling both errors and tracing events which facilitate
external management of the system in an automated and maintainable manner.

## Background and References

### Redfish LogEntry and Message Registry

In Redfish, the [`LogEntry` schema][LogEntry] is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".

The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human readable strings, either natively by an
OEM SEL design or by tools such as `ipmitool`. The strings are typically unique
to each system or manufacturer and could hypothetically change with a BMC or
firmware update, and are thus difficult to create automated tooling around. Two
different vendors might use different strings to represent a critical
temperature threshold exceeded: ["temperature threshold exceeded"][HPE-Example]
and ["Temperature #0x30 Upper Critical going high"][Oracle-Example]. There is
also no mechanism with IPMI to ask the machine "what are all of the SELs you
might create".

In order to solve two aspects of this problem, listing of possible events and
versioning, Redfish has Message Registries. A message registry is a versioned
collection of all of the error events that a system could generate and hints as
to how they might be parsed and displayed to a user. An [informative
reference][Registry-Example] from the DMTF gives this example:

```json
{
  "@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
  "Id": "Alert.1.0.0",
  "RegistryPrefix": "Alert",
  "RegistryVersion": "1.0.0",
  "Messages": {
    "LanDisconnect": {
      "Description": "A LAN Disconnect on %1 was detected on system %2.",
      "Message": "A LAN Disconnect on %1 was detected on system %2.",
      "Severity": "Warning",
      "NumberOfArgs": 2,
      "Resolution": "None"
    }
  }
}
```

This example defines an event, `Alert.1.0.LanDisconnect`, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system. When this event occurs, there might be a `LogEntry` recorded
containing something like:

```json
{
  "Message": "A LAN Disconnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
  "MessageId": "Alert.1.0.LanDisconnect",
  "MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
```

The `Message` contains a human readable string which was created by applying the
`MessageArgs` to the placeholders from the `Message` field in the registry.
System management software can rely on the message registry (referenced from the
`MessageId` field in the `LogEntry`) and `MessageArgs` to avoid needing to
perform string processing for reacting to the event.

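The placeholder substitution described above can be sketched in a few lines of
C++. This is a minimal illustration and not the `bmcweb` implementation:
`formatMessage` is a hypothetical helper, and it assumes placeholders are
1-based decimal indices (`%1`, `%2`, ...) into the `MessageArgs` array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expand Redfish-style placeholders (%1, %2, ...) in a
// registry message template using the positional values from MessageArgs.
// Placeholders without a matching argument are left untouched.
std::string formatMessage(const std::string& tmpl,
                          const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() && tmpl[i + 1] >= '1' &&
            tmpl[i + 1] <= '9')
        {
            // Collect every digit after '%' to form the 1-based index.
            std::size_t j = i + 1;
            std::size_t index = 0;
            while (j < tmpl.size() && tmpl[j] >= '0' && tmpl[j] <= '9')
            {
                index = index * 10 + static_cast<std::size_t>(tmpl[j] - '0');
                ++j;
            }
            if (index <= args.size())
            {
                out += args[index - 1];
                i = j - 1; // skip the digits that were consumed
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```

Called with the registry template and the `MessageArgs` from the example
`LogEntry`, this would reproduce the `Message` string shown there. Management
software generally does not need this step, since it can act on `MessageId` and
`MessageArgs` directly.
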
Within OpenBMC, there is currently a [limited design][existing-design] for this
Redfish feature and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation. It has also been
observed that these [strings][app-example], when used, are often out of date
with the [message registry][bmcweb-registry] advertised by `bmcweb`. Some
maintainers have rejected adding new Redfish-specific logging messages to their
applications.

[LogEntry]:
  https://github.com/openbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/LogEntry.v1_16_0.json
[HPE-Example]:
  https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000422.html
[Oracle-Example]:
  https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
[Registry-Example]:
  https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
[existing-design]:
  https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
[app-example]:
  https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2c/src/post_code.cpp#L143
[bmcweb-registry]:
  https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5

### Existing phosphor-logging implementation

**Note**: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
`std::exception` and there is support in the `sdbusplus` bindings for them to be
used in exception handling.

The `sdbusplus` bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. `phosphor-logging` extended this to also add metadata associated with
the log type. See the following example error definitions and usages.

`sdbusplus` error binding definition (in
`xyz/openbmc_project/Certs.errors.yaml`):

```yaml
- name: InvalidCertificate
  description: Invalid certificate file.
```

`phosphor-logging` metadata definition (in
`xyz/openbmc_project/Certs.metadata.yaml`):

```yaml
- name: InvalidCertificate
  meta:
    - str: "REASON=%s"
      type: string
```

Application code reporting an error:

```cpp
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
```

In this sample, an error named
`xyz.openbmc_project.Certs.Error.InvalidCertificate` has been defined, which can
be sent between applications as a DBus response. The `InvalidCertificate` is
expected to have additional metadata `REASON` which is a string. The two APIs
`elog` and `report` have slightly different behaviors: `elog` throws an
exception which can either result in an error DBus result or be handled
elsewhere in the application, while `report` sends the event directly to
`phosphor-logging`'s daemon for recording. As a side-effect of both calls, the
metadata is inserted into the `systemd` journal.

When an error is sent to the `phosphor-logging` daemon, it will:

1. Search back through the journal for recorded metadata associated with the
   event (this is a relatively slow operation).
2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
   with the associated data extracted from the journal.
3. Persist a serialized version of the object.

Within `bmcweb` there is support for translating
`xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
into Redfish `LogEntries`, but this support does not reference a Message
Registry. This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
apply (hand-coded) regular expressions to extract any information from the
`Message` field of the `LogEntry`. Furthermore, these regular expressions are
likely to become outdated over time as the internal OpenBMC error reporting
structure, metadata, or message strings evolve.

[Logging-Entry]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/yaml/xyz/openbmc_project/Logging/Entry.interface.yaml#L1

### Issues with the Status Quo

- There are two different implementations of error logging, neither of which is
  both complete and fully accepted by maintainers. These implementations also do
  not cover tracing events.

- The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
  Message Registry and the reporting application. It also requires every
  application to be "Redfish aware" which limits decoupling between applications
  and external management interfaces. This also leaves gaps for reporting errors
  in different management interfaces, such as inband IPMI and PLDM. The approach
  also does not provide compile-time assurance of appropriate metadata
  collection, which can lead to the producing code being out of date with the
  message registry definitions.

- The `phosphor-logging` approach does not provide compile-time assurance of
  appropriate metadata collection and requires expensive daemon processing of
  the `systemd` journal on each error report, which limits scalability.

- The `sdbusplus` bindings for error reporting do not currently handle lossless
  transmission of errors between DBus servers and clients.

- Similar applications can result in different Redfish `LogEntry` records for
  the same error scenario. This has been observed in sensor threshold exceeded
  events between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`,
  and `phosphor-health-monitor`. One cause of this is two different error
  reporting approaches and disagreements amongst maintainers as to the preferred
  approach.

## Requirements

- Applications running on the BMC must be able to report errors and failures
  which are persisted and available for external system management through
  standards such as Redfish.

  - These errors must be structured, versioned, and the complete set of errors
    able to be created by the BMC should be available at build-time of a BMC
    image.
  - The set of errors, able to be created by the BMC, must be able to be
    transformed into relevant data sets, such as Redfish Message Registries.
    - For Redfish, the transformation must comply with the Redfish standard
      requirements, such as conforming to semantic versioning expectations.
    - For Redfish, the transformation should allow mapping internally defined
      events to pre-existing Redfish Message Registries for broader
      compatibility.
    - For Redfish, the implementation must also support the EventService
      mechanics for push-reporting.
  - Errors reported by the BMC should contain sufficient information to allow
    service of the system for these failures, either by humans or automation
    (depending on the individual system requirements).

- Applications running on the BMC should be able to report important tracing
  events relevant to system management and/or debug, such as the system
  successfully reaching a running state.

  - All requirements relevant to errors are also applicable to tracing events.
  - The implementation must have a mechanism for vendors to be able to disable
    specific tracing events to conform to their own system design requirements.

- Applications running on the BMC should be able to determine when a previously
  reported error is no longer relevant and mark it as "resolved", while
  maintaining the persistent record for future usages such as debug.

- The BMC should provide a mechanism for managed entities within the server to
  report their own errors and events. Examples of managed entities would be
  firmware, such as the BIOS, and satellite management controllers.

- The implementation on the BMC should scale to a minimum of
  [10,000][error-discussion] errors and events without impacting the BMC or
  managed system performance.

- The implementation should provide a mechanism to allow OEM or vendor
  extensions to the error and event definitions (and generated artifacts such as
  the Redfish Message Registry) for usage in closed-source or non-upstreamed
  code. These extensions must be clearly identified, in all interfaces, as
  vendor-specific and not be tied to the OpenBMC project.

- APIs to implement error and event reporting should have good ergonomics. These
  APIs must provide compile-time identification, for applicable programming
  languages, of call sites which do not conform to the BMC error and event
  specifications.

  - The generated error classes and APIs should not require exceptions but
    should also integrate with the `sdbusplus` client and server bindings, which
    do leverage exceptions.

[error-discussion]:
  https://discord.com/channels/775381525260664832/855566794994221117/867794201897992213

## Proposed Design

The proposed design has a few high-level design elements:

- Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
  reporting; expand it to cover tracing events; improve the ergonomics of the
  associated APIs and add compile-time checking of missing metadata.

- Add APIs to `phosphor-logging` to enable daemons to easily look up their own
  previously reported events (for marking as resolved).

- Add to `phosphor-logging` a compile-time mechanism to disable recording of
  specific tracing events for vendor-level customization.

- Generate a Redfish Message Registry for all errors and events defined in
  `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
  the `bmcweb` implementation of the `Logging.Entry` to `LogEntry`
  transformation to cover the Redfish Message Registry and `phosphor-logging`
  enhancements, and leverage the Redfish `LogEntry.DiagnosticData` field to
  provide a Base64-encoded JSON representation of the entire `Logging.Entry` for
  additional diagnostics [[does this need to be optional?]]. Add support to the
  `bmcweb` EventService implementation to support `phosphor-logging`-hosted
  events.

### `sdbusplus`

The `Foo.errors.yaml` content will be combined with the content formerly in the
`Foo.metadata.yaml` files specified by `phosphor-logging` and placed in a new
file type, `Foo.events.yaml`. This `Foo.events.yaml` format will cover both the
current `error` and `metadata` information as well as augment it with
additional information necessary to generate external-facing datasets, such as
Redfish Message Registries. The current `Foo.errors.yaml` and
`Foo.metadata.yaml` files will be deprecated as their usage is replaced by the
new format.

The `sdbusplus` library will be enhanced to provide the following:

- JSON serialization and de-serialization of generated exception types with
  their assigned metadata; assignment of the JSON serialization to the `message`
  field of `sd_bus_error_set` calls when errors are returned from DBus server
  calls.

- A facility to register exception types, at library load time, with the
  `sdbusplus` library for automatic conversion back to C++ exception types in
  DBus clients.

The binding generator(s) will be expanded to do the following:

- Generate complete C++ exception types, with compile-time checking of missing
  metadata and JSON serialization, for errors and events. Metadata can be of one
  of the following types:

  - size-type and signed integer
  - floating-point number
  - string
  - DBus object path

- Generate a format that `bmcweb` can use to create and populate a Redfish
  Message Registry, and translate from `phosphor-logging` to Redfish `LogEntry`
  for a set of errors and events.

For general users of `sdbusplus` these changes should have no impact, except for
the availability of new generated exception types and that specialized instances
of `sdbusplus::exception::generated_exception` will become available in DBus
clients.

### `phosphor-dbus-interfaces`

Refactoring will be done to migrate existing `Foo.metadata.yaml` and
`Foo.errors.yaml` content to the `Foo.events.yaml` as migration is done by
applications. Minor changes will take place to utilize the new binding
generators from `sdbusplus`. A small library enhancement will be done to
register all generated exception types with `sdbusplus`. Future contributors
will be able to contribute new error and tracing event definitions.

### `phosphor-logging`

> TODO: Should a tracing event be a `Logging.Entry` with severity of
> `Informational` or should they be a new type, such as `Logging.Event`, and
> managed separately? The `phosphor-logging` default `meson.options` have
> `error_cap=200` and `error_info_cap=10`. If we increase the total number of
> events allowed to 10K, the majority of them are likely going to be information
> / tracing events.

The `Logging.Entry` interface's `AdditionalData` property should change to
`dict[string, variant[string,int64_t,size_t,object_path]]`.

The `Logging.Create` interface will have a new method added:

```yaml
- name: CreateEntry
  parameters:
    - name: Message
      type: string
    - name: Severity
      type: enum[Logging.Entry.Level]
    - name: AdditionalData
      type: dict[string, variant[string,int64_t,size_t,object_path]]
    - name: Hint
      type: string
      default: ""
  returns:
    - name: Entry
      type: object_path
```

The `Hint` parameter is used for daemons to be able to query for their
previously recorded error, for marking as resolved. These strings need to be
globally unique and are suggested to be of the format `"<service_name>:<key>"`.

A `Logging.SearchHint` interface will be created, which will be recorded at the
same object path as a `Logging.Entry` when the `Hint` parameter was not an empty
string:

```yaml
- property: Hint
  type: string
```

The `Logging.Manager` interface will be added with a single method:

```yaml
- name: FindEntry
  parameters:
    - name: Hint
      type: string
  returns:
    - name: Entry
      type: object_path
  errors:
    - xyz.openbmc_project.Common.ResourceNotFound
```

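How the `Hint` ties `CreateEntry` and `FindEntry` together can be illustrated
with a toy in-memory model. This is only a sketch under stated assumptions:
`EntryStore`, `makeHint`, and the local `ResourceNotFound` type are
hypothetical, the object path layout is assumed, and `Severity` and
`AdditionalData` are omitted for brevity. The real interfaces are DBus methods
hosted by `phosphor-logging`.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for xyz.openbmc_project.Common.ResourceNotFound.
struct ResourceNotFound : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Toy in-memory model of the proposed CreateEntry / FindEntry pair.
class EntryStore
{
  public:
    // Suggested hint format from the text: "<service_name>:<key>".
    static std::string makeHint(const std::string& service,
                                const std::string& key)
    {
        return service + ":" + key;
    }

    // Record an entry and return its object path (path layout is assumed).
    // A non-empty hint is indexed so the reporter can find the entry later;
    // reusing a hint makes it point at the most recent entry.
    std::string createEntry(const std::string& message,
                            const std::string& hint = "")
    {
        std::string path = "/xyz/openbmc_project/logging/entry/" +
                           std::to_string(nextId++);
        messages[path] = message;
        if (!hint.empty())
        {
            hints[hint] = path;
        }
        return path;
    }

    // Look up the entry recorded for a hint, or signal that none exists.
    std::string findEntry(const std::string& hint) const
    {
        auto it = hints.find(hint);
        if (it == hints.end())
        {
            throw ResourceNotFound("no entry recorded for hint: " + hint);
        }
        return it->second;
    }

  private:
    std::uint64_t nextId = 1;
    std::map<std::string, std::string> messages; // object path -> message
    std::map<std::string, std::string> hints;    // hint -> object path
};
```

A daemon would pass the same hint when reporting an error and again when the
condition clears, then mark the returned entry as resolved.
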
A `lg2::commit` API will be added to support the new `sdbusplus` generated
exception types, calling the new `Logging.Create.CreateEntry` method proposed
earlier. This new API will support `sdbusplus::bus_t` for synchronous DBus
operations and both `sdbusplus::async::context_t` and
`sdbusplus::asio::connection` for asynchronous DBus operations.

There are outstanding performance concerns with the `phosphor-logging`
implementation that may impact the ability for scaling to 10,000 event records.
This issue is expected to be self-contained within `phosphor-logging`, except
for potential future changes to the log-retrieval interfaces used by `bmcweb`.
In order to decouple the transition to this design, by callers of the logging
APIs, from the experimentation and improvements in `phosphor-logging`, we will
add a compile option and Yocto `DISTRO_FEATURE` that can turn `lg2::commit`
behavior into an `OPENBMC_MESSAGE_ID` record in the journal, along the same
approach as the previous `REDFISH_MESSAGE_ID`, and corresponding `rsyslog`
configuration and `bmcweb` support to use these directly. This will allow
systems which knowingly scale to a large number of event records, using
`rsyslog` mechanics, to retain the same level of performance. One caveat of this
support is that the hint and resolution behavior will not exist when that option
is enabled.

### `bmcweb`

`bmcweb` already has support for build-time conversion from a Redfish Message
Registry, codified in JSON, to header files it uses to serve the registry; this
will be expanded to support Redfish Message Registries generated by `sdbusplus`.
`bmcweb` will add a Meson option for additional message registries, provided by
bitbake from `phosphor-dbus-interfaces` and vendor-specific event definitions,
as a path to a directory of Message Registry JSON files. Support will also be
added for adding `phosphor-dbus-interfaces` as a Meson subproject for
stand-alone testing.

It is desirable for `sdbusplus` to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with `bmcweb`. As part of this
we would like to support mapping a `Logging.Entry` event to an existing
standardized Redfish event (such as those in the Base registry). The generated
information must contain the `Logging.Entry::Message` identifier, the
`AdditionalData` to `MessageArgs` mapping, and the translation from the
`Message` identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). In order to facilitate this, we will need to add OEM fields to
the Redfish Message Registry JSON, which are only used by the `bmcweb`
processing scripts, to generate the information necessary for this additional
mapping.

The `xyz.openbmc_project.Logging.Entry` to `LogEntry` conversion needs to be
enhanced, to utilize these Message Registries, in four ways:

1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
   to the `DiagnosticData` property.

2. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageId` property will be set to the corresponding
   Redfish Message ID. Otherwise, the `Logging.Entry::Message` will be used
   directly with no further transformation (as is done today).

3. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageArgs` property will be filled in by obtaining the
   corresponding values from the `AdditionalData` dictionary and the `Message`
   field will be generated from combining these values with the `Message` string
   from the Registry.

4. A mechanism should be implemented to translate DBus `object_path` references
   to Redfish Resource URIs. When an `object_path` cannot be translated,
   `bmcweb` will use a prefix such as `object_path:` in the `MessageArgs` value.

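Steps 3 and 4 can be pictured with the following sketch. It is an assumption
about one possible shape of the conversion, not the `bmcweb` implementation:
`ObjectPath`, `UriMap`, `ArgVisitor`, and `buildMessageArgs` are hypothetical
names, the ordered list of argument fields is assumed to come from the generated
registry data, and the variant is reduced to three of the proposed
`AdditionalData` value types.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical wrapper so a DBus object path is distinguishable from a string.
struct ObjectPath
{
    std::string path;
};

using Value = std::variant<std::string, std::int64_t, ObjectPath>;
using AdditionalData = std::map<std::string, Value>;
// Hypothetical translation table from DBus object paths to Redfish URIs.
using UriMap = std::map<std::string, std::string>;

// Convert one AdditionalData value into its MessageArgs string form.
struct ArgVisitor
{
    const UriMap& uris;

    std::string operator()(const std::string& s) const
    {
        return s;
    }
    std::string operator()(std::int64_t v) const
    {
        return std::to_string(v);
    }
    std::string operator()(const ObjectPath& p) const
    {
        // Untranslatable paths keep a recognizable prefix, as in step 4.
        auto it = uris.find(p.path);
        return it != uris.end() ? it->second : "object_path:" + p.path;
    }
};

// Build MessageArgs by picking the named fields, in registry order.
std::vector<std::string> buildMessageArgs(
    const std::vector<std::string>& argFields, const AdditionalData& data,
    const UriMap& uris)
{
    std::vector<std::string> args;
    for (const auto& name : argFields)
    {
        auto it = data.find(name);
        // A missing field becomes an empty argument so positions stay aligned.
        args.push_back(it == data.end()
                           ? std::string{}
                           : std::visit(ArgVisitor{uris}, it->second));
    }
    return args;
}
```

Fields that are not listed as message arguments stay out of `MessageArgs` and
would only be visible through `DiagnosticData`.
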
The implementation of `EventService` should be enhanced to support
`phosphor-logging` hosted events. The implementation of `LogService` should be
enhanced to support log paging for `phosphor-logging` hosted events.

### `phosphor-sel-logger`

The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
between using `phosphor-logging` or the [`REDFISH_MESSAGE_ID`
mechanism][existing-design]. The `phosphor-logging`-utilizing paths will be
updated to utilize `phosphor-dbus-interfaces` specified errors and events.

### YAML format

Consider an example file in `phosphor-dbus-interfaces` as
`yaml/xyz/openbmc_project/Software/Update.events.yaml` with hypothetical errors
and events:

```yaml
version: 1.3.1

errors:
  - name: UpdateFailure
    severity: critical
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: ERRNO
        type: int64
      - name: CALLOUT_HARDWARE
        type: object_path
        primary: true
    en:
      description: While updating the firmware on a device, the update failed.
      message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
      resolution: Retry update.

  - name: BMCUpdateFailure
    severity: critical
    deprecated: 1.0.0
    en:
      description: Failed to update the BMC
    redfish-mapping: OpenBMC.FirmwareUpdateFailed

events:
  - name: UpdateProgress
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: COMPLETION
        type: double
        primary: true
    en:
      description: An update is in progress and has reached a checkpoint.
      message: Updating of {TARGET} is {COMPLETION}% complete.
```

Each `Foo.events.yaml` file would be used to generate both the C++ classes (via
`sdbusplus`) for exception handling and event reporting, as well as a versioned
Redfish Message Registry for the errors and events. The YAML schema is as
follows:

```yaml
$id: https://openbmc-project.xyz/sdbusplus/events.schema.yaml
$schema: https://json-schema.org/draft/2020-12/schema
title: Event and error definitions
type: object
$defs:
  event:
    type: array
    items:
      type: object
      properties:
        name:
          type: string
          description:
            An identifier for the event in UpperCamelCase; used as the class and
            Redfish Message ID.
        en:
          type: object
          description: The details for English.
          properties:
            description:
              type: string
              description:
                A developer-applicable description of the error reported. These
                form the "description" of the Redfish message.
            message:
              type: string
              description:
                The end-user message, including placeholders for arguments.
            resolution:
              type: string
              description: The end-user resolution.
        severity:
          enum:
            - emergency
            - alert
            - critical
            - error
            - warning
            - notice
            - informational
            - debug
          description:
            The `xyz.openbmc_project.Logging.Entry.Level` value for this
            error. Only applicable for 'errors'.
        redfish-mapping:
          type: string
          description:
            Used when a `sdbusplus` event should map to a specific Redfish
            Message rather than a generated one. This is useful when an internal
            error has an analog in a standardized registry.
        deprecated:
          type: string
          pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
          description:
            Indicates that the event is now deprecated and should not be created
            by any OpenBMC software, but is required to still exist for
            generation in the Redfish Message Registry. The version listed here
            should be the first version where the error is no longer used.
        metadata:
          type: array
          items:
            type: object
            properties:
              name:
                type: string
                description: The name of the metadata field.
              type:
                enum:
                  - string
                  - size
                  - int64
                  - uint64
                  - double
                  - object_path
                description: The type of the metadata field.
              primary:
                type: boolean
                description:
                  Set to true when the metadata field is expected to be part of
                  the Redfish `MessageArgs` (and not only in the extended
                  `DiagnosticData`).
properties:
  version:
    type: string
    pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
    description:
      The version of the file, which will be used as the Redfish Message
      Registry version.
  errors:
    $ref: "#/$defs/event"
  events:
    $ref: "#/$defs/event"
```

The above example YAML would generate C++ classes similar to:

```cpp
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{

class UpdateFailure
{
    template <typename... Args>
    UpdateFailure(Args&&... args);
};

}

namespace sdbusplus::events::xyz::openbmc_project::software::update
{

class UpdateProgress
{
    template <typename... Args>
    UpdateProgress(Args&&... args);
};

}
```

The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an `UpdateFailure`, a
developer might do something like:

```cpp
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc,
                          "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a dbus response (when using sdbusplus generated binding):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE",
                    bmc_object_path);
```

If one of the fields, such as `ERRNO`, were omitted, a compile failure would be
raised indicating the first missing field.

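As a rough illustration of the "first missing field" check, the sketch below
uses a `constexpr` helper that compares a list of required metadata names with
the names a caller supplied, in any order. The helper `firstMissingField` and
the `updateFailureFields` list are hypothetical names for this example only.
The real generated constructor may perform the check quite differently.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not an sdbusplus API): returns the index of the first
// required field that does not appear in `provided`, or R when none is
// missing.
template <std::size_t R, std::size_t P>
constexpr std::size_t firstMissingField(
    const std::array<std::string_view, R>& required,
    const std::array<std::string_view, P>& provided)
{
    for (std::size_t i = 0; i < R; ++i)
    {
        bool found = false;
        for (std::size_t j = 0; j < P; ++j)
        {
            if (required[i] == provided[j])
            {
                found = true;
                break;
            }
        }
        if (!found)
        {
            return i;
        }
    }
    return R;
}

// Assumed metadata names, taken from the UpdateFailure call sites above.
inline constexpr std::array<std::string_view, 3> updateFailureFields{
    "TARGET", "ERRNO", "CALLOUT_HARDWARE"};

// All fields supplied, in a different order: nothing is missing.
static_assert(firstMissingField(updateFailureFields,
                                std::array<std::string_view, 3>{
                                    "ERRNO", "CALLOUT_HARDWARE", "TARGET"}) ==
              3);

// ERRNO omitted: the check identifies index 1 as the first missing field.
static_assert(firstMissingField(updateFailureFields,
                                std::array<std::string_view, 2>{
                                    "TARGET", "CALLOUT_HARDWARE"}) == 1);
```

Because the helper is `constexpr`, a generated constructor could feed its
result into a `static_assert` and turn a missing field into a build failure.
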
### Versioning Policy

Assume the version follows the semantic versioning `MAJOR.MINOR.PATCH`
convention.

- Adjusting a description or message should result in a `PATCH` increment.
- Adding a new error or event, or adding metadata to an existing error or event,
  should result in a `MINOR` increment.
- Deprecating an error or event should result in a `MAJOR` increment.

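The policy above can be sketched as a small function that maps a kind of change
to the resulting version. The `Version` and `Change` types and the `bump` and
`toString` functions are hypothetical names for illustration. Resetting the
lower fields to zero on a `MINOR` or `MAJOR` increment follows the usual
semantic versioning convention and is an assumption here, since the policy does
not spell it out.

```cpp
#include <string>

// Hypothetical representation of a registry version.
struct Version
{
    unsigned majorPart;
    unsigned minorPart;
    unsigned patchPart;
};

// The three kinds of change named in the policy.
enum class Change
{
    DescriptionOrMessage, // wording adjusted           -> PATCH
    AddedEventOrMetadata, // new event or new metadata  -> MINOR
    DeprecatedEvent,      // error or event deprecated  -> MAJOR
};

constexpr Version bump(Version v, Change c)
{
    switch (c)
    {
        case Change::DescriptionOrMessage:
            return {v.majorPart, v.minorPart, v.patchPart + 1};
        case Change::AddedEventOrMetadata:
            return {v.majorPart, v.minorPart + 1, 0};
        case Change::DeprecatedEvent:
            return {v.majorPart + 1, 0, 0};
    }
    return v;
}

std::string toString(const Version& v)
{
    return std::to_string(v.majorPart) + '.' + std::to_string(v.minorPart) +
           '.' + std::to_string(v.patchPart);
}
```

For example, adding metadata to an event in a file at `1.3.1` would yield
`1.4.0` under these assumptions.
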
There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
`phosphor-dbus-interfaces` policy.

[registry-guidance]:
  https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_registry.readmefirst.md

### Generated Redfish Message Registry

[DSP0266][dsp0266], the Redfish specification, gives requirements for Redfish
Message Registries and dictates guidelines for identifiers.

The hypothetical events defined above would create a message registry similar
to:

```json
{
  "Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
  "Language": "en",
  "Messages": {
    "UpdateFailure": {
      "Description": "While updating the firmware on a device, the update failed.",
      "Message": "A failure occurred updating %1 on %2.",
      "Resolution": "Retry update.",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "string"],
      "Severity": "Critical"
    },
    "UpdateProgress": {
      "Description": "An update is in progress and has reached a checkpoint.",
      "Message": "Updating of %1 is %2% complete.",
      "Resolution": "None",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "number"],
      "Severity": "OK"
    }
  }
}
```

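The `%1` and `%2` placeholders in a registry `Message` are filled from the
`MessageArgs` of a log entry. A minimal sketch of that substitution is below.
`formatMessage` is a hypothetical helper for illustration, not a `bmcweb` or
`sdbusplus` function, and the argument values in the usage note are made up.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces each "%N" (N counted from 1) in `tmpl` with
// args[N - 1]. A '%' that is not followed by a valid argument number is
// copied through unchanged.
std::string formatMessage(const std::string& tmpl,
                          const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() &&
            std::isdigit(static_cast<unsigned char>(tmpl[i + 1])))
        {
            // Parse the decimal argument number following the '%'.
            std::size_t j = i + 1;
            std::size_t index = 0;
            while (j < tmpl.size() &&
                   std::isdigit(static_cast<unsigned char>(tmpl[j])))
            {
                index = index * 10 + static_cast<std::size_t>(tmpl[j] - '0');
                ++j;
            }
            if (index >= 1 && index <= args.size())
            {
                out += args[index - 1];
                i = j - 1; // skip past the digits just consumed
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```

With the `UpdateProgress` message above and the illustrative arguments
`{"BMC Flash A", "50"}`, the trailing `%` is not followed by a digit and is
therefore kept as a literal percent sign.
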
The prefix `OpenBMC_Base` shall be exclusively reserved for use by events from
`phosphor-logging`. Events defined in other repositories will be expected to use
some other prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by [DSP0266][dsp0266].

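To make the identifier layout concrete, the sketch below derives a registry
`Id` from a prefix, a dotted event namespace, and a version. The mapping
(capitalize each segment, drop underscores, join with `_`) is inferred from the
single example `Id` in the registry above and is an assumption, not a rule
stated by this proposal. `toUpperCamel` and `registryId` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: "openbmc_project" -> "OpenbmcProject".
std::string toUpperCamel(const std::string& segment)
{
    std::string out;
    bool upperNext = true;
    for (char c : segment)
    {
        if (c == '_')
        {
            upperNext = true;
            continue;
        }
        out += upperNext ? static_cast<char>(
                               std::toupper(static_cast<unsigned char>(c)))
                         : c;
        upperNext = false;
    }
    return out;
}

// Hypothetical helper: builds "<prefix>_<Seg>_<Seg>...<.version>" from a
// dotted name such as "xyz.openbmc_project.Software.Update".
std::string registryId(const std::string& prefix,
                       const std::string& dottedName,
                       const std::string& version)
{
    std::string id = prefix;
    std::string segment;
    for (std::size_t i = 0; i <= dottedName.size(); ++i)
    {
        if (i == dottedName.size() || dottedName[i] == '.')
        {
            id += '_' + toUpperCamel(segment);
            segment.clear();
        }
        else
        {
            segment += dottedName[i];
        }
    }
    return id + '.' + version;
}
```

A vendor repository would pass its own vendor-owned prefix in place of
`OpenBMC_Base`.
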
[dsp0266]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.20.0.pdf

### Vendor implications

As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see [DSP0266][dsp0266] for requirements on
identifier naming). The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring their
own versioning and identifiers conform to the expectations in the [Redfish
specification][dsp0266].

One potential bad behavior on the part of vendors would be forking and modifying
`phosphor-dbus-interfaces` defined events. Vendors must not add their own events
to `phosphor-dbus-interfaces` in downstream implementations, because doing so
would lead to their implementation advertising support for a message in an
OpenBMC-owned Registry that does not actually contain it. Instead, they should
add such events to their own repositories with a separate identifier. Similarly,
if a vendor were to _backport_ upstream changes into their fork, they would need
to ensure that the `foo.events.yaml` file for that version matches identically
with the upstream implementation.

## Alternatives Considered

Many alternatives have been explored and referenced through earlier work. Within
this proposal there are many minor alternatives that have been assessed.

### Exception inheritance

The original `phosphor-logging` error descriptions allowed inheritance between
two errors. This is not supported by the proposal for two reasons:

- This introduces complexity in the Redfish Message Registry versioning because
  a change in one file should induce version changes in all dependent files.

- It makes it difficult for a developer to clearly identify all of the fields
  they are expected to populate without traversing multiple files.

### sdbusplus Exception APIs

There are a few possible syntaxes I came up with for constructing the generated
exception types. It is important that these have good ergonomics, are easy to
understand, and can provide compile-time awareness of missing metadata fields.

```cpp
using Example = sdbusplus::error::xyz::openbmc_project::Example;

// 1)
throw Example().fru("Motherboard").value(42);

// 2)
throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);

// 3)
throw Example("FRU", "Motherboard", "VALUE", 42);

// 4)
throw Example([](auto e) { return e.fru("Motherboard").value(42); });

// 5)
throw Example({.fru = "Motherboard", .value = 42});
```

**Note**: These examples are all shown using `throw` syntax, but could also be
saved in local variables, returned from functions, or immediately passed to
`lg2::commit`.

1. This would be my preference for ergonomics and clarity, as it would allow
   LSP-enabled editors to give completions for the metadata fields.
   Unfortunately, there is no mechanism in C++ to define a type which can be
   constructed but not thrown, which means we cannot get compile-time checking
   of all metadata fields.

2. This syntax uses tag-dispatch to enable compile-time checking of all
   metadata fields and potential LSP-completion of the tag-types, but is more
   verbose than option (3).

3. This syntax is less verbose than option (2) and follows conventions already
   used in `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of
   the metadata tags.

4. This syntax is similar to option (1) but uses an indirection of a lambda to
   enable compile-time checking that all metadata fields have been populated by
   the lambda. The LSP-completion is likely not as strong as option (1), due to
   the use of `auto`, and the lambda necessity will likely be a hang-up for
   unfamiliar developers.

5. This syntax has similar characteristics to option (1) and likewise does not
   provide compile-time confirmation that all fields have been populated.

The proposal therefore suggests option (3) is most suitable.

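For comparison, the tag-dispatch idea behind option (2) can be sketched with a
constructor that only participates in overload resolution when every tag type
appears among its arguments. This is an illustrative stand-in, not generated
`sdbusplus` code. The `Example` class, its `fru_` and `value_` tags, and the
fixed `const char*` / `int` value types are all assumptions.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a generated event type using option (2).
class Example
{
  public:
    // Empty tag types naming each metadata field.
    struct fru_
    {};
    struct value_
    {};

    // True when Tag appears (after decay) anywhere in Args.
    template <typename Tag, typename... Args>
    static constexpr bool hasTag =
        (std::is_same_v<Tag, std::decay_t<Args>> || ...);

    // Only viable when both tags are present, in any order.
    template <typename... Args,
              typename = std::enable_if_t<hasTag<fru_, Args...> &&
                                          hasTag<value_, Args...>>>
    explicit Example(Args&&... args)
    {
        assign(std::forward<Args>(args)...);
    }

    std::string fru;
    int value = 0;

  private:
    // End of recursion: every (tag, value) pair has been consumed.
    void assign() {}

    template <typename... Rest>
    void assign(fru_, const char* v, Rest&&... rest)
    {
        fru = v;
        assign(std::forward<Rest>(rest)...);
    }

    template <typename... Rest>
    void assign(value_, int v, Rest&&... rest)
    {
        value = v;
        assign(std::forward<Rest>(rest)...);
    }
};
```

Omitting either tag leaves no viable constructor, so the mistake surfaces at
compile time, at the cost of the extra verbosity noted for option (2).
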
### Redfish Translation Support

The proposed YAML format allows future addition of translation but it is not
enabled at this time. Future development could enable the Redfish Message
Registry to be generated in multiple languages if the `message:language` exists
for those languages.

### Redfish Registry Versioning

Redfish Message Registries are required to be versioned with three numeric
fields (i.e. `XX.YY.ZZ`), but only the first two are supposed to be used in the
Message ID. Rather than using the manually specified version, we could take a
few other approaches:

- Use a date code (ex. `2024.17.x`) representing the ISO 8601 week when the
  registry was built.

  - This does not cover vendors that may choose to branch for stabilization
    purposes, so we can end up with two machines having the same
    OpenBMC-versioned message registry with different content.

- Use the most recent `openbmc/openbmc` tag as the version.

  - This does not cover vendors that build off HEAD and may deploy multiple
    images between two OpenBMC releases.

- Generate the version based on the git-history.

  - This requires `phosphor-dbus-interfaces` to be built from a git repository,
    which may not always be true for Yocto source mirrors, and requires
    non-trivial processing that continues to scale over time.

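Whichever versioning source is chosen, only the first two fields end up in the
Message ID. The sketch below shows that truncation for the Redfish
`RegistryPrefix.Major.Minor.MessageKey` form. `messageId` is a hypothetical
helper name, and the input is assumed to already be a well-formed
three-field version string.

```cpp
#include <string>

// Hypothetical helper: builds "<prefix>.<major>.<minor>.<key>" from a full
// "major.minor.patch" version by dropping everything from the second '.'.
std::string messageId(const std::string& registryPrefix,
                      const std::string& version,
                      const std::string& messageKey)
{
    const auto firstDot = version.find('.');
    const auto secondDot = (firstDot == std::string::npos)
                               ? std::string::npos
                               : version.find('.', firstDot + 1);
    // substr(0, npos) keeps the whole string when fewer than two dots exist.
    return registryPrefix + '.' + version.substr(0, secondDot) + '.' +
           messageKey;
}
```

Under this scheme a `PATCH`-only change leaves every emitted Message ID
untouched, since the third field never appears in it.
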
### Existing OpenBMC Redfish Registry

There are currently 191 messages defined in the existing Redfish Message
Registry at version `OpenBMC.0.4.0`. Of those, not a single one in the codebase
is emitted with the correct version. 96 of those are only emitted by
Intel-specific code that is not pulled into any upstreamed machine, 39 are
emitted by potentially common code, and 56 are not even referenced in the
codebase outside of the bmcweb registry. Of the 39 common messages, half have
an equivalent in one of the standard registries that should be leveraged, and
many of the others do not have attributes that would facilitate a multi-host
configuration, so the registry at a minimum needs to be updated. Nothing in the
current implementation has the capability to handle Redfish Resource URIs.

The proposal therefore is to deprecate the existing registry and replace it with
the new generated registries. For repositories that currently emit events in the
existing format, we can maintain those call-sites for a time period of 1-2
years.

If this aspect of the proposal is rejected, the YAML format allows mapping from
`phosphor-dbus-interfaces` defined events to the current `OpenBMC.0.4.0`
registry `MessageIds`.

Potentially common:

- phosphor-post-code-manager
  - BIOSPOSTCode (unique)
- dbus-sensors
  - ChassisIntrusionDetected (unique)
  - ChassisIntrusionReset (unique)
  - FanInserted
  - FanRedundancyLost (unique)
  - FanRedundancyRegained (unique)
  - FanRemoved
  - LanLost
  - LanRegained
  - PowerSupplyConfigurationError (unique)
  - PowerSupplyConfigurationErrorRecovered (unique)
  - PowerSupplyFailed
  - PowerSupplyFailurePredicted (unique)
  - PowerSupplyFanFailed
  - PowerSupplyFanRecovered
  - PowerSupplyPowerLost
  - PowerSupplyPowerRestored
  - PowerSupplyPredictedFailureRecovered (unique)
  - PowerSupplyRecovered
- phosphor-sel-logger
  - IPMIWatchdog (unique)
  - `SensorThreshold*` : 8 different events
- phosphor-net-ipmid
  - InvalidLoginAttempted (unique)
- entity-manager
  - InventoryAdded (unique)
  - InventoryRemoved (unique)
- estoraged
  - ServiceStarted
- x86-power-control
  - NMIButtonPressed (unique)
  - NMIDiagnosticInterrupt (unique)
  - PowerButtonPressed (unique)
  - PowerRestorePolicyApplied (unique)
  - PowerSupplyPowerGoodFailed (unique)
  - ResetButtonPressed (unique)
  - SystemPowerGoodFailed (unique)

Intel-only implementations:

- intel-ipmi-oem
  - ADDDCCorrectable
  - BIOSPostERROR
  - BIOSRecoveryComplete
  - BIOSRecoveryStart
  - FirmwareUpdateCompleted
  - IntelUPILinkWidthReducedToHalf
  - IntelUPILinkWidthReducedToQuarter
  - LegacyPCIPERR
  - LegacyPCISERR
  - `ME*` : 29 different events
  - `Memory*` : 9 different events
  - MirroringRedundancyDegraded
  - MirroringRedundancyFull
  - `PCIeCorrectable*`, `PCIeFatal` : 29 different events
  - SELEntryAdded
  - SparingRedundancyDegraded
- pfr-manager
  - BIOSFirmwareRecoveryReason
  - BIOSFirmwarePanicReason
  - BMCFirmwarePanicReason
  - BMCFirmwareRecoveryReason
  - BMCFirmwareResiliencyError
  - CPLDFirmwarePanicReason
  - CPLDFirmwareResiliencyError
  - FirmwareResiliencyError
- host-error-monitor
  - CPUError
  - CPUMismatch
  - CPUThermalTrip
  - ComponentOverTemperature
  - SsbThermalTrip
  - VoltageRegulatorOverheated
- s2600wf-misc
  - DriveError
  - InventoryAdded

## Impacts

- New APIs are defined for error and event logging. This will deprecate existing
  `phosphor-logging` APIs for error reporting, with time allowed to migrate.

- The design should improve performance by eliminating the regular parsing of
  the `systemd` journal. The design may decrease performance by allowing the
  number of error and event logs to be dramatically increased, which has an
  impact on file system utilization and potential DBus impacts on some services
  such as `ObjectMapper`.

- Backwards compatibility and documentation should be improved by the automatic
  generation of the Redfish Message Registry corresponding to all error and
  event reports.

### Organizational

- **Does this proposal require a new repository?**
  - No
- **Who will be the initial maintainer(s) of this repository?**
  - N/A
- **Which repositories are expected to be modified to execute this design?**
  - `sdbusplus`
  - `phosphor-dbus-interfaces`
  - `phosphor-logging`
  - `bmcweb`
  - Any repository creating an error or event.

## Testing

- Unit tests will be written in `sdbusplus` and `phosphor-logging` for the error
  and event generation, creation APIs, and to provide coverage on any changes to
  the `Logging.Entry` object management.

- Unit tests will be written for `bmcweb` for basic `Logging.Entry`
  transformation and Message Registry generation.

- Integration tests should be leveraged (and enhanced as necessary) from
  `openbmc-test-automation` to cover the end-to-end error creation and Redfish
  reporting.