openbmc_docs/designs/code-update.md

364 lines
18 KiB
Markdown
Raw Normal View History

2024-12-23 14:53:31 +08:00
# Code Update Design
Author: Jagpal Singh Gill <paligill@gmail.com>
Created: 4th August 2023
## Problem Description
This section covers the limitations discoverd with
[phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
1. Current code update flow is complex as it involves 3 different daemons -
Image Manager, Image Updater and Update Service.
2. Update invocation flow has no explicit interface but rather depends upon the
discovery of a new file in /tmp/images by Image Manager.
3. Images POSTed via Redfish are downloaded by BMCWeb to /tmp/images which
requires write access to filesystem. This poses a security risk.
4. Current design doesn't support parallel upgrades for different firmware
([Issue](https://github.com/openbmc/bmcweb/issues/257)).
## Background and References
- [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
- [Software DBus Interface](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/Software)
- [Code Update Design](https://github.com/openbmc/docs/tree/master/architecture/code-update)
## Requirements
1. Able to start an update, given a firmware image and update settings.
- Update settings shall be able to specify when to apply the image, for example
immediately or on device reset or on-demand.
2. Able to retrieve the update progress and status.
3. Able to produce an interface complaint with
[Redfish UpdateService](https://redfish.dmtf.org/schemas/v1/UpdateService.v1_11_3.json)
4. Unprivileged daemons with access to DBus should be able to accept and perform
a firmware update.
5. Update request shall respond back immediately, so client can query the status
while update is in progress.
6. All errors shall propagate back to the client.
7. Able to support update for different type of hardware components such as
CPLD, NIC, BIOS, BIC, PCIe switches, etc.
8. Design shall impose no restriction to choose any specific image format.
9. Able to update multiple hardware components of same type running different
firmware images, for example, two instances of CPLDx residing on the board,
one performing functionX and other performing functionY and hence running
different firmware images.
10. Able to update multiple components in parallel.
11. Able to restrict critical system actions, such as reboot for entity under
update while the code update is in flight.
## Proposed Design
### Proposed End to End Flow
```mermaid
sequenceDiagram;
participant CL as Client
participant BMCW as BMCWeb
participant CU as <deviceX>CodeUpdater<br> ServiceName: xyz.openbmc_project.Software.<deviceX>
% Bootstrap Action for CodeUpdater
note over CU: Get device access info from<br> /xyz/openbmc_project/inventory/system/... path
note over CU: Swid = <DeviceX>_<RandomId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at /xyz/openbmc_project/Software/<SwId> <br> with Status = Active
CU ->> CU: Create functional association <br> from Version to Inventory Item
CL ->> BMCW: HTTP POST: /redfish/v1/UpdateService/update <br> (Image, settings, RedfishTargetURIArray)
loop For every RedfishTargetURI
note over BMCW: Map RedfishTargetURI /redfish/v1/UpdateService/FirmwareInventory/<SwId> to<br> Object path /xyz/openbmc_project/software/<SwId>
note over BMCW: Get serviceName corresponding to the object path <br>from mapper.
BMCW ->> CU: StartUpdate(Image, ApplyTime)
note over CU: Swid = <DeviceX>_<RandomId>
note over CU: ObjectPath = /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at ObjectPath with Status = NotReady
CU -->> BMCW: {ObjectPath, Success}
CU ->> CU: << Delegate Update for asynchronous processing >>
par BMCWeb Processing
BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.Activation,<br> ObjectPath)
BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.ActivationProgress,<br> ObjectPath)
BMCW ->> BMCW: Create Task<br> to handle matcher notifications
BMCW -->> CL: <TaskNum>
loop
BMCW --) BMCW: Process notifications<br> and update Task attributes
CL ->> BMCW: /redfish/v1/TaskMonitor/<TaskNum>
BMCW -->>CL: TaskStatus
end
and << Asynchronous Update in Progress >>
note over CU: Verify Image
break Image Verification FAILED
CU ->> CU: Activation.Status = Invalid
CU --) BMCW: Notify Activation.Status change
end
CU ->> CU: Activation.Status = Ready
CU --) BMCW: Notify Activation.Status change
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at ObjectPath
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.ActivationProgress<br> at ObjectPath
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition<br> at ObjectPath
CU ->> CU: Activation.Status = Activating
CU --) BMCW: Notify Activation.Status change
note over CU: Start Update
loop
CU --) BMCW: Notify ActivationProgress.Progress change
end
note over CU: Finish Update
CU ->> CU: Activation.Status = Active
CU --) BMCW: Notify Activation.Status change
CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition
CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationProgress
alt ApplyTime == Immediate
note over CU: Reset Device and<br> update functional association to System Inventory Item
else
note over CU: Create active association to System Inventory Item
end
end
end
```
- Each upgradable hardware type may have a separate daemon (\<deviceX\> as per
above flow) handling its update process and would need to implement the
proposed interfaces in next section. This satisfies the
[Requirement# 6](#requirements).
- Since, there would be single daemon handling the update (as compared to
three), less hand shaking would be involved and hence addresses the
[Issue# 1](#problem-description) and [Requirement# 4](#requirements).
### Proposed D-Bus Interface
The DBus Interface for code update will consist of following -
| Interface Name | Existing/New | Purpose |
| :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-----------------------------------------------------------------: |
| [xyz.openbmc_project.Software.Update](https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/65738) | New | Provides update method |
| [xyz.openbmc_project.Software.Version](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Version.interface.yaml) | Existing | Provides version info |
| [xyz.openbmc_project.Software.Activation](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Activation.interface.yaml) | Existing | Provides activation status |
| [xyz.openbmc_project.Software.ActivationProgress](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationProgress.interface.yaml) | Existing | Provides activation progress percentage |
| [xyz.openbmc_project.Software.ActivationBlocksTransition](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationBlocksTransition.interface.yaml) | Existing | Signifies barrier for state transitions while update is in progress |
| [xyz.openbmc_project.Software.RedundancyPriority](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/RedundancyPriority.interface.yaml) | Existing | Provides the redundancy priority for the version interface |
Introduction of xyz.openbmc_project.Software.Update interface streamlines the
update invocation flow and hence addresses the [Issue# 2](#problem-description)
and [Requirement# 1 & 2](#requirements).
#### Association
`running` : A `running` association from xyz.openbmc_project.Inventory.Item to
xyz.openbmc_project.Software.Version represents the current functional or
running software version for the associated inventory item. The `ran_on` would
be the corresponding reverse association.
`activating` : An `activating` association from
xyz.openbmc_project.Inventory.Item to xyz.openbmc_project.Software.Version
represents the activated (but not yet run) software version for the associated
inventory item. There could be more than one active versions for an inventory
item, for example, in case of A/B redundancy models there are 2 associated
flash-banks and xyz.openbmc_project.Software.RedundancyPriority interface
defines the priority for each one.
For A/B redundancy model with staging support, the
xyz.openbmc_project.Software.Activation.Activations.Staged will help to define
which software version is currently staged.
The `activated_on` would be the corresponding reverse association.
### Keep images in memory
Images will be kept in memory and passed to \<deviceX>CodeUpdater using a file
descriptor rather than file path. Implementation needs to monitor appropriate
memory limits to prevent parallel updates from running BMC out of memory.
### Propagate errors to client
xyz.openbmc_project.Software.Update.StartUpdate return value will propagate any
errors related to initial setup and image metadata/header parsing back to user.
Any asynchronous errors which happen during the update process will be notified
via failed activation status which maps to failed task associated with the
update. Also, a phosphor-logging event will be created and sent back to client
via
[Redfish Log Service](https://redfish.dmtf.org/schemas/v1/LogService.v1_4_0.json).
Another alternative could be to use
[Redfish Event Services](https://redfish.dmtf.org/schemas/v1/EventService.v1_10_0.json).
### Firmware Image Format
Image parsing will be performed in \<deviceX>CodeUpdater and since
\<deviceX>CodeUpdater may be a device specific daemon, vendor may choose any
image format for the firmware image. This fulfills the
[Requirement# 7](#requirements).
### Multi part Images
A multi part image has multiple component images as part of one image package.
PLDM image is one such example of multi part image format. Sometimes, for multi
part devices there is no concrete physical firmware device but firmware device
itself consists of multiple phsyical components, each of which may have its own
component image. In such a scenario, \<deviceX>CodeUpdater can create a logical
inventory item for the firmware device. While performing the firmware device
update, the client may target the logical firmware device which further knows
how to update the corresponding child components for supplied component images.
The user can also update the specific component by providing the image package
with component as head node. The \<deviceX>CodeUpdater can implement the
required logic to verify if the supplied image is targeted for itself (and child
components) or not.
### Update multiple devices of same type
- For same type devices, extend the Dbus path to specify device instance, for
example, /xyz/openbmc_project/Software/\<deviceX>\_\<InstanceNum>\_\<SwId>.
All the corresponding interfaces can reside on this path and same path will be
returned from xyz.openbmc_project.Software.Update.StartUpdate.
This fulfills the [Requirement# 9](#requirements).
### Parallel Upgrade
- Different type hardware components:
Upgrade for different type hardware components can be handled either by
different <deviceX>CodeUpdater daemons or by a single daemon for hardware
components with common features, for example, PLDMd may handle update for
devices using PLDM specification. Such updates can be invoked in parallel from
BMCWeb and tracked via different tasks.
- Similar type hardware component:
BMCWeb will trigger xyz.openbmc_project.Software.Update.StartUpdate on
different D-Bus paths pertaining to each hardware instance. For more details
on D-Bus paths refer to
[Update multiple devices of same type](#update-multiple-devices-of-same-type).
This fulfills the [Requirement# 9](#requirements).
### Uninterrupted Updates
`ActivationBlocksTransitions` interface will be created on the specific D-Bus
path for a version update which will help to block any interruptions from
critical system actions such as reboots. This interface can in turn start and
stop services such as Boot Guard Service to prevent such interruptions.
Moreover, when a device is being upgraded the sensor scanning for that device
might need to be disabled. To achieve this, the sensor scanning flow can check
for existence of `ActivationBlocksTransitions` interface on associated `Version`
DBus path for the inventory item. If such interface exists, the sensor scanning
for that device can be skipped by returning back relevant error (such as
`EBUSY`) to the client. Another alternative is to check for existence of
`ActivationBlocksTransitions` interface only if sensor scanning times out. This
won't impact average case performance for sensor scanning but only the worst
case scenario when device is busy, for example, due to update in progress.
## Alternatives Considered
### Centralized Design with Global Software Manager
Single SoftwareManager which communicates with the BCMWeb, hosts all the
interfaces such as Version, Activation, Progress for all hardware components
within the system on different DBus paths. Software Manager keeps list of
various hardware update services within the system and start them based on
update request. These on-demand services update the hardware and interfaces
hosted by Software Manager and exits.
#### Pros
- Most of the DBus interfaces gets implemented by Software Manager and vendors
would need to write minimal code to change properties for these interfaces
based on status and progress.
- Under normal operating conditions (no update in flight), only Software Manager
will be running.
#### Cons
- Imposes the need of a common image format as Software Manager needs to parse
and verify the image for creating interfaces.
- Limitation in the design, as there is a need to get the current running
version from the hardware at system bring up. So, Software Manager would need
to start each update daemon at system startup to get the running version.
### Pull model for Status and Progress
The proposed solution uses a push model where status and progress updates are
asynchronously pushed to BMCWeb. Another alternative would be to use a pull
model where Update interface can have get methods for status and progress (for
example, getActivationStatus and getActivationProgress).
#### Pros
- Server doesn't have to maintain a Dbus matcher
([Issue](https://github.com/openbmc/bmcweb/issues/202)).
- Easier implementation in Server as no asynchronous handlers would be required.
#### Cons
- Server would still need maintain some info so it can map client's task status
request to Dbus path for /xyz/openbmc_project/Software/<deviceX> for calling
getActivationStatus and getActivationProgress.
- Aforementioned [issue](https://github.com/openbmc/bmcweb/issues/202) is more
of an implementation problem which can be resolved through implementation
changes.
- Currently, activation and progress interfaces are being used in
[lot of Servers](#organizational). In future, harmonizing the flow to single
one will involve changing the push to pull model in all those places. With the
current proposal, the only change will be in update invocation flow.
## Impacts
The introduction of new DBus API will temporarily create two invocation flows
from Server. Servers (BMCWeb, IPMI, etc) can initially support both the code
stacks. As all the code update daemons gets moved to the new flow, Servers would
be changed to only support new API stack. No user-api impact as design adheres
to Redfish UpdateService.
## Organizational
### Does this design require a new repository?
Yes. There will be a device transport level repositories and multiple
\<deviceX>CodeUpdater using similar transport layer can reside in same
repository. For example, all devices using PMBus could have a common repository.
### Who will be the initial maintainer(s) of this repository?
Meta will propose repositories for following devices and `Jagpal Singh Gill` &
`Patrick Williams` will be the maintainer for them.
- VR Update
- CPLD Update
### Which repositories are expected to be modified to execute this design?
Requires changes in following repositories to incorporate the new interface for
update invocation -
| Repository | Modification Owner |
| :------------------------------------------------------------------------------ | :----------------- |
| [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt) | Jagpal Singh Gill |
| [BMCWeb](https://github.com/openbmc/bmcweb) | Jagpal Singh Gill |
| [phosphor-host-ipmid](https://github.com/openbmc/phosphor-host-ipmid) | Jagpal Singh Gill |
| [pldm](https://github.com/openbmc/pldm/tree/master/fw-update) | Jagpal Singh Gill |
| [openpower-pnor-code-mgmt](https://github.com/openbmc/openpower-pnor-code-mgmt) | Adriana Kobylak |
| [openbmc-test-automation](https://github.com/openbmc/openbmc-test-automation) | Adriana Kobylak |
NOTE: For
[phosphor-psu-code-mgmt](https://github.com/openbmc/phosphor-psu-code-mgmt) code
seems unused, so not tracking for change.
## Testing
### Unit Testing
All the functional testing of the reference implementation will be performed
using GTest.
### Integration Testing
The end to end integration testing involving Servers (for example BMCWeb) will
be covered using openbmc-test-automation.