# Code Update Design

Author: Jagpal Singh Gill <paligill@gmail.com>

Created: 4th August 2023

## Problem Description

This section covers the limitations discovered with
[phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt).

1. The current code update flow is complex, as it involves three different
   daemons: Image Manager, Image Updater and Update Service.
2. The update invocation flow has no explicit interface; instead, it depends on
   Image Manager discovering a new file in /tmp/images.
3. Images POSTed via Redfish are downloaded by BMCWeb to /tmp/images, which
   requires write access to the filesystem. This poses a security risk.
4. The current design doesn't support parallel upgrades for different firmware
   ([Issue](https://github.com/openbmc/bmcweb/issues/257)).

## Background and References

- [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
- [Software DBus Interface](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/Software)
- [Code Update Design](https://github.com/openbmc/docs/tree/master/architecture/code-update)

## Requirements

1. Able to start an update, given a firmware image and update settings.

   - Update settings shall be able to specify when to apply the image, for
     example, immediately, on device reset, or on demand.

2. Able to retrieve the update progress and status.
3. Able to produce an interface compliant with
   [Redfish UpdateService](https://redfish.dmtf.org/schemas/v1/UpdateService.v1_11_3.json).
4. Unprivileged daemons with access to D-Bus should be able to accept and
   perform a firmware update.
5. The update request shall respond immediately, so the client can query the
   status while the update is in progress.
6. All errors shall propagate back to the client.
7. Able to support updates for different types of hardware components, such as
   CPLD, NIC, BIOS, BIC, PCIe switches, etc.
8. The design shall not restrict the choice of firmware image format.
9. Able to update multiple hardware components of the same type running
   different firmware images, for example, two instances of CPLDx residing on
   the board, one performing functionX and the other performing functionY, and
   hence running different firmware images.
10. Able to update multiple components in parallel.
11. Able to restrict critical system actions, such as a reboot of the entity
    under update, while the code update is in flight.

## Proposed Design

### Proposed End to End Flow

```mermaid
sequenceDiagram;
participant CL as Client
participant BMCW as BMCWeb
participant CU as <deviceX>CodeUpdater<br> ServiceName: xyz.openbmc_project.Software.<deviceX>

%% Bootstrap Action for CodeUpdater
note over CU: Get device access info from<br> /xyz/openbmc_project/inventory/system/... path
note over CU: SwId = <DeviceX>_<RandomId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at /xyz/openbmc_project/software/<SwId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at /xyz/openbmc_project/software/<SwId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at /xyz/openbmc_project/software/<SwId> <br> with Status = Active
CU ->> CU: Create functional association <br> from Version to Inventory Item

CL ->> BMCW: HTTP POST: /redfish/v1/UpdateService/update <br> (Image, settings, RedfishTargetURIArray)

loop For every RedfishTargetURI
  note over BMCW: Map RedfishTargetURI /redfish/v1/UpdateService/FirmwareInventory/<SwId> to<br> Object path /xyz/openbmc_project/software/<SwId>
  note over BMCW: Get serviceName corresponding to the object path <br>from mapper.
  BMCW ->> CU: StartUpdate(Image, ApplyTime)

  note over CU: SwId = <DeviceX>_<RandomId>
  note over CU: ObjectPath = /xyz/openbmc_project/software/<SwId>
  CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at ObjectPath with Status = NotReady
  CU -->> BMCW: {ObjectPath, Success}
  CU ->> CU: << Delegate Update for asynchronous processing >>

  par BMCWeb Processing
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.Activation,<br> ObjectPath)
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.ActivationProgress,<br> ObjectPath)
    BMCW ->> BMCW: Create Task<br> to handle matcher notifications
    BMCW -->> CL: <TaskNum>
    loop
      BMCW --) BMCW: Process notifications<br> and update Task attributes
      CL ->> BMCW: /redfish/v1/TaskMonitor/<TaskNum>
      BMCW -->> CL: TaskStatus
    end
  and << Asynchronous Update in Progress >>
    note over CU: Verify Image
    break Image Verification FAILED
      CU ->> CU: Activation.Status = Invalid
      CU --) BMCW: Notify Activation.Status change
    end
    CU ->> CU: Activation.Status = Ready
    CU --) BMCW: Notify Activation.Status change

    CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at ObjectPath
    CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.ActivationProgress<br> at ObjectPath
    CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition<br> at ObjectPath
    CU ->> CU: Activation.Status = Activating
    CU --) BMCW: Notify Activation.Status change
    note over CU: Start Update
    loop
      CU --) BMCW: Notify ActivationProgress.Progress change
    end
    note over CU: Finish Update
    CU ->> CU: Activation.Status = Active
    CU --) BMCW: Notify Activation.Status change
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationProgress
    alt ApplyTime == Immediate
      note over CU: Reset Device and<br> update functional association to System Inventory Item
    else
      note over CU: Create active association to System Inventory Item
    end
  end
end
```

- Each upgradable hardware type may have a separate daemon (\<deviceX\> as per
  the above flow) that handles its update process and implements the interfaces
  proposed in the next section. This satisfies
  [Requirement# 7](#requirements).
- Since a single daemon handles the update (as compared to three), less
  handshaking is involved, which addresses [Issue# 1](#problem-description) and
  [Requirement# 4](#requirements).
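
The Activation status changes in the asynchronous half of the flow above can be summarized as a small state machine. The following Python sketch is a toy model, not the D-Bus implementation: `SoftwareObject`, `run_update` and the transition table are hypothetical, the status strings follow the `xyz.openbmc_project.Software.Activation` values used in the diagram, and the `Activating` to `Failed` transition is an assumption taken from the error-propagation description rather than from the diagram itself.

```python
from enum import Enum


class Activation(Enum):
    """Subset of the Activation status values that appear in the flow."""
    NOT_READY = "NotReady"
    INVALID = "Invalid"
    READY = "Ready"
    ACTIVATING = "Activating"
    ACTIVE = "Active"
    FAILED = "Failed"


# Hypothetical transition table read off the sequence diagram.
ALLOWED = {
    Activation.NOT_READY: {Activation.READY, Activation.INVALID},
    Activation.READY: {Activation.ACTIVATING},
    Activation.ACTIVATING: {Activation.ACTIVE, Activation.FAILED},
}

# Interfaces that only exist while the update is in flight.
TRANSIENT = {"ActivationProgress", "ActivationBlocksTransition"}


class SoftwareObject:
    """Toy model of the object StartUpdate creates at ObjectPath."""

    def __init__(self, sw_id: str):
        self.sw_id = sw_id
        self.status = Activation.NOT_READY
        self.interfaces = {"Activation"}
        # Every status change is recorded, standing in for PropertiesChanged signals.
        self.history = [self.status]

    def set_status(self, new: Activation) -> None:
        if new not in ALLOWED.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status.value} -> {new.value}")
        self.status = new
        self.history.append(new)


def run_update(obj: SoftwareObject, image_ok: bool, flash_ok: bool = True) -> SoftwareObject:
    """Drive one object through the asynchronous part of the flow."""
    if not image_ok:  # break: Image Verification FAILED
        obj.set_status(Activation.INVALID)
        return obj
    obj.set_status(Activation.READY)
    obj.interfaces |= {"Version"} | TRANSIENT
    obj.set_status(Activation.ACTIVATING)
    # "Start Update" ... "Finish Update": the flashing itself is not modeled.
    obj.set_status(Activation.ACTIVE if flash_ok else Activation.FAILED)
    # Assumption: the transient interfaces are removed on failure as well.
    obj.interfaces -= TRANSIENT
    return obj
```

Keeping the allowed transitions in one table makes an out-of-order status change fail loudly in the updater instead of confusing the task tracking on the Server side.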

### Proposed D-Bus Interface

The D-Bus interface for code update consists of the following:

| Interface Name | Existing/New | Purpose |
| :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-----------------------------------------------------------------: |
| [xyz.openbmc_project.Software.Update](https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/65738) | New | Provides update method |
| [xyz.openbmc_project.Software.Version](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Version.interface.yaml) | Existing | Provides version info |
| [xyz.openbmc_project.Software.Activation](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Activation.interface.yaml) | Existing | Provides activation status |
| [xyz.openbmc_project.Software.ActivationProgress](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationProgress.interface.yaml) | Existing | Provides activation progress percentage |
| [xyz.openbmc_project.Software.ActivationBlocksTransition](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationBlocksTransition.interface.yaml) | Existing | Signifies barrier for state transitions while update is in progress |
| [xyz.openbmc_project.Software.RedundancyPriority](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/RedundancyPriority.interface.yaml) | Existing | Provides the redundancy priority for the version interface |

Introducing the xyz.openbmc_project.Software.Update interface streamlines the
update invocation flow and hence addresses [Issue# 2](#problem-description) and
[Requirement# 1 & 2](#requirements).
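
Requirement 5 (respond immediately) amounts to splitting `StartUpdate` into a cheap synchronous part and a delegated asynchronous part. The asyncio sketch below is illustrative only: `CodeUpdater`, `start_update` and `_update` are hypothetical stand-ins for the D-Bus method, the random `SwId` suffix comes from `secrets.token_hex`, and the `/xyz/openbmc_project/software` root is assumed.

```python
import asyncio
import secrets


class CodeUpdater:
    """Hypothetical <deviceX>CodeUpdater; start_update stands in for the
    StartUpdate method of xyz.openbmc_project.Software.Update."""

    def __init__(self, device: str):
        self.device = device
        self.status = {}      # object path -> Activation status string
        self.pending = set()  # background update tasks (kept referenced)

    def start_update(self, image: bytes, apply_time: str) -> str:
        # Cheap synchronous checks: these errors reach the caller directly.
        if not image:
            raise ValueError("empty image")
        sw_id = f"{self.device}_{secrets.token_hex(4)}"
        path = f"/xyz/openbmc_project/software/{sw_id}"
        self.status[path] = "NotReady"
        # Delegate the slow work and return at once so the client can poll.
        task = asyncio.get_running_loop().create_task(
            self._update(path, image, apply_time))
        self.pending.add(task)
        task.add_done_callback(self.pending.discard)
        return path

    async def _update(self, path: str, image: bytes, apply_time: str) -> None:
        # Verification and flashing are elided; apply_time is unused here.
        self.status[path] = "Ready"
        await asyncio.sleep(0)
        self.status[path] = "Activating"
        await asyncio.sleep(0)
        self.status[path] = "Active"
```

The caller sees `NotReady` right after the call returns, and only later observes the status move to `Active`, which mirrors the `{ObjectPath, Success}` reply in the diagram arriving before the asynchronous update runs.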

#### Association

`running`: A `running` association from xyz.openbmc_project.Inventory.Item to
xyz.openbmc_project.Software.Version represents the current functional
(running) software version for the associated inventory item. The `ran_on`
association is the corresponding reverse association.

`activating`: An `activating` association from
xyz.openbmc_project.Inventory.Item to xyz.openbmc_project.Software.Version
represents an activated (but not yet running) software version for the
associated inventory item. There could be more than one active version for an
inventory item; for example, with A/B redundancy models there are two
associated flash banks, and the xyz.openbmc_project.Software.RedundancyPriority
interface defines the priority of each one.

For the A/B redundancy model with staging support, the
xyz.openbmc_project.Software.Activation.Activations.Staged value indicates which
software version is currently staged.

The `activated_on` association is the corresponding reverse association of
`activating`.
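
The pairing of forward and reverse association names can be illustrated with an in-memory index. This sketch assumes the object-mapper convention of exposing `<path>/<association name>` objects that list their endpoints; `AssociationIndex`, `promote_after_reset` and all paths are hypothetical, and promoting the `activating` version to `running` after a reset is an assumption about how the flow completes.

```python
from collections import defaultdict


class AssociationIndex:
    """Toy stand-in for association endpoints: every association is stored in
    both directions, so "<inventory>/running" and "<version>/ran_on" resolve."""

    def __init__(self):
        self._endpoints = defaultdict(set)

    def add(self, source: str, forward: str, reverse: str, target: str) -> None:
        self._endpoints[f"{source}/{forward}"].add(target)
        self._endpoints[f"{target}/{reverse}"].add(source)

    def remove(self, source: str, forward: str, reverse: str, target: str) -> None:
        self._endpoints[f"{source}/{forward}"].discard(target)
        self._endpoints[f"{target}/{reverse}"].discard(source)

    def endpoints(self, path: str) -> list:
        return sorted(self._endpoints.get(path, set()))


def promote_after_reset(idx: AssociationIndex, inventory: str,
                        old_version: str, new_version: str) -> None:
    """Hypothetical bookkeeping once the device runs the newly activated image."""
    idx.remove(inventory, "running", "ran_on", old_version)
    idx.remove(inventory, "activating", "activated_on", new_version)
    idx.add(inventory, "running", "ran_on", new_version)
```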

### Keep images in memory

Images will be kept in memory and passed to \<deviceX>CodeUpdater using a file
descriptor rather than a file path. The implementation needs to monitor and
enforce appropriate memory limits so that parallel updates do not run the BMC
out of memory.
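
Both halves of this section can be sketched together: placing an image in an anonymous in-memory file so that only a descriptor is handed over (the kind of handle a D-Bus `h`, UNIX file descriptor, argument carries), and a simple byte budget that refuses a new update instead of exceeding a limit. `ImageMemoryBudget` and `image_to_fd` are hypothetical; `os.memfd_create` is Linux-only, so the sketch falls back to a disk-backed temporary file elsewhere, which would defeat the purpose on a real BMC.

```python
import os
import tempfile
import threading


class ImageMemoryBudget:
    """Hypothetical admission control for images held in memory."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.in_use = 0
        self._lock = threading.Lock()  # parallel updates may reserve concurrently

    def reserve(self, size: int) -> bool:
        """Claim `size` bytes; return False rather than exceed the limit."""
        with self._lock:
            if self.in_use + size > self.limit:
                return False
            self.in_use += size
            return True

    def release(self, size: int) -> None:
        with self._lock:
            self.in_use -= size


def image_to_fd(image: bytes) -> int:
    """Copy the image into an anonymous file and return a readable fd."""
    if hasattr(os, "memfd_create"):  # Linux, Python 3.8+
        fd = os.memfd_create("fw-image")
    else:  # fallback only so the sketch runs elsewhere (disk-backed)
        with tempfile.TemporaryFile() as f:
            fd = os.dup(f.fileno())
    view = memoryview(image)
    while view:  # os.write may write fewer bytes than requested
        view = view[os.write(fd, view):]
    os.lseek(fd, 0, os.SEEK_SET)  # let the updater read from the start
    return fd
```

A Server would call `reserve` before accepting an upload and `release` once the updater has consumed the image, so a burst of parallel requests is rejected up front rather than exhausting memory midway.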

### Propagate errors to client

The xyz.openbmc_project.Software.Update.StartUpdate method call will propagate
any errors related to initial setup and image metadata/header parsing back to
the user. Any asynchronous errors that happen during the update process will be
signaled via a failed activation status, which maps to a failed task associated
with the update. Additionally, a phosphor-logging event will be created and
sent back to the client via the
[Redfish Log Service](https://redfish.dmtf.org/schemas/v1/LogService.v1_4_0.json).

Another alternative could be to use
[Redfish Event Services](https://redfish.dmtf.org/schemas/v1/EventService.v1_10_0.json).
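
On the Server side, asynchronous failures arrive as Activation status changes that the task tracking the update has to translate. The mapping below is a hypothetical choice, not BMCWeb's actual code; the values on the right are Redfish `TaskState` names, and `UpdateTask` is an illustrative class.

```python
# Hypothetical mapping from Activation status to Redfish TaskState; the exact
# mapping is an implementation choice for the Server.
TASK_STATE = {
    "NotReady": "Starting",
    "Ready": "Running",
    "Activating": "Running",
    "Active": "Completed",
    "Invalid": "Exception",
    "Failed": "Exception",
}


class UpdateTask:
    """Toy Redfish task fed by Activation status change notifications."""

    def __init__(self, object_path: str):
        self.object_path = object_path
        self.state = "New"
        self.messages = []

    def on_status_changed(self, status: str) -> None:
        # Values missing from the table leave the task state untouched.
        self.state = TASK_STATE.get(status, self.state)
        if self.state == "Exception":
            self.messages.append(
                f"Update of {self.object_path} failed: Activation.Status={status}")

    @property
    def finished(self) -> bool:
        return self.state in ("Completed", "Exception")
```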

### Firmware Image Format

Image parsing will be performed in \<deviceX>CodeUpdater, and since
\<deviceX>CodeUpdater may be a device-specific daemon, the vendor may choose any
image format for the firmware image. This fulfills
[Requirement# 8](#requirements).
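
Because parsing lives in the updater, each \<deviceX>CodeUpdater is free to define its own layout. Purely as an illustration, the sketch below invents a header for an imaginary CPLD image (magic, major/minor version, payload length, CRC-32); none of these fields come from a real vendor format, and `parse_cpld_image` and `build_cpld_image` are hypothetical.

```python
import struct
import zlib
from dataclasses import dataclass

# Hypothetical little-endian header: magic, major, minor, payload length, CRC-32.
_HEADER = struct.Struct("<4sHHII")


@dataclass
class CpldImage:
    version: str
    payload: bytes


def parse_cpld_image(blob: bytes) -> CpldImage:
    """Validate the invented header; errors here would be returned synchronously."""
    if len(blob) < _HEADER.size:
        raise ValueError("image too short for header")
    magic, major, minor, length, crc = _HEADER.unpack_from(blob, 0)
    if magic != b"CPLD":
        raise ValueError("bad magic")
    payload = blob[_HEADER.size:]
    if len(payload) != length:
        raise ValueError("length mismatch")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return CpldImage(version=f"{major}.{minor}", payload=payload)


def build_cpld_image(major: int, minor: int, payload: bytes) -> bytes:
    """Test helper producing an image in the invented format."""
    return _HEADER.pack(b"CPLD", major, minor, len(payload), zlib.crc32(payload)) + payload
```

A PLDM-based updater, by contrast, would parse a PLDM package, and neither parser needs to know about the other.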

### Multi part Images

A multi part image has multiple component images as part of one image package.
The PLDM image is one example of a multi part image format. Sometimes, for multi
part devices, there is no concrete physical firmware device; instead, the
firmware device itself consists of multiple physical components, each of which
may have its own component image. In such a scenario, \<deviceX>CodeUpdater can
create a logical inventory item for the firmware device. When performing the
firmware device update, the client may target the logical firmware device, which
in turn knows how to update the corresponding child components with the
supplied component images. The user can also update a specific component by
providing an image package with that component as the head node. The
\<deviceX>CodeUpdater can implement the required logic to verify whether the
supplied image is targeted at itself (and its child components).
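
The targeting check described above might look like the following sketch. `Package`, `Component` and `Updater` are hypothetical structures, not the PLDM package format; a package is accepted only if it is headed by the updater's own node and every component image belongs to that node or one of its children.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    target: str   # hypothetical component identifier
    image: bytes


@dataclass
class Package:
    head: str         # node the package is addressed to
    components: list


@dataclass
class Updater:
    """Hypothetical <deviceX>CodeUpdater for a (possibly logical) device."""
    name: str
    children: tuple = ()
    flashed: dict = field(default_factory=dict)

    def is_for_me(self, pkg: Package) -> bool:
        """Accept only packages headed by this node whose components it owns."""
        if pkg.head != self.name:
            return False
        owned = {self.name, *self.children}
        return all(c.target in owned for c in pkg.components)

    def apply(self, pkg: Package) -> list:
        if not self.is_for_me(pkg):
            raise ValueError(f"package for {pkg.head!r} not targeted at {self.name!r}")
        for c in pkg.components:
            # Stands in for flashing each child with its own component image.
            self.flashed[c.target] = len(c.image)
        return sorted(self.flashed)
```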

### Update multiple devices of same type

- For devices of the same type, extend the D-Bus path to specify the device
  instance, for example,
  /xyz/openbmc_project/software/\<deviceX>\_\<InstanceNum>\_\<SwId>. All the
  corresponding interfaces can reside on this path, and the same path will be
  returned from xyz.openbmc_project.Software.Update.StartUpdate.

This fulfills [Requirement# 9](#requirements).
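
Building such per-instance paths has one constraint worth checking: D-Bus object path elements may only contain ASCII letters, digits and `_`. A minimal sketch, assuming the `/xyz/openbmc_project/software` root and a hex `SwId`; `software_path` and `new_sw_id` are hypothetical helpers.

```python
import re
import secrets

# D-Bus object path elements may only use ASCII letters, digits and '_'.
_ELEMENT = re.compile(r"[A-Za-z0-9_]+")
SOFTWARE_ROOT = "/xyz/openbmc_project/software"  # assumed root


def new_sw_id() -> str:
    """Hypothetical SwId: 8 random hex digits, all valid path characters."""
    return secrets.token_hex(4)


def software_path(device: str, instance: int, sw_id: str) -> str:
    """Build <root>/<deviceX>_<InstanceNum>_<SwId>, rejecting invalid elements."""
    element = f"{device}_{instance}_{sw_id}"
    if not _ELEMENT.fullmatch(element):
        raise ValueError(f"invalid D-Bus object path element: {element!r}")
    return f"{SOFTWARE_ROOT}/{element}"
```

Two CPLDx instances flashed with the same image therefore still get distinct paths, because the instance number is part of the element.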

### Parallel Upgrade

- Different type hardware components:

  Upgrades for different types of hardware components can be handled either by
  different \<deviceX>CodeUpdater daemons or by a single daemon for hardware
  components with common features; for example, PLDMd may handle updates for
  devices using the PLDM specification. Such updates can be invoked in parallel
  from BMCWeb and tracked via different tasks.

- Similar type hardware components:

  BMCWeb will trigger xyz.openbmc_project.Software.Update.StartUpdate on
  different D-Bus paths pertaining to each hardware instance. For more details
  on D-Bus paths, refer to
  [Update multiple devices of same type](#update-multiple-devices-of-same-type).

This fulfills [Requirement# 10](#requirements).
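
From the Server's point of view, parallel upgrades reduce to one independent call and one tracking task per D-Bus path. The asyncio sketch below fakes the updaters (`fake_update` and `update_all` are hypothetical) and reports a 0-100 percentage in the spirit of `ActivationProgress.Progress`; a shared counter records how many updates were running at once.

```python
import asyncio


async def fake_update(path: str, steps: int, progress: dict, stats: dict) -> str:
    """Hypothetical updater for one D-Bus path."""
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    try:
        for i in range(1, steps + 1):
            await asyncio.sleep(0.01)           # pretend to flash one chunk
            progress[path] = i * 100 // steps   # percentage, 0-100
    finally:
        stats["running"] -= 1
    return "Active"


async def update_all(paths, steps: int = 4):
    """Start one update per path concurrently and track each as its own task."""
    progress = {p: 0 for p in paths}
    stats = {"running": 0, "peak": 0}
    tasks = {p: asyncio.create_task(fake_update(p, steps, progress, stats))
             for p in paths}
    # return_exceptions=True: a failed update is reported next to the others
    # instead of aborting the wait for the remaining ones.
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return dict(zip(tasks, results)), progress, stats
```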

### Uninterrupted Updates

The `ActivationBlocksTransition` interface will be created on the specific D-Bus
path for a version update, which will help block interruptions from critical
system actions such as reboots. Creating and removing this interface can in
turn start and stop services, such as the Boot Guard Service, to prevent such
interruptions.

Moreover, when a device is being upgraded, sensor scanning for that device
might need to be disabled. To achieve this, the sensor scanning flow can check
for the existence of the `ActivationBlocksTransition` interface on the
associated `Version` D-Bus path for the inventory item. If such an interface
exists, sensor scanning for that device can be skipped by returning a relevant
error (such as `EBUSY`) to the client. Another alternative is to check for the
existence of the `ActivationBlocksTransition` interface only if sensor scanning
times out. This won't impact average-case sensor scanning performance, only the
worst case, when the device is busy, for example, due to an update in progress.
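
Both sensor-scanning options can be sketched as guards around a read. `interfaces_by_path` stands in for whatever interface lookup the scanner uses (for example, a mapper query), and `version_paths` for the Version objects associated with the inventory item; these names, and returning `-EBUSY` in a tuple, are assumptions for illustration.

```python
import errno

BLOCKS = "xyz.openbmc_project.Software.ActivationBlocksTransition"


def update_in_progress(interfaces_by_path: dict, version_paths: list) -> bool:
    """True if any associated Version object currently hosts ActivationBlocksTransition."""
    return any(BLOCKS in interfaces_by_path.get(p, ()) for p in version_paths)


def read_sensor_eager(read_fn, interfaces_by_path, version_paths):
    """Option 1: check before every read and skip the device while it updates."""
    if update_in_progress(interfaces_by_path, version_paths):
        return -errno.EBUSY, None
    return 0, read_fn()


def read_sensor_lazy(read_fn, interfaces_by_path, version_paths):
    """Option 2: only look for an update after a read times out."""
    try:
        return 0, read_fn()
    except TimeoutError:
        if update_in_progress(interfaces_by_path, version_paths):
            return -errno.EBUSY, None
        raise  # a genuine timeout, unrelated to an update
```

The lazy variant pays for the interface lookup only on the timeout path, which is the trade-off the paragraph above describes.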

## Alternatives Considered

### Centralized Design with Global Software Manager

A single Software Manager communicates with BMCWeb and hosts all the
interfaces, such as Version, Activation and Progress, for all hardware
components within the system on different D-Bus paths. The Software Manager
keeps a list of the various hardware update services within the system and
starts them based on update requests. These on-demand services update the
hardware and the interfaces hosted by the Software Manager, and then exit.

#### Pros

- Most of the D-Bus interfaces get implemented by the Software Manager, and
  vendors would need to write minimal code to change properties for these
  interfaces based on status and progress.
- Under normal operating conditions (no update in flight), only the Software
  Manager will be running.

#### Cons

- Imposes the need for a common image format, as the Software Manager needs to
  parse and verify the image to create interfaces.
- Design limitation: the current running version needs to be read from the
  hardware at system bring-up, so the Software Manager would need to start each
  update daemon at system startup to get the running version.

### Pull model for Status and Progress

The proposed solution uses a push model, where status and progress updates are
asynchronously pushed to BMCWeb. An alternative would be a pull model, where
the Update interface has get methods for status and progress (for example,
getActivationStatus and getActivationProgress).

#### Pros

- The Server doesn't have to maintain a D-Bus matcher
  ([Issue](https://github.com/openbmc/bmcweb/issues/202)).
- Easier implementation in the Server, as no asynchronous handlers would be
  required.

#### Cons

- The Server would still need to maintain some information so it can map a
  client's task status request to the D-Bus path
  /xyz/openbmc_project/software/\<deviceX> for calling getActivationStatus and
  getActivationProgress.
- The aforementioned [issue](https://github.com/openbmc/bmcweb/issues/202) is
  more of an implementation problem, which can be resolved through
  implementation changes.
- Currently, the activation and progress interfaces are used in
  [many Servers](#organizational). In the future, harmonizing the flow into a
  single one would involve changing from the push to the pull model in all
  those places. With the current proposal, the only change will be in the
  update invocation flow.

## Impacts

The introduction of the new D-Bus API will temporarily create two invocation
flows from Servers. Servers (BMCWeb, IPMI, etc.) can initially support both
code stacks. As all the code update daemons move to the new flow, Servers will
be changed to support only the new API stack. There is no user-API impact, as
the design adheres to Redfish UpdateService.

## Organizational

### Does this design require a new repository?

Yes. There will be device-transport-level repositories, and multiple
\<deviceX>CodeUpdater daemons using a similar transport layer can reside in the
same repository. For example, all devices using PMBus could share a common
repository.

### Who will be the initial maintainer(s) of this repository?

Meta will propose repositories for the following devices, and `Jagpal Singh Gill`
and `Patrick Williams` will be the maintainers for them.

- VR Update
- CPLD Update

### Which repositories are expected to be modified to execute this design?

The following repositories require changes to incorporate the new interface for
update invocation:

| Repository | Modification Owner |
| :------------------------------------------------------------------------------ | :----------------- |
| [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt) | Jagpal Singh Gill |
| [BMCWeb](https://github.com/openbmc/bmcweb) | Jagpal Singh Gill |
| [phosphor-host-ipmid](https://github.com/openbmc/phosphor-host-ipmid) | Jagpal Singh Gill |
| [pldm](https://github.com/openbmc/pldm/tree/master/fw-update) | Jagpal Singh Gill |
| [openpower-pnor-code-mgmt](https://github.com/openbmc/openpower-pnor-code-mgmt) | Adriana Kobylak |
| [openbmc-test-automation](https://github.com/openbmc/openbmc-test-automation) | Adriana Kobylak |

NOTE: The
[phosphor-psu-code-mgmt](https://github.com/openbmc/phosphor-psu-code-mgmt) code
seems unused, so it is not tracked for change.

## Testing

### Unit Testing

All the functional testing of the reference implementation will be performed
using GTest.

### Integration Testing

The end to end integration testing involving Servers (for example BMCWeb) will
be covered using openbmc-test-automation.