# Using io_uring in BMCs for asynchronous sensor reads

Author: Jerry Zhu ([jerryzhu@google.com](mailto:jerryzhu@google.com))

Other contributors:

- Brandon Kim ([brandonkim@google.com](mailto:brandonkim@google.com), brandonk)
- William A. Kennington III ([wak@google.com](mailto:wak@google.com), wak)

Created: June 9, 2021

## Problem Description

Currently, OpenBMC has code that performs I2C reads for sensors, and these reads
may take longer than desired. The IO operations are synchronous and may
therefore block other functions such as IPMI. This project will go through the
OpenBMC repositories that may have this drawback (specifically
[phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)) and add an
asynchronous interface using the new io_uring library.

## Background and References

io_uring is a new asynchronous I/O interface for Linux (added in the 5.1 kernel;
5.10 is preferred). It is an upgrade from the previous asynchronous IO
interface, AIO, which had limitations in the context of sensor reads for
OpenBMC.

[brandonkim@google.com](mailto:brandonkim@google.com) has previously created a
method in the phosphor-hwmon repository for preventing sensors that do not
report failures quickly enough from blocking all other sensor reads and D-Bus
([link to change](https://gerrit.openbmc.org/c/openbmc/phosphor-hwmon/+/24337)).
Internal Google BMC efforts have also focused on introducing the io_uring
library into the code.

## Requirements

The asynchronous sensor reads that use io_uring will need to maintain the same
accuracy as the current synchronous reads in each of the daemons. Potential
OpenBMC repositories that will benefit from this library include:

- [phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)
- [phosphor-nvme](https://github.com/openbmc/phosphor-nvme)
- [dbus-sensors](https://github.com/openbmc/dbus-sensors)
- any other appropriate repository

The focus of this project is to add asynchronous sensor reads to the
phosphor-hwmon repository, which is easier to implement than adding asynchronous
sensor reads into dbus-sensors.

Users will need the ability to choose between the new asynchronous method of
reading sensors and the traditional synchronous method. In addition, the
performance improvement from using the io_uring library will need to be measured
for each daemon.

## Proposed Design

In the phosphor-hwmon repository, the primary files that will require
modification are
[sensor.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.hpp)
and
[mainloop.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.hpp),
as well as the addition of a caching layer for the results from the sensor
reads.

In mainloop.cpp currently, the `read()` function, which reads hwmon sysfs
entries, iterates through all sensors and calls `_ioAccess->read(...)` for each
one; this operation is potentially blocking.

The refactor will keep this loop over all sensors but make the read operation
non-blocking by using an io_uring wrapper. A caching layer will store the read
results and will be the main access point for obtaining sensor reads in
mainloop.cpp.

```
               Interface Layer
+--------------------------------------------+
|                                            |
|   +------------+         +-------------+   |
|   |            |         |             |   |
|   |  Redfish   |         |    IPMI     |   |
|   |            |         |             |   |
|   +-----+------+         +-------+-----+   |
|         ^                        ^         |
+---------|------------------------|---------+
          |                        |
          v                        v
+---------+------------------------+---------+
|                                            |
|                    DBus                    |
|                                            |
+---------^------------------------^---------+
          |                        |
  +-------v-------+       +--------v-------+
  |               |       |                |
  |phosphor-hwmon |       |  dbus-sensors  |
  |               |       |                |
  +-------^-------+       +--------^-------+
          | <--------------------- | <------- caching layer at this level
 +--------v------------------------v--------+
 |                                          |
 |               Linux kernel               |
 |                                          |
 +----------^---------------------^---------+
            |                     |
       +----v-----+         +-----v----+
       |          |         |          |
       |i2c sensor|         |i2c sensor|
       |          |         |          |
       +----------+         +----------+
```

For compatibility reasons, a flag variable (most likely placed in the .conf file
of each hwmon sensor) will let users choose whether or not to use this new
io_uring implementation.

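The opt-in described above could be sketched as follows. This is only an
illustration under stated assumptions: the key name `ASYNC_READ` and the helper
`asyncReadEnabled` are hypothetical (the design does not fix a flag name), and
the sketch assumes the .conf file is a list of `KEY=value` lines.

```cpp
#include <sstream>
#include <string>

// Hypothetical key name; the design does not specify one.
constexpr const char* asyncReadKey = "ASYNC_READ";

// Returns true only if the conf text has a line "ASYNC_READ=true" or
// "ASYNC_READ=1". A missing key keeps the traditional synchronous reads.
bool asyncReadEnabled(const std::string& confText)
{
    const std::string prefix = std::string(asyncReadKey) + "=";
    std::istringstream in(confText);
    std::string line;
    while (std::getline(in, line))
    {
        // rfind(prefix, 0) == 0 means the line starts with the prefix.
        if (line.rfind(prefix, 0) == 0)
        {
            const std::string value = line.substr(prefix.size());
            return value == "true" || value == "1";
        }
    }
    return false;
}
```

Defaulting to `false` when the key is absent means existing sensor
configurations keep their current behavior unless they opt in.
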
## Detailed Design

The read cache is implemented using an `unordered_map` of {sensor hwmon path:
read result}. The read result is a struct that keeps track of the information
needed to process the read values and handle errors. This includes the open file
descriptor from the `open()` system call, the number of retries remaining for
reading this sensor when errors occur, etc.

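A minimal sketch of such a cache entry is shown here. The type, field, and
function names (`ReadResult`, `lastError`, `recordResult`, and so on) are
assumptions made for illustration and are not taken from phosphor-hwmon. The
`res` argument follows the convention io_uring uses for completion results,
where a negative value is a negated errno; resetting the retry budget on every
successful read is a design choice of this sketch.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-sensor record held in the read cache.
struct ReadResult
{
    int fd = -1;              // descriptor from open(), -1 if not open
    int retriesRemaining = 0; // retries left after failed reads
    std::int64_t value = 0;   // last successfully read value
    int lastError = 0;        // errno of the last failure, 0 if none
};

// Keyed by the sensor's hwmon sysfs path.
using ReadCache = std::unordered_map<std::string, ReadResult>;

// Fold one finished read into the cache. A non-negative `res` is a success
// and stores `parsed`; a negative `res` is -errno and consumes one retry.
void recordResult(ReadCache& cache, const std::string& path, int res,
                  std::int64_t parsed, int maxRetries)
{
    ReadResult& entry = cache[path];
    if (res >= 0)
    {
        entry.value = parsed;
        entry.lastError = 0;
        entry.retriesRemaining = maxRetries;
        return;
    }
    entry.lastError = -res;
    if (entry.retriesRemaining > 0)
    {
        --entry.retriesRemaining;
    }
}
```
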
Each call to access the read value of a particular sensor in the read cache will
not only return the cached value but will also submit an SQE (submission queue
entry) to io_uring for that sensor; this SQE acts as a read request that will be
sent to the kernel. The implementation maintains a set of sensors with
outstanding submissions so that multiple SQEs for the same sensor are not
submitted and do not overlap; a sensor's entry in the set is cleared when its
read result returns successfully as a CQE (completion queue event). The CQE is
then processed, and its information updates the cache map.

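The bookkeeping in the previous paragraph can be sketched without a real ring.
In this illustration the class `SensorReadCache` and its members are
hypothetical, a `std::vector` stands in for the submission queue, and
`complete()` stands in for processing one successful CQE; failed completions
are not modelled.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical cache front end. A real version would prepare and submit an
// SQE where this sketch appends to `submitted`.
class SensorReadCache
{
  public:
    // Return the cached value (0 before the first completion) and queue a
    // read request unless one is already outstanding for this sensor.
    std::int64_t get(const std::string& path)
    {
        if (inFlight.insert(path).second)
        {
            submitted.push_back(path);
        }
        return values[path];
    }

    // Stand-in for processing one successful CQE: update the cache and let
    // the next get() submit a new request for this sensor.
    void complete(const std::string& path, std::int64_t value)
    {
        values[path] = value;
        inFlight.erase(path);
    }

    std::vector<std::string> submitted; // log of queued requests

  private:
    std::unordered_map<std::string, std::int64_t> values;
    std::unordered_set<std::string> inFlight;
};
```
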
The asynchronous nature of this implementation comes from submitting all
possible SQE requests at once, which is a non-blocking operation, instead of
being blocked by slow sensor reads as in the synchronous implementation. The
kernel processes these requests, and before the next iteration of sensor reads
the cache attempts to process any returned CQEs, which is also a non-blocking
operation.

Simply put, an access to some "Sensor A" in the read cache creates an underlying
read request that makes a best effort to update the value of "Sensor A" before
the next time the sensor read loop (which currently runs every 1 s by default)
gets the value of "Sensor A" through the cache.

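The resulting timing can be illustrated with a small simulation. Everything in
it is an assumption made for the sketch: `FakeRequest` and `Poller` are
hypothetical names, a vector replaces the kernel side of the ring, and each
fake read completes after a fixed number of loop iterations rather than after
real I2C latency.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// One fake read: completes after `iterationsLeft` passes of the loop and
// then yields `value`.
struct FakeRequest
{
    std::string path;
    int iterationsLeft;
    std::int64_t value;
};

struct Poller
{
    std::unordered_map<std::string, std::int64_t> cache; // completed values
    std::unordered_set<std::string> inFlight;            // pending sensors
    std::vector<FakeRequest> ring;                       // pending requests

    // One pass of the sensor read loop over `sensors`.
    std::unordered_map<std::string, std::int64_t> iterate(
        const std::vector<FakeRequest>& sensors)
    {
        // 1. Reap whatever has completed; never wait for the rest.
        std::vector<FakeRequest> stillPending;
        for (auto& req : ring)
        {
            if (--req.iterationsLeft <= 0)
            {
                cache[req.path] = req.value;
                inFlight.erase(req.path);
            }
            else
            {
                stillPending.push_back(req);
            }
        }
        ring = std::move(stillPending);

        // 2. Serve every sensor from the cache and queue a new request for
        //    each sensor that has none outstanding.
        std::unordered_map<std::string, std::int64_t> snapshot;
        for (const auto& sensor : sensors)
        {
            snapshot[sensor.path] = cache[sensor.path];
            if (inFlight.insert(sensor.path).second)
            {
                ring.push_back(sensor);
            }
        }
        return snapshot;
    }
};
```

With one sensor that completes in a single iteration and another that needs
three, the fast sensor's value is available from the second pass onward while
the slow sensor keeps reporting its previous cached value until its request
finishes; neither delays the other.
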
## Alternatives Considered

Linux does have a native asynchronous IO interface, simply dubbed AIO; however,
it has a number of limitations. The biggest limitation of AIO is that it only
supports true asynchronous IO for unbuffered (`O_DIRECT`) reads. Furthermore,
there are a number of ways that the IO submission can end up blocking, for
example if metadata is required to perform the IO. Additionally, AIO has higher
memory costs than io_uring.

For these primary reasons, the native AIO library will not be considered for
this implementation of asynchronous reads.

## Impacts

This project would initially impact all OpenBMC developers of
openbmc/phosphor-hwmon. It has improved the latency of phosphor-hwmon;
throughput has also been shown to increase (note that the throughput profiling
was more arbitrary than the latency profiling). These performance changes will
have to be measured in further detail across different machines.

There will be no security impact.

## Testing

The change will use the gTest framework to ensure that the original
functionality of the code in the modified repository stays the same.