# Control.ThermalMode dbus interface with Supported and Current properties
Author: Matthew Barth !msbarth

Other contributors: None

Created: 2019-02-06

## Problem Description
An issue was discovered where exhaust heat from the system GPUs causes
overtemp warnings on optical cables in certain system configurations. The issue
can be resolved by altering the fan control application's floor table,
effectively raising the fan speed floor when these optical cables are present,
but an interface is needed to do so. Because no mechanism currently exists to
detect optical cables plugged into a card downwind of the GPUs' exhaust, an
end-user must be given the ability to enable this raised floor speed table.
## Background and References
The witherspoon system supports PCI cards that can have optical cables plugged
in instead of copper cables. These optical cables can report overtemp warnings
to the OS under workloads with high GPU utilization. When this occurs with low
enough CPU utilization, the fans can be held at a floor speed that sufficiently
cools the components within the chassis but not the optical cables sitting in
the slow-moving hot exhaust.

Without an available exhaust temp sensor, there is no direct way to determine
the exhaust temp and include it in the fan control algorithm. A similar issue
exists on other systems, where the exhaust temp is instead estimated
mathematically from the overall power dissipation.

Mathematical calculations to logically estimate exit air temps:
https://github.com/openbmc/dbus-sensors/blob/master/src/ExitAirTempSensor.cpp
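
As a rough illustration of how an exit air temp can be estimated from power
dissipation, the sketch below applies a plain energy balance for air
(temperature rise = power / (density x specific heat x volumetric flow)). It is
not a transcription of the linked sensor code; the function name, the constants
for air near sea level, and the example numbers are all assumptions made for
this example.

```python
# Approximate properties of air near sea level and room temperature
# (assumed values for illustration only).
AIR_DENSITY_KG_M3 = 1.2
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0
CFM_TO_M3_PER_S = 0.000471947


def estimate_exit_air_temp_c(inlet_temp_c, power_w, airflow_cfm):
    """Estimate exhaust temp from inlet temp, dissipated power and airflow."""
    if airflow_cfm <= 0:
        raise ValueError("airflow must be positive")
    flow_m3_s = airflow_cfm * CFM_TO_M3_PER_S
    # All dissipated power is assumed to end up as heat in the airflow.
    rise_c = power_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * flow_m3_s)
    return inlet_temp_c + rise_c


# Hypothetical operating point: 25 C inlet, 1000 W dissipated, 100 CFM airflow.
print(round(estimate_exit_air_temp_c(25.0, 1000.0, 100.0), 1))
```

Note that the estimate depends only on inlet temp, power and airflow, so it
yields the same value whether or not optical cables are installed.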
## Requirements
Create the ability for an end-user to enable a thermal control mode other than
the default. In this use case, the mode is specific to an undetectable
configuration and alters the fan floor speeds independently of standardized
profiles/modes such as "Acoustic" and "Performance". Once the end-user selects a
documented mode for the platform, the thermal control application alters its
control algorithm according to the defined mode; how it does so is
implementation specific to that instance of the application on that platform.
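
One way to picture a mode altering the control algorithm is a per-mode floor
table lookup. The sketch below is only an illustration: the "RaisedFloor" mode
name, the table layout, and every temperature and RPM value are hypothetical
and do not come from any platform configuration.

```python
# Each table maps an ambient temp upper bound (deg C) to a floor speed (RPM).
# Entries are ordered by ascending temperature. All values are hypothetical.
FLOOR_TABLES = {
    "Default": [(25.0, 3000), (30.0, 5000), (35.0, 8000)],
    # Hypothetical mode for configurations with optical cables installed.
    "RaisedFloor": [(25.0, 5000), (30.0, 7000), (35.0, 10000)],
}


def floor_speed(mode, ambient_c):
    """Return the floor speed for the given mode and ambient temp."""
    table = FLOOR_TABLES[mode]
    for max_ambient_c, rpm in table:
        if ambient_c < max_ambient_c:
            return rpm
    # Above the last bound, hold the highest floor in the table.
    return table[-1][1]


print(floor_speed("Default", 22.0))
print(floor_speed("RaisedFloor", 22.0))
```

With the same ambient reading, the selected mode alone decides which floor is
applied, which is the behavior the requirement asks for.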
## Proposed Design
Create a Control.ThermalMode dbus interface containing a list of the supported
thermal control modes along with the mode currently in use. Initially the
current mode would be set to "Default" and the implementation of the interface
would populate the supported list of modes.

As one implementation, phosphor-fan-presence/control would be updated to extend
this dbus interface object and fill in the list of supported modes from its fan
control configuration for the platform. Once the fan control application
starts, the interface would be added on the zone object, where it could be
queried for the supported modes or used to update the current mode. An end-user
may set the current mode to any of the supported modes, and the current mode
would be persisted each time it is updated. This ensures that each time the fan
control application zone objects are started, the last set control mode is used.
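
The behavior described above can be modeled without any dbus machinery. The
sketch below is a plain Python stand-in, not the phosphor-fan-presence
implementation: the class layout, the JSON file used for persistence, and the
"RaisedFloor" mode name are assumptions made for this example. Only the
Supported and Current properties and the "Default" initial mode come from the
design itself.

```python
import json
from pathlib import Path


class ThermalMode:
    """In-memory model of the Supported and Current properties."""

    def __init__(self, supported, persist_path):
        # "Default" is always offered, followed by the configured modes.
        self._supported = ["Default"] + [m for m in supported if m != "Default"]
        self._path = Path(persist_path)  # hypothetical persistence location
        self._current = self._restore()

    @property
    def supported(self):
        return list(self._supported)

    @property
    def current(self):
        return self._current

    @current.setter
    def current(self, mode):
        if mode not in self._supported:
            raise ValueError(f"unsupported thermal mode: {mode!r}")
        self._current = mode
        # Persist on every update so a restart resumes the last set mode.
        self._path.write_text(json.dumps({"current": mode}))

    def _restore(self):
        try:
            saved = json.loads(self._path.read_text())["current"]
        except (OSError, ValueError, KeyError, TypeError):
            return "Default"  # nothing usable was persisted
        return saved if saved in self._supported else "Default"
```

A zone object would create one instance at startup, for example
`ThermalMode(["RaisedFloor"], "/tmp/zone0-mode.json")` (path hypothetical), and
apply `current` before running its control loop. Falling back to "Default" when
the persisted mode is no longer supported is a design choice of this sketch.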
## Alternatives Considered
A mathematical calculation could create a virtual exhaust temp sensor value
based on overall power dissipation. However, in the witherspoon situation, this
technique would not reliably adjust the floor speeds for only the
configurations using optical cables. It would instead risk raising floor speeds
for configurations where it is unnecessary.
## Impacts
The thermal control application in use must be configured to provide the
thermal control modes that are supported/available on the interface and to
perform the associated control changes when a mode is set.
## Testing
Trigger the use of an alternative fan floor table based on the thermal control
mode selected on a witherspoon system.