CPU undervolting
Most computer enthusiasts have heard of CPU overclocking to get higher performance, which usually requires a higher CPU voltage and therefore leads to more power consumption and more heat. In the case of my ThinkPad X61, I wanted the opposite: undervolting the CPU to increase the battery life and have a quieter system.
Undervolting the CPU is a large topic, but in a nutshell, CPU makers (e.g. Intel) and system integrators (e.g. Lenovo) define the power levels a CPU should use at each frequency, based on the CPU and laptop design. For a given CPU model (e.g. Core 2 Duo T7300), quality varies from part to part because the manufacturing process is not perfect, so power levels and frequencies are tuned such that every manufactured CPU, including the worst, can operate safely. This means the power levels are chosen for the worst possible parts and are usually higher than most parts need.
Unless you are completely out of luck and got a crappy CPU, you can usually undervolt it and still have it run flawlessly. Lower voltage means less power and less heat.
Undervolting my Core 2 Duo T7300
How to control the power levels depends heavily on the CPU model.
The Core 2 Duo family supports "Enhanced Intel SpeedStep® Technology", which is documented in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2, Chapter 14 "Power and Thermal Management".
The frequency and voltage are controlled by writing specific values into the IA32_PERF_CTL MSR, a special register of the processor.
Making frequency and voltage configurable through an MSR means they are simple to update: just write the magic value to the MSR and voilà. How are those magic values determined? They are encoded in the platform ACPI firmware.
ACPI is a whole subject in itself, but for now it suffices to say that ACPI exposes tables that can be read by the operating system, and one of them is the Performance Supported States (_PSS).
It is an array describing each performance state with its associated frequency and other information, including a Control value:
Control: indicates the value to be written to the Performance Control Register (PERF_CTRL) in order to initiate a transition to the performance state.
The operating system reads this table and uses its entries to switch the CPU between the various performance states, usually based on the current load.
On Linux, this is the role of the acpi-cpufreq driver.
On newer CPUs, this has been superseded by other mechanisms, mostly because this operating system-assisted switching is not very fast, but for the Core 2 Duo T7300 in my ThinkPad X61, acpi-cpufreq it is.
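To see which driver is active and which performance states it exposes, the cpufreq sysfs directory can be inspected. The snippet below is a minimal sketch: the `show_cpufreq` helper is a made-up name, and it assumes the standard Linux cpufreq sysfs layout, where not every driver provides every file.

```shell
#!/bin/sh
# Print the cpufreq driver, governor and available frequencies for one CPU.
# The directory argument is optional and defaults to CPU 0 (assumed path).
show_cpufreq() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    for f in scaling_driver scaling_governor scaling_available_frequencies; do
        # Some drivers do not expose every file, so skip the missing ones.
        [ -r "$dir/$f" ] && printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
    done
    return 0
}

show_cpufreq
```

On a machine using acpi-cpufreq, the frequency list should correspond to the entries of the _PSS table, expressed in kHz.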
Linux PHC
Linux PHC was an out-of-tree fork of acpi-cpufreq to tweak the Control values from userspace: instead of using the ACPI _PSS table values, the driver would use values provided through sysfs.
To determine working values for your system, a script such as mprime-phc-setup can be used: as discussed above, not all CPUs are equal, and finding the minimum voltage for each frequency on your CPU requires experimentation.
This worked well, but with an out-of-tree driver for old CPUs, the writing was on the wall: it is no longer maintained, and each new kernel upgrade was a bit more tedious.
To make my life easier, I looked at updating the Control values directly in the ACPI _PSS itself: the mainline acpi-cpufreq driver would then just use those updated values and life would be great again.
Updating ACPI tables
Instead of updating the ACPI tables directly by patching my laptop firmware (which looked risky), I opted to use Linux's ability to override ACPI SSDTs at boot time.
I dumped and decompiled the ACPI tables (as documented in the Linux kernel admin guide) and located the _PSS table (in SSDT9 in my case).
I then updated the Control values based on the values I was using with Linux-PHC, recompiled the table, updated my initrd and rebooted.
It worked!
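The dump, decompile and recompile cycle can be sketched roughly as follows. This is an illustration rather than the exact commands I ran: the function names are made up, it assumes the acpidump, acpixtract and iasl tools are installed (the acpica-tools package on Debian), and everything must run as root.

```shell
#!/bin/sh
# Dump every ACPI table and disassemble it to .dsl source in a work directory.
# The body runs in a subshell so the caller's working directory is untouched.
dump_tables() (
    mkdir -p "$1" && cd "$1" || exit 1
    acpidump > acpidump.out      # all tables, in text form
    acpixtract -a acpidump.out   # one binary .dat file per table
    iasl -d ./*.dat              # disassemble each .dat into a .dsl file
)

# Recompile an edited .dsl file and install the resulting .aml file.
rebuild_table() {
    dsl=$1                       # e.g. ssdt9.dsl with updated Control values
    iasl "$dsl" || return 1      # normally writes the .aml next to the .dsl
    mkdir -p /lib/firmware/acpi
    cp "${dsl%.dsl}.aml" /lib/firmware/acpi/
}

# Example (not executed here):
#   dump_tables /root/acpi && cd /root/acpi
#   ... edit the _PSS packages in ssdt9.dsl ...
#   rebuild_table ssdt9.dsl
```

If I remember the kernel admin guide correctly, it also suggests increasing the OEM revision in the table's DefinitionBlock so that the kernel treats the modified table as newer than the firmware one.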
I had to make sure it was automatically included when the initrd was rebuilt (e.g. on kernel upgrades).
Thankfully, there is a way to tweak it in Debian via initramfs-tools hooks.
I used a method similar to CPU microcode upgrades with an acpi_table hook:
cat > /etc/initramfs-tools/hooks/acpi_table << 'EOF'
#!/bin/sh
PREREQ=""
prereqs()
{
    echo "$PREREQ"
}
case $1 in
# get pre-requisites
prereqs)
    prereqs
    exit 0
    ;;
esac
. /usr/share/initramfs-tools/hook-functions
# generate early initramfs image and prepend
echo "using early initramfs ACPI tables update mode..."
EDIR=$(mktemp -d "${TMPDIR:-/var/tmp}/mkinitramfs-EDIR_XXXXXXXXXX") || {
    echo "E: acpi-table: cannot create temporary directory" >&2
    exit 1
}
EFW=$(mktemp "${TMPDIR:-/var/tmp}/mkinitramfs-EFW_XXXXXXXXXX") || {
    echo "E: acpi-table: cannot create temporary file" >&2
    exit 1
}
# "kernel" is a symlink to /lib, so /lib/firmware/acpi/*.aml is archived
# under the kernel/firmware/acpi/ path
(cd "${EDIR}" && ln -s /lib kernel \
    && find kernel/firmware/acpi -maxdepth 1 -type f -name '*.aml' -print0 2>/dev/null \
    | cpio -0 -L -H newc --create > "${EFW}") \
    && prepend_earlyinitramfs "${EFW}" && {
    rm "${EFW}"
    rm "${EDIR}/kernel"
    rmdir "${EDIR}"
    exit 0
}
# usually we get here when initramfs-tools is missing prepend_earlyinitramfs()
# or when cpio fails
rm "${EFW}" || true
rm "${EDIR}/kernel" || true
rmdir "${EDIR}" || true
echo "E: acpi-table: failed to create or prepend the early initramfs to the initramfs" >&2
:
EOF
This script looks for any *.aml file under /lib/firmware/acpi/ and automatically adds it to the initrd.
Those SSDT overlays will then be picked up by the kernel to upgrade the ACPI tables.
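To activate the hook and check the result, something like the sketch below should work. It assumes Debian's initramfs-tools (update-initramfs, lsinitramfs), the helper names are made up, and the kernel log wording matched by the filter is what I would expect from recent kernels but may differ between versions.

```shell
#!/bin/sh
# Filter kernel log lines (read from stdin) that mention ACPI tables
# found in the initrd or applied as a table upgrade.
acpi_override_lines() {
    grep -i -E 'ACPI: .*(found in initrd|table upgrade)'
}

# Make the hook executable, rebuild the initrd for the running kernel and
# list any ACPI tables embedded in it. Must run as root.
install_acpi_hook() {
    chmod +x /etc/initramfs-tools/hooks/acpi_table
    update-initramfs -u
    lsinitramfs "/boot/initrd.img-$(uname -r)" | grep 'kernel/firmware/acpi'
}

# Example (not executed here):
#   install_acpi_hook
#   ... reboot ...
#   dmesg | acpi_override_lines
```

initramfs-tools only runs hooks that are executable, hence the chmod.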
How to tweak without Linux-PHC
Note: be careful when writing values into MSRs, especially for frequency and voltage. Putting wrong values could destroy your CPU.
As discussed, the Control value is written to the IA32_PERF_CTL MSR.
The Intel documentation also states that the current performance state is reported in the low 16 bits of the IA32_PERF_STATUS MSR: when the transition is successful, those bits match the Status value of the _PSS entry.
Those MSRs can be read and written on Linux via the rdmsr and wrmsr commands (you'll need to load the msr kernel module for those to work).
For example, the Control and Status values for the 800MHz frequency are both 0x0000880B in my original _PSS. I can write this value to the IA32_PERF_CTL MSR (0x199) for CPU 0 with the command wrmsr -p 0 0x199 0x0000880B and check the IA32_PERF_STATUS MSR (0x198) value with the command rdmsr -p 0 0x198.
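The small helpers below make such values easier to read and compare. They are a sketch under two assumptions that are not spelled out above: that on Core 2 the high byte of the 16-bit value is the frequency ID (FID) and the low byte the voltage ID (VID), and that the current state sits in the low 16 bits of the IA32_PERF_STATUS reading. The function names are made up.

```shell
#!/bin/sh
# Split a 16-bit Control/Status value into its two bytes.
# Assumption: high byte = frequency ID (FID), low byte = voltage ID (VID).
split_control() {
    ctl=$(( $1 & 0xFFFF ))
    printf 'fid=0x%02X vid=0x%02X\n' $(( ctl >> 8 )) $(( ctl & 0xFF ))
}

# Keep the low 16 bits (last 4 hex digits) of an rdmsr reading.
# This works on the string itself to avoid overflowing shell arithmetic
# with a full 64-bit MSR value.
low16() {
    v=$(printf '0000%s' "$1" | tr 'a-f' 'A-F')
    printf '%s\n' "${v#"${v%????}"}"
}

split_control 0x0000880B    # prints: fid=0x88 vid=0x0B

# On real hardware, as root and with the msr module loaded (not executed here):
#   [ "$(low16 "$(rdmsr -p 0 0x198)")" = "880B" ] && echo "transition confirmed"
```

Under that layout, undervolting one performance state means keeping the FID byte and lowering only the VID byte, one small step at a time, with a stress test between steps.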