Hacking ACPI to undervolt my Thinkpad

CPU undervolting

Most computer enthusiasts have heard of CPU overclocking to get higher performance, usually requiring higher CPU voltage leading to more power consumption and more heat. In the case of my Thinkpad X61, I wanted the opposite: undervolting the CPU to increase the battery life and have a quieter system.

Undervolting the CPU is a large topic, but in a nutshell CPU makers (eg. Intel) and system integrators (eg. Lenovo) define the power levels a CPU should use at different frequencies, based on the CPU and laptop design. For a given CPU model (eg. Core 2 Duo T7300), quality varies from part to part because the manufacturing process is not perfect, so power levels and frequencies are tuned so that every manufactured CPU, including the worst, can operate safely. This means the power levels are chosen for the worst possible parts and are usually higher than most CPUs actually need.

If you're not completely out of luck and did not get a crappy CPU you can usually undervolt it and still have it running flawlessly. Lower voltage means less power and less heat.

Undervolting my Core 2 Duo T7300

How to control the power levels is very dependent on the CPU model. The Core 2 Duo family supports "Enhanced Intel SpeedStep® Technology", documented in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2, Chapter 14 "Power and Thermal Management". The frequency and voltage are controlled by writing specific values into the IA32_PERF_CTL MSR, a special register of the processor.

Making frequency and voltage configurable through an MSR means it is simple to update: just write the magic value to the MSR and voilà. How are those magic values determined? They are encoded in the platform ACPI firmware.

ACPI is a whole subject in itself, but for now it suffices to say that ACPI exposes tables that can be read by the operating system, and one of them is the Performance Supported States (_PSS). It is an array describing each performance state with associated frequency and other information including a Control value:

Control: indicates the value to be written to the Performance Control Register (PERF_CTRL) in order to initiate a transition to the performance state.

The operating system reads this table and uses its entries to switch the CPU between the various performance states, usually based on the current load.

On Linux, this is the role of the acpi-cpufreq driver. On newer CPUs it has been superseded by other mechanisms, mostly because operating-system-assisted switching is not very fast, but for the Core 2 Duo T7300 in my Thinkpad X61, acpi-cpufreq it is.
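To check which driver is in charge and which frequencies it exposes, the cpufreq sysfs files can be read. A minimal sketch, assuming the standard cpufreq sysfs layout; the show_cpufreq helper name and its optional directory argument are mine, only there to make the snippet easy to try:

```shell
#!/bin/sh
# Print the cpufreq driver and the frequencies it exposes for one CPU.
# The directory argument is optional and defaults to CPU 0.
show_cpufreq() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    if [ ! -r "$dir/scaling_driver" ]; then
        echo "no cpufreq driver bound in $dir"
        return 0
    fi
    echo "driver: $(cat "$dir/scaling_driver")"
    # acpi-cpufreq provides scaling_available_frequencies (values in kHz);
    # other drivers may not expose this file.
    if [ -r "$dir/scaling_available_frequencies" ]; then
        echo "frequencies (kHz): $(cat "$dir/scaling_available_frequencies")"
    fi
}

show_cpufreq
```

On my X61 I would expect the driver line to read acpi-cpufreq; on a recent machine it will most likely name a different driver.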

Linux PHC

Linux PHC was an out-of-tree fork of acpi-cpufreq that let you tweak the Control values from userspace: instead of using the ACPI _PSS table values, the driver would use values provided through sysfs. To determine working values for your system, a script such as mprime-phc-setup can be used: as discussed above, not all CPUs are equal, and finding the minimum voltage for each frequency on your CPU requires experimentation. This worked well, but for an out-of-tree driver targeting old CPUs the writing was on the wall: it is no longer maintained, and each new kernel upgrade was a bit more tedious.

To make my life easier, I looked at updating the Control values directly in the ACPI _PSS itself: the mainline acpi-cpufreq driver would just then use those updated values and life would be great again.
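As far as I understand it, Linux-PHC treats the low 16 bits of a Control value as two bytes: a frequency ID in the high byte and a voltage ID (VID) in the low byte. Assuming that layout holds for this CPU family, a new Control value for _PSS keeps the frequency byte and swaps in the VID found by experimentation. A minimal sketch; the helper name with_vid is mine, 0x0000880B is the 800MHz value from my original _PSS (quoted later in this post), and the replacement VID 0x0A is purely illustrative, not a tested or recommended setting:

```shell
#!/bin/sh
# Replace the low byte (assumed to be the voltage ID) of a Control value
# and keep every other bit unchanged.
# Usage: with_vid CONTROL VID   (both accept 0x-prefixed hex)
with_vid() {
    control=$(( $1 ))
    vid=$(( $2 ))
    if [ "$vid" -lt 0 ] || [ "$vid" -gt 255 ]; then
        echo "with_vid: VID must fit in one byte" >&2
        return 1
    fi
    printf '0x%08X\n' $(( (control & ~0xFF) | vid ))
}

with_vid 0x0000880B 0x0A   # prints 0x0000880A
```

The point of keeping this as a tiny helper is that only the voltage byte ever changes; the frequency byte stays exactly as the firmware defined it.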

Updating ACPI tables

Instead of updating the ACPI tables by patching my laptop firmware (which looked risky), I opted to use Linux's ability to override ACPI SSDTs at boot time. I dumped and decompiled the ACPI tables (as documented in the Linux kernel admin guide) and located the _PSS table (in SSDT9 in my case). I then updated the Control values based on the values I was using with Linux-PHC, recompiled the table, updated my initrd and rebooted. It worked!
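The round trip roughly follows the steps in the kernel admin guide. A sketch, assuming the acpidump, acpixtract and iasl tools from ACPICA are installed and everything runs as root; the function names and the work directory are mine, and ssdt9 is the table name on my machine, yours will likely differ:

```shell
#!/bin/sh
# Step 1: dump every ACPI table and disassemble them into .dsl sources.
# The body runs in a subshell so the cd does not leak into the caller.
dump_and_decompile() (
    workdir="${1:-/tmp/acpi-tables}"
    mkdir -p "$workdir" || exit 1
    cd "$workdir" || exit 1
    acpidump > acpidump.out || exit 1      # text dump of all tables
    acpixtract -a acpidump.out || exit 1   # one binary .dat file per table
    iasl -d ./*.dat                        # disassemble each table to .dsl
)

# Step 2 is manual: edit the Control values of _PSS in the .dsl file and
# increase the OEM revision in its DefinitionBlock line, as the kernel guide
# asks, so that the kernel accepts the table as an upgrade.

# Step 3: recompile the edited source and copy the result to the directory
# scanned by the initramfs hook shown below.
recompile_and_install() {
    dsl="${1:-ssdt9.dsl}"
    iasl "$dsl" || return 1                # writes e.g. ssdt9.aml next to it
    mkdir -p /lib/firmware/acpi || return 1
    cp "${dsl%.dsl}.aml" /lib/firmware/acpi/
}
```

Nothing runs when the snippet is sourced; call dump_and_decompile first, edit the .dsl, then call recompile_and_install from the work directory.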

I had to make sure it was automatically included when the initrd was rebuilt (eg. on kernel upgrades). Thankfully, there is a way to tweak it in Debian via initramfs-tools hooks. I used a method similar to CPU microcode upgrades with an acpi_table hook:

cat > /etc/initramfs-tools/hooks/acpi_table << 'EOF'
#!/bin/sh

PREREQ=""

prereqs()
{
    echo "$PREREQ"
}

case $1 in
    # get pre-requisites
    prereqs)
    prereqs
    exit 0
    ;;
esac

. /usr/share/initramfs-tools/hook-functions

# generate early initramfs image and prepend
echo "using early initramfs ACPI tables update mode..."
EDIR=$(mktemp -d "${TMPDIR:-/var/tmp}/mkinitramfs-EDIR_XXXXXXXXXX") || {
    echo "E: acpi-table: cannot create temporary directory" >&2
    exit 1
    }
EFW=$(mktemp "${TMPDIR:-/var/tmp}/mkinitramfs-EFW_XXXXXXXXXX") || {
    echo "E: acpi-table: cannot create temporary file" >&2
    exit 1
    }
(cd "${EDIR}" && ln -s /lib kernel \
    && find kernel/firmware/acpi -maxdepth 1 -type f -name '*.aml' -print0 2>/dev/null \
    | cpio -0 -L -H newc --create > "${EFW}") \
&& prepend_earlyinitramfs "${EFW}" && {
    rm "${EFW}"
    rm "${EDIR}/kernel"
    rmdir "${EDIR}"
    exit 0
}

# usually we get here when initramfs-tools is missing prepend_earlyinitramfs()
# or when cpio fails

rm "${EFW}" || true
rm "${EDIR}/kernel" || true
rmdir "${EDIR}" || true

echo "E: acpi-table: failed to create or prepend the early initramfs to the initramfs" >&2

:
EOF

This script looks for any *.aml file under /lib/firmware/acpi/ and automatically adds it to the initrd. Those SSDT overlays are then picked up by the kernel at boot to upgrade the ACPI tables.
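After making the hook executable and regenerating the initrd, it is worth checking that the tables really ended up in the image. A sketch, assuming Debian's initramfs-tools (update-initramfs, lsinitramfs); the list_acpi_overrides helper is mine and only filters a file listing for paths under kernel/firmware/acpi/:

```shell
#!/bin/sh
# Keep only the ACPI table overrides from an initramfs file listing on stdin.
# Prints nothing (and still succeeds) when no override is present.
list_acpi_overrides() {
    grep -E '^(\./)?kernel/firmware/acpi/[^/]+\.aml$' || true
}

# Typical use, as root (left commented out because it rebuilds the initrd):
#   chmod +x /etc/initramfs-tools/hooks/acpi_table
#   update-initramfs -u
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | list_acpi_overrides
```

After a reboot the kernel log should also mention the override; dmesg | grep -i 'acpi.*table' is a reasonable place to look, although the exact wording depends on the kernel version.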

How to tweak without Linux-PHC

Note: be careful when writing values to MSRs, especially for frequency and voltage. Writing wrong values could destroy your CPU.

As discussed, the Control value is written to the IA32_PERF_CTL MSR. The Intel documentation also states that when the transition is successful, the Status value of the _PSS is reported in the low 16 bits of the IA32_PERF_STATUS MSR.

Those MSRs can be read and written on Linux via the rdmsr and wrmsr commands (you'll need to load the msr kernel module for those to work). For example, the Control and Status values for the 800MHz frequency are both 0x0000880B in my original _PSS. I can write this value to the IA32_PERF_CTL MSR (0x199) for CPU 0 with the command wrmsr -p 0 0x199 0x0000880B and check the IA32_PERF_STATUS MSR (0x198) value with the command rdmsr -p 0 0x198.
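rdmsr prints the 64-bit register as a single hex number, which makes it awkward to compare with the 16-bit Status value from _PSS. A small sketch that splits such a value into 16-bit fields, highest bits first; the split_msr helper is my own naming, and the demo call uses the 800MHz Control value from above rather than a real register read:

```shell
#!/bin/sh
# Split a hex value (as printed by rdmsr, with or without a 0x prefix)
# into four 16-bit fields, bits 63:48 first and bits 15:0 last.
split_msr() {
    v=$(printf '%s' "${1#0[xX]}" | tr 'A-F' 'a-f')
    # left-pad with zeros to 16 hex digits (64 bits)
    while [ "${#v}" -lt 16 ]; do v="0$v"; done
    printf '%s\n' "$v" | sed 's/..../& /g; s/ $//'
}

split_msr 0x0000880B   # prints 0000 0000 0000 880b
```

With the msr module loaded (modprobe msr), split_msr "$(rdmsr -p 0 0x198)" shows the live register the same way, so the Status value can be spotted at a glance.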