Analyzing Geekbench 6 under Intel's BOT

An investigation into Intel's Binary Optimization Tool reveals significant performance gains in Geekbench 6 through aggressive code modification, sparking debates over benchmark integrity.

We’ve spent the past week investigating Intel’s Binary Optimization Tool (BOT). BOT modifies instruction sequences in executables to improve performance, and can only be used with a handful of applications (including Geekbench 6). Intel’s public documentation on BOT is limited, so we decided to dig in ourselves to understand how it works and what optimizations it’s applying to Geekbench.

We tested both Geekbench 6.3 and Geekbench 6.7 on a Panther Lake laptop (an MSI Prestige 16 AI+ with an Intel Core 9 386H) with BOT enabled and disabled.

Startup Overhead

When running Geekbench 6.3 with BOT enabled, the first run is delayed by 40 seconds before the program starts. Subsequent runs start faster, with a 2-second delay. The delay disappears when BOT is disabled.

When running Geekbench 6.7 with BOT enabled, all runs have a 2-second startup delay. The startup delay disappears when BOT is disabled.

Geekbench Results

Geekbench 6.3 scores increase when BOT is enabled compared to when BOT is disabled. On our test system, both the single-core and the multi-core scores increased by 5.5%.

| Geekbench 6.3 | BOT Disabled | BOT Enabled | Difference |
|---|---|---|---|
| Single-Core | 2955 | 3119 | +5.5% |
| Multi-Core | 16786 | 17705 | +5.5% |

Some Geekbench 6.3 workload scores also increase, with scores for two workloads (Object Remover and HDR) increasing by up to 30% with BOT enabled.

Geekbench 6.7 single-core and multi-core scores remained roughly the same with BOT enabled and disabled.

| Geekbench 6.7 | BOT Disabled | BOT Enabled | Difference |
|---|---|---|---|
| Single-Core | 2938 | 2937 | +0.0% |
| Multi-Core | 16892 | 17045 | +0.9% |
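The Difference columns can be reproduced from the raw scores. The short sketch below recomputes them as the relative change of the BOT-enabled score against the BOT-disabled score; the helper name `percent_change` is ours, chosen for illustration.

```python
# Scores copied from the two tables above: (BOT disabled, BOT enabled).
scores = {
    ("Geekbench 6.3", "Single-Core"): (2955, 3119),
    ("Geekbench 6.3", "Multi-Core"): (16786, 17705),
    ("Geekbench 6.7", "Single-Core"): (2938, 2937),
    ("Geekbench 6.7", "Multi-Core"): (16892, 17045),
}


def percent_change(disabled: float, enabled: float) -> float:
    """Relative change of the enabled score versus the disabled score, in percent."""
    return (enabled - disabled) / disabled * 100.0


for (version, test), (disabled, enabled) in scores.items():
    print(f"{version} {test}: {percent_change(disabled, enabled):+.1f}%")
```

The Geekbench 6.7 single-core change is a one-point drop (2938 to 2937) that rounds to zero, which is well within what we would expect from run-to-run variation.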

These results show that BOT optimizes only specific versions of Geekbench. When we examined the work done during the startup delay, we found that BOT computes a checksum of the Geekbench executable. This suggests BOT uses the checksum to identify whether it knows the binary, and thus whether it can optimize it.
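To make the idea concrete, here is a minimal sketch of checksum-based binary identification. It is not BOT's implementation: the hash algorithm (SHA-256), the `KNOWN_BINARIES` table, and the function names are all assumptions made for illustration, since Intel has not documented how BOT does this.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Hypothetical table mapping a checksum to a label for a known binary.
# BOT's real table, its contents, and its hash algorithm are not documented.
KNOWN_BINARIES: dict = {}


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large executable is never fully loaded into memory."""
    digest = hashlib.sha256()  # SHA-256 is an assumption, chosen for illustration
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lookup_binary(path: Path) -> Optional[str]:
    """Return the label of a known binary, or None if its checksum is not in the table."""
    return KNOWN_BINARIES.get(file_checksum(path))
```

Under this model, any change to the executable produces a different checksum, so a new release would fall through to the unoptimized path. That is consistent with Geekbench 6.3 being optimized while Geekbench 6.7 is not.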

BOT Optimizations

Intel’s Software Development Emulator (SDE) is a development tool that can monitor which instructions are executed during a program run. We used SDE to see which instructions are executed during a Geekbench run, focusing on the HDR workload.

| Metric | BOT Disabled | BOT Enabled | Difference |
|---|---|---|---|
| Total Instructions | 1.26 trillion | 1.08 trillion | -14% |
| Scalar Instructions | 220 billion | 84.6 billion | -62% |
| Vector Instructions | 1.25 billion | 18.3 billion | +1366% |

Based on the instruction counts, it’s clear BOT has made significant changes to the code. The total number of instructions is reduced by 14%. Most of that reduction comes from BOT vectorizing parts of the workload’s code, converting instructions that operate on one value into instructions that operate on eight values. This is a significantly more sophisticated transformation than simple code reordering.
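The effect of vectorization on instruction counts can be illustrated with a toy model. The sketch below does not use real SIMD instructions and is not the HDR workload's code; it only counts loop steps, treating each step of the second function as one modelled eight-wide vector instruction. The function names and the sample data are hypothetical.

```python
LANES = 8  # values handled per vector instruction, per the text above


def scale_scalar(values, gain):
    """Process one value per step, as a scalar instruction stream would."""
    out, steps = [], 0
    for v in values:
        out.append(v * gain)
        steps += 1
    return out, steps


def scale_vector(values, gain, lanes=LANES):
    """Process up to `lanes` values per step, modelling one vector instruction."""
    out, steps = [], 0
    for start in range(0, len(values), lanes):
        block = values[start:start + lanes]
        out.extend(v * gain for v in block)  # one modelled vector multiply
        steps += 1
    return out, steps


values = [float(i) for i in range(1000)]
scalar_out, scalar_steps = scale_scalar(values, 1.5)
vector_out, vector_steps = scale_vector(values, 1.5)
assert scalar_out == vector_out  # same results, fewer steps
print(scalar_steps, vector_steps)  # 1000 scalar steps versus 125 vector steps

# Consistency check against the measured counts (billions of instructions).
scalar_removed = 220 - 84.6   # scalar instructions that disappeared
vector_added = 18.3 - 1.25    # vector instructions that appeared
print(round(scalar_removed, 1), round(vector_added * LANES, 1))
```

Applied to the table, roughly 135 billion scalar instructions disappear while roughly 17 billion vector instructions appear. At eight values per vector instruction, those cover about 136 billion values. This is consistent with the vectorization explanation, though it does not prove that every removed scalar instruction was replaced this way.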

Conclusions

Real-world application code is incredibly varied, and a benchmark is only representative when the code it runs reflects that variety. BOT undermines this by replacing that varied code with processor-tuned, fully optimized binaries, so the benchmark measures peak rather than typical performance. Right now, BOT supports only a handful of applications, so BOT-optimized benchmark results paint an unrealistic picture of how a CPU performs in practice. This makes Intel processors appear faster relative to AMD and other vendors than they would be in typical usage.

Next Steps

We will continue to flag BOT-optimized results in the Geekbench Browser. Geekbench 6.7 will include a way to check whether BOT is running and flag results when it is detected. This means we’ll be able to remove the warning from Geekbench 6.7 results when BOT is not detected. Geekbench 6.6 and earlier results on Windows will continue to be flagged.
