Linking Smaller Haskell Binaries (2023)

Haskell binaries can be large, but strategies like section splitting and Identical Code Folding (ICF) can significantly reduce their size at link time.
Publish date: Jan 7, 2023. Last updated: Jan 8, 2023.
Haskell binaries can get quite large (think ~100MB), especially for projects with many transitive dependencies. Here are two strategies that can help at link time, the latter being more experimental.
For the experiments below I used the test-pandoc binary from pandoc, built with GHC 9.2.5. This was convenient because it made it easy to check whether linking broke anything: just run the tests.
-split-sections and --gc-sections
You can instruct ghc to emit code in individual minimal sections, allowing the linker to easily find and remove dead code. In your cabal.project, this looks like:
package *
  ghc-options: -split-sections
  -- For C code; this likely won't have much effect, but it's worth including:
  gcc-options: -fdata-sections -ffunction-sections

package pandoc
  -- All of the major linkers support gc-sections, but we'll use lld since
  -- it supports the next experiment we want to do (and it's fast).
  -- NOTE: assumes gcc >= 10, which knows how to invoke lld
  ld-options: -fuse-ld=lld -Wl,--gc-sections,--build-id
These options take the size of the stripped binary down from 113M to 83M (-27%).
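The percentage here is a plain relative size change. As a minimal sketch of the arithmetic (the helper name `percent_change` is hypothetical, not part of any tool mentioned in this post):

```python
def percent_change(before: float, after: float) -> float:
    """Relative size change in percent; negative means the binary shrank."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Stripped binary sizes in MB, as reported in the text above.
print(f"{percent_change(113, 83):.0f}%")
```

The same helper can be applied to any before/after pair of sizes, for example the output of `ls -l` on the stripped binary before and after changing the linker options.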
Identical code folding
Both gold and lld can perform a limited form of ICF: they identify functionally equivalent sections at link time and combine them, fixing up any inbound references. lld seems to have the more effective implementation.
You can try it by adding the following to the settings above:
package pandoc
  ...
  -- --print-icf-sections is just for debugging
  ld-options: -Wl,--icf=all,--ignore-data-address-equality,--ignore-function-address-equality,--print-icf-sections
This gets the binary down to 64M for me (a further -23%).
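To make the idea of folding concrete, here is a toy model in Python. It is only a sketch under a strong simplifying assumption: sections are compared by raw bytes alone, whereas a real linker must also compare relocations and section flags, and repeats the process so that sections referring to already-folded sections can be folded too. The section names and contents are made up for illustration.

```python
def fold_identical(sections):
    """Keep one copy of each distinct section body.

    Returns (kept, redirects): `kept` maps surviving section names to their
    bytes, and `redirects` maps each removed name to the survivor that
    inbound references would be fixed up to point at.
    """
    by_body = {}
    for name, body in sections.items():
        by_body.setdefault(body, []).append(name)

    kept = {}
    redirects = {}
    for body, names in by_body.items():
        survivor, *duplicates = names  # first one seen is kept
        kept[survivor] = body
        for dup in duplicates:
            redirects[dup] = survivor
    return kept, redirects


# Hypothetical section names and contents, for illustration only.
sections = {
    ".text.A_show": b"\x48\x89\xf8\xc3",
    ".text.B_show": b"\x48\x89\xf8\xc3",
    ".text.C_other": b"\x31\xc0\xc3",
}
kept, redirects = fold_identical(sections)
print(sorted(kept))
print(redirects)
```

Here `.text.B_show` is dropped and redirected to `.text.A_show`, which is the sense in which folding "fixes up inbound references".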
Looking closer at ICF
There are lots of reasons we can imagine a Haskell binary having a lot of potential code folding candidates; for instance the same function being inlined and maybe specialized the same way in several modules. I wanted to poke at the binary briefly and see if I could learn anything.
Taking a look at the logs from lld and picking a section arbitrarily, we can find sections that are repeated across half a dozen modules within pandoc. Compiling with debug symbols and inspecting with objdump shows that these sections indeed look the same.
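One way to go through the `--print-icf-sections` log more systematically is to group the removed sections under the section that was kept. The sketch below assumes the log consists of `selected section <object>:(<section>)` lines, each followed by indented `removing identical section <object>:(<section>)` lines; that shape should be checked against the output of your own lld version. The object and section names in the sample are hypothetical.

```python
import re

# Assumed line shapes (verify against your lld version's actual output):
#   selected section <object>:(<section>)
#     removing identical section <object>:(<section>)
LINE = re.compile(r"^\s*(selected|removing identical) section (.+):\((.+)\)\s*$")


def group_folds(log_text):
    """Map each kept (object, section) to the (object, section) pairs folded into it."""
    groups = {}
    current = None
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # ignore unrelated linker output
        kind, obj, section = m.groups()
        if kind == "selected":
            current = (obj, section)
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append((obj, section))
    return groups


# Hypothetical log excerpt, for illustration only.
sample = """\
selected section ModuleA.o:(.text.example_a)
  removing identical section ModuleB.o:(.text.example_a)
  removing identical section ModuleC.o:(.text.example_a)
selected section ModuleA.o:(.text.example_b)
  removing identical section ModuleD.o:(.text.example_b)
"""

groups = group_folds(sample)
# Print the most-duplicated sections first.
for (obj, section), dups in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(f"{section} kept from {obj}: {len(dups)} duplicate(s) removed")
```

Sorting by the number of removed duplicates gives a quick list of candidate sections to inspect with objdump.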
Misc
Interactions with -fdistinct-constructor-tables
I’ve come to rely on -fdistinct-constructor-tables -finfo-table-map for profiling and debugging. Unsurprisingly, the duplicate info tables are folded away, which isn’t what we want for debugging. You’d need to tell the linker that these separate info tables should be treated as GC roots and not removed.
An opportunity to speed up compilation?
All these duplicate sections represent not only wasted space in the binary, but also wasted time during compilation. Could GHC be smarter about e.g. caching compilation units that eventually are emitted as sections early in compilation?
Tools that didn’t work
I tried to play with bloaty to investigate the composition of the binary, but it chokes on Haskell code. I thought it might be neat to use kcov to investigate how much of the resulting binary was dead code, but it probably chokes on the same bug.