An incoherent Rust

The article explores how Rust's coherence and orphan rules, while ensuring type soundness, inadvertently create a 'first-mover advantage' that hinders the evolution of the ecosystem.

No LLMs were involved in the process of writing this blog post.

Stunted Ecosystem Development

The Rust ecosystem has a fundamental problem with how it’s developing.

Foundational crates such as serde define core traits such as Serialize, and every other crate in the ecosystem then needs to implement those traits for its own types. If a crate doesn’t implement serde’s traits for its types, those types can’t be used with serde, because downstream crates cannot implement serde’s traits for another crate’s types.

Worse yet, if someone publishes an alternative to serde (say, nextserde) then all crates which have added support for serde also need to add support for nextserde. Adding support for every new serialization library in existence is unrealistic and a lot of work for crate authors.

As a user of these crates, if you want to use a new serialization library you’re forced to fork all of them and patch in support for nextserde. This makes it significantly harder for alternatives to foundational crates such as serde to be created and to propagate throughout the ecosystem.

There are strong incentives for old crates that “got there first” to stick around in the ecosystem, regardless of whether better alternatives exist, simply because it’s artificially difficult to replace them.

This is not the fault of any library or people writing Rust code. Instead, this problem is forced onto the ecosystem by the language itself through coherence and the orphan rules.

See also Niko’s explanation of how coherence harms the Rust ecosystem in Coherence and crate-level where clauses - nikomatsakis.

Coherence and the Orphan Rules

Coherence checks that a trait is implemented at most once for any given self type and set of generic arguments to the trait:

trait Trait {}
trait Thingies {}
trait OtherThingies {}
impl<T: Thingies> Trait for T {}
impl<T: OtherThingies> Trait for T {}

error[E0119]: conflicting implementations of trait `Trait`
--> src/lib.rs:7:1
|
6 | impl<T: Thingies> Trait for T {}
| ----------------------------- first implementation here
7 | impl<T: OtherThingies> Trait for T {}
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ conflicting implementation
For more information about this error, try `rustc --explain E0119`.
error: could not compile `playground` (lib) due to 1 previous error
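The two blanket impls above are rejected because nothing prevents some type from implementing both Thingies and OtherThingies, in which case both impls would apply to it. For contrast, here is a minimal sketch of impls that coherence accepts, since no single type can match more than one of them. The `describe` method and the choice of `u8`, `bool` and `Vec<T>` are assumptions made for illustration, not taken from the example above.

```rust
trait Trait {
    fn describe(&self) -> String;
}

// Impls for distinct concrete types can never apply to the same type.
impl Trait for u8 {
    fn describe(&self) -> String {
        format!("u8({})", self)
    }
}

impl Trait for bool {
    fn describe(&self) -> String {
        format!("bool({})", self)
    }
}

// A generic impl over `Vec<T>` cannot overlap with the two above,
// because neither `u8` nor `bool` is a `Vec<_>`.
impl<T: Trait> Trait for Vec<T> {
    fn describe(&self) -> String {
        let parts: Vec<String> = self.iter().map(|item| item.describe()).collect();
        format!("[{}]", parts.join(", "))
    }
}
```

Calling `describe` on a `Vec<u8>` resolves to exactly one impl at every level, which is the property coherence exists to guarantee.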

The orphan rules are a check that helps us implement coherence. They enforce that you can only write a trait implementation if either the trait or the self type is defined in the current crate (it’s actually a little more complicated than this, but it’s not too important for this blog post).

// crate a
pub trait Trait {}
pub struct Foo;
// crate b
use a::*;
impl Trait for Foo {}

error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate
--> src/lib.rs:8:1
|
8 | impl Trait for Foo {}
| ^^^^^^^^^^^^^^^---
| |
| `a::Foo` is not defined in the current crate
|
= note: impl doesn't have any local type before any uncovered type parameters
= note: for more information see https://doc.rust-lang.org/reference/items/implementations.html#orphan-rules
= note: define and implement a trait or new type instead

Even though there are no overlapping impls, this code is still rejected due to the orphan rules.
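The compiler note suggests defining a new type instead. A minimal sketch of that workaround, using a foreign trait and a foreign type from the standard library so that it is self-contained, might look like the following. `HexBytes` is a hypothetical name introduced for this example.

```rust
use std::fmt;

// Hypothetical wrapper: a local type around the foreign `Vec<u8>`.
struct HexBytes(Vec<u8>);

// `fmt::Display` is foreign, but `HexBytes` is local,
// so the orphan rules accept this impl.
impl fmt::Display for HexBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in &self.0 {
            // Two lowercase hex digits per byte, zero-padded.
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

The catch is that `HexBytes` is a different type from `Vec<u8>`, so any API that expects the original type needs the value unwrapped again. The wrapper satisfies the orphan rules but does not make the foreign type itself usable with the trait.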

See also Trait implementation coherence - Rust Reference.
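As one illustration of the "a little more complicated" caveat: an impl where both the trait and the self type are foreign can still be accepted when a local type appears among the trait's generic arguments. The sketch below assumes a hypothetical local type `Celsius`.

```rust
// Hypothetical local type.
struct Celsius(f64);

// Both `From` and `f64` are foreign, yet this impl is accepted because
// the local type `Celsius` appears as a generic argument of the trait.
impl From<Celsius> for f64 {
    fn from(value: Celsius) -> f64 {
        value.0
    }
}
```

No other crate could have written `From<Celsius> for f64` without depending on this crate, so the impl cannot collide with one written elsewhere.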

Why Coherence

The HashMap Problem

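// Hypothetical: real Rust rejects the `Hash` impls in crates b and c.
// crate a
#[derive(PartialEq, Eq)]
pub struct MyData(pub u8);
// crate b
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
impl Hash for MyData {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.hash(state);
    }
}
pub fn make_hashset() -> HashSet<MyData> {
    // Uses the `Hash` impl defined in this crate to insert
    [MyData(1), MyData(12)].into()
}
// crate c
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
impl Hash for MyData {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // You probably don't want this to be your hash function...
        0.hash(state);
    }
}
pub fn check_hashset(set: HashSet<MyData>) {
    // Uses the `Hash` impl defined in this crate to lookup
    assert!(set.contains(&MyData(1)));
    assert!(set.contains(&MyData(12)));
}
// crate d
// Passes a set built with crate b's `Hash` impl to crate c
c::check_hashset(b::make_hashset());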

In this example we pass a HashSet constructed in crate b to a function in crate c, where the Hash impl used by crate b to construct the HashSet is different from the Hash impl used by crate c to check if entries are present in the HashSet.

The differing Hash impls mean that check_hashset is going to produce completely nonsensical results where none of the values are known to be present in the set.
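Since real Rust rejects the conflicting impls, the failure can only be simulated. The sketch below is a toy hash set, not how `HashSet` is implemented. It takes the hash function as an argument on every call, so values can be inserted with one function and looked up with another. All names here (`ToySet`, `hash_like_crate_b`, `hash_like_crate_c`, `BUCKETS`) are hypothetical stand-ins.

```rust
// Number of buckets in the toy table; an arbitrary choice for this sketch.
const BUCKETS: usize = 16;

// Stand-in for crate b's `Hash` impl: the hash depends on the value.
fn hash_like_crate_b(value: u8) -> u64 {
    u64::from(value)
}

// Stand-in for crate c's `Hash` impl: every value hashes to the same number.
fn hash_like_crate_c(_value: u8) -> u64 {
    0
}

// A toy hash set that takes the hash function per call, so they can be mixed.
struct ToySet {
    buckets: Vec<Vec<u8>>,
}

impl ToySet {
    fn new() -> Self {
        ToySet { buckets: vec![Vec::new(); BUCKETS] }
    }

    // Map a value to a bucket using the supplied hash function.
    fn bucket_index(value: u8, hash: fn(u8) -> u64) -> usize {
        (hash(value) % BUCKETS as u64) as usize
    }

    fn insert(&mut self, value: u8, hash: fn(u8) -> u64) {
        let index = Self::bucket_index(value, hash);
        if !self.buckets[index].contains(&value) {
            self.buckets[index].push(value);
        }
    }

    // Only the bucket chosen by `hash` is searched, as in a real hash table.
    fn contains(&self, value: u8, hash: fn(u8) -> u64) -> bool {
        let index = Self::bucket_index(value, hash);
        self.buckets[index].contains(&value)
    }
}

// Build the set the way crate b would.
fn make_toy_set() -> ToySet {
    let mut set = ToySet::new();
    set.insert(1, hash_like_crate_b);
    set.insert(12, hash_like_crate_b);
    set
}
```

Looking up 1 or 12 with `hash_like_crate_b` finds them, while the same lookups with `hash_like_crate_c` search bucket 0, which is empty, and report the values as missing even though they were inserted.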

See also “So wait, how does the orphan rule protect composition” in Coherence and crate-level where clauses - nikomatsakis.

Soundness

Currently coherence is actually important for the type system to be sound:

pub trait Trait {
    type Assoc;
}
// crate a
impl Trait for () {
    type Assoc = *const u8;
}
pub fn make_assoc() -> <() as Trait>::Assoc {
    // `<() as Trait>::Assoc` is implemented as being `*const u8`
    0x0 as *const u8
}
// crate b
impl Trait for () {
    type Assoc = Box<u8>;
}
pub fn drop_assoc(a: <() as Trait>::Assoc) {
    // `<() as Trait>::Assoc` is implemented as being `Box<u8>`
    let a: Box<u8> = a;
    // `a` is dropped at the end of this function, freeing an allocation
}
// crate c
// create a `*const u8` and then implicitly transmute it to a `Box<u8>`
b::drop_assoc(a::make_assoc());

Here we have two overlapping trait impls which specify different values for the associated type Assoc.

If the user constructs a value of type <() as Trait>::Assoc where the compiler thinks this is a raw pointer, and then later reads that value where the compiler thinks <() as Trait>::Assoc is a Box, then we will have transmuted *const u8 to Box<u8> in safe code.
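For comparison, here is a minimal sketch of the coherent case, where exactly one impl exists. The producer and the consumer both see the same concrete type behind the projection, so no reinterpretation can happen. The function names `make_assoc` and `read_assoc` and the value 7 are chosen for illustration.

```rust
trait Trait {
    type Assoc;
}

// The only impl of `Trait` for `()`, so the projection below has one meaning.
impl Trait for () {
    type Assoc = Box<u8>;
}

fn make_assoc() -> <() as Trait>::Assoc {
    // Type-checks because `<() as Trait>::Assoc` normalizes to `Box<u8>`.
    Box::new(7)
}

fn read_assoc(a: <() as Trait>::Assoc) -> u8 {
    // No cast is needed: producer and consumer agree on the concrete type.
    let boxed: Box<u8> = a;
    *boxed
}
```

The compiler substitutes `Box<u8>` for the projection in both functions. That substitution is only valid if every part of the program agrees on which impl defines it, which is what coherence guarantees.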

Why Orphan Rules

While coherence is necessary for soundness, the orphan rules are (mostly) not. There are two main reasons for the orphan rules:

First, the orphan rules allow all crates in the Rust ecosystem to compose together. If we were to check for overlapping impls at link time instead, we would still be sound, but it would be possible for crates to exist which are incompatible with each other:

// crate a
pub trait GetU32 { fn get(self) -> u32; }
// crate b
impl GetU32 for u32 {
    fn get(self) -> u32 {
        self
    }
}
// crate c
impl GetU32 for u32 {
    fn get(self) -> u32 {
        self
    }
}
// crate d
extern crate b;
extern crate c;
// Uh oh... there are two impls of `GetU32` for `u32`.
// Coherence violation -> error

In this example both b and c depend on a and have had to implement GetU32 for u32 themselves as the author of crate a forgot to do so. Then, crate d comes along wanting to use both crates but can’t because now there are overlapping trait impls.
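Under the actual orphan rules, the only crate allowed to write this impl is crate a, because it owns the trait and `u32` belongs to the standard library. A rough sketch of that arrangement follows, with modules standing in for crates. This is an approximation, since the orphan rules apply at crate boundaries rather than module boundaries. The helper functions `doubled` and `incremented` are hypothetical.

```rust
// Modules stand in for crates here; the real orphan rules apply per crate.
mod a {
    pub trait GetU32 {
        fn get(self) -> u32;
    }

    // The impl lives next to the trait, so there is exactly one of it.
    impl GetU32 for u32 {
        fn get(self) -> u32 {
            self
        }
    }
}

mod b {
    use super::a::GetU32;

    // Uses the single impl provided by `a`.
    pub fn doubled(value: u32) -> u32 {
        value.get() * 2
    }
}

mod c {
    use super::a::GetU32;

    // Uses the same impl, so `b` and `c` can never disagree.
    pub fn incremented(value: u32) -> u32 {
        value.get() + 1
    }
}
```

Anything that depends on both `b` and `c` composes without conflict, at the cost that neither of them can add the impl if the author of `a` leaves it out.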

Secondly, the orphan rules allow for upholding coherence in the face of separate compilation/dynamic linking.

A Rust library can be compiled into a dynamic library and then dynamically linked to without knowing it was a Rust crate. We need to know that this library doesn’t have any impls which overlap with impls in the project it’s being linked to.

// crate a
pub trait GetU32 { fn get(self) -> u32; }
// crate b
impl GetU32 for u32 {
    fn get(self) -> u32 {
        self
    }
}
// crate c
impl GetU32 for u32 {
    fn get(self) -> u32 {
        self
    }
}
fn main() { ... }

In this example we have both crate b and c again, but imagine crate b was compiled to a dynamic library and then dynamically linked to crate c.

When compiling crate c the compiler doesn’t know the contents of crate b as it’s just a dynamic library. Yet, compilation should not be allowed to succeed as there are overlapping impls which can lead to unsoundness.

The orphan rules allow us to reason about what impls crate c could have written and allow us to restrict what impls other crates can write to only be those that crate c cannot write.

So, while the orphan rules are incredibly valuable for the Rust ecosystem as a whole, they aren’t strictly necessary and are largely a means of enforcing coherence.