Rust has higher kinded types already... sort of
Discuss this post on Reddit.
Refresher
First, a very quick refresher.
In Rust, a type which takes type parameters (Rc<T>
, Vec<T>
, HashMap<K, V>
,
etc) is only a valid type when all type parameters are specified. In other
words, Rc
, Vec
, and HashMap<K>
are not types. You can't have a variable of
type Rc
. You can't pass Rc
as a parameter to other types.
The ability to have such things be actual types is a feature called higher kinded types (HKT). It's primarily useful when combined with traits which also take type parameters. Consider, for example, a hypothetical trait to abstract over different kinds of reference-counted pointers.
trait RcPtr<T>: Clone + Deref<Target=T> {}
You can easily implement this for Rc
and Arc
:
impl<T> RcPtr<T> for Rc<T> {}
impl<T> RcPtr<T> for Arc<T> {}
However, with HKTs, RcPtr
(as opposed to RcPtr<T>
) would be a valid trait,
and Rc
and Arc
would implement it (as opposed to Rc<T>
and Arc<T>
).
This would allow you to do things like:
struct TreeNode<T, P: RcPtr> {
val: T,
right: P<TreeNode<T, P>>,
left: P<TreeNode<T, P>>,
}
This is a tree data structure where the caller gets to decide whether it's thread-safe or not. There are many applications which would benefit greatly from allowing the caller to specify some property such as thread safety, caching behavior, etc - all things which require HKTs in order to leave type parameters unbound until later than is allowed in normal Rust today (I say "normal" because, well, read the next section).
HKTs in Rust Today
One way to view the limitation that prevents us from writing this code is that,
a) when declaring a trait bound, all type parameters must be bound (i.e., P: RcPtr
is not a valid trait bound since RcPtr
takes a type parameter) and, b)
when declaring a type, all type parameters must be bound (i.e., Rc
and Arc
aren't types; Rc<T>
and Arc<T>
are).
However, it's not quite true that there's no way to define a trait which allows
some type parameters to remain unbound, or that there's no way to implement
those traits with types which also leave some type parameters unbound. Consider,
for example, the following function from the standard library's Iterator
trait:
trait Iterator {
fn sum<S: Sum<Self::Item>>(self) -> S;
}
Even once a type has implemented the Iterator
trait, the S
type parameter on
sum
is still unbound - it's only bound once the method is called. This implies
a general pattern - if you want to allow specifying a trait bound while leaving
type parameters unbound, just move them into the methods on that trait. Let's
give it a try:
trait RcPtr {
fn clone<T>(&self) -> Self;
fn deref<T>(&self) -> &T;
}
If this trait gives you an uneasy feeling, you've got the right intuition. Normally, with a trait with a type parameter like:
trait RcPtr<T> {
fn clone(&self) -> Self;
fn deref(&self) -> &T;
}
You declare variables like:
let foo: RcPtr<T> = ...;
Then, whenever you operate on foo
, the compiler guarantees that the correct
type T
is used. This is important, or else you'd be able to operate on the
same variable as different types at different points in the program:
let foo: usize = 0;
(foo: &str) = "hello there";
(foo: &[u8])[0] += 1;
In other words, the fact that Rust tracks the type of variables is what allows
us to have memory safety! So if the type parameter to RcPtr
is moved from the
trait itself into each of the methods, then you could do something like:
let foo: RcPtr = ...;
println!("{}", foo.deref::<&str>());
let bar = foo.clone::<&[u8]>();
In other words, you could violate memory safety. No bueno. I guess we're out of luck...
Although... There is a giant, hairy, awful, scary escape hatch in the language, so what if we just...
unsafe trait RcPtr {
unsafe fn clone<T>(&self) -> Self;
unsafe fn deref<T>(&self) -> &T;
}
That's better :)
Of course, using unsafe
means that you're opting into manually verifying
certain properties. So what properties does the programmer pinky-swear to uphold
here? Essentially, it becomes the programmer's responsibility to track the types
of variables and only ever access them consistently. In other words, for a given
instance of a variable, it must only ever be accessed with the same T
parameter.
Luckily, Rust is pretty good at encapsulation, so we can just encapsulate this
in a safe type that uses unsafe under the hood, and guarantees to always use the
same T
:
struct RcWrapper<R: RcPtr, T> {
rc: R,
_marker: PhantomData<T>,
}
impl<R: RcPtr, T> RcWrapper<R, T> {
fn new(t: T) -> Self {
RcWrapper {
rc: R::new::<T>(t),
_marker: PhantomData,
}
}
}
impl<R: RcPtr, T> Deref for RcWrapper<R, T> {
type Target = T;
fn deref(&self) -> &T {
unsafe { self.rc.deref() }
}
}
Given an R: RcPtr
and a T
, instead of writing R<T>
as we would if Rust
supported HKT natively, we write RcWrapper<R, T>
. We can then treat the
RcWrapper<R, T>
just as we would R<T>
if we had native HKT - we can pass it
around, call methods on it, clone it, etc.
OK, so we can define the trait, and we can even make it safe to use. But how on earth could we implement the trait? What would such an implementation look like?
Awful, that's what. While one of the benefits of giving variables types is that you can guarantee that you always operate on them as the proper type, another benefit is that you always know what size they are. What size is something whose type isn't known until you call methods on it? God only knows...
And when only God knows something, you've got no choice but to figure it out at runtime, and stick it on the heap.
struct SingleThreadedRc {
// Actually *mut Rc<T>, but we don't know
// what T is until runtime...
inner: *mut (),
}
impl SingleThreadedRc {
// Dear SingleThreadedRc, I pinky-swear that
// you're actually holding a pointer to an Rc<T>.
// I swear on my mother's grave.
unsafe fn inner<T>(&self) -> &Rc<T> {
&*(self.inner as *const T)
}
}
unsafe impl RcPtr for SingleThreadedRc {
unsafe fn clone<T>(&self) -> Self {
// Get access to the Rc<T> we knew we had
// inside of us deep down all along...
let inner: &Rc<T> = self.inner();
let new_inner: &Rc<T> = inner.clone();
let new_inner = Box::new(new_inner);
SingleThreadedRc {
inner: Box::into_raw(new_inner) as *mut (),
}
}
unsafe fn deref<T>(&self) -> &T {
let inner: &Rc<T> = self.inner();
inner.deref()
}
}
Does anybody have any Purell?
But it works. You can see a full example including usage here.
Conclusion
To recap, here's the general pattern. You would like a trait, MyTrait<T>
, to
be higher-kinded, but we don't support that in Rust right now. Instead, you
create a trait, MyTrait
, which takes no type parameters, but each of its
methods take a type parameter, T
. Then, if you have an M: MyTrait
, instead
of doing M<T>
as you normally would, you create a concrete wrapper type,
MyTraitWrapper<M: MyTrait, T>
. M<T>
is then written as MyTraitWrapper<M, T>
.
Of course, the code presented here is far from pretty or optimal. One obvious
drawback is the heap allocation. If you could guarantee an upper bound on the
size of the internal type (e.g., if you knew an upper bound on the size of
Rc<T>
for all T
), you could stack- instead of heap-allocate.
This also doesn't cover everything you might want HKT for. For example, how do
you emulate the following trait, where you'd like to be able to do M: MyTrait
and then, later, M::<T>::MyType
?
trait MyTrait<T> {
type MyType: Clone;
}
I'm pretty sure it's possible, but I haven't actually gotten it working in code yet. In the interest of keeping this post short (and also reclaiming some of my free time), I've decided to leave it an exercise to readers to either get it working, or argue that it can't be done (and thus that the approach I've proposed here falls short of full HKT).
Finally, in fairness, calling this "HKT" is a bit of a stretch. It's not a general-purpose mechanism - you have to manually implement it for every trait and every type - and it doesn't provide the same soundness guarantees that a language feature would. But, for any given application that needs a particular trait to be HKT-ified, it gets the job done.
Rust has HKT right now, but only the sinners can use it.