Bringing Typing Back to the Programmer#
Hello! This is the second FhY design blog post. If you haven’t read our first post and are interested in what this project is all about, you can find it here.
On Wednesday (that would be the day before the first design blog post was released), the
team had a discussion about types and how the type signature of an operation or
procedure would affect how it was mapped to a hardware target. In the current state of
FhY (i.e., as represented by the first blog post), sub-routines would be represented
as op or proc definitions with a definite type signature. For example, a simple
op definition for a matrix-matrix multiplication might look like this:
op matmul(input int32[m, n], input int32[n, p] B) -> output int32[m, p] {
temp index[1:m] i;
temp index[1:p] j;
temp index[1:n] k;
temp int32[m, p] C;
C[i, j] = sum[k](A[i, k] * B[k, j]);
return C;
}
This code defines the matrix-matrix multiplication correctly. However, it only defines
it for int32 multi-dimensional arrays. While this might seem innocuous, it could
cause problems when trying to perform operation scheduling that will yield the best
performance.
Why?
Let’s say you have at your disposal a CPU on your current machine and an FPGA with a
custom domain-specific accelerator realized on the FPGA. For the sake of this example,
consider the GeneSys NPU (a part of the GeneSys deep learning acceleration system; read
more about it here) as the hardware realized on
the FPGA. The GeneSys NPU has a systolic array that can perform matrix-matrix
multiplications extremely efficiently in int8 precision.
You are aware that you have this NPU that can perform the matrix-matrix multiplication
operation efficiently in int8 precision; therefore, you define your program’s data
as multi-dimensional arrays with int8 elements. However, when you invoke the
matmul operation defined above, the types of the input arrays will be promoted from
int8 to int32, as this is a valid type promotion with no information loss. This
means that the compiler must map the matmul operation to the CPU rather than the
FPGA, as the CPU is the only target that can handle the int32 data. Unfortunately,
the CPU is much slower than the GeneSys NPU at matrix-matrix multiplication (like orders
of magnitude slower).
You, the programmer, could circumvent this issue by defining a new matmul operation
that takes int8 arrays as input. This would solve the immediate problem. However, we
want to build libraries of operations that can be reused across different projects.
Creating a permutation of every operation for every possible type signature is a
non-starter considering that the logic of the operation is the same regardless of the
type signature.
One solution to this problem would be to introduce extremely weak base data types like
real that could be promoted to whatever base data type the user passes. This would
allow the user to define the operation with a weak type signature and the compiler to
promote the weak types to the appropriate base data type. However, now, that real
base data type can promote to just about anything. This could lead to an adjacent
problem where the programmer has access to a hardware target that can handle an
operation efficiently at a certain precision, but hasn’t designed their program around
that precision and has now lost the opportunity to use that hardware target.
Instead, we propose to leverage template types like those found in C++ and Rust (called Generics). This solution provides the best of both worlds, as the programmer can define the operation with a weak type signature and the programmer can specify exactly what base data type they want to use. This allows the programmer to design their program around the precision they want to use and be notified by the compiler if they are missing out on an opportunity to use a hardware target that can handle the operation efficiently in a certain precision.
Here is the same matrix-matrix multiplication operation defined with template types:
op matmul<T>(input T[m, n], input T[n, p] B) -> output T[m, p] {
temp index[1:m] i;
temp index[1:p] j;
temp index[1:n] k;
temp T[m, p] C;
C[i, j] = sum[k](A[i, k] * B[k, j]);
return C;
}
proc main(input int8[2, 2] A, input int8[2, 2] B, output int8[2, 2] C) {
C = matmul<int8>(A, B);
}
As you can see, the matmul operation is now defined with a template type T. The
programmer can now specify that they want the operation to be performed with int8
arrays by invoking the operation with matmul<int8>. Therefore, the logic for the
operation is the same regardless of the base data type, and the programmer can design
their program around the precision they want to use. Additionally, if the programmer
attempts to pass a higher precision array to the operation, the compiler with throw an
error for an illegal implicit type promotion. Overall, we feel that this solution
provides an easy way to define operations for many different base data types while also
ensuring that the programmer can tailor their program to the precision they desire.
If you have any thoughts on this solution or have a better solution, please let us know!
Release Date: Friday April 5th, 2024
Last Updated: Friday May 24th, 2024
Post Author(s): Christopher Priebe