Bringing Typing Back to the Programmer#

Hello! This is the second FhY design blog post. If you haven’t read our first post and are interested in what this project is all about, you can find it here.

On Wednesday (that would be the day before the first design blog post was released), the team had a discussion about types and how the type signature of an operation or procedure would affect how it was mapped to a hardware target. In the current state of FhY (i.e., as represented by the first blog post), sub-routines would be represented as op or proc definitions with a definite type signature. For example, a simple op definition for a matrix-matrix multiplication might look like this:

op matmul(input int32[m, n], input int32[n, p] B) -> output int32[m, p] {
    temp index[1:m] i;
    temp index[1:p] j;
    temp index[1:n] k;
    temp int32[m, p] C;

    C[i, j] = sum[k](A[i, k] * B[k, j]);
    return C;
}

This code defines the matrix-matrix multiplication correctly. However, it only defines it for int32 multi-dimensional arrays. While this might seem innocuous, it could cause problems when trying to perform operation scheduling that will yield the best performance.

Why?

Let’s say you have at your disposal a CPU on your current machine and an FPGA with a custom domain-specific accelerator realized on the FPGA. For the sake of this example, consider the GeneSys NPU (a part of the GeneSys deep learning acceleration system; read more about it here) as the hardware realized on the FPGA. The GeneSys NPU has a systolic array that can perform matrix-matrix multiplications extremely efficiently in int8 precision.

You are aware that you have this NPU that can perform the matrix-matrix multiplication operation efficiently in int8 precision; therefore, you define your program’s data as multi-dimensional arrays with int8 elements. However, when you invoke the matmul operation defined above, the types of the input arrays will be promoted from int8 to int32, as this is a valid type promotion with no information loss. This means that the compiler must map the matmul operation to the CPU rather than the FPGA, as the CPU is the only target that can handle the int32 data. Unfortunately, the CPU is much slower than the GeneSys NPU at matrix-matrix multiplication (like orders of magnitude slower).

You, the programmer, could circumvent this issue by defining a new matmul operation that takes int8 arrays as input. This would solve the immediate problem. However, we want to build libraries of operations that can be reused across different projects. Creating a permutation of every operation for every possible type signature is a non-starter considering that the logic of the operation is the same regardless of the type signature.

One solution to this problem would be to introduce extremely weak base data types like real that could be promoted to whatever base data type the user passes. This would allow the user to define the operation with a weak type signature and the compiler to promote the weak types to the appropriate base data type. However, now, that real base data type can promote to just about anything. This could lead to an adjacent problem where the programmer has access to a hardware target that can handle an operation efficiently at a certain precision, but hasn’t designed their program around that precision and has now lost the opportunity to use that hardware target.

Instead, we propose to leverage template types like those found in C++ and Rust (called Generics). This solution provides the best of both worlds, as the programmer can define the operation with a weak type signature and the programmer can specify exactly what base data type they want to use. This allows the programmer to design their program around the precision they want to use and be notified by the compiler if they are missing out on an opportunity to use a hardware target that can handle the operation efficiently in a certain precision.

Here is the same matrix-matrix multiplication operation defined with template types:

op matmul<T>(input T[m, n], input T[n, p] B) -> output T[m, p] {
    temp index[1:m] i;
    temp index[1:p] j;
    temp index[1:n] k;
    temp T[m, p] C;

    C[i, j] = sum[k](A[i, k] * B[k, j]);
    return C;
}

proc main(input int8[2, 2] A, input int8[2, 2] B, output int8[2, 2] C) {
    C = matmul<int8>(A, B);
}

As you can see, the matmul operation is now defined with a template type T. The programmer can now specify that they want the operation to be performed with int8 arrays by invoking the operation with matmul<int8>. Therefore, the logic for the operation is the same regardless of the base data type, and the programmer can design their program around the precision they want to use. Additionally, if the programmer attempts to pass a higher precision array to the operation, the compiler with throw an error for an illegal implicit type promotion. Overall, we feel that this solution provides an easy way to define operations for many different base data types while also ensuring that the programmer can tailor their program to the precision they desire.

If you have any thoughts on this solution or have a better solution, please let us know!

Release Date: Friday April 5th, 2024
Last Updated: Friday May 24th, 2024
Post Author(s): Christopher Priebe

Bringing Typing Back to the Programmer#

This Page