######################################## Bringing Typing Back to the Programmer ######################################## .. contents:: :local: Hello! This is the second *FhY* design blog post. If you haven't read our first post and are interested in what this project is all about, you can find it :doc:`here `. On Wednesday (that would be the day before the first design blog post was released), the team had a discussion about types and how the type signature of an operation or procedure would affect how it was mapped to a hardware target. In the current state of *FhY* (i.e., as represented by the first blog post), sub-routines would be represented as ``op`` or ``proc`` definitions with a definite type signature. For example, a simple ``op`` definition for a matrix-matrix multiplication might look like this: .. code-block:: FhY op matmul(input int32[m, n], input int32[n, p] B) -> output int32[m, p] { temp index[1:m] i; temp index[1:p] j; temp index[1:n] k; temp int32[m, p] C; C[i, j] = sum[k](A[i, k] * B[k, j]); return C; } This code defines the matrix-matrix multiplication correctly. However, it only defines it for ``int32`` multi-dimensional arrays. While this might seem innocuous, it could cause problems when trying to perform operation scheduling that will yield the best performance. Why? Let's say you have at your disposal a CPU on your current machine and an FPGA with a custom domain-specific accelerator realized on the FPGA. For the sake of this example, consider the GeneSys NPU (a part of the GeneSys deep learning acceleration system; read more about it `here `_) as the hardware realized on the FPGA. The GeneSys NPU has a systolic array that can perform matrix-matrix multiplications extremely efficiently in ``int8`` precision. You are aware that you have this NPU that can perform the matrix-matrix multiplication operation efficiently in ``int8`` precision; therefore, you define your program's data as multi-dimensional arrays with ``int8`` elements. However, when you invoke the ``matmul`` operation defined above, the types of the input arrays will be promoted from ``int8`` to ``int32``, as this is a valid type promotion with no information loss. This means that the compiler must map the ``matmul`` operation to the CPU rather than the FPGA, as the CPU is the only target that can handle the ``int32`` data. Unfortunately, the CPU is much slower than the GeneSys NPU at matrix-matrix multiplication (like orders of magnitude slower). You, the programmer, could circumvent this issue by defining a new ``matmul`` operation that takes ``int8`` arrays as input. This would solve the immediate problem. However, we want to build libraries of operations that can be reused across different projects. Creating a permutation of every operation for every possible type signature is a non-starter considering that the logic of the operation is the same regardless of the type signature. One solution to this problem would be to introduce extremely weak base data types like ``real`` that could be promoted to whatever base data type the user passes. This would allow the user to define the operation with a weak type signature and the compiler to promote the weak types to the appropriate base data type. However, now, that ``real`` base data type can promote to just about anything. This could lead to an adjacent problem where the programmer has access to a hardware target that can handle an operation efficiently at a certain precision, but hasn't designed their program around that precision and has now lost the opportunity to use that hardware target. Instead, we propose to leverage template types like those found in C++ and Rust (called Generics). This solution provides the best of both worlds, as the programmer can define the operation with a weak type signature and the programmer can specify exactly what base data type they want to use. This allows the programmer to design their program around the precision they want to use and be notified by the compiler if they are missing out on an opportunity to use a hardware target that can handle the operation efficiently in a certain precision. Here is the same matrix-matrix multiplication operation defined with template types: .. code-block:: FhY op matmul(input T[m, n], input T[n, p] B) -> output T[m, p] { temp index[1:m] i; temp index[1:p] j; temp index[1:n] k; temp T[m, p] C; C[i, j] = sum[k](A[i, k] * B[k, j]); return C; } proc main(input int8[2, 2] A, input int8[2, 2] B, output int8[2, 2] C) { C = matmul(A, B); } As you can see, the ``matmul`` operation is now defined with a template type ``T``. The programmer can now specify that they want the operation to be performed with ``int8`` arrays by invoking the operation with ``matmul``. Therefore, the logic for the operation is the same regardless of the base data type, and the programmer can design their program around the precision they want to use. Additionally, if the programmer attempts to pass a higher precision array to the operation, the compiler with throw an error for an illegal implicit type promotion. Overall, we feel that this solution provides an easy way to define operations for many different base data types while also ensuring that the programmer can tailor their program to the precision they desire. If you have any thoughts on this solution or have a better solution, please let us know! - **Release Date**: Friday April 5th, 2024 - **Last Updated**: Friday May 24th, 2024 - **Post Author(s)**: Christopher Priebe