Coding Coverage and Corner Cases

Hi! Welcome to our fifth FhY design blog post. You may view our previous blog post here.

Preface: AST Node Design and Construction

Ordinarily, we follow the Object-Oriented Programming (OOP) maxim of “composition over inheritance.” When it comes to constructing an Abstract Syntax Tree (AST), however, an inheritance hierarchy is the natural way to control which types our nodes can take. All of our nodes derive from a base AST node and express the fundamental grammar constructs of the FhY language. We find this approach well suited to the problem, and it is common practice in compiler infrastructure: TVM, for example, derives all of its AST nodes from a base class called Object.
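A minimal sketch of such a hierarchy in Python, assuming hypothetical class names (ASTNode, Expression, Identifier, BinaryExpression); FhY's actual node classes and fields may differ. The point is only that every node shares one base class, so generic tooling can treat any node uniformly:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ASTNode:
    """Hypothetical base class: every node in the tree derives from it."""


@dataclass(frozen=True)
class Expression(ASTNode):
    """Hypothetical category grouping all expression constructs."""


@dataclass(frozen=True)
class Identifier(Expression):
    name: str


@dataclass(frozen=True)
class BinaryExpression(Expression):
    operation: str
    left: Expression
    right: Expression


def count_nodes(node: ASTNode) -> int:
    """Count nodes by recursing into every attribute that is itself an ASTNode."""
    children = [value for value in vars(node).values() if isinstance(value, ASTNode)]
    return 1 + sum(count_nodes(child) for child in children)
```

Because every node has ASTNode as an ancestor, a utility like count_nodes accepts any node, and isinstance checks can match broad categories such as Expression.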

In our initial approach to constructing AST nodes from a Concrete Syntax Tree (CST), we used the FhYListener class generated by ANTLR, but this left us with conflicting requirements for our AST nodes. Normally, we want AST nodes to be more or less immutable. However, because ANTLR's tree walker drives a listener's traversal, we did not have all of the information required to instantiate a node class upfront, so our nodes needed to be flexible during construction. We navigated this conflict by creating a separate builder class that accepted an AST node class and dynamically defined mutable attribute fields for it. Combined with a wrapper that registered the type information (annotations) of our AST nodes, this let us look up the fields and defaults of the received class in the registry. Once all of the required fields were captured, the builder instantiated the node. One unfortunate downside of dynamically constructed classes, however, is that their attributes are invisible to static code analyzers (e.g., the IDE), so developers lose autocompletion and type checking for them.
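To make the abandoned approach concrete, here is a rough sketch under our own assumptions: the names register_node, NODE_REGISTRY, and NodeBuilder are hypothetical, and this is not the code FhY actually used. A registration wrapper records which fields each node class requires, and a builder accepts attribute assignments one at a time before instantiating the frozen node:

```python
import dataclasses
from typing import Any, Dict

# Hypothetical registry: node class -> {field name: is the field required?}
NODE_REGISTRY: Dict[type, Dict[str, bool]] = {}


def register_node(cls: type) -> type:
    """Record each dataclass field and whether it lacks a default."""
    NODE_REGISTRY[cls] = {
        f.name: f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING
        for f in dataclasses.fields(cls)
    }
    return cls


class NodeBuilder:
    """Mutable stand-in that gathers field values during CST traversal."""

    def __init__(self, node_cls: type) -> None:
        # Bypass our own __setattr__ for the builder's internal state.
        object.__setattr__(self, "_node_cls", node_cls)
        object.__setattr__(self, "_values", {})

    def __setattr__(self, name: str, value: Any) -> None:
        # Any registered field can be assigned dynamically, e.g. builder.name = "A".
        if name not in NODE_REGISTRY[self._node_cls]:
            raise AttributeError(f"{self._node_cls.__name__} has no field {name!r}")
        self._values[name] = value

    def build(self) -> Any:
        """Instantiate the node once every required field has been captured."""
        required = NODE_REGISTRY[self._node_cls]
        missing = [n for n, req in required.items() if req and n not in self._values]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        return self._node_cls(**self._values)


@register_node
@dataclasses.dataclass(frozen=True)
class Argument:
    name: str
    type_name: str
    qualifier: str = "input"
```

The downside described above is visible here: an assignment such as builder.name = "A" is only checked at runtime against the registry, so a static analyzer cannot autocomplete or validate it.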

These problems, together with other design and grammar woes, quickly showed us that this was the wrong approach to constructing the AST from the CST. Now, using the visitor pattern, we control the traversal route ourselves and gather all of the information required to instantiate an AST node before creating it, so we no longer need nodes to be flexible during the build. To that end, we have made all of our AST nodes frozen dataclasses, which prevents their fields from being reassigned after construction.
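A minimal sketch of visitor-driven construction, assuming a toy CSTNode class in place of ANTLR's parse-tree contexts and hypothetical Argument and Operation nodes. The visitor visits children first, so every field is known before the frozen node is created:

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import List, Tuple


@dataclass
class CSTNode:
    """Hypothetical stand-in for a parse-tree node produced by the parser."""
    rule: str
    text: str = ""
    children: List["CSTNode"] = field(default_factory=list)


@dataclass(frozen=True)
class Argument:
    qualifier: str
    type_name: str
    name: str


@dataclass(frozen=True)
class Operation:
    name: str
    args: Tuple[Argument, ...]  # a tuple, so the contents cannot be mutated either


class ASTBuilder:
    """Visitor that walks the CST and returns finished, frozen AST nodes."""

    def visit(self, node: CSTNode):
        # Dispatch on the grammar rule name, e.g. "argument" -> visit_argument.
        return getattr(self, f"visit_{node.rule}")(node)

    def visit_operation(self, node: CSTNode) -> Operation:
        name, *arg_nodes = node.children
        # Children are built first, so all fields exist before construction.
        args = tuple(self.visit(child) for child in arg_nodes)
        return Operation(name=name.text, args=args)

    def visit_argument(self, node: CSTNode) -> Argument:
        qualifier, type_node, name = node.children
        return Argument(qualifier.text, type_node.text, name.text)
```

Using a tuple for args is a deliberate choice: freezing a dataclass only blocks attribute reassignment, so a list field could still be modified in place.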

Unit Testing

In general, our unit tests are simple, straightforward tests that confirm the expected output and behavior of individual classes and functions. Their purpose is to validate that the current code meets our requirements and to build an extensible safety net that catches problematic future changes, which lets us develop faster.

Currently, a large portion of our unit tests is dedicated to validating the correct construction of our AST nodes. While these tests do not necessarily improve our reported coverage numbers, they do increase our confidence that we correctly handle the wide variety of FhY source code we anticipate. We hope to spare users bizarre or obscure errors and to make problems much easier to debug and diagnose. Furthermore, this suite makes it safer to change and extend the FhY grammar in the future, and those extensions will build on, and benefit from, this foundation of rigorous unit testing.

Defining the structure of our AST nodes was relatively straightforward: we have classes that hold the relevant data derived from FhY source code. Each new node type we support corresponds to a new language construct, so our workflow looks something like this:

[FhY Code (text)] --(ANTLR Lexing + Parsing)--> [CST] --> [AST]

Our inputs are FhY code snippets, but what we validate is the constructed AST. We methodically test every expected use of each construct, then transition to confirming that incorrect FhY code raises errors (specifically, syntax errors). Finding corner cases along the way is inevitable. For example, a missing argument variable name can slip through ANTLR's parsing:

op foo(input int32[m, n]) {

}

This means we need to raise a syntax error when we detect the missing name during AST construction, so that we fail fast and give the user enough information to resolve the issue.
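A minimal sketch of that fail-fast check and a unit test for it, assuming a hypothetical FhYSyntaxError exception and a hypothetical build_argument helper that receives the pieces the parser matched (with name set to None when it is absent):

```python
from dataclasses import dataclass
from typing import Optional


class FhYSyntaxError(Exception):
    """Hypothetical error type raised while building the AST."""


@dataclass(frozen=True)
class Argument:
    qualifier: str
    type_name: str
    name: str


def build_argument(qualifier: str, type_name: str, name: Optional[str],
                   line: int, column: int) -> Argument:
    """Build an Argument node, failing fast if the parser let a missing name through."""
    if not name:
        # Include the location and the matched text so the user can find the problem.
        raise FhYSyntaxError(
            f"line {line}, column {column}: argument '{qualifier} {type_name}' "
            "is missing a variable name"
        )
    return Argument(qualifier, type_name, name)


def test_missing_argument_name_raises() -> None:
    try:
        build_argument("input", "int32[m, n]", None, line=1, column=8)
    except FhYSyntaxError as error:
        assert "missing a variable name" in str(error)
    else:
        raise AssertionError("expected FhYSyntaxError")
```

The test function follows the plain-assert style that pytest collects automatically, but it can also be called directly.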

I recently found an interesting corner case. We currently hard-code the function keywords in our lexer: “op”, “proc”, and “native”. If we define a function using the wrong keyword, we would expect an error to be raised. However, because the code no longer matches the syntax of a function definition, it is silently skipped rather than rejected. The snippet below, for instance, produces an empty module node:

def foo(input int32[m, n] A) {

}

Discovering this little corner case helped us debug a larger problem with our grammar. We will discuss the issue and its solution in much greater detail in a future blog post.
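As a stopgap illustration only (this is not the grammar fix mentioned above), a cheap sanity check can flag lines shaped like a function header whose leading word is not a recognized keyword. The sketch assumes a regex heuristic and a hypothetical function name, unknown_function_keywords; it would need refinement to avoid false positives on other statement forms:

```python
import re
from typing import List

# The keywords named in this post.
FUNCTION_KEYWORDS = {"op", "proc", "native"}

# Heuristic: "<word> <name>(" at the start of a line looks like a function header.
_HEADER = re.compile(r"^\s*([A-Za-z_]\w*)\s+([A-Za-z_]\w*)\s*\(", re.MULTILINE)


def unknown_function_keywords(source: str) -> List[str]:
    """Return leading words of function-like headers that are not known keywords."""
    return [
        match.group(1)
        for match in _HEADER.finditer(source)
        if match.group(1) not in FUNCTION_KEYWORDS
    ]
```

Running it on the def snippet above reports ["def"], while the same snippet written with op reports nothing, so a test could assert that the list is empty for every valid sample.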

Integration Testing Paradigm

Following our discussion earlier this week, we have created a unified CLI entry point for the FhY package. Once the package is installed, it can be called from the command line like so:

fhy --module <filepath>
fhy --library <filepath>
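For readers curious how such an entry point can be wired up, here is a minimal sketch using Python's argparse. It mirrors only the flags shown in this post (-m/--module, --library, and a serialize subcommand with --format json); the parser layout, help strings, and restricting --format to json are assumptions, not FhY's actual CLI definition:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the fhy commands shown in this post."""
    parser = argparse.ArgumentParser(prog="fhy")

    # Exactly one input source: a module file or a library file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-m", "--module", metavar="FILEPATH", help="FhY module file")
    source.add_argument("--library", metavar="FILEPATH", help="FhY library file")

    # Optional subcommands, e.g. "fhy -m foo.fhy serialize --format json".
    subcommands = parser.add_subparsers(dest="command")
    serialize = subcommands.add_parser("serialize", help="serialize the AST")
    serialize.add_argument("--format", choices=["json"], default="json")
    return parser
```

An installed package would typically expose a main function built on such a parser through a console-script entry point in its packaging metadata, which is what makes a bare fhy command available after installation.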

For now, the CLI only handles single files (this will change soon!), but even so it is perfect for our current integration testing pipeline. A separate sample directory holds input FhY files alongside their expected output (text) files, which our integration tests batch process. As long as the expected output is included when a new FhY example is added, the example is tested automatically, and if you forget to add the expected output, the tests will tell you as much.
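A minimal sketch of how such a sample directory could be discovered and checked, assuming a hypothetical naming convention in which foo.fhy is paired with foo.expected; the suffixes and function names are illustrative, not FhY's actual layout:

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical naming convention: foo.fhy is paired with foo.expected.
SOURCE_SUFFIX = ".fhy"
EXPECTED_SUFFIX = ".expected"


def collect_samples(directory: Path) -> Tuple[List[Tuple[Path, Path]], List[Path]]:
    """Pair every FhY source file with its expected-output file.

    Returns (pairs, missing): pairs are ready to test, and missing lists
    sources that do not have an expected-output file yet.
    """
    pairs: List[Tuple[Path, Path]] = []
    missing: List[Path] = []
    for source in sorted(directory.glob(f"*{SOURCE_SUFFIX}")):
        expected = source.with_suffix(EXPECTED_SUFFIX)
        if expected.is_file():
            pairs.append((source, expected))
        else:
            missing.append(source)
    return pairs, missing


def check_no_missing_expected(directory: Path) -> None:
    """Fail loudly when an example was added without its expected output."""
    _, missing = collect_samples(directory)
    if missing:
        names = ", ".join(path.name for path in missing)
        raise AssertionError(f"missing expected output for: {names}")
```

In a pytest suite, the returned pairs could feed @pytest.mark.parametrize so that each sample shows up as its own test case.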

We can also serialize our AST nodes to JSON using the serialize subcommand:

fhy -m <filepath> serialize --format json

This streams the JSON output to stdout, so it can be piped into other tools, for example to diff it against an expected output file from the command line.
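The JSON schema FhY emits is not described in this post, so the sketch below assumes a simple hypothetical format in which each node becomes an object tagged with its class name under a "_type" key. The Argument and Operation nodes are likewise hypothetical:

```python
import dataclasses
import json
from typing import Any, Tuple


@dataclasses.dataclass(frozen=True)
class Argument:
    qualifier: str
    type_name: str
    name: str


@dataclasses.dataclass(frozen=True)
class Operation:
    name: str
    args: Tuple[Argument, ...]


def to_jsonable(value: Any) -> Any:
    """Recursively convert AST nodes into JSON-compatible dicts and lists."""
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        data = {"_type": type(value).__name__}  # keep the node type for readers
        for f in dataclasses.fields(value):
            data[f.name] = to_jsonable(getattr(value, f.name))
        return data
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value


def serialize(node: Any) -> str:
    """Render a node as deterministic, indented JSON text."""
    return json.dumps(to_jsonable(node), indent=2, sort_keys=True)
```

Sorted keys and a fixed indent make the output identical line by line across runs, which is what makes a textual diff against expected output meaningful.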

Currently, our integration tests run the entry point of the installed package through subprocess on input FhY source files and compare the results against the expected output files. This exercises the FhY frontend in the same manner we expect clients to use it.
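A minimal sketch of one such subprocess-based check, assuming the command and expected-output path are supplied by a harness like the sample discovery described earlier; run_and_diff is a hypothetical helper name. For a real sample the command would be something like ["fhy", "-m", str(source_path), "serialize", "--format", "json"]:

```python
import difflib
import subprocess
from pathlib import Path
from typing import List


def run_and_diff(command: List[str], expected_path: Path) -> str:
    """Run a CLI command and return a unified diff against the expected output.

    An empty string means stdout matched the expected file exactly.
    """
    # check=True makes a crash (non-zero exit) fail the test immediately.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    expected = expected_path.read_text()
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        result.stdout.splitlines(keepends=True),
        fromfile=str(expected_path),
        tofile="actual",
    )
    return "".join(diff)
```

Returning the diff text instead of a boolean means a failing test can print exactly which lines changed. Cases that are expected to fail, such as syntax-error samples, would instead drop check=True and inspect returncode and stderr.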

  • Release Date: Thursday April 25th, 2024

  • Last Updated: Thursday May 23rd, 2024

  • Post Author(s): Jason C Del Rio