Listen… We Visit the FhY AST

Hi! Welcome to our fourth FhY design blog post. You may view our previous blog post here.

Meeting Minutes

We met on Monday, April 22nd, 2024, and had the following discussions…

Current Progress:

Today we reviewed our current progress on development of FhY, as follows:

  1. Initially, we used a Listener (coupled to a stack to control scope) to collect the information required to build AST nodes. This strategy proved problematic as we began implementing more complex node types. For example, in a simple conditional statement, the two bodies (one to execute when the condition is true, the other when it is false) were difficult to distinguish from one another. This may have been a limitation of our current grammar definition, but it became increasingly evident that we needed finer control over how we visit each node type. We therefore implemented a visitor pattern, which proved much more straightforward, readable, and maintainable.

  2. We implemented a basic ChainMap to control variable scoping. Without it, a variable was assigned a new ID every time we encountered it during parsing. Now, before constructing a variable identifier, we search the current scope in the chain map and return the existing value if found; otherwise, we create a new identifier.

  3. Our unit tests revealed that we initially had trouble parsing floating-point numbers: all numbers were being interpreted as integers. Originally, floating_point_literals was a parser rule in the grammar. Modifying the grammar to make FLOAT_LITERAL a lexer token resolved the issue. This is due to an implicit rule imposed by ANTLR: the lexer tokenizes the input before the parser rules are applied, and when more than one lexer rule matches the same text, the rule listed first in the grammar file takes precedence.

  4. AST Node Conversion: We wrote a simple class to convert an AST node back into the original FhY language, with an option to include the assigned identifier IDs. We used it primarily for debugging, which helped us implement the ChainMap described above. Note that output produced with this option is no longer valid FhY code. We currently call this conversion “pretty printing” because we can control the indentation of the output text.

  5. We have set up a GitHub workflow for CI/CD, which automatically builds our ANTLR-generated files (allowing for easy modifications to the grammar) and consistently runs unit tests, code coverage analysis, code linting, and static type checking. Previously, we had set up a tox configuration file for local testing. It might be a good idea to use tox as part of our GitHub workflow, so that testing is consistent both locally and remotely. We currently support only Python versions >=3.11 because we use the StrEnum class, but we hope to change that soon.
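The scoping behavior described in item 2 can be sketched with Python's `collections.ChainMap`. This is a minimal illustration rather than the FhY implementation: the `ScopeTracker` class, its method names, and the use of plain integers as IDs are all assumptions made for the example.

```python
from collections import ChainMap
from itertools import count


class ScopeTracker:
    """Hypothetical helper that gives each variable name a stable ID per scope."""

    def __init__(self) -> None:
        self._fresh_ids = count()   # source of new identifier IDs (0, 1, 2, ...)
        self._scopes = ChainMap()   # maps[0] is always the innermost scope

    def push_scope(self) -> None:
        # Entering a block: add an empty mapping in front of the enclosing scopes.
        self._scopes = self._scopes.new_child()

    def pop_scope(self) -> None:
        # Leaving a block: drop the innermost mapping, but keep the outermost one.
        if len(self._scopes.maps) == 1:
            raise RuntimeError("cannot pop the outermost scope")
        self._scopes = self._scopes.parents

    def resolve(self, name: str) -> int:
        # Reuse the ID if the name is visible in any enclosing scope.
        if name in self._scopes:
            return self._scopes[name]
        # Otherwise create a new ID; ChainMap writes always go to maps[0].
        new_id = next(self._fresh_ids)
        self._scopes[name] = new_id
        return new_id


tracker = ScopeTracker()
x_outer = tracker.resolve("x")
x_again = tracker.resolve("x")         # same ID, no new identifier created
tracker.push_scope()
x_from_inner = tracker.resolve("x")    # outer "x" is still visible here
y_inner = tracker.resolve("y")         # new name, bound in the inner scope
tracker.pop_scope()
y_after_pop = tracker.resolve("y")     # inner "y" is gone, so a new ID is made
```

Because lookups walk the chain from the innermost mapping outward while writes only touch the innermost mapping, a name declared inside a block disappears once that block's scope is popped.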

Future Work:

  1. Serialization: Using a visitor pattern, write a class to convert AST nodes to JSON format, and another class to convert JSON format back into AST nodes.

  2. AST Node Passes: Write simple passes over our AST nodes to perform label and type checking, as preliminary analyses of the converted code.

  3. Template Types: We need test coverage for template types.

  4. Write a command-line interface (CLI) script for FhY, exposed by defining a package entry point, to perform compilation from the command line.

    1. Desired Usage: We want to be able to serialize the AST and pipe the output into a diff against the expected output. This will help with debugging and build up our current tool chain.

    2. Include a flag to provide the user with a more verbose output during compilation, such as printing out variables or the AST.

  5. Integration Testing: Our current method of embedding string test cases within the code is not scalable. Extract the example FhY code into separate input and output directories, from which the test cases are retrieved automatically. Then, use the CLI to perform integration testing, since this is what clients will use.

    1. Expand our repertoire of toy example files for integration testing.

  6. Symbol Tables: Design and write IR symbol tables.

  7. Multi-File Compilation and Linking: Design a strategy to accommodate compilation of multiple files and linking external libraries.

  • Note that the term “visitor pattern” is a little ambiguous about whether a visitor returns a value. The JSON serialization visitor returns a value, since we are constructing entirely new nodes, but the AST node passes do not (i.e., they act in place).

  • Edit: We have since created a transformer class, so this ambiguity no longer applies.
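The difference between a visitor that returns a value and a pass that acts in place can be sketched as follows. The node classes (`Identifier`, `Conditional`), the `visit_<ClassName>` dispatch scheme, and the `OffsetIds` pass are assumptions made for this illustration and are not taken from the FhY code base.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Identifier:
    name: str
    id: int


@dataclass
class Conditional:
    condition: Identifier
    true_body: list = field(default_factory=list)
    false_body: list = field(default_factory=list)


class Visitor:
    """Dispatches to visit_<ClassName>; subclasses decide what, if anything, to return."""

    def visit(self, node):
        return getattr(self, f"visit_{type(node).__name__}")(node)


class ToJson(Visitor):
    """Returning visitor: builds a new JSON-friendly structure for every node."""

    def visit_Identifier(self, node):
        return {"type": "Identifier", "name": node.name, "id": node.id}

    def visit_Conditional(self, node):
        return {
            "type": "Conditional",
            "condition": self.visit(node.condition),
            "true_body": [self.visit(n) for n in node.true_body],
            "false_body": [self.visit(n) for n in node.false_body],
        }


class OffsetIds(Visitor):
    """In-place pass (hypothetical): mutates the nodes and returns nothing."""

    def __init__(self, offset: int) -> None:
        self.offset = offset

    def visit_Identifier(self, node):
        node.id += self.offset

    def visit_Conditional(self, node):
        self.visit(node.condition)
        for child in node.true_body + node.false_body:
            self.visit(child)


tree = Conditional(Identifier("flag", 0), [Identifier("a", 1)], [Identifier("b", 2)])
as_json = json.dumps(ToJson().visit(tree))   # a new structure; tree is untouched
result = OffsetIds(10).visit(tree)           # result is None; tree itself changed
```

Giving the two behaviors separate base classes, one that always returns a node and one that never does, is one way to remove the ambiguity between them.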

Backlogged Work from Discussions:

  1. Serialization will come in many forms. For now, we have developed a pretty-print class to convert AST nodes back into FhY code (valid as long as no other options are used), and we are focusing on the JSON format next. But how do we move across different programming languages? At the end of the day, a different programming language is just another text output format. We will need walkers or visitors to emit other code formats, starting with Python and then Rust. Support for further languages will be planned after these two have been implemented, both internally and through external community engagement. Let us know what you think!

  • Release Date: Thursday April 25th, 2024

  • Last Updated: Friday May 24th, 2024

  • Post Author(s): Jason C Del Rio