##################################
 Designing a Scalable CI Workflow
##################################

.. contents::
   :local:

Hi! Welcome to our sixth *FhY* design blog post. You may view our previous blog post
:doc:`here <post_2024-05-02>`.


*********
 Preface
*********

We had previously already configured a GitHub workflow and tox.ini file to leverage many
of  our developer tools. However, there was one thing that really bothered me...

I had noticed during the previous implementation that coding coverage reported locally
in a virtual environment compared to coverage reported during a tox run were different.
Specifically, in our tox environment, we were not capturing coverage of our integration
tests that use subprocess against our FhY entry point. Initially, the only way I was
able to solve this was to use a development environment configured in the tox.ini file
as shown:

.. code-block:: text

   \\ tox.ini
   [testenv:coverage]
   use_develop = true

Now, this flag `use_develop` tells tox to install your package in editable mode (i.e.
`pip install -e <package>`). Because this isn't how clients will normally use your
package, I created a separate environment to perform unit testing in standard install.
But this setup is less than ideal because it means we are running our unit tests twice.

This initially worked fine because we are only testing a single Python version.
However, we would like to scale to support both multiple Python versions and host OS's
in a matrix. Duplicating this testing process unnecessarily consumes our available
action hours and is fundamentally not scalable.

Furthermore, because we plan to matrix both OS and Python versions, we do not need to
analyze coding coverage for every permutation, but instead collate and report separately
after all testing has completed. This has an additional advantage of separating testing
and coverage reporting as two independent steps to more quickly determine where a
failure might occur. As a side benefit, this also makes it easier to review the coverage
report both in GitHub Actions and locally. We no longer would need to endlessly scroll
through the litany of performed tests just to see why we failed coverage.


************************************************
 The Big Strange World of Coding Coverage Ennui
************************************************

Every package I had previously developed or managed didn't have a script or entry point
where I needed to use and monitor a subprocess for the purpose of integration testing to
capture coding coverage. Up until this point, I had exclusively used the pytest plugin,
`pytest-cov`, because it was easy and it just worked... until now.

Because pytest-cov is just coverage.py with extra steps, and I also wanted to separate
testing and coverage report, I started with the documentation from coverage.py, on
`capturing subprocess coverage
<https://coverage.readthedocs.io/en/latest/subprocess.html#configuring-python-for-sub-process-measurement>`_
. We learn that the process of capturing subprocess events is relatively
straightforward. We need to do two things:

#. Create a .pth file initiating coverage process_startup:

   .. code-block:: python

      import coverage

      coverage.process_startup()

#. Set the environment variable, `COVERAGE_PROCESS_START` to your config file (e.g.
   pyproject.toml).

Now we have a tox.ini file that looks like this:

.. code-block:: text

   # tox.ini
   [testenv]
   set_env =
      COVERAGE_PROCESS_START={toxinidir}/pyproject.toml
   commands_pre =
      python -c 'import pathlib; pathlib.Path("{env_site_packages_dir}/cov.pth")\
         .write_text("import coverage; coverage.process_startup()")'
   commands =
      coverage run -m pytest {posargs}

   [testenv:coverage]
   depends = py{311, 312, 313}
   commands =
      coverage combine
      coverage report

Now, you might think to yourself, this should work!... and unfortunately it likely will
not. The real problem lies between our unit and integration tests. In our unit tests, we
are testing against the local copy (within the project root). But our integration tests
are using subprocess to access the entry point. To test this, we have to install our
package, which means coverage is occurring not on our local copy, but within our
`.tox/py/**/site-packages/*`. In development mode, where our package is installed in
editable form, our local copy and installed copy are the same (because the installed
package is just a reference to our local), which means the coverage is combined without
issue. However, in standard install, this is no longer true and there is no easy way to
combine / reference the installed and local code source, so we get coverage of both
duplicated file positions which ends up looking like we have drastically decreased our
coding coverage. Mock Example Coding Report:

.. code-block:: bash

   .tox/py/**/site-packages/package/example.py      23%
   package/example.py                               77%

It is easy to forget when using tox that the local copy of your source code is visible
to it because everything is happening "behind the scenes." What we really want while
using tox is to perform all of our testing against our installed package (within the
tox virtualenv). This is why we are using tox in the first place.

After finding this `interesting read <https://hynek.me/articles/testing-packaging/>`_,
written by the developer of `attrs` open source project, `Hynek Schlawack
<https://github.com/hynek>`_, the real problem is with our project packaging structure.

This was the general structure which followed standard Python packaging convention
practices, creating a more flat layout directory tree:

.. code-block:: text

   project/
   ├── docs/
   │   └── source/
   │       ├── conf.py
   │       └── index.rst
   ├── project/
   │   ├── \__init__.py
   │   ├── module_a.py
   │   ├── module_b.py
   │   └── subpackage/
   │       └── submodule.py
   ├── tests/
   │   ├── \__init__.py
   │   └── conftest.py
   ├── pyproject.toml
   ├── tox.ini
   └── .readthedocs.yaml

If the the answer isn't obvious yet, we are calling pytest from the project root
directory, which means the project is importable from both the root and the
site-packages because both are on path. The easiest solution is to change that by
creating a single layer of indirection, imbedding the project within a src folder:

.. code-block:: text

   // reduced to relevant change for brevity
   project/
   ├── src/
   │   └── project/
   │       ├── \__init__.py
   │       ├── module_a.py
   │       ├── module_b.py
   │       └── subpackage/
   │           └── submodule.py

Now, when we call pytest and perform unit tests and coverage, we are only testing
against the installed site-packages within the tox virtualenv.

To wrap up our coverage woes, several other configurations need to be updated to assist
both setuptools and coverage tools. In our pyproject.toml:

.. code-block:: toml

   # pyproject.toml
   # Now this can be used generally for any project because we just need to find src
   [tool.setuptools]
   package-dir = {"" = "src"}

   [tool.setuptools.packages.find]
   where = ["src"]

   [tool.coverage]
   # This was less obvious due to `src_pkg` config option, and duplicated name below
   # Change this name to name of package (i.e. this is not general)
   source = ["fhy"]

   [tool.coverage.paths]
   # Informs coverage that these two paths are identical
   # Preventing duplicated coverage reporting (see above)
   source = ["src", "*/.tox/py*/**/site-packages"]

and our tox.ini:

.. code-block:: text

   # tox.ini
   [testenv:coverage]
   description = Report Code Coverage
   skip_install = true  # We no longer need to install the package for reporting
   parallel_show_output = true
   deps = coverage
   depends = py{311, 312}  # Change this as needed
   commands =
      coverage combine
      coverage report

At this point, your whole configuration should accomplish and correctly capture
coding coverage of your project using tox virtual environments, even if you need to
also capture subprocess or multiple processes performed in parallel.


Key Take Away Points
====================

#. Guidance on best practices to setup your Python project is currently `evolving
   <https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/>`_
   away from the previously recommended flat layout. Use a `src` directory layout when
   setting up your (next) Python project to make certain it is not importable during
   testing. This src layout project structure is most compatible with tox to ensure all
   testing is performed on the source code installed within the tox virtual environment.

#. Use `coverage.py` directly when monitoring code coverage forgoing the pytest plugin,
   pytest-cov, to separate unit testing and coverage reporting.

#. Update your pyproject.toml if using setuptools to correctly find your Python package,
   and tool.coverage{.paths} sections to correctly capture code coverage. Showing the
   relevant sections in completion below:

   .. code-block:: toml

      # setuptools project packaging
      [tool.setuptools]
      package-dir = {"" = "src"}

      [tool.setuptools.packages.find]
      where = ["src"]

      [project.optional-dependencies]
      test = ["pytest", "coverage", "pytest-xdist"]

      # pytest config
      [tool.pytest.ini_options]
      testpaths = ["tests"]
      addopts = "-n auto -rA"  # only use `-n auto` when using pytest-xdist plugin too.

      # Coverage tool config
      [tool.coverage.run]
      parallel = true
      branch = true
      source = ["fhy"]

      [tool.coverage.paths]
      source = ["src", "*/.tox/py*/**/site-packages"]

      [tool.coverage.report]
      fail_under = 85.0  # Decide what coverage is right for your project
      precision = 1
      show_missing = true
      skip_empty = true

#. Update your tox.ini, shown below in completion:

   .. code-block:: text

      [testenv]
      description = Run Unit Tests
      extras = test
      set_env =
         COVERAGE_PROCESS_START={toxinidir}/pyproject.toml
      commands_pre =
         python -c 'import pathlib; pathlib.Path("{env_site_packages_dir}/cov.pth").write_text("import coverage; coverage.process_startup()")'
      commands =
         coverage run -m pytest {posargs}

      [testenv:coverage]
      description = Report Code Coverage
      skip_install = true  # We no longer need to install the package for reporting
      parallel_show_output = true
      deps = coverage
      depends = py{311, 312}  # Change this as needed
      commands =
         coverage combine
         coverage report


**********************************
 Implementation of Final Workflow
**********************************

Once we were able to accurately capture coverage, we could finally create a better
realized and scalable CI workflow:

.. figure:: /_static/img/github_workflow_20240523.png
   :alt: GitHub Action Workflow
   :align: center

#. Dynamically generate our parser files using ANTLR, making it possible to extend the
   FhY grammar without issue in our remote repository.

#. Perform "style" checks, confirming code linting, formatting, static typing, and
   documentation build all performed on a single OS and Python version. In a parallel
   environment we matrix Python versions and host OS platforms to perform pytest
   testing and capture coding coverage.

#. Finally, when tests are completed, we can collate and report coverage.

-  **Release Date**: Thursday 23rd May 2024
-  **Last Updated**: Friday 24th May 2024
-  **Post Author(s)**: Jason C Del Rio