Introduction

GPU Acceleration and CUDA

GPU acceleration is the use of Graphics Processing Units (GPUs) for general-purpose computation beyond graphics rendering. While CPUs are optimized for sequential tasks, GPUs contain thousands of smaller cores that can execute many operations in parallel. This parallelism is essential for high-performance computing, scientific research and machine learning. Because NVIDIA’s CUDA platform is proprietary, it could not be included in Debian’s main archive, as it does not comply with the Debian Free Software Guidelines (DFSG). This has limited the ability of Debian and Debian-based distributions to provide out-of-the-box GPU compute support, leaving users to rely on non-free sources and vendor-specific installations.

What is ROCm?

ROCm is AMD’s open-source platform for GPU computing. Its permissive licensing makes it possible to integrate it fully into Debian while also maintaining transparency. Although Debian already ships some of ROCm’s core components, GPU acceleration remained underutilized in many high-impact packages. This project proposed a systematic effort to enhance Debian’s support for AMD GPUs by leveraging ROCm. The overall aim was to:

  • Enable ROCm GPU acceleration in packages that already have ROCm support upstream.
  • Package new ROCm-compatible tools and libraries.
  • Integrate autopkgtests to ensure continuous validation on the Debian ROCm CI.

Project Milestones

The following are the milestones achieved.

New Tools and Libraries for the ROCm Ecosystem

A significant portion of the project timeline was dedicated to packaging new libraries and tools from the ROCm stack, particularly those for computer vision. These include:

ROCm Performance Primitives

A high-performance GPU-accelerated library for image processing.

Status:

  • Default backend - HIP
  • Uploaded to experimental (in the NEW queue)

Tests are currently broken because the NIFTI::znz target is misconfigured. In addition, the entire test suite had to be excluded because the authorship of the test data and examples could not be determined.

source

MIVisionX

A comprehensive computer vision library suite providing an open-source implementation of Khronos OpenVX.

Status:

  • Default backend - HIP
  • Tests - functional (autopkgtest running)

source, test results (tested on MI-210)

MIGraphX

A machine learning inference library delivering optimized graph-based computations for fast and efficient GPU execution.

Status:

  • Default backend - HOST
  • Tests - functional (autopkgtest running)

The default backend is set to HOST for now because the ROCm build depends on hipBLASLt. The packaging of hipBLASLt is still in progress.

source, test results (tested on EPYC Milan 7512)

Enhanced GPU Support in Existing Packages

ROCm has been enabled in these packages.

SpFFT

A high-performance library for computing sparse 3D Fast Fourier Transforms. It is commonly used in scientific computing and simulations.

Status:

  • Default backend - HIP, with CPU compatibility
  • Tests - functional (autopkgtest running)

merge request 1, merge request 2, test results (tested on MI-210)

CuPy

A Python library that offers NumPy-compatible GPU-accelerated arrays, enabling high-performance numerical computations and scientific workflows.

Status:

  • Added licenses to ensure DFSG compliance and separated CUDA source.

merge request

Kokkos

A performance portability library that provides abstractions for parallel execution and memory management across CPUs and GPUs. It helps developers write code that runs efficiently across multiple architectures.

Status:

  • Default backend - HIP with CPU as fallback
  • Tests - functional

merge request

This has been the most complex package to work on so far, as it does not support building fat binaries for all architectures at once. To solve this, I used for loops to clean, configure, build and install for each architecture one by one. The resulting per-architecture libraries are managed by the Debian Alternatives System.
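As a rough illustration of the per-architecture loop described above, here is a minimal Python sketch that only generates the command sequence rather than running it. It is not the actual packaging code (which lives in the Debian build rules). The architecture list, the build directory names, the install root and the exact spelling of the Kokkos architecture option are all assumptions made for the example.

```python
import shlex

# Hypothetical set of GPU targets; the real list depends on the package.
GPU_ARCHS = ["gfx90a", "gfx1030"]


def per_arch_commands(arch, install_root="/usr/lib/kokkos-variants"):
    """Return the clean, configure, build and install commands for one arch.

    install_root is a hypothetical private location, one subdirectory per
    architecture, so the variants do not overwrite each other.
    """
    build_dir = f"build-{arch}"
    return [
        # clean: start from an empty build tree for this architecture
        ["rm", "-rf", build_dir],
        # configure: enable the HIP backend for exactly one architecture
        [
            "cmake", "-S", ".", "-B", build_dir,
            "-DKokkos_ENABLE_HIP=ON",
            # Assumed option naming; check the Kokkos version in use.
            f"-DKokkos_ARCH_AMD_{arch.upper()}=ON",
            f"-DCMAKE_INSTALL_PREFIX={install_root}/{arch}",
        ],
        # build
        ["cmake", "--build", build_dir],
        # install into the per-architecture prefix
        ["cmake", "--install", build_dir],
    ]


def build_plan(archs):
    """Concatenate the four steps for every architecture, one after another."""
    plan = []
    for arch in archs:
        plan.extend(per_arch_commands(arch))
    return plan


if __name__ == "__main__":
    for cmd in build_plan(GPU_ARCHS):
        print(shlex.join(cmd))
```

Keeping one build tree and one install prefix per architecture is what makes it possible to register each result as a separate alternative afterwards.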

Work in Progress

ONNX Runtime

ONNX Runtime relies on legacy code from PyTorch that checks for rocm_version.h. This approach is not suitable for Debian and Debian-based distributions, where each package may have a different ROCm version, so the check is unnecessary. Therefore, a manual patch is required.

Composable Kernel

Composable Kernel failed to build due to compiler issues. Using the latest llvm-20 compiler fixed those errors, but the compiler itself crashes after some time.

What I learned

Debian Continuous Integration (CI)

The Debian Continuous Integration (CI) is an automated system that coordinates the execution of automated tests against all packages in the Debian system. It continuously runs autopkgtest test suites from sources in the Debian archive. Integrating autopkgtest was one of the most valuable tasks, as it ensured correct functionality across all compiled libraries. I learned how to automate the testing process and how to cleanly split the tests into CPU tests and GPU-accelerated tests.

Splitting the tests was necessary because the official Debian CI infrastructure does not have AMD GPUs. GPU tests would fail there, be reported as false regressions, and block packages from migrating to testing. To address this, the GPU-specific tests are skipped in Debian CI, while the full test suite is run only on the ROCm CI, maintained by the Debian ROCm team. This ensures GPU-accelerated functionality is properly validated.

In SpFFT, the GPU tests can be skipped using a GTest filter during the test run. In MIVisionX, the environment variable AGO_DEFAULT_TARGET is set to CPU.
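The two skipping mechanisms can be sketched in Python as follows. This is an illustrative sketch, not the actual autopkgtest scripts: the test binary name and the "*GPU*" test-name pattern are hypothetical, and checking for the /dev/kfd device node is just one simple heuristic for "a ROCm-capable GPU is present".

```python
import os


def gpu_available(kfd_path="/dev/kfd"):
    """Heuristic: the ROCm compute driver exposes the /dev/kfd device node."""
    return os.path.exists(kfd_path)


def gtest_args(test_binary, has_gpu, gpu_pattern="*GPU*"):
    """Build a GoogleTest command line, excluding GPU tests if there is no GPU.

    A leading '-' in --gtest_filter turns the pattern into an exclusion.
    gpu_pattern is a hypothetical naming convention for GPU test cases.
    """
    argv = [test_binary]
    if not has_gpu:
        argv.append(f"--gtest_filter=-{gpu_pattern}")
    return argv


def mivisionx_env(has_gpu, base_env=None):
    """Return an environment that forces the CPU target when no GPU exists."""
    env = dict(os.environ if base_env is None else base_env)
    if not has_gpu:
        env["AGO_DEFAULT_TARGET"] = "CPU"
    return env


if __name__ == "__main__":
    has_gpu = gpu_available()
    # "./run_tests" is a placeholder name for a GoogleTest binary.
    print(gtest_args("./run_tests", has_gpu))
    print(mivisionx_env(has_gpu, base_env={}).get("AGO_DEFAULT_TARGET"))
```

With this shape, the same test entry point can run on the official Debian CI (CPU-only subset) and on the ROCm CI (full suite) without maintaining two separate test definitions.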

Debian Alternatives System

The Debian Alternatives System is used to manage multiple variants of the same package in a clean and user-friendly way. By linking executables, headers, and libraries to the private implementation built for each GPU architecture variant, Kokkos can now run seamlessly on systems that have ROCm-compatible hardware. It also has a CPU fallback in case no supported GPU is detected.
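To make the mechanism concrete, here is a small Python sketch of how per-architecture variants could be registered with update-alternatives. The link path, alternative name, variant paths and priorities are all hypothetical and are not taken from the Kokkos package. The only behaviour relied on is the standard one: `update-alternatives --install <link> <name> <path> <priority>` registers a variant, and in automatic mode the variant with the highest priority is selected.

```python
# Hypothetical variants: name -> (private path, priority).
# The CPU build gets the lowest priority in this example.
VARIANTS = {
    "cpu": ("/usr/lib/kokkos-variants/cpu/libkokkoscore.so", 10),
    "gfx1030": ("/usr/lib/kokkos-variants/gfx1030/libkokkoscore.so", 40),
    "gfx90a": ("/usr/lib/kokkos-variants/gfx90a/libkokkoscore.so", 50),
}

# Hypothetical public link and alternative name.
LINK = "/usr/lib/libkokkoscore.so"
NAME = "libkokkoscore.so"


def alternatives_install_cmd(link, name, path, priority):
    """Build the update-alternatives command registering one variant."""
    return ["update-alternatives", "--install", link, name, path, str(priority)]


def registration_commands(variants, link=LINK, name=NAME):
    """One --install command per variant, as a maintainer script might run."""
    return [
        alternatives_install_cmd(link, name, path, priority)
        for path, priority in variants.values()
    ]


def auto_choice(variants):
    """Mimic automatic mode: the highest-priority variant wins."""
    return max(variants, key=lambda variant: variants[variant][1])


if __name__ == "__main__":
    for cmd in registration_commands(VARIANTS):
        print(" ".join(cmd))
    print("auto mode would select:", auto_choice(VARIANTS))
```

An administrator can still override the automatic choice with `update-alternatives --config <name>`, which is part of what makes this approach user-friendly.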

Licenses and DFSG Compliance

While working on CuPy and RPP, I gained experience reconstructing the source tarball to exclude proprietary code, prebuilt binaries and files with unclear licensing, in order to comply with the Debian Free Software Guidelines (DFSG). Based on maintainers’ feedback and code reviews done by my mentor, Cory Bloor, I brought the packages into compliance with the DFSG.
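The idea of repacking a tarball without the offending files can be sketched in a few lines of Python. This is only an illustration of the concept: in Debian the exclusions are normally declared in the Files-Excluded field of debian/copyright and applied by the standard tooling, and the patterns and file names below are hypothetical rather than the ones used for CuPy or RPP.

```python
import fnmatch
import io
import tarfile

# Hypothetical exclusion patterns, in the spirit of a Files-Excluded list.
EXCLUDE_PATTERNS = ["*/prebuilt/*", "*.so", "*/third_party/proprietary/*"]


def is_excluded(name, patterns):
    """True if the member path matches any of the exclusion patterns."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def repack(src_bytes, patterns):
    """Copy a tarball (given as bytes), dropping every excluded member."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r:*") as src, \
            tarfile.open(fileobj=out, mode="w:gz") as dst:
        for member in src.getmembers():
            if is_excluded(member.name, patterns):
                continue
            # Regular files need their content copied; other entries
            # (directories, symlinks) carry only metadata.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
    return out.getvalue()


def make_tarball(files):
    """Helper for the example: build a small in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


if __name__ == "__main__":
    original = make_tarball({
        "pkg-1.0/src/main.cpp": b"int main() { return 0; }\n",
        "pkg-1.0/prebuilt/blob.bin": b"\x00\x01",
        "pkg-1.0/lib/vendor.so": b"\x7fELF",
    })
    cleaned = repack(original, EXCLUDE_PATTERNS)
    with tarfile.open(fileobj=io.BytesIO(cleaned), mode="r:gz") as tar:
        print(tar.getnames())
```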

Forwarding Debian Patches Upstream

I also learned the importance of minimizing Debian-specific changes and of writing fixes in a way that does not conflict with upstream development. I also filed upstream PRs to fix issues with the import and declaration of half.hpp on Debian and Debian-based distributions.

Debian Packaging Tools

There are a number of essential tools that helped me maintain clean, reproducible, and policy-compliant packages:

  • gbp (git-buildpackage) : to manage packaging branches efficiently, keeping upstream changes and Debian-specific modifications well separated.
  • quilt : to handle Debian-specific patches in a structured and maintainable manner. I learned to create, refresh, and document patches, and even forwarded some of them upstream when appropriate.
  • sbuild/pbuilder : to build packages in clean chroot environments, replicating Debian’s official build setup and ensuring reproducibility.
  • lintian : to automatically detect common issues and ensure compliance with Debian policy.
  • dch : to simplify changelog management and ensure proper version tracking.
  • autopkgtest : to ensure correct functionality across different build environments and CI pipelines.

Bi-weekly Updates

Bi-weekly meetings were initiated for discussions on ROCm. It was inspiring to see others working on even more exciting and complex problems. The team always welcomed new ideas and offered valuable insights which greatly increased engagement and motivation throughout the project.

Acknowledgements

Special thanks to my mentor, Cory Bloor, for his continuous guidance and support throughout the duration of this project.

Working with Spaarsh has also been an amazing experience! I would also like to thank Andrius Merkys, Mo Zhou and everyone at the AMD Debian ROCm meet for their valuable feedback and suggestions during code reviews. It really helped me get started and follow Debian best practices.