AMD GPU Hackathon for Research Software Developers Hosted by ENCCS in Stockholm

Thor Wikfeldt, ENCCS

From 14-16 March 2023, the EuroCC National Competence Centre Sweden (ENCCS) hosted a hackathon event at the RISE Research Institutes of Sweden offices on the main KTH Royal Institute of Technology campus in Stockholm. The hackathon was aimed at research software developers interested in porting or optimising their code to run on AMD Instinct™ graphics processing units (GPUs) using GPU programming frameworks such as HIP, OpenMP, OpenACC, or SYCL.

The event utilised the Dardel supercomputer, Sweden’s flagship supercomputer operated by PDC at KTH in Stockholm, which is accessible for academic research through NAISS. Participating teams were granted access to the Dardel system ahead of the hackathon.

The event had an impressive turnout, and eight teams were ultimately invited to join. The level of interest was indeed higher than what could be accommodated based on the number of expert mentors, but we hope to continue running GPU programming hackathons in coming years! To apply for the hackathon, teams submitted a detailed description of their project and the potential impact of their code on a specific organisation or the wider community. Prior to the hackathon, the teams that were accepted were expected to engage with recommended learning resources, profile their code, and meet virtually (online) with their assigned mentors.

One of the Hackathon teams working virtually with a mentor

The hackathon kicked off with an online day on the 7th of March, where teams were matched with mentors and treated to introductory seminars on GPU programming by seasoned experts from AMD and HPE. George Markomanolis, a principal member of technical staff at AMD, provided an in-depth walkthrough of compilation aspects and profiling tools like Omniperf, while John Levesque, a senior distinguished technologist at HPE, offered a comprehensive introduction to the HPE programming environment and profiling tools. Johan Hellsvik, an application expert at PDC, also gave an introduction to the PDC environment.

During the primary in-person segment of the hackathon from 14-16 March, participating teams received invaluable guidance from expert mentors representing HPE, AMD, PDC, and ENCCS. Each day commenced with a stand-up session, where teams shared their progress, challenges, and daily goals. Based on these discussions, mentors were assigned to provide appropriate support.

Although each of the teams and their mentors spent a significant portion of the day in separate rooms, concentrating on their projects, a vibrant sense of community was palpable during coffee breaks, on the hackathon chat channel, and especially during social events. In fact, teams and mentors gathered informally at a renowned Stockholm pub on the eve of the hackathon on the 13th of March, and an official dinner took place on the evening of the 14th. Additionally, participants enjoyed an exceptional guided tour of the PDC machine room, led by Luca Manzari and Gert Svensson, the system manager and deputy director of PDC, respectively. The tour offered an in-depth look at the complexities of liquid cooling, backup power, and fire safety systems, along with intriguing stories and historical titbits about PDC.

Some Hackathon participants with the Dardel system at PDC during the tour of the PDC computer hall

Overall, the hackathon proved to be a valuable opportunity for research software developers to enhance their GPU programming skills and collaborate with expert mentors in achieving their goals. The event was a resounding success, and we eagerly anticipate hosting similar events in the future! For a more detailed account of the hackathon experience from the point of view of the project teams, please read on for stories from the UppASD and IFS teams!

UppASD

UppASD is a program for simulation of atomistic spin dynamics and spin-lattice dynamics. The program is written in Fortran 2003 with shared memory parallelisation over CPU cores by means of OpenMP. Team UppASD entered the hackathon with the expectation of getting expert advice on which programming model to work with for GPU offloading of compute intensive kernels, and getting started with the implementation. Being agnostic at first on the choice of HIP, OpenSYCL, and OpenMP, the team settled for working with OpenMP. The first step was to obtain a better understanding of the performance of the current OpenMP parallelised CPU code. Then work ensued to implement offloading directives for one of the dominant terms of the Hamiltonian, namely bilinear spin-exchange, as well as for the semi-implicit midpoint integrator. The good progress made during the hackathon forms a platform for porting all the main parts of UppASD to GPU code.

Integrated Forecasting System (IFS)

Two teams from the European Centre for Medium-Range Weather Forecasts (ECMWF) offices in Bonn and Reading travelled to Stockholm. In their metaphorical suitcases were two components of the weather forecasting model IFS, which is currently being deployed and optimised on the LUMI system for the first Digital Twins of the European Commission’s Destination Earth initiative.

The objective of the first team was to make the existing initial offload implementation of the spectral transformation library, a key component and one of the most computationally expensive parts of the IFS, run faster on AMD MI250X GPUs. The second team worked on a proxy application for the physical parameterisations, with the goal of developing an optimal GPU adaptation recipe. For that, they could draw on a wide range of already available programming model implementations of the same algorithm, trying to make as many as possible work as fast as the reference results on NVIDIA A100 GPUs.

IFS is written mostly in Fortran, and early on, teething problems of the relatively juvenile software stack for AMD GPUs were encountered. But the Cray compiler could ultimately be convinced to generate working offload binaries for both applications. With the help of the fabulous mentors, profiling efforts were soon successful, and first optimisation approaches were identified. After three days, the spectral transforms team could enter a fourfold speed-up via a mixture of targeted optimisations. The physical parameterisations application managed to achieve on-par performance with NVIDIA A100 GPUs for a HIP implementation of the algorithm but failed to achieve improvements on other programming model implementations, in particular pragma-based approaches.

Nevertheless, it was a positive outcome for all, returning home with suitcases filled with faster code, a lot of newly acquired knowledge, memories of positive interactions with mentors and other participants, and an even longer list of ideas and tasks they want to tackle next.