
Dardel Update

Gert Svensson, PDC

The first phase of Dardel, the new flagship system of the Swedish National Infrastructure for Computing (SNIC) at PDC, recently passed its final acceptance test with flying colours. At the time of writing, PDC is migrating all the researchers who have been working on Beskow and Tegner over to Dardel, and is also accepting new users who have not previously used PDC’s systems.

Current Status of First Phase

In the final acceptance test, which started at the beginning of October, several selected research groups tested the first phase of Dardel by running real applications on the system for one month. To pass the test, the system had to be available at least 98% of the time during that period, which it was! After the acceptance test, additional system software and applications were installed while the test users continued to work on the system.
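As a rough illustration of what the 98% availability criterion means in practice, the sketch below converts it into a downtime budget for the test period. It assumes a 30-day test month (October actually has 31 days, which would raise the budget slightly); the constant and function names are illustrative only.

```python
# Downtime budget implied by a 98% availability requirement.
# Assumption: the one-month test period is taken as 30 days.
HOURS_PER_DAY = 24
TEST_DAYS = 30
REQUIRED_AVAILABILITY = 0.98


def max_downtime_hours(days: int, availability: float) -> float:
    """Return the largest downtime (in hours) still compatible with the target."""
    total_hours = days * HOURS_PER_DAY
    return total_hours * (1.0 - availability)


downtime = max_downtime_hours(TEST_DAYS, REQUIRED_AVAILABILITY)
print(f"Allowed downtime: {downtime:.1f} hours")  # Allowed downtime: 14.4 hours
```

In other words, under this assumption the system could be unavailable for at most about 14 hours in total over the whole month.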

By the time this article is published, the process of transferring users from Beskow and Tegner should be in full swing. After discussions with the principal investigators for each allocation, researchers will be moved over gradually from Beskow and Tegner to Dardel. Files will be transferred, allocation by allocation, from the old to the new disks.

Status of Second Phase (GPU Partition)

Figure: GPU node architecture

The second phase of Dardel will be a partition of the system featuring GPUs. This partition has been delayed by worldwide shortages and delays in the electronics industry; however, the plan is to install the second phase of Dardel in mid-March 2022. The GPU partition will use AMD Instinct® MI250X GPUs and will have 56 GPU nodes. Each node will contain a special version of a 64-core AMD CPU, known as the 7A53 (Trento), which is only available from Hewlett Packard Enterprise (HPE), plus four MI250X GPUs connected by AMD’s Infinity Fabric®, as shown in the figure above. Each GPU node will have 512 GB of fast, shared HBM2E memory. The memory is cache-coherent, which will simplify programming. In principle, each MI250X consists of two separate GPU devices (similar to the K80s in Tegner). The performance of the MI250X is impressive: up to 95.7 TFLOPS in double precision when using special matrix operations, which gives the GPU partition a theoretical peak performance of over 21 PFLOPS. HPE guarantees a High-Performance Linpack (HPL) performance of at least 8.2 PFLOPS for the entire GPU partition!
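The headline figures for the GPU partition can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted above (56 nodes, four MI250X GPUs per node, two GPU devices per MI250X, 95.7 TFLOPS per MI250X and the 8.2 PFLOPS HPL guarantee); the variable names are illustrative, and the result is a theoretical peak, not a measured value.

```python
# Back-of-the-envelope figures for the Dardel GPU partition,
# using only the numbers quoted in the text.
NODES = 56
GPUS_PER_NODE = 4            # MI250X GPUs per node
DEVICES_PER_GPU = 2          # each MI250X consists of two GPU devices
FP64_MATRIX_TFLOPS = 95.7    # peak FP64 per MI250X with matrix operations
HPL_GUARANTEE_PFLOPS = 8.2   # HPE's guaranteed HPL result

# Theoretical peak: nodes x GPUs x per-GPU peak, converted from TFLOPS to PFLOPS.
peak_pflops = NODES * GPUS_PER_NODE * FP64_MATRIX_TFLOPS / 1000

# Number of separate GPU devices across the whole partition.
devices = NODES * GPUS_PER_NODE * DEVICES_PER_GPU

# Share of the theoretical peak that the HPL guarantee represents.
hpl_fraction = HPL_GUARANTEE_PFLOPS / peak_pflops

print(f"Theoretical peak: {peak_pflops:.1f} PFLOPS")          # 21.4 PFLOPS
print(f"GPU devices: {devices}")                               # 448
print(f"Guaranteed HPL as share of peak: {hpl_fraction:.0%}")  # 38%
```

The 21.4 PFLOPS result matches the "over 21 PFLOPS" quoted above, and the guaranteed HPL figure corresponds to roughly 38% of that theoretical peak.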

The table shows the number and types of nodes that will be in the Dardel system once both phases and all the currently planned extensions are installed. The bold figures indicate the nodes that have already been installed.
Node type | Memory | CPU nodes: SNIC initial | CPU nodes: Industry | CPU nodes: Scania | CPU nodes: SU Astronomy | CPU nodes: SNIC extra | CPU nodes: Total | GPU nodes
Thin | 256 GB | 488 | 36 | 0 | 0 | 0 | 524 | 0
Large | 512 GB | 20 | 0 | 236 | 12 | 48 | 316 | 56
Huge | 1 TB | 8 | 0 | 0 | 0 | 0 | 8 | 0
Giant | 2 TB | 2 | 0 | 0 | 0 | 8 | 10 | 0
TOTAL | - | 518 | 36 | 236 | 12 | 56 | 858 | 56
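To make the row and column totals in the node table easy to verify, the CPU-node counts can be transcribed into a small data structure and summed. The sketch below is a hypothetical transcription (names chosen for illustration only); it checks that the per-row totals, the per-column totals and the grand total of 858 CPU nodes agree with the figures in the table.

```python
# CPU-node counts transcribed from the node table.
# Column order follows the table: SNIC initial, Industry, Scania, SU Astronomy, SNIC extra.
CPU_NODES = {
    "Thin (256 GB)": [488, 36, 0, 0, 0],
    "Large (512 GB)": [20, 0, 236, 12, 48],
    "Huge (1 TB)": [8, 0, 0, 0, 0],
    "Giant (2 TB)": [2, 0, 0, 0, 8],
}

# Totals as printed in the table.
STATED_ROW_TOTALS = {
    "Thin (256 GB)": 524,
    "Large (512 GB)": 316,
    "Huge (1 TB)": 8,
    "Giant (2 TB)": 10,
}
STATED_COLUMN_TOTALS = [518, 36, 236, 12, 56]
STATED_GRAND_TOTAL = 858

# Sum across each row, down each column (zip transposes the rows), and overall.
row_totals = {kind: sum(counts) for kind, counts in CPU_NODES.items()}
column_totals = [sum(column) for column in zip(*CPU_NODES.values())]
grand_total = sum(row_totals.values())

# Consistency checks against the printed totals.
assert row_totals == STATED_ROW_TOTALS
assert column_totals == STATED_COLUMN_TOTALS
assert grand_total == STATED_GRAND_TOTAL
print(f"{grand_total} CPU nodes in total")  # 858 CPU nodes in total
```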