First Phase of Dardel Installed

Gert Svensson, PDC

As announced in late February (see “Press Release: New supercomputer coming to KTH!”), Hewlett Packard Enterprise (HPE) was awarded the contract to supply a new Swedish National Infrastructure for Computing (SNIC) system at PDC. The new system, which is called Dardel, is an HPE Cray pre-exascale supercomputer that is being installed in two phases: the first phase of the system is based on central processing units (CPUs), and the second phase will be based on graphics processing units (GPUs).

Current Status

Installation of the first (CPU-based) phase of PDC’s new Dardel flagship system started on the 2nd of August 2021. As with previous Cray installations at PDC, the installation of Dardel was well planned, and the system was in place in the PDC computer hall and ready to be started on the 5th of August. The photos at the end of this article show the parts of the system arriving at KTH and then being physically installed in the PDC computer hall.

According to the contract with HPE, this first phase of Dardel should have been delivered and installed at PDC in April this year. However, the installation had to be delayed due to external factors. At the time of writing (late August), the Dardel system is undergoing the first part of the acceptance testing process. When that is completed, PDC staff will spend two weeks integrating the new system into the PDC environment. This will involve tasks like connecting the system to PDC’s network, setting up user authentication, and installing application software. After that, there will be a one-month test period with selected test users trialling the system and trying to push its boundaries. If problems arise, there may need to be additional test periods after the problems are mitigated. Once a successful test period has been completed, all researchers will gradually be moved over from Beskow and Tegner to Dardel. We expect that all PDC users will be moved over during October at the latest, and then it will be possible to decommission Beskow and Tegner.

CPU Nodes

The original contract for the new system specified that phase one of Dardel would contain 518 computational nodes, each with two AMD EPYC 7742 CPUs with 64 cores per CPU. Using options in the initial agreement, PDC has been able to significantly expand the system with additional nodes as follows.

  • An extra partition with 36 nodes for industrial and business collaboration at PDC is being included in phase one.
  • Phase one of Dardel will also have an additional partition consisting of 236 computational nodes to be used in the research collaboration between KTH and the heavy vehicle manufacturer Scania.
  • Furthermore, SNIC has decided to invest in 56 additional nodes for academic research.
  • The Department of Astronomy at Stockholm University (SU) has also invested in 12 nodes.

The Scania partition, the SU partition and the additional SNIC nodes have not been installed at the time of writing. Details of all the nodes in phase one can be found in the table below; the nodes that have been installed to date are indicated.

This table shows the number and types of nodes that will make up the first phase of Dardel. Columns marked with an asterisk (*) contain the nodes that have already been installed.

Number of nodes

Name of nodes   Memory   SNIC initial*   Industry*   Scania   SU Dept. of Astronomy   SNIC additional   Total
Thin            256 GB   488             36          -        -                       -                 524
Large           512 GB   20              -           236      12                      48                316
Huge            1 TB     8               -           -        -                       -                 8
Giant           2 TB     2               -           -        -                       8                 10
Total           -        518             36          236      12                      56                858
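
The node counts in the table can be cross-checked with a short calculation. The sketch below recomputes the row and column totals and estimates the total number of CPU cores in phase one. The core estimate assumes that every node, including the additional ones, has the same dual 64-core configuration as the 518 nodes in the original contract; the article does not state this explicitly. The function and variable names are illustrative only.

```python
# Node counts for phase one of Dardel, copied from the table in this article.
# Each list holds the counts per funding source, in the order given in SOURCES
# (0 where the table shows "-").
SOURCES = ["SNIC initial", "Industry", "Scania",
           "SU Dept. of Astronomy", "SNIC additional"]

NODES = {
    "Thin":  [488, 36, 0, 0, 0],
    "Large": [20, 0, 236, 12, 48],
    "Huge":  [8, 0, 0, 0, 0],
    "Giant": [2, 0, 0, 0, 8],
}

CPUS_PER_NODE = 2   # dual-CPU nodes, as specified in the original contract
CORES_PER_CPU = 64  # 64 cores per CPU


def totals_by_type(nodes):
    """Return the total number of nodes for each node type (row totals)."""
    return {name: sum(counts) for name, counts in nodes.items()}


def totals_by_source(nodes, sources):
    """Return the total number of nodes for each funding source (column totals)."""
    return {
        source: sum(counts[i] for counts in nodes.values())
        for i, source in enumerate(sources)
    }


def total_cores(nodes):
    """Estimate the total core count.

    ASSUMPTION: all nodes share the dual 64-core CPU configuration.
    """
    total_nodes = sum(sum(counts) for counts in nodes.values())
    return total_nodes * CPUS_PER_NODE * CORES_PER_CPU


if __name__ == "__main__":
    print("Nodes per type:  ", totals_by_type(NODES))
    print("Nodes per source:", totals_by_source(NODES, SOURCES))
    print("Estimated cores: ", total_cores(NODES))
```

Under that assumption, the 858 nodes correspond to 858 × 2 × 64 = 109,824 cores.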

GPU Nodes

The second phase of the Dardel system is a small but powerful partition consisting of 56 GPU nodes, each with four future AMD Instinct GPUs. The GPU nodes are expected to be delivered at the end of this year and will increase the system’s total capacity by more than a factor of two.

Updates to the PDC System Environment

In conjunction with the installation of Dardel, some updates to the existing PDC system environment are planned. For example, there will soon be a choice of login methods: either a Kerberos-based approach or a Secure Shell (SSH) login method relying on key pairs. This is discussed in more detail in the article “PDC Portal for Improved Login”.

The PDC system environment has been utilising two systems for storing data files: AFS and Lustre. The AFS system has primarily been used for users’ home directories; small amounts of data could be stored there in the longer term. PDC’s Lustre system is a parallel file storage system that has mainly been used to provide extremely fast access to large amounts of data for running simulations. However, that situation is about to change. Existing home directories in the AFS file system will be replaced by home directories in the Lustre system. Note that the plan is for the AFS file system to still be available from the Dardel file transfer nodes, so researchers will be able to transfer any data they have stored in the AFS file system to Lustre.
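
As an illustration of what such a transfer could look like, here is a minimal sketch that copies a directory tree and checks that every file arrived with the same size. It assumes that both file systems are mounted on the node where the script runs, which is what the plan for the transfer nodes suggests. The function name and the paths in the usage comment are hypothetical placeholders, not actual PDC paths, and this is not an official PDC procedure. AFS access control lists are not carried over by a plain copy like this.

```python
import shutil
from pathlib import Path


def copy_tree(src, dst):
    """Copy the directory tree at src into dst.

    Returns the number of files found at the destination with matching size.
    Raises RuntimeError if a file is missing or has a different size.
    """
    src, dst = Path(src), Path(dst)

    # dirs_exist_ok=True (Python 3.8+) allows the copy to be re-run into an
    # existing destination, for example after an interrupted transfer.
    shutil.copytree(src, dst, dirs_exist_ok=True)

    copied = 0
    for source_file in src.rglob("*"):
        if source_file.is_file():
            target = dst / source_file.relative_to(src)
            # Basic sanity check: the file exists and has the same size.
            if (not target.is_file()
                    or target.stat().st_size != source_file.stat().st_size):
                raise RuntimeError(f"Copy check failed for {target}")
            copied += 1
    return copied


# Example usage (hypothetical placeholder paths):
#   n = copy_tree("/path/to/afs/home", "/path/to/lustre/home/from_afs")
#   print(f"{n} files copied and checked")
```

For large amounts of data, a dedicated transfer tool may be a better choice than a script like this; the sketch only shows the general copy-then-verify pattern.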

Preparing the PDC computer hall and installing Dardel

Photos of the preparatory work in the PDC computer hall earlier this year, along with scenes from the week of 2-5 August 2021 when Dardel was delivered and installed at PDC, can be seen at www.pdc.kth.se/hpc-services/computing-systems/photo-timeline.