Dardel expansion starting

Published Sep 01, 2022

The Dardel system is about to be significantly updated with:

  • 480 more central processing unit (CPU) nodes with different memory sizes,
  • 56 extremely powerful graphics processing unit (GPU) nodes,
  • a new interconnect (Slingshot 11) going from 100 to 200 Gbps, and
  • a 50% increase in storage capacity

as described in the previous PDC Newsletter (see Dardel Second Phase on the Way).

The plan is for this expansion and upgrade of the system to start in September and continue for the rest of this year. The process will start with an update of the system's control software, which will be done in several phases, each consisting of many steps. A large part of this work can be done while the system is in normal operation. However, some of the steps involved in updating the control software must be performed with the system shut down or with components such as the scheduler restarted. The risk of unforeseen problems during these steps is higher than usual, so unscheduled downtime may occur. This phase of the upgrade may take until October to complete.

The next phase of the upgrade will be to install new GPU and CPU hardware running Slingshot 11 in the Dardel system. PDC had initially planned to do the interconnect update in one large operation while the whole system was shut down. However, Hewlett Packard Enterprise (HPE) has recently developed a method to update the interconnect on "islands" of the system. This means that part of the system can use Slingshot 10 while the rest is using Slingshot 11. Both Slingshot interconnects will be able to work in the Dardel system and communicate with the storage system. However, a single job will not be able to run across both the Slingshot 10 and the Slingshot 11 partitions of Dardel at the same time. This recent development by HPE means that researchers should be able to continue using a significant part of the Dardel system while a small fraction of the system is being updated. Several restarts will also be needed during this period.

The whole upgrade process may, at times, cause some inconvenience to researchers using Dardel during the autumn. However, the resulting system will be extremely powerful and able to serve the research community for many years.