Severe problems after Dardel update

System can be used while HPE works on fixes but bugs might strike at any time

Published Feb 28, 2024

Dardel was updated to a newer software stack, which led to severe Lustre problems

Dardel was switched to a new software stack (called Raspberry) on Wednesday the 31st of January 2024. The new stack caused problems with the Lustre file system: because of a bug in the Lustre client, I/O-heavy applications could make the file system extremely slow for some time, or could even crash. All I/O-intensive applications were affected by this bug.

The VASP application was especially affected, because the way it performs I/O triggers the bug. The fix for VASP was relatively straightforward and is now implemented in most VASP versions. Please read the section below for information on how to use the updated VASP applications.

We updated the Lustre server software during the week starting on the 19th of February. This had no effect on the client bug, and it introduced a new, more severe bug that sometimes crashes the Lustre server. It seems that the old Lustre bug triggers the new server-side bug.

HPE is currently working on fixes for both bugs. The system is open for use during this period, but users should be aware that the bugs can strike at any time.

To use the updated Dardel system

  • Log in to the dardel.pdc.kth.se login node in the usual way.
  • PDC has updated some of the application software. Such software can be used by loading the module PDC/23.03:
ml PDC/23.03

  Also, use PDC/23.03 to compile software.

  • To find which module provides software XYZ, run:
module spider XYZ

  The software in the module PDC/22.06 should also work, but it is better to use PDC/23.03 if the package that you want is available there.
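The preference for PDC/23.03 over PDC/22.06 can be sketched as a small shell helper. This is only an illustration, not PDC's recommended method: the function name `load_first_available` and the variable `LOADED_MODULE` are hypothetical, and the sketch assumes that `module load` returns a non-zero exit status when a module cannot be loaded.

```shell
#!/bin/sh
# Hypothetical helper: try module names in order of preference and stop at
# the first one that loads. The loader command is passed as the first
# argument so that the logic can be exercised without a module system.
# ASSUMPTION: the loader returns a non-zero status when loading fails.
load_first_available() {
    _loader=$1
    shift
    LOADED_MODULE=""
    for _candidate in "$@"; do
        if "$_loader" "$_candidate" >/dev/null 2>&1; then
            # Remember which module was actually loaded.
            LOADED_MODULE=$_candidate
            return 0
        fi
    done
    return 1
}

# Run the real thing only where a module command exists (for example on Dardel).
if command -v module >/dev/null 2>&1; then
    if load_first_available "module load" 2>/dev/null; then
        : # not reached: the loader must be a single command name, see below
    fi
    # Small wrapper so that the loader is a single command name.
    pdc_load() { module load "$1"; }
    if load_first_available pdc_load PDC/23.03 PDC/22.06; then
        echo "Using $LOADED_MODULE"
    else
        echo "Neither PDC/23.03 nor PDC/22.06 could be loaded" >&2
    fi
fi
```

The helper sets a variable instead of printing the result, because capturing its output with command substitution would run the load in a subshell and the loaded module would not be visible in the calling shell.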

To use the GPU nodes

  • Log in as above.
  • The default version of the AMD GPU Stack ROCm is now 5.0.2. Version 5.3.3 has also been installed but is not supported by HPE under Raspberry. Advanced users can try 5.3.3 at their own risk.

To use the updated VASP applications

PDC has updated most of the VASP applications so they do the I/O in a better way. Such software is located in the module PDC/23.03 and can be used by the following commands:

ml PDC/23.03            # Load the PDC/23.03 module

module av vasp          # List all (updated) VASP modules in PDC/23.03

ml vasp/6.3.2-vanilla   # Load one of the VASP modules
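In a job script, it can be useful to confirm that the updated VASP module really put an executable on the PATH before starting a run. The following is a minimal sketch under stated assumptions: the helper `require_commands` is hypothetical, and the executable name `vasp_std` is an assumption that should be checked against the contents of the module you load.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named command is on PATH.
require_commands() {
    for _cmd in "$@"; do
        if ! command -v "$_cmd" >/dev/null 2>&1; then
            echo "missing command: $_cmd" >&2
            return 1
        fi
    done
    return 0
}

# Attempt the module commands only where ml exists (for example on Dardel).
if command -v ml >/dev/null 2>&1; then
    ml PDC/23.03
    ml vasp/6.3.2-vanilla
    # ASSUMPTION: vasp_std is the executable provided by this module.
    if require_commands vasp_std; then
        echo "VASP executable found: $(command -v vasp_std)"
    else
        echo "VASP executable not found; check the loaded modules" >&2
    fi
fi
```

Failing early in this way avoids starting a job that would stop only after it has already been waiting in the queue.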