Events:
Until the issue is resolved or an update is available, we ask all users to refrain from any actions other than submitting and checking jobs and editing plain files.
Please avoid opening new ssh sessions in rapid succession, initiating massive file transfers, or starting I/O-intensive, multi-CPU/multi-task pre- or post-processing of very large data sets, etc.
Any job that used /cfs/klemming/ between roughly 2024-03-10 20:00 and the restart this morning was likely affected, and may have been completely stuck.
As many compute nodes have been flagged as being in poor shape and are not running any jobs, we will take this opportunity to restart them with a bug fix (CAST-35315) for the Lustre _client_ kernel bug.
The Lustre server-side bug remains.
Newly started jobs will run on compute nodes where the Lustre client kernel bug is fixed.