Events:

2006-12-29 at 14:00 [xxx (lenngren)]
To allow maintenance on the Infiniband interconnect of the Lenngren cluster, a service window has been scheduled for that system on Friday 2006-12-29 starting at 14:00. The work is expected to take 3 to 4 hours.
2006-12-14 at 15:54
E-mail: an as yet unknown set of users affiliated with mimer.kth.se (in any way: former students, guests, researchers, ...) may be affected by an outage in e-mail forwarding and reception. The outage is still ongoing and has lasted for the past 24 hours. We are sorry for any inconvenience.
2006-10-18 at 10:00 [xxx (HSM)]
The HSM machine "{hsm,pynchon}.pdc.kth.se" will get updated system software. Operations to/from HSM (hget/hput) will not be available. Start 10:00 (local time), planned duration 6 hours, probably shorter.
2006-10-05 at 09:57 [xxx (lenngren)]
A handful of nodes have deadlocks in the scali-licensing scheme, causing sca-mpi jobs to hang.
2006-09-21 at 18:01 [xxx (SBC / CBR)]
The SBC cluster scheduler node has been replaced and the queue should be running again.
2006-09-21 at 16:00 [xxx (SBC / CBR)]
The scheduling node in the SBC cluster was found to have been hung for a couple of hours. Jobs submitted during this time are not lost but will not appear in the queue until the scheduling node is back up again.
2006-09-27 at 10:00 [xxx (HSM)]
The HSM machine "{hsm,pynchon}.pdc.kth.se" will get fixed software, which requires a reboot, and while we are at it the machine will be moved to a more convenient location. Operations to/from HSM (hget/hput) will not be available. Start 10:00 (local time), planned duration 6 hours, probably shorter.
2006-09-18 at 10:46
Redistributed information for CSC-homed users: Maintenance work on many of the servers at CSC will be performed on Saturday Sep 23 starting at 10 am. Most UNIX computers at Nada will be heavily affected during this time. Services like e-mail and www will also be affected.
2006-08-27 at 10:00
Informational, for users with bits at CSC: Maintenance work on many of the servers at CSC will be performed on Sunday Aug 27 starting at 10 am. Most UNIX computers at Nada will be heavily affected during this time. Services like e-mail and www will also be affected.
2006-08-21 at 11:52
Clarification: we have not lost any incoming emails; they have only been queued until our mailserver can accept them. There is thus no need to resend any mails.
2006-08-21 at 11:42
PDC suffered a mail outage this morning, i.e. no mails were received. Service should now be back to normal.
2006-08-11 at 11:27 [xxx (lenngren)]
The PDC Summer School will be held August 14-25 and will have higher priority on Lenngren during daytime weekdays. This may also occur on Lucidor.
2006-08-07 at 18:18 [xxx (SBC / CBR)]
AFS server valine salvaging. Will probably serve files again in ~60 minutes.
2006-08-07 at 15:31 [xxx (SBC / CBR)]
AFS server valine hang. Cause unknown. Work in progress.
2006-08-07 at 17:28
The afs-server valine.pdc.kth.se is currently being restarted.
2006-07-10 at 18:04 [xxx (lucidor)]
The Lucidor login node blumino.pdc.kth.se has been restarted to fix an AFS client crash. No jobs should be affected by this.
2006-06-22 at 18:14
Tomorrow, 2006-06-23, is Midsummer Day. The public-facing parts of PDC will be closed for the holiday.
2006-06-20 at 09:46 [xxx (lenngren)]
A staff error removed parts of running jobs.
2006-06-02 at 11:24 [xxx (lucidor)]
The Lucidor login node will be rebooted at approx. 3.30 PM to fix an AFS cache problem.
2006-06-02 at 11:24
The PDC helpdesk will be closed during Jun 5-6 (National Day) and reopens on Wednesday morning.
2006-05-24 at 11:22
The PDC helpdesk will be closed during May 25-26 (Ascension Day) and reopens on Monday morning.
2006-04-12 at 13:23
On Maundy Thursday, the PDC helpdesk closes for Easter at noon. It will be closed on both Good Friday and Easter Monday and reopens on Tuesday morning.
2006-04-06 at 15:15 [xxx (lenngren)]
The login node (lise/d14n36.pdc.kth.se) is being restarted due to excessive user activity. System processes are at risk of being damaged.
2006-04-06 at 13:00
All systems have resumed operation. Crashed jobs will not be charged to accounts, where applicable. Please report anomalies.
2006-04-06 at 09:29
We killed all batch systems at ~07:00 this morning. We will gradually restart systems during the day.
2006-04-06 at 09:25
Broken router hardware (~05:45 this morning) caused all our networks to fail. Almost certainly all running jobs were affected.
2006-03-23 at 15:45 [xxx (HSM)]
The HSM machine was rebooted due to resource deprivation.
2006-03-09 at 10:48 [xxx (lucidor)]
One rack (16 nodes) is down due to an Ethernet switch failure.
2006-03-06 at 12:08
AFS: during the past weekend one AFS server experienced problems twice: on 2006-03-04 shortly after 02:00 and on 2006-03-05 shortly after 07:00. Please report failures.
2006-03-03 at 12:05 [xxx (lenngren)]
Fixed the incorrect date in the previous message.
2006-03-03 at 12:00 [xxx (lenngren)]
The Scali (MPI) license daemon clients segfaulted (failed) at random, probably causing massive failures of MPI jobs.
2006-03-02 at 16:04
AFS file server sculpin was out from Thu Mar 2 15:36:06 2006 to Thu Mar 2 15:49:54 2006. Sorry, we are working on it. Typical error message during that period: "Connection timed out".
2006-03-01 at 15:09
AFS on server sculpin was down between Wed Mar 1 13:55:28 2006 and Wed Mar 1 14:09:03 2006. The probable cause is an AFS fileserver program problem. If your job crashed or produced strange results during this period, that is most probably the cause.
2006-02-27 at 17:40 [xxx (SBC / CBR)]
The SBC cluster queue has been unpaused after replacement of a broken disk on the scheduler node. No jobs should have been affected by the disk failure.
2006-02-27 at 15:30 [xxx (SBC / CBR)]
The SBC cluster queue is temporarily paused due to hardware failure.
2006-02-19 at 10:00
If your $HOME or another AFS volume that you are working on is in the nada.kth.se AFS cell, you may experience problems during the day according to the Staff at NADA:

Maintenance work on many of the servers at Nada will be performed on Feb 19 starting at 10:00 am.

Most UNIX computers at Nada will be heavily affected during this time. Services like e-mail and WWW will also be affected.

PDC-only users will not be affected. NADA users: plan your batch runs accordingly.

2006-02-07 at 21:01 [xxx (lenngren)]
Interactive nodes are disabled until we have devised better methods against misuse of resources.
2006-02-07 at 21:01 [xxx (lenngren)]
lise.pdc.kth.se (login node): emergency reboot due to excessive use of resources.
2006-01-30 at 13:00
Network problems in spite of (because of?) the newly replaced Extreme BD router management card. After diagnostics were run at 14:30, the router started to behave again and did not show any errors. A case has been opened with the router vendor.
2006-01-25 at 01:52 [xxx (lenngren)]
mpi/lenngren/lise/juliana/quantum chemistry: a large set of node license daemons deadlocked while obtaining licenses for MPI jobs. These daemons are being restarted.
2006-01-23 at 16:19
The PDC network was unreliable this afternoon (due to DNS/router failures). It is now stable again, but we caution our users that relapses may occur.
2006-01-16 at 16:00 [xxx (lenngren)]
A node encountered a file system bug and needs a reboot.
2006-01-04 at 16:27
Informational: echo for Nada/KTH CSC users: The fileserver gre.nada.kth.se will be taken off-line for preventive maintenance between 17:00 and 19:00 today, 2006-01-04.