Events:
- 2003-09-24 at 09:38
- Christmas holidays are approaching.
The PDC helpdesk will be closed from Dec 22nd, 2003.
It will open again on Jan 7th, 2004.
- 2003-12-09 at 22:35
[xxx (strindberg)]
- Old SP system; restart and verification complete. Further
testing of gpfs/projects will be performed with the filesystem
off-line (unavailable) later this week.
- 2003-12-09 at 14:49
[xxx (strindberg)]
- Old SP system; restarting gpfs on all nodes and also
eventually rebooting the login node.
- 2003-12-08 at 07:35
- Relocating: the telephone routing might experience
hiccups during the relocation of our offices.
- 2003-12-08 at 07:29
[xxx (strindberg)]
- /gpfs/projects on the old SP: one disk is reporting
errors. As the machine is to be retired, the files are
replicated by default, and there are backups, we will
take no action to repair /gpfs/projects.
If you have changed the replication factor, your
files might contain unreadable sections (I/O errors).
- 2003-11-29 at 15:54
- The fileserver should now be back on-line.
- 2003-11-28 at 18:34
- One AFS server has crashed; unknown reason. Salvage will probably take a while.
- 2003-11-27 at 11:49
- All production systems: node allocation is
paused on production systems until the
secondary server move is complete.
- 2003-11-18 at 19:42
- PDC is relocating from November 2003 to February 2004. Please
find further information through the 'Upcoming Events' page.
- 2003-11-08 at 00:36
[xxx (SBC / CBR)]
- The SBC AFS file server alanine.pdc.kth.se has crashed due to faulty hardware. Further investigation will start on Saturday morning. Because of the crash, the SBC cluster queue has been temporarily halted.
- 2003-11-07 at 14:45
[xxx (lucidor)]
- Lucidor: the login node, blumino, will be
rebooted to resolve a paging problem.
- 2003-11-04 at 15:49
- ... but apparently not. Repeat until false.
- 2003-11-04 at 15:00
- One of the AFS fileservers had to be restarted. Everything should be back to normal now.
- 2003-11-06 at 13:00
[xxx (HSM)]
- The HSM system will be reconfigured on Thursday at 13:00.
It should only be down for an hour or so (or so goes the
plan).
- 2003-10-28 at 14:44
[xxx (HSM)]
- The HSM system is back up again after a change of power supply.
- 2003-10-27 at 14:38
- Network maintenance prior to PDC move planned
for tomorrow, 2003-10-28, between 13:00 and 15:00.
No side effects expected.
- 2003-10-24 at 13:50
[xxx (HSM)]
- Due to a server failure, the HSM system will be unavailable
at least until Monday.
- 2003-10-24 at 13:00
[xxx (HSM)]
- Due to a disk failure, the HSM disk system is being replaced.
- 2003-10-19 at 12:52
[xxx (strindberg)]
- The Nighthawk login node crashed earlier today.
It has been recovered.
- 2003-10-15 at 19:30
[xxx (lucidor)]
- Job startup code slightly modified. Please report
any unexpected behaviour to pdc-staff@pdc.kth.se
- 2003-10-13 at 12:38
[xxx (strindberg)]
- Nighthawk: the interactive Nighthawk node is temporarily unavailable.
- 2003-10-12 at 20:01
[xxx (lucidor)]
- Lucidor: reboot of the login node (blumino/h06n05).
- 2003-10-06 at 20:50
[xxx (lucidor)]
- Upgrade complete. Note that you might have to relink
your code in case of gm (myricom) or kernel dependencies.
'module add mpich' gives the proper mpich/gm default path.
Please report any strange behaviour to pdc-staff@pdc.kth.se!
- 2003-10-03 at 17:17
[xxx (lucidor)]
- All nodes will be upgraded to Linux kernel version 2.4.22 on Monday
2003-10-06. We will at the same time upgrade to Myricom gm
2.0.6. You might have to relink your code if you are
using Myrinet.
- 2003-09-24 at 21:36
[xxx (strindberg)]
- The power2 and ppc sections of strindberg are
back on-line.
- 2003-09-24 at 09:38
[xxx (lucidor)]
- System work on Lucidor.
Operator is enforcing an allocation pause.
- 2003-09-24 at 09:38
[xxx (strindberg)]
- The pwr2 part of Strindberg is still down. Investigation is in progress.
- 2003-09-23 at 17:30
[xxx (strindberg)]
- The Power 2 part of Strindberg is currently down. The system probably won't be accessible again until some time tomorrow.
- 2003-09-12 at 10:53
- The SSH server will be brought down today (September 12th) for hardware and software upgrade.
- 2003-08-22 at 11:22
[xxx (SBC / CBR)]
- Updated Intel compilers to 7.1-31 build 20030813
- 2003-08-18 at 15:29
- The fileserver process on alanine crashed 20 minutes ago. Salvaging is in progress, but might take considerable time.
- 2003-08-11 at 17:10
- At 17:10 a short network outage will occur for some systems at PDC in order to upgrade the router software of pdc1-gw to a more recent level. Most services will still be available through pdc2-gw. The planned duration is under 5 minutes.
- 2003-08-04 at 09:47
[xxx (SBC / CBR)]
- The scheduler node is down. Investigation in progress.
- 2003-07-30 at 10:13
- The license server will be rebooted at 14.00 on Thursday 31/7.
During the reboot software licenses will be unavailable.
- 2003-07-30 at 10:03
[xxx (HSM)]
- The HSM server will be rebooted at 14.00 on Thursday 31/7.
Service downtime should only be a few minutes.
- 2003-07-28 at 15:57
[xxx (SBC / CBR)]
- Updated Intel compilers and Intel MKL libraries
- 2003-07-22 at 08:59
[xxx (SBC / CBR)]
- We are experiencing some scheduling problems with nodes
flapping up and down. Investigation in progress.
- 2003-06-30 at 21:47
- We regret to announce that the PDC on-call service is discontinued from 1 July. We no longer guarantee that malfunctioning systems at PDC are repaired during holidays, nights, and weekends. For further information and questions, contact Per Öster, per@pdc.kth.se, +46 8 790 6261. Please let us know about any inconvenience that this decision will cause you as a PDC user.
- 2003-07-09 at 15:00
- At 2003-07-09 15:00 we will restart our AFS servers to upgrade the AFS server software. Queues will be stopped and you may not be able to access your home directory for a short while. Duration: 10 minutes (if successful) to 5 hours (if unsuccessful).
- 2003-06-27 at 10:13
[xxx (SBC / CBR)]
- Allocation is paused due to the AFS server problems.
- 2003-06-27 at 09:34
[xxx (SBC / CBR)]
- The AFS server crashed again about 10 minutes ago. We're working on it.
- 2003-06-26 at 11:57
[xxx (SBC / CBR)]
- Schedule pause tonight because of AFS server problems.
Sorry for the short notice.
/Harald.
- 2003-06-26 at 09:49
[xxx (SBC / CBR)]
- The AFS server crashed again yesterday evening. Investigation and repairs are in progress.
- 2003-06-25 at 13:07
[xxx (SBC / CBR)]
- One AFS server is having problems. Investigation going on.
- 2003-06-24 at 09:29
[xxx (strindberg)]
- /gpfs/scratch on the interactive nighthawk node (nf01n05) is currently unavailable. Investigation is in progress.
- 2003-06-19 at 13:47
- PDC Helpdesk is closed for holiday 2003-06-20 (Midsummer's eve).
We reopen at 08:00, 2003-06-23.
- 2003-06-09 at 15:00
[xxx (strindberg)]
- One node serving gpfs/bins has gone bad, and data residing
on that node will not be available until it is repaired.
- 2003-06-07 at 10:00
[xxx (strindberg)]
- One login node on the old SP system (strindberg) had
its resources overused; jobs connected to the node
were not able to start until it was restarted.
- 2003-06-06 at 17:03
[xxx (strindberg)]
- The broken node has been replaced and all
files should now be available again.
- 2003-06-06 at 16:00
[xxx (strindberg)]
- One node serving gpfs/projects and gpfs/scratch has gone bad and
data on that node is currently not available.
- 2003-05-30 at 10:58
- PDC Helpdesk is closed for holiday from 11:00, 2003-05-30. We reopen at 08:00, 2003-06-02.
- 2003-05-15 at 18:00
- Due to electrical rework starting 2003-05-21 at 21:00 we
will as a precaution put several production systems on hold.
Some filesystems, i.e., gpfs, will also be unmounted during
the rework in order to extend UPS battery lifetime. The rework
itself is supposed to take 12 minutes; resuming all operations
will take longer.
- 2003-05-13 at 13:52
- PDC's main mail server is currently down.
- 2003-04-30 at 12:00
- PDC Helpdesk is closed for May First from 12:00, 2003-04-30. We reopen at 08:00, 2003-05-05.
- 2003-04-27 at 15:56
[xxx (HSM)]
- Due to a hardware failure, the HSM system won't be able to fetch
files that reside on tape. Files that are already on disc will
still be accessible and new files can be added as long as there
is free space on the discs.
- 2003-04-24 at 14:00
[xxx (HSM)]
- The HSM server will receive an OS upgrade and will be down
for two hours.
- 2003-04-16 at 16:51
[xxx (strindberg)]
- Maintenance/rearrangement of the old SP will be performed
over the Easter holidays. The availability of resources
within it will vary.
- 2003-04-17 at 12:00
- PDC helpdesk is closed for Easter Holidays from 12:00, 2003-04-17. We reopen at 08:00, 2003-04-22.
- 2003-04-14 at 12:54
- boye has a power supply failure -> VRcube down.
Service requested from SGI.
- 2002-04-10 at 10:23
[xxx (SBC / CBR)]
- The AFS problem is NOW. Previous event has incorrect date-tag.
- 2002-03-26 at 09:53
[xxx (SBC / CBR)]
- One SBC AFS server is confused, causing some volumes to be
unavailable.
Investigation in progress.
- 2003-04-08 at 14:24
- Informational: Users with home at /afs/nada.kth.se:
The AFS servers at Nada will be upgraded on 9 April
starting at 18:00. Other nada-services will also
be affected. Pure PDC users should not be affected.
- 2003-04-04 at 13:55
[xxx (strindberg)]
- Nighthawk: the default IBM C and Fortran compilers have
been changed to vac 6.0 and xlf 8.1.
- 2003-03-31 at 13:00
[xxx (strindberg)]
- Nighthawk: the node serving parts of /gpfs/scratch is back on-line. The
filesystem should operate normally.
- 2003-03-30 at 15:55
[xxx (strindberg)]
- Nighthawk: one node serving /gpfs/scratch is signaling
a power supply fault. Reduced capacity/availability
of nighthawk:/gpfs/scratch.
- 2003-03-26 at 17:27
[xxx (strindberg)]
- CFD program Fluent 6.1.18 installed
- 2003-03-24 at 16:27
[xxx (SBC / CBR)]
- At 17:00 on 2003-04-03 the SBC cluster login node itchy.pdc.kth.se will be taken down for physical relocation. The downtime is estimated at 30 minutes. Please use krusty.pdc.kth.se during the move.
- 2003-03-24 at 15:25
[xxx (linux lab)]
- NAGWare Fortran Tools installed
- 2003-03-21 at 17:15
[xxx (strindberg)]
- Nighthawk: the interactive nf01n05 has been put into service.
- 2003-03-21 at 16:52
[xxx (strindberg)]
- Nighthawk: the same node dumped again. Note: the affected
filesystem should read /gpfs/scratch!
- 2003-03-21 at 15:16
- Fileserver carp restarted due to excessive load.
- 2003-03-20 at 23:08
[xxx (strindberg)]
- 2003-03-20 at 12:30 [Strindberg]
Nighthawk: one node serving /gpfs/projects once again
dumped, causing temporary unavailability of /gpfs/projects.
Once again resumed. Investigating more thoroughly.
- 2003-03-20 at 12:30
[xxx (strindberg)]
- Nighthawk: one node serving /gpfs/projects dumped, causing
/gpfs/projects to be temporarily unavailable. Now resumed.
- 2003-02-21 at 18:30
[xxx (SBC / CBR)]
- AFS: overcame problems related to testing of the new SBC
AFS servers. Scheduling resumed.
- 2003-02-15 at 13:45
[xxx (strindberg)]
- Old SP; there are problems with the HA subsystem. Node allocation
is paused until the problem is resolved.
- 2003-02-10 at 17:15
[xxx (selma)]
- The disk holding the /scratch partition has been scratched for good - a controller card broke down and will not be repaired.
NQS will be stopped until the system is reconfigured to run
with another (smaller) /scratch.
- 2003-02-10 at 12:30
[xxx (strindberg)]
- Nighthawk; one frame (with K-nodes) has power supply problems.
- 2003-02-04 at 09:34
[xxx (linux lab)]
- Allocation paused. We will do some small network adjustments in the internal (ethernet) network of the cluster during the day.
- 2003-02-04 at 00:06
[xxx (strindberg)]
- Strindberg (old SP system): as there are aftershock problems
with parts of the hardware after the power outage, we will
insert several blanks in the node allocation in the coming days.
Also, please report any problems.
- 2003-02-03 at 18:34
[xxx (strindberg)]
- Switch instabilities on the old SP system; jobs on the
old system are most certainly affected.
- 2003-02-03 at 15:02
[xxx (strindberg)]
- Scheduling resumed on both SP systems.
Please report any bogusities to sp2-staff@pdc.kth.se.
- 2003-02-03 at 12:09
- AFS recovery completed. Still holding queues because we have not verified the batch functionality yet. Rumours at KTH say that a faulty transformer on the KTH grounds was to blame for the outage.
- 2003-02-05 at 00:30
- Major power outage at KTH. Will take some time.
Working on getting up servers / haba.
- 2003-01-31 at 01:24
[xxx (SBC / CBR)]
- A non-responding fileserver process on one
afs-server has been restarted.
- 2003-01-28 at 10:42
- We're having routing problems which cause a lot of
dropped traffic into and out of PDC at present. We
are working on fixing the problem.
- 2003-01-24 at 15:25
[xxx (strindberg)]
- Kerberos 5/Heimdal upgrade on the system. If you get the
message "kauth: unparsable time: -1" when acquiring
long-lasting tickets on the SP, use the command
kauth -l 1y
instead of -l -1.
- 2003-01-24 at 15:25
[xxx (strindberg)]
- MASS libraries (Mathematical Acceleration Subsystem) updated
to version 3.2. This includes both scalar and vector versions.
- 2003-01-15 at 13:19
[xxx (strindberg)]
- /gpfs/projects is getting full. Please remove unneeded files.
Note that the nighthawk system is not affected.
- 2003-01-09 at 13:19
[xxx (strindberg)]
- Strindberg/Nighthawk; nf01n05 (shared/interactive) ran out of paging space and user processes were terminated.