Events:
- 1999-12-20 at 16:20
- License server : The PDC license server was rebooted after patching. It should not have affected any running programs.
- 1999-12-15 at 16:35
[xxx (strindberg)]
- Strindberg: Switch fault recovered, resuming batch.
- 1999-12-14 at 16:20
[xxx (strindberg)]
- Strindberg: Switch fault.
- 1999-12-14 at 12:20
- Selma: Major failure on the system disk cost us a day,
but no user data was lost. Please let us know if you find otherwise.
- 1999-12-10 at 15:14
[xxx (strindberg)]
- Strindberg: Switch/GPFS back.
- 1999-12-10 at 13:55
[xxx (strindberg)]
- Strindberg: Switch/GPFS problem.
- 1999-11-30 at 15:46
[xxx (strindberg)]
- Strindberg : job allocation restarted. GPFS was gone on
a number of nodes. Now resolved.
- 1999-11-30 at 13:27
[xxx (strindberg)]
- Strindberg : job allocation stopped. Investigating
possible problems.
- 1999-11-26 at 09:00
- Kallsup : The machine will be brought down to change a broken disk at
14.00 on Monday 29/11.
- 1999-11-24 at 13:00
[xxx (strindberg)]
- Strindberg: /gpfs/projects is unavailable during reconfiguration.
- 1999-11-24 at 10:00
- There were intermittent nameserver problems during the night.
- 1999-11-23 at 23:00
- Things should now be back to normal.
- 1999-11-23 at 22:00
- We're having some nameserver trouble.
- 1999-11-10 at 15:00
- The PDC license server will be brought down at 16.00 to change a faulty
memory module.
- 1999-11-10 at 14:00
- All PDC Systems will be unavailable from 1999-11-13 09:00
to 1999-11-14 due to power maintenance work (UPS).
- 1999-11-10 at 13:00
- Kallsup&HSM: HSM is now working normally.
- 1999-11-09 at 11:00
- Kallsup&HSM: HSM unreachable, service in progress
- 1999-11-02 at 01:00
- license server about to be restarted.
- 1999-11-01 at 20:00
- All production resumed.
- 1999-11-01 at 15:00
- Further problems with the UPS caused a major power outage.
All production was stopped.
Problem hopefully (again) circumvented for now.
- 1999-11-01 at 02:00
- All production is resumed.
- 1999-11-01 at 01:00
[xxx (strindberg)]
- All systems are currently up except GPFS on Strindberg/August,
on which we are running file system consistency checks (fsck).
These are expected to take quite some time to complete; allocation
is paused during fsck.
We expect to run further tests on most systems later
today, 1999-11-01, during daytime.
The UPS is bypassed for the time being.
- 1999-10-31 at 20:00
- We hopefully have power now.
Don't expect anything to work for a
while, though. The problem might
be because of a faulty UPS.
- 1999-10-31 at 17:45
- We're currently having a major
power outage. To be continued...
- 1999-10-28 at 15:00
- HSM: currently disabled because of hardware trouble. We're
waiting for a service technician.
- 1999-10-22 at 12:00
- AFS: stability problems with 1 vldb (volume
location database) server.
- 1999-10-18 at 14:00
[xxx (strindberg)]
- Strindberg: Unfence operation causes gpfs-unmount.
- 1999-10-10 at 22:30
- Kallsup&HSM: probable cause - LD cache error caused panic.
- 1999-10-10 at 20:40
- Kallsup&HSM: Kallsup unreachable, fault search to begin.
- 1999-10-05 at 21:00
[xxx (strindberg)]
- Strindberg: During the Wednesday service window tomorrow,
1999-10-06, users might not be able to access any GPFS files.
- 1999-09-21 at 10:00
- ADSM/HSM/DSM: tape robot offline due to maintenance/expansion.
- 1999-09-20 at 09:00
[xxx (strindberg)]
- Strindberg: possible filesystem hang on the login node.
- 1999-09-13 at 17:15
[xxx (strindberg)]
- Strindberg: Switch adapter replaced.
- 1999-09-09 at 20:00
[xxx (strindberg)]
- Strindberg: global switch fault.
- 1999-09-09 at 19:30
- Sprat : Back online after the OS upgrade.
- 1999-09-09 at 15:00
[xxx (strindberg)]
- Strindberg: switch adapter fault probably caused
problems for a few running jobs.
- 1999-09-09 at 09:00
- Sprat : The machine will be brought off-line for an OS upgrade
during the day. It should be back up by 12.00 on Friday (10/9).
- 1999-09-07 at 14:00
[xxx (strindberg)]
- Strindberg: switch adapter fault probably caused
problems for a few running jobs.
- 1999-09-02 at 11:20
- Selma: Selma is having trouble.
- 1999-08-25 at 12:30
- Mail: the PDC mail server has disk problems.
- 1999-08-23 at 11:04
[xxx (strindberg)]
- Strindberg: Recovery completed.
- 1999-08-23 at 08:30
[xxx (strindberg)]
- Strindberg: Job scheduling paused. Control Workstation / switch problems.
Recovery in progress.
- 1999-08-18 at 11:00
[xxx (strindberg)]
- Strindberg: Job scheduling resumed.
- 1999-08-18 at 08:50
[xxx (strindberg)]
- Strindberg: Job scheduling stopped due to unstable fileserver.
- 1999-08-17 at 18:10
[xxx (strindberg)]
- Strindberg: Job scheduling resumed.
- 1999-08-17 at 16:20
[xxx (strindberg)]
- Strindberg: Job scheduling stopped due to fileserver
fault. Some users and programs (g98) affected.
- 1999-08-10 at 10:00
[xxx (strindberg)]
- Strindberg: we are about to resume batch runs shortly.
- 1999-08-10 at 09:00
[xxx (strindberg)]
- Strindberg: global switch fault. All running jobs affected.
- 1999-08-03 at 17:00
[xxx (strindberg)]
- Strindberg: The login node strindberg will be rebooted
before 18:00, 1999-08-03.
- 1999-08-03 at 15:00
[xxx (strindberg)]
- Strindberg: Login node strindberg is stuck; fault search
in progress. Please use the login node august if you are
not CPU dependent.
- 1999-07-03 at 23:00
[xxx (strindberg)]
- Strindberg: Problems with the LoadLeveler caused the scheduling
system to be down most of the night.
- 1999-07-02 at 10:08
[xxx (strindberg)]
- Strindberg: Gaussian98 defaults changed through creating a
Default.Route file of -M- 140MB and -#- MaxDisk=2048MB
- 1999-06-29 at 15:40
- General: There seems to be a routing problem at KTH that
may cause users to have problems locating DNS servers and/or
hosts in the KTH domain. The problem is outside the control
of PDC but is being worked on. If you have experienced problems, they
are hopefully resolved by now.
- 1999-06-17 at 09:30
[xxx (strindberg)]
- Strindberg: unfence operation might have caused
switch related problems for running jobs.
- 1999-06-08 at 09:50
- Kallsup&HSM: Kallsup is back online. However, due to security problems,
rxtelnet does not work for now. Regular kerberized telnet should
work fine, though.
- 1999-06-08 at 09:20
- Kallsup&HSM: Machine hung. Dump and reboot in progress.
- 1999-06-04 at 12:00
- Hardware maintenance on Boye
- 1999-06-02 at 12:30
- License server problems resolved.
- 1999-06-02 at 12:00
- Hardware problems with license server.
- 1999-05-29 at 10:00
- Kallsup: full usr/spool filesystem might
have caused problems for some nqs jobs.
- 1999-05-27 at 16:00
[xxx (strindberg)]
- Strindberg: Login node problems. The login node will be available
again shortly.
- 1999-05-21 at 10:00
- Boye: Hardware problems, some parts will be replaced during the day.
- 1999-05-17 at 12:00
- GPFS: rebalancing the filesystem. The rebalance
will reduce filesystem performance during the
next two hours.
- 1999-05-10 at 14:30
- Mail: maintenance service on one PDC mail-server.
- 1999-05-05 at 22:30
[xxx (strindberg)]
- Strindberg: running with latest software.
- 1999-05-05 at 13:30
[xxx (strindberg)]
- Strindberg: software (PTFs) upgrade started.
- 1999-04-21 at 19:25
- Kallsup&HSM: Reboot due to problems with the tape subsystem.
- 1999-04-20 at 09:45
[xxx (strindberg)]
- Strindberg: systems are back up.
- 1999-04-20 at 08:30
[xxx (strindberg)]
- Strindberg: several systems down.
- 1999-04-07 at 21:20
[xxx (strindberg)]
- Strindberg: the login node `strindberg' is repeatedly panicking.
The strindberg login has once again been moved back to the G-node.
It might take a while to move all submitted jobs; until then,
node allocation is paused.
- 1999-04-06 at 22:30
[xxx (strindberg)]
- Strindberg: node allocation resumed.
- 1999-03-28 at 12:00
- General preventive: If you have
switched to daylight saving time and have
problems authenticating, please see the information
about proper time in `guided tours'.
- 1999-03-25 at 17:00
[xxx (strindberg)]
- Strindberg/gpfs: the gpfs-filesystem will be restarted.
- 1999-03-24 at 13:30
[xxx (strindberg)]
- Strindberg: today's hardware maintenance is complete.
- 1999-03-24 at 10:00
- Kallsup: Kallsup will be brought down for hardware maintenance
08.00 on the 24/3.
- 1999-03-23 at 17:00
[xxx (strindberg)]
- Strindberg: hardware maintenance on the SMP log-in (august),
the batch-system node, and several batch-nodes. Starting
1999-03-24 at 10:00.
- 1999-03-22 at 13:00
[xxx (strindberg)]
- All systems/AFS: Problems with one AFS server have made
some directories invisible to some file system clients.
The problem should be fixed around 18:00 today. Try to
use another computer (example: strindberg) to access your
files during that period.
- 1999-03-22 at 12:00
[xxx (strindberg)]
- Strindberg/gpfs: gpfs salvage complete. All files, except
those written with input/output error status during adapter
fault, should be correct.
- 1999-03-20 at 22:30
[xxx (strindberg)]
- Strindberg/gpfs: one broken disk/adapter has probably caused
loss of files created prior to late Friday night, 1999-03-19, or
possibly during Saturday, 1999-03-20.
- 1999-03-19 at 00:00
[xxx (strindberg)]
- Strindberg: batch lines let loose until next
maintenance window, at 17:00 later today.
- 1999-03-18 at 13:00
[xxx (strindberg)]
- Strindberg: gpfs hang on one of the log in nodes. We
have allocated a minimum of one hour maintenance window
each weekday starting at 17:00 hours, for the time being.
- 1999-03-17 at 15:30
[xxx (strindberg)]
- Strindberg: gpfs hang on the log in node. The log in
node will be moved. Please relogin to strindberg.pdc.kth.se
after the reboot.
- 1999-03-17 at 10:00
- Selma: OS upgrade in progress. See news page.
- 1999-03-16 at 21:00
[xxx (strindberg)]
- Strindberg: reformat of /gpfs/scratch filesystem is imminent.
- 1999-03-16 at 15:30
[xxx (strindberg)]
- Strindberg: has been running since approximately 14:00.
- 1999-03-16 at 13:30
[xxx (strindberg)]
- Strindberg: You can log in, but we
are applying a few last-minute changes,
so batch is currently on hold.
See "Getting Restarted after the Upgrade".
- 1999-03-13 at 17:30
- Kallsup: Back up after dump/reboot.
- 1999-03-13 at 16:30
- Kallsup: Uncorrectable error in kernel; dump,
analysis and restart in progress.
- 1999-03-11 at 15:00
[xxx (strindberg)]
- Strindberg: We estimate that the upgrade will
be complete and Strindberg back in production
Tuesday, 1999-03-16 at 13:00. As of now only
test-users have permission to run.
- 1999-03-10 at 21:00
[xxx (strindberg)]
- Strindberg: Further information about upgrade
process will follow tomorrow, 1999-03-11.
- 1999-03-10 at 14:00
- Kallsup&HSM : The machine is now open for users again.
- 1999-03-06 at 18:00
- Kallsup&HSM : Due to the delayed delivery of the replacement for the
failed disk drive, Kallsup and the HSM will not be back up
until Tuesday 9/3 at the earliest.
- 1999-03-06 at 09:00
[xxx (strindberg)]
- Strindberg: system software and hardware upgrade in progress.
Please monitor this news page for further information as the
upgrade proceeds.
- 1999-03-05 at 09:00
- Kallsup&HSM : The current status of Kallsup and the HSM is that we
have had one disk-failure and problems installing a new version
of DMF. We are currently awaiting delivery of a replacement disk
and a new copy of DMF from Cray.
- 1999-03-01 at 09:00
- Kallsup&HSM : Kallsup is undergoing an OS upgrade. Both Kallsup and
the HSM will be unavailable during this period and will be back up
Monday 8/3.
- 1999-02-24 at 12:00
[xxx (strindberg)]
- Strindberg: Switch problems. Scheduler stopped. After waiting for
running jobs to finish, the switch will be restarted at approx. 16:00
and scheduling resumed.
- 1999-02-18 at 22:00
- Kallsup&HSM: Hung, reset and restarted. Cause yet unknown.
- 1999-02-18 at 20:48
- Kallsup&HSM: Problems, investigation in progress.
- 1999-02-08 at 22:55
[xxx (strindberg)]
- Fileserver back on-line and
EASY allocation on strindberg resumed.
- 1999-02-08 at 22:02
[xxx (strindberg)]
- EASY on strindberg: allocation of new jobs paused.
- 1999-02-08 at 21:52
- AFS: one fileserver is not responding.
Investigation in progress.
- 1999-02-06 at 14:12
[xxx (strindberg)]
- Strindberg: Allocation resumed.
- 1999-02-06 at 13:00
[xxx (strindberg)]
- Strindberg: PIOFS down due to checkstop (hardware fault) in
one server node. Running jobs using piofs affected.
- 1999-02-06 at 12:30
[xxx (strindberg)]
- Strindberg: PIOFS problems, the queues will be stopped.
- 1999-02-05 at 23:00
- Kallsup/HSM: Recovery of the failed disks took a long time but should be resolved now.
The machine is up and the queues have been restarted.
- 1999-02-05 at 12:00
- Kallsup/HSM: The change of power supply has been delayed a few hours.
- 1999-02-04 at 15:00
- Kallsup/HSM: The power supply will be changed tomorrow morning. Hopefully the machine will be back up by 12:00 am.
- 1999-02-04 at 14:00
- Kallsup/HSM: Disk failures probably caused by a broken power supply.
- 1999-02-04 at 09:00
- Kallsup/HSM: Kallsup shut down while investigating disk failures.
- 1999-01-27 at 11:00
- Network: One broken router card caused network dropout.
We now route affected networks through a backup router.
- 1999-01-24 at 19:30
[xxx (strindberg)]
- Strindberg: Job allocation resumed.
- 1999-01-24 at 12:00
[xxx (strindberg)]
- Strindberg: SDR and job-manager out of sync. Job
allocation held until all running jobs have completed.
- 1999-01-15 at 10:00
- Network: there was a network dropout a few minutes ago.
- 1999-01-05 at 15:30
- Licenses: one license server broken. You might experience
problems executing certain licensed software.