Events:
- 1997-12-23 at 12:00
- PDC: The Strindberg shop will close down from January 15 to 21.
During this period, availability of all PDC systems will vary
from limited to none due to hardware maintenance and electrical
work.
- 1997-12-10 at 12:00
- Strindberg: batch line enabled.
- 1997-12-08 at 21:00
- Strindberg: We will enable batch lines again but reserve the
afternoon of tomorrow, 1997-12-09, and will probably use the
Wednesday service window for more fixes and testing. Note: This
will probably involve several reboots of the login node.
- 1997-12-08 at 18:00
- Strindberg: Batch processing stopped because of software maintenance
and bug fixes in the parallel environment. Strindberg will be accessible
for program development but no batch runs will be possible.
- 1997-12-08 at 14:20
- Strindberg: Hanging NFS file systems. Node allocation temporarily
stopped. Planned return to service: 16:00.
- 1997-12-04 at 10:20
- On December 18th, the KALLSUP system will be brought
down for file system reorganizations in conjunction with the
installation of a new disk system from MAXSTRAT. The estimated down
time for this operation is two days.
- 1997-12-02 at 14:30
- Strindberg: The switch has been restarted and new jobs are now allowed
to start.
- 1997-12-02 at 13:40
- Selma: Disk errors. Checking file system integrity. Up again at
1600 MET.
- 1997-12-02 at 12:30
- Strindberg: A switch fault caused all running jobs to crash.
- 1997-11-27 at 14:20
- Strindberg: Problems connecting to Strindberg. Resolved.
- 1997-11-27 at 14:05
- Strindberg: Problems connecting to Strindberg. Investigation
in progress.
- 1997-11-26 at 16:50
- Networks: We have had unstable local networks during
the past few hours.
- 1997-11-26 at 09:45
- Strindberg: Piofs up and running again. Node allocation
is resumed.
- 1997-11-26 at 08:41
- Strindberg: Piofs had problems starting at around 02:00 this
morning. Because of this, node allocation has stopped. Investigation
is in progress.
- 1997-11-25 at 23:40
- Strindberg: Node allocation is resumed.
- 1997-11-25 at 22:20
- Strindberg: Node allocation is paused. Put simply, one server
has been answering "no" to authentication queries even though
everything was OK.
- 1997-11-24 at 10:37
- Strindberg: You are welcome to log in to the upgraded Strindberg!
- 1997-11-19 at 12:00
- Selma: From 1200 to 1600 Selma will be
unavailable because of regular hardware
maintenance.
- 1997-11-13 at 12:08
- Strindberg: Around 1600 this afternoon we will
turn off several major fileservers for disk
reconfiguration. This means that many users' files will
not be accessible until tomorrow morning.
- 1997-11-12 at 14:43
- Strindberg: Since the upgrade has gone faster than
expected we will now start the most dramatic stage of the
whole procedure: switch replacement. More news to come.
- 1997-11-06 at 10:41
- Strindberg: At 08.00 1997-11-13 a gradual
upgrade of Strindberg will start. We expect to be back
in full production by 17.00 on 1997-11-24. The upgrade
includes a change of hardware and software.
- 1997-11-03 at 09:00
- Kallsup: The machine will be brought down for hardware maintenance. The system is expected to be back online again at 12.00.
- 1997-10-29 at 18:15
- Strindberg: Today a large amount of old stalled
mail from strindberg/easy was found and released. For those of
you who fancy archeology and received some pieces, please take part in the
excavation.
- 1997-10-21 at 04:00
- AFS: one fileserver down; node allocation is paused
until repair is finished.
- 1997-10-20 at 09:00 - 12:00
- Kallsup: Kallsup will be rebooted to reconfigure file systems.
Kallsup will probably be back in service earlier.
- 1997-10-13 at 08:50
- Strindberg: Node allocation file recreated. Node allocation restarted.
- 1997-10-13 at 03:42
- Strindberg: Node allocation file lost. Node allocation stopped.
- 1997-10-02 at 16:45
- Selma: Operating system crash. Dump and reboot in progress.
- 1997-09-26 at 09:00
- Check out our new System Usage page where all
running and queued jobs are listed.
- 1997-09-22 at 09:00
- Strindberg: Unstable node status determination. Allocation paused
during fault search.
- 1997-09-07 at 18:00
- Strindberg: A network adapter problem (HIPPI) caused a hang
of the login node (strindberg.pdc.kth.se). Recovery in progress...
- 1997-09-02 at 19:00
- Strindberg/AFS/networks: service window tomorrow, Wednesday
1997-09-03 between 13:00 and 15:00.
- 1997-08-31 at 11:00
- Kerberos: master (admin) server down - you cannot change
your password until it is back.
- 1997-08-22 at 11:00
- Selma: /scratch was cleaned of all data older than 1 week.
Older data might be retrieved from /test until 1997-08-29.
- 1997-08-21 at 18:10
- Selma: More disk will be made available 97-08-22 sometime
between 09:30 and 11:00. This will include a reboot.
The NQS system will be stopped during the upgrade.
- 1997-08-21 at 08:30
- Kallsup: One IOP seems to be broken. Recovery
and reconfiguration to manage without it in progress.
- 1997-08-15 at 14:30
- Ongoing upgrade of Kerberos authentication programs.
Affected programs: rxtelnet, kx and depending on your version,
ftp. Other programs should work just fine. To upgrade your
binaries, fetch a new travelkit.tar. See
http://www.pdc.kth.se/support/kerberos-tour.html for
guidelines concerning your operating system.
- 1997-08-15 at 11:30
- General: UPS exercise complete.
- 1997-08-15 at 07:28
- General: UPS (pdc power supply) service scheduled to
start today at 0900 hours. This is considered a
low risk operation, but to play it safe we have now
drained the whole machine of jobs. Hope to be back
again in the early afternoon.
- 1997-08-12 at 01:45
- KALLSUP: back online
- 1997-08-12 at 01:15
- KALLSUP: disk problem. Reboot and recovery in
progress. Running jobs without checkpoint files were lost.
- 1997-08-08 at 12:00
- General: UPS (pdc power supply) service scheduled to
start 1997-08-15 at 0900 hours. This is considered a
low risk operation. To play it safe we will hold
Strindberg batch-lines and keep afs-servers off-line
during service anyhow.
- 1997-08-08 at 08:00
- One AFS fileserver began to have problems
early in the morning (around 02:00).
- 1997-08-06 at 19:30
- The CrayDoc webserver has found a permanent home
at http://craydoc.pdc.kth.se:8080.
- 1997-08-06 at 18:00
- The CrayDoc webserver is moving around a little due to network
activities. Look at http://www.pdc.kth.se/kallsup
to find its present whereabouts.
- 1997-08-03 at 09:00
- Strindberg/Info: changed the priority of three jobs.
They will run outside the ordinary queue order.
- 1997-08-01 at 14:00
- Mail back to normal.
- 1997-08-01 at 12:30
- Temporary mail problems may cause email
to lists like "pdc-staff@pdc.kth.se" and
"sp2-staff@pdc.kth.se" to bounce. Please
use "pdc-staff@nada.kth.se" instead until
we've solved the problem.
- 1997-07-29 at 18:20
- Strindberg/Piofs, formatted and clean. Please
note that the path of your pfs directory
is /pfs/home/f/foo as it has been since May 1997.
- 1997-07-29 at 17:05
- Strindberg: reboot of login node (syk-0606).
- 1997-07-29 at 16:45
- Strindberg/Piofs, replacement installed and
included in the config of one of the fileserver
nodes. To do: unfencing and formatting of
the /pfs.
- 1997-07-29 at 13:30
- Strindberg/Piofs, a replacement disk is on its way.
- 1997-07-29 at 12:00
- Strindberg: Problems with the parallel filesystem.
Fault search is in progress. Allocation is stopped.
- 1997-07-29 at 11:25
- Strindberg: Restart of the parallel filesystem servers.
- 1997-07-24 at 14:00
- Networks: there was a network dropout between
the SP and some file-servers.
- 1997-07-23 at 08:00
- Strindberg: hardware failure on one node. The switch restarted
by itself.
- 1997-07-21 at 09:50
- The mail should be working again. The work to move things from dolphin (which has a bad disk) is in progress. This should not affect most users.
- 1997-07-21 at 08:40
- The AFS server dolphin is currently down; this has unfortunately
affected the pdc-staff mail alias. Work is in progress.
- 1997-07-15 at 22:45
- The AFS servers at nada.kth.se are back in business.
- 1997-07-15 at 18:30
- In case you still rely on nada.kth.se you will have
serious problems right now - the main computer room
is being rearranged.
- 1997-07-07 at 09:35
- HSM system back online.
- 1997-07-07 at 08:00
- Upgrade of HSM system software and database conversions in progress.
The HSM system will be back again during the day.
- 1997-07-03 at 17:59
- Kallsup up again. Even checkpointed NQS jobs may
have been lost.
- 1997-07-03 at 17:17
- Kallsup: We have problems with hung disk controllers.
The system will be rebooted.
- 1997-07-01 at 08:00
- About to restart one fileserver.
- 1997-06-24 at 18:10
- Selma back up.
Please report any strange behaviour which might be
due to new software. ("This worked last week...")
- 1997-06-24 at 09:00
- Selma will be brought down for hardware upgrade.
Selma will be up again in the evening.
- 1997-06-20
- Midsummer holiday: the helpdesk will be closed on
Friday.
- 1997-06-16 at 10:30
- Reboot of login node (syk-0606/strindberg).
- 1997-06-10 at 08:00
- Mail: one of the main mail servers under kth.se
has had a disk crash. Expect delayed mail.
- 1997-06-05 at 11:00
- Strindberg: JobManager (JM) restarted as a consequence of
control-work-station crash.
- 1997-06-05 at 09:00
- Strindberg: Peculiar crash of the control-work-station.
Scheduler allocation stopped during fault recovery.
- 1997-06-02 at 09:20
- KALLSUP: On Monday 970616, KALLSUP will be brought down for
a minor hardware upgrade (more SCSI adapters). The upgrade is
estimated to start at 11:00 AM and require six hours of downtime
for upgrade, reconfiguration and testing.
- 1997-05-26 at 11:00
- KALLSUP: The programming environment (compilers and libraries) has been
upgraded. See "Current News" below for details.
- 1997-05-21 at 21:00
- Strindberg: Gaussian94 revision update 970522. Gaussian
users should read "Current news" below.
- 1997-05-21 at 09:00
- Selma: System will be unavailable due to disk
repairs. Back again around noon.
- 1997-05-17 at 14:30
- Scheduler stopped due to local networking maintenance.
- 1997-05-09 at 20:20
- AFS: all users moved into afs-cell pdc.kth.se.
- 1997-05-09 at 17:00
- AFS: move into the pdc cell continues; expect a flaky
filesystem for a couple of hours.
- 1997-05-07 at 10:00
- General: UPS (pdc power supply) service scheduled to
start 1997-05-13 at 0900 hours. This is considered a
low risk operation. To play it safe we will hold
Strindberg batch-lines during service anyhow.
- 1997-05-07 at 03:05
- Strindberg + AFS: Pike, a major fileserver halted.
Investigations are going on, node allocation on SP2
is paused until more facts are known.
- 1997-05-07 at 01:44
- Selma: Halted and rebooted due to memory allocation error of the OS.
- 1997-05-05 at 14:30
- AFS: we are about to reboot one fileserver, some home
directories will be inaccessible during reboot.
- 1997-05-03 at 10:00
- Strindberg: PIOFS went out of order for a couple of
minutes. Please let us know if your job was hit.
- 1997-04-29 at 09:00
- Selma: Scheduled maintenance, benchmarking and tuning of I/O system.
No batch or interactive login during Tuesday. Up again Wednesday.
- 1997-04-15 at 16:20
- Strindberg: New rules for PIOFS usage. PIOFS users should read
/pfs/README.PFS for details.
- 1997-04-14 at 18:00
- Strindberg: Hardware maintenance in PIOFS server nodes. New jobs will
start at 1000.
- 1997-04-09 at 18:00
- Strindberg: The system will be rebooted to activate installed software updates.
Please note the HPS switch still has stability problems.
- 1997-04-09 at 08:00
- Strindberg: Switch fault. Recovery in progress
- 1997-04-08 at 13:00
- Strindberg: Hardware maintenance completed.
- 1997-04-07 at 11:00
- AFS: restart of file servers in the afs-cell nada.kth.se.
switch diagnostics. The operation is estimated to be completed before 18.00
the same day.
- 1997-04-07 at 08:30
- Strindberg: Switch fault and restart in progress.
- 1997-04-04 at 13:00
- Strindberg: Switch fault and restart in progress.
- 1997-04-02 at 17:00
- AFS: running salvage on one fileserver in the afs-cell nada.kth.se.
- 1997-04-02 at 12:00
- Strindberg: Replacement of an SSA adapter used by
piofs scheduled at 1430. No new jobs started until then.
- 1997-04-02 at 09:00
- Kallsup will be brought down for hardware maintenance (CPU replacement)
1997-04-03. The system will be unavailable from 0900 until 1800.
This maintenance operation will also affect HSM users.
- 1997-04-01 at 10:00
- Today, PDC signed a contract for a major upgrade of PDC's CM200, 'Bellman'.
The new system, which will contain 64k processors, is expected to be functional
May 18th. At the same time, PDC has also acquired two new DataVault mass storage
units from another supercomputing center.
- 1997-03-31 at 00:20
- Because of the switch fault, jobs have been started that will run past
this morning's 0900 boundary (1997-03-31).
- 1997-03-30 at 22:30
- Restart of switch due to switch fault.
- 1997-03-27 at 07:00
- Strindberg HPS switch fault, restart in progress.
- 1997-03-26 at 08:15
- Strindberg HPS switch fault. Some night jobs may not have completed.
The system is now running normally again.
- 1997-03-19 at 16:10
- We will reboot one fileserver and the control-work-station
at approx 16:45. Users residing on the fileserver will be
affected during reboot.
- 1997-03-19 at 14:05
- Switch fault - repair in progress.
- 1997-03-17 at 14:00
- Running SP jobs were lost due to a major switch fault.
The SP is now up and running again.
- 1997-03-10 at 19:00
- Strindberg: Please note that batch is
lagging in time because of the reboot.
- 1997-03-10 at 16:45
- Reboot of Strindberg to activate new software.
- 1997-03-10 at 08:30
- Kallsup (the Cray system) will be dumped and restarted due to a broken CPU. Note that this interruption also affects HSM users.
- 1997-03-04 at 16:30
- Kallsup will be brought down for hardware maintenance Wednesday 1997-03-05,
starting 16.30. The system is expected to be back online again the same night.
- 1997-03-03 at 16:00
- Restart of strindberg parallel file system.
- 1997-03-02 at 18:00
- Hung parallel file system (PIOFS) on Strindberg. PIOFS is now
up and running again after server software restart.
- 1997-02-28 at 16:00
- Slightly reduced number of batch nodes to prevent switch
instability - nodes with [switch problem] indications
powered off during weekend batch.
- 1997-02-26 at 10:00
- No more tests to do - all back.
- 1997-02-26 at 09:55
- A few more tests to do.
- 1997-02-26 at 08:00
- Strindberg switch fault.
- 1997-02-24 at 11:30
- Kallsup dumped and restarted due to hung CPUs.
- 1997-02-21 at 19:30
- Switch fault, about to restart the weekend jobs.
- 1997-02-20 at 10:20
- Strindberg back in shape since 1000.
- 1997-02-20 at 10:00
- Kallsup hardware replacements. Two failing CPU modules
will be replaced on Thursday, February 20th. The system
will be shut down at 11 am.
- 1997-02-20 at 08:35
- Problems with the switch of strindberg. At present no new
jobs are started.
- 1997-02-19 at 12:00
- Restart of one fileserver.
- 1997-02-19 at 08:00
- Selma upgrade started.
- 1997-02-17 at 22:22
- All file servers OK - Node allocation restarted
- 1997-02-17 at 18:00
- Some user file systems might be unavailable - Node allocation stopped
- 1997-02-17 at 17:37
- A file server needs to be restarted.
- 1997-02-13 at 12:00
- Strindberg, switch restarted.
- 1997-02-12 at 15:50
- Selma will be down for OS upgrade 97-02-18 12.00 to 97-02-21 18.00
Read "Current news" below for detailed information.
- 1997-02-09 at 00:01
- Switch, PIOFS and Easy restarted. Up and running again.
- 1997-02-08 at 22:37
- Something fishy with node allocation - allocation paused.
- 1997-02-07 at 18:35
- Kallsup is now running jobs again.
- 1997-02-07 at 11:40
- Kallsup will be dumped, examined and restarted due to CPU problems.
- 1997-02-04 at 19:00
- Restart of one file-server in the afs cell pdc.kth.se.
- 1997-02-04 at 09:45
- Kallsup has problems with a few CPUs. The system has been rebooted
with 3 CPUs disabled.
- 1997-02-03 at 15:00
- Strindberg: switch fault restart in progress.
- 1997-02-03 at 13:55
- Kallsup is now running again. The system may still be unstable due
to I/O-related problems, but the impact of previous errors has
been reduced enough that we consider it stable enough to run
jobs. However, the system will be brought down for further
maintenance a number of times in the near future.
- 1997-02-01 at 22:25
- Problem with scheduler - restarted. Probably no jobs were lost;
an analysis will follow.
- 1997-01-31 at 14:45
- Kallsup still has stability problems. The system will be unavailable
the entire weekend. Cray personnel from the UK are currently diagnosing
our problems on site.
- 1997-01-29 at 10:25
- Kallsup users should still read "Current news" below for more
information about the upgrade process.
- 1997-01-28 at 13:20
- Batch enabled again. More info on power outage in current news below.
- 1997-01-28 at 12:50
- Ordinary power is back. We start the process of powering
on what was shut down.
- 1997-01-28 at 11:50
- Power failure, turning on backup power. Estimated to last
for at least 25 minutes.
- 1997-01-20 at 15:00
- Several nodes going down, fault search in progress.
- 1997-01-20 at 14:17
- The login node syk-0606/strindberg seems unstable. Fault
search in progress. Batch runs as usual.
- 1997-01-20 at 11:00
- Strindberg: login node syk-0606/strindberg rebooted.
- 1997-01-20 at 09:20
- Kallsup users should read "Current news" below for more
information about the upgrade process.
- 1997-01-17 at 16:00
- Strindberg: Switch fault repair finished.
- 1997-01-17 at 15:30
- Strindberg: Switch fault, repair in progress.
- 1997-01-14 at 12:40
- Problems with Kallsup remain, possibly due to memory problems.
System is now available, but not reliably. Don't submit any jobs.
HSM is up and running, though.
- 1997-01-13 at 21:10
- Login procedure on Kallsup hung. Will probably be down until
sometime tomorrow (Tuesday).
- 1997-01-10 at 20:31
- Kallsup up and running. Performing tests. NQS will be started
sometime during the weekend. We expect to be back in full
production Monday morning.
- 1997-01-09 at 23:12
- Kallsup is down because of a memory module error. Spare parts
arrive tomorrow, Friday. Do not expect Kallsup to be working this
week. This includes the HSM system.
- 1997-01-09 at 22:00
- Some fileservers severely damaged (wrinkles in their filesystems).
Complete repair might take quite some time. Batch lines enabled.
Please let us know if you were hit!
- 1997-01-09 at 19:00
- Accidental emergency power shutdown in the server/router room. We
expect to be back at 20:00. The person who hit the button is
an engineer from one of our vendors. At least for a couple
more hours. Sigh.
- 1997-01-07 at 15:40
- Kallsup upgrade stage two is in action. The system will not be
reliably accessible until Friday (97-01-10). This extra downtime
has been caused by a defective cpu-slot which was not in use
previously and thus not discovered.
- 1997-01-07 at 15:00
- One of our file servers got out of sync for about 30 minutes. During
that time interval users may have had problems accessing their files
in afs.