Events:

1995-12-27 at 21:00: For the curious: the cooling weakened because the cooling water became too hot, which in turn was caused by a frozen water pipe!
1995-12-27 at 14:20: All systems back. The power failure was caused by the temperature in the machine exceeding a limit. The reason for this is under investigation.
1995-12-27 at 13:00: Power failure in the machine room.
1995-12-14 at 10:00: Scheduler delayed for about two hours.
1995-12-12 at 16:00: On Thursday, 1995-12-14, between 0800 and 1700, 16 thin nodes will be reserved for benchmarks.
1995-12-08 at 11:36: A clarification: because of the benchmark running on 32 thin nodes and 1 wide node, jobs that should have run this morning are delayed.
1995-12-08 at 07:50: Info about EASY progress can be found on the news page.
1995-12-08 at 07:49: The new scheduling system, EASY, has now been running on the full machine for over a week. All things considered, it has been successful and we will continue running it. Obviously there are a couple of rough edges that need smoothing, and we will start working on these as soon as we have finished digesting all the user feedback we have received. We extend a collective thanks to those who have given us feedback, and we certainly welcome more user comments and ideas. Take the EASY way out :)
1995-12-06 at 17:00: On Friday, 1995-12-08, at 0800 hours, 32 thin nodes and one wide node will be reserved for benchmarks. This will take about five hours.
1995-12-06 at 10:17: Demons of the past have now been put to rest. All ok again.
1995-12-06 at 07:35: No, it is not dead, just resting :) Obviously something is fishy with the machine. Investigations under way.
1995-12-03 at 14:30: Batch queue resumed. Any further sign of server problems will cause the queue to be stopped again.
1995-12-02 at 17:00: Severe server problems persist. Batch queue stopped.
1995-12-02 at 13:00: Server problems causing job delay.
1995-12-01 at 18:00: On Tuesday, 1995-12-05, at 1700 hours, 32 thin nodes and one wide node will be reserved for benchmarks. This will take about five hours.
1995-11-29 at 10:00: EASY now running on the whole machine.
1995-11-29 at 09:00: All nodes moved into EASY. We are waiting for two old LoadLeveler jobs to finish. This will happen at 1000 at the latest.
1995-11-29 at 07:00: We will move the rest of the nodes into EASY around 0800, 1995-11-29. Please be patient with a short delay. We will also take some preventive actions to stop the job manager from filling its disk.
1995-11-28 at 12:00: Info about EASY progress can be found on the news page.
1995-11-26 at 00:20: The system is back. No jobs waiting in line were lost.
1995-11-25 at 22:10: Problems with a daemon running amok. Struve rebooted. Please do not start interactive jobs before the LoadL queues are resumed.
1995-11-25 at 21:40 [xxx (strindberg)]: Unidentified problems with Strindberg. All LoadL queues drained.
1995-11-24 at 11:00 [xxx (strindberg)]: We will run a full-scale test of EASY on all of Strindberg starting Wednesday next week, 1995-11-28. The test will run for about one week.
1995-11-23 at 16:00: All systems back. LoadLeveler changed to allow at most one running job per user.
1995-11-23 at 15:00: Power failure in the NADA computer room. A number of servers went down, including some that PDC uses. All LoadLeveler jobs were lost, both running and in line. By contrast, nothing seems to have happened to the EASY jobs!
1995-11-23 at 13:00: Problems with the job manager. It keeps filling its disk.
1995-11-22 at 13:30: Control work station and most other PDC machines that moved are back on the network. Loadleveler and easy resumed.
1995-11-21 at 10:00: We will prepare for a future expansion of the machine room. Some machines will be moved to a `safer' area, most notably the SP2 control work station and a few file-servers. This will take place tomorrow, 1995-11-22, somewhere between 0800 and 1300 hours. No jobs can run during the move.
1995-11-21 at 09:50: The machine is up. No running jobs or jobs in line were lost.
1995-11-21 at 09:08: We are in the process of bringing up the machine. Do not use it until we say go!
1995-11-20 at 21:00: Two jobs still running in the LoadLeveler section. Most probably we will restart the job manager tomorrow, 1995-11-21, around 0800 hours.
1995-11-20 at 12:00: Though it may not be obvious, most of the machine is busy running user batch jobs. We will let the jobs finish before resuming the LoadLeveler lines. This will happen 1995-11-21 at 0400 or earlier.
1995-11-20 at 07:00: The job manager had problems last night. LoadLeveler lines are drained while waiting for running jobs to finish. Do not start interactive jobs, i.e. explicitly via poe. You may still use EASY.
1995-11-12 at 20:05: Nodes syk-5 through syk-16 are still running a new scheduling system called EASY. If you have any questions or want to try it out, contact us.
1995-11-08 at 07:50: Tomorrow (95-11-09) between 0900 and 1530 at the latest, the wide nodes (syk-49 through syk-63) will be reserved for special use (benchmark validation). The LoadL small-wide queue will be drained, and jobs remaining after 0900 will be removed.
1995-11-07 at 17:50: Job manager died. All running jobs lost.
1995-10-30 at 12:30: Job manager died. All running jobs lost.
1995-10-30 at 08:00: Nodes syk-5 through syk-16 still running easy. Please try them out.
1995-10-26 at 20:00: Nodes syk-5 through syk-16 are running EASY. They will be reserved for a class tomorrow, 1995-10-27, between 1300 and 1800. Please try it out until then.
1995-10-24 at 12:00: Debug version of the job manager installed. Nodes syk-9 through syk-16 drained awaiting tests of EASY.
1995-10-24 at 06:30: Loadleveler drained while installing debug-version of the job manager.
1995-10-23 at 10:00: Job-manager restarted. Loadleveler resumed with old submissions intact.
1995-10-22 at 11:00: Job manager not stable. Draining LoadLeveler. This will prevent jobs in line from becoming running jobs. We will wait for the currently running jobs to complete before resuming LoadLeveler. This will take place Monday morning, 1995-10-23, at the earliest.
1995-10-19 at 17:00: Loadleveler resumed on nodes syk-9 through syk-16.
1995-10-16 at 14:00: Exercises for a PDC course will take place 17/10 to 19/10 between 1300 and 1700. To give the students a fair chance to run, nodes syk-9 through syk-16 are drained in LoadLeveler.
1995-10-12 at 21:00: Job-manager died. All running jobs lost.
1995-10-11 at 04:55 [xxx (strindberg)]: Strindberg is back. System software upgraded. Please submit new jobs.
1995-10-10 at 19:20 [xxx (strindberg)]: Plan for tomorrow, 1995-10-10: Strindberg will be in an undefined state starting at 1200 hours. We will apply patches to job manager and loadleveler. Please use it until then.
1995-10-09 at 12:30: We are about to install patches to the job-manager and loadleveler. This might affect the availability of the machine tomorrow, 1995-10-10.
1995-10-07 at 14:30: The running jobs did not survive. Please resubmit them.
From now on we will do batch scheduling by hand! Submit your batch jobs as usual and we will start them manually. Interactive use, i.e. poe with runtimes of less than a few hours, is encouraged on nodes syk-9 to syk-16.
1995-10-07 at 14:00: The job manager does not show jobs even though they are still running. LoadLeveler draining until the running jobs are finished. This will take 24 hours.
1995-10-06 at 13:30: Special use is finished. Syk-17 to syk-48 moved back into the ordinary loadleveler lines.
1995-10-05 at 11:30: Tomorrow (95-10-06) between 8.00 and 12.30 (at worst) at least 32 nodes will be allocated for special use. Nodes syk-17 to syk-48 are being drained from now on.
You are welcome to use the nodes interactively, ie using poe, until then. Processes running at that moment will be removed without further notice.
1995-10-05 at 07:00 [xxx (strindberg)]: Strindberg has been back in normal operation for 12 hours. Fileserver problems will probably require a few users to take special action in the not-so-distant future. Please read upcoming announcements.
1995-10-04 at 10:00: Loadleveler draining while configuring disk and restarting syk-51.
1995-10-03 at 16:15: Bellman is back.
1995-10-03 at 16:00 [xxx (strindberg)]: Strindberg is back. Node syk-51 is missing. It has a broken disk. Please submit your jobs again.
1995-10-03 at 14:30: Large parts of KTH experienced a loss of power. It seems even the subway is affected. We are restarting the systems. It must be one of those days...
1995-10-03 at 13:00: The whole switch in the third frame was switched. Doing a few tests.
1995-10-03 at 09:25: We are about to switch the switch power supply. As a safety precaution we will drain LoadLeveler on all nodes for a while, i.e. keep old jobs running but prevent new ones from starting. Running interactively, i.e. using poe, is still possible.
1995-10-03 at 07:30: The fileserver with problems is back. We are waiting for a new switch power supply for the third frame.
1995-10-03 at 07:12: Tomorrow there will be an extended service window from 09.00--15.00 for the CM. This is due to preventive maintenance.
1995-10-02 at 19:30: The third frame is out of 5V power again. Nodes syk-33 through syk-48 not available.
1995-10-02 at 19:00: One fileserver is down. A few users whose home directories reside on that server will not be able to access their files and will see `connection timed out'.
1995-10-02 at 13:00: Service is complete. Frame three is available. Please note the new news slot on the web page.
1995-10-01 at 15:00: The third frame has a 5V power supply failure. Service is on its way. Nodes syk-33 through syk-48 are not available.
1995-09-27 at 22:30: Mess-up in LoadL; all queues drained.
1995-09-26 at 14:43: Wide nodes will have their max CPU limit changed from 48 h to 24 h. This affects the class small-wide. The classes develop and develop-thin will have their CPU limit changed from 5 minutes to 20 minutes.
1995-09-20 at 13:30: We have two fddi-rings now. Old jobs still running. Jobs in line removed. Please submit again.
1995-09-20 at 12:41: FYI: There are now no less than five persons in the machine room adding another fddi-ring. Old jobs still running. Please do not submit new jobs yet!
1995-09-20 at 09:51: Still a lot of jobs running. We are waiting for them to finish before fixing the network problems.
1995-09-19 at 20:00: Network problems. All running jobs still running. Jobs in line are still in line. We will let running jobs finish, jobs in line stay in line, and deal with the network problems when the number of running jobs is lower.
1995-09-17 at 10:00: A Control Work Station hang caused the Job Manager to die around 0600. Jobs running at that moment were damaged; jobs in line were not affected.
1995-09-15 at 11:40: Job manager resumed. You may submit new jobs. Old jobs still in line.
1995-09-15 at 06:49: Job manager obviously down. Don't submit new jobs to LL. They will surely be lost when we clean up this mess. Looks like it will take several hours.
1995-09-09 at 12:00: Syk-27 and syk-33 are back. See the new slot in the web - `What's Up?'
1995-09-08 at 15:55: Now! Everything is up except syk-27 and syk-33.
1995-09-08 at 14:42: Still struggling with a confused and ever more senile LoadLeveler! No new jobs will be started.
1995-09-08 at 12:05: Problems with LoadLeveler. No new jobs will be started. Most of the running jobs were lost. LoadL will be restarted ASAP. Please do not submit any new jobs.
1995-09-06 at 17:30: New jobs allowed again. A few old jobs that are not visible through loadleveler have their nodes marked `draining' in loadleveler to let them finish. Two jobs were lost.
1995-09-06 at 13:55: Problems with the job manager. No new jobs allowed to enter running state.
1995-09-06 at 11:15: Problems with LoadLeveler and job manager solved. No running jobs or jobs waiting in line were lost.
1995-09-06 at 10:15: Problems with Loadleveler and job manager. You might experience difficulties starting new jobs.
1995-09-04 at 11:30: Problem with LoadLeveler and job manager. Some jobs in LoadLeveler may have died.
1995-08-27 at 20:00: A few users might be affected during a non-trivial move of their home directories.
1995-08-18 at 08:00: All passwords have been changed tonight to increase the system security. If you have not already received your new password, please contact us at phone +46 8 790 7907 or email pdc-staff@pdc.kth.se.
1995-08-14 at 13:25: Loadleveler is back. Please resubmit your job.
1995-08-14 at 13:20: Loadleveler on its way back. All jobs were lost.
1995-08-14 at 11:30 [xxx (strindberg)]: Problems with the job manager on Strindberg. You might experience difficulties starting programs.
1995-08-10 at 12:00: LoadL accepting new jobs. Nodes containing old jobs draining to let them finish.
1995-08-10 at 09:45: LoadL draining; no new jobs started. This is to resolve some LoadL problems (lost jobs etc.). We will try to save jobs that are still running but have been lost by LoadL.
1995-08-09 at 17:30 [xxx (strindberg)]: Job manager and LoadLeveler restarted. Strindberg up and running.
1995-08-09 at 15:23: SP-2 has problems. Investigation under way.
1995-08-02 at 18:55 [xxx (strindberg)]: SP2 Strindberg up and running (for some time now...). Running jobs were lost.
1995-08-02 at 17:25 [xxx (strindberg)]: SP2 Strindberg down.
1995-08-01 at 16:10 [xxx (strindberg)]: SP2 Strindberg restarted.
1995-08-01 at 15:00: All systems down due to fileserver error. Restart in progress. Please do not submit jobs until restart is complete!
1995-07-27 at 17:30: The SP2 job manager is restarted. Please resubmit your job.
1995-07-27 at 16:00 [xxx (strindberg)]: We have problems with the job manager on the SP2 Strindberg. You might experience problems with node allocation.
1995-07-21 at 13:00: We had problems with Loadleveler yesterday afternoon. Some jobs were removed.
1995-07-06 at 16:20: Node syk-53 down due to HW memory error.
1995-07-03 at 15:15: All running jobs crashed due to system software problems. System now restarted.
1995-06-30 at 16:10: All running jobs crashed due to system software problems. All jobs in LoadLeveler queue removed.
1995-06-29 at 15:00 [xxx (strindberg)]: Service of SP2 Strindberg on Wednesday, 1995-07-05. Thin nodes (1..48) will be unavailable from 0900 to 1100 and wide nodes (49..63) from 0900 to 1700. These times are estimates.
1995-06-22 at 15:59: Full 16K of CM Bellman available again.
1995-06-22 at 12:17: Power supply will be changed at 15:00 hours on CM Bellman.
1995-06-21 at 08:44: Changed start time of production queue on CM Bellman from 22:00:00 to 20:30:00.
1995-06-20 at 13:58: Broken power supply in the upper part of CM Bellman. This means that only half the machine is available, namely sequencer 1. The production queue will only use sequencer 1, i.e. half the machine.
1995-06-08 at 14:40: Bellman (CM-200) is up and running.
1995-06-08 at 14:20 [xxx (strindberg)]: Strindberg (SP2) up and running.
1995-06-08 at 13:50 [xxx (strindberg)]: Strindberg (SP2) and Bellman (CM-200) down due to power failure.
1995-06-08 at 09:30: License server for the Fortran and C++ compilers down.
1995-05-31 at 06:00: Running experimental batch system on nodes 5-8. You will not be able to login to these nodes.
1995-05-24 at 09:00 [xxx (strindberg)]: Strindberg up and running with latest software.
1995-05-23 at 14:00 [xxx (strindberg)]: Software upgrade on SP2 Strindberg in progress.
1995-05-22 at 16:00 [xxx (strindberg)]: Software upgrade on SP2 Strindberg. Might take the rest of the day.
1995-05-22 at 14:15 [xxx (strindberg)]: Preparation for a future software upgrade did not go as smoothly as expected. SP2 Strindberg will be reachable again as soon as possible.
1995-05-22 at 14:05: Solved hardware problem in CM Bellman. Broken power supply replaced. Production queues back to normal.
1995-05-18 at 13:05: Hardware problem in the upper part of CM Bellman. This means that only half the machine is available, namely sequencer 1.
1995-05-16 at 15:47: CM Bellman will not be available on Monday, May 29, between 09.00 and 18.00 due to regular hardware maintenance.
1995-05-10 at 10:00: Exercises for the PDC SP2 course will take place between 1300 and 1700. To give the students a fair chance to run interactive jobs we might put loadleveler jobs on hold without further notice. Same hours apply for tomorrow, 1995-05-11.
1995-05-05 at 09:00: SP2 loadleveler is enabled.
1995-05-04 at 23:00: SP2 is back up. We will not restart loadleveler until tomorrow, 1995-05-05.
1995-05-04 at 21:00: CM is back up.
1995-05-04 at 18:00: Central KTH cooling system broken. SP2 and CM shut down as a safety precaution against overheating.
1995-04-28 at 16:00: SP2 loadleveler restarted. Please note you might have to submit your job again.
1995-04-25 at 11:00 [xxx (strindberg)]: Problems with the SP2 control work station. You might experience difficulties getting access to Strindberg.
1995-04-21 at 10:00: Yesterday we had severe problems with primary and backup file-servers for the SP2. This is now fixed.
1995-04-20 at 18:00: We have simultaneous problems with primary and backup file-servers. A temporary fix is on its way. A more stable situation is to be expected tomorrow, 1995-04-21.
1995-04-11 at 16:20: The new disk is installed and the SP2 available again.
1995-04-11 at 13:00: A new disk has arrived. The SP2 will not be available until the old one is replaced.
1995-04-11 at 11:00: Please note that the broken hard-disk does not affect our CM systems, only the SP2. It is still possible to run serial SP2 jobs. We are still waiting for a disk replacement.
1995-04-10 at 18:00: Due to an unstable hard disk you might have experienced problems submitting parallel jobs today. We are expecting a replacement to be delivered as soon as possible, which seems to be tomorrow, 1995-04-11.
1995-03-31 at 15:00: All systems restarted and running.
1995-03-31 at 13:00: All systems shut down. Smoke and fire indication.
1995-03-30 at 10:00: Parts of KTH experienced a loss of power. Systems are about to be restarted.
1995-03-29 at 18:00 and onwards: KTHLAN will experience a number of cuts due to a software upgrade on the routers. This means that access to the PDC machines will be problematic.
1995-03-22 at 10:00: Change of SP2 hardware done.
1995-03-21 at 21:00 [xxx (strindberg)]: Change of more SP2 hardware on 1995-03-22 between 0900 and 1300. Parts or all of Strindberg might become unavailable. The change is expected to be swift.
1995-03-20 at 18:00 [xxx (strindberg)]: Change of some SP2 hardware scheduled for 1995-03-21 at 0830. Parts or all of Strindberg might become unavailable.
1995-03-20 at 13:30: SP2 and CM systems are running.
1995-03-20 at 09:00: SP2 wide nodes (51, 53 .. 63) once again available for public use.
1995-03-17 at 09:00: SP2 wide nodes (49, 51, 53 .. 63) unavailable for public use.
1995-03-14 at 16:00: Problems with SP2 cws (control work station) are solved.
1995-03-14 at 14:00: You might experience difficulties when submitting jobs to the SP2. This is because of problems with its cws (control work station).
1995-03-09 at 18:00: SP2 wide nodes (49, 51, 53 .. 63) are still being serviced. Node 49 will get a new disk tomorrow, March 10. Nodes 51 and up will be started as soon as their ethernets are `rewired' around node 49.
1995-03-09 at 12:00: SP2 wide nodes (49, 51, 53 .. 63) are still being serviced - a hard disc might have to be replaced.
1995-03-08 at 18:00: SP2 wide nodes (49, 51, 53 .. 63) not available until after service, which is scheduled for tomorrow, March 9.
1995-03-07 at 15:29: CM systems and all SP2 nodes except eight are back up.
1995-03-07 at 14:29: Due to bogus power-and-smoke indications, both cooling and power were shut down. We plan to have all systems back up again as soon as possible.