Scheduled Maintenance on Grace

Dear Grace, Omega and Farnam Users,

Out-of-cycle scheduled maintenance will be performed on Grace and Omega beginning Monday, November 4, 2019, at 8:00 am. Maintenance is expected to be completed by the end of the day Monday. 

This maintenance is required to prepare the InfiniBand network for the upcoming deployment of additional common and PI-purchased nodes. As such, there are no changes that impact users and no new functionality is being introduced. We plan, however, to bring online 174 new common nodes and 76 PI-purchased nodes soon after the maintenance window, once the necessary validation testing has been completed.

During this time, logins will be disabled and connections via Globus will be unavailable. The Loomis storage will not be available on the Farnam cluster, but the cluster itself will remain available. We ask that you save your work, close any interactive applications, and log off the system before the maintenance begins. An email notification will be sent when the maintenance has been completed and the clusters are available.

As the maintenance window approaches, the Slurm scheduler will not start any job whose requested wallclock time extends past the start of the maintenance period (8:00 am on November 4, 2019). If you run squeue, such jobs will appear as pending with the reason “ReqNodeNotAvail”. (If your job can actually be completed in less time than you requested, you may be able to avoid this by requesting an appropriate time limit using “-t” or “--time”.) Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
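
As a rough illustration of the rule above, the following Python sketch checks whether a requested time limit would still end before the maintenance window opens. It is not part of Slurm or of any cluster tooling: the function names are hypothetical, the parser assumes the time formats accepted by “--time” (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds), and it assumes that a job ending exactly at 8:00 am is still allowed to start. The scheduler's actual decision also depends on priority and node availability, which this sketch ignores.

```python
from datetime import datetime, timedelta

# Start of the maintenance period, taken from this announcement.
MAINTENANCE_START = datetime(2019, 11, 4, 8, 0)


def parse_slurm_time(limit: str) -> timedelta:
    """Convert a Slurm-style time limit string into a timedelta (hypothetical helper)."""
    days = 0
    if "-" in limit:
        # Forms with a day count: days-hours[:minutes[:seconds]]
        day_part, rest = limit.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        parts += [0] * (3 - len(parts))  # pad missing minutes/seconds with zero
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in limit.split(":")]
        if len(parts) == 1:    # minutes
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:  # hours:minutes:seconds
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unrecognized time limit: {limit!r}")
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def can_start_before_maintenance(now: datetime, limit: str) -> bool:
    """Return True if a job started at `now` would end by the maintenance start.

    Assumption: a job ending exactly at the start of the window is allowed.
    """
    return now + parse_slurm_time(limit) <= MAINTENANCE_START


if __name__ == "__main__":
    submit_time = datetime(2019, 11, 3, 20, 0)  # example: 8:00 pm the evening before
    for requested in ("12:00:00", "1-00:00:00"):
        print(requested, can_start_before_maintenance(submit_time, requested))
```

In this example, a job submitted at 8:00 pm on November 3 with a 12-hour limit would fit before the window, while the same job with a one-day limit would be held until the maintenance is over.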

Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.