Scheduled Maintenance on Grace

8/26 - 08:00 AM 

Grace, Omega and Farnam Users,
Scheduled maintenance will be performed on Grace and Omega beginning Monday, August 26, 2019, at 8:00 am. Maintenance is expected to be completed by the end of the day on Wednesday, August 28, 2019. During this time, logins will be disabled and connections via Globus will be unavailable. The Loomis storage will not be available on the Farnam cluster. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed and the clusters are available.

Aside from the system and security updates we perform during maintenance, we want you to know about the following changes.
 
Module Changes on Grace: The software available via the modules system on Grace is being upgraded to be more consistent with our other clusters. During the maintenance, we will change the default module list to a new module collection. We encourage you to look at the new collection today if you have not already done so. To try the new collection, run the following on the login node:

source /apps/bin/try_new_modules.sh

Then you can run “module avail” to see the list of available software in the new collection. To return to the old collection, simply log out and log back into the cluster. The old installations will remain, but all new software will be installed into the new collection. More information about this transition is available on our website at http://docs.ycrc.yale.edu/clusters-at-yale/applications/new-modules-grace.
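
For example, a complete session on the login node might look like this (“Python” is just an illustrative module name; check the output of “module avail” for the names actually present in the new collection):

source /apps/bin/try_new_modules.sh
module avail
module load Python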
 
Grace Login Node IP Addresses: During the maintenance, the IP addresses of Grace’s two login nodes will change to 10.181.0.11 and 10.181.0.14. If part of your workflow involves directly accessing or whitelisting the login nodes’ IP addresses, you will need to update them. Most users will not be impacted by this change.
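
If, for example, you maintain a host-based firewall allowlist for outbound SSH connections to the login nodes, the rules would need to reference the new addresses. A minimal sketch using iptables (the chain and existing rules on your system may differ, so treat this as illustrative only):

sudo iptables -A OUTPUT -p tcp -d 10.181.0.11 --dport 22 -j ACCEPT
sudo iptables -A OUTPUT -p tcp -d 10.181.0.14 --dport 22 -j ACCEPT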

As the maintenance window approaches, the Slurm scheduler will not start any job whose requested wallclock time extends past the start of the maintenance period (8:00 am on August 26, 2019). If you run “squeue”, such jobs will show as pending with the reason “ReqNodeNotAvail”. (If your job can actually complete in less time than you requested, you may be able to avoid this by requesting an appropriate time limit with “-t” or “--time”.) Such jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
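
For example, if your job actually needs no more than two hours, you can submit it with a matching limit so the scheduler can fit it in before the window begins (“my_job.sh” is a placeholder for your own batch script):

sbatch --time=02:00:00 my_job.sh
squeue -u $USER

The second command lets you confirm whether the job has started or is pending with the “ReqNodeNotAvail” reason.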

Please visit the status page at research.computing.yale.edu for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.