Reinstalling ROCKS compute cluster node

If you have a faulty HPC node and need to reinstall it, for instance after a hard drive replacement, you should bear in mind several things:

  • Make sure xinetd is listening on UDP port 69 (the standard TFTP port) for tftpd requests on the frontend.
  • Check the firewall rules on the frontend, or simply switch the firewall off during the install. Otherwise you’ll get PXE-E32: TFTP open timeout.
  • Then configure your frontend to force reinstallation of the compute node. If you don’t, the node will not start the installer and you’ll just see PXE-M0F: Exiting HP PXE ROM or similar. Execute the following command on the frontend: rocks set host boot <nodename> action=install.
  • If you get an “unable to read package metadata” error during installation, go to /export/rocks/install/ on the frontend, remove the rocks-dist folder and recreate the installation tree by running rocks create distro.
  • After the host is installed, put all additional packages (like IB, MVAPICH, etc.) into /share/apps and run rocks run host <nodename> "rpm -Uvh /share/apps/*.rpm". Enable the necessary services (like openibd and/or opensmd) at startup via chkconfig and start them. You may also need to copy some manually installed packages to the compute node’s /opt directory.
  • If you commented out the faulty node earlier in /opt/torque/server_priv/nodes, uncomment it and restart the pbs_server service.
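
The frontend-side commands from the list above can be collected into a small shell sketch. This is only an illustration, not an official ROCKS tool: the function names and the DRY_RUN switch are made up for this example, openibd is just the example service from the list, and with the default DRY_RUN=1 every command is printed instead of executed.

```shell
#!/bin/sh
# Sketch of the frontend-side steps from the list above.
# DRY_RUN and the function names are illustrative, not part of ROCKS.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = execute it

run() {
    # Always show the command (without its original quoting);
    # execute it only when DRY_RUN=0.
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

force_reinstall() {
    # $1 = node name; the node reinstalls on its next PXE boot.
    run rocks set host boot "$1" action=install
}

rebuild_distro() {
    # Only needed after an "unable to read package metadata" error.
    run rm -rf /export/rocks/install/rocks-dist
    run sh -c 'cd /export/rocks/install && rocks create distro'
}

post_install() {
    # $1 = node name; install the extra RPMs, then enable and start
    # one example service (openibd) on the node.
    run rocks run host "$1" "rpm -Uvh /share/apps/*.rpm"
    run rocks run host "$1" "chkconfig openibd on && service openibd start"
}

restore_torque_node() {
    # Run after uncommenting the node in /opt/torque/server_priv/nodes.
    run service pbs_server restart
}
```

For example, force_reinstall compute-0-0 (a hypothetical node name) prints the rocks command it would run. Set DRY_RUN=0 and run as root on the frontend to actually execute the steps.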

That’s it. Now you should be good to go.
