How to run RStudio off the Xanadu cluster?
Acknowledgment: Many thanks to Partik Mehta for sharing his insight into getting this working.
Before Starting
If you are a single user and will be the only one using this setup, carry out the instructions in your home directory. If you would like the installation to be shared with your lab members, do it in your lab directory on the cluster.
This will run R version 4.3.0, which cannot be changed because it runs from a Singularity container.
NOTE 1: Use `xanadu-submit-int` to connect to the cluster. The instructions won't work on `xanadu-submit-ext`.
NOTE 2: Have your UCHC VPN connected. This is the Pulse Secure VPN, not Cisco.
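For example, to connect (a minimal sketch; replace `your_netid` with your own UCHC account name):

```bash
ssh your_netid@xanadu-submit-int.cam.uchc.edu
```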
- STEP 1: Create a directory `RStudioInstances` with `Library.Packages` and `.lablocal/rstudio_gpu_pythonpackages` as subdirectories.

```bash
mkdir -p RStudioInstances/{Library.Packages,.lablocal/rstudio_gpu_pythonpackages}
```
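To confirm the layout before moving on (note that `.lablocal` is a hidden directory), a quick check:

```bash
find RStudioInstances
# expected contents:
# RStudioInstances
# RStudioInstances/Library.Packages
# RStudioInstances/.lablocal
# RStudioInstances/.lablocal/rstudio_gpu_pythonpackages
```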
- STEP 2: Execute the `cp` command below to copy the Singularity container and GPU drivers.

```bash
cp -a /core/cbc/CommonResources/* RStudioInstances/
```
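You can optionally verify that the copy brought over the container image and driver directory referenced by the script below (file names as used in the script):

```bash
ls RStudioInstances
# expect to see rstudio_server_cuda_gpu8.5_2023_27_04.sif and nvidia_gpu_drivers/, among others
```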
- STEP 3: Execute the command below to get the absolute path to `RStudioInstances`; copy it and use it to replace `Absolute/path/to/RStudioInstances` in the script below.

```bash
readlink -f RStudioInstances
```

`cd` into `RStudioInstances` and create the script `RStudioServer.sh` shown below.

NOTE: Copying and pasting the code from GitHub can be tricky, as it may carry over unwanted special characters. Instead, the script can be copied on the cluster with:

```bash
cp /core/globus/cgi/.shared/rstudio_server.sh /path/to/desired/location
```
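If you prefer not to edit the placeholder by hand, here is a one-liner sketch that substitutes the path automatically (assuming the script sits next to `RStudioInstances` and is named `RStudioServer.sh`):

```bash
# replace the placeholder with the real absolute path, in place
sed -i "s|Absolute/path/to/RStudioInstances|$(readlink -f RStudioInstances)|" RStudioServer.sh
```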
```bash
#!/bin/sh
#SBATCH --job-name=Rstudio_gpu # name of script
#SBATCH -p general # specify Xanadu partition
#SBATCH -q general # specify Xanadu QOS
#SBATCH -t 1-00:00:00 # time limit: (D-HH:MM)
#SBATCH -c 1 # number of CPUs per task
#SBATCH --constraint=simulations # request a node with the 'simulations' feature
#SBATCH --mem=20G # Memory 20G
#SBATCH -o ./rstudio-server.job
#SBATCH -e ./rstudio-server.err
module purge
module load singularity/3.10.0
module list
module unload squashfs/4.3
module list
# ........ CHANGE HERE: paste the absolute path to RStudioInstances from STEP 3 ........
Path2RStudioInstances=Absolute/path/to/RStudioInstances
# .........MAKE CHANGE ABOVE BEFORE GOING FURTHER.............
cd ${Path2RStudioInstances}
export gpu_nvidia_drivers=${Path2RStudioInstances}/nvidia_gpu_drivers/NVIDIA-Linux-x86_64-495.29.05
export workdir_user_gpu=${Path2RStudioInstances}/tmp_rstudio_files
mkdir -p -m 700 ${workdir_user_gpu}/run ${workdir_user_gpu}/tmp ${workdir_user_gpu}/var/lib/rstudio-server ${workdir_user_gpu}/home/${USER}
cat > ${workdir_user_gpu}/rsession.sh <<END
#!/bin/sh
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}
export R_LIBS_USER=${Path2RStudioInstances}/Library.Packages
exec rsession "\${@}"
END
chmod +x ${workdir_user_gpu}/rsession.sh
export RSS_XANADU_NODE_NAME=$(hostname)
# ask the OS for a free port that rserver will listen on inside the container
readonly SINGULARITYENV_RSPORT_ROS_SEC=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
# pick a random local port for the SSH tunnel from your workstation
random_port_gen_for_local=$(shuf -i 2222-22222 -n 1)
cat 1>&2 <<END
0. Login on UCHC Pulse Secure VPN
1. SSH tunnel from your workstation using the following command:
ssh -N -f -L ${random_port_gen_for_local}:${RSS_XANADU_NODE_NAME}:${SINGULARITYENV_RSPORT_ROS_SEC} ${USER}@xanadu-submit-int.cam.uchc.edu
http://localhost:${random_port_gen_for_local}
When done using RStudio Server, terminate the job by:
1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)
2. Issue the following command on the login node:
scancel -f ${SLURM_JOB_ID}
END
SINGULARITY_CACHEDIR=${workdir_user_gpu}/tmp
SINGULARITY_TMPDIR=${workdir_user_gpu}/tmp
export NVIDIA_DRIVER_CAPABILITIES=all
cat > ${workdir_user_gpu}/singularity_container_environment_file <<END
workdir_user_gpu=${workdir_user_gpu}
R_LIBS=${Path2RStudioInstances}/Library.Packages
R_LIBS_SITE=${Path2RStudioInstances}/Library.Packages
R_LIBS_USER=${Path2RStudioInstances}/Library.Packages
SINGULARITY_CACHEDIR=${workdir_user_gpu}/tmp
SINGULARITY_TMPDIR=${workdir_user_gpu}/tmp
NVIDIA_DRIVER_CAPABILITIES=all
END
cat > ${workdir_user_gpu}/home/${USER}/.bashrc <<END
export workdir_user_gpu=${workdir_user_gpu}
export LD_LIBRARY_PATH='/nvlib:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs:/pythonpackages:/usr/lib/python3.8'
export PYTHONPATH=/pythonpackages
export PIPPATH=/usr/bin/pip
export SINGULARITY_CACHEDIR=${workdir_user_gpu}/tmp
export SINGULARITY_TMPDIR=${workdir_user_gpu}/tmp
export NVIDIA_DRIVER_CAPABILITIES=all
END
cat > ${workdir_user_gpu}/home/${USER}/.Rprofile <<END
Sys.setenv(LD_LIBRARY_PATH = '/nvlib:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs:/pythonpackages:/usr/lib/python3.8')
Sys.setenv(PYTHONPATH= '/pythonpackages')
Sys.setenv(PIPPATH = '/usr/bin/pip')
Sys.setenv(PATH= "/usr/bin:/nvbin:/usr/local/cuda/bin:/usr/lib/rstudio-server/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/sbin:/bin")
Sys.setenv(CPATH= "/usr/local/cuda/include")
Sys.setenv(CUDA_HOME= "/usr/local/cuda")
END
export SINGULARITYENV_PREPEND_PATH=/usr/bin/python3.8
export SINGULARITYENV_PYTHONPATH=/pythonpackages
export SINGULARITYENV_NVIDIA_DRIVER_CAPABILITIES=all
singularity exec \
--contain \
--nv \
--pwd ${PWD} \
--env-file ${workdir_user_gpu}/singularity_container_environment_file \
--home ${workdir_user_gpu}/home/${USER} \
--bind ${Path2RStudioInstances}/.lablocal/rstudio_gpu_pythonpackages:/pythonpackages,${gpu_nvidia_drivers}:/nvlib,${gpu_nvidia_drivers}:/nvbin,${HOME},${workdir_user_gpu}/run:/run,${workdir_user_gpu}/tmp:/tmp,${workdir_user_gpu}/rsession.sh:/etc/rstudio/rsession.sh,${workdir_user_gpu}/var/lib/rstudio-server:/var/lib/rstudio-server \
${Path2RStudioInstances}/rstudio_server_cuda_gpu8.5_2023_27_04.sif \
rserver \
--server-user=${USER} \
--www-port ${SINGULARITYENV_RSPORT_ROS_SEC} \
--auth-none=1 \
--auth-stay-signed-in-days=30 \
--auth-timeout-minutes=0 \
--rsession-path=/etc/rstudio/rsession.sh
printf 'rserver exited' 1>&2
```

Notes on the `#SBATCH` options:
- `#SBATCH -c 1` : Do not increase the CPU requirement unless you will be running parallel processes.
- `#SBATCH --mem=20G` : Increase or decrease this value as per need; please do not oversubscribe.
- `#SBATCH -t 1-00:00:00` : This will run the RStudio session for one day.
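If your analysis needs the GPU, you can sanity-check driver visibility once the session is up. This is a sketch, assuming `nvidia-smi` ships in the bundled driver directory that the script binds at `/nvbin`:

```bash
# run from the RStudio "Terminal" tab inside the session;
# /nvbin is the bind target for the NVIDIA driver directory above
/nvbin/nvidia-smi   # should list the node's GPU(s)
```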
- STEP 4: Execute the above script by submitting it:

```bash
sbatch RStudioServer.sh
```
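While you wait, you can watch the queue; a minimal check that lists only your own jobs:

```bash
squeue -u $USER
```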
Wait until the job starts running, i.e., its status (`ST`) changes to `R`. After that, open the `rstudio-server.err` file created in the directory. You will see the following message:
```
Currently Loaded Modulefiles:
 1) squashfs/4.3   2) singularity/3.10.0
Currently Loaded Modulefiles:
 1) singularity/3.10.0
0. Login on UCHC Pulse Secure VPN
1. SSH tunnel from your workstation using the following command:
ssh -N -f -L 15072:xanadu-40:46016 vsingh@xanadu-submit-int.cam.uchc.edu
http://localhost:15072
When done using RStudio Server, terminate the job by:
1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)
2. Issue the following command on the login node:
scancel -f 6626591
GID: readonly variable
UID: readonly variable
```
NOTE: The port numbers are randomly generated and hence may not be the same as shown here. That's fine; make a note of yours.
Copy the `ssh -N -f -L 15072:xanadu-40:46016 vsingh@xanadu-submit-int.cam.uchc.edu` line from the file, open another terminal on your local machine (do not connect to the cluster), paste the copied command, and hit ENTER. Provide your password if asked.
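In general form, the tunnel maps a local port to the rserver port on the compute node. The placeholders below are illustrative, not literal values; take the real values from your own `rstudio-server.err`:

```bash
ssh -N -f -L <local_port>:<compute_node>:<rserver_port> <netid>@xanadu-submit-int.cam.uchc.edu
```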
Copy `http://localhost:15072` from `rstudio-server.err`, open a browser on your local machine, paste the address, and hit ENTER. If successful, you will see RStudio running off the cluster.
**Install the packages required for your analysis through RStudio, and you are ready to go.**
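Because the script points `R_LIBS_USER` at `Library.Packages`, installed packages should land there. A quick check from a cluster shell (path relative to where you created `RStudioInstances`):

```bash
ls RStudioInstances/Library.Packages
```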
CLOSING SESSION
- STEP 1: Save your scripts, etc.
- STEP 2: Exit the RStudio session ("power" button in the top right corner of the RStudio window).
- STEP 3: Close the SSH tunnel with the commands below (replace 8889 with your port number):

```bash
lsof -i tcp:8889
```

This will list the process ID (PID). Kill the process:

```bash
kill -9 <PID>
```
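Equivalently, as a one-liner (`lsof -t` prints just the PID; again, replace 8889 with your port):

```bash
kill -9 $(lsof -t -i tcp:8889)
```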
- STEP 4: Log on to the Xanadu cluster and cancel the job with `scancel`:

```bash
scancel <JOBID>
```
Please be considerate and do not spin up multiple sessions.