Synopsis

Generally, there are two ways to transfer files between the HPC systems and your PC:

  1. Via the terminal, using commands such as scp, rsync, or rclone  
  2. Via an FTP file browser


Transferring files using the terminal

scp

The scp command works much like the copy (cp) command on Linux, except that it copies data between different machines. It uses the SSH protocol to transfer the data securely. The following commands are all issued from a terminal on your local machine. For MobaXterm users: you can use the plus sign to open a new tab, in which a local terminal will open.

To copy a file sourcefile to the $HOME/destinationdir folder on the HPC systems, use

Snellius
scp sourcefile <username>@snellius.surf.nl:destinationdir


Alternatively, you can rename the file on the HPC system (it will overwrite the existing destinationfile if it exists) by issuing the command

Snellius
scp sourcefile <username>@snellius.surf.nl:destinationdir/destinationfile


To copy a complete directory, use the -r argument, e.g.

Snellius
scp -r sourcedir <username>@snellius.surf.nl:destinationdir


This will create the directory destinationdir/sourcedir on the HPC system and copy the contents of your local sourcedir to that new directory.

To copy from the HPC system to your local PC, simply reverse the order of arguments. For example, to copy the file sourcefile from the remote home directory to the local home directory (~), use

Snellius
scp <username>@snellius.surf.nl:sourcefile ~
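
To double-check that a transferred file arrived intact, you can compare checksums on both sides. The sketch below is one possible way to do this, not an official tool: the helper names local_sha256 and verify_upload are made up for illustration, and it assumes that sha256sum is available both locally and on the HPC system (on macOS you may need shasum -a 256 instead) and that the remote path contains no spaces.

```shell
#!/bin/bash
# Print only the SHA-256 hash of a local file.
local_sha256() {
    sha256sum "$1" | awk '{print $1}'
}

# Compare a local file with its copy on the remote system (hypothetical helper).
# Arguments: local file, ssh destination, path of the copy on the remote system.
verify_upload() {
    local file="$1"
    local host="$2"
    local remote_path="$3"
    local local_sum remote_sum
    local_sum=$(local_sha256 "$file")
    # Runs sha256sum on the remote system over ssh.
    remote_sum=$(ssh "$host" sha256sum "$remote_path" | awk '{print $1}')
    if [ "$local_sum" = "$remote_sum" ]; then
        echo "OK: checksums match"
    else
        echo "MISMATCH: $file differs from $host:$remote_path" >&2
        return 1
    fi
}

# Example (not executed here):
#   verify_upload sourcefile <username>@snellius.surf.nl destinationdir/sourcefile
```

If the ssh call fails, the remote checksum is empty and the helper reports a mismatch, so a failed connection is never mistaken for a successful check.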

Rclone

What is Rclone?

Rclone is a program that is often dubbed the Swiss army knife of cloud storage. It can synchronize files and, like rsync, can resume a transfer after it was interrupted, even in the middle of a file. It can mount remote directories like SSHFS, or simply list or compare the contents of remote or local directories. It includes backends for practically any type of cloud or remote storage, from SFTP to Google Drive, including ResearchDrive, Surfdrive, or plain FTP. You can find the full list of capabilities, options, and backends on Rclone's website:
https://rclone.org/. Rclone is installed on the login nodes of Snellius and Lisa and does not require loading any module.

How does it work?

Unlike many other programs, Rclone requires that every remote connection (called a remote in Rclone lingo) is configured beforehand. The advantage is that you can then call Rclone with very simple command syntax: rclone command <remote>: (note the colon after the remote name; without it, Rclone treats the name as a local path). Below are two example cases. The first shows how to list the content of your Google Drive in a handy tree format, and the second how to check whether your Dropbox bucket is in sync with the local copy:

rclone tree googledrive:
rclone check dropbx01: mylocalcopy

To be able to do that, you first need to create a configuration for the remote connection (remote). This configuration is saved to your home directory and is persistent.
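
To see whether any remotes are already configured, and where the configuration is stored, something like the following sketch can help. The function name show_rclone_setup is made up for illustration; rclone config file and rclone listremotes are standard Rclone subcommands.

```shell
#!/bin/bash
# Show where rclone keeps its configuration and which remotes are defined.
show_rclone_setup() {
    if ! command -v rclone >/dev/null 2>&1; then
        echo "rclone not found in PATH" >&2
        return 1
    fi
    rclone config file    # prints the location of the configuration file
    rclone listremotes    # prints one configured remote per line, with a trailing colon
}

# Example (not executed here):
#   show_rclone_setup
```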

Creating the config file

For this example, we will assume that you are working on Snellius and want to transfer data from a hypothetical HPC cluster called "fubar" that accepts SFTP/SSH connections and has the hostname fubar.wherever.org.

  1. Type rclone config
  2. Select option n (New remote)
  3. name> enter a name for the remote, for example fubar
  4. Search the list for the SSH/SFTP connection type (usually option 33, but it can vary)
  5. Type in the number (for instance 33)
  6. host> fubar.wherever.org
  7. user> leave blank
  8. port> leave blank
  9. SSH password: leave blank if in doubt
  10. key_pem: leave blank if in doubt
  11. key_file: leave blank if in doubt
  12. Only PEM encrypted key files: leave blank
  13. key_use_agent: leave blank
  14. use_insecure_cipher: leave blank
  15. disable_hashcheck: leave blank
  16. Edit advanced config: No
  17. Confirm with y (or restart with e if you made an error)
  18. q to quit
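
After these steps, the configuration file should contain an entry roughly like the one written by the sketch below. The remote name fubar and the scratch file name rclone-sketch.conf are assumptions for illustration only; your real configuration lives in the file reported by rclone config file, and Rclone may add further keys of its own.

```shell
#!/bin/bash
# Write an example SFTP remote to a scratch file (not your real configuration).
conf="${TMPDIR:-/tmp}/rclone-sketch.conf"
cat > "$conf" <<'EOF'
[fubar]
type = sftp
host = fubar.wherever.org
EOF
echo "Example configuration written to $conf"

# To try it without touching your real configuration (not executed here):
#   rclone --config "$conf" lsd fubar:
```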

Using Rclone to copy your data

For this example, we will copy a directory from Fubar to Snellius. The hypothetical directories are:

  • Source on Fubar: /project/original
  • Target on Snellius: /projects/0/snelliusspace

The commands below are run on Snellius and assume that the remote for Fubar is named fubar. To synchronize all the content of the directory on Fubar to the target directory on Snellius, run:

rclone sync fubar:/project/original /projects/0/snelliusspace

Synchronizing makes the target identical to the source. Only the target is changed: files added to the source are copied to the target on the next sync, and files in the target that no longer exist in the source are deleted.

If you just want to copy the data, you can use the command below. This also makes it possible to restart a copy if the connection drops. It works at block level, which means that it can resume the transfer even in the middle of a file, saving time:

rclone copyto fubar:/project/original /projects/0/snelliusspace

And, finally, to check that everything was copied correctly:

rclone check fubar:/project/original /projects/0/snelliusspace
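
Because rclone sync deletes files from the target that are not present in the source, it can be worth previewing the changes first. The sketch below, with the made-up helper name sync_with_preview, first runs the sync with Rclone's --dry-run flag, which only reports what would be done, and then asks for confirmation before running the real sync.

```shell
#!/bin/bash
# Preview an rclone sync, then run it for real only after confirmation.
# Arguments: source and target, in the same form as for rclone sync.
sync_with_preview() {
    local src="$1"
    local dest="$2"
    local answer
    # --dry-run reports the planned copies and deletions without changing anything.
    rclone sync --dry-run "$src" "$dest" || return 1
    printf 'Proceed with the real sync? [y/N] ' >&2
    read -r answer
    if [ "$answer" = "y" ]; then
        rclone sync "$src" "$dest"
    else
        echo "Aborted; nothing was changed."
    fi
}

# Example (not executed here):
#   sync_with_preview <remote>:<source directory> <target directory>
```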

Rclone has a wide variety of backends, which you can find on the Rclone site. It is also widely used in our own services, for instance for connecting your ResearchDrive storage to the Data Archive.

Rclone does not support 2FA right now. Should you have this option enabled, you can use rsync instead.

Other remote file-copying tools

  1. sftp
  2. rsync
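
As a rough illustration of rsync, the sketch below uploads a directory in the same way as the scp -r example. The helper name rsync_to_snellius is made up for illustration; the rsync options used are standard ones.

```shell
#!/bin/bash
# Upload a directory to Snellius with rsync (hypothetical helper).
# -a recurses into directories and keeps permissions and timestamps,
# -v lists the transferred files,
# --partial keeps partially transferred files so that a rerun can reuse them.
# Arguments: user name, local source, destination directory on Snellius.
rsync_to_snellius() {
    local user="$1"
    local src="$2"
    local dest="$3"
    rsync -av --partial "$src" "$user@snellius.surf.nl:$dest"
}

# Example (not executed here):
#   rsync_to_snellius <username> sourcedir destinationdir
```

Note that rsync treats a trailing slash on the source specially: sourcedir creates destinationdir/sourcedir on the remote side, whereas sourcedir/ copies only the contents of sourcedir into destinationdir. Rerunning the same command transfers only files that are missing or have changed.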

Transferring files using a file browser

FTP file browsers are generally very similar to e.g. the Windows File Explorer and Finder on macOS. In most FTP browsers, you can drag and drop files to copy them from your own PC to the HPC system, or vice versa. There is one difference with Windows File Explorer and Finder: folders are not automatically updated if their content is changed. So, if a file is added or removed (through a command in the terminal, or by one of your programs), this change does not show up in the FTP file browser until you refresh the folder.

Generally, FTP browsers support SFTP, i.e. the SSH File Transfer Protocol. Like scp, SFTP allows safe, encrypted transfer of data.

SFTP for MobaXterm users

MobaXterm has an integrated FTP file browser. Once you have logged in to the HPC system, you will see the file browser to the left of the terminal window, where it shows the contents of your home folder. You can browse through these folders, and drag and drop files and folders between this FTP file browser and the Windows File Explorer. Alternatively, you can use the download/upload buttons at the top of the FTP file browser window. A green refresh button is also located there to refresh the contents of the current folder. You can also open files in the FTP file browser to edit them directly. Upon saving, you'll be asked if you want to change these files on the HPC system.

Other SFTP browsers

There are a large number of free FTP browsers out there. Some examples are

  1. FileZilla (Windows, macOS, Linux)
  2. Cyberduck (Windows, macOS)
  3. WinSCP (Windows)
  4. gFTP (Linux)

Although you'll need to read the documentation on these specific FTP browsers to know how to use them, some aspects are generic. In each of these programs, you'll need

  • The name of the host you want to connect to: snellius.surf.nl 
  • The protocol that should be used for connecting: SCP, SSH2, SFTP or similar
  • The port number: 22 (although many FTP browsers 'guess' this automatically)

Using SFTP browsers with 2-factor authentication

If 2-factor authentication is enabled for your login, you should make sure that your SFTP client allows you to input your OTP token when connecting. Generally, that means

  • you don't want your SFTP client to store your password
  • you'll probably want your SFTP client configured to limit the number of connections to 1, in order to prevent it from asking for your OTP token multiple times

The way to set this differs per client, but for example in FileZilla you can set the 'Logon Type' to 'Interactive' and in the 'Transfer settings' tab tick 'Limit number of simultaneous connections' (and set it to 1). Other clients probably offer similar options, but please refer to the manual of your client.

Transferring data to and from the archive

On Snellius, it is recommended to use the staging partition to copy large datasets from or to the Data Archive. For example, to stage a file from the archive to the scratch filesystem, you can either invoke srun directly

srun -t 1:00:00 -p staging cp -r /archive/<<< your username >>>/<<< file >>> /scratch-shared/<<< your username >>>/

or submit a Slurm job with the following parameters:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=staging  
#SBATCH --time=01:00:00       # estimated runtime
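
The parameters above only form the job header. A complete job script could look like the sketch below; the function name stage_copy and the script name stage.sh are assumptions for illustration, and the copy itself is the same cp -r as in the srun example.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=staging
#SBATCH --time=01:00:00       # estimated runtime

# Copy a file or directory (first argument) into a target directory
# (second argument), creating the target directory if needed.
stage_copy() {
    local src="$1"
    local dest="$2"
    mkdir -p "$dest" && cp -r "$src" "$dest"
}

# Only copy when the script is given a source and a target.
if [ "$#" -eq 2 ]; then
    stage_copy "$1" "$2"
fi
```

Submitted as sbatch stage.sh followed by the archive path and the scratch directory from the srun example, the job copies the data on a node of the staging partition. Arguments placed after the script name are passed on to the script by sbatch.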

Please note that only 1/4 node per job is allowed on the staging partition. It is also possible to use dmftar for optimal archiving on nodes of the staging partition.


Transferring data to and from SURF Research Drive

To use Rclone with Research Drive, an Rclone configuration must be created first. You can specify multiple remotes in a single Rclone config file to create separate connections to Research Drive or other cloud storage. First create an rclone.conf file in $HOME/.config/rclone/ with the following contents:

[RD]
type = webdav
url = https://researchdrive.surfsara.nl/remote.php/webdav/
vendor = owncloud
user = <user name>
pass = <encrypted password> 

...
[uni1]
type = union
upstreams = rand
...
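
The value of pass in the configuration above is not the plain-text password but its obscured form. One way to produce it is the rclone obscure subcommand, which reads the password from standard input when given - as its argument. The sketch below wraps this in a made-up helper, obscure_password, that keeps the password out of your shell history.

```shell
#!/bin/bash
# Read a password without echoing it and print the obscured form
# that can be pasted after "pass =" in rclone.conf.
obscure_password() {
    local password
    printf 'Password: ' >&2
    if [ -t 0 ]; then stty -echo; fi    # hide typing when run interactively
    read -r password
    if [ -t 0 ]; then stty echo; fi
    printf '\n' >&2
    printf '%s\n' "$password" | rclone obscure -
}

# Example (not executed here):
#   obscure_password
```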

If you place the Rclone config file in a non-standard location, you need to run your rclone commands in the following manner:

rclone --config /path/to/rclone.conf <command> ......

Example commands

To get an overview of your current files on Research Drive enter the following:

rclone ls RD:

Copy a source directory to a destination directory:

rclone copy /my/folder RD:my/destination/folder

To mount a Research Drive directory as a file system in user space, run:

rclone mount --use-cookies --timeout 15m RD:[path/to/dir] /path/to/local/mount

 For more information, please check Access Research Drive via Rclone.


Transferring data to and from Grid storage/dCache

The Grid storage/dCache is a storage system that supports many protocols and authentication/authorisation schemes. For WebDAV clients, it allows users to authenticate with username and password (BASIC), X.509, Kerberos, and various bearer tokens, including Macaroons and OpenID-Connect access tokens.

To transfer data to and from Grid storage/dCache on Snellius, you can use your CUA login and password for authentication. The prerequisite for this is that the user mapping has been done for your CUA login and the project dCache storage. If not, please submit a servicedesk ticket.

To begin with, make the Rclone configuration by running rclone with the config option*:

rclone config

*Please note that the Rclone config command seems to only obscure the password automatically when used in the interactive mode. 

Follow the instructions and fill in the information below:

n) New remote
name> <a name for the remote>
Storage> webdav
url> https://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/<project>
vendor> other
user> <your CUA login>
option pass> y) Yes type in my own password
password> <your CUA password>
bearer_token> <Enter (default)>

Your Rclone configuration file contains login information. Therefore it is not only important to keep your rclone.conf file in a secure location, but also recommended to add a password to your configuration. To add a password to your Rclone configuration, execute rclone config:

>rclone config
Current remotes:

e) Edit existing remote
n) New remote
d) Delete remote
s) Set configuration password
q) Quit config
e/n/d/s/q>

Choose s and follow the prompts to set a configuration password. Please note that every time you start rclone you will have to supply the password, and if you forget the password you cannot recover the configuration.
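
As an additional precaution, you can restrict the file permissions so that only your own account can read the configuration. A minimal sketch, with the made-up helper name secure_rclone_config and assuming the configuration is in Rclone's usual location unless another path is given:

```shell
#!/bin/bash
# Make an rclone configuration file readable and writable by its owner only.
# Argument (optional): path to the configuration file.
secure_rclone_config() {
    local conf="${1:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -f "$conf" ]; then
        echo "No configuration file at $conf" >&2
        return 1
    fi
    chmod 600 "$conf"
}

# Example (not executed here):
#   secure_rclone_config
```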

To transfer data from dCache to Snellius using this configuration, run:

rclone copy <remote name>:/path/to/file <directory in Snellius>

Token-based authentication can also be used for Grid storage/dCache. For more information, please see ADA interface or watch Demo: using the Advanced dCache API (ADA) tool.

