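There are several ways to migrate data between two instances. Which method suits you best depends on factors such as the size of your dataset, your available bandwidth and local storage, and whether timestamps must be preserved.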

This tutorial explains these methods and gives examples of how to use them. In the examples, we migrate a dataset from the source, researchdrive.surfsara.nl, to the destination, the branded instance of the fictional Miskatonic University.


The methods

There are three methods:

  1. Local copy
    Easy, but limited
  2. Client-side copy
    Slow, but thorough
  3. Server-side copy
    Fast, but with caveats


Overview local copy

If all your data is stored locally, or you are able to download everything (possibly with the help of the sync client), then migrating is just a matter of re-uploading the data to the target environment. The recommended tools are the sync client or rclone.

Downsides
  • Your entire dataset must fit on your computer, or you must be willing to download/upload in parts
  • The process is manual
  • Your computer needs to be online throughout the process
  • Downloading and uploading consume your bandwidth
Upsides
  • The process is easy: if you know how to upload files to Research Drive, you know how to do this
  • Errors are easily spotted
Recommended for
  • Small datasets
Main documentation
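
Before starting a local copy, it can help to check whether the dataset fits on your computer at all. The following is a minimal sketch, not part of the Research Drive tooling: the function name `fits_locally` and the 10% headroom are assumptions chosen for illustration, and you need to know the dataset size in bytes yourself.

```python
import shutil


def fits_locally(dataset_bytes, path=".", headroom=0.10):
    """Return True if dataset_bytes plus some headroom fits on the disk holding path.

    The headroom fraction is an arbitrary illustrative choice (assumption).
    """
    free = shutil.disk_usage(path).free  # free bytes on the filesystem containing path
    return dataset_bytes * (1 + headroom) <= free


# Hypothetical example: a 50 GB dataset downloaded into the current directory.
print(fits_locally(50 * 10**9))
```

If the check fails, the dataset has to be downloaded and uploaded in parts, or one of the other methods may be a better fit.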


Overview client-side copy

Client-side means that a process on your side downloads all the files from the source and uploads them to the destination. In most cases, this is an rclone process running on your laptop.

Downsides
  • The process is slow
  • All files in the dataset will be downloaded and re-uploaded using your computer
  • Downloading and uploading consume your bandwidth
Upsides
  • The process can be fully automated
  • Although all files pass through your computer, they are not stored there, so you don't need much local storage
  • The process can be interrupted and restarted later
  • The process is thorough: it skips files that are already copied, but it detects differences between source and destination and re-uploads any file that differs
  • Timestamps are preserved
Recommended for
  • All migrations
Main documentation
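
As a rough illustration of what a client-side copy looks like in practice, the sketch below assembles an `rclone copy` command line. The remote names `source:` and `destination:` and the folder name `dataset` are hypothetical; they assume you have already configured both instances as remotes with `rclone config`. The helper `build_rclone_copy` is invented for this example and only builds the command, it does not run it.

```python
import shlex


def build_rclone_copy(source, destination, dry_run=True):
    """Build the argument list for an rclone copy from source to destination.

    With dry_run=True, rclone only reports what it would transfer.
    """
    cmd = ["rclone", "copy", source, destination, "--progress"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Hypothetical remotes and folder names (assumptions, not real configuration).
cmd = build_rclone_copy("source:dataset", "destination:dataset")
print(shlex.join(cmd))

# To actually run it, one could pass the list to subprocess.run(cmd, check=True).
```

Starting with a dry run is a cautious choice: it shows which files rclone would transfer without changing anything on the destination. Because `rclone copy` skips files that are identical on both sides, the same command can be rerun after an interruption.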


Overview server-side copy

Server-side means that files are migrated directly between the source and destination instances without passing through your computer. An active process on your computer only instructs the servers to do the work.

Downsides
  • Timestamp information will be lost
  • You need permission to create a federated share
  • The process does not cope well with differences between source and destination
  • The process cannot overwrite files on the destination
  • You will likely need to run a client-side migration after the server-side migration is done
  • Error messages are likely to occur and are often obscure, unrelated to the actual problem, or difficult to interpret
Upsides
  • The process is fast
  • Files don't pass through your computer, so bandwidth usage is minimal
Recommended for
  • Large datasets where timestamp information is not relevant
Caveat
  • Because the server-side copy cannot overwrite files on the destination, it is highly recommended to run a client-side copy after the server-side copy has finished if there is any chance of changes on the source. The client-side copy can overwrite files on the destination, but skips files that are already the same on both source and destination, so its slowness is less of an issue.
  • Timestamps of files will be reset to the moment of the migration. With a client-side copy, timestamp information is kept. Also, many tools will see the files on the destination as newer than those on the source, which may interfere with data integrity checks.

All in all, these caveats may outweigh the performance benefit, but this depends on your use case.
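
To make the caveats concrete, the sketch below shows one way to decide which files still need a client-side re-upload after a server-side copy. Since timestamps on the destination are reset, it compares size and checksum instead of modification time. The function name, the listing format (a mapping from path to a `(size, checksum)` pair), and the sample data are all assumptions for illustration; obtaining such listings from the two instances is not shown.

```python
def files_to_reupload(source, destination):
    """Return sorted paths that are missing on the destination or whose
    (size, checksum) pair differs from the source.

    Timestamps are deliberately ignored, because a server-side copy resets them.
    """
    return sorted(
        path
        for path, fingerprint in source.items()
        if destination.get(path) != fingerprint
    )


# Hypothetical listings: path -> (size in bytes, checksum).
source_listing = {
    "data/a.csv": (120, "aa11"),
    "data/b.csv": (300, "bb22"),
    "notes.txt": (42, "cc33"),
}
destination_listing = {
    "data/a.csv": (120, "aa11"),   # identical, can be skipped
    "data/b.csv": (280, "bb99"),   # changed on the source after the server-side copy
}

print(files_to_reupload(source_listing, destination_listing))
```

In this sketch only the changed and missing files would be transferred again, which is why the follow-up client-side copy is usually much quicker than a full one.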

Main documentation