The SURFsara Data Archive allows the user to safely archive up to petabytes of valuable research data. The Data Archive uses tape library technology to store data sets for the long term and allows access at any time.

Data ingested into the SURFsara Data Archive is kept in two tape libraries at two different locations in Amsterdam, the Netherlands. The Data Archive is connected to our compute infrastructure via a fast network connection, which allows quick staging of archived data. Users are given a login, which enables immediate, 24/7 access to the service.

Why?

Modern scientific research tends to generate more and more data. It would be possible to keep this data directly accessible online in the HOME directories at all times, but that would be too costly. Experience shows that very large datasets are not always needed online, so they can be moved to tape, a much cheaper solution than disk. With tape robots and appropriate software, storage on tape is almost transparent to the end user.
 

The Data Archive supports:

  • Long-term and safe preservation of data
  • High-level support concerning the optimal use of the service

Data access is facilitated by several data transfer protocols that can be employed in a Linux or Windows environment:

  • Via the internet, using SSH, (HPN)SCP, SFTP, rsync, GridFTP;
  • Via iRODS federations that allow implementation and execution of user-defined data policies. Currently, the EUDAT-B2SAFE policies are available: http://www.eudat.eu/services/b2safe
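As an illustration of the rsync-over-SSH route listed above, the following minimal Python sketch builds the command line for a transfer. The host name and login are hypothetical placeholders, not the real service address; use the values supplied with your account. The sketch only constructs the command and leaves running it to you.

```python
import shlex

# Hypothetical values: replace with the login and host name issued with your account.
ARCHIVE_HOST = "archive.example.org"
ARCHIVE_USER = "jdoe"


def rsync_command(local_path, remote_dir, user=ARCHIVE_USER, host=ARCHIVE_HOST):
    """Build an rsync-over-SSH command that copies local_path into remote_dir."""
    return [
        "rsync",
        "--archive",   # preserve permissions, timestamps, and directory structure
        "--partial",   # keep partially transferred files so a retry can resume
        "-e", "ssh",   # use SSH as the transport
        local_path,
        f"{user}@{host}:{remote_dir}",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before it is run.
    print(shlex.join(rsync_command("results.tar", "projects/run42/")))
```

The returned list can be passed to `subprocess.run` once the placeholders have been replaced.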

For more details and user stories, please refer to the SURF website. For access and account information, visit the access page.

How?

Disk space capacity in the archive is handled by selecting which file systems the Data Migration Facility (DMF) will manage and by specifying the volume of free space that will be maintained on each file system. Space management begins with a list of user files that are ranked according to file size and file age.
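The ranking idea can be sketched in a few lines of Python. This is an illustration only: the weighting used here (size multiplied by age) and the names `FileInfo`, `rank`, and `select_candidates` are assumptions made for the example, not the policy DMF actually applies.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    age_days: float


def rank(f):
    # Hypothetical weighting: larger and older files rank higher.
    return f.size_bytes * f.age_days


def select_candidates(files, free_bytes, target_free_bytes):
    """Pick files, highest rank first, until the free-space target is met."""
    selected = []
    for f in sorted(files, key=rank, reverse=True):
        if free_bytes >= target_free_bytes:
            break
        selected.append(f)
        free_bytes += f.size_bytes  # space gained once this file's blocks are released
    return selected
```

With a target of 120 free bytes and files of 50, 100, and 500 bytes aged 100, 10, and 1 days, the sketch selects the 50-byte and 100-byte files, because their size-times-age rank is highest and together they reach the target.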


Users can access the archive infrastructure via the computing infrastructure of SURF (shown in red) or directly via service nodes, iRODS, and the repository (shown in blue). High-performance transfer protocols are possible using dCache (shown in green).

Once the data is in the DMF file system, it can be copied to two tape libraries, where it is safely stored offline. Data migration occurs in two stages. First, a file is migrated to an offline medium (tape). Once the offline copy is secure, the file is eligible to have its data blocks released (this usually occurs after a minimum space threshold is reached). A file with all offline copies completed is called fully backed up. A file that is fully backed up but whose data blocks have not been released is called a dual-state file; its data exists both online and offline simultaneously. After a file's data blocks have been released, the file is called an offline file.
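The file states described above can be summarised as a small classification function. This is a sketch under stated assumptions: the names `REGULAR` and `MIGRATING` are illustrative labels for the stages before a file is fully backed up, and the default of two required copies reflects the two tape libraries mentioned on this page.

```python
from enum import Enum


class FileState(Enum):
    REGULAR = "regular"        # illustrative label: no offline copy yet
    MIGRATING = "migrating"    # illustrative label: some offline copies written
    DUAL_STATE = "dual-state"  # fully backed up, data blocks still on disk
    OFFLINE = "offline"        # fully backed up, data blocks released


def classify(copies_done, blocks_released, copies_required=2):
    """Map the migration progress of a file to one of the states above."""
    if not 0 <= copies_done <= copies_required:
        raise ValueError("copies_done out of range")
    fully_backed_up = copies_done == copies_required
    if blocks_released:
        # Blocks may only be released once every offline copy is complete.
        if not fully_backed_up:
            raise ValueError("blocks released before file was fully backed up")
        return FileState.OFFLINE
    if fully_backed_up:
        return FileState.DUAL_STATE
    return FileState.MIGRATING if copies_done > 0 else FileState.REGULAR
```

For example, `classify(2, False)` returns `FileState.DUAL_STATE`, and `classify(2, True)` returns `FileState.OFFLINE`.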

Migrated files remain catalogued in their original directories and are accessed as if they were still on disk. The only difference users might notice is a delay in access time.
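Because an offline file still appears in its directory, a script may want to guess in advance whether reading it will incur the delay. The sketch below uses a common heuristic, which is an assumption and not a documented feature of this service: a non-empty file with no allocated disk blocks probably has its data on tape only. Sparse files can also match, and `st_blocks` is only available on Unix-like systems.

```python
import os


def looks_offline(size_bytes, allocated_blocks):
    """Heuristic: a non-empty file with zero allocated blocks is probably offline."""
    return size_bytes > 0 and allocated_blocks == 0


def path_looks_offline(path):
    # Assumption: the file system reports released data blocks as st_blocks == 0.
    st = os.stat(path)
    return looks_offline(st.st_size, st.st_blocks)
```

A batch job could call `path_looks_offline` on its input files and schedule the ones that return `False` first.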

Obtain an account

We are pleased to help you gain access to the Data Archive, answer your questions, or assist you with specific requests you may have about the service and SURFsara. Please contact us by email via helpdesk@surfsara.nl.

Specific details on obtaining accounts for affiliates of Dutch universities or Grand Technology Institutes can be found on our main website. Detailed information on the Data Archive is provided in the Service Level Specification.
