Information for GWS Managers

Your Group Workspace

Your GWS is an allocation of high-performance, parallel-access disk, with associated near-line tape storage, for your project. The GWS disk area is designed to be accessed by multiple members of a project, for the purpose of storing and sharing data during the course of bona fide scientific research or a related activity. These resources are valuable commodities and should be managed carefully.

Your responsibilities

As GWS manager, you are responsible for the data in your GWS, and for the users who access it.

User management

In addition to the named GWS manager, other JASMIN users may be granted access to the GWS. A link for applying for access is published on the JASMIN web site, and any user may request access to a GWS through a web form. Submitting this form sends an email to the named GWS manager, who decides whether to approve the request. If the request is approved, the user is granted access by being added to the UNIX group associated with the GWS. This group is usually named gws_<workspace_name>, where <workspace_name> is the name of the GWS.
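
As a rough illustration of the naming convention above, the following Python sketch builds the usual group name for a workspace and lists the users recorded as members of a UNIX group. The function names are hypothetical, "myworkspace" is a placeholder, and the group database on your host may not list users who have the group as their primary group.

```python
import grp


def gws_group_name(workspace_name: str) -> str:
    """Return the usual UNIX group name for a workspace (gws_<workspace_name>)."""
    return f"gws_{workspace_name}"


def group_members(group_name: str) -> list[str]:
    """List the users recorded as members of a UNIX group on this host.

    Raises KeyError if the group does not exist here.
    """
    return sorted(grp.getgrnam(group_name).gr_mem)
```

On a host where the group exists, group_members(gws_group_name("myworkspace")) would return the usernames currently recorded in that group.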

Data management

Data in the GWS (and all associated value) are the sole responsibility of the GWS manager. The data are not backed up, so deleted data cannot ordinarily be recovered unless the GWS manager has taken preventative action beforehand.

Directory structure

The GWS consists of a top-level directory (named after the GWS). All directories and files created beneath this top-level directory are the responsibility of the GWS manager, as are any permissions that define who (either inside or outside the group) can read or write particular data. If you intend to give individual users their own areas within your GWS, the convention is to create a directory named “users”, within which each user creates a directory named after their username, e.g.

top-level directory/
     users/
          fred/
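
The layout above could be created with a few lines of Python. This is a minimal sketch: make_user_area is a hypothetical helper, and the resulting permissions depend on the user's umask and on the permissions already set on the GWS.

```python
import getpass
from pathlib import Path
from typing import Optional


def make_user_area(gws_root: str, username: Optional[str] = None) -> Path:
    """Create <gws_root>/users/<username> if it is missing and return its path."""
    # Default to the name of the user running the script.
    name = username or getpass.getuser()
    user_dir = Path(gws_root) / "users" / name
    # parents=True also creates "users"; exist_ok=True makes repeat calls harmless.
    user_dir.mkdir(parents=True, exist_ok=True)
    return user_dir
```

For example, make_user_area("/group_workspaces/jasmin/myworkspace", "fred") would create users/fred beneath the top-level directory, provided the caller has write access there.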

A service is available whereby particular data in the GWS may be exposed read-only via HTTP to a wider audience. See below for details, but to enable this, a directory named public should be created beneath the top-level directory.

Permissions & ownership

The top-level directory has the group ownership of the associated UNIX group. The “setgid” bit is set on the group permissions of this top-level directory, to ensure that files and directories created beneath this directory inherit (by default) the same group ownership as this top-level directory. This overrides any defaults that the user may have set within their own shell environment.

> ls -ld /group_workspaces/jasmin/myworkspace
drwxrws--- 4 root gws_myworkspace 4096 Nov 26 2014 /group_workspaces/jasmin/myworkspace

The s denotes that the setgid bit is set on the directory.
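
The same check can be scripted. The sketch below uses Python's standard stat module to render a mode in the style of the listing above and to test for the setgid bit; the function names are hypothetical.

```python
import os
import stat


def has_setgid(mode: int) -> bool:
    """Return True if the setgid bit is set in a file mode."""
    return bool(mode & stat.S_ISGID)


def check_directory(path: str) -> tuple[str, bool]:
    """Return the ls-style mode string for path and whether setgid is set."""
    mode = os.stat(path).st_mode
    return stat.filemode(mode), has_setgid(mode)
```

A directory with mode 2770 is rendered by stat.filemode as drwxrws---, matching the listing above, so check_directory on such a directory would return ("drwxrws---", True).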

There are many use cases for a GWS, so it is difficult to dictate or enforce a single policy on file ownership and permissions that suits all of them. GWS managers nevertheless need to define their own ownership and permissions policy, so that users have appropriate levels of access to areas within the GWS that match the needs of their project while preserving sensible data management practices. Allowing all files to be readable only by their creator is unlikely to be useful in most cases. Similarly, if the group ownership is set to the group “users” and the group permissions allow reading (or writing), then all system users (not just members of the workspace group) will be able to read (or write) the files. Again, this is unlikely to be the desired situation in most cases.
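
One way to spot the two problem cases described above is to walk the workspace and report entries that belong to an unexpected group or that the group cannot read. This is only a sketch under assumptions: audit_group_access is a hypothetical helper, and what counts as a problem depends on the policy you choose for your GWS.

```python
import os
import stat


def audit_group_access(root: str, expected_gid: int) -> list[tuple[str, str]]:
    """Report entries beneath root with an unexpected group or no group-read bit."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so that broken symlinks do not raise
            if stat.S_ISLNK(st.st_mode):
                continue  # permissions on the link itself are not meaningful
            if st.st_gid != expected_gid:
                problems.append((path, "unexpected group"))
            elif not st.st_mode & stat.S_IRGRP:
                problems.append((path, "not group-readable"))
    return sorted(problems)
```

The expected group ID can be taken from the top-level directory itself, for example os.stat("/group_workspaces/jasmin/myworkspace").st_gid.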

Security

Data in the GWS are NOT BACKED UP. It is the responsibility of the GWS manager to ensure the safety and security of the data in the GWS. Tools are provided to enable GWS managers to make a secondary copy of the data on tape using the Elastic Tape service (see below), but no provision is made for automated full or incremental backups. If data are deleted, they will not ordinarily be recoverable unless the GWS manager has previously taken action to duplicate them somewhere else.

User account security is very important in a multi-user environment such as JASMIN. As a GWS Manager you have a responsibility to users of your GWS but also to all other GWS users in helping to maintain a safe and secure system in which productive scientific work can be done. There is a strict policy of one-user-one-key, and on no account must any user make use of the SSH key of another user to gain access to any part of the JASMIN infrastructure. Private keys MUST be protected by a strong passphrase. Please encourage adherence to these rules by users of your GWS. Any infringements may be dealt with swiftly by removal of user access. No offensive, obscene or otherwise unauthorised data may be stored in the GWS or anywhere else within JASMIN. Users should not store any data of a personal or sensitive nature in the GWS.

Sharing information about data in a GWS

It is common for a project or scientist to require access to data sets held on another GWS. In some cases two projects might be storing the same data set without knowing.

We encourage all GWS owners and users to share information about the data sets they are storing, using the following approach. Add a text file called dataset_metadata.txt to any directory that you wish to advertise as a "data set". The text file should use the following format (the only required fields are "Dataset long name" and "Dataset short name"):

Dataset description
===================

# NOTE 1: This information will be made public. It is the responsibility of the author to ensure that it is accurate.
# NOTE 2: Any lines beginning with a "#" character will be ignored.
# NOTE 3: To include multiple lines indent the text by four spaces after the first line.

Dataset long name: Long human-readable name for the data set
Dataset short name: short id using lower case, digits and "_" or "-" characters
Usage conditions/licence: Any useful information about the licence conditions
Description: A detailed description of the dataset - if using multiple lines then indent following lines by 4 spaces.
Contact:  simon.human@some.where.com SimonHuman
Keywords:  e.g. climate, projections, global
Acknowledgements: e.g. where the data came from

# For dates please use either "YYYY-MM-DD" format or "PRESENT"
Data start date: 1999-01-01
Data end date: 2015-08-24

# For longitudes and latitudes please just provide two numbers separated by whitespace
West-east extent: -180.0 180.0
North-south extent: 90.0 -90.0

The GWSs are scanned weekly and all information from these files is compiled into a GWS data sets web page.
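
Before the weekly scan picks up a file, it may be worth checking it locally. The sketch below parses the format shown above, using only the rules stated in its notes ("#" lines ignored, continuation lines indented by four spaces) and reports missing required fields. The function names are hypothetical, and the parsing rules of the actual scanner are not documented here, so it may behave differently.

```python
import re

REQUIRED_FIELDS = ("Dataset long name", "Dataset short name")


def parse_dataset_metadata(text: str) -> dict[str, str]:
    """Parse "Field: value" lines into a dict, joining indented continuations."""
    fields: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if raw.startswith("    ") and current is not None:
            fields[current] += "\n" + raw.strip()  # continuation of previous field
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # e.g. the title line and its "=====" underline
        current = key.strip()
        fields[current] = value.strip()
    return fields


def missing_required(fields: dict[str, str]) -> list[str]:
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


def valid_short_name(name: str) -> bool:
    """Check the short name uses only lower case, digits, "_" or "-"."""
    return re.fullmatch(r"[a-z0-9_-]+", name) is not None
```

A file could then be checked with fields = parse_dataset_metadata(open("dataset_metadata.txt").read()), followed by missing_required(fields) and valid_short_name(fields.get("Dataset short name", "")).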

Lifecycle

Change of size

Although it is helpful to provide the best estimate of the required allocation when initially requesting the GWS, a GWS manager may request a change in size (increase or decrease) of the GWS during its lifetime. We strongly encourage you to be realistic about your requirements, so that others can make use of this expensive resource if you will not need it until later in your project, or if you no longer require all the space you originally requested.

Requests for an increase in GWS size will be considered by the “consortium manager” with responsibility for managing an overall allocation to that particular scientific community. Depending on available resources and competing demand, it may not always be possible to increase the allocation, and you may be asked to move data to Elastic Tape to free up disk space.

Keeping informed

Please maintain contact throughout the life of the GWS via the following channels:

If you are aware that a user who has access to the GWS leaves your project or, for whatever reason, no longer needs to be a member of the GWS, please let the helpdesk know, as arrangements may need to be made to transfer the ownership of files and/or directories to another member of the GWS (e.g. the manager) to ensure continued access to the data.
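
To see what would be affected when a user leaves, you can list the files and directories they own within the GWS. The following is a minimal sketch: files_owned_by is a hypothetical helper, and it only reports ownership. It does not change anything.

```python
import os


def files_owned_by(root: str, uid: int) -> list[str]:
    """Return every file and directory beneath root owned by the given user ID."""
    owned = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports on the entry itself rather than a symlink target.
            if os.lstat(path).st_uid == uid:
                owned.append(path)
    return sorted(owned)
```

The numeric user ID for a username can be looked up with pwd.getpwnam("fred").pw_uid from the standard pwd module, and the resulting list can be included in your message to the helpdesk.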

Related Services

Elastic Tape

Details of the Elastic Tape service are available here. As GWS Manager, you are expected to use this service to manage the movement of your data between disk and tape and to make most efficient use of your allocation of online storage.

HTTP access to GWS data

Details of how to set up HTTP access to the public directory of your GWS are available here.
