JASMIN Phase 4

JASMIN set for major upgrade

Over the next few weeks, following a multi-million pound NERC investment, JASMIN will undergo a major capacity and capability upgrade. Read on for details of what’s happening and how JASMIN Phase 4 will affect you.

Important Dates

  • 7 March 2018: LOTUS draining starts; long LOTUS jobs not scheduled.
  • 14 March 2018, 07:00-19:00: Home directory migration; JASMIN UNAVAILABLE.
  • Feb/March 2018: Group Workspace migration; temporary interruptions on a per-GWS basis.

JASMIN Phase 4 Upgrade

Overview

Storage

After 6 years of faithful service, over 5 Petabytes (PB) of storage from JASMIN Phase 1 needs to be retired and taken out of operation. By the end of 2018, a further 6 PB from JASMIN Phase 2 will also need to be retired. However, a total of 38.5 PB of new storage is currently being added, consisting of:

  • 3 PB of Parallel File System storage (PFS)
  • 30 PB of Scale-Out File System storage (SOF)
  • 5 PB of Object Storage (OS)
  • 0.5 PB of dedicated high-performance storage for scratch and home directory use.

These will be brought into operation in stages over the coming weeks and months. Apart from the extra volume, the biggest change is the move from a single type of storage to several different types. These storage types will allow more scalability and more flexibility in how we support the various JASMIN user communities, and the increased flexibility will keep JASMIN at the cutting edge of petascale environmental science.

See the sections below for further important information:

Compute 

Along with the increased storage, we will be deploying 210 additional servers for the LOTUS batch cluster and the JASMIN community cloud, and 10 more servers for the JASMIN Data Transfer Zone.

Network and Software 

Alongside the new storage and compute, an upgrade of the internal network and many software upgrades will be necessary.

The software changes will include:

  • Deploying OpenStack as a replacement for our previous cloud management infrastructure.
  • A new version of the JASMIN Cloud Portal to work with OpenStack.
  • A JASMIN-account based identity service.
  • An external data access service for group workspaces (OPeNDAP for Group Workspaces), enabling GWS managers to expose data more easily.
  • Support for fast access to GWS and CEDA Archive data from within the JASMIN Unmanaged Cloud (also via OPeNDAP; see the sketch after this list).
  • Deploying interfaces to the object stores for climate science data.
  • Support for Cluster-as-a-Service in the JASMIN Unmanaged Cloud.
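
As a rough illustration of the kind of OPeNDAP-based access mentioned above: once a GWS OPeNDAP endpoint is available, data could be read remotely from Python with a DAP-enabled netCDF library. This is a sketch only; the URL below is a placeholder, not a real service endpoint.

    # Hypothetical sketch: open a dataset over OPeNDAP using a netCDF4 build
    # with DAP support. The URL is a placeholder until GWS endpoints are published.
    from netCDF4 import Dataset

    url = "http://gws-opendap.example.jasmin.ac.uk/myworkspace/dataset.nc"  # placeholder
    with Dataset(url) as ds:
        print(list(ds.variables))  # list the variables available in the remote dataset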

Watch this space for further details about these new developments and the benefits they will bring to users of JASMIN.

Home Directory Storage

Dedicated flash-based storage has been purchased for use as storage for user home directories. This will enable users to have a larger home directory quota (just how big is yet to be decided), and should significantly improve performance for tasks involving many small files (for example, code compilation). It should also increase system uptime and perceived reliability, by decoupling home storage from the high-volume storage.

We will be migrating all user home directories to this new storage on Wednesday 14th March. Users will not be able to access their home directories, and will therefore be unable to log in to JASMIN systems on this date. The operation of other JASMIN and CEDA services should be considered “at risk” on this date.

Group Workspace Storage

We have already (behind the scenes) migrated over 2 PB of CEDA Archive data from its previous location to some of the new storage detailed above. We now need to complete a similar task for those Group Workspaces which currently reside on JASMIN Phase 1 storage. This task must be complete by the end of May 2018, so it will start immediately, before all of the new storage described above is available; this may involve some short-term inconvenience.

Some of the data will be migrated to the first generation of the new Scale-Out File System (SOF) storage, which has different properties from the existing Panasas parallel file system that has provided the bulk of JASMIN's storage to date.

This affects all group workspaces with the following paths:

  • /group_workspaces/jasmin/ (not jasmin2)
  • /group_workspaces/cems/ (not cems2)

In all cases, the path to the workspace WILL change. It will likely be of the form /group_workspaces/jasmin[23]/NAME. Note that the cems prefix is now deprecated and will no longer be used for newly-created storage volumes: the infrastructure is now simply known as JASMIN.

  • You are strongly advised to ensure that any scripts, programs or references DO NOT USE hard-coded ABSOLUTE PATHS to the workspace (see the sketch after this list).
  • Please avoid using inter-volume symlinks (this article explains why).
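
For example, a minimal sketch (assuming Python, and a hypothetical environment variable MY_GWS that is not a JASMIN convention) of taking the workspace root from configuration rather than hard-coding it throughout your scripts:

    import os

    # Hypothetical sketch: read the workspace root from an environment variable
    # so that only one setting needs updating when the workspace path changes.
    gws_root = os.environ.get("MY_GWS", "/group_workspaces/jasmin2/myproject")  # example path
    data_file = os.path.join(gws_root, "inputs", "example.nc")
    print("Reading from:", data_file)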

In SOME cases (denoted by paths /group_workspaces/jasmin4/NAME) the destination storage will be SOF storage, which is not (in its current configuration) capable of shared-file MPI-IO. If you run code which makes use of this feature, and it is essential to your work, please let the JASMIN team know via the CEDA Helpdesk (support@ceda.ac.uk) so that we can address this (and consider an alternative destination). Such codes MUST NOT be run against storage without this capability.
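
For reference, here is a minimal sketch (assuming mpi4py; it is not intended to represent any particular user's code) of what a shared-file MPI-IO write looks like. This access pattern, with all ranks writing into a single file, is what the current SOF configuration cannot support.

    # Illustrative only: every MPI rank writes its own slice of one shared file.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    amode = MPI.MODE_WRONLY | MPI.MODE_CREATE
    fh = MPI.File.Open(comm, "/work/scratch/shared_output.dat", amode)  # one file for all ranks

    data = bytearray("output from rank {}\n".format(rank).ljust(64), "ascii")
    fh.Write_at_all(rank * len(data), data)  # collective write into the shared file
    fh.Close()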

Although data in GWS on jasmin2 and cems2 will not be moving in the first phase, some of the data being moved will land on the same underlying storage, which may cause some short-term limits on expansion within those GWS. At some point during 2018, all of these data will be migrated as well so that the Phase 2 storage can be retired.

All migrations will be done behind the scenes initially to a new (hidden) volume, while the old volume remains live. At a time to be arranged with the Group Workspace Manager, a final “sync” will take place before the new volume is renamed to replace the old. During this short period (hours) the GWS will not be available. Once the change has taken place, the old volume will no longer be accessible.

Unfortunately, because of the timing of the various procurements and retirements, and the reorganisations necessary to take advantage of the new storage, some GWS may require more than one move during 2018. We will of course try to minimise disruption.

Where has my group workspace gone?


Please note that storage locations with paths starting /group_workspaces/jasmin4 are now automounted. This means that they are not mounted on a particular host until the moment they are first accessed. If the workspace you are expecting to see is not listed at the top level (/group_workspaces/jasmin4/), you should ls the full path of the workspace, and after a very short delay the workspace should appear. This also explains reports of different workspaces being mounted on different JASMIN machines: only those which are being actively used are mounted at any one time.
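
For example, a short Python sketch (the workspace name below is a placeholder) showing that simply accessing the full path is enough to trigger the mount:

    import os

    # Accessing the full workspace path triggers the automount; listing only
    # /group_workspaces/jasmin4/ beforehand may not show the workspace.
    gws = "/group_workspaces/jasmin4/myworkspace"  # placeholder name
    print(os.listdir(gws))  # first access mounts the volume, then lists its contents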

New Scratch Storage

On 14th March, /work/scratch (used as intermediate storage by LOTUS jobs, not for general interactive use) will be set up on new storage. The following steps will take place:

  • The existing /work/scratch has now been renamed /work/scratch-OLD and made READ-ONLY. Any data you may have stored there will NOT BE MIGRATED FOR YOU (it is only intended for LOTUS intermediate storage, not for general use). The storage system housing this volume will be turned off in mid-April, so act now if you need to migrate any of your data from /work/scratch-OLD.
  • A new /work/scratch area of the same size has been created on newer storage. However, you should configure your software to use this ONLY if you think you need shared-file writes with MPI-IO.
  • A second, larger area, /work/scratch-nompiio (250 TB), has been created on new flash-based storage which should bring significant performance benefits, particularly for operations involving lots of small files. A short sketch of using this area follows this list.
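
As a sketch of using the new area (assuming Python; the directory layout is illustrative, not a JASMIN requirement), a job could create its own working directory under /work/scratch-nompiio and remove it when finished:

    import os
    import shutil
    import tempfile

    # Illustrative sketch: a per-job working directory under the non-MPI-IO
    # scratch area, grouped by the current user's name.
    base = os.path.join("/work/scratch-nompiio", os.environ.get("USER", "unknown"))
    os.makedirs(base, exist_ok=True)
    workdir = tempfile.mkdtemp(prefix="job_", dir=base)
    try:
        print("Working in:", workdir)
        # ... the job's intermediate I/O goes here ...
    finally:
        shutil.rmtree(workdir)  # tidy up scratch space when the job finishes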

In advance of the above work, a reservation is already in place to drain LOTUS of jobs so that the work can take place as scheduled. From today, 7th March 2018, 7-day jobs will not be scheduled; from 8th March, 6-day jobs will not, and so on. Normal LOTUS service should resume once the work has completed on 14th March.
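
The drain schedule simply stops any job that would still be running when the reservation begins; a rough sketch of the arithmetic (dates hard-coded for illustration):

    from datetime import date

    # Rough sketch of the drain logic: jobs long enough to overlap the
    # 14 March reservation are not scheduled.
    reservation = date(2018, 3, 14)
    today = date(2018, 3, 7)  # e.g. the date of this announcement
    blocked = (reservation - today).days
    print("Jobs of {} days or longer will not be scheduled today".format(blocked))  # -> 7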

Compute & Network

Further details to follow soon

Software

Further details to follow soon
