Over the next few weeks, following a multi-million-pound NERC investment, JASMIN will undergo a major capacity and capability upgrade. Read on for details of what’s happening and how JASMIN Phase 4 will affect you.
After 6 years of faithful service, over 5 Petabytes (PB) of storage from JASMIN Phase 1 needs to be retired and taken out of operation. By the end of 2018, a further 6 PB from JASMIN Phase 2 will also need to be retired. However, a total of 38.5 PB of new storage is currently being added, consisting of:
These will be brought into operation in stages over the coming weeks and months. Apart from the extra volume, the biggest change is the move from a single type of storage to several different types. These storage types will allow more scalability and more flexibility in how we support the various JASMIN users, and the increased flexibility will keep JASMIN at the cutting edge of petascale environmental science.
See the sections below for further important information:
Along with the increased storage, we will be deploying 210 additional servers for the LOTUS batch cluster and the JASMIN community cloud, and 10 more servers for the JASMIN Data Transfer Zone.
All this new storage and compute will require an upgrade to the internal network and many software upgrades.
The software changes will include:
Watch this space for further details about these new developments and the benefits they will bring to users of JASMIN.
Dedicated flash-based storage has been purchased for user home directories. This will enable users to have a larger home directory quota (just how big is yet to be decided), and should significantly increase performance for tasks that involve handling small files (for example, code compilation). It should also increase system uptime and perceived reliability, by decoupling home storage from the high-volume storage.
We will be migrating all user home directories to this new storage on Wednesday 14th March. Users will not be able to access their home directories, and will therefore be unable to log in to JASMIN systems on this date. The operation of other JASMIN and CEDA services should be considered “at risk” on this date.
We have already (behind the scenes) migrated over 2 PB of CEDA Archive data from its previous location to some of the new storage detailed above. We now need to complete a similar task for those Group Workspaces which currently reside on JASMIN Phase 1 storage. This task must be complete by the end of May 2018, so it will start immediately, before all the new storage described above is available. This may involve some short-term inconvenience.
Some of the data will be migrated to the first generation of the new Scale-Out FileSystem (SOF) storage, which has different properties from the existing Panasas parallel file system that has provided the bulk of the existing JASMIN storage.
This affects all group workspaces with the following paths:
In all cases, the path to the workspace WILL change. It will likely be of the form /group_workspaces/jasmin/NAME. Note that the cems prefix is now deprecated and will no longer be used for newly-created storage volumes: the infrastructure is now simply known as JASMIN.
In SOME cases (denoted by paths /group_workspaces/jasmin4/NAME) the destination storage will be SOF storage, which is not (in its current configuration) capable of shared-file MPI-IO. If you run code which makes use of this feature, and this is essential to your work, please let the JASMIN Team know via the CEDA Helpdesk firstname.lastname@example.org so that we can address this (and consider an alternative destination). These codes MUST NOT be run against storage without this capability.
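As a guard against running shared-file MPI-IO code on SOF storage by accident, a script could check its output path before starting. The following is a minimal sketch, assuming only what is stated above (that paths under /group_workspaces/jasmin4/ denote SOF storage). The function name on_sof_storage is illustrative and not part of any JASMIN tooling, and the check is purely textual: it does not resolve symbolic links.

```python
from pathlib import PurePosixPath

# Prefix that, per the announcement, denotes SOF storage.
SOF_ROOT = PurePosixPath("/group_workspaces/jasmin4")


def on_sof_storage(path):
    """Return True if `path` is the jasmin4 prefix or lies beneath it.

    Illustrative helper: compares path components only, so a symlink
    pointing into jasmin4 from elsewhere would not be detected.
    """
    p = PurePosixPath(path)
    return p == SOF_ROOT or SOF_ROOT in p.parents


if __name__ == "__main__":
    # NAME is the placeholder used in the announcement, not a real workspace.
    target = "/group_workspaces/jasmin4/NAME/output.nc"
    if on_sof_storage(target):
        print("Warning: target is on SOF storage; do not use shared-file MPI-IO here.")
```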
Although data in GWS on cems2 will not be moving in the first phase, some of the moving data will move onto the same underlying storage, which may cause some short-term limits on expansion within those GWS. At some time during 2018 all these data will be migrated as well, so that the Phase 2 storage can be retired.
All migrations will be done behind the scenes, initially to a new (hidden) volume, while the old volume remains live. At a time to be arranged with the Group Workspace Manager, a final “sync” will take place before the new volume is renamed to replace the old. During this short period (hours) the GWS will not be available. Once the change has taken place, the old volume will no longer be accessible.
Unfortunately, because of the timing of the various procurements and retirements, and the reorganisations necessary to take advantage of the new storage, some GWS may require more than one move during 2018. We will of course try to minimise disruption.
Workspaces under /group_workspaces/jasmin4 are now automounted. This means that they are not mounted by a particular host until the moment they are first accessed. If the workspace you are expecting to see is not listed at the top level (/group_workspaces/jasmin4/) you should ls the full path of the workspace, and after a very short delay the workspace should appear. This also explains reports of different workspaces being mounted on different JASMIN machines: only those which are being actively used are mounted at any one time.
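The same approach works from a script: accessing the full path of the workspace, rather than listing the top-level directory, is what prompts the mount. The following is a minimal Python sketch of that idea. The function name ensure_mounted, the retry count and the delay are illustrative assumptions, not part of any JASMIN tooling, and NAME stands for your own workspace name.

```python
import os
import time


def ensure_mounted(path, attempts=3, delay=1.0):
    """Access `path` directly so that an automounter (if present) mounts it.

    Returns True once the path can be listed as a directory, False if it
    still cannot be listed after `attempts` tries. The default attempts
    and delay are arbitrary illustrative values.
    """
    for attempt in range(attempts):
        try:
            # Listing the full path is the scripted equivalent of `ls path`.
            os.listdir(path)
            return True
        except OSError:
            # Not available (yet): pause before retrying, but not after
            # the final attempt.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


if __name__ == "__main__":
    # NAME is the placeholder used in the announcement.
    print(ensure_mounted("/group_workspaces/jasmin4/NAME"))
```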
On 14th March, /work/scratch (used as intermediate storage by LOTUS jobs, not for general interactive use) will be set up on new storage. The following steps will take place:
/work/scratch has now been renamed /work/scratch-OLD and made READ-ONLY. Any data you may have stored there will NOT BE MIGRATED FOR YOU, as it is only intended for LOTUS intermediate storage, not for general use. The storage system housing this volume will be turned off mid-April, so act now if you need to migrate any of your data from this volume.
A new /work/scratch area has been created (same size) on newer storage. However, you should configure your software to use this ONLY if you think you need shared file writes with MPI-IO.
/work/scratch-nompiio has been created (size 250 TB) on new flash-based storage, which should have significant performance benefits, particularly for operations involving lots of small files.
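The choice between the two scratch areas can be made explicit in job set-up code. The sketch below assumes only the guidance above: use /work/scratch when shared-file writes with MPI-IO are needed, and /work/scratch-nompiio otherwise. The function name choose_scratch is hypothetical and is not part of LOTUS or any JASMIN tooling.

```python
# Paths as named in the announcement.
SCRATCH_MPIIO = "/work/scratch"            # for shared-file writes with MPI-IO
SCRATCH_NOMPIIO = "/work/scratch-nompiio"  # flash-based, suited to many small files


def choose_scratch(needs_shared_file_mpiio):
    """Return the scratch area to use for a job (illustrative helper).

    Defaults to the flash-based area unless the job explicitly needs
    shared-file MPI-IO, following the guidance in the announcement.
    """
    return SCRATCH_MPIIO if needs_shared_file_mpiio else SCRATCH_NOMPIIO


if __name__ == "__main__":
    print(choose_scratch(needs_shared_file_mpiio=False))
```

Making the flash-based area the default reflects the advice that /work/scratch should be used only when MPI-IO shared writes are required.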
In advance of the above work, a reservation is already in place to drain LOTUS of jobs so that the work can take place as scheduled. From today, 7th March 2018, 7-day jobs will not be scheduled. From 8th March, 6-day jobs will not be scheduled, and so on. Normal LOTUS service should resume once the work has completed on 14th March.
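The drain schedule follows from simple date arithmetic: a job that may run for N days can no longer be scheduled once fewer than N days remain before the work begins. The sketch below illustrates this under the assumption that the reservation is counted in whole days up to 14th March 2018; the scheduler's actual reservation may be defined by time of day, and the function name cutoff_date is illustrative only.

```python
from datetime import date, timedelta

# Date of the storage work, from the announcement.
WORK_DATE = date(2018, 3, 14)


def cutoff_date(job_days, work_date=WORK_DATE):
    """Date from which jobs of `job_days` days are no longer scheduled.

    Assumes whole-day granularity: a job of N days must start more than
    N days before the work date in order to finish in time.
    """
    return work_date - timedelta(days=job_days)


# Print the cutoff for each job length from 7 days down to 1 day.
for days in range(7, 0, -1):
    print(f"{days}-day jobs: not scheduled from {cutoff_date(days).isoformat()}")
```

For 7-day jobs this gives 2018-03-07, matching the date stated above.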
Further details to follow soon