Since JASMIN came into being in early 2012, it has grown significantly not only in scale and complexity but also in the number and variety of users it serves and the types of scientific workflow it supports. As the requirements of its user community evolve, so does JASMIN. The phases below describe the major procurement and upgrade projects which have taken place. These have been complemented by the work of teams within CEDA and STFC's Scientific Computing Department in developing and maintaining the infrastructure and its component services and software, creating the major e-infrastructure facility now familiar to over 1,500 users and 200 science projects.


Phase 1 (2011/2012)

A "super-data-cluster" is born

The initial technical architecture was selected to provide a flexible, high-performance storage and data analysis environment, supporting batch computing, hosted processing and a cloud environment. The CEDA Archive had outgrown its previous hosting environment, and the increasing need for scientific workflows to "bring the compute to the data" drove the development of an infrastructure to support analysis of archive data alongside datasets brought into, or generated by, projects in their own collaborative workspaces.

The first components deployed in this phase were:

  • Low latency core network
  • High-performance disk storage system supporting parallel write
  • Access to expandable tape storage for near-line storage
  • Resources to support bare-metal and virtualised compute
  • A batch scheduler
  • Block storage for storing virtual machine images

A paper describing the initial architecture is available (doi:10.1109/BigData.2013.6691556).
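
To make the "bring the compute to the data" idea concrete, the sketch below shows the kind of analysis task this architecture was designed for: a job running on a LOTUS node reads a dataset directly from the CEDA archive over the parallel file system and writes only its derived output to a group workspace. This is an editor's illustration only; the paths, dataset and variable names are invented, and it assumes a Python environment with xarray available.

```python
# Minimal sketch (not from the JASMIN documentation) of the "bring the compute
# to the data" pattern: read archive data in place, write only derived output
# to project storage. All paths and variable names are hypothetical.
import xarray as xr

ARCHIVE_FILE = "/badc/example_dataset/data/tas_2012.nc"           # hypothetical archive path
WORKSPACE_OUT = "/group_workspaces/jasmin/myproject/tas_mean.nc"  # hypothetical group workspace path

def main():
    # Open the archive file read-only; no copy of the raw data is made.
    ds = xr.open_dataset(ARCHIVE_FILE)

    # Example analysis step: time-mean of a near-surface temperature field.
    result = ds["tas"].mean(dim="time")

    # Write only the (much smaller) derived product to project storage.
    result.to_netcdf(WORKSPACE_OUT)

if __name__ == "__main__":
    main()
```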

Phase 1 details
  • Disk storage: initial fast disk, 4.6 PB at RAL (plus 0.5 PB at Reading and 0.15 PB at Leeds)
  • Batch compute: initial compute for LOTUS, 650 cores
  • Network: initial Gnodal-based network
  • Virtual compute: VM licences (virtualisation software licences for hosting virtual machines)
  • Tape storage: 4 x T10KC tape drives, 2.5 PB media
  • Software: data movement software, Community Intercomparison Suite
  • Other: machine room environment monitoring equipment

Phase 1.5 (2012/2013)

Enabling NERC Big Data projects

Having already demonstrated its ability to facilitate projects with data-intensive workflows, JASMIN was given additional capability to support several NERC "Big Data" projects across a range of disciplines: near-real-time processing of Earth observation (EO) data, Earth-surface deformation analysis and seismic hazard analysis, along with a cloud infrastructure used within the genomics community.

Phase 1.5 details
  • Disk storage: minor addition to fast disk storage, 0.4 PB PFS
  • Batch compute: interim expansion, 1,920 cores
  • Network: core network upgrade
  • Virtual compute: virtualisation licences, expansion of licensed estate
  • Tape storage: 2 x T10KC tape drives & servers, 3.5 PB tape media
  • Software: initial version of the Elastic Tape interface (ET), JASMIN Analysis Platform (JAP)

Phases 2 & 3 (2013-15)

Major expansion over a 2-year period

Having proved its worth as a concept able to facilitate many large data-intensive environmental science projects, JASMIN underwent a major upgrade to provide the necessary storage and compute for its stakeholder community. Its remit now extended beyond the initial NCAS and NCEO stakeholders to serve the whole of the NERC community.

Phases 2 and 3 details
  • Disk storage: major expansion to fast storage (11 PB PFS), block storage for VM hosting (0.9 TB BLK), high-performance storage for databases (0.05 TB high-IOPS BLK)
  • Batch compute: major expansion to LOTUS, 3,800 cores plus 4 high-memory nodes (2 TB RAM), with dual capability as hypervisors for virtual machines or as LOTUS nodes
  • Network: major redesign & implementation
  • Virtual compute: expansion of licensed estate
  • Tape storage: major expansion, 7.5 PB tape media
  • Software: Community Intercomparison Suite (scientific end-user software), JASMIN Cloud Portal (cloud tenancy management interface)
  • Other: user documentation, website, dataset construction

Phase 3.5 (2016-17)

Interim upgrades and strategic proof-of-concept projects

Ahead of larger investments in years to come, limited but carefully targeted upgrades ensured that key systems continued to operate at the scales needed. A proof-of-concept project tested the feasibility of using OpenStack instead of a proprietary solution for JASMIN's growing Community Cloud infrastructure.

Phase 3.5 details
  • Disk storage: object store proof of concept (1.2 PB HPOS), replacement of cloud block storage (0.4 PB BLK), continued use of Phase 1 & 2 storage including battery replacements
  • Batch compute: interim expansion of batch compute (1,120 cores), continued use of Phase 1.5 & 2 compute (~4,000 cores)
  • Network: essential network & firewall support
  • Virtual compute: cloud software support
  • Tape storage: 5 PB tape media
  • Software: OpenStack proof of concept

Phase 4 (2017/18)

Major expansion with new technologies

Phase 4 introduced new types of storage at the scales needed to support scientific workflows into the future. Successful proofs of concept with Scale Out Filesystem (SOF) and high-performance object storage (HPOS) enabled large deployments of both, with SOF adopted as the primary medium for Group Workspace storage and tooling and services under development to enable the use of object storage within cloud-based workflows. LOTUS gained a major upgrade of over 5,000 cores, in a network enhanced for future expansion. Cloud tenancies were migrated to an OpenStack platform and the management interfaces adapted to match. Meanwhile, testbeds for Cluster-as-a-Service and Jupyter Notebooks provided previews of exciting capabilities to come.
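
As an illustration of the cloud-facing object storage workflows this phase targeted, the hedged sketch below reads data from an S3-compatible object store using boto3. The endpoint URL, bucket, object names and credentials are all invented for illustration and do not represent JASMIN's actual tooling or services; the point is that an S3-style API decouples the workflow from any particular POSIX file system mount.

```python
# Hedged sketch (not JASMIN's own tooling) of using S3-compatible object
# storage from a cloud-hosted workflow. Endpoint, bucket, keys and credentials
# are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.jasmin.ac.uk",  # hypothetical HPOS endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# List the objects held under a project prefix...
response = s3.list_objects_v2(Bucket="my-project", Prefix="model-output/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# ...and fetch one of them to local scratch space for analysis.
s3.download_file("my-project", "model-output/run1.nc", "/tmp/run1.nc")
```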

Phase 4 details
  • Disk storage: block storage for cloud (0.4 PB BLK), major expansion of SOF (30 PB), object storage (5 PB HPOS), new SSD for home areas (0.5 PB), replacement of earlier PFS (3 PB)
  • Batch & physical compute: expansion of batch compute (210 servers, 5,040 cores), 10 new servers for the Data Transfer Zone (DTZ)
  • Network: implementation of "super-spine" network to ensure future connectivity on site, expansion & upgrade of the management network
  • Virtual compute: production deployment of OpenStack as the cloud platform, migration of tenancies
  • Software: OpenStack upgrade for the JASMIN Cloud Portal (management capability for OpenStack cloud tenancies), OpenDAP4GWS (autonomous exposure of data from GWSs), Cluster-as-a-Service testbed (dynamic virtualised batch compute), containerised Jupyter Notebook deployed in Kubernetes (proof of concept for a Python Notebook service)
  • Other: bulk migration of data from Phase 1 hardware ahead of its retirement; machine room hardware (racks, PDUs, cabling, environment monitoring equipment)

Phase 5 (2018/2019)

Tape storage & other strategic upgrades

Together with STFC's IRIS consortium, JASMIN procured a major upgrade to a shared tape storage facility with capacity for 65 PB of near-line storage. Phase 5 also brought JASMIN's first GPU servers: a small proof-of-concept cluster of five systems.

It was also time to say goodbye to several tonnes of storage and compute hardware from previous phases, now retired and removed to make room for new equipment.

Phase 5 details
  • Batch compute: initial GPU servers (proof of concept with 2 x small and 1 x large system), extra SSD disks for Phase 4 batch compute
  • Network: firewall hardware, routers and 100G connectivity
  • Virtual compute: new hypervisor servers for "cattle-class" virtual machines, new backup appliance
  • Tape storage: replacement of tape library (shared procurement with STFC IRIS, 65 PB capacity), 11 PB tape media (LTO and TS1160)
  • Software: OpenStack software development, Cluster-as-a-Service development
  • Other: decommissioning of Phase 2 hardware

Phase 6 (2019/20)

Batch compute upgrade and network improvements

LOTUS was the main focus of this phase, with old compute nodes replaced by new higher-memory servers and work undertaken to migrate from Platform LSF to SLURM as the batch scheduler. A change of operating system also meant redeploying CEDA and JASMIN service hosts throughout the system.
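
For users, the most visible part of the scheduler migration was translating job-submission directives from LSF's #BSUB syntax to SLURM's #SBATCH syntax. The sketch below is an editor's illustration of that mapping for a few common options, not the migration tooling actually used; real jobs also need attention to details such as differing time formats and memory units.

```python
# Illustrative mapping of common Platform LSF (#BSUB) directives onto their
# SLURM (#SBATCH) equivalents. Editor's sketch only; option coverage and edge
# cases (time formats, memory units) are deliberately simplified.
LSF_TO_SLURM = {
    "-q": "--partition",  # LSF queue        -> SLURM partition
    "-n": "--ntasks",     # number of tasks
    "-J": "--job-name",   # job name
    "-o": "--output",     # stdout file
    "-e": "--error",      # stderr file
    "-W": "--time",       # wall-clock limit (a plain number is minutes in both)
}

def translate_directive(line: str) -> str:
    """Translate a single '#BSUB <opt> <value>' line into '#SBATCH' form."""
    parts = line.split()
    if len(parts) >= 3 and parts[0] == "#BSUB" and parts[1] in LSF_TO_SLURM:
        return f"#SBATCH {LSF_TO_SLURM[parts[1]]}={parts[2]}"
    return line  # anything unrecognised is left for manual conversion

print(translate_directive("#BSUB -q short-serial"))  # #SBATCH --partition=short-serial
print(translate_directive("#BSUB -W 60"))            # #SBATCH --time=60  (60 minutes)
```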

Phase 6 details
  • Disk storage: BLK storage replacement, to run alongside and then replace existing hardware (multiple retirement dates, avoiding transitioning everything at once)
  • Batch compute: replacement of Phase 1 and 2 compute nodes, solving a flow-control issue in interaction with Phase 4 storage; current 4 x 2 TB high-memory nodes to be replaced with 132 x 1 TB nodes
  • Network: improvements to "exit pod" network, enhancing connectivity between JASMIN & the wider internet
  • Virtual compute: replacement of virtualisation servers for "pet"-class virtual machines where reliability is important
  • Software: replacement of the Platform LSF scheduler with SLURM (a move to an open-source scheduler with lower ongoing costs), change of operating system from Red Hat Enterprise Linux to CentOS 7

Phase 7 (2020/2021)

TBC

Details of this procurement phase are still being finalised.