E3S: ENEA Staging Storage Sharing (ref. F. Iannone)

The storage service at ENEA has been based on the distributed filesystem AFS since 1998. The AFS cell enea.it provides users with home directories as well as storage areas for software and database projects, and it can be accessed from the geographically distributed ENEA premises. The ENEA storage service also includes GPFS, the high-performance filesystem for the HPC multi-core clusters (the largest in ENEA, CRESCO6, delivers 1.4 PFlops). GPFS provides the massive I/O required by parallel jobs running on the HPC clusters and also stores big data projects. The ENEA HPC clusters are installed in several ENEA data centres, with LSF multi-cluster as the resource management system, making the ENEAGRID infrastructure accessible over the WAN.

A cloud solution named E3S (ENEA Staging Storage Sharing) has been developed to extend the storage services in ENEA. E3S is based on the OwnCloud framework and synchronizes the staging areas of the data acquisition systems of small experimental labs with the ENEA ICT storage areas, AFS and GPFS, which are used as back-ends.

Data integrity is guaranteed by file synchronization between the staging and storage areas. Security is provided by the single sign-on authentication/authorisation system based on Kerberos 5 and Active Directory, which includes Access Control Lists for the storage areas under AFS and POSIX access rights for the GPFS filesystem. Scalability is achieved by separating the hardware resources of the staging components from those of storage and sharing: the storage area components are already scalable, since they are computing and storage resources of the ENEAGRID infrastructure, and the staging and sharing components are based on cloud solutions that are scalable by definition. The reliability requirement is satisfied because data are stored in the local staging area and managed by hardware resources that are close to the data acquisition systems and independent of the storage services over the Wide Area Network outside ENEAGRID.

The staging areas are handled by the component named Gateway Node (GWN). It provides a cloud service that syncs the filesystems of the Data Acquisition Systems (DAS) with its own local disks, which are configured as staging areas. In this way the data acquisition systems can store data quickly even when a network link fails, and then upload the files asynchronously, by means of the HTTP methods PUT/GET, to the GWN local disks used as storage back-end. The GWN can also provide data access services based on client/server tools for data analysis and visualisation, as well as remote control of the data acquisition systems of instrumented experiments.
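As an illustration of the PUT-based upload mentioned above, the following minimal sketch builds (without sending) an HTTP PUT request for a single file. It assumes the classic OwnCloud WebDAV endpoint `remote.php/webdav` and HTTP Basic authentication; the host name, credentials, file path and the function name `build_upload_request` are hypothetical, and in practice the OwnCloud Desktop client performs these transfers.

```python
import base64
import urllib.parse
import urllib.request


def build_upload_request(base_url, user, password, remote_path, payload):
    """Build (but do not send) an HTTP PUT request that uploads one file."""
    # Assumption: the server exposes the classic OwnCloud WebDAV endpoint.
    quoted_path = urllib.parse.quote(remote_path.lstrip("/"))
    url = f"{base_url.rstrip('/')}/remote.php/webdav/{quoted_path}"
    # Assumption: HTTP Basic authentication with the account credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=payload, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = build_upload_request(
        "https://gateway.example.org/owncloud",  # hypothetical host
        "labuser",                               # hypothetical account
        "secret",
        "run_001/shot.dat",                      # hypothetical file
        b"\x00\x01",
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it makes it easy to retry the transfer later, which matches the asynchronous upload described above.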

On the one hand the GWN is a server node for the cloud storage service; on the other hand it is an AFS/GPFS client node, so that it can sync the local staging areas with the distributed storage areas of the ENEAGRID infrastructure. Synchronization is performed by a batch process that runs periodically on the node with the authorisations needed to access the storage areas in read/write mode. Once the data have been copied to the ENEAGRID storage areas, they can be shared through different access tools:

- a web portal for all users on the component Middleware Node (MWN): it provides cloud sharing services using the ENEAGRID storage as a back-end;

- the Application Servers for professional users running specific applications to analyse and visualise data using Graphics Processing Units (GPUs);

- HPC clusters running parallel user codes, which can mount the AFS/GPFS filesystems and access the storage areas.
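The periodic batch synchronization between a staging area and a storage area, described before the list of access tools, can be sketched as a one-way copy of new or modified files. This is only an illustration under stated assumptions: the function names are hypothetical, change detection by size and modification time is an assumption, and the actual batch scripts of E3S are not shown in this page.

```python
import os
import shutil
import tempfile


def _needs_copy(src, dst):
    """Return True if dst is missing, differs in size, or is older than src."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or s.st_mtime_ns > d.st_mtime_ns


def sync_tree(staging_root, storage_root):
    """Copy new or modified files from staging to storage.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(staging_root):
        rel_dir = os.path.relpath(dirpath, staging_root)
        target_dir = os.path.normpath(os.path.join(storage_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if _needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also preserves the modification time
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)


if __name__ == "__main__":
    # Demo on temporary directories standing in for the two areas.
    with tempfile.TemporaryDirectory() as tmp:
        staging = os.path.join(tmp, "staging")
        storage = os.path.join(tmp, "storage")
        os.makedirs(os.path.join(staging, "run1"))
        with open(os.path.join(staging, "run1", "a.dat"), "w") as fh:
            fh.write("abc")
        print(sync_tree(staging, storage))  # first pass copies the file
        print(sync_tree(staging, storage))  # second pass finds nothing to do
```

Files are never deleted from the storage side in this sketch, which is consistent with the storage area acting as the long-term copy.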

The basic object of the E3S architecture is the Staging Storage Folder (SSF), corresponding to a directory in the local staging area of the GWN, as well as to a directory in ENEAGRID storage areas.

In order to optimize the management of the storage areas, the size of a folder associated with an SSF object is limited both on the storage area side and on the staging area side. The dynamics of the SSF object are defined by the value of its status attribute: active or idle. A newly created SSF object is in the active state. When the size limit of the GWN staging area is reached, a stage-out process starts removing the oldest SSF folders from the GWN staging area until its size decreases below the limit. The folders in the storage area are left untouched. The timechange attribute of the SSF object is updated periodically by the batch process that synchronizes the local staging and distributed storage areas.
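The stage-out policy described above can be sketched as follows. The class `SSF` and the function `stage_out` are hypothetical names introduced for illustration, and using the timechange attribute to decide which folders are "older" is an assumption; the source only states that the oldest folders are removed until the staging area falls below its limit.

```python
from dataclasses import dataclass


@dataclass
class SSF:
    """Illustrative model of a Staging Storage Folder object."""
    name: str
    size: int            # bytes used in the staging area
    timechange: float    # time of the last synchronization (epoch seconds)
    status: str = "active"  # "active" or "idle"


def stage_out(folders, limit):
    """Mark the oldest active folders as idle until usage drops below limit.

    Returns the names of the folders that were staged out, oldest first.
    """
    used = sum(f.size for f in folders if f.status == "active")
    staged = []
    # Assumption: "older" means a smaller timechange value.
    for folder in sorted(folders, key=lambda f: f.timechange):
        if used < limit:
            break
        if folder.status != "active":
            continue
        folder.status = "idle"  # the copy in the storage area is untouched
        used -= folder.size
        staged.append(folder.name)
    return staged


if __name__ == "__main__":
    folders = [SSF("a", 30, 100.0), SSF("b", 30, 200.0), SSF("c", 30, 300.0)]
    print(stage_out(folders, limit=50))
    print([(f.name, f.status) for f in folders])
```

Only the status changes in this model; removing the folder from disk is a separate step.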

Both the GWN and the MWN are virtual machines configured in the ENEA Cloud infrastructure, which is based on the VMware framework. Their main hardware settings are: one or more x86_64 CPUs, 16 GB of RAM, and 1 TB of local disk. The operating system is Linux CentOS 7 with the OpenAFS client of the cell enea.it installed. The GPFS filesystem is imported by means of NFS over Gigabit Ethernet.

The MWN is an OwnCloud server with the AFS or GPFS filesystem as back-end. Unlike on the GWN, the file sharing app of the MWN OwnCloud server is enabled, and the LDAP app is configured to use the ENEA ICT access service based on Active Directory. The ENEA users in charge of the experimental labs can share the AFS or GPFS filesystems synchronized with the staging area of the GWN, defining their own data policy for any file or folder.

The SSF object methods have been developed in the Staging Storage Manager (SSM), with JavaScript and PHP scripts for the OwnCloud web service and Python scripts for batch mode. The figure shows the activity diagram of the SSM with the sequence of actions and conditions of the SSF object workflow model. The SSM is based on a Master & Commander architecture, where the master process executes actions and the commander is the controller process that issues commands to the master.

A context-sensitive user interface can automatically choose from a multiplicity of options based on the current or previous state(s) of the object. Context sensitivity is almost ubiquitous in modern Graphical User Interfaces (GUIs), usually in the form of context menus. Context sensitivity, when operating correctly, should be practically transparent to the user.

The E3S GUI, shown in the above picture, is based on context-sensitive menus and has been developed using the OwnCloud REST API, which gives access to the functionality of OwnCloud and allows it to be integrated. In our case the GUI in the OwnCloud web service handles an SSF object (marked with the ENEA tag in Fig. 4) by means of the SSF menu item, which creates, stages out or stages in SSF objects depending on the context. It can also display the main information on their attributes.
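How a single 'SSF' menu item can resolve to different actions depending on the context can be sketched as a small decision function. This is only an illustration: the function name and the `is_ssf` flag are hypothetical, and the rules rely on the naming conventions given in the usage steps later in this page (a folder named with 'afs:' at the start becomes an SSF, and a '.fli' file stands for a staged-out SSF).

```python
def ssf_menu_action(name, is_directory, is_ssf):
    """Return the action the 'SSF' menu item would trigger, or None.

    name:         file or folder name shown in the web interface
    is_directory: True for folders, False for files
    is_ssf:       True if the folder is already a Staging Storage Folder
                  (hypothetical flag, standing in for the ENEA tag)
    """
    if not is_directory:
        # A '.fli' placeholder stands for a staged-out SSF.
        return "stage-in" if name.endswith(".fli") else None
    if is_ssf:
        return "stage-out"
    if name.startswith("afs:"):
        return "create"
    return None


if __name__ == "__main__":
    for item in [("afs:shots", True, False),
                 ("shots", True, True),
                 ("shots.fli", False, False),
                 ("notes.txt", False, False)]:
        print(item[0], "->", ssf_menu_action(*item))
```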

To use E3S in a laboratory/experiment/project, proceed as follows:

- Request an E3S system for the laboratory/experiment/project from the TERIN-ICT division by formal letter, indicating the amount of data to be handled. TERIN-ICT-HPC will design an E3S solution that fits the request, and an E3S account will be enabled.

The Gateway and Middleware nodes will be configured for the laboratory/experiment/project as:

gwe3s_.enea.it for the Gateway node

we3s_.enea.it for the Middleware node

- Access the Gateway node: log in with your E3S account. From inside the ENEA network (you can use the VPN from other sites), open the following URL in your web browser (Firefox, Safari, Chrome): https://gwe3s_.enea.it/owncloud/

A pop-up window will ask you to log in with your ENEAGRID account.

You can request an ENEAGRID account at https://gridaccount.enea.it/ if you do not have one. After the ENEAGRID log-in, the log-in page of the Gateway Node OwnCloud server will appear.

- Manage your E3S account: click on your user name in the upper-right corner of the page to open the “Personal” page, where you can change your password.

The “Personal” page also provides the link to download the OwnCloud Desktop application for your operating system. Download it and follow the instructions to install it on the PC of your Data Acquisition System. If the OwnCloud client is already installed on your system, you can instead select:

Account → 'Add new' button.

In both cases the OwnCloud configuration wizard will show up.

- Set https://gwe3s_.enea.it/owncloud as the server address. Enter your E3S username and password. Select the local folder on your DAS (or PC): this will be the root folder in which all the acquired data are saved in subfolders. You can select an existing folder or create a new one.

- Create your Staging Storage Folder (SSF): in your web browser, open the URL https://gwe3s-enea.it/owncloud and log in. At the level of your home directory, click on the '+' icon to create a new folder and name it afs:xxxxxx . Note: the 'afs:' prefix is mandatory, while xxxxxx is a name of your choice (max 10 characters).

- Click on the '…' icon of the newly created folder and choose the 'SSF' menu item. The folder will lose the 'afs:' prefix and its icon will be marked with the ENEA logo. An SSF in the staging area is synchronized automatically every 2 hours with the shared storage folder in AFS.
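The naming rule for a new SSF (the mandatory 'afs:' marker at the start, followed by a name of at most 10 characters) can be checked with a few lines of code. The function and constant names below are hypothetical; the sketch only mirrors the rule stated above and says nothing about which characters the server accepts.

```python
AFS_PREFIX = "afs:"
MAX_NAME_LENGTH = 10  # maximum length of the user-chosen part of the name


def parse_ssf_folder_name(folder_name):
    """Return the SSF name without the 'afs:' marker.

    Raises ValueError if the folder name does not follow the rule.
    """
    if not folder_name.startswith(AFS_PREFIX):
        raise ValueError(f"folder name must start with {AFS_PREFIX!r}")
    name = folder_name[len(AFS_PREFIX):]
    if not name:
        raise ValueError("a name is required after the 'afs:' marker")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name is longer than {MAX_NAME_LENGTH} characters")
    return name


if __name__ == "__main__":
    print(parse_ssf_folder_name("afs:shots2024"))
```

The returned value corresponds to the folder name shown once the folder has been turned into an SSF.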

- Stage-out: when the amount of data in your staging area approaches the maximum quota (50 GB), the oldest SSFs are 'staged out', i.e. the folder is removed from the staging area while its copy remains in the AFS long-term storage area. A small text file with the same name as the SSF and the extension '.fli' (Folder Link Idle) is created in the staging area. This file cannot be removed. The stage-out process can be forced on an individual SSF by selecting the star icon next to it, then clicking on the '…' icon and choosing the 'SSF' menu item. The '.fli' file will then replace the folder, both in the OwnCloud server staging area open in your browser and on your PC-DAS.

- Stage-in: to bring an SSF folder back into the staging area from the AFS long-term storage, select the star icon next to the related '.fli' file, then click on the '…' icon and choose the 'SSF' menu item. The '.fli' file will be replaced by the actual folder, both in the OwnCloud server staging area open in your browser and on your DAS. As an SSF can store files up to the maximum quota (50 GB), the stage-in process can take a long time. However, you can close your browser and the stage-in process will continue in the background, while the 'size' value of the SSF folder is shown as 'Pending'. The actual size is shown once the stage-in process has finished and the web page has been refreshed manually.
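The effect of stage-out and stage-in on the staging area, as described in the two steps above, can be sketched on local directories. The function names are hypothetical, the text written into the '.fli' file is an assumption (the source only says it is a small text file), and the real operations are carried out by the E3S server rather than by user code.

```python
import os
import shutil
import tempfile

FLI_EXTENSION = ".fli"  # Folder Link Idle


def stage_out_local(ssf_path):
    """Remove the folder from staging and leave a '.fli' placeholder."""
    placeholder = ssf_path.rstrip(os.sep) + FLI_EXTENSION
    shutil.rmtree(ssf_path)
    with open(placeholder, "w", encoding="utf-8") as fh:
        fh.write("idle\n")  # assumption: the real file content is not documented
    return placeholder


def stage_in_local(placeholder, storage_copy):
    """Replace the '.fli' placeholder with the folder copied from storage."""
    if not placeholder.endswith(FLI_EXTENSION):
        raise ValueError("not a '.fli' placeholder")
    ssf_path = placeholder[: -len(FLI_EXTENSION)]
    shutil.copytree(storage_copy, ssf_path)
    os.remove(placeholder)
    return ssf_path


if __name__ == "__main__":
    # Demo on temporary directories standing in for staging and storage.
    with tempfile.TemporaryDirectory() as tmp:
        storage = os.path.join(tmp, "storage", "shots")
        staging = os.path.join(tmp, "staging", "shots")
        for root in (storage, staging):
            os.makedirs(root)
            with open(os.path.join(root, "a.dat"), "w") as fh:
                fh.write("abc")
        fli = stage_out_local(staging)
        print(os.listdir(os.path.dirname(staging)))
        stage_in_local(fli, storage)
        print(os.listdir(os.path.dirname(staging)))
```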

- Share your data: in your browser, open the URL https://mwe3s-.enea.it. This URL can be accessed from anywhere. Log in to the Middleware Node OwnCloud server.

Please note that the password set on this server is independent of the password of the Gateway Node OwnCloud server. Here you have read-only access to the files and folders in your AFS storage area, as with the ENEABOX sharing tool. You can share entire folders or individual files with any user who has an ENEA ASIE account, by clicking on the icon next to the item. In order to share folders or files with external users, you can also define local users from the 'user' page, which you can reach by clicking on your user name in the upper-right corner and selecting the 'users' menu.
