ENEAGRID Parallel File Systems

The user HOME in ENEAGRID is located in an OpenAFS file system. AFS was designed as a geographically distributed file system, so it ensures the homogeneity of the user environment on any host belonging to ENEAGRID. AFS performance depends strongly on the host location and on its network connection to the AFS file server. At present, in the best case with a 1 Gb/s connection, throughput is similar to that of a single local disk (80 MB/s), while much lower values can be found in other cases.

While AFS provides a common general work environment, it is not designed for parallel I/O, which implies concurrent access to the same file by many different processes. Parallel I/O can greatly improve the scalability of applications on large HPC systems, but it requires a parallel file system (PFS). ENEAGRID provides PFS based on IBM Spectrum Scale (GPFS) over a fast interconnect, InfiniBand (IB) or Omni-Path (OPA), depending on the cluster.

PFS is currently available at three sites: Portici, Frascati and Casaccia. At each site two data spaces are available to users:

  • SCRATCH space, with high performance, a large quota, a delete policy for older files and no backup. The file systems are local to each ENEAGRID site and are not directly accessible over the wide area network. The environment variable is $PFS_SCRATCH0 on all clusters and the link in the HOME is ~/PFS/tmp. The file system characteristics are:
    • Portici Cluster, 10 TB quota, delete policy: 90 days since last data access.
    • Frascati Cluster, 0.5 TB quota, delete policy: 30 days since last data access.
    • Casaccia Cluster, 0.5 TB quota, delete policy: 30 days since last data access.

  • STORE0 space, with lower performance, a reduced quota and daily backup. The file system characteristics are:
    • Portici Cluster, environment variable $PFS_POR_STORE0, HOME link ~/PFS/por, 100 GB quota.
    • Frascati Cluster, environment variable $PFS_FRA_STORE0, HOME link ~/PFS/fra, 100 GB quota.
    • Casaccia Cluster, environment variable $PFS_CAS_STORE0, HOME link ~/PFS/cas, 50 GB quota.
    • Data on STORE0 is backed up, but inactive files, that is files that have been deleted from the file system, are kept in the backup for one year.
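
Since the SCRATCH areas delete files that have not been accessed for a given number of days, it can be useful to list the files that are approaching that limit. The following is a minimal sketch, assuming a GNU-style find and assuming that the delete policy follows the access time reported by the file system. The function name list_stale_files and the 80-day threshold are illustrative only and are not part of ENEAGRID.

```shell
#!/bin/sh
# list_stale_files DIR DAYS
# Print the regular files under DIR that were last accessed more than DAYS
# days ago (find counts whole 24-hour periods).
# Hypothetical helper: the name and the thresholds are illustrative.
list_stale_files() {
    dir=$1
    days=$2
    find "$dir" -type f -atime +"$days" -print
}

# Example: files in the SCRATCH area not accessed for over 80 days, that is
# files getting close to the 90-day limit documented for Portici.
# This runs only where $PFS_SCRATCH0 is defined and points to a directory.
if [ -n "${PFS_SCRATCH0:-}" ] && [ -d "$PFS_SCRATCH0" ]; then
    list_stale_files "$PFS_SCRATCH0" 80
fi
```

On Frascati and Casaccia, where the limit is 30 days, a smaller threshold such as 20 would be the analogous choice.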

To prevent vulnerability issues, the native GPFS "mmlsquota" command, which shows a user's disk quota, has been replaced by the "gpfsquota" command.

The SHELL and the navigation through symbolic links

Access from the user HOME to the PFS is through symbolic links, and navigation through symbolic links depends somewhat on the current shell and its settings.

For bash and ksh the downward and upward tracks are the same. If a directory is accessed through a link, the commands cd .. and pwd follow the link notation. The -P option of both commands makes them use the file system names instead.
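
The difference between the link notation and the file system names can be seen with a small experiment. The sketch below builds a throwaway directory tree under a temporary directory; the directory names (home, link, filesystem, data) are made up for illustration and only mimic a HOME link such as ~/PFS/por pointing into another file system.

```shell
#!/bin/bash
start=$PWD

# Throwaway tree: home/link is a symbolic link to filesystem/data.
demo=$(mktemp -d)
demo=$(cd "$demo" && pwd -P)    # resolve any links in the temp path itself
mkdir -p "$demo/filesystem/data" "$demo/home"
ln -s "$demo/filesystem/data" "$demo/home/link"

cd "$demo/home/link"
logical=$(pwd)                  # link notation:    .../home/link
physical=$(pwd -P)              # file system name: .../filesystem/data
echo "pwd       -> $logical"
echo "pwd -P    -> $physical"

cd ..                           # upward track follows the link notation
after_logical=$(pwd)            # .../home
echo "cd ..     -> $after_logical"

cd "$demo/home/link"
cd -P ..                        # upward track follows the file system
after_physical=$(pwd)           # .../filesystem
echo "cd -P ..  -> $after_physical"

# Clean up the throwaway tree.
cd "$start" && rm -rf "$demo"
```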

For csh and tcsh the default behaviour in navigating symbolic links is the opposite: the file system names are always used, even if a directory is reached through a symbolic link. The shell variable symlinks must be set to the value "ignore" in order to follow the symbolic link on the upward track. The pwd command always returns the file system names.
To set the symlinks variable, the user's .login or .tcshrc must contain the statement:

set symlinks=ignore

How to use shared directories in GPFS using UNIX groups

UNIX groups make it possible to manage access to shared directories. This applies mainly to project areas in the GPFS file systems.

To take advantage of this feature, users must give the ENEAGRID administrators the definition of the group and of the area to be shared by sending a ticket to https://gridticket.enea.it/ providing:
    1. a group name
    2. the list of users that should belong to the group
    3. the path of the directory that must be shared
When accessing the shared area, the user must be aware that new files will belong to the user and to the UNIX group defined in the previous step. The user must specify which access mode the other members of the group will have by setting the proper umask, as follows:
  • Log in to an ENEAGRID node with access to the shared area.
  • Choose the access mode for the current session; this mode affects only the files created in the current session:
    • enter the command umask 0007 to grant all the users in the group full control of all the new files.
    • enter the command umask 0027 to allow the users in the group only to read and execute the new files.
  • Enter the shared directory, e.g. cd /gpor_proj/PROG_ROOT, and start working.
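
The effect of the two umask values on newly created files and directories can be checked outside the shared area. The following is a minimal sketch, assuming a Linux node with GNU stat (the -c '%a' option); the helper name new_file_mode is hypothetical and the files are created in a temporary directory, not in the shared project directory.

```shell
#!/bin/bash
# new_file_mode UMASK
# Create a file and a directory under the given umask and print the
# resulting permission bits in octal. The subshell keeps the umask change
# from leaking into the calling session.
# Hypothetical helper, for illustration only.
new_file_mode() {
    (
        umask "$1"
        tmp=$(mktemp -d) || exit 1
        : > "$tmp/file"         # new regular file
        mkdir "$tmp/dir"        # new directory
        echo "file=$(stat -c '%a' "$tmp/file") dir=$(stat -c '%a' "$tmp/dir")"
        rm -rf "$tmp"
    )
}

new_file_mode 0007   # prints: file=660 dir=770
new_file_mode 0027   # prints: file=640 dir=750
```

With umask 0007 the group gets the same rights as the owner, while with umask 0027 the group loses write permission; in both cases users outside the group get no access.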
