ENEAGRID Parallel File Systems
The user HOME in ENEAGRID is located in an OpenAFS file system. AFS has been designed as a geographically distributed file system, and so it ensures the homogeneity of the user environment on any of the hosts belonging to ENEAGRID. AFS performance depends strongly on the host location and on its network connection to the AFS file server. At present, at their best over a 1 Gb/s connection, transfer rates can reach values similar to those of access to a single local disk (80 MB/s), while much lower values can be found in other cases.
While AFS provides a common general work environment, it is not designed for parallel I/O, which implies concurrent access to the same file by many different processes. Parallel I/O can greatly improve the scalability of applications on large HPC systems, but it requires a parallel file system (PFS).
ENEAGRID provides a PFS based on IBM Spectrum Scale (GPFS) over a fast interconnect, InfiniBand (IB) or Omni-Path (OPA) depending on the cluster.
At the moment the PFS is available at the Portici, Frascati and Casaccia sites. On each site two data spaces are available to users:
- SCRATCH space, with high performance, a large quota, a delete policy for older files and no backup. The file systems are local to the ENEAGRID sites and are not directly accessible over the wide area network. The environment variable is $PFS_SCRATCH0 on all clusters and the link in the HOME is ~/PFS/tmp (see the example after this list). The file system characteristics are:
- Portici Cluster, 10 TB quota, delete policy: 90 days since last data access.
- Frascati Cluster, 0.5 TB quota, delete policy: 30 days since last data access.
- Casaccia Cluster, 0.5 TB quota, delete policy: 30 days since last data access.
- STORE0 space, with lower performance, a reduced quota and daily backup. The file system characteristics are:
- Portici Cluster, environment variable $PFS_POR_STORE0, HOME link ~/PFS/por, 100 GB quota.
- Frascati Cluster, environment variable $PFS_FRA_STORE0, HOME link ~/PFS/fra, 100 GB quota.
- Casaccia Cluster, environment variable $PFS_CAS_STORE0, HOME link ~/PFS/cas, 50 GB quota.
- Data on STORE0 are backed up; inactive files, i.e. files that have been deleted from the file system, are kept in the backup for one year.
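For example, the data spaces can be reached either through the environment variables or through the links in the HOME. A minimal sketch for the Portici cluster (the physical paths behind the variables are site specific):
cd $PFS_SCRATCH0      # scratch area, same variable name on all clusters
cd ~/PFS/tmp          # equivalent access through the HOME link
cd $PFS_POR_STORE0    # store area of the Portici cluster
cd ~/PFS/por          # equivalent access through the HOME link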
In order to prevent vulnerability issues, the native GPFS "mmlsquota" command, which shows the user's disk quota, has been replaced by the "gpfsquota" command.
The SHELL and the navigation through symbolic links
The access from the user HOME to the PFS is done via symbolic links, and the navigation through symbolic links depends somewhat on the current shell and its settings.
For bash and ksh the downward and upward tracks are the same: if a directory is accessed through the link, the commands cd .. and pwd follow the link notation. The option -P in both commands permits access through the file system names.
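For example, in bash the difference can be seen as follows (the printed paths depend on the site and on the user):
cd ~/PFS/tmp    # enter the scratch area through the symbolic link
pwd             # prints the path in link notation
pwd -P          # prints the physical file system path
cd -P ..        # moves up along the physical path instead of the link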
For csh and tcsh the default behaviour in navigating symbolic links is the opposite, that is the file system names are always used, even if a directory is reached using a symbolic link. The shell variable symlinks must be set to the value "ignore" in order to follow the symbolic link in the upward track. The pwd command always returns the file system names.
To set the symlinks variable, the user's .login or .tcshrc must contain the statement:
set symlinks=ignore
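For example, the statement can be added with the following sketch, which simply appends the line to the user's .tcshrc:
echo 'set symlinks=ignore' >> ~/.tcshrc    # effective from the next tcsh session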
How to use shared directories in GPFS using UNIX groups
UNIX groups permit managing the access to shared directories. This applies mainly to project areas in the GPFS file systems.
To take advantage of this feature, users must provide the ENEAGRID administrators with the definition of the group and of the area to be shared by sending a ticket to https://gridticket.enea.it/ providing:
- a group name
- the list of users that should belong to the group
- the path of the directory that must be shared
When accessing the shared area, the user must be aware that new files will belong to the user and to the UNIX group provided in the previous step. The user must specify which access mode the other members of the group will have by setting the proper umask, as described in the following:
- Log in to an ENEAGRID node with access to the shared area.
- Choose the access mode for the directory in the current session; this mode will affect only the files created in the current session:
- Enter the command umask 0007 to grant all the users in the group full control of the new files.
- Enter the command umask 0027 to allow the users in the group only to read and execute the new files.
- Enter the shared directory, e.g. cd /gpor_proj/PROG_ROOT, and start working (a short example session is sketched below).
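A minimal example session, assuming the shared area /gpor_proj/PROG_ROOT mentioned above and full group access to the new files:
umask 0007                # group members get full control of new files
cd /gpor_proj/PROG_ROOT   # enter the shared directory
touch testfile            # create a new file in the shared area
ls -l testfile            # verify the owner, the group and the access mode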