CINECA 2005 Minutes
- Presentation from the ESPRESSO development team.
- History
- Current state of development and main issues
- Future plans
- An overall discussion that focused mostly on questions about ESPRESSO and how information technology (IT) can improve it.
- Making VLab available to a wider audience brings a problem:
- Crude-to-wrong calculations made by people without a solid background in quantum physics and electronic structure. Most problems are related to the choice of pseudopotentials (PPs) and the associated energy cutoff.
- Proposed solution: a database of good pseudopotentials plus a means to upload PPs for a given project. No online tools for PP generation are available.
- Discussion of the distributed computing problem. The typical problem is to run parameter-space studies. Each point in parameter space is decoupled from the others -> a parallel workflow suitable for implementation on distributed systems. VLab must have means to monitor and steer the execution of a workflow.
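Because the parameter-space points are decoupled, the sweep is embarrassingly parallel. A minimal sketch of that structure (the `run_point` function is a hypothetical stand-in for submitting one calculation; it is not part of ESPRESSO):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one run at a single (deformation, pressure)
# point; the real version would submit a job and collect its output.
def run_point(params):
    deformation, pressure = params
    return {"deformation": deformation, "pressure": pressure}

def sweep(deformations, pressures):
    grid = [(d, p) for d in deformations for p in pressures]
    # Each grid point is decoupled from the others, so all points can
    # run concurrently on whatever distributed resources are available.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_point, grid))
```

A workflow engine would add monitoring and steering on top of this bare fan-out, but the decoupling is what makes the distributed mapping possible.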
- We agree that four Web services are basic to getting VLab up:
- Build the PWSCF environment (create directories, stage input on the compute nodes) and run PWSCF
- Pseudopotential database access
- Build the PHONON environment
- Workflow management
Day 1
Notes from the ESPRESSO group presentation
- CINECA (pronounced "Chinaca"): a consortium of universities doing high-performance computing research and development. Developing Web applications, portals, common file systems, etc. Portal development is high on their project list. Common scheduler.
Currently all codes have the same input structure; the (human-readable) output structures are not quite consistent with one another. Working on a common file structure to store huge datasets.
Ideally, the internal data structures should also be merged (probably to reduce human development costs). Lower-priority work.
Main objective: independent codes that can communicate easily one with another.
CP (Lausanne, Princeton) and FPMD (Bologna, Trieste) will be merged into a unique code.
PWscf will remain a separate code
PHONON will remain a separate code
There are also a number of post-processing applications.
Heavily based on 3D FFTs and level-3 linear algebra: BLAS packages and other common routines.
Plans to integrate other codes that do completely different things (e.g., electronic codes developed in Trieste by Ralph) and time-dependent response codes developed by Ralph and Baroni. These will become available in the Espresso package.
Most groups have chosen to implement one application with many functionalities. This is not necessarily the best idea. Problem: everything must then be maintained through upgrades.
Current attempts are to merge the splinter applications: make them interoperable rather than one giant application. The first step is to unify the exterior shell of the codes (the user-interaction step) and to make the codes use a unified input structure. Currently the output structure is not the same. Also working on a common file structure to store the huge data sets produced by the codes.
The main target is to have independent codes that can communicate with one another easily: all use common data sets, common file formats, interoperable data. It is important to maintain as much flexibility as possible for adding functionality, so agree on the smallest possible number of rules that still allows interoperability. A contributor should be able to write an application that interfaces to the other codes, but remain free not to follow future upgrades, etc. The same distribution could contain redundancy: different codes might do the same thing.
At the highest level, one communicates through files.
Common data structures are useful for faster updates (e.g., chosen by smaller communities).
Protocols: XML-like structures.
Four main codes (in ESPRESSO): PWSCF, CP, FPMD, PHONON.
I/O and data files refer only to the first three; the PHONON code uses a perturbation method that requires some sort of base (ground) state as input.
Currently: inputs identical; the variables are a superset of the variables used in each code.
Output: initial idea: library of I/O blocks to be called in a pre-defined sequence (to write restart file, etc.)
However: not as flexible as desired.
Switched to XML-like structure for output.
Want the freedom to add information without worrying about order.
Idea: HDF5 was considered but is not sufficiently flexible; therefore the XML approach was used.
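A minimal sketch of why the XML-style output is attractive, using Python's standard library (the tag names below are invented for illustration, not ESPRESSO's actual schema):

```python
import xml.etree.ElementTree as ET

# Build a restart-style record; readers look fields up by tag name,
# so element order does not matter and new fields can be appended
# without breaking older readers. Tag names are illustrative only.
root = ET.Element("restart")
ET.SubElement(root, "cell").text = "10.0 10.0 10.0"
ET.SubElement(root, "cutoff_ry").text = "70"
ET.SubElement(root, "added_later").text = "extra data"  # no fixed sequence

doc = ET.fromstring(ET.tostring(root))
value = doc.findtext("cutoff_ry")  # position-independent lookup
```

Contrast with a fixed sequence of I/O blocks, where inserting a new field shifts everything after it and breaks every existing reader.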
Notes from the overall discussion
Some files can be gigabytes in size. For the main usage, single matrices remain binary files.
GE: issues with the Fortran unformatted format.
Other issue, the input file: is XML required? The namelists are not standardized.
Graphical user interface: heavily based on the namelist structure. Should perhaps be low priority if not proven necessary.
XML interfaces: the first priority is that all the codes should share the same input format. A code can restart from its own output in most cases, and there is some limited ability to make the codes share I/O as files.
Gordon: is there a library of I/O classes for handling the files? No, it is a mess.
- The codes are Fortran. Fortran binary files can be incompatible between different compilers and versions. Gordon suggests writing simple C/C++ libraries to make this more portable.
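The portability problem comes from the record markers that Fortran sequential unformatted I/O typically wraps around each record: commonly a 4-byte length before and after the payload, but the marker size and endianness are compiler-dependent. A small sketch of reading/writing that layout (shown here in Python rather than C, purely to illustrate the format; assumes 4-byte little-endian markers):

```python
import struct, io

def write_record(f, payload, endian="<"):
    # Typical sequential unformatted record layout:
    # 4-byte length marker, payload, same 4-byte length marker again.
    marker = struct.pack(endian + "i", len(payload))
    f.write(marker + payload + marker)

def read_record(f, endian="<"):
    (n,) = struct.unpack(endian + "i", f.read(4))
    payload = f.read(n)
    (n2,) = struct.unpack(endian + "i", f.read(4))
    assert n == n2, "corrupt record, or wrong endianness/marker size"
    return payload

buf = io.BytesIO()
write_record(buf, struct.pack("<3d", 1.0, 2.0, 3.0))
buf.seek(0)
values = struct.unpack("<3d", read_record(buf))
```

A C library doing the same thing would let non-Fortran tools consume the files, which is presumably what Gordon has in mind.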
Pseudopotential files:
- 1 file per element
- Input files: described by 6-7 namelists
Discussion on PPs (not very interesting -- jump to basic conclusion)
For each atomic species: name of the element + name of the pseudopotential file.
~200 KB per file, ~1000 files.
Users should not be expected to understand the reliability and accuracy of a pseudopotential (PP). Must include a description of the suggested plane-wave cutoff, etc.
Should introduce some kind of accuracy indicator (Baroni):
One PP per element is necessary for a PWSCF run; with ~10 elements and ~10 different PPs per element, based on various codes/methods/approximations.
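For context, a PWSCF input pairs each element with a PP file in the ATOMIC_SPECIES card, with the cutoff set separately in the &system namelist; this is exactly where a wrong PP/cutoff combination enters. A fragment in the input style described above (values and file names are illustrative, not recommendations, and this is not a complete input):

```
&control
   calculation = 'scf'
   prefix = 'mgsio3'
/
&system
   nat = 10, ntyp = 3,
   ecutwfc = 70.0
/
&electrons
   conv_thr = 1.0d-8
/
ATOMIC_SPECIES
 Mg  24.305  Mg.pbe.UPF
 Si  28.086  Si.pbe.UPF
 O   15.999  O.pbe.UPF
```

The accuracy indicator proposed above would attach metadata (suggested cutoff, method, validation) to each of the `.UPF` files referenced here.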
Why is it hard to make the pseudopotential files available from a database or a few databases? These don't have to be local. Also, how often do these pseudopotential files change?
Sounds like pseudopotential sharing is an important community/grid problem.
Must be able to keep different versions of a pseudopotential available: it is a problem of backward compatibility.
I would say that the management of PsPs is important; it should be separated from the code distribution and should be searchable online. I will use PsP for pseudopotential.
The problem with PsPs is that non-specialists can't judge the quality of a pseudopotential. Each PsP should be a) blessed by particular groups, b) described by how and for which problems it should be used, and c) given an accuracy rating. This should be automated. But is this a VLab problem?
Cesar: only a small subset of elements is really interesting in geophysics, so PsP management is not really a huge problem: O, Si, Mg, Ca, maybe C.
Basic conclusion (from the discussion on pseudo-potentials)
- A database-wrapped service for online pseudopotential files relevant to VLab problems, curated and described by various criteria. Must determine the criteria for the entries. This could eventually be used to build an expert system.
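To make the conclusion concrete, a toy in-memory stand-in for such a service; the field names (element, method, cutoff_ry, blessed_by, accuracy) are invented examples of the curation criteria the entries would need, not an agreed schema:

```python
# Toy stand-in for the proposed pseudopotential database service.
# Entries and field names are illustrative only.
PP_DB = [
    {"element": "Mg", "file": "Mg.pbe.UPF", "method": "ultrasoft",
     "cutoff_ry": 30, "blessed_by": "MN group", "accuracy": "good"},
    {"element": "O",  "file": "O.pbe.UPF",  "method": "ultrasoft",
     "cutoff_ry": 40, "blessed_by": "MN group", "accuracy": "good"},
]

def find_pp(element, **criteria):
    """Return entries for an element that match any extra criteria."""
    hits = [e for e in PP_DB if e["element"] == element]
    return [e for e in hits
            if all(e.get(k) == v for k, v in criteria.items())]
```

Wrapping this lookup in a Web service, with version history per file, would cover both the curation and the backward-compatibility concerns raised above; an expert system would then query the same criteria.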
Back to overall Discussion
What is the purpose of VLab? What do we want to use all of this grid stuff for?
What is the distributed computing problem? Too much discussion on user-interface issues and job management. The typical problem is to run parameter-space studies; the individual runs are very decoupled. (This framing is my idea.)
They (the ESPRESSO group) plan to use a common file system and a common scheduler for all collaborators in Europe. File system: SFS (an NFS variant from IBM). The scheduler they will use is LSF. Currently they use many different schedulers (LSF, LoadLeveler, PBS, etc.). They are building a portal interface to LSF. They use Unicore for workflow/job management; you can specify dependencies, graphs of jobs, etc.
Gordon: the purpose of VLab is this: Renata needs to run a series of codes, and must run very many of them since she wants to calculate the free energy and its derivatives. We must minimize the management of these complicated jobs and maximize the number of collaborators.
How do novice users determine if their results are acceptable? How do you prevent the code from being misused?
Back from lunch.
Recap: what do we want? A system that will connect the Espresso codes with clients, using an inherently collaborative framework.
Basic portal: upload input (prepare input, instead), run, and get output. The GUI is used separately to generate an input file on the user's desktop (I disagree with this part).
Basic problem: provide a Web service for the solver. Some discussion on using Unicore.
Their portal approach is based on a product of a company called Nice, "EnginFrame". They say it should be compatible with portlets (www.nice.it). They are tightly coupled with Platform Computing.
Web services:
- run PwSCF
- access PseudoPotentials
- workflow
- We want to better understand the workflow specific to this application.
One issue is that "convergence" is tricky. The user must provide guidance on how long to run and on when convergence is obtained.
Some discussion of collaborative, vendor independent visualization over NB.
Back to workflow. Cesar will answer all questions.
- First, create input file
- Run PwSCF
- Get two files: standard output and very big data file
- For 10 atoms of MgSiO3 (small) takes about 2 hours
- Need to do individual calculations at many different parameter points. Convergence at different points can vary widely
- How do you know when a code has converged?
- When structure has converged, run phonon
- wave form files can be very large (GB's or larger)
- 34 MB is typical output file size according to Cesar
- 100 atoms is 1-2 GB output file for each parameter point
- At a minimum, you can store the charge density file and throw away the wave functions that you calculate. The wave functions can be regenerated from the charge density; the regeneration time is not trivial, but is an order of magnitude (or more) less than the original full calculation.
- Typically 12 deformations and 15 pressures define your phase space.
- Some discussion on how to do steering to avoid restart.
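The workflow Cesar describes can be sketched as a chain per parameter point, with convergence gating the PHONON step. The functions below are hypothetical placeholders for the real job submissions, not ESPRESSO APIs:

```python
# Placeholders standing in for real job submission and checking.
def run_pwscf(point):
    return {"point": point, "converged": True, "charge_density": "rho.dat"}

def is_converged(scf_result):
    return scf_result["converged"]

def run_phonon(scf_result):
    return {"point": scf_result["point"], "phonons": "dyn.dat"}

def workflow(deformations, pressures):
    outputs = []
    for d in deformations:        # e.g., ~12 deformations
        for p in pressures:       # e.g., ~15 pressures
            scf = run_pwscf((d, p))
            if not is_converged(scf):
                continue          # steer or restart instead of proceeding
            # Keep the (smaller) charge density; wave functions can be
            # regenerated from it later, so they need not be stored.
            outputs.append(run_phonon(scf))
    return outputs
```

The 12 deformations x 15 pressures mentioned above give 180 such chains, which is why workflow management (and steering to avoid full restarts) matters more than any single run.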
Day 2
- Installing ESPRESSO. Have to download a Fortran compiler.
- Marlon will build Web services for running the codes and uploading and downloading the files.
- Baroni: the PHONON code that we depend upon is probably the least well maintained.
Things to do:
- Install software (Espresso)
- Web-service for input files (Italians)
- Web service to run the code on 2 machines
- Web service to access pseudopotential database
- Create a simple workflow: run PWSCF followed by PHONON without any output checking
- Investigate use of binary files on multiple computers. Try to find a C library interface to Fortran I/O.