
MX raw image data formats, metadata and validation

We note with deep sadness the passing of Andreas Förster of DECTRIS who was instrumental in the organization of the HDRMX effort and the Gold Standard workshops in particular. He will be missed.

IUCr XXV in Prague, CZ was to include a hybrid Workshop on MX raw image data formats, metadata and validation, but circumstances led us to convert it to a purely virtual e-meeting via Zoom on Saturday, 14 August 2021, from 9 am until 3 pm Prague time (GMT+2), which is 1 hour later than London time (8 am until 2 pm) and 6 hours later than New York City time (3 am until 9 am). This is the agenda.

The most important background information for this workshop is the Gold Standard paper, [Herbert J. Bernstein, Andreas Förster, Asmit Bhowmick, Aaron S. Brewster, Sandor Brockhauser, Luca Gelisio, David R. Hall, Filip Leonarski, Valeria Mariani, Gianluca Santoni, Clemens Vonrhein, Graeme Winter, "Gold Standard for macromolecular crystallography diffraction data," IUCrJ 7, no. 5 (2020)] https://doi.org/10.1107/S2052252520008672


The registered participants were:

Oskar Aurelius MAX IV Laboratory oskar dot aurelius at maxiv dot lu dot se
Frances C. Bernstein Bernstein + Sons fcb at bernstein-plus-sons dot com
Herbert J. Bernstein Ronin Institute for Independent Scholarship, c/o NSLS-II BNL yayahjb at gmail dot com
Aaron Brewster LBL asbrewster at lbl dot gov
Max Burian DECTRIS max dot burian at dectris dot com
Tom Caradoc-Davies Australian Synchrotron -- ANSTO thomasc at ansto dot gov dot au
Daniel Eriksson Australian Synchrotron -- ANSTO daniel dot eriksson at ansto dot gov dot au
Diego Gaemperle DECTRIS diego dot gaemperle at dectris dot com
Richard Gildea Diamond Light Source richard dot gildea at diamond dot ac dot uk
David Hall Diamond Light Source david dot hall at diamond dot ac dot uk
John Helliwell School of Chemistry, The University of Manchester, UK john dot helliwell at manchester dot ac dot uk
Mohamed Ibrahim Humboldt University mohamed dot ibrahim at hu-berlin dot de
Natalie Johnson CCDC (Cambridge Crystallographic Data Centre) njohnson at ccdc dot cam dot ac dot uk
Maria Ksiażek University of Silesia in Katowice (Poland) maria dot ksiazek at us dot edu dot pl
Marco Klepoch Institute of Biotechnology, AS CR, Centre of Molecular Structure marco dot Klepoch at ibt dot cas dot cz
Edwin Lazo NSLS-II, BNL elazo at bnl dot gov
Filip Leonarski PSI filip dot leonarski at psi dot ch
Brian McMahon IUCr bm at iucr dot org
Jie Nan MAX IV Laboratory jie dot nan at maxiv dot lu dot se
Ezequiel (Zac) Panepucci Swiss Light Source ezequiel dot panepucci at psi dot ch
Martin Savko Synchrotron SOLEIL savko at synchrotron-soleil dot fr
Jan Stransky Institute of Biotechnology, AS CR, Centre of Molecular Structure jan dot stransky at ibt dot cas dot cz
Clemens Vonrhein Global Phasing Ltd., Cambridge, UK vonrhein at globalphasing dot com
Nadia Zatsepin La Trobe University (La Trobe Institute of Molecular Sciences) n dot zatsepin at latrobe dot edu dot au

The objectives of the workshop were given in the IUCr XXV workshops announcement:
Workshop on MX raw image data formats, metadata and validation

Organized by IUCr Committee on Data

Sponsored in part by Dectris Ltd

Saturday August 14 2021

Prague, Czech Republic

"Macromolecular crystallography (MX) is the dominant means of determining the three-dimensional structures of biological macromolecules. Over the last few decades, most MX data have been collected at synchrotron beamlines using a large number of different detectors produced by various manufacturers and taking advantage of various protocols and goniometry. These data came in their own formats, sometimes proprietary, sometimes open. The associated metadata rarely reached the degree of completeness required for data management according to Findability, Accessibility, Interoperability and Reusability (FAIR) principles. Efforts to reuse old data by other investigators or even by the original investigators some time later were often frustrated. In the culmination of an effort dating back more than two decades, a large portion of the research community concerned with High Data-Rate Macromolecular Crystallography (HDRMX) has now agreed to an updated specification of data and metadata for diffraction images produced at synchrotron light sources and X-ray free electron lasers (XFELs). This Gold Standard will facilitate processing of datasets independent of the facility at which they were collected and enable data archiving according to FAIR principles, with a particular focus on interoperability and reusability. This agreed standard builds on the NeXus/HDF5 NXmx application definition and the International Union of Crystallography (IUCr) imgCIF/CBF dictionary and is compatible with major data processing programs and pipelines. Just as with the IUCr CBF/imgCIF standard from which it arose and to which it is tied, the NeXus/HDF5 NXmx Gold Standard application definition is intended to be applicable to all detectors used for crystallography, and all hardware and software developers in the field are encouraged to adopt and contribute to the standard." [A. Förster, H. J. Bernstein, A. Bhowmick, A. S. Brewster, S. Brockhauser, L. Gelisio, D. R. Hall, F. Leonarski, V. Mariani, G. Santoni, C. Vonrhein, G. Winter, "A Gold Standard for Macromolecular Diffraction Data", in preparation]

This is a tutorial workshop presented by the developers of the Gold Standard to introduce the community to this important upgrade to the interoperability and reusability of macromolecular crystallographic data for both synchrotrons and XFELs, and to give the participants an opportunity to work with and comment on the Gold Standard. To widely share best practices for metadata recording, we encourage the participation of neutron and chemical crystallographers.

The workshop will function as a purely virtual meeting via Zoom from 9 am to 3 pm Prague time (3 am to 9 am New York time, 8 am to 2 pm London time) on 14 August 2021 with two breaks. In order to participate and receive the Zoom meeting room URL and agenda, contact Herbert J. Bernstein, hbernstein@bnl.gov, giving your name, institutional affiliation and email address by no later than 6 pm Prague time (noon New York time, 5 pm London time) on Friday 6 August 2021. Registration is limited to 100 persons. There will be no additional fee for the workshop virtual meeting registration.

Reminder: The CommDat user forum is at https://forums.iucr.org


Saturday, 14 August 2021

| Prague | London | New York | Speaker | Topic | Materials | Minutes | |
|---|---|---|---|---|---|---|---|
| 9:00 -- 9:15 | 8:00 -- 8:15 | 3:00 -- 3:15 | | Welcome, Introductions and Set up for Zoom, test connections | | 15 | |
| 9:15 -- 9:45 | 8:15 -- 8:45 | 3:15 -- 3:45 | Aaron Brewster | Using the Gold Standard for data archival at kilohertz speeds | pdf, m4a audio clip | 30 | yes |
| 9:45 -- 10:15 | 8:45 -- 9:15 | 3:45 -- 4:15 | Herbert J Bernstein (Ronin Institute) | MX raw data formats and the Gold Standard | pdf, m4a audio clip | 30 | yes |
| 10:15 -- 10:45 | 9:15 -- 9:45 | 4:15 -- 4:45 | Max Burian and Diego Gaemperle | Stream2 and FileWriter2 | pdf, m4a audio clip | 30 | yes |
| 10:45 -- 11:15 | 9:45 -- 10:15 | 4:45 -- 5:15 | | Coffee break (bring your own coffee, tea or other refreshments) | | 30 | |
| 11:15 -- 11:45 | 10:15 -- 10:45 | 5:15 -- 5:45 | Filip Leonarski | Jungfraujoch: A Data Acquisition and On-the-fly Analysis System for High Data-Rate Macromolecular Crystallography | pdf, m4a audio clip | 30 | yes |
| 11:45 -- 12:15 | 10:45 -- 11:15 | 5:45 -- 6:15 | Natalie Johnson | Synchrotron Data in the CSD | pdf, m4a audio clip | 30 | yes |
| 12:15 -- 12:30 | 11:15 -- 11:30 | 6:15 -- 6:30 | Daniel Eriksson (Australian Synchrotron) | Facility Report Australian Synchrotron | pdf, m4a audio clip | 15 | yes |
| 12:30 -- 12:45 | 11:30 -- 11:45 | 6:30 -- 6:45 | HJB for Dale Kreitler | Facility Report NSLS-II | pdf, m4a audio clip | 15 | yes |
| 12:45 -- 13:15 | 11:45 -- 12:15 | 6:45 -- 7:15 | | Lunch break (bring your own lunch or breakfast ..., as appropriate to your time zone) | | 30 | |
| 13:15 -- 14:30 | 12:15 -- 13:30 | 7:15 -- 8:30 | | Facility Reports and Open discussion; Future Directions; What metadata are lacking at the moment?; What problems are we envisioning? | mp4 video clip, m4a audio clip | | |

Report of the Workshop

Despite the drastic changes in arrangements necessitated by the impact of the pandemic, including a decision by the United States to severely discourage physical travel to Prague, and difficulty for UK scientists and scientists from other countries in arranging travel to Prague, the move from hybrid to pure virtual was successful.

The workshop had twenty-four participants in total, with approximately eighteen of them active at most times. Aaron Brewster of LBL presented truly impressive progress in the use of the Gold Standard NeXus/HDF5 data format in high-data-rate processing for XFEL experiments worldwide in his talk on "Using the Gold Standard for data archival at kilohertz speeds". Herbert J. Bernstein presented a brief tutorial on the Gold Standard in "MX raw data formats and the Gold Standard". Max Burian and Diego Gaemperle spoke of Dectris' efforts to adopt the Gold Standard and raised the wonderful possibility of making their software open source in their presentation, "Stream2 and FileWriter2". Filip Leonarski presented an impressive talk on "Jungfraujoch: A Data Acquisition and On-the-fly Analysis System for High Data-Rate Macromolecular Crystallography", raised the question of whether the decision to use LZ4 needs to be revisited, and suggested consideration of Zstandard (https://github.com/facebook/zstd), an LZ77-family compression algorithm supported by Facebook. Natalie Johnson reprised her talk from last year on "Synchrotron Data in the CSD". There were three facility reports: one on NSLS-II from Dale Kreitler of BNL, given by Herbert J. Bernstein; one on the Australian Synchrotron, given by Daniel Eriksson; and one on MAX IV, given by Oskar Aurelius.

There was then vigorous discussion for over an hour. The major points raised were:

  1. Clemens Vonrhein asked whether it would be possible to adopt and enforce axis naming conventions in the Gold Standard. After considerable debate, it was decided to treat this first as a data validation issue, in which software would advise people of unfortunate or inconsistent choices of axis names, and to contribute to the nascent IUCr effort on raw data validation, with suggestions and offers of support to be channeled from HDRMX to IUCr CommDat via Brian McMahon.
  2. Diego Gaemperle asked for a clear statement of the administrative path DECTRIS (or anybody) should follow in raising issues, asking questions, and making contributions to the Gold Standard. The answer provided was to use the GitHub issue system for the nexusformat/definitions repository. Most of the Gold Standard is incorporated into the nexusformat/definitions NXmx application definition. Some important details are in the various base classes, such as NXtransformations and NXdetector. For larger issues, the interested parties should work against a fork of the main repository and, when ready, prepare a pull request referring to the suggested changes in that fork.
  3. Jie Nan asked in her name and Filip Leonarski's name if it would be possible to extend the Gold Standard to incorporate tags to support un-indexed diffraction spot data, i.e. lists of full or partial reflections given against centroid pixel coordinates instead of just against [h, k, l]. Jie Nan, Filip Leonarski and Herbert Bernstein will form a small working group, which all interested parties are welcome to join, to make a formal proposal for the necessary tags in both CBF and NeXus for this useful idea. Aaron Brewster has joined the group. The task is basically to clarify the definitions already in NeXus NXreflections so that they are unambiguously tied to the image axis definitions and to tie them unambiguously to equivalent CIF _diffrn_refln... tags. This is a small but necessary cleanup effort. It needs to be done carefully to avoid the axis exchanges and flips for beam centers that have at times delayed the processing of diffraction images. A first strawman proposal will hopefully be done by 22 August and a formal proposal by 1 October. The proposal will be submitted for consideration in NeXus via the route discussed in the prior point. Herbert Bernstein will take care of subsequent transmission to COMCIFS for incorporation in the relevant CIF dictionaries.
  4. The compression issue was raised and Dectris agreed to consider adding Zstandard as an optional alternative to LZ4 in their software, since the change would have no impact on higher level software and Zstandard is already a well-supported HDF5 plugin. The issue of possibly using a smaller chunk size than full images, perhaps as small as a module, was also raised because it similarly does not impact higher level software and might help with some of the performance issues raised by Aaron Brewster in his talk and in subsequent discussions about the cost of data motion.
  5. In order to gather more information on needed improvements in compression, a working group on the topic will be organized. Filip Leonarski and Herbert J. Bernstein have expressed interest.
  6. In his review of this report, John Helliwell noted, "Interoperability has to be defined precisely such as interoperability by different crystallography users. i.e. not between crystallography and marine science, say. This type of interoperability is what CODATA understands by the term, in general."
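
The chunk-size question in point 4 can be made concrete with a little arithmetic. In HDF5 chunked storage a compressed chunk is read and decompressed as a whole, so the chunk shape determines how many pixels must be decompressed to reach a small region around a spot. The following is a minimal sketch only: the 2048 x 2048 image, the 512 x 1024 module size and the region of interest are hypothetical numbers chosen for illustration, the helper names are not from any HDRMX software, and gaps between modules are ignored.

```python
def chunks_touched(image_shape, chunk_shape, roi):
    """Count the chunks that overlap a rectangular region of interest.

    image_shape and chunk_shape are (rows, cols); roi is
    (row_start, row_stop, col_start, col_stop) with exclusive stops.
    """
    rows, cols = image_shape
    chunk_rows, chunk_cols = chunk_shape
    r0, r1, c0, c1 = roi
    if not (0 <= r0 < r1 <= rows and 0 <= c0 < c1 <= cols):
        raise ValueError("ROI must lie inside the image")
    # Index of the first and last chunk touched along each axis.
    n_rows = (r1 - 1) // chunk_rows - r0 // chunk_rows + 1
    n_cols = (c1 - 1) // chunk_cols - c0 // chunk_cols + 1
    return n_rows * n_cols


def pixels_decompressed(image_shape, chunk_shape, roi):
    """Pixels that must be decompressed to read the ROI, since each
    touched chunk is decompressed in full."""
    n_chunks = chunks_touched(image_shape, chunk_shape, roi)
    return n_chunks * chunk_shape[0] * chunk_shape[1]


# Hypothetical detector: 2048 x 2048 pixels built from 512 x 1024 modules.
IMAGE = (2048, 2048)
MODULE = (512, 1024)
SPOT_ROI = (600, 700, 900, 1000)  # a 100 x 100 pixel box around one spot

full_image_cost = pixels_decompressed(IMAGE, IMAGE, SPOT_ROI)
per_module_cost = pixels_decompressed(IMAGE, MODULE, SPOT_ROI)
print(full_image_cost, per_module_cost, full_image_cost // per_module_cost)
```

With these assumed numbers, reading the box requires decompressing the whole image when the chunk is a full image, but only one module when chunks follow module boundaries. A region that straddles module edges touches more chunks, so the benefit depends on the access pattern.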