Herbert's Meeting Aug 22, 2020
(03:12) Rajnandani: hello everyone, when are we starting?
(03:12) Andreas Foerster: in three minutes
(03:12) Gianluca Santoni: 9:15
(03:12) Gianluca Santoni: european time
(03:12) Rajnandani: 👍
(03:13) Fabio Dall'Antonia: The screen sharing does not seem to work for presentation mode??
(03:15) Fabio Dall'Antonia: Ok all fine ..
(03:47) Daniel Eriksson: *clap clap clap clap clap *
(03:48) Andreas Foerster: can the licence of the data be restrictive, or does it need to be open access?
(04:05) Clemens Vonrhein, Global Phasing, Cambridge/UK: Does the choice of lab coordinate frame really matter - if the complete description of the experimental setup consistently uses the same lab frame for all axes (detector, rotation axes, beam etc)? E.g. an experiment can be given in any given convention for XDS to happily process the data - similar to the d*TREK system (with the self-contained & complete d*TREK headers) that came out of the EEC workshop (Bricogne, 1986).
(04:13) Jie Nan: for sample NXtransformations, currently it seems one can only define the locations, but not translations. would be nice if this info can be recorded as well, e.g. helical scan
(04:14) Gerard Bricogne: Clemens's mention of the EEC Workshops (1986-89) is relevant. The imgCIF conventions are identical to the d*TREK ones, that were the fruit of that workshop - the first effort to accommodate generic descriptions of a whole diffractometer incorporating an area detector. The abstraction of such descriptions in terms of vectors was the work of David Thomas, as part of his PhD thesis with Uli Arndt. Quoting Jim Pflugrath in Methods in Enzymology 276 (1997), p. 287: "These workshops profoundly influenced many of the programmers working in this area. Before the workshops, each software package was tied to a particular detector. After the workshops, many of the packages were upgraded to process data from many different detectors."
(04:14) Filip Leonarski: Clemens - convention might not matter for processing, but certainly might be confusing for visualization of images (if for example beam stop shadow is visualized from the different direction as in "optical" image, this might get confusing)
(04:15) GOETZ Andrew: Do you or someone know how many detectors support the Gold Standard or are planning to support it?
(04:15) Gerard Bricogne: The Gold Standard paper traces the origins of the standardisation effort to a 1995 proposal (no reference given) by Andrew Hammersley. However the correct lineage of ideas should perhaps not start in 1995: Andy was a participant in the EEC workshops, which may or may not have been mentioned in his proposal. I am of course delighted that he spread the word.
(04:18) Clemens Vonrhein, Global Phasing, Cambridge/UK: Filip: visualisation (even of shadowing) could place itself e.g. at the "camera man" view (or the "standard" detector, 2D-image view) no matter what the convention used ... unless I miss something (highly possible) ;-)
(04:22) Clemens Vonrhein, Global Phasing, Cambridge/UK: Regarding meta-data (storage and representation): I can highly recommend looking at the original JCSG system - which is still online (as long as the server keeps running I assume): www.jcsg.org. Go to "Structure gallery" and then a given "Target history" (little blue icon): this shows a huge amount of "context" that unfortunately is lost when only diffraction experiment meta data is captured/archived. Of course, how to formalise storage of this kind of information (visual inspection of crystals, f'/f" values from fluorescence scan etc) is a different matter ;-)
(04:22) Smita Yadav: how we can decide the CC half for resolution cut off?
(04:24) Herbert J. Bernstein: To reply to Andy Goetz, Dectris explicitly is supporting the Gold Standard. All manufacturers that support full CIF are effectively supporting the Gold Standard, but minicifs leave out a lot.
(04:24) Andy Thompson: Crystal to detector distance, wavelength or beam co-ordinates are derived parameters that are checked more or less regularly and by diverse means. For example I always laugh when I see people quote 5 decimal places for the beamline wavelength, as in one of Loes's examples. There is a real danger in populating CBF or NEXUS files with numbers which people tend to believe but which may be systematically wrong. Is the gold standard completed with recommendations for validation of derived values (frequency, methods etc)?
(04:29) Rajnandani: one of my dataset upon processing via xds or hkl shows one of the axis to be very long. My supervisor says this could be the actual case of crystal. I am unable to solve the data. Please help me how to do it.
(04:29) Clemens Vonrhein, Global Phasing, Cambridge/UK: Andy: maybe each value should be given with its SU?
(04:29) Tom Caradoc-Davies: I think it is a good point. Should we have a recommendation on minimum frequency for measuring direct beam position, distance and wavelength?
(04:30) Dale Kreitler: If any of those parameters are significantly off wouldn't that preclude successful structure solution/refinement?
(04:30) Clemens Vonrhein, Global Phasing, Cambridge/UK: Rajnandani: this might be topic for e.g. CCP4bb (or contacting some of the software developers directly).
(04:30) Tom Caradoc-Davies: It can seriously impact automated data processing
(04:34) Irwin: Will this meeting be recorded and circulated e.g on CCP4bb?
(04:37) Sandor Brockhauser: @Andy: Also precision/accuracy would be good to be attached next to the values (just like units are provided).
(04:40) Marina Nikolova: @Andreas: stream used with 1 client with 176 parallel threads @ EMBL
(04:41) Marina Nikolova: @Andreas: for the receiver - C++
(04:44) Tom Caradoc-Davies: Will the upgrades reduce arming time? Can we send multiple variables at the same time when arming?
(04:45) Jie Nan: Would be nice if the monitor can send masked data, currently it's not
(04:45) Jie Nan: at least for the first gen of Eiger
(04:49) Ben Williams (Diamond Light Source): I'll be ready in a mo.
(04:53) Filip Leonarski: Sorry, after problems with Mic I got overheating of the laptop - hopefully will be good now
(04:55) Herbert J. Bernstein: Filip, you'll go next and we will cut the break a little.
(04:56) Andreas Foerster: Hi Paul, in your inverse-beam example, why not also present it as one 360-degree dataset?
(04:58) Herbert J. Bernstein: GDPR warning: This chat is being retained and will be published. If you require your remarks to be deleted, you will have an opportunity to do so before general publication.
(05:00) Sandor Brockhauser: Will metadata be present in your inverse beam master files to support the retrieval of radiation dose history?
(05:07) Jie Nan: Two Q.s regarding VDS, how about the computing performance comparing to real dataset? One disadvantage for VDS is if one generates several VDS from one dataset, even one only wants to download/process one VDS, one has to pull the whole original dataset, how do you see that?
(05:09) Clemens Vonrhein, Global Phasing, Cambridge/UK: VDS example: https://zenodo.org/record/3611103
(05:21) Valerio Mariani: The XFEL format is really easy to generate from single chunks, just stacking the chunks. The middle way requires rearrangement that can be quite time-consuming, especially for high speed processing (for example, real-time image processing)
(05:25) ekm22040: Someone (Clemens) earlier asked whether it really matters what the lab coordinate system is (assuming it remains right-handed). From the point of view of a chemical crystallographer at Diamond, there is an advantage to having an agreed unique definition, as the gold standard has with McStas.
Since our experiments often include apparatus that will shadow the incident and/or diffracted beam, and which may need to be added into the metadata retrospectively, not having a defined coordinate system would make this confusing and difficult.
(05:25) ekm22040: Sorry, that wasn't a question for the speaker
(05:26) ekm22040: * Someone (Clemens?)
(06:04) Clemens Vonrhein, Global Phasing, Cambridge/UK: @Jie: do you have an overview about the fraction of datasets where users take the automatically processed ones as-is (as opposed to downloading data and re-processing at home lab)?
(06:07) Dave Hall: At Diamond we think about 80% of deposited structures may be using the automatically processed data.
(06:08) Tom Caradoc-Davies: Same
(06:28) Loes Kroon-Batenburg: Great Clemens! Very useful.
(06:39) Andreas Foerster: Hi Sameer,
(06:39) Andreas Foerster: Will develop a process to read metadata from the raw data to populate some of the fields required by the PDB (facility, beamline, etc.)?
(06:40) Andreas Foerster: Will you develop...
(06:43) Loes Kroon-Batenburg: It is a pity that the Onedep system does not allow providing dois for raw datasets from e.g. zenodo. Unless we have defined necessary metadata for the deposition, there is no reason to ignore these raw data
(06:44) Andreas Foerster: It would be great to associate re-processed structures (pdb-redo or other sources) with the original entries, not as revisions but as alternatives.
(06:45) Clemens Vonrhein, Global Phasing, Cambridge/UK: @Andreas: within the PDBX/mmCIF WG we are trying to provide the deposition-ready mmCIF file so that it should indeed contain all that information. Of course, it requires an unbroken connection between the raw data to the PDB deposition (via processing, structure solution and refinement, like our Pipedream for compound screening campaigns).
Completely automated pipelines can do this, but otherwise the provenance tracking throughout a multi-year project can be very complicated ;-)
(06:46) Andreas Foerster: yeah, it's a bit of a utopian view
(06:46) Gerard Bricogne: I agree with Loes. Expecting repositories to validate raw datasets before linking to them is expecting a lot. For instance, a highly incomplete dataset may be OK for a structure with very high-order NCS, e.g. a virus. A raw data repository would not be equipped to carry out this kind of acceptance test.
(06:50) Tom Caradoc-Davies: Sameer, could the master file be parsed during deposition to harvest the gold-standard metadata that described the experiment?
(06:53) Sameer Velankar: @Loes - all doi links are welcome and will be added to mmCIF. I was making the point that it is also useful to have some validation to make sure that data deposited to raw data repositories is reusable.
(06:55) Sameer Velankar: @Tom - As I suggested it would be very helpful to gather all the metadata throughout the pipeline and submit it to PDB in PDBx/mmCIF format including gold-standard metadata.
(06:56) Loes Kroon-Batenburg: Sure, but how to do that is not established yet. So please do not only add the doi to mmCIF but also make the doi to the raw data visible
(06:56) Gerard Bricogne: @Sameer: please ignore my comment, I think I misunderstood a point you made about selecting raw data repositories through their having to carry out some kind of validation of contents (not just of compliance with standards).
(06:58) Tom Caradoc-Davies: During PDB deposition if we could upload a dataset master file (in the same way we can harvest parameters from an uploaded data processing file) it would save users a lot of time. It would be a relatively small change that would save users a lot of time and would help to make sure data is correctly mapped from master file to the pdb
(07:00) Sameer Velankar: @Loes - agree and PDBe is doing it as you showed.
I will discuss it with my wwPDB colleagues.
(07:02) Sameer Velankar: @Tom - as Clemens has mentioned earlier in the chat there is active discussion in mmCIF working group on capturing more metadata related to processing of raw images.
(07:03) Tom Caradoc-Davies: There is a lot of information in the master file that is not mandatory for users to enter. If there was a master file upload then users would follow the lowest effort step and do this, ensuring much more complete and accurate metadata describing the experiment.
(07:06) Tom Caradoc-Davies: @Sameer - great to hear it. I would point out that they may be different things. The master file describes the experiment, not the data processing or dataset statistics.
(07:25) Alun Ashton: you can disable presenter view in the ppt
(07:59) Dale Kreitler: What oscillation range do you typically use for CX samples with the eiger?
(08:06) Jason Price: we collect a full 360, there is a min-Kappa on MX1, so 2 x 180 as well. Assuming lower symmetry, to not have to recollect
(08:07) Dale Kreitler: sorry, I meant oscillation width per frame
(08:09) Jason Price: default is 0.1 degrees, just hasn't been explored enough.
(08:09) Dale Kreitler: thanks
(08:10) Jason Price: finer slicing may help fril the count rate collection, but attenuation, is very well characterized.
(08:16) John Helliwell: Personally I deposit my raw data at my Manchester University Data Archive or Zenodo. I am assured by these that their long term funding is secure. Secondly, I have not received thus far any queries or statements of failure about the reprocessing of those archived raw data sets. [John Helliwell, Chairman of IUCr CommDat, explains: My reply to @JohnWestbrook was because he wondered why Europeans were not using either of the two USA based MX domain specific repositories.]
(08:19) Clemens Vonrhein, Global Phasing, Cambridge/UK: @John: How would one get access to data from Manchester Data Archive?
(08:20) John Helliwell: My raw data that I archive are linked to each of my publications which of course explicitly state those dois.
(08:29) Dave Hall: can see example of link to zenodo data here: https://www.ebi.ac.uk/pdbe/entry/pdb/5rel
(08:30) Loes Kroon-Batenburg: Great!
(08:30) John Helliwell: I forgot to say that my PDB deposits also have the associated raw data dois as well as stated in the publication. Reusers of my lab’s raw data can thereby find the doi in both those places, the publications and the PDB deposits.
(08:33) John Helliwell: @johnwestbrook The funders in Europe require that the underpinning data, including raw data, are available long term. This requires assurances by an archive that they have long term funding, >10 years.
(08:34) Oskar Aurelius: What is the NXmx-view on experiment-related metadata which is not strict MX-related, such as excitation laser properties, ligand/substrate mixing properties and similar events for time-resolved MX studies?
(08:36) John Helliwell: @johnwestbrook The IUCr Committee on Data provides official and enthusiastic support, led by me, to the USA raw data MX domain specific archives. My earlier comments are my personal researcher based in Europe perspectives. [John Helliwell, Chairman of IUCr CommDat, explains: My reply to @JohnWestbrook was because he wondered why Europeans were not using either of the two USA based MX domain specific repositories.]
(08:39) Dave Hall: what is the long term funding outlook for the domain data resources?
(08:40) Gianluca Santoni: Are facility-based solutions like data.esrf.fr ok for everyone?
(08:44) Loes Kroon-Batenburg: @gianluca, yes, all datasets with doi are welcome
(08:44) Clemens Vonrhein, Global Phasing, Cambridge/UK: @Herb: Thank you very much for the organisation and chairing! I found it a very useful collection of talks and presentations - to stay in contact with the current status even in more difficult times.
(08:45) Dave Hall: Thanks everyone
(08:45) Tom Caradoc-Davies: Thanks everyone!
(08:45) Daniel Eriksson: Thanks!
(08:45) Gianluca Santoni: Thanks everyone!
(08:45) Jie Nan: Thanks!
(08:45) Loes Kroon-Batenburg: @thank you, Herb
(08:45) Jason Price: thanks!
(08:45) ian Clifton: Yes, thanks all!
(08:45) jaishima: thanks, i always learn a lot in these sessions!
(08:45) Fabio Dall'Antonia: Thanks everyone
(08:45) Sameer Velankar: url to get all entries with raw data -
(08:45) John Helliwell: Thank you to Herbert and Andreas for organising all this great workshop.
(08:45) Luca Gelisio: Thanks, great workshop!
(08:45) Sameer Velankar: https://www.ebi.ac.uk/pdbe/search/pdb/select?group=true&group.field=pdb_id&group.ngroups=true&json.nl=map&fl=*&rows=10&start=0&q=(q_raw_exp_data_doi%3A*)&wt=json
(08:45) Scott Classen: great talks. Thanks all!!
(08:45) Oskar Aurelius: Thanks everyone!
(08:45) Dale Kreitler: I vote for new york time
(08:45) Dale Kreitler: Zzzz...
(08:46) Tom Caradoc-Davies: Melbourne time!
(08:46) Daniel Eriksson: ^^this
(08:46) Dave Hall: GMT
(08:46) Dave Hall: in a weekday
(08:46) Sameer Velankar: https://www.ebi.ac.uk/pdbe/search/pdb/select?group=true&group.field=pdb_id&group.ngroups=true&json.nl=map&fl=pdb_id,raw_exp_data_doi&rows=10000&start=0&q=(q_raw_exp_data_doi%3A*)&wt=json
(08:46) Wei-Chun Kao: Thanks for organising this meeting, all speakers and participants.
(08:46) Sameer Velankar: Thanks
(08:47) Hathaway, Paul (DLSLtd,RAL,LSCI): Thanks All
(08:47) Suzanna Ward: Thanks for great talks and interesting discussion
The following issues in the discussions were noted by John Helliwell:
Issues raised at the Gold Standard Workshop noticed by JRH, needing action by the Gold Standard Team:-
Andy Thompson, Soleil: There are invalid precisions given to some parameters such as X-ray wavelength.
Filip Leonarski, SLS: The calibration of integrating and photon counting detectors are different.
Aaron Brewster, LBL: For time-resolved MX studies a time stamp is needed for each data frame ie as core metadata.
Jason Price, ANSTO: For chemical crystallography (and maybe MX) core metadata should include a crystal description notably its colour and crystal sample dimensions.
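Editorial note on the two PDBe search URLs posted by Sameer Velankar at 08:45 and 08:46: they differ only in the `fl` (returned fields) and `rows` parameters. The sketch below rebuilds both from a single parameter set so the pieces are easier to read. The endpoint and all parameter values are copied from the chat; the function name `build_raw_data_query` and its argument names are illustrative assumptions, and nothing here is checked against the current state of the PDBe service.

```python
from urllib.parse import urlencode

# Endpoint copied verbatim from the URLs posted in the chat.
PDBE_SEARCH = "https://www.ebi.ac.uk/pdbe/search/pdb/select"


def build_raw_data_query(fields="pdb_id,raw_exp_data_doi", rows=10000, start=0):
    """Build the query URL for entries that carry a raw-data DOI.

    `fields`, `rows` and `start` map to the `fl`, `rows` and `start`
    parameters seen in the chat; the function name is hypothetical.
    """
    params = {
        "group": "true",             # group results ...
        "group.field": "pdb_id",     # ... by PDB entry
        "group.ngroups": "true",     # also report the number of groups
        "json.nl": "map",
        "fl": fields,                # which fields to return
        "rows": rows,                # page size
        "start": start,              # page offset
        "q": "(q_raw_exp_data_doi:*)",  # any entry with a raw-data DOI
        "wt": "json",
    }
    # Keep "(", ")", "," and "*" unescaped so the output matches the
    # URLs as they were typed in the chat; ":" is still encoded as %3A.
    return f"{PDBE_SEARCH}?{urlencode(params, safe='(),*')}"


if __name__ == "__main__":
    print(build_raw_data_query())                      # second URL from the chat
    print(build_raw_data_query(fields="*", rows=10))   # first URL from the chat
```

The resulting string could be passed to any HTTP client; the layout of the JSON that comes back is not described in the chat, so parsing it is left out here.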