MEDSBIO

Consortium for Management of Experimental Data in Structural Biology
Fourth imgCIF workshop (new series)
at BNL on Thursday 22 May 2008


Workshop on Raw Image Formats in Structural Biology

Herbert J. Bernstein, yaya@dowling.edu
Robert M. Sweet, sweet@bnl.gov
The new imgCIF workshop series has been sponsored in part by DOE under grant ER64212-1027708-0011962, NSF under grant DBI-0610407, and NIH under grant 1R13RR023192-01A1.

Workshop Report

This is a report on the fourth in the new series of imgCIF workshops that began with a workshop at the summer 2006 meeting of the American Crystallographic Association. One major objective of these workshops was to find and remove the obstacles to adoption of a common interoperable format for synchrotron diffraction images. We are pleased to report excellent progress in that direction. Three out of four of the major detector vendors had representatives at this workshop, and all three agreed to cooperate in an effort to define an agreed minimum set of common tags that would be provided in synchrotron diffraction images and to participate in a "bakeoff" to help resolve any open issues with respect to interoperability. Much work remains to be done to move from having imgCIF as an available option to having it used as a routine tool in the collection of images.

In addition there is now wide recognition that there is much to be gained from careful consideration of the interactions among multiple raw data image formats in structural biology, such as imgCIF, NeXus, HDF, XML and the microscopy formats. In addition to the effort on the imgCIF bakeoff, it was the consensus of the group that another workshop addressing these more general issues is needed within the next one to two years.

Workshop Participants (left to right): Matt Dougherty, Robert Sweet, Mark Pressprich, Nick Sauter, Frances Bernstein, Georgi Darakev, Justin Anderson, Nikolay Darakev, Curtis Rueden, Kevin Eliceiri. Partial view of Chris Nielsen center front. Herbert Bernstein behind the camera. Andy Howard present but off-camera. John Skinner joined the group for lunch.

The pace of data collection and the volume of data collected at synchrotron beam lines is increasing. The ACA Data, Standards, and Computing Committee spearheaded an effort to improve the efficiency of the handling and storage of these data by encouraging the adoption of common data formats and standard software interfaces. The goal of this was firstly to have the data be self defining, therefore equally accessible to data-reduction and -visualization codes. The second goal, for the purposes of secure archiving, was to provide robust internal documentation of the source of the data.

The current effort began in 2005, building on work started in the mid-1990s on a Crystallographic Binary Format (CBF) proposed by Andy Hammersley. This effort was the basis for the image-supporting Crystallographic Information Format/Crystallographic Binary Format (imgCIF/CBF). The first imgCIF/CBF workshop took place at Brookhaven National Laboratory in 1997 and proposed a format combining support for an efficient binary representation of images with a fully CIF-compliant ASCII equivalent. An imgCIF/CBF dictionary and software to support the format were created, are available on the web, and are described in Volume G of the IUCr International Tables for Crystallography. Now the community should adopt a consensus standard for management of data at synchrotron beam lines, to make it easier for users to process data taken from various beam lines. Also, as our science evolves, new concepts will be considered: possibilities include NeXus and XML.

The first workshop in the new series, on "Management of Synchrotron Image Data: imgCIF File System and Beyond", was held on 22 July 2006 as part of the 2006 ACA meeting in Honolulu, Hawaii. That workshop concluded that it was "the right time for more widespread use of imgCIF ... [and that] SR sources should start writing imgCIF image files as soon as possible, employing the imgCIF dictionary already adopted by the IUCr Committee on the Maintenance of the CIF Standard (COMCIFS) and published on the web and in International Tables Volume G." [from the report of the workshop, see http://www.medsbio.org/meetings/ACA_2006_WK02_Report.html].

Subsequent to the Hawaii workshop, intensive work was started in response to these recommendations. Both the imgCIF dictionary and the supporting software library were reviewed and, after meetings at SLS and ESRF, extended. See SLS_report.html and ESRF_report.html for more information on those meetings. The work continued in collaboration with members of the community (see the imgCIF mailing list http://www.iucr.org/iucr-top/cif/cbf/imgcif-l/).

In light of this activity a workshop on data formats for synchrotron image data was held after the NSLS/CFN meeting on 24 May 2007 at BNL in the Biology Department. Topics discussed included proposed extensions to imgCIF, the use of NeXus, progress on software and the status of imgCIF at Diamond and at SLS. That workshop concluded that work was needed on support for the handling of uncorrected images and true bitmaps, creation of a utility to "tidy" CIFs, creation of an agreed interface with XML, NeXus and HDF, clarification of the specification of the relationship between detector specification and the physical locations of pixels in the laboratory, tags for robotics and remote access, and the creation of more cookbooks. See http://www.medsbio.org/mettings/BNL_May07_imgCIF_Workshop_Report.html.

In the short time between the second and third workshops, work on many of the items on this task list began. In addition, building on discussions with J. Steinbrener at the second workshop, discussions were started with Matt Dougherty on the image needs of the microscopy community and, through Matt Dougherty, with Mike Folk of the HDF Group on techniques for integration between imgCIF and NeXus, HDF and XML. Work on CBFlib continued and version 0.7.8 was released.

The third imgCIF workshop was held in two sessions at BSR 2007 in Manchester and at Diamond. Herbert Bernstein and Alun Ashton organized this workshop. The purpose of this workshop was to provide a review of the status of imgCIF and CBFlib for the European user community and to discuss the integration of imgCIF with NeXus, HDF and XML. The Manchester session was used for an introduction and review of the status of imgCIF and for some discussion. The Diamond Light Source session was used to deal with more detailed technical issues and further discussion. That workshop raised several issues that needed further consideration: the Dectris Pilatus 6m miniCBF headers, integration with NeXus, HDF and XML, and dealing with common issues between microscopy and crystallography image handling. See BSR_2007_imgCIF_Workshop.

After discussion with the funding agencies, and signs of good progress on the adoption of imgCIF, a fourth workshop in the new series of imgCIF workshops was added. That workshop on "Raw Image Formats in Structural Biology" was held on 22 May 2008 in the Biology Department (Building 463) at Brookhaven National Laboratory. It was scheduled just after the NSLS/CFN user meeting.

The charge to the participants was:

Over the past 2 years, imgCIF has seen increasing use, and the interactions among raw image formats for x-ray crystallography, neutron crystallography and microscopy have started to be addressed. This one-day meeting will be a follow-up to the 2006-2007 imgCIF workshops at the summer 2006 ACA meeting, at BNL in May 2007, and in summer 2007 in conjunction with BSR 2007 in Manchester and at the Diamond Light Source.

We will have reviews of the current status of imgCIF, exploration of ways to move between imgCIF and NeXus using XML and HDF and ways to work with microscopy and tomography images.

The morning and part of the afternoon were used for presentations, and the rest of the afternoon for discussions and plans for the future.

Agenda:


8:30 am Breakfast

8:30 am Welcome and introduction to the workshop    H. J. Bernstein, Dowling
                                                    R. M. Sweet, BNL
9:00 am Participants introduce themselves

9:10 am Breakfast

9:30 am Progress in adoption of imgCIF and
          integration with NeXus                    H. J. Bernstein, Dowling
                                                    handout
                                                    imgCIF dictionary
                                                    CBFlib manual
                                                    cbf2nx.c

10:00 am Discussion

10:30 am "A beamline perspective on data formats"   A. Howard, IIT
                                                    handout

10:20 am "The Importance of Standard Image Formats for Scientific Progress"
                                                    N. Sauter, LBNL
                                                    abstract
                                                    handout

11:00 am Break

11:10 am "Through the Looking Glass: creating an HDF data prism"
                                                    M. Dougherty, NCMI
                                                    abstract
                                                    handout
                                                    ImageCore V01 RFC (pdf)

11:40 am "Bio-Formats and the Open Microscopy Environment"
                                                    C. Rueden, LOCI
                                                    handout

12:30 pm Lunch

1:10 pm "CBF: Issues for Vendors"                   C. Nielsen, ADSC
                                                    handout

1:30  pm Discussion of vendor issues with data formats

2:00  pm Discussion of imgCIF/NeXus/HDF integration issues

2:30  pm Discussion of future plans and funding for
         software support and meetings

3:00  pm Break

3:15  pm Preparation of recommendations and conclusions

4:30  pm Adjourn the BNL workshop

7:00  pm Post-workshop dinner

Discussion, Recommendations and Conclusions

This workshop involved wide-ranging and detailed discussion on all of the material presented (see the handouts linked to in the agenda above). Three topics generated very strong interest: the impact of open-source licensing on vendor acceptance of software for support of image formats; the creation of an agreed list of mandatory tags; and the use of FUSE and HDF (see the Dougherty talk) to manage image data in a directory structure so that images can be combined with appropriate headers. All of these discussions produced the seeds of consensus agreements.

The discussion on open-source licensing was primarily one of mutual education about the needs of vendors and the realities of modern open source licensing practices. The vendors need assurance that they will not have to make public software that they are not ready to make public. On the other hand it is important to protect and preserve access to the base of community software and only Richard Stallman's GNU General Public License and GNU Lesser General Public License have a clear track record of standing up under legal attack. A workable compromise between these competing needs is to use the same license that is already used for many of the libraries used by existing vendor packages, the LGPL. The objection was raised that those libraries distributed with operating systems, such as glibc and libm are subject to less restrictive terms under the LGPL than libraries that are not distributed with operating systems. As holder of the copyright on CBFlib, Herbert Bernstein assured the vendors that they will not be held to a stricter standard in using CBFlib than they are held to in using glibc and libm. We note for the record that recently CBFlib became one of the libraries distributed as part of the Debian Linux operating system, automatically placing CBFlib under the same liberal conditions as apply to glibc and libm.

In the discussion of the interaction with HDF it was noted that the necessary code to convert from imgCIF to HDF is already available, but that the general conversion from HDF to imgCIF is much more difficult. However, despite that difficulty, there was general agreement that it would be worthwhile to try to use the HDF-based FUSE approach suggested by Dougherty as a way to integrate one or more imgCIF headers with diffraction images as an alternative to the use of imgCIF templates.

An outcome of the workshop was the clear agreement by all three vendors present (ADSC, Rigaku, Rayonix) that they would help to support imgCIF (see the recommendations on agreeing to a minimal set of tags and cooperating in a "bakeoff"). The discussion was on how best to do it, not whether to do it. On this basis, it would be fair to say the new imgCIF workshop series has achieved its primary goal. However, it is important to note that acceptance in the sense of routine use by users in real experiments, rather than by detector vendors testing their software, has not yet been achieved.

It was noted that the use of imgCIF for the Dectris Pilatus 6m detector and the efforts at Diamond and SOLEIL have helped to generate interest in imgCIF, but that there is not yet significant use of imgCIF by users other than at SLS, and that the SLS miniCBF format is not a full CBF format.

It was noted by people from both communities that the crystallographers and the microscopists have common problems and can learn useful things from the solutions being adopted in each of the communities.

It was noted that plans are afoot for archiving of raw data. It is hoped that the role of formats such as imgCIF, NeXus, HDF and XML will be considered in the planning.

It was noted that a Java version of CBFlib needs to be created. The work done by J. Wright in creating a Python wrapper and by N. Sauter in creating a C++ wrapper for CBFlib should provide appropriate templates. The only reason that this has not been made a firm recommendation of the workshop is the uncertainty about obtaining funding for the work.

The major conclusions and recommendations of the workshop were:

  1. Are we done? It was the unanimous consensus of the participants that there is a need to continue the workshop series, but that it is time to break into two different efforts: one primarily on the technical details of imgCIF and one primarily on the broader issues of biological image formats. The technical effort on imgCIF would be driven by a series of "bake-offs" in which different software and sample images would be collaboratively tested, and the more general effort would involve a next workshop sometime in the next one to two years. Various participants agreed to investigate funding and practical arrangements for these two efforts. In particular N. Sauter and A. Howard will investigate the possibility of doing some of the bake-off effort in conjunction with appropriate synchrotron user meetings, and H. Bernstein will arrange a web site to hold the necessary test images for the bake-offs. Dealing with the technical imgCIF issues in conjunction with the bake-offs will then allow more time in the next workshop for exploration of the interaction among biological image formats. K. Eliceiri and M. Dougherty will take the lead on pursuing possible funding sources for the more general workshops, possibly using teleconferencing to help ensure the widest participation.

    It should be noted that a few days after the workshop, N. Sauter reported,

    "In the June/July time frame, I will arrange to modify the ALS sector 5 beamlines to write both CBF & ADSC formats. Also, I will add CBF support to Webice. Then we will do an experiment to determine if a CBF-formatted dataset can be processed in real time using Webice+Labelit+Mosflm. Data acquired will be posted on the new example dataset Website to be hosted by Herbert Bernstein. Thus we will have some initial bakeoff results within the next few months."

  2. How should imgCIF software be licensed? After detailed discussion of alternatives to the current licensing of CBFlib under the GPL with licensing of the API under the LGPL, it was agreed that use of the LGPL would be extended to more of CBFlib. This was done the day after the workshop by distribution of the following announcement:

    imgCIF/CBF is a format for image data, such as synchrotron diffraction images. CBFlib is a software package supporting the imgCIF/CBF format. For background information, see Hall and McMahon, International Tables for Crystallography, Volume G, Definition and exchange of crystallographic data, IUCr, Springer, 2008, Dordrecht, NL, esp. chapters 2.3, 3.7, 4.6 and 5.6.

    The CBFlib package available from

    http://www.sourceforge.net/projects/cbflib

    is an open source package covered by the GNU General Public Licence (GPL). The CBFlib Applications Programming Interface (API) is also covered by the GNU Lesser General Public License (LGPL), which is also known as the GNU Library Public License.

    Effective immediately, all functions, methods, subroutines and procedures in the CBFlib package will be considered to be part of the API and to be covered by the LGPL as an alternative to the GPL that covers everything in the CBFlib package.

    This change results from the discussions at the 22 May 2008 workshop at BNL to help make detector vendors and others with proprietary software more comfortable in using the CBFlib package.

    Thanks to Teemu Ikonen, since February 2008 CBFlib is a debian package and you may link to the functions in the CBFlib package from a proprietary program just as you may link to glibc or to the trigonometry functions in the libm math library.

    Use it in good health.

    -- Herbert J. Bernstein

    In order to avoid any ambiguity in the interpretation of this announcement, over the next few weeks the comments in the code will be updated to reflect this licensing and the libraries explicitly reorganized to incorporate versions of all functions, but with some name changes to avoid conflicts.

  3. What tags must be present in an imgCIF file? A consensus proposal of a required minimum set of tags for a useful imgCIF file will be resolved among the three vendors present at the workshop, ADSC, Rigaku and Rayonix. The basis for the set will be the tags provided in C. Nielsen's ADSC jiffie. Those tags are sufficient to meet the requirements of mosflm and adxv. The vendors will meet during the upcoming ACA meeting in an attempt to converge on a common list, not only of the required tags, but also, when possible, default values, so that by changing as many mandatory tags as possible to implicit, there will be an unambiguous meaning for imgCIF files that fail to state the tags for which defaults have been specified. The first list proposed will be for routine protein crystallography experiments and then an appropriate list for small angle X-ray scattering experiments will be proposed.

  4. What tags need to be added to the imgCIF dictionary?

    • New tags are needed to handle sparse pixel data, such as masks and replacement values for overloaded pixels, e.g. tags to allow an image to be presented as a list, rather than as an array.
    • New tags are needed to report monitor values and other scan dependent overall image data.
    • Tags are needed for versioning.
    • Once agreement is achieved on the defaults for the required minimum set of tags, those defaults would be included in the dictionary.
    H. J. Bernstein agreed to coordinate the efforts in all three groups of tags. It should be noted that a proposal on versioning was included in the presentation at the BSR 2007 workshop:

    For these reasons we propose to add ...variant tags to all imgCIF categories in which an arbitrary variant identifier can be inserted. The variant tag would be added to the key of each category, but would be implicit, with a null (empty) default value, so that it could be omitted until and unless needed. All rows from all categories using the same variant identifier would be considered to be related. All rows with the null variant would be considered related to all rows with any non-null variant except for rows that agree with the row with the null variant in all columns other than the ...variant column. A new VARIANT category would be added identifying the role of each variant (preferred, unsuccessful trial, etc.), giving a time stamp for each variant and giving the variant from which this variant has descended:

    loop_
    _variant.variant
    _variant.role
    _variant.timestamp
    _variant.variant_of
    _variant.details
    . "raw data" 2007-08-03T23:20:00 . .
    indexed "preferred" 2007-08-04T01:17:28 .
    "indexed cell and refined beam center"

    It should be noted that a proposal on handling sparse images using a background-offset-delta compression was posted to the imgCIF list a few days after the workshop.
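As an illustration of the VARIANT category proposed under item 4, the following is a minimal Python sketch, not part of CBFlib or of the proposal itself. It reads the example loop shown above and follows the variant_of column from a variant back to the null variant. The function names parse_loop and lineage are hypothetical, the tokenizer handles only the simple double-quoted strings used in this example rather than full CIF syntax, and treating "." as the null variant is an assumption based on the usual CIF convention.

```python
import shlex

# The example loop from the versioning proposal, copied as text.
LOOP_TEXT = '''
loop_
_variant.variant
_variant.role
_variant.timestamp
_variant.variant_of
_variant.details
. "raw data" 2007-08-03T23:20:00 . .
indexed "preferred" 2007-08-04T01:17:28 .
"indexed cell and refined beam center"
'''


def parse_loop(text):
    """Parse one simple CIF loop_ into a list of dicts keyed by tag name."""
    tags, values = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_") and not values:
            tags.append(line)
        else:
            # shlex copes with the double-quoted strings in this example;
            # it is NOT a complete CIF tokenizer (assumption for the sketch).
            values.extend(shlex.split(line))
    if not tags or len(values) % len(tags) != 0:
        raise ValueError("value count is not a multiple of the tag count")
    n = len(tags)
    return [dict(zip(tags, values[i:i + n])) for i in range(0, len(values), n)]


def lineage(rows, variant):
    """Follow _variant.variant_of from `variant` back to the null variant '.'."""
    by_name = {row["_variant.variant"]: row for row in rows}
    chain = [variant]
    while variant != ".":
        variant = by_name[variant]["_variant.variant_of"]
        if variant in chain:
            raise ValueError("cycle in _variant.variant_of")
        chain.append(variant)
    return chain


rows = parse_loop(LOOP_TEXT)
for row in rows:
    print(row["_variant.variant"], "|", row["_variant.role"], "|", row["_variant.timestamp"])
print(lineage(rows, "indexed"))
```

Here the "indexed" variant names the null variant as its parent, so its lineage is the two-element chain from "indexed" back to the raw data row. A real reader would use a CIF parser such as CBFlib rather than shlex.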

Participants:

"Justin Anderson" <justin at rayonix dot com> Rayonix, LLC (Formerly Mar USA)
"Georgi Darakev" <darakevg at gmail dot com> Dowling College
"Nikolay Darakev" <darakevn at gmail dot com> Dowling College
"Matthew T. Dougherty" <matthewd at bcm.tmc dot edu> National Center for Macromolecular Imaging
"Kevin Eliceiri" <eliceiri at wisc dot edu> Laboratory for Optical and Computational Instrumentation
"Frances C. Bernstein" <fcb at bernstein-plus-sons dot com> Bernstein + Sons
"Herbert J. Bernstein" <yaya at bernstein-plus-sons dot com> Dowling College
"Andrew J. Howard" <howard at iit dot edu> Illinois Institute of Technology
"Chris Nielsen" <cn at adsc-xray dot com> ADSC
"Mark Pressprich" <Mark.Pressprich at Rigaku dot com> Rigaku
"Curtis Rueden" <ctrueden at wisc dot edu> Laboratory for Optical and Computational Instrumentation
"Nicholas K. Sauter" <nksauter at lbl dot gov> LBNL
"John Skinner" <skinner at bnl dot gov> Brookhaven National Laboratory
"Robert M. Sweet" <sweet at bnl dot gov> Brookhaven National Laboratory

Travel reimbursements

For those for whom a budget for travel reimbursement has been agreed to, please use this travel form

Post-workshop Dinner Participants (face to camera, left to right) Kevin Eliceiri, Curtis Rueden, Matt Dougherty, Robert Sweet, Justin Anderson, Andy Howard, (back to camera, left to right), Nick Sauter, Frances Bernstein. Herbert Bernstein behind the camera
Additional photographs: Andy Howard; Nick Sauter; Chris Nielsen; Matt Dougherty and Chris Nielsen; Chris Rueden.