3.3.3.4. Atom Probe Microscopy

Introduction

For the atom probe tomography use case, the community contributed not only the NXapm application definition. It was also explored how several instances of NXprocess can be used to document the many data processing steps that are typical in atom probe research, where structural features of the material are investigated via reconstructions, i.e., models of the crystal and defect network built by analyzing the collective position data of the atoms. The following definitions summarize the status quo of how NeXus can be used to document these processing steps, improve numerical reproducibility, and assist researchers with documenting the procedural aspects of their data analysis workflows.

Base Classes

The processing steps of ranging and reconstructing are documented as two specializations of NXprocess:

NXapm_ranging

Metadata to ranging definitions made for a dataset in atom probe microscopy.

NXapm_reconstruction

Metadata of a dataset (tomographic reconstruction) in atom probe microscopy.

Spatial and other types of filters, which are frequently used in atom probe to select specific atom positions or portions of the data based on isotopic identity, are modeled as filter base classes. These are defined in an atom-probe-agnostic way to empower reuse:

NXdelocalization

Base class to describe the delocalization of point-like objects on a grid.

NXisocontour

Computational geometry description of isocontouring/phase-fields in Euclidean space.

NXmatch_filter

Base class to filter ions based on their type or other descriptors like hit multiplicity.

NXspatial_filter

Base class to filter based on position. This base class takes advantage of the geometric primitive base classes NXcg_ellipsoid, NXcg_cylinder, and NXcg_hexahedron.

NXcg_ellipsoid, NXcg_cylinder, NXcg_hexahedron

Base classes to describe commonly used geometric primitives (not only) in atom probe. The primitives are used for defining the shape and extent of a region of interest (ROI, see NXroi_process) of material.

NXsubsampling_filter

Base class for a filter that specifies how entries, such as ions, can be filtered via sub-sampling.

Tools and applications in APM

There exist several research software tools in the APM community that deal with handling and analyzing APM data.

One of these is the paraprobe-toolbox, developed by M. Kühbach et al.

The paraprobe-toolbox is an example of an open-source parallelized software for analyzing point cloud data, for assessing meshes in 3D continuum space, and for studying the effects of parameterization on descriptors of micro- and nanoscale structural features (crystal defects) within materials when characterized and studied with atom probe.

There is a set of contributed application definitions describing each computational step in the paraprobe-toolbox. These were added to describe the whole workflow in this particular software, but can also act as a blueprint for how computational steps of other software tools (including commercial ones) could be developed further to benefit from NeXus.

A thorough documentation of the tools was motivated by two needs:

First, users of the software would like to better understand, and be able to study for themselves, which individual parameters and settings exist for each tool and how configuring these affects the analyses quantitatively. This stresses the need to improve documentation.

Second, scientific software like the paraprobe-toolbox implements numerical/algorithmic (computational) workflows in which data from multiple input sources (such as previous analysis results) are processed and carried through more involved analyses in several steps inside the tool. The tool then writes its output to files. This provenance and workflow should be documented.

Individual tools of the paraprobe-toolbox are developed in C/C++ and/or Python. Provenance tracking is useful because it is one component and requirement for making workflows exactly numerically reproducible, which in turn supports the reusability (the "R" of the FAIR principles of data stewardship) of the results.

For the tools of the paraprobe-toolbox, each workflow step is a pair or triple of sub-steps, as sketched in the example below:

1. The creation of a configuration file.
2. The actual analysis using a given Python or C/C++ tool from the toolbox.
3. The optional analysis/visualization of the results based on data in the NeXus/HDF5 files generated by each tool.
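The following is a minimal sketch of such a pair/triple of sub-steps in Python. The file names, the executable name, and the group layout are hypothetical placeholders, not the actual paraprobe-toolbox CLI or file naming scheme; the sketch only illustrates the pattern of writing a config file, running a tool on it, and picking up its results.

    # Hypothetical sketch of one paraprobe-style workflow step; names are
    # placeholders, not the actual paraprobe-toolbox CLI or file layout.
    import subprocess
    import h5py

    config = "Surfacer.Config.SimID.1.nxs"   # hypothetical file name

    # 1. Create a configuration file as a NeXus/HDF5 file.
    with h5py.File(config, "w") as cfg:
        entry = cfg.create_group("entry1")
        entry.attrs["NX_class"] = "NXentry"
        # NeXus convention: store the name of the application definition followed.
        entry["definition"] = "NXapm_paraprobe_surfacer_config"

    # 2. Run the analysis tool on that configuration (hypothetical executable).
    subprocess.run(["paraprobe_surfacer", config], check=True)

    # 3. The results file written by the tool (e.g. "Surfacer.Results.SimID.1.nxs")
    #    can then be opened for visualization; the next sketch shows how its
    #    NeXus structure can be inspected.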

Data and metadata between the tools are exchanged with NeXus/HDF5 files. This means that data inside HDF5 binary containers are named, formatted, and hierarchically structured according to NeXus application definitions.
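Because the files follow NeXus conventions, their content can be explored generically, independently of which tool wrote them. The following is a minimal sketch with h5py; the file name and the entry path are hypothetical.

    import h5py

    def show(name, obj):
        """Print each group/dataset together with its NeXus class, if annotated."""
        nx_class = obj.attrs.get("NX_class", "")
        kind = "group" if isinstance(obj, h5py.Group) else "dataset"
        print(f"/{name}  ({kind}) {nx_class}")

    with h5py.File("Ranger.Results.SimID.1.nxs", "r") as res:   # hypothetical name
        # NeXus convention: the entry records which application definition it follows.
        print("definition:", res["entry1/definition"][()])
        res.visititems(show)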

In a refactoring effort within the FAIRmat project, which is part of the German National Research Data Infrastructure, the tools of the paraprobe-toolbox were modified to read and write data using NeXus application definitions.

For example, the application definition NXapm_paraprobe_surfacer_config specifies how a configuration file for the paraprobe-surfacer tool is expected to be formatted and which parameters it contains, including optionality and cardinality constraints.

Thereby, each config file uses a controlled vocabulary of terms. The config files also store a SHA-256 checksum for each input file, implementing an uninterrupted provenance-tracking chain that documents the computational workflow.
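A minimal sketch of this checksum idea follows. The dataset paths inside the config file are hypothetical (the actual layout is fixed by the respective *_config application definition); only the hashing and recording pattern is the point here.

    import hashlib
    import h5py

    def sha256_of(path):
        """Return the SHA-256 hex digest of a file, read in chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as fp:
            for chunk in iter(lambda: fp.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    recon_file = "my_reconstruction.nxs"   # hypothetical input file names
    ranging_file = "my_ranging.nxs"

    with h5py.File("Ranger.Config.SimID.1.nxs", "w") as cfg:   # hypothetical layout
        for group_name, input_file in (("reconstruction", recon_file),
                                       ("ranging", ranging_file)):
            grp = cfg.create_group(f"entry1/{group_name}")
            grp["path"] = input_file
            grp["checksum"] = sha256_of(input_file)   # records the exact input state
            grp["algorithm"] = "sha256"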

As an example, a user may first range their reconstruction and then compute spatial correlation functions. The config file for the ranging tool stores the names of the files which hold the reconstructed ion positions and the ranging definitions. The ranging tool generates a results file with the labels of each molecular ion. This results file is formatted according to the tool-specific results application definition. The generated results file and the reconstruction are imported by the spatial statistics tool, which again keeps track of all files and reports its results in a spatial statistics results file.

This design makes it possible to rigorously trace which numerical results were achieved with which specific inputs and settings using specifically versioned tools. Notably, this includes Y-junctions in the workflow graph, i.e., points where multiple input sources are combined to generate new results.
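Because every file records the names and checksums of its inputs, the workflow graph can be rebuilt after the fact. The sketch below makes the hypothetical assumption that each recorded input is stored in a dataset named "path" somewhere in the file (the real layout is given by the application definitions); a file with two or more recorded inputs is exactly such a Y-junction.

    import h5py

    def recorded_inputs(nexus_file):
        """Collect all datasets named 'path' as the recorded input files."""
        inputs = []
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset) and name.split("/")[-1] == "path":
                value = obj[()]
                inputs.append(value.decode() if isinstance(value, bytes) else str(value))
        with h5py.File(nexus_file, "r") as fp:
            fp.visititems(visit)
        return inputs

    # Edges point from each results file to the inputs it was computed from.
    # A file with two or more recorded inputs marks a Y-junction of the graph.
    files = ["Ranger.Results.SimID.1.nxs", "Spatstat.Results.SimID.1.nxs"]  # hypothetical
    graph = {f: recorded_inputs(f) for f in files}
    for result, inputs in graph.items():
        print(result, "<-", inputs)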

Defining, documenting, using, and sharing application definitions is a useful and future-proof strategy for software development and data analysis, as it enables automated provenance tracking that works silently in the background.

In summary, the following application definitions were created for the paraprobe-toolbox. These always come in pairs: one for the configuration (input) side and one for the results (output) side. One such pair is proposed for each tool:

Application Definitions

NXapm_paraprobe_ranger_config, NXapm_paraprobe_ranger_results

Configuration and results respectively of the paraprobe-ranger tool. Apply ranging definitions and explore possible molecular ions. Store applied ranging definitions and combinatorial analyses of possible iontypes.

NXapm_paraprobe_surfacer_config, NXapm_paraprobe_surfacer_results

Configuration and results respectively of the paraprobe-surfacer tool. Create a model for the edge of a point cloud via convex hulls, alpha shapes, or alpha-wrappings. Store triangulated surface meshes of models for the edge of a dataset.

NXapm_paraprobe_distancer_config, NXapm_paraprobe_distancer_results

Configuration and results respectively of the paraprobe-distancer tool. Compute and store analytical distances between ions and a set of triangles.

NXapm_paraprobe_tessellator_config, NXapm_paraprobe_tessellator_results

Configuration and results respectively of the paraprobe-tessellator tool. Compute and store Voronoi cells and properties of these for all ions in a dataset.

NXapm_paraprobe_selector_config, NXapm_paraprobe_selector_results

Configuration and results respectively of the paraprobe-selector tool. Define complex spatial regions-of-interest to filter reconstructed datasets. Store which points are inside or on the boundary of complex spatial regions-of-interest.

NXapm_paraprobe_spatstat_config, NXapm_paraprobe_spatstat_results

Configuration and results respectively of the paraprobe-spatstat tool. Compute spatial statistics on the entire or selected regions of the reconstructed dataset.

NXapm_paraprobe_nanochem_config, NXapm_paraprobe_nanochem_results

Configuration and results respectively of the paraprobe-nanochem tool. Compute delocalization and iso-surfaces, analyze 3D objects and composition profiles, and mesh interfaces.

NXapm_paraprobe_clusterer_config, NXapm_paraprobe_clusterer_results

Configuration and results respectively of the paraprobe-clusterer tool. Compute cluster analyses with established machine learning algorithms using CPUs or GPUs.

NXapm_paraprobe_intersector_config, NXapm_paraprobe_intersector_results

Configuration and results respectively of the paraprobe-intersector tool. Analyze volumetric intersections and proximity of 3D objects, discretized as triangulated surface meshes in continuum space, to study the effect of the parameterization of surface extraction algorithms on the resulting shape, spatial arrangement, and colocation of 3D objects via graph-based techniques.

Joint work of the German NFDI consortia NFDI-MatWerk and FAIRmat

Members of the FAIRmat and NFDI-MatWerk consortia of the German National Research Data Infrastructure are working together within the Infrastructure Use Case IUC09 of the NFDI-MatWerk project on examples of how software tools in both consortia can become better documented and more interoperable. Within this project, we have also added the CompositionSpace tool by A. Saxena et al., which has been developed at the Max Planck Institute for Sustainable Materials in Düsseldorf.

NXapm_compositionspace_config, NXapm_compositionspace_results

Configuration and results respectively of a run with the CompositionSpace tool by A. Saxena et al.