In general, data analysis for a small- to medium-sized experiment in modern nuclear or particle physics can roughly be divided into two steps.
First, operations on single (physical) events are performed to transform the raw data coming from the digitizing front-end electronics into physically meaningful quantities, to condense the information, to reduce the data volume, and so on. This typically means performing the same operations repeatedly on large amounts of sometimes heterogeneously structured data packets. The calculated and condensed information is then filled into statistical distributions, such as histograms or x/y plots.
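As an illustration of this first step, the following sketch (written in Python purely for illustration) loops over hypothetical raw event records, converts an ADC value into a calibrated energy and fills a one-dimensional histogram. The record layout, the function names and the calibration constants are invented for the example.

\begin{verbatim}
def adc_to_energy(adc_value, gain=0.5, offset=0.0):
    """Convert a raw ADC channel into a (hypothetical) calibrated energy."""
    return gain * adc_value + offset

def first_step(raw_events, n_bins=100, e_max=1000.0):
    """Transform raw events into a condensed statistical distribution."""
    histogram = [0] * n_bins
    bin_width = e_max / n_bins
    for event in raw_events:                  # same operation on every event
        energy = adc_to_energy(event["adc"])  # raw data -> physical quantity
        if 0.0 <= energy < e_max:             # data reduction: drop the rest
            histogram[int(energy / bin_width)] += 1
    return histogram
\end{verbatim}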
In a second step, these statistical distributions are viewed on a graphics display, compared with pre-calculated distributions, integrated, projected, ''fitted'' with analytical functions, plotted, and so on.
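The second step can be illustrated, under the same assumptions as above, by fitting a Gaussian peak to the histogram produced in the first step; the binning and the start values of the fit are again purely illustrative.

\begin{verbatim}
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, amplitude, mean, sigma):
    return amplitude * np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def fit_peak(histogram, e_max=1000.0):
    counts = np.asarray(histogram, dtype=float)
    centers = (np.arange(len(counts)) + 0.5) * e_max / len(counts)
    p0 = [counts.max(), centers[counts.argmax()], 10.0]  # crude start values
    popt, pcov = curve_fit(gauss, centers, counts, p0=p0)
    return popt          # fitted amplitude, mean and width of the peak
\end{verbatim}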
Several program packages are already available which, more or less, satisfy the needs of the second step. First-level data access, however, is normally done in an experiment-dependent, hard-coded way. An executable program usually ''knows'' about the internal structure of the experimental data and performs fixed calculation steps to produce histograms of primary data as well as of derived physical quantities (e.g. tracks, angles and momenta).
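The following caricature of such a hard-coded program assumes an invented, fixed event layout of two 32-bit ADC words followed by a 64-bit timestamp; the essential point is that this layout is compiled into the program itself.

\begin{verbatim}
import struct

EVENT_FORMAT = "<IIq"        # fixed layout, compiled into the program
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

def analyse_file(path):
    with open(path, "rb") as f:
        while True:
            record = f.read(EVENT_SIZE)
            if len(record) < EVENT_SIZE:
                break
            adc1, adc2, timestamp = struct.unpack(EVENT_FORMAT, record)
            # fixed calculation steps follow here; any change to the event
            # structure or to the analysis requires editing and recompiling
\end{verbatim}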
This traditional approach is sometimes necessary and natural; consider, e.g., programs for event displays or ''number-crunching'' algorithms. It also provides, in general, the fastest way of processing the data. However, access to data that was not planned for in advance (when the program was written) is usually cumbersome, if not impossible, and almost always requires a local expert. To add or change functionality, the program has to be stopped, edited, recompiled, tested, possibly changed again, and so on until the desired changes are built in. For nontrivial programs, this may result in long turnaround times, making changes during ''online'' analysis impossible. Code for the physics analysis sometimes ''lives'' close to the code which implements the technical part of data access. As a consequence, side effects of user-written code may introduce subtle bugs into the analysis, which can become hard to track down. Finally, such a program is inherently non-portable to a different experiment with other requirements and is normally closely tied to the experimental setup.
On the other hand, first-level data analysis often needs the flexibility of a programming language. The calculations to be performed depend strongly on the specific experimental needs and may be of considerable (and increasing) complexity.
In general, arbitrary arithmetic operations must be performed, iterations over the data (loops) must be possible, and logical decisions (if/else) must be made. To give the analysis some structure, the calculation should be broken into smaller units (subroutines and functions). Furthermore, calls for histogramming and plotting of both plain input data and calculated quantities must be available. ''Cuts'', i.e. selection criteria applied to the data, are typical tools for preparing physically meaningful information.
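A first-level analysis routine of this kind might, under the assumptions of the earlier sketches, look as follows; the histogramming call hist_fill() stands for whatever facility the environment provides and is not a real API, and the event fields and cut boundaries are invented.

\begin{verbatim}
import math

def momentum(px, py, pz):                 # subroutine: derived quantity
    return math.sqrt(px**2 + py**2 + pz**2)

def in_angular_cut(theta, theta_min=0.2, theta_max=2.9):
    """A ''cut'': a selection criterion applied to each event."""
    return theta_min < theta < theta_max

def analyse(events, hist_fill):
    for ev in events:                     # iteration over the data
        p = momentum(ev["px"], ev["py"], ev["pz"])   # arithmetic
        if in_angular_cut(ev["theta"]):   # logic decision
            hist_fill("p_accepted", p)    # histogramming call
        else:
            hist_fill("p_rejected", p)
\end{verbatim}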
When dealing with the physics analysis, the user should not need to worry about how to access the data. Instead, he or she should be provided with a data interface that is as simple and convenient as possible. If feasible, the user should be aided by mnemonic names for the data instead of abstract bunches of bits and bytes, and should be completely shielded from the inner details of the event formats. This would also protect the data from being accidentally changed by the user.
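A possible shape of such an interface is sketched below: the byte-level event layout is known only to the interface, while the user sees mnemonic, read-only attributes such as energy and timestamp. All names, the layout and the calibration are invented for the example.

\begin{verbatim}
import struct

class Event:
    _FORMAT = "<IIq"                       # layout known only to the interface

    def __init__(self, raw_bytes):
        self._adc1, self._adc2, self._timestamp = \
            struct.unpack(self._FORMAT, raw_bytes)

    @property
    def energy(self):                      # mnemonic name instead of raw bits
        return 0.5 * self._adc1            # hypothetical calibration

    @property
    def timestamp(self):
        return self._timestamp
    # the properties have no setters, so the user cannot accidentally
    # modify the underlying event data through this interface
\end{verbatim}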
A possible solution to this problem is a dedicated, interpretative programming environment. It should be restricted to the physicist's experimental needs while retaining the useful features of traditional programming languages. Such an environment should provide a well-defined and convenient interface to the experimental data. Analysis routines should be changeable ''on the fly'', so that the user's ideas are reflected instantaneously. Modifying the sources should require no expert knowledge.
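One way to approximate such ''on the fly'' changes, sketched here with Python's standard importlib.reload mechanism, is to keep the user's physics code in a separate module that can be re-read without restarting the program; the module name user_analysis and its process() function are hypothetical.

\begin{verbatim}
import importlib
import user_analysis            # hypothetical module with the user's code

def event_loop(events):
    for ev in events:
        user_analysis.process(ev)          # user-defined physics code

def reload_user_code():
    """Pick up edits to user_analysis.py without restarting the program."""
    importlib.reload(user_analysis)
\end{verbatim}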
Unfortunately, this is not (yet) possible within the framework of existing programming languages, but requires a ''new'', interpretative language with an interface to the experimental data.