Program LND for log-normal decomposition of particle size distributions
This program decomposes a frequency particle size distribution into a sum of log-normal functions (components). It does this by scanning the particle size distribution data set with a window of variable width, searching for log-normal components of the distribution at every possible position of the window.
Once the components are found and their coefficients are determined, the user can manually select the best set of log-normal components from the program output file. A companion program, LNDVIEW, makes it easy to select a set of components, as well as to view and print a graph of the log-normal components of a size distribution.
This program is written in C++ and is intended to be used in the Windows environment. A screen resolution of 1024 x 768, or better, is expected.
The shipment package contains particle size distribution data files in the Data directory, so that you can run and experiment with the LND program (in the LND directory) right away, in both the batch and single-file modes of operation. These data files may include:
The package also contains the Results directory with LND files (results files) corresponding to these data files.
If you received a ZIPped package, the default directory structure is created automatically and the files are copied to the correct folders when you unzip the package (provided that the default directory structure encoded in the ZIPped package does not conflict with the directory structure existing on your computer). Please note that this directory structure is set up merely for convenience and is not a prerequisite for running the LND program:
To run the LND program in the batch mode, locate the "Run" group in the main form of the program and
This section is intended only for a brief discussion of the user interface itself. Please refer to the remaining part of the document for terms that are not explained in this section.
This parameter group defines the data scan window in terms of the number of data points. Each data point consists of a particle diameter D and the corresponding value of the frequency size distribution FD. The two parameters of this group define the range of start widths of the scan window that the LND algorithm uses to find the log-normal components of the size distribution:
This parameter group enables one to select the processing mode: batch or single file.
In the batch mode, results are saved automatically (if possible), and the names of the results files (LND files) are generated by the program. Two options are available in this mode: "Use all files" and "Select files". Select an option by clicking its radio button.
The single-file mode
If log-normal components cannot be generated (bad data file, no components found, or any other reason), the relevant data files are listed in the error log window along with the potential error source in the following format:
input file name > error message
This info group displays:
The program decomposes a frequency particle size distribution FD(D), where D is the particle size, into a sum of log-normal components. The distribution is represented by a data array [Di, FD(Di)], i = 1, ..., N. Each component is represented in the log-log scale by a parabola:
$$\log F_D(D) = b_0 + b_1 \log D + b_2 (\log D)^2$$
The coefficients b0, b1, and b2 of each component are determined through an iterative application of the least-squares procedure. A detailed description of the fitting algorithm can be found in Jonasz and Fournier 1996.
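The parabolic representation of a component can be illustrated with a short Python sketch. It assumes base-10 logarithms and fits b0, b1, and b2 to the points of one window with a single ordinary least-squares solve of the normal equations. This is not the iterative procedure of Jonasz and Fournier 1996, and the function names `component` and `fit_parabola` are hypothetical.

```python
import math

def component(d, b0, b1, b2):
    """FD of one log-normal component: log10 FD = b0 + b1*x + b2*x**2, x = log10 D."""
    x = math.log10(d)
    return 10 ** (b0 + b1 * x + b2 * x * x)

def fit_parabola(ds, fds):
    """Ordinary least-squares fit of log10 FD against log10 D by a quadratic.

    Returns (b0, b1, b2). At least 3 points are needed to determine a parabola.
    """
    if len(ds) < 3:
        raise ValueError("at least 3 points are needed to fit a parabola")
    xs = [math.log10(d) for d in ds]
    ys = [math.log10(f) for f in fds]
    # Normal equations: sums of x**k (k = 0..4) and of y * x**k (k = 0..2)
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))
```

For example, applying `fit_parabola` to values generated by `component(d, 1.0, 0.5, -2.0)` at five or more diameters recovers the coefficients to within rounding error.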
The frequency size distribution, FD(D) [cm⁻³ µm⁻¹], is defined as follows:
$$F_D(D) = \frac{dN}{dD}$$
where dN is the number concentration of particles with sizes in the range D to D + dD. FD(D) is thus the derivative of the cumulative size distribution [cm⁻³] with respect to D. As a function of particle size, it gives the number concentration of particles (i.e., the number of particles per unit volume) per unit size range.
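As a sketch of this definition, FD can be estimated from a tabulated cumulative distribution by finite differences. The sketch assumes that the cumulative distribution counts particles smaller than D, so that it increases with D. It also places each FD value at the geometric midpoint of its size interval. Both choices, and the function name, are illustrative assumptions rather than LND behavior.

```python
def frequency_from_cumulative(ds, cum):
    """Approximate FD = dN/dD by finite differences of a cumulative distribution.

    Assumes cum[i] [cm^-3] counts particles smaller than ds[i] (increasing with D).
    Returns (geometric-midpoint diameters, FD values [cm^-3 um^-1]).
    """
    mids, fd = [], []
    for i in range(len(ds) - 1):
        dD = ds[i + 1] - ds[i]
        dN = cum[i + 1] - cum[i]
        mids.append((ds[i] * ds[i + 1]) ** 0.5)  # geometric midpoint suits log-spaced grids
        fd.append(dN / dD)
    return mids, fd
```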
In short, the fit is performed in two steps as follows:
Apart from the average fit, a set of log-normal components is thus produced for each initial scan window width. This width is represented by the minpts parameter in the program code. A log-normal component is considered to exist if the coefficient b2 of the second power of log D is negative (i.e., the parabola that represents this component in the log-log scale has a maximum). Beginning with version 2.02, this condition applies only to the fits of log-normal components, not to the average fit, in which b2 can assume any value.
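The existence test can be written down directly. The sketch below also computes the diameter at which the parabola reaches its maximum, log D* = -b1/(2 b2), which follows from setting the derivative of the parabola to zero. Base-10 logarithms and the function names are assumptions. Following the text, the test is meant for component fits only and not for the average fit.

```python
def is_component(b2):
    """A component fit counts as log-normal only if its log-log parabola opens downward."""
    return b2 < 0

def mode_diameter(b1, b2):
    """Diameter at the parabola maximum: log10 D* = -b1 / (2 * b2)."""
    if not is_component(b2):
        raise ValueError("b2 must be negative for a maximum to exist")
    return 10 ** (-b1 / (2 * b2))
```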
The initial start width of the scan window is set in the initial start width edit box. The default initial start width is 4 data points.
For each value of the start width of the scan window, a series of scans is performed. The first scan uses the start window width, and each subsequent scan uses a window that is one point wider than the window used in the previous scan of this series. The maximum window width equals the number of data points. Such a process is termed a scan series. A scan series may produce components whose "characteristic width" ranges from that corresponding to the start width of the scan window for that series to that corresponding to the maximum window width. For example, a scan series for a particle size distribution with 20 data points and a start window width of 5 data points would use windows 5 to 20 points wide. As explained below, the particle size distribution data remaining at the end of a scan series may be quite different from the original input data. After a scan series is finished, the start window width is incremented and another scan series begins on the original input data, with window widths again ranging from the new start width up to the number of data points. In the last scan series, the start window width equals the final start width, i.e., the initial start width plus 8 points.
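The sequence of window widths in the scan series described above can be enumerated as follows. This is a hypothetical Python sketch, not LND code. It lists only the widths: each window of width w would additionally be slid over its n - w + 1 possible positions. The default `extra=8` mirrors the rule that the final start width is the initial start width plus 8 points, and the 3-point lower limit reflects the minimum needed to fit a parabola.

```python
def scan_schedule(n_points, initial_start=4, extra=8):
    """Enumerate (start_width, [window widths]) for every scan series.

    Start widths run from initial_start to initial_start + extra; within a
    series the window grows by one point per scan up to the data point count.
    """
    if initial_start < 3:
        raise ValueError("a parabola needs a window of at least 3 points")
    schedule = []
    for start in range(initial_start, initial_start + extra + 1):
        widths = list(range(start, n_points + 1))
        if widths:  # skip start widths larger than the data set
            schedule.append((start, widths))
    return schedule
```

For 20 data points and an initial start width of 5, the first series uses widths 5 to 20 and the last series starts at width 13.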
Several components may be found during a scan series. The program selects the component that produces the lowest approximation error of the data within the window in which that component was discovered. The position and width of that window may differ from component to component. Once a component is found, it is subtracted from the current version of the input data (i.e., the particle size distribution). The difference becomes the new version of the size distribution, which is presumed to contain other log-normal components whose presence was "obscured" by the dominant component just identified. This new version of the size distribution is then processed in the same way as the previous version. The new version of the particle size distribution includes only those difference data points whose FD value exceeds the old data point value multiplied by a factor named fract in the program code (equal to 1E-10).
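The subtraction step and the fract rule can be sketched as below. The component is passed in as a callable. A difference point is kept only if it exceeds fract times the previous value at the same diameter. This is our reading of the rule stated above, not code taken from LND. Dropping non-positive differences also keeps the remaining data usable in the log-log scale.

```python
def subtract_component(ds, fds, comp, fract=1e-10):
    """Subtract a found component and keep only sufficiently positive differences.

    comp(d) returns the component's FD at diameter d. A difference point is kept
    only if it exceeds fract times the previous FD value at that diameter.
    """
    new_ds, new_fds = [], []
    for d, f in zip(ds, fds):
        diff = f - comp(d)
        if diff > fract * f:
            new_ds.append(d)
            new_fds.append(diff)
    return new_ds, new_fds
```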
The initial start width of the scan window determines the minimum characteristic width of the log-normal components that can be detected. For example, if the size distribution contains a component with a size scale of 4 data points on your size grid, this component will most likely go undetected in scan series whose start window width exceeds 4. The absolute minimum meaningful scan window width is 3 points, i.e., the number of points required to fit a parabola (a log-normal function in the log-log scale) to the data. Too small an initial start width may result in fitting the "noise", and too large a value may miss some narrow components.
For all values of the initial scan window width in the range defined in the "Scan window parameters" group, one obtains a set of scan series. Each scan series is characterized by the error with which the sum of the components found during that series approximates the particle size distribution.
One can now review the sets of log-normal components discovered by the program and select one of them. The selection can be performed manually or by a simple utility program that processes the results files generated by LND. If the user's purpose is merely to approximate the particle size data, the set of components that provides the lowest approximation error would most likely be selected. However, if the purpose is to identify component particle populations, then physical insight may influence the selection of the best set of log-normal components of the particle size distribution.
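A utility that picks the lowest-error set could be as simple as the sketch below. The `ScanSeries` container is hypothetical. It stands in for whatever would be parsed from the LND results files, whose layout is not reproduced in this text. Selecting by minimum error suits pure approximation. Identifying particle populations may call for a different criterion, as noted above.

```python
from dataclasses import dataclass, field

@dataclass
class ScanSeries:
    # Hypothetical container for one scan series read from a results file
    start_width: int
    error: float
    components: list = field(default_factory=list)  # (b0, b1, b2) tuples

def lowest_error_series(series):
    """Return the scan series whose sum of components best approximates the data."""
    return min(series, key=lambda s: s.error)
```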
It should be stressed that the scan window width settings may affect the decomposition of a size distribution into log-normal components. The default values of these settings result from a trial-and-error analysis of size distributions from a database that we compiled. That database contained mostly size distributions specified on a size grid that is equidistant in the log-size scale. This size grid of the particle diameter, D, can be defined as follows:
$$D_i = D_0 \, a^i, \quad i = 0, 1, 2, \ldots$$
where the constant a is on the order of $2^{1/3}$. In this case, we have, for example:
D0 = 2 micrometers (µm)
This particle size scale has been frequently used in environmental sciences, such as oceanography (for example, Sheldon et al. 1972), where the particle size range typically spans several decades.
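The log-equidistant grid can be generated as follows. The sketch assumes D_i = D0 a^i with D0 = 2 µm and a = 2^(1/3), and the function name is illustrative. With a = 2^(1/3), the diameter doubles every 3 grid points, so the particle volume doubles from one point to the next.

```python
def log_grid(d0=2.0, a=2 ** (1 / 3), n=10):
    """Diameters [um] equidistant in log scale: D_i = d0 * a**i for i = 0..n-1."""
    return [d0 * a ** i for i in range(n)]
```

`log_grid(n=4)` returns approximately [2.0, 2.52, 3.17, 4.0].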
Window widths are measured in points rather than in length units (for example, micrometers) for simplicity. This choice is generally justified by the conventions used to present marine size distributions. If your data are denser than the size grid just discussed implies, you need to select a minimum window width that reflects the spacing of your diameter data. For example, if the size increment is 0.1 µm, 12 data points span a size range of 1.2 µm. Thus, if the expected minimum component width is about 1 µm, then the initial start window width in data points should be about 8 instead of 4 (the default value).
The LND program performs a statistical test (the Fisher test; see, for example, Hudson 1964) of the significance of the various components within each scan group. The Fisher test examines the equivalence of variances. Here the variances are represented by the approximation errors, expressed by the sum of squares of residuals:
where N is the data count and FDapprox,i is the sum of the log-normal components at the i-th data point. This is an approximation of the "correct" expression for the error:
that is implied by the way the fit is calculated (by using the log-log transform of the original data set). Approximation (3) is valid for small errors, which are the ones of interest: if the error is large, the fit will not be considered anyway. The approximation is used for calculation speed. It also avoids a potential singularity when taking the logarithm of FDapprox, should the latter evaluate to machine zero.
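The two error measures discussed above can be compared numerically. The equations are not reproduced in this text, so the sketch makes two assumptions about their exact forms. It takes the mean over N points of the squared relative residual (FD_i - FDapprox,i)/FD_i as the approximation. It takes the mean squared difference of natural logarithms as the "correct" log-scale error. For small residuals the two agree, since ln(FD/FDapprox) ≈ (FD - FDapprox)/FD, and only the log form fails when FDapprox is zero.

```python
import math

def relative_error(fd, fd_approx):
    """Mean squared relative residual (assumed form); needs no log of fd_approx."""
    n = len(fd)
    return sum(((f - a) / f) ** 2 for f, a in zip(fd, fd_approx)) / n

def log_error(fd, fd_approx):
    """Mean squared natural-log residual (assumed form); fails if fd_approx <= 0."""
    n = len(fd)
    return sum((math.log(f) - math.log(a)) ** 2 for f, a in zip(fd, fd_approx)) / n
```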
For a scan series, the significance of the components relative to one another and to the average fit is determined as follows. First, the components are ordered according to their ability to reduce the approximation error:
The significant components are chosen by successive applications of the Fisher test to approximations of the log-transformed data by cumulative subsets of the components within a scan series. The test is applied first to the approximation by component 1 vs. the approximation by components 1 and 2. It is then applied to the approximation by components 1 and 2 vs. the approximation by components 1, 2, and 3, and so on. Finally, the significance of the best selection of components is tested against the average fit, again by using the Fisher test. These tests are performed on the data and components expressed in the log-log scale.
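The successive tests can be sketched as follows. This is a hypothetical illustration, not LND code. It compares the residual variances of consecutive cumulative subsets, computed as the sum of squared log residuals divided by the degrees of freedom, assuming 3 coefficients per component. It takes the critical F value from a caller-supplied function, because Python's standard library provides no F distribution. The final test against the average fit is not shown.

```python
def significant_count(ssr, n_points, f_crit):
    """Number of leading components kept by successive variance-ratio (F) tests.

    ssr[k] is the sum of squared log residuals of the fit by components 1..k+1,
    already ordered by decreasing ability to reduce the error. f_crit(df1, df2)
    is a caller-supplied (hypothetical) function returning a critical F value.
    """
    if not ssr:
        return 0
    kept = 1
    for k in range(1, len(ssr)):
        df_prev = n_points - 3 * k        # k components, 3 coefficients each
        df_next = n_points - 3 * (k + 1)
        if df_next <= 0:
            break                          # too few points left for the test
        var_prev = ssr[k - 1] / df_prev
        var_next = ssr[k] / df_next
        if var_next > 0:
            significant = var_prev / var_next > f_crit(df_prev, df_next)
        else:
            significant = var_prev > 0
        if not significant:
            break
        kept = k + 1
    return kept
```

In practice, `f_crit` could wrap an F-distribution quantile function such as `scipy.stats.f.ppf(0.95, df1, df2)`.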
The significance of a set of components for a scan series is not evaluated against the significance of components for another scan series.
Fig. 1 shows a sample log-normal decomposition of a frequency particle size distribution.
Fig. 1. An example of the decomposition of a frequency particle size distribution into a set of log-normal components for data from file Kraatl8601.psd (the Northwestern Atlantic, data kindly provided by Dr. K. Kranck and Dr. T. Milligan, Bedford Institute of Oceanography, Canada).
The LND program works in the single-file and batch modes. In the single-file mode, the user supplies the name of the PSD data file, selects the diameter and PSD columns in that file and the manner in which the input data are read, and supplies the name of the results file. In the batch mode, the user only selects the files to be processed or elects to process all files in a directory, and the program decides how to obtain data from these files: empty lines and lines containing text that cannot be converted to numbers are ignored. This is also the default setting for the single-file mode.
The input data must be available as space-, tab-, or comma-delimited text data files (PSD files). The file extensions are "psd" and "txt" in the single-file mode and "psd" in the batch mode. Each PSD file is expected to contain frequency particle size distribution data. The program does not check whether the data provided in a PSD file actually represent such a distribution.
The structure of a PSD file is as follows:
where D is the particle size and FD is the frequency particle size distribution
Several size distributions can be included in the same data file if they share the same particle diameter grid (the 1st column of the file). The units are irrelevant to LND itself. However, for proper viewing with the companion spreadsheet program LNDVIEW, the units should be as follows: particle size, D, in µm; frequency particle size distribution, FD, in cm⁻³ µm⁻¹. Note that 1 µm = 10⁻⁶ m.
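Reading a PSD file according to the rules above might look like the sketch below. Splitting on commas and whitespace and skipping lines that cannot be converted to numbers follow the text. Skipping lines with too few columns and the 1-based `fd_column` argument are our assumptions, as is the function name.

```python
import re

def read_psd(text, fd_column=2):
    """Read (D, FD) pairs from space-, tab-, or comma-delimited PSD text.

    Column 1 holds the diameter D; fd_column (1-based) selects the FD column.
    Empty lines and lines with non-numeric fields are ignored.
    """
    ds, fds = [], []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # header or other non-numeric line
        if len(values) >= fd_column:  # also skips empty lines
            ds.append(values[0])
            fds.append(values[fd_column - 1])
    return ds, fds
```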
A sample PSD file (JONATL7821.PSD):
[beginning of file]
The LND results can be saved to a text file. The structure of such a file is as follows:
In the single-file mode, the name of a results file is set by the user, although the program proposes a default file name according to a rule applicable to the batch mode. In the batch mode, the names of the results files are generated by the program as follows:
For example, if the original file name is c:\lnd\data\jonbal7821.psd and the FD data are listed in the 2nd column of that file, then the LND results are stored in c:\lnd\data\jonbal7821_2.lnd
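The naming rule in the example above can be reproduced as follows. The rule takes the original name without its extension, adds an underscore and the FD column number, and uses the extension "lnd". It is inferred from the single example given, so treat it as an assumption. `ntpath` is used so that Windows paths are handled on any platform.

```python
import ntpath

def results_name(psd_path, fd_column):
    """Batch-mode results file name inferred from the example: <name>_<column>.lnd."""
    root, _ext = ntpath.splitext(psd_path)
    return f"{root}_{fd_column}.lnd"
```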
Hudson D. 1964. Statistics for physicists. Geneva.
Jonasz M., Fournier G. 1996. Approximation of the size distribution of marine particles by a sum of log-normal functions. Limnol. Oceanogr. 41: 744-754.
Jonasz M., Fournier G. F. 1999. Approximation of the size distribution of marine particles by a sum of log-normal functions (Errata: Corrections and additional results). Limnol. Oceanogr. 44: 1358.
Sheldon R. W., Prakash A., Sutcliffe W. H. Jr. 1972. The size distribution of particles in the ocean. Limnol. Oceanogr. 17: 327-340.
Please direct your comments and questions regarding this software, as well as questions on other MJC Optical Technology software and services to:
The information contained in this document is believed to be accurate. However, neither the author nor MJC Optical Technology guarantees the accuracy or completeness of this information, and neither the author nor MJC Optical Technology assumes responsibility for any omissions or errors, or for damages that may result from using or misusing this information.
Copyright 2000-2017 MJC Optical Technology. All rights reserved.