
Program LND for log-normal decomposition of particle size distributions

Miroslaw Jonasz

Introduction

This program decomposes a frequency particle size distribution into a sum of log-normal functions (components). It does this by scanning a particle size distribution data set with a window of variable width looking for log-normal components of the distribution at each of the possible locations of the window.

Once the components are found and their coefficients are determined, the user can select the best set of coefficients of log-normal components manually from the program output file. A companion program LNDVIEW makes it easy to select a set of components, as well as view and print a graph of log-normal components for a size distribution.

The original version of this program was used in research described by Jonasz and Fournier (1996). See also Jonasz and Fournier (1999) for an erratum and additional results.

This program is intended to be used in the Windows environment. A screen resolution of 1024 x 768 is expected.

The LND program can be modified to allow decomposition of a size distribution or similar data set into components other than log-normal. Please contact MJC if you need such modification(s).

Quick start

The shipment disk/package contains particle size distribution data files in the Data directory so that you can run and experiment with the LND program (in the LND directory) right away, in both the batch and single-file modes of operation. These data files may include:

  • Brabal5178mat.psd
    data from the Baltic Sea, data kindly provided by Dr. K. Bradtke, Gdansk University, Poland
  • Jonbal7821.psd
    data from the Baltic Sea, obtained by Dr. M. Jonasz
  • Kraatl8601.psd
    data from the Northwestern Atlantic, kindly provided by Dr. K. Kranck and Dr. T. Milligan, Bedford Institute of Oceanography, Canada

The package also contains the Results directory with LND files (results files) corresponding to these data files.

If you received a ZIPped package, the default directory structure is created automatically and files are copied to the correct folders on unzipping the package (provided that the default directory structure encoded in the ZIPped package does not conflict with the directory structure existing on your computer). Please note that this directory structure is set up merely for convenience and is not a prerequisite for running the LND program:

LND directory
Contains the LND program (and other companion programs that you may have ordered)

Help
Contains the LndHelp.htm file and related files. The LndHelp file contains hyperlinked help information that can be opened and viewed with any Internet browser

Data sub-directory
Contains sample data files

Results sub-directory
Contains sample results files

To run the LND program in the batch mode, locate the "Run" group in the main form of the program and

  • click either "Use all files" (which causes the program to process all PSD files in the directory that you select via an Open dialog) or "Select files" (processes selected PSD files only)
  • click the "Batch" button

To run the LND program in the single file mode, click the "Single file" button in the "Run" group and select a PSD file.


User interface

This section is intended only for a brief discussion of the user interface itself. Please refer to the remaining part of the document for terms that are not explained in this section.

"Scan window parameters" group

This parameter group defines the data scan window in terms of the number of data points, where each point is defined by the particle diameter D and FD (frequency size distribution). The two parameters of this group define the start width range of the scan window that is used by the LND algorithm to find log-normal components of the size distribution.

"Initial start width" edit
This editable parameter specifies the initial start window width that is used when scanning the data set for log-normal components.

"Final start width"
This parameter is not editable. It equals the initial start window width plus an arbitrary value selected during the algorithm development. The final start width marks the end of the range of the start width of the scan window.

"View" group

"Input" check box
When this checkbox is checked, the program displays both the original input data set and a set of their logarithms, each in a separate window.

"Results" check box
When this checkbox is checked, the program displays the LND results, i.e. the parameters of the various sets of log-normal components, as well as some additional data. The results are displayed in a separate window. The results can be copied to a Windows-based program by clicking the "Copy" button of that window.

"Run" group

This parameter group enables one to select the processing mode: batch or single file.

Batch mode

In the batch mode, accessible by clicking the "Batch" button, the program obtains the input data from the PSD files subject to the following limitations:

  • The diameter column must be the first column in the file
  • Only those rows that contain purely numerical data are included in the input data set

Results are saved automatically (if possible) in this mode. The names of the results files (LND files) are auto-generated by the program. In the batch mode, two options are available: "Use all files" and "Select files". These options are selected by clicking the "Use all files" or "Select files" radio buttons respectively.

"Use all files"
If this radio button is checked, then all files in a directory are selected for processing. The directory in question is that of a sample file that needs to be selected via an Open dialog, which is displayed on clicking the "Batch" button of this group.

"Select files"
If this radio button is checked, only selected files in a directory are processed. The files are selected via an Open dialog that is displayed on clicking the "Batch" button of this group.

The single-file mode

In the "Single file" mode, a single file is opened via an Open dialog that displays when the "Get data" button is clicked in the "Get PSD data" form. The latter form opens when the "Single file" button in the "Run" group is clicked. Results can be viewed, copied, and saved to an LND file. The program provides a suggestion for the results file name in a standard Windows file save dialog.

Error log

If log-normal components cannot be generated (bad data file, no components found, or any other reason), the relevant data files are listed in the error log window along with the potential error source in the following format:

input file name > error message

No LND file is created for an input data file that caused errors.

"Processing file" info group

This info group displays:

  • the name of the currently-processed file
  • the initial and current scan window widths
  • the current scan window position
  • the number of components found during the current scan


Miscellaneous items

"Help" button
Clicking this button displays information on how to open this help document.

"About" button
Clicking this button displays the program info.


Log-normal decomposition (LND)

The LND algorithm

The program decomposes a frequency particle size distribution FD(D) represented by a data array [D, FD(D)], into a sum of log-normal components, each represented by a parabola:

 log(FD) = b0 + b1 log(D) + b2 [log(D)]^2  (1)

The coefficients b0, b1, and b2 of each component are determined through an iterative application of the least-squares procedure. A detailed description of the fitting algorithm can be found in Jonasz and Fournier (1996).
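
As an illustration of equation (1), the following minimal Python sketch fits a parabola to a log-log transformed data set with a single pass of ordinary least squares. It is not the program's iterative procedure: the function name fit_parabola, the synthetic coefficients, and the use of base-10 logarithms are assumptions made for this example only.

```python
import math

def fit_parabola(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    # Power sums S[k] = sum(x**k) and moments T[k] = sum(y * x**k)
    S = [sum(xi ** k for xi in x) for k in range(5)]
    T = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented normal equations: A[r][c] = S[r + c], right-hand side T[r]
    A = [[S[r + c] for c in range(3)] + [T[r]] for r in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))

# Synthetic, noise-free log-normal component (hypothetical coefficients)
true_b0, true_b1, true_b2 = 3.0, 1.0, -2.0
D = [2.0 * 2 ** (k / 3) for k in range(12)]  # diameters, µm
logD = [math.log10(d) for d in D]
FD = [10 ** (true_b0 + true_b1 * x + true_b2 * x * x) for x in logD]

# Fit the parabola of equation (1) in the log-log scale
logFD = [math.log10(f) for f in FD]
b0, b1, b2 = fit_parabola(logD, logFD)
```

For the noise-free synthetic data above, the fit recovers the coefficients used to generate the data to within rounding, with b2 negative as required of a log-normal component.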

The frequency size distribution, FD(D) [cm^-3 µm^-1], is defined as follows:

 dN = FD(D) dD  (2)

where dN is the number concentration of particles with "diameters" in the range from D to D + dD. FD(D) is the derivative of the cumulative size distribution [cm^-3] and describes, as a function of the particle size, the number concentration of particles per unit size range.
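
For binned data, definition (2) can be approximated by dividing the number concentration in each size bin by the bin width. The sketch below is a hypothetical illustration: the function name, bin edges, and concentrations are assumptions, not data or code from the LND package.

```python
def frequency_distribution(edges, concentrations):
    """Approximate FD in each bin as dN / dD (number concentration / bin width)."""
    return [
        n / (hi - lo)
        for n, lo, hi in zip(concentrations, edges[:-1], edges[1:])
    ]

edges = [2.0, 3.0, 5.0, 9.0]           # bin edges, µm (hypothetical)
concentrations = [400.0, 200.0, 40.0]  # particles per cm^3 in each bin (hypothetical)

FD = frequency_distribution(edges, concentrations)  # cm^-3 µm^-1

# Integrating FD over the bins gives back the total number concentration
total = sum(fd * (hi - lo) for fd, lo, hi in zip(FD, edges[:-1], edges[1:]))
```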

In short, the fit is performed in two steps as follows:

  1. AVERAGE FIT
    All data are used to calculate the average log-normal fit if it exists
  2. SCAN FIT
    Sets of components are searched for by sequentially scanning the data set with a window of an increasing width.

Apart from the average fit, a set of log-normal components is thus produced for each start scan window width. This width is represented by the minpts parameter in the program code. A log-normal component is considered to exist if the coefficient b2 of the second power of log(D) is negative (the parabola has a maximum). Beginning with version 2.02, this condition applies only to the fits of log-normal components, not to the average fit, where b2 can assume any value.
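
The existence condition can be sketched in a few lines of Python. The parabola of equation (1) has its maximum at log(D) = -b1 / (2 b2) when b2 is negative. The function names below are hypothetical, and base-10 logarithms are assumed.

```python
def is_component(b2):
    """A fitted parabola represents a log-normal component only if b2 < 0."""
    return b2 < 0.0

def peak_diameter(b1, b2):
    """Diameter at which the component peaks (base-10 logarithms assumed)."""
    if not is_component(b2):
        raise ValueError("b2 must be negative for the parabola to have a maximum")
    return 10 ** (-b1 / (2.0 * b2))

# Hypothetical coefficients: b1 = 1, b2 = -2 gives a peak at log(D) = 0.25
peak = peak_diameter(1.0, -2.0)
```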

The minimum width of a detectable log-normal component, i.e. the initial start width of the scan window, is set in the "Initial start width" edit. The default initial start width is 4 data points, and the final start width is always set to the initial start width plus 8.

For each value of the minimum scan window width, a set of components may be discovered by the program. Such a process is termed a scan series. A scan series produces components with the "characteristic width" ranging from that corresponding to the start width of the scan window for that series to that corresponding to the final start window width (a default of start width plus 8 points). Thus, for example, a scan series for the start window width of 5 data points may contain components with widths ranging from 5 to 13 points. After the scan series is finished, the start window width is incremented and another scan series begins, with the scan window width ranging from the new start width to the number of data points. The last scan series begins with the start window width set to the final start width.
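
The bookkeeping of scan series can be sketched as follows. This is only an illustration of the ranges involved, not the program's scanning code: the function names are hypothetical, and the way the window grows within a series is not reproduced.

```python
def start_widths(initial=4, span=8):
    """Start window widths of successive scan series: initial .. initial + span."""
    return list(range(initial, initial + span + 1))

def window_positions(n_points, width):
    """All (first, last + 1) index pairs of a window of a given width."""
    return [(first, first + width) for first in range(n_points - width + 1)]

widths = start_widths()               # default settings: 4, 5, ..., 12
positions = window_positions(19, 4)   # a 4-point window on 19 data points
```

With the default settings this gives nine scan series, and a 4-point window can be placed at 16 locations on a 19-point data set.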

Several components may be found during a scan series. The program selects the component that produced the lowest approximation error to the data within its window (whose position and width may be different for each component found during the scan series). Once a component is found, it is subtracted from the current version of the input data (i.e. the particle size distribution), with the difference becoming the new version of the size distribution, presumed to contain other log-normal components whose presence was "obscured" by the dominant component that has just been identified. This new version of the size distribution is then processed in the same way as the previous version. Only those difference data points are included in the new version of the particle size distribution whose FD value exceeds the old data point value multiplied by a factor named fract in the program code (with a value of 1E-10).
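
The subtraction step can be sketched as follows, assuming the retention rule is that the relative difference (FDold - FDcomponent) / FDold must be at least fract. The function name and the sample values are hypothetical, and the component is represented here by a simple lookup table rather than by a fitted log-normal function.

```python
FRACT = 1e-10  # value of fract quoted in the text

def subtract_component(D, FD, component, fract=FRACT):
    """Subtract a component and keep only points with a large enough remainder."""
    new_D, new_FD = [], []
    for d, fd_old in zip(D, FD):
        diff = fd_old - component(d)
        # Keep the point only if the relative difference is at least fract;
        # this also drops points where the component exceeds the data
        if diff / fd_old >= fract:
            new_D.append(d)
            new_FD.append(diff)
    return new_D, new_FD

D = [2.0, 4.0, 8.0]
FD = [100.0, 50.0, 10.0]
component_values = {2.0: 60.0, 4.0: 50.0, 8.0: 12.0}  # hypothetical component

rest_D, rest_FD = subtract_component(D, FD, lambda d: component_values[d])
```

Here only the first point survives: the second has a zero difference and the third a negative one.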

The initial start width of the scan window determines the characteristic width of log-normal components that can be detected. For example, if the size distribution contains a component with a size scale of 4 data points at your size grid, this component will most likely remain undetected in scans beginning with a window width greater than 4. The absolute meaningful minimum scan window width is 3 points, i.e. the number of points required to fit a parabola (log-normal function in log-log scale) to data. Too small a value of the initial scan window width might result in fitting the "noise". Too large a value might miss some small-width components.

Following the completion of a scan series, the start window width is incremented and a new scan series begins. For all values of the start window width from the range defined in the "Scan window parameters" group, one obtains a set of scan series, each characterized by the error of the fit by the sum of the components found during that scan series. The fit that yields the lowest error can now be selected. Such a selection can be performed manually or by a simple utility program processing the output files generated by LND.

Particle size scale and the component width

It should be stressed that the scan window width settings may affect the decomposition of a size distribution into log-normal components. The default values of these settings were arrived at by trial and error during analysis of size distributions from a database that we compiled. That database contained mostly size distributions specified at a size grid equidistant in the log-size scale. This size grid of the particle diameter, D, can be defined as follows:

D0
D1 = D0 a
D2 = D1 a = D0 a^2 ...

where the constant a is on the order of 2^(1/3). In this case, we have, for example:

D0 = 2 micrometers (µm)
D1 ~ 2.5 µm
D2 ~ 3.2 µm
D3 ~ 4.0 µm ...

This particle size scale has been frequently used in environmental sciences, such as oceanography (for example, Sheldon et al. 1972), where the particle size range typically spans several decades.
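
A grid of this kind is easy to generate. The short Python sketch below reproduces the example values; the function name log_grid is hypothetical.

```python
def log_grid(d0, n, a=2 ** (1 / 3)):
    """Diameters D_k = d0 * a**k for k = 0 .. n - 1 (equidistant in log size)."""
    return [d0 * a ** k for k in range(n)]

grid = log_grid(2.0, 4)                  # D0 = 2 µm
rounded = [round(d, 1) for d in grid]
```

With a = 2^(1/3), the diameter doubles every three grid points.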

The window width is measured in points rather than in length units (for example, micrometers) for simplicity, which is generally justified by the presentation conventions of marine size distributions. If your data are denser than the size grid just discussed implies, you need to select a minimum window width that reflects your diameter data spacing. For example, if the size increment is 0.1 µm, 12 data points span a size range of 1.2 µm. Thus, if the expected minimum component width is about 1 µm, then the initial start window width in data points should be about 8 instead of 4 (the default value).


Statistical significance of components

The LND program performs a statistical test (the Fisher test; see, for example, Hudson 1964) of the significance of the various components within each scan series. The Fisher test examines the equivalence of variances, here represented by the approximation errors, expressed by the sum of squares of residuals:

 error_j = \sum_{i=1}^{N} [ (FD_i - FD_{approx,i}) / FD_i ]^2  (3)

where N is the data count, and FD_{approx,i} is the sum of log-normal components. This is an approximation of the "correct" expression for the error:

 error_j = \sum_{i=1}^{N} [ log(FD_i) - log(FD_{approx,i}) ]^2  (4)

that is implied by the way the fit is calculated (by using the log-log transform of the original data set). Approximation (3) is valid for small values of the error, which are the ones of interest: if the error is large, the fit is not going to be considered anyway. The approximation is used for calculation speed and also to avoid a potential singularity problem in taking the logarithm of FDapprox when the latter evaluates to a machine zero.
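
The link between expressions (3) and (4) can be made explicit with a first-order expansion. The sketch below assumes natural logarithms and introduces the symbol r_i for the relative residual. With base-10 logarithms every term acquires the same constant factor 1/ln(10), which does not change the ratio of two such errors.

```latex
r_i = \frac{FD_i - FD_{approx,i}}{FD_i}
\quad\Longrightarrow\quad
\ln FD_i - \ln FD_{approx,i}
  = -\ln\frac{FD_{approx,i}}{FD_i}
  = -\ln(1 - r_i)
  = r_i + \frac{r_i^2}{2} + \frac{r_i^3}{3} + \dots
  \approx r_i \qquad (|r_i| \ll 1)
```

Squaring and summing over i turns the terms of (4) into those of (3) whenever the relative residuals are small.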

The significance of components versus each other and vs. the average fit is found for a scan series as follows. At first, the components are ordered according to their ability to remove the approximation error as follows:

  1. select a component that removes most of the error
  2. select a component that, when added to the previously selected components, removes most of the error
  3. perform step 2 until no unused component is left.
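
The ordering steps above can be sketched as a greedy selection that uses the error of expression (3). The function names and the sample values are hypothetical. Each component is represented by its values at the data points.

```python
def relative_error(FD, approx):
    """Sum of squared relative residuals, as in expression (3)."""
    return sum(((f - a) / f) ** 2 for f, a in zip(FD, approx))

def order_components(FD, components):
    """Order component indices by their ability to remove the error."""
    remaining = list(range(len(components)))
    order = []
    total = [0.0] * len(FD)  # running sum of the components selected so far
    while remaining:
        def error_with(j):
            trial = [t + c for t, c in zip(total, components[j])]
            return relative_error(FD, trial)
        best = min(remaining, key=error_with)  # lowest error = most error removed
        order.append(best)
        total = [t + c for t, c in zip(total, components[best])]
        remaining.remove(best)
    return order

FD = [10.0, 10.0, 10.0]
components = [
    [2.0, 0.0, 0.0],
    [8.0, 9.0, 0.0],
    [0.0, 1.0, 10.0],
]
order = order_components(FD, components)
```

In this example the second component is selected first, followed by the third and then the first.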

The significant components are chosen by successive applications of the Fisher test to approximations of the log-transformed data by cumulative subsets of components within a scan series, i.e. first to the approximation by component 1 vs. the approximation by components 1 and 2, then to the approximation by components 1 and 2 vs. the approximation by components 1, 2, and 3, and so on. Finally, the significance of the best selection of components is tested versus the average fit by using the Fisher test again. These tests are performed by using the data and components expressed in the log-log scale.

The significance of a set of components for a scan series is not evaluated against the significance of components for another scan series.

Sample results

Fig. 1 shows a sample log-normal decomposition of a frequency particle size distribution.


Fig. 1. An example of the decomposition of a frequency particle size distribution into a set of log-normal components for data from file Kraatl8601.psd (the Northwestern Atlantic, data kindly provided by Dr. K. Kranck and Dr. T. Milligan, Bedford Institute of Oceanography, Canada).


The file system

PSD (input) files

The LND program works in the single-file and batch modes. In the single-file mode, the user supplies the name of the PSD data file, selects the diameter and PSD columns in that data file and the manner in which the input data are read, and supplies the name of the results file. In the batch mode, the user only selects the files to be processed, or elects to process all files in a directory, and the program decides how to obtain the data from these files: empty lines and lines that contain text that cannot be converted to columns of numbers are ignored. This is also the default setting for the single-file mode.

The input data must be available as space-, tab-, or comma-delimited text files (PSD files; extensions "psd" and "txt" in the single-file mode, extension "psd" in the batch mode). Each PSD file is expected to contain the frequency particle size distribution data. The program does not check whether the data provided in a PSD file are indeed those of such a distribution.

The structure of a PSD file is as follows:

D FD
D FD
...
D FD

where D is the particle diameter and FD is the frequency particle size distribution.

Several size distributions can be included in the same data file, all sharing the same particle diameter grid (the 1st column of the file). The units are irrelevant to the decomposition. However, for proper viewing with the companion spreadsheet program (LNDVIEW), the units should be as follows: particle "diameter", D, unit = µm; particle size distribution, FD, unit = cm^-3 µm^-1. Note that 1 µm = 10^-6 m.

A sample PSD file (JONATL7821.PSD):

[beginning of file]
2.5 6415.6 4731.6
3 2928.3 2783.0
4 899.3 1107.7
6 212.6 284.5
7 121.7 164.5
8 54.8 69.3
9 41.6 43.9
10 30.4 25.2
11 13.4 7.3
12 6.2 13.1
13 4 12.5
14 3.4 4.6
15 1.9 2
16 1.1 1.4
18 1 1
20 0.6 0.6
22 0.4 0.4
24 0.5 0.4
26 0.2 0.2
[end-of-file]
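
The reading rules described in this section (space, tab, or comma delimiters, with empty and non-numeric rows ignored) can be sketched in Python as follows. This is a hypothetical reader, not the program's own code. The function name and the header row in the sample text are assumptions for illustration.

```python
def parse_psd(text):
    """Return the purely numeric rows of a PSD file as lists of floats."""
    rows = []
    for line in text.splitlines():
        # Treat commas like whitespace so all three delimiters are accepted
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # empty line
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            continue  # row contains text that is not a number
    return rows

sample = (
    "D FD1 FD2\n"              # hypothetical header row, skipped as text
    "2.5 6415.6 4731.6\n"      # space-delimited
    "3\t2928.3\t2783.0\n"      # tab-delimited
    "4,899.3,1107.7\n"         # comma-delimited
    "\n"
)

rows = parse_psd(sample)
diameters = [row[0] for row in rows]  # the diameter is the first column
fd_column_2 = [row[1] for row in rows]  # FD listed in the 2nd column
```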

LND (results) files

The output (i.e. LND) results can be stored in text files with the following structure:

Start section
LND VER version number

Components' parameters' section
minpts, epdf_nlcpt, fit type, nlcpt, not used
b0_1, b0_2, ..., b0_nlcpt
b1_1, b1_2, ..., b1_nlcpt
b2_1, b2_2, ..., b2_nlcpt
s2_1, s2_2, ..., s2_nlcpt
epdf_1, epdf_2, ..., epdf_nlcpt
sb0_1, sb0_2, ..., sb0_nlcpt
sb1_1, sb1_2, ..., sb1_nlcpt
sb2_1, sb2_2, ..., sb2_nlcpt

where the first index in the two-dimensional arrays denotes the log-normal parameter number (0, 1, and 2), the index after the underscore denotes the component number, and:

  • minpts = start scan window width (all data points for the average fit). If the average fit is better than the fit by components obtained for a given start window width, the average fit is listed instead (i.e. minpts = total number of data points and nlcpt = 0).
  • epdf_j = square root of the approximation error per degree of freedom for the fit by a sum of j components (epdf_nlcpt is that value for the sum of all components), for a given minpts
  • fit type = 0 (average fit), 1 (scan fit), 2 (no fit possible)
  • nlcpt = number of components
  • bi_j = i-th coefficient of the j-th component
  • s2_j = goodness-of-fit statistic of the j-th component's fit to that component's fitting data, which may differ from the input [D, FD(D)] data (see the LND algorithm). Here it is the sum of squared residuals per degree of freedom for a component's fit to its window's data. The value of this parameter gives an idea about the quality of that fit, relative to that of fits by other components within their respective data windows.
  • sbi_j = standard deviation of the i-th coefficient of the j-th component

The components' parameters' section is repeated for each scan series. Note that if the average fit is, according to the Fisher test, statistically equivalent to a fit by components, then the "simpler" average fit is listed in the components' parameters' section.

End section
name of file with the input data (infile)
name of file (LND) with the output data (outfile)
number of the column with FD (frequency distribution of particle sizes) in the infile
fract
not used
D FD
D FD
...
D FD

where

  • fract indicates whether a D, FD pair is retained for further processing after the FD has been assigned the value of the difference between the input value before a log-normal component was found and the value of the log-normal component at this diameter D. If the relative value of this difference (FDold-FDcomponent)/FDold is smaller than an arbitrarily set fract value, the D, FD pair is not included in the particle size distribution to be used for the next component's search.

In the single-file mode, the name of a results file is set by the user, although the program proposes a default file name according to a rule applicable to the batch mode. In the batch mode, the names of the results files are generated by the program as follows:

  • a string "_c", where c is the number of the column with the FD in the PSD file, is appended to the original file name
  • an extension "psd" is replaced by "lnd"

For example, if the original file name is c:\lnd\data\jonbal7821.psd and the FD data are listed in the 2nd column of that file, then the LND results are stored in c:\lnd\data\jonbal7821_2.lnd
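
The naming rule can be sketched in Python with the standard pathlib module. The function name results_name is hypothetical. The example path is the one quoted above.

```python
from pathlib import PureWindowsPath

def results_name(psd_path, fd_column):
    """Append "_c" to the file name and replace the extension by "lnd"."""
    path = PureWindowsPath(psd_path)
    return str(path.with_name(f"{path.stem}_{fd_column}.lnd"))

name = results_name(r"c:\lnd\data\jonbal7821.psd", 2)
```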

References

Hudson D. 1964. Statistics for physicists. Geneva.

Jonasz M., Fournier G. 1996. Approximation of the size distribution of marine particles by a sum of log-normal functions. Limnol. Oceanogr. 41: 744-754.

Jonasz M., Fournier G. F. 1999. Approximation of the size distribution of marine particles by a sum of log-normal functions (Errata: Corrections and additional results). Limnol. Oceanogr. 44: 1358-1358.

Sheldon R. W., Prakash A., Sutcliffe W. H. Jr. 1972. The size distribution of particles in the ocean. Limnol. Oceanogr. 17: 327-340.

Contact info for comments and questions

Please direct your comments and questions regarding this software, as well as questions on other MJC Optical Technology software and services to:

Dr. Miroslaw Jonasz
MJC Optical Technology
217 Cadillac Street
Beaconsfield QC H9W 2W7
Canada

Internet: www.mjcopticaltech.com
e-mail: m.jonasz@mjcopticaltech.com

fax +1 514 695 3315

Disclaimer

The information contained in this document is believed to be accurate. However, neither the author nor MJC Optical Technology guarantees the accuracy or completeness of this information, and neither the author nor MJC Optical Technology assumes responsibility for any omissions or errors, or for damages which may result from using or misusing this information.

Copyright 2000-2011 MJC Optical Technology. All rights reserved.