SoftNet-Consult Java Utility Library

com.softnetConsult.utils.math
Class Distribution<T extends java.lang.Number>

java.lang.Object
  extended by com.softnetConsult.utils.math.Distribution<T>
Type Parameters:
T - The type of the values whose distribution is being analysed.

public class Distribution<T extends java.lang.Number>
extends java.lang.Object

This class implements a sample distribution; it is a useful tool when undertaking a statistical analysis of value producing processes: it counts the number of times each value is observed (method observe(Number)) and can be used conveniently for obtaining the number of observations within a certain range, discretising the observed sample, calculating mean, variance and other sample properties, printing and plotting the distribution, and so on.
Where possible, the methods provided by the class type of the sample values are used directly; for other computations the values are converted to double. Be careful, as this may lose precision if the type of the sampled variable cannot be converted to double exactly.
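The caveat about conversion to double can be made concrete with a small sketch. It uses only the standard library and does not call this class; the class name DoubleConversionSketch and its helper are hypothetical names introduced for illustration.

```java
import java.math.BigDecimal;

// Hypothetical illustration; not part of the library.
class DoubleConversionSketch {

    // Returns true if converting the long value to double loses no information.
    // BigDecimal is used so that the comparison itself is exact.
    static boolean isExactAsDouble(long value) {
        return new BigDecimal(value).compareTo(new BigDecimal((double) value)) == 0;
    }

    public static void main(String[] args) {
        long representable = 1L << 53;          // 2^53 fits the 53-bit significand of a double
        long notRepresentable = (1L << 53) + 1; // 2^53 + 1 does not
        System.out.println(isExactAsDouble(representable));
        System.out.println(isExactAsDouble(notRepresentable));
    }
}
```

A sample of type Long or java.math.BigInteger that holds values of this magnitude may therefore yield a mean or variance that differs slightly from the exact result.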

This product includes software developed by the SoftNet-Consult Java Utility Library project and its contributors.
(http://java-tools.sourceforge.net)
Copyright (c) 2007-2008 SoftNet-Consult.
Copyright (c) 2007-2008 G. Paperin.
All rights reserved.

File: Distribution.java
Library API version: "2.02"
Java compliance version: "1.5"

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following terms and conditions are met:

1. Redistributions of source code must retain the above acknowledgement of the SoftNet-Consult Java Utility Library project, the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above acknowledgement of the SoftNet-Consult Java Utility Library project, the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. All advertising materials mentioning features or use of this software or any derived software must display the following acknowledgement:
This product includes software developed by the SoftNet-Consult Java Utility Library project and its contributors.

THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Version:
"2.02"
Author:
Greg Paperin (http://www.paperin.org)

Field Summary
private static java.util.Comparator<java.lang.Number> comparator
          A comparator object for all Number types.
private  java.util.TreeMap<T,java.lang.Integer> dist
          Stores the observed sample (MAP: observation -> number of times observed).
private  T maxObservation
          Largest observation in this sample.
private  double mean
          Sample mean.
private  T minObservation
          Smallest observation in this sample.
private static java.lang.String newLine
           
private  long observationCount
          Number of observations in this sample.
private  double variance
          Sample variance.
 
Constructor Summary
  Distribution()
          Creates a new empty distribution sample.
private Distribution(java.util.TreeMap<T,java.lang.Integer> dist)
          Used internally for creating discretised samples and sub-samples of this distribution.
 
Method Summary
 int countObservations()
          Gets the total number of observations in this sample.
 int countObservations(T observation)
          Gets the number of times the specified observation was encountered.
 int countObservations(T min, T max)
          Gets the total number of times that any observation within the specified range was encountered.
 long countObservationsL()
          Gets the total number of observations in this sample.
 double countProportion(T observation)
          Gets the proportion of the specified observation out of all observations in this sample.
 double countProportion(T min, T max)
          Gets proportion of observations within the specified range out of all observations in this sample.
 Distribution<java.lang.Double> discretise(double interval)
          Discretises this sample into intervals of the specified length.
 Distribution<java.lang.Integer> discretise(int interval)
          Discretises this sample into intervals of the specified length.
 Distribution<java.lang.Double> discretise(T min, T max, double interval)
          Discretises this sample into intervals of the specified length while only considering the observations between min and max (inclusive).

If S = (max - min) / interval, the resulting distribution will contain S observations if (max - min) is not exactly divisible by interval, otherwise it will contain S + 1 observations.
The first observation of the resulting distribution will be min and the corresponding frequency will be the sum of the frequencies of all observations between min (inclusive) and min + interval (exclusive) in this original distribution.
The n-th observation of the resulting distribution will be min + (n-1) * interval and the corresponding frequency will be the sum of the frequencies of all observations between min + (n-1) * interval (inclusive) and min + n * interval (exclusive) in this original distribution.
 Distribution<java.lang.Integer> discretise(T min, T max, int interval)
          Discretises this sample into intervals of the specified length while only considering the observations between min and max (inclusive).

If S = (max - min) / interval, the resulting distribution will contain S observations if (max - min) is not exactly divisible by interval, otherwise it will contain S + 1 observations.
The first observation of the resulting distribution will be min and the corresponding frequency will be the sum of the frequencies of all observations between min (inclusive) and min + interval (exclusive) in this original distribution.
The n-th observation of the resulting distribution will be min + (n-1) * interval and the corresponding frequency will be the sum of the frequencies of all observations between min + (n-1) * interval (inclusive) and min + n * interval (exclusive) in this original distribution.
 Pair<T[],java.lang.Integer[]> getData()
          Gets the data of this distribution as two arrays - one containing the observations, and the other containing the corresponding frequencies.
 Distribution<T> getLogDistribution(double base)
          Returns a new Distribution in which each observation frequency equals the logarithm of the corresponding observation frequency of this distribution; all resulting non-integer frequencies are rounded to the nearest integer.
 T getMax()
          Gets the largest observation in this sample.
 double getMean()
          Computes the mean of this sample.
 T getMin()
          Gets the smallest observation in this sample.
private static java.lang.String getNewLine()
           
 java.util.Set<T> getObservations()
          Returns an unmodifiable Set view of the observations contained in this distribution sample.
 double getStdDeviation()
          Computes the standard deviation of this sample.
 java.lang.String getStringLiveGraphPlot()
          This method creates a string that - if saved to a file - can be loaded by the LiveGraph-plotter in order to plot this distribution sample.

The LiveGraph plotter framework is an open-source project written in Java available from http://www.live-graph.org.

The string returned by this method encodes all observations of this sample.
 java.lang.String getStringLiveGraphPlot(java.lang.String info)
          This method creates a string that - if saved to a file - can be loaded by the LiveGraph-plotter in order to plot this distribution sample.

The LiveGraph plotter framework is an open-source project written in Java available from http://www.live-graph.org.

The string returned by this method encodes all observations of this sample.
 java.lang.String getStringLiveGraphPlot(java.lang.String[] infos)
          This method creates a string that - if saved to a file - can be loaded by the LiveGraph-plotter in order to plot this distribution sample.

The LiveGraph plotter framework is an open-source project written in Java available from http://www.live-graph.org.

The string returned by this method encodes all observations of this sample.
 java.lang.String getStringStats()
          Creates a String that contains some basic statistical information about this sample, such as the number of observations, mean, variance and standard deviation.
 java.lang.String getStringTable()
          This is equivalent to getStringTable(true).
 java.lang.String getStringTable(boolean extraInfo)
          Creates a String with a distribution table of this sample.
 java.lang.String getStringTableD()
          This is equivalent to getStringTableD(true).
 java.lang.String getStringTableD(boolean extraInfo)
          Creates a String with a distribution table of this sample.
 java.lang.String getStringTableI()
          This is equivalent to getStringTableI(true).
 java.lang.String getStringTableI(boolean extraInfo)
          Creates a String with a distribution table of this sample.
 double getVariance()
          Computes the variance of this sample.
 Distribution<T> normaliseBy(double value)
          Returns a new Distribution in which each observation frequency equals the observation frequency of this distribution divided by the specified value; all resulting non-integer frequencies are rounded to the nearest integer.
 void observe(T observation)
          Adds an observation to this sample.
 Distribution<T> selectInterval(T min, T max)
          A sample that contains only the values of this sample between the specified boundaries.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dist

private java.util.TreeMap<T extends java.lang.Number,java.lang.Integer> dist
Stores the observed sample (MAP: observation -> number of times observed).


minObservation

private T extends java.lang.Number minObservation
Smallest observation in this sample.


maxObservation

private T extends java.lang.Number maxObservation
Largest observation in this sample.


observationCount

private long observationCount
Number of observations in this sample.


mean

private double mean
Sample mean.


variance

private double variance
Sample variance.


comparator

private static final java.util.Comparator<java.lang.Number> comparator
A comparator object for all Number types. This comparator uses MathTools.compare(Number, Number) for the comparison.
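The implementation of MathTools.compare(Number, Number) is not shown on this page. As a rough sketch of what a comparator over mixed Number types can look like, the following assumes that comparing by doubleValue() is acceptable; that is an assumption made for illustration and the library's comparison may work differently. The class and constant names are hypothetical.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical illustration; not the library's comparator.
class NumberComparatorSketch {

    // Assumption: two Numbers are ordered by their double values.
    // This can mis-order large long or BigInteger values that are not exact as double.
    static final Comparator<Number> BY_DOUBLE_VALUE = new Comparator<Number>() {
        public int compare(Number a, Number b) {
            return Double.compare(a.doubleValue(), b.doubleValue());
        }
    };

    public static void main(String[] args) {
        // A sorted map keyed by mixed Number types, similar in spirit to the dist field.
        TreeMap<Number, Integer> counts = new TreeMap<Number, Integer>(BY_DOUBLE_VALUE);
        counts.put(3, 1);
        counts.put(2.5, 1);
        counts.put(10L, 1);
        System.out.println(counts.firstKey() + " .. " + counts.lastKey());
    }
}
```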


newLine

private static java.lang.String newLine
Constructor Detail

Distribution

public Distribution()
Creates a new empty distribution sample.


Distribution

private Distribution(java.util.TreeMap<T,java.lang.Integer> dist)
Used internally for creating discretised samples and sub-samples of this distribution.

Parameters:
dist - The map containing the observations of this distribution.
Method Detail

getNewLine

private static java.lang.String getNewLine()

observe

public void observe(T observation)
Adds an observation to this sample.

Parameters:
observation - The observed value.
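The bookkeeping behind observe can be pictured with a minimal stand-in that keeps a sorted map from each value to its frequency, as the dist field description suggests. This is a sketch under that assumption, fixed to int values for brevity; ObserveSketch is a hypothetical class, and returning 0 for a value that was never observed is a choice made here, not documented behaviour.

```java
import java.util.TreeMap;

// Hypothetical stand-in; not the library's implementation.
class ObserveSketch {

    // Maps each observed value to the number of times it was observed.
    private final TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
    private long observationCount = 0;

    // Records one more occurrence of the given value.
    void observe(int observation) {
        Integer old = dist.get(observation);
        dist.put(observation, old == null ? 1 : old + 1);
        observationCount++;
    }

    // Frequency of a single value; 0 if it was never observed (sketch choice).
    int countObservations(int observation) {
        Integer count = dist.get(observation);
        return count == null ? 0 : count;
    }

    // Total number of observations recorded so far.
    long countObservationsL() {
        return observationCount;
    }

    public static void main(String[] args) {
        ObserveSketch sample = new ObserveSketch();
        sample.observe(5);
        sample.observe(5);
        sample.observe(7);
        System.out.println(sample.countObservations(5) + " of " + sample.countObservationsL());
    }
}
```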

discretise

public Distribution<java.lang.Double> discretise(double interval)
Discretises this sample into intervals of the specified length. This is equivalent to discretise(getMin(), getMax(), interval).

Parameters:
interval - The size of the discrete intervals.
Returns:
A discretised distribution containing the same observations as this distribution.
See Also:
discretise(Number, Number, double)

discretise

public Distribution<java.lang.Double> discretise(T min,
                                                 T max,
                                                 double interval)
Discretises this sample into intervals of the specified length while only considering the observations between min and max (inclusive).

If S = (max - min) / interval, the resulting distribution will contain S observations if (max - min) is not exactly divisible by interval, otherwise it will contain S + 1 observations.
The first observation of the resulting distribution will be min and the corresponding frequency will be the sum of the frequencies of all observations between min (inclusive) and min + interval (exclusive) in this original distribution.
The n-th observation of the resulting distribution will be min + (n-1) * interval and the corresponding frequency will be the sum of the frequencies of all observations between min + (n-1) * interval (inclusive) and min + n * interval (exclusive) in this original distribution. No observations of this original distribution that are larger than the specified max will be considered.

If min > max the resulting distribution will be empty.
This method converts the observations contained in this sample to double; be careful as this might have consequences if the type of the values in this sample cannot be exactly converted to double. Also be aware of the "1 + 1 = 1.9999999" effect of floating-point rounding, which may cause inconsistencies between this distribution and the discretised distribution.

Parameters:
min - The minimum of the observation interval to be discretised.
max - The maximum of the observation interval to be discretised.
interval - The size of the discrete intervals.
Returns:
A discretised sample as described above.
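The bucketing rule described above can be sketched on a plain sorted map. The code below is an illustration under stated assumptions, not the library's implementation: it works on a TreeMap<Double, Integer>, uses NavigableMap.subMap (Java 6 or later), and simply omits intervals that receive no observations, which may differ from what this method does. The class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class DiscretiseDoubleSketch {

    // Sums the frequencies of all values in [min, max] into intervals
    // [min + n * interval, min + (n + 1) * interval), keyed by the interval start.
    static TreeMap<Double, Integer> discretise(TreeMap<Double, Integer> dist,
                                               double min, double max, double interval) {
        TreeMap<Double, Integer> result = new TreeMap<Double, Integer>();
        if (min > max) {
            return result; // empty, as documented for min > max
        }
        for (Map.Entry<Double, Integer> e : dist.subMap(min, true, max, true).entrySet()) {
            long n = (long) Math.floor((e.getKey() - min) / interval);
            double bucket = min + n * interval;
            Integer old = result.get(bucket);
            result.put(bucket, old == null ? e.getValue() : old + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Double, Integer> dist = new TreeMap<Double, Integer>();
        dist.put(0.0, 1);
        dist.put(0.5, 2);
        dist.put(2.4, 3);
        System.out.println(discretise(dist, 0.0, 3.0, 1.0));
    }
}
```

The floating-point effect mentioned above shows up in the division: in double arithmetic 0.3 / 0.1 evaluates to 2.9999999999999996, so with min = 0 and interval = 0.1 this sketch would place the value 0.3 in the interval starting at 0.2.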

discretise

public Distribution<java.lang.Integer> discretise(int interval)
Discretises this sample into intervals of the specified length. This is equivalent to discretise(getMin(), getMax(), interval).

Parameters:
interval - The size of the discrete intervals.
Returns:
A discretised distribution containing the same observations as this distribution.
See Also:
discretise(Number, Number, int)

discretise

public Distribution<java.lang.Integer> discretise(T min,
                                                  T max,
                                                  int interval)
Discretises this sample into intervals of the specified length while only considering the observations between min and max (inclusive).

If S = (max - min) / interval, the resulting distribution will contain S observations if (max - min) is not exactly divisible by interval, otherwise it will contain S + 1 observations.
The first observation of the resulting distribution will be min and the corresponding frequency will be the sum of the frequencies of all observations between min (inclusive) and min + interval (exclusive) in this original distribution.
The n-th observation of the resulting distribution will be min + (n-1) * interval and the corresponding frequency will be the sum of the frequencies of all observations between min + (n-1) * interval (inclusive) and min + n * interval (exclusive) in this original distribution. No observations of this original distribution that are larger than the specified max will be considered.

If min > max the resulting distribution will be empty.
This method converts the observations contained in this sample to int; be careful as this might have consequences if the type of the values in this sample cannot be exactly converted to int.

Parameters:
min - The minimum of the observation interval to be discretised.
max - The maximum of the observation interval to be discretised.
interval - The size of the discrete intervals.
Returns:
A discretised sample as described above.
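To illustrate the warning about conversion to int, the sketch below computes the start of the interval a value falls into after the value has been converted with Number.intValue(). Whether this method converts values in exactly this way is an assumption; the class and method names are hypothetical.

```java
// Hypothetical illustration; not the library's implementation.
class IntConversionSketch {

    // Start of the interval containing value, for intervals anchored at min.
    // Assumes value >= min after conversion.
    static int bucketStart(Number value, int min, int interval) {
        int v = value.intValue(); // truncates Double/Float toward zero, wraps large Long values
        return min + ((v - min) / interval) * interval;
    }

    public static void main(String[] args) {
        System.out.println(bucketStart(7.9, 0, 5));          // fractional part is dropped first
        System.out.println(bucketStart(4294967297L, 0, 5));  // 2^32 + 1 wraps when narrowed to int
    }
}
```

Under these assumptions 7.9 is treated as 7, and the long value 4294967297 is treated as 1, so both can end up in an interval the unconverted value would not suggest.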

selectInterval

public Distribution<T> selectInterval(T min,
                                      T max)
A sample that contains only the values of this sample between the specified boundaries.

Parameters:
min - Min observation.
max - Max observation.
Returns:
A new Distribution that is the same as this distribution, but contains only observations between min and max (inclusive).
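Selecting an inclusive range from a sorted frequency map can be sketched with NavigableMap.subMap (Java 6 or later). The code is an illustration on a plain TreeMap, not the library's implementation; returning an empty map when min > max is a choice made here, since this page does not document that case for selectInterval. The class name is hypothetical.

```java
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class SelectIntervalSketch {

    // Copies the entries whose keys lie in [min, max] into a new map.
    static TreeMap<Integer, Integer> selectInterval(TreeMap<Integer, Integer> dist,
                                                    int min, int max) {
        if (min > max) {
            return new TreeMap<Integer, Integer>(); // sketch choice, see lead-in
        }
        return new TreeMap<Integer, Integer>(dist.subMap(min, true, max, true));
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
        dist.put(1, 4);
        dist.put(3, 2);
        dist.put(8, 1);
        System.out.println(selectInterval(dist, 2, 8));
    }
}
```

Because the result is a copy, later changes to the original map do not affect it.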

countObservations

public int countObservations(T observation)
Gets the number of times the specified observation was encountered.

Parameters:
observation - An observation value.
Returns:
The frequency of the specified observation.

countProportion

public double countProportion(T observation)
Gets the proportion of the specified observation out of all observations in this sample.

Parameters:
observation - An observation value.
Returns:
The frequency of the specified observation divided by the size of this sample.

countObservations

public int countObservations(T min,
                             T max)
Gets the total number of times that any observation within the specified range was encountered.

Parameters:
min - Min observation.
max - Max observation.
Returns:
The sum of the frequencies of all observations between min and max (inclusive).

countProportion

public double countProportion(T min,
                              T max)
Gets proportion of observations within the specified range out of all observations in this sample.

Parameters:
min - Min observation.
max - Max observation.
Returns:
The sum of the frequencies of all observations between min and max (inclusive) divided by the number of observations in this sample.
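The range count and the range proportion can be sketched together on a plain sorted frequency map. This is an illustration, not the library's implementation: the names are hypothetical, NavigableMap.subMap requires Java 6 or later, and returning 0.0 for an empty sample is a choice made for the sketch because this page does not say what the method returns in that case.

```java
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class RangeCountSketch {

    // Sum of the frequencies of all values in [min, max].
    static long countInRange(TreeMap<Integer, Integer> dist, int min, int max) {
        if (min > max) {
            return 0;
        }
        long sum = 0;
        for (int freq : dist.subMap(min, true, max, true).values()) {
            sum += freq;
        }
        return sum;
    }

    // Range count divided by the total number of observations.
    static double proportionInRange(TreeMap<Integer, Integer> dist, int min, int max) {
        long total = 0;
        for (int freq : dist.values()) {
            total += freq;
        }
        return total == 0 ? 0.0 : (double) countInRange(dist, min, max) / total;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
        dist.put(1, 2);
        dist.put(2, 3);
        dist.put(5, 5);
        System.out.println(countInRange(dist, 1, 2) + " " + proportionInRange(dist, 1, 2));
    }
}
```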

getObservations

public java.util.Set<T> getObservations()
Returns an unmodifiable Set view of the observations contained in this distribution sample. The set's iterator returns the observations in ascending order. The set is backed by the distribution, so changes to the distribution are reflected in the set. If the distribution is modified while an iteration over the set is in progress, the results of the iteration are undefined.

Returns:
An unmodifiable Set view of the observations contained in this distribution sample.
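The behaviour of an unmodifiable, backed view can be reproduced with the standard collections. The sketch below wraps the key set of a TreeMap with Collections.unmodifiableSet; whether the library builds its view this way is an assumption, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class ObservationViewSketch {

    // A read-only view of the map's keys that stays in sync with the map.
    static Set<Integer> viewOf(TreeMap<Integer, Integer> dist) {
        return Collections.unmodifiableSet(dist.keySet());
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
        Set<Integer> view = viewOf(dist);
        dist.put(3, 1); // added after the view was created
        dist.put(1, 1);
        System.out.println(view); // the view reflects both keys, in ascending order
        try {
            view.add(2);
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}
```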

getMean

public double getMean()
Computes the mean of this sample. This method converts the observations contained in this sample to double values; be careful as this might have consequences if the type of the values in this sample cannot be exactly converted to double.

Returns:
The mean of this sample.

getVariance

public double getVariance()
Computes the variance of this sample. This method converts the observations contained in this sample to double values; be careful as this might have consequences if the type of the values in this sample cannot be exactly converted to double.

Returns:
The variance of this sample.
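Mean and variance can be computed from a frequency map by weighting each value with its frequency. The sketch below is an illustration, not the library's code. This page does not state whether the variance divides by n or by n - 1; the sketch assumes the unbiased form with n - 1, which is an assumption. All names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class MomentsSketch {

    // Frequency-weighted mean; NaN for an empty map.
    static <T extends Number> double mean(TreeMap<T, Integer> dist) {
        double sum = 0;
        long n = 0;
        for (Map.Entry<T, Integer> e : dist.entrySet()) {
            sum += e.getKey().doubleValue() * e.getValue();
            n += e.getValue();
        }
        return sum / n;
    }

    // Variance with divisor n - 1 (assumption); NaN for fewer than two observations.
    static <T extends Number> double variance(TreeMap<T, Integer> dist) {
        double m = mean(dist);
        double squares = 0;
        long n = 0;
        for (Map.Entry<T, Integer> e : dist.entrySet()) {
            double d = e.getKey().doubleValue() - m;
            squares += d * d * e.getValue();
            n += e.getValue();
        }
        return squares / (n - 1);
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
        dist.put(1, 1);
        dist.put(2, 2);
        dist.put(3, 1);
        System.out.println(mean(dist) + " " + variance(dist) + " " + Math.sqrt(variance(dist)));
    }
}
```

The standard deviation is the square root of the variance. Dividing by n instead of n - 1 would give the population form.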

normaliseBy

public Distribution<T> normaliseBy(double value)
Returns a new Distribution in which each observation frequency equals the observation frequency of this distribution divided by the specified value; all resulting non-integer frequencies are rounded to the nearest integer.

Parameters:
value - A non-zero value.
Returns:
A new distribution that is normalised through dividing by the specified value.
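A minimal sketch of the divide-and-round step on a plain frequency map follows. It is an illustration, not the library's implementation: it assumes Math.round, which rounds ties upward, and it keeps entries whose frequency becomes 0. Both points are assumptions, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class NormaliseSketch {

    // Divides every frequency by value and rounds the result to the nearest integer.
    static TreeMap<Integer, Integer> normaliseBy(TreeMap<Integer, Integer> dist, double value) {
        TreeMap<Integer, Integer> result = new TreeMap<Integer, Integer>();
        for (Map.Entry<Integer, Integer> e : dist.entrySet()) {
            result.put(e.getKey(), (int) Math.round(e.getValue() / value));
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
        dist.put(1, 10);
        dist.put(2, 3);
        dist.put(3, 1);
        System.out.println(normaliseBy(dist, 4.0));
    }
}
```

Because of the rounding, the normalised frequencies generally do not sum to the original total divided by the value.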

getLogDistribution

public Distribution<T> getLogDistribution(double base)
Returns a new Distribution in which each observation frequency equals the logarithm of the corresponding observation frequency of this distribution; all resulting non-integer frequencies are rounded to the nearest integer.

Parameters:
base - The base of the logarithm to use in obtaining the new distribution sample.
Returns:
A new log distribution obtained from this distribution.
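The logarithm to an arbitrary base can be obtained from the natural logarithm by the change-of-base rule, log_b(x) = ln(x) / ln(b). The sketch below applies it to each frequency of a plain map and rounds with Math.round. It is an illustration under those assumptions rather than the library's implementation, and the names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class LogDistributionSketch {

    // Replaces every frequency f by round(log_base(f)); frequencies are assumed to be >= 1.
    static TreeMap<Integer, Integer> logDistribution(TreeMap<Integer, Integer> dist, double base) {
        TreeMap<Integer, Integer> result = new TreeMap<Integer, Integer>();
        for (Map.Entry<Integer, Integer> e : dist.entrySet()) {
            double log = Math.log(e.getValue()) / Math.log(base);
            result.put(e.getKey(), (int) Math.round(log));
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
        dist.put(1, 1);
        dist.put(2, 8);
        dist.put(3, 100);
        System.out.println(logDistribution(dist, 2.0));
    }
}
```

Note that a frequency of 1 maps to 0 for any base, so observations seen only once carry no weight in the result of this sketch.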

getData

public Pair<T[],java.lang.Integer[]> getData()
Gets the data of this distribution as two arrays - one containing the observations, and the other containing the corresponding frequencies.

Returns:
A Pair of arrays, where the first element of the pair is an array containing all observations in this sample in ascending order and the second element of the pair is an array containing the respective observation frequencies.
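The two parallel arrays can be pictured as the keys and the values of a sorted map. The sketch below extracts both from a plain TreeMap, whose keySet() and values() iterate in ascending key order. The library's Pair type is not used here, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical illustration; not the library's implementation.
class GetDataSketch {

    // Observations in ascending order.
    static Integer[] observations(TreeMap<Integer, Integer> dist) {
        return dist.keySet().toArray(new Integer[0]);
    }

    // Frequencies in the order of their observations.
    static Integer[] frequencies(TreeMap<Integer, Integer> dist) {
        return dist.values().toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> dist = new TreeMap<Integer, Integer>();
        dist.put(3, 1);
        dist.put(1, 5);
        System.out.println(Arrays.toString(observations(dist)));
        System.out.println(Arrays.toString(frequencies(dist)));
    }
}
```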

getStdDeviation

public double getStdDeviation()
Computes the standard deviation of this sample. This method converts the observations contained in this sample to double values; be careful as this might have consequences if the type of the values in this sample cannot be exactly converted to double.

Returns:
The standard deviation of this sample.

getMin

public T getMin()
Gets the smallest observation in this sample.

Returns:
The smallest observation in this sample or null if this sample is empty.

getMax

public T getMax()
Gets the largest observation in this sample.

Returns:
The largest observation in this sample or null if this sample is empty.

countObservations

public int countObservations()
Gets the total number of observations in this sample. If this sample is very large, it may be preferable to use countObservationsL().

Returns:
The size of this sample.
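The reason to prefer the long variant for very large samples is that a count above Integer.MAX_VALUE cannot be represented as an int. This page does not say how countObservations() handles that case, so the sketch below only shows the two generic options in plain Java: a narrowing cast, which wraps around, and clamping. The names are hypothetical.

```java
// Hypothetical illustration; not the library's implementation.
class CountNarrowingSketch {

    // Plain narrowing cast: keeps the low 32 bits, so large counts wrap around.
    static int narrow(long count) {
        return (int) count;
    }

    // Alternative: saturate at Integer.MAX_VALUE instead of wrapping.
    static int clamp(long count) {
        return count > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) count;
    }

    public static void main(String[] args) {
        long big = 3000000000L; // larger than Integer.MAX_VALUE (2147483647)
        System.out.println(narrow(big));
        System.out.println(clamp(big));
    }
}
```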

countObservationsL

public long countObservationsL()
Gets the total number of observations in this sample.

Returns:
The size of this sample.

getStringStats

public java.lang.String getStringStats()
Creates a String that contains some basic statistical information about this sample, such as the number of observations, mean, variance and standard deviation.

Returns:
A String that contains some basic statistical information about this sample.

getStringTable

public java.lang.String getStringTable(boolean extraInfo)
Creates a String with a distribution table of this sample. The toString() method of the sample values is used and no type-specific formatting is performed.

Parameters:
extraInfo - Whether to append the result of getStringStats() to the distribution table.
Returns:
A distribution table of this sample.

getStringTable

public java.lang.String getStringTable()
This is equivalent to getStringTable(true).

Returns:
A distribution table of this sample.

getStringTableD

public java.lang.String getStringTableD(boolean extraInfo)
Creates a String with a distribution table of this sample. The observation values are formatted as doubles.

Parameters:
extraInfo - Whether to append the result of getStringStats() to the distribution table.
Returns:
A distribution table of this sample.

getStringTableD

public java.lang.String getStringTableD()
This is equivalent to getStringTableD(true).

Returns:
A distribution table of this sample.

getStringTableI

public java.lang.String getStringTableI(boolean extraInfo)
Creates a String with a distribution table of this sample. The observation values are formatted as ints.

Parameters:
extraInfo - Whether to append the result of getStringStats() to the distribution table.
Returns:
A distribution table of this sample.

getStringTableI

public java.lang.String getStringTableI()
This is equivalent to getStringTableI(true).

Returns:
A distribution table of this sample.

getStringLiveGraphPlot

public java.lang.String getStringLiveGraphPlot()
This method creates a string that - if saved to a file - can be loaded by the LiveGraph-plotter in order to plot this distribution sample.

The LiveGraph plotter framework is an open-source project written in Java available from http://www.live-graph.org.

The string returned by this method encodes all observations of this sample. For a straight-forward method of saving the string to a file, see FileTools.writeToFile(String, String, boolean).

Returns:
An encoding of this distribution to be used with the LiveGraph plotter.
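If FileTools.writeToFile is not at hand, the returned string can be saved with the standard library. The sketch below is a generic way to write a String to a file and is not specific to LiveGraph; the class name, the UTF-8 encoding and the placeholder content are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

// Hypothetical illustration; not part of the library.
class SaveStringSketch {

    // Writes content to target, replacing any existing file. UTF-8 is an assumption.
    static void save(String content, File target) throws IOException {
        BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target), "UTF-8"));
        try {
            out.write(content);
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // In real use the content would be the result of getStringLiveGraphPlot().
        String content = "placeholder, not real LiveGraph data";
        File target = File.createTempFile("distribution", ".txt");
        save(content, target);
        System.out.println("Saved to " + target.getAbsolutePath());
    }
}
```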

getStringLiveGraphPlot

public java.lang.String getStringLiveGraphPlot(java.lang.String info)
This method creates a string that - if saved to a file - can be loaded by the LiveGraph-plotter in order to plot this distribution sample.

The LiveGraph plotter framework is an open-source project written in Java available from http://www.live-graph.org.

The string returned by this method encodes all observations of this sample. For a straight-forward method of saving the string to a file, see FileTools.writeToFile(String, String, boolean).

Parameters:
info - An info string to add as data file annotation.
Returns:
An encoding of this distribution to be used with the LiveGraph plotter.

getStringLiveGraphPlot

public java.lang.String getStringLiveGraphPlot(java.lang.String[] infos)
This method creates a string that - if saved to a file - can be loaded by the LiveGraph-plotter in order to plot this distribution sample.

The LiveGraph plotter framework is an open-source project written in Java available from http://www.live-graph.org.

The string returned by this method encodes all observations of this sample. For a straight-forward method of saving the string to a file, see FileTools.writeToFile(String, String, boolean).

Parameters:
infos - A list of info strings to add as data file annotation.
Returns:
An encoding of this distribution to be used with the LiveGraph plotter.
