prefuse.util
Class DataLib

java.lang.Object
  extended by prefuse.util.DataLib

public class DataLib
extends java.lang.Object

Functions for processing an iterator of tuples, including the creation of arrays of particular tuple data values and summary statistics (min, max, median, mean, standard deviation).

Author:
Jeffrey Heer

Constructor Summary
DataLib()
           
 
Method Summary
static int count(java.util.Iterator tuples, java.lang.String field)
          Get the number of values in a data column.
static double deviation(java.util.Iterator tuples, java.lang.String field)
          Get the standard deviation of a data field across the given tuples.
static double deviation(java.util.Iterator tuples, java.lang.String field, double mean)
          Get the standard deviation of a data field across the given tuples, using a precomputed mean.
static java.lang.Class inferType(TupleSet tuples, java.lang.String field)
          Infer the data field type across all tuples in a TupleSet.
static Tuple max(java.util.Iterator tuples, java.lang.String field)
          Get the Tuple with the maximum data field value.
static Tuple max(java.util.Iterator tuples, java.lang.String field, java.util.Comparator cmp)
          Get the Tuple with the maximum data field value.
static Tuple max(TupleSet tuples, java.lang.String field)
          Get the Tuple with the maximum data field value.
static Tuple max(TupleSet tuples, java.lang.String field, java.util.Comparator cmp)
          Get the Tuple with the maximum data field value.
static double mean(java.util.Iterator tuples, java.lang.String field)
          Get the mean of a data field across the given tuples.
static Tuple median(java.util.Iterator tuples, java.lang.String field)
          Get the Tuple with the median data field value.
static Tuple median(java.util.Iterator tuples, java.lang.String field, java.util.Comparator cmp)
          Get the Tuple with the median data field value.
static Tuple median(TupleSet tuples, java.lang.String field)
          Get the Tuple with the median data field value.
static Tuple median(TupleSet tuples, java.lang.String field, java.util.Comparator cmp)
          Get the Tuple with the median data field value.
static Tuple min(java.util.Iterator tuples, java.lang.String field)
          Get the Tuple with the minimum data field value.
static Tuple min(java.util.Iterator tuples, java.lang.String field, java.util.Comparator cmp)
          Get the Tuple with the minimum data field value.
static Tuple min(TupleSet tuples, java.lang.String field)
          Get the Tuple with the minimum data field value.
static Tuple min(TupleSet tuples, java.lang.String field, java.util.Comparator cmp)
          Get the Tuple with the minimum data field value.
static java.lang.Object[] ordinalArray(java.util.Iterator tuples, java.lang.String field)
          Get a sorted array containing all column values for a given tuple iterator and field.
static java.lang.Object[] ordinalArray(java.util.Iterator tuples, java.lang.String field, java.util.Comparator cmp)
          Get a sorted array containing all column values for a given tuple iterator and field, ordered by the given comparator.
static java.lang.Object[] ordinalArray(TupleSet tuples, java.lang.String field)
          Get a sorted array containing all column values for a given TupleSet and field.
static java.lang.Object[] ordinalArray(TupleSet tuples, java.lang.String field, java.util.Comparator cmp)
          Get a sorted array containing all column values for a given TupleSet and field, ordered by the given comparator.
static java.util.Map ordinalMap(java.util.Iterator tuples, java.lang.String field)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array.
static java.util.Map ordinalMap(java.util.Iterator tuples, java.lang.String field, java.util.Comparator cmp)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array, ordered by the given comparator.
static java.util.Map ordinalMap(TupleSet tuples, java.lang.String field)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array.
static java.util.Map ordinalMap(TupleSet tuples, java.lang.String field, java.util.Comparator cmp)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array, ordered by the given comparator.
static double sum(java.util.Iterator tuples, java.lang.String field)
          Get the sum of a data field across the given tuples.
static java.lang.Object[] toArray(java.util.Iterator tuples, java.lang.String field)
          Get an array containing all data values for a given tuple iterator and field.
static double[] toDoubleArray(java.util.Iterator tuples, java.lang.String field)
          Get an array of doubles containing all column values for a given tuple iterator and field.
static int uniqueCount(java.util.Iterator tuples, java.lang.String field)
          Get the number of distinct values in a data column.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataLib

public DataLib()
Method Detail

toArray

public static java.lang.Object[] toArray(java.util.Iterator tuples,
                                         java.lang.String field)
Get an array containing all data values for a given tuple iterator and field.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
an array containing the data values

toDoubleArray

public static double[] toDoubleArray(java.util.Iterator tuples,
                                     java.lang.String field)
Get an array of doubles containing all column values for a given tuple iterator and field. The Table.canGetDouble(String) method must return true for the given column name; otherwise an exception is thrown.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
an array of doubles containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(java.util.Iterator tuples,
                                              java.lang.String field)
Get a sorted array containing all column values for a given tuple iterator and field.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
a sorted array containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(java.util.Iterator tuples,
                                              java.lang.String field,
                                              java.util.Comparator cmp)
Get a sorted array containing all column values for a given tuple iterator and field, ordered by the given comparator.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a sorted array containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(TupleSet tuples,
                                              java.lang.String field)
Get a sorted array containing all column values for a given TupleSet and field.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
a sorted array containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(TupleSet tuples,
                                              java.lang.String field,
                                              java.util.Comparator cmp)
Get a sorted array containing all column values for a given TupleSet and field, ordered by the given comparator.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a sorted array containing the column values
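
As a rough illustration of what the ordinalArray overloads above return, the following plain-Java sketch collects one field from a sequence of rows and sorts the values with a comparator. It is not the prefuse implementation: Map<String, Object> is a hypothetical stand-in for Tuple, the class and helper names are invented for this example, and dropping duplicate values before sorting is an assumption that this page does not state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in code: a Map<String, Object> plays the role of a Tuple.
final class OrdinalArraySketch {

    // Gather the values of one field, drop duplicates (an assumption), then sort.
    static Object[] ordinalArray(Iterator<Map<String, Object>> tuples,
                                 String field, Comparator<Object> cmp) {
        Set<Object> distinct = new HashSet<>();
        while (tuples.hasNext()) {
            distinct.add(tuples.next().get(field));
        }
        Object[] values = distinct.toArray();
        Arrays.sort(values, cmp);
        return values;
    }

    // Helper that builds a one-field row for the demo.
    static Map<String, Object> row(String field, Object value) {
        return Collections.<String, Object>singletonMap(field, value);
    }

    // Three rows with two distinct names, sorted by their text.
    static Object[] demo() {
        List<Map<String, Object>> rows = Arrays.asList(
            row("name", "pear"), row("name", "apple"), row("name", "pear"));
        Comparator<Object> byText = Comparator.comparing(Object::toString);
        return ordinalArray(rows.iterator(), "name", byText);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [apple, pear]
    }
}
```

The comparator decides the ordinal order, so a caller who wants, say, case-insensitive ordering would pass a different comparator rather than post-process the array.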

ordinalMap

public static java.util.Map ordinalMap(java.util.Iterator tuples,
                                       java.lang.String field)
Get a map from column values (as Object instances) to their ordinal index in a sorted array.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
a map from column values to their positions in the sorted order of values

ordinalMap

public static java.util.Map ordinalMap(java.util.Iterator tuples,
                                       java.lang.String field,
                                       java.util.Comparator cmp)
Get a map from column values (as Object instances) to their ordinal index in a sorted array, ordered by the given comparator.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a map from column values to their positions in the sorted order of values

ordinalMap

public static java.util.Map ordinalMap(TupleSet tuples,
                                       java.lang.String field)
Get a map from column values (as Object instances) to their ordinal index in a sorted array.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
a map from column values to their positions in the sorted order of values

ordinalMap

public static java.util.Map ordinalMap(TupleSet tuples,
                                       java.lang.String field,
                                       java.util.Comparator cmp)
Get a map from column values (as Object instances) to their ordinal index in a sorted array, ordered by the given comparator.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a map from column values to their positions in the sorted order of values
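
To make the "ordinal index" idea in the ordinalMap overloads above concrete, here is a minimal plain-Java sketch that maps each distinct value to its position in sorted order. It is an illustration only, not the prefuse code: it works on already-extracted values rather than on Tuple instances, and the class name and generic signature are invented for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: values stand in for the contents of one data field.
final class OrdinalMapSketch {

    // Map each distinct value to its zero-based index in sorted order.
    static <T> Map<T, Integer> ordinalMap(Iterator<T> values,
                                          Comparator<? super T> cmp) {
        // A TreeSet sorts with cmp and keeps one entry per group of
        // values that cmp treats as equal.
        TreeSet<T> sorted = new TreeSet<>(cmp);
        while (values.hasNext()) {
            sorted.add(values.next());
        }
        Map<T, Integer> ordinals = new HashMap<>();
        int index = 0;
        for (T value : sorted) {
            ordinals.put(value, index++);
        }
        return ordinals;
    }

    // Four values, three distinct, in natural String order.
    static Map<String, Integer> demo() {
        return ordinalMap(
            Arrays.asList("medium", "low", "high", "low").iterator(),
            Comparator.<String>naturalOrder());
    }

    public static void main(String[] args) {
        Map<String, Integer> ordinals = demo();
        System.out.println(ordinals.get("high"));   // prints 0
        System.out.println(ordinals.get("low"));    // prints 1
        System.out.println(ordinals.get("medium")); // prints 2
    }
}
```

A map like this is convenient when a categorical value has to be turned into a stable integer position, for example to place it along an axis.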

count

public static int count(java.util.Iterator tuples,
                        java.lang.String field)
Get the number of values in a data column. Duplicates will be counted.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the number of values

uniqueCount

public static int uniqueCount(java.util.Iterator tuples,
                              java.lang.String field)
Get the number of distinct values in a data column.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the number of distinct values
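
The difference between count and uniqueCount above can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the prefuse implementation: it counts already-extracted values instead of reading a field from Tuple instances, and the class name is invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: values stand in for the contents of one data field.
final class CountSketch {

    // Count every value, duplicates included.
    static int count(Iterator<?> values) {
        int n = 0;
        while (values.hasNext()) {
            values.next();
            n++;
        }
        return n;
    }

    // Count distinct values by collecting them into a set.
    static int uniqueCount(Iterator<?> values) {
        Set<Object> seen = new HashSet<>();
        while (values.hasNext()) {
            seen.add(values.next());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "blue", "red", "green");
        // Each call needs a fresh iterator, since an Iterator is single-pass.
        System.out.println(count(colors.iterator()));       // prints 4
        System.out.println(uniqueCount(colors.iterator())); // prints 3
    }
}
```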

min

public static Tuple min(java.util.Iterator tuples,
                        java.lang.String field)
Get the Tuple with the minimum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the Tuple with the minimum data field value

min

public static Tuple min(java.util.Iterator tuples,
                        java.lang.String field,
                        java.util.Comparator cmp)
Get the Tuple with the minimum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the minimum data field value

min

public static Tuple min(TupleSet tuples,
                        java.lang.String field,
                        java.util.Comparator cmp)
Get the Tuple with the minimum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the minimum data field value

min

public static Tuple min(TupleSet tuples,
                        java.lang.String field)
Get the Tuple with the minimum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
the Tuple with the minimum data field value

max

public static Tuple max(java.util.Iterator tuples,
                        java.lang.String field)
Get the Tuple with the maximum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the Tuple with the maximum data field value

max

public static Tuple max(java.util.Iterator tuples,
                        java.lang.String field,
                        java.util.Comparator cmp)
Get the Tuple with the maximum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the maximum data field value

max

public static Tuple max(TupleSet tuples,
                        java.lang.String field,
                        java.util.Comparator cmp)
Get the Tuple with the maximum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the maximum data field value

max

public static Tuple max(TupleSet tuples,
                        java.lang.String field)
Get the Tuple with the maximum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
the Tuple with the maximum data field value
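
Note that the min and max methods above return the whole Tuple holding the extreme value, not the value itself. The plain-Java sketch below illustrates that idea with a single pass over the rows. It is not the prefuse implementation: Map<String, Object> is a hypothetical stand-in for Tuple, all names are invented for this example, and returning null for an empty iterator is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in code: a Map<String, Object> plays the role of a Tuple.
final class MinMaxSketch {

    // Compares field values that are known to be Integers.
    static final Comparator<Object> BY_NUMBER =
        Comparator.comparingInt(o -> (Integer) o);

    // Return the row whose field value is smallest under cmp.
    static Map<String, Object> min(Iterator<Map<String, Object>> tuples,
                                   String field, Comparator<Object> cmp) {
        Map<String, Object> best = null; // stays null if there are no rows
        while (tuples.hasNext()) {
            Map<String, Object> t = tuples.next();
            if (best == null
                    || cmp.compare(t.get(field), best.get(field)) < 0) {
                best = t;
            }
        }
        return best;
    }

    // The maximum is the minimum under the reversed comparator.
    static Map<String, Object> max(Iterator<Map<String, Object>> tuples,
                                   String field, Comparator<Object> cmp) {
        return min(tuples, field, cmp.reversed());
    }

    static Map<String, Object> row(String city, int temp) {
        Map<String, Object> r = new HashMap<>();
        r.put("city", city);
        r.put("temp", temp);
        return r;
    }

    static List<Map<String, Object>> rows() {
        return Arrays.asList(row("Cairo", 35), row("Oslo", 4), row("Lima", 19));
    }

    public static void main(String[] args) {
        // The result is the row, so other fields of it can be read.
        System.out.println(min(rows().iterator(), "temp", BY_NUMBER).get("city")); // prints Oslo
        System.out.println(max(rows().iterator(), "temp", BY_NUMBER).get("city")); // prints Cairo
    }
}
```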

median

public static Tuple median(java.util.Iterator tuples,
                           java.lang.String field)
Get the Tuple with the median data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the Tuple with the median data field value

median

public static Tuple median(java.util.Iterator tuples,
                           java.lang.String field,
                           java.util.Comparator cmp)
Get the Tuple with the median data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the median data field value

median

public static Tuple median(TupleSet tuples,
                           java.lang.String field,
                           java.util.Comparator cmp)
Get the Tuple with the median data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the median data field value

median

public static Tuple median(TupleSet tuples,
                           java.lang.String field)
Get the Tuple with the median data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
the Tuple with the median data field value
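
One way to picture the median methods above is to sort the rows by the field value and take the middle one. The plain-Java sketch below does exactly that. It is only an illustration: Map<String, Object> is a hypothetical stand-in for Tuple, all names are invented for this example, and the choice of index n/2 (the upper of the two middle rows when the count is even) is an assumption, since this page does not say how even counts are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in code: a Map<String, Object> plays the role of a Tuple.
final class MedianSketch {

    // Compares field values that are known to be Integers.
    static final Comparator<Object> BY_NUMBER =
        Comparator.comparingInt(o -> (Integer) o);

    // Sort the rows by one field and return the middle row.
    static Map<String, Object> median(Iterator<Map<String, Object>> tuples,
                                      String field, Comparator<Object> cmp) {
        List<Map<String, Object>> sorted = new ArrayList<>();
        while (tuples.hasNext()) {
            sorted.add(tuples.next());
        }
        if (sorted.isEmpty()) {
            return null; // assumption: no rows means no median row
        }
        sorted.sort((a, b) -> cmp.compare(a.get(field), b.get(field)));
        return sorted.get(sorted.size() / 2); // assumption for even counts
    }

    static Map<String, Object> row(String city, int temp) {
        Map<String, Object> r = new HashMap<>();
        r.put("city", city);
        r.put("temp", temp);
        return r;
    }

    static List<Map<String, Object>> rows() {
        return Arrays.asList(row("Oslo", 3), row("Cairo", 9), row("Lima", 5),
                             row("Nuuk", 1), row("Rome", 7));
    }

    public static void main(String[] args) {
        // Sorted temperatures are 1, 3, 5, 7, 9, so the middle row is Lima.
        System.out.println(median(rows().iterator(), "temp", BY_NUMBER).get("city")); // prints Lima
    }
}
```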

mean

public static double mean(java.util.Iterator tuples,
                          java.lang.String field)
Get the mean of a data field across the given tuples. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the mean value, or NaN if a non-numeric data type is encountered

deviation

public static double deviation(java.util.Iterator tuples,
                               java.lang.String field)
Get the standard deviation of a data field across the given tuples. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the standard deviation value, or NaN if a non-numeric data type is encountered

deviation

public static double deviation(java.util.Iterator tuples,
                               java.lang.String field,
                               double mean)
Get the standard deviation of a data field across the given tuples, using a precomputed mean. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
mean - the mean of the column, used to speed up accurate deviation calculation
Returns:
the standard deviation value, or NaN if a non-numeric data type is encountered

sum

public static double sum(java.util.Iterator tuples,
                         java.lang.String field)
Get the sum of a data field across the given tuples. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the sum, or NaN if a non-numeric data type is encountered
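
The NaN convention shared by mean, deviation and sum above can be sketched in plain Java as shown below. This is an illustration under stated assumptions, not the prefuse implementation: it works on already-extracted values, the class name is invented, the deviation divides by n (the population form) because this page does not say whether n or n - 1 is used, and returning NaN for an empty input is likewise an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: values stand in for the contents of one data field.
final class StatsSketch {

    // Sum of the values, or NaN as soon as a non-numeric value is seen.
    static double sum(Iterator<?> values) {
        double total = 0;
        while (values.hasNext()) {
            Object v = values.next();
            if (!(v instanceof Number)) {
                return Double.NaN;
            }
            total += ((Number) v).doubleValue();
        }
        return total;
    }

    // Mean of the values, or NaN for non-numeric or empty input.
    static double mean(Iterator<?> values) {
        double total = 0;
        int n = 0;
        while (values.hasNext()) {
            Object v = values.next();
            if (!(v instanceof Number)) {
                return Double.NaN;
            }
            total += ((Number) v).doubleValue();
            n++;
        }
        return n == 0 ? Double.NaN : total / n;
    }

    // Standard deviation around a precomputed mean (divides by n: assumption).
    static double deviation(Iterator<?> values, double mean) {
        double sumSq = 0;
        int n = 0;
        while (values.hasNext()) {
            Object v = values.next();
            if (!(v instanceof Number)) {
                return Double.NaN;
            }
            double d = ((Number) v).doubleValue() - mean;
            sumSq += d * d;
            n++;
        }
        return Math.sqrt(sumSq / n); // 0.0 / 0 yields NaN for empty input
    }

    public static void main(String[] args) {
        List<Object> data = Arrays.<Object>asList(2, 4, 4, 4, 5, 5, 7, 9);
        double m = mean(data.iterator());
        System.out.println(sum(data.iterator()));          // prints 40.0
        System.out.println(m);                             // prints 5.0
        System.out.println(deviation(data.iterator(), m)); // prints 2.0
        System.out.println(sum(Arrays.<Object>asList(1, "x").iterator())); // prints NaN
    }
}
```

Because an Iterator can be traversed only once, computing the mean and then the deviation takes two separate iterators over the same data; passing an already-computed mean, as the three-argument deviation overload allows, avoids recomputing it.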

inferType

public static java.lang.Class inferType(TupleSet tuples,
                                        java.lang.String field)
Infer the data field type across all tuples in a TupleSet.

Parameters:
tuples - the TupleSet to analyze
field - the data field to type check
Returns:
the inferred data type
Throws:
java.lang.IllegalArgumentException - if incompatible types are used


Copyright © 2007 Regents of the University of California