Column Statistics content model and Pairwise Statistics content model
Last updated: Jan 18, 2024

The Column Statistics content model provides access to statistics that can be computed for each field (univariate statistics). The Pairwise Statistics content model provides access to statistics that can be computed between pairs of fields or values in a field.

The following statistics may be available:

  • Count
  • UniqueCount
  • ValidCount
  • Mean
  • Sum
  • Min
  • Max
  • Range
  • Variance
  • StandardDeviation
  • StandardErrorOfMean
  • Skewness
  • SkewnessStandardError
  • Kurtosis
  • KurtosisStandardError
  • Median
  • Mode
  • Pearson
  • Covariance
  • TTest
  • FTest

Some statistics apply only to single columns, while others apply only to pairs of fields.

The following nodes produce these statistics:

  • The Statistics node produces column statistics and can produce pairwise statistics when correlation fields are specified.
  • The Data Audit node produces column statistics and can produce pairwise statistics when an overlay field is specified.
  • The Means node produces pairwise statistics when comparing pairs of fields or comparing a field's values with other field summaries.

Which content models and statistics are available depends on both the particular node's capabilities and the settings within the node.
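
Because availability varies by node and settings, a script may want to check which content models an output actually carries before using them. The following is a minimal sketch, not part of the Modeler API: `available_content_models` is a hypothetical helper. It assumes only that the output object has the `getContentModel(containerId)` call used in the example script, and that the call returns `None` when the container is absent (the example script tests for `None` in the same way).

```python
def available_content_models(output,
                             container_ids=("columnStatistics", "pairwiseStatistics")):
    """Return {container ID: content model} for the containers that exist.

    Hypothetical helper: `output` is assumed to be a node output exposing
    getContentModel(containerId), as in the example script.
    """
    found = {}
    for container_id in container_ids:
        cm = output.getContentModel(container_id)
        # Assumption: a missing container is reported as None.
        if cm is not None:
            found[container_id] = cm
    return found
```

In a Modeler script this could be called as `available_content_models(results[0])` after running a node, using the container IDs listed in Table 3.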

Table 1. Methods for the Column Statistics content model
Method | Return type | Description
getAvailableStatistics() | List<StatisticType> | Returns the available statistics in this model. Not all fields necessarily have values for all statistics.
getAvailableColumns() | List<String> | Returns the column names for which statistics were computed.
getStatistic(String column, StatisticType statistic) | Number | Returns the value of the specified statistic for the column.
reset() | void | Flushes any internal storage associated with this content model.
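
Since not every column necessarily has a value for every statistic, it can help to collect the results defensively. The following is a minimal sketch, not part of the Modeler API: `column_stats_to_dict` is a hypothetical helper that uses only the methods in Table 1. It assumes that a statistic with no value for a column is returned as `None`; the documentation above does not state how missing values are signalled.

```python
def column_stats_to_dict(cm):
    """Collect column statistics into {column: {statistic name: value}}.

    Hypothetical helper: `cm` is assumed to be a Column Statistics
    content model exposing the Table 1 methods.
    """
    table = {}
    stats = list(cm.getAvailableStatistics())
    for column in cm.getAvailableColumns():
        row = {}
        for stat in stats:
            value = cm.getStatistic(column, stat)
            # Assumption: a statistic without a value comes back as None.
            if value is not None:
                row[str(stat)] = value
        table[column] = row
    return table
```

With the content model from the example script, `column_stats_to_dict(statscm)` would give one dictionary per examined column, keyed by statistic name.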
Table 2. Methods for the Pairwise Statistics content model
Method | Return type | Description
getAvailableStatistics() | List<StatisticType> | Returns the available statistics in this model. Not all fields necessarily have values for all statistics.
getAvailablePrimaryColumns() | List<String> | Returns the primary column names for which statistics were computed.
getAvailablePrimaryValues() | List<Object> | Returns the values of the primary column for which statistics were computed.
getAvailableSecondaryColumns() | List<String> | Returns the secondary column names for which statistics were computed.
getStatistic(String primaryColumn, String secondaryColumn, StatisticType statistic) | Number | Returns the value of the specified statistic for the pair of columns.
getStatistic(String primaryColumn, Object primaryValue, String secondaryColumn, StatisticType statistic) | Number | Returns the value of the specified statistic for the primary column value and the secondary column.
reset() | void | Flushes any internal storage associated with this content model.
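
To read one statistic for every primary/secondary column pair, the three-argument `getStatistic` can be called in a nested loop. The following is a minimal sketch under stated assumptions, not part of the Modeler API: `pairwise_matrix` is a hypothetical helper, and it assumes that a pair with no value is returned as `None`.

```python
def pairwise_matrix(cm, statistic):
    """Return {(primary column, secondary column): value} for one statistic.

    Hypothetical helper: `cm` is assumed to be a Pairwise Statistics
    content model exposing the Table 2 methods.
    """
    matrix = {}
    secondary_columns = list(cm.getAvailableSecondaryColumns())
    for primary in cm.getAvailablePrimaryColumns():
        for secondary in secondary_columns:
            value = cm.getStatistic(primary, secondary, statistic)
            # Assumption: a pair without a value comes back as None.
            if value is not None:
                matrix[(primary, secondary)] = value
    return matrix
```

In a Modeler script this might be called as `pairwise_matrix(statscm, StatisticType.Pearson)`. The four-argument `getStatistic` overload could be looped over `getAvailablePrimaryValues()` in the same way.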

Nodes and outputs

The following table lists the nodes whose outputs include these content models.

Table 3. Nodes and outputs
Node name | Output name | Container ID | Notes
"means" (Means node) | "means" | "columnStatistics" |
"means" (Means node) | "means" | "pairwiseStatistics" |
"dataaudit" (Data Audit node) | "means" | "columnStatistics" |
"statistics" (Statistics node) | "statistics" | "columnStatistics" | Only generated when specific fields are examined.
"statistics" (Statistics node) | "statistics" | "pairwiseStatistics" | Only generated when fields are correlated.

Example script

from modeler.api import StatisticType
stream = modeler.script.stream()

# Set up the input data
varfile = stream.createAt("variablefile", "File", 96, 96)
varfile.setPropertyValue("full_filename", "$CLEO/DEMOS/DRUG1n")

# Now create the statistics node. This can produce both
# column statistics and pairwise statistics
statisticsnode = stream.createAt("statistics", "Stats", 192, 96)
statisticsnode.setPropertyValue("examine", ["Age", "Na", "K"])
statisticsnode.setPropertyValue("correlate", ["Age", "Na", "K"])
stream.link(varfile, statisticsnode)

# Run the node and take its first output
results = []
statisticsnode.run(results)
statsoutput = results[0]

# Column statistics: print the first available statistic for the first column
columncm = statsoutput.getContentModel("columnStatistics")
if columncm is not None:
    cols = columncm.getAvailableColumns()
    stats = columncm.getAvailableStatistics()
    print "Column stats:", cols[0], str(stats[0]), " = ", columncm.getStatistic(cols[0], stats[0])

# Pairwise statistics: print the Pearson correlation for the first column pair
pairwisecm = statsoutput.getContentModel("pairwiseStatistics")
if pairwisecm is not None:
    pcols = pairwisecm.getAvailablePrimaryColumns()
    scols = pairwisecm.getAvailableSecondaryColumns()
    corr = pairwisecm.getStatistic(pcols[0], scols[0], StatisticType.Pearson)
    print "Pairwise stats:", pcols[0], scols[0], " Pearson = ", corr