Using Pilot to analyze existing data

You can analyze existing experiment data with Pilot's analyze command.

Basic Command

You can pass your data to Pilot as a file or pipe it in. Say you have existing test data in the file unit_test_analyze_input.csv (a sample is in the cli/test directory of the Pilot source code), with one sample per line. You can run:

pilot analyze unit_test_analyze_input.csv

or pipe the data in through standard input:

cat | pilot analyze -

(the trailing - argument tells Pilot to read data from stdin)
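
The file-or-stdin convention can be sketched in Python as follows. This is a minimal sketch that assumes plain numeric input, one sample per line, with blank lines skipped; parse_samples and read_samples are hypothetical helpers for illustration, not part of Pilot.

```python
import io
import sys
from typing import List, TextIO


def parse_samples(stream: TextIO) -> List[float]:
    """Parse one numeric sample per line, skipping blank lines (assumption)."""
    # float() ignores the trailing newline and surrounding whitespace.
    return [float(line) for line in stream if line.strip()]


def read_samples(path: str) -> List[float]:
    """Read samples from `path`, or from stdin when `path` is "-"."""
    if path == "-":
        return parse_samples(sys.stdin)
    with open(path) as f:
        return parse_samples(f)


print(parse_samples(io.StringIO("1.5\n2.0\n\n1.8\n")))  # [1.5, 2.0, 1.8]
```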

The output looks like this:

[2016-08-16 10:34:45] <info> Preset mode activated: quick
[2016-08-16 10:34:45] <info> Setting the limit of autocorrelation coefficient to 0.8
sample_size 48
mean 1.756458
optimal_subsession_size 1
CI 0.157416
variance 0.073474
subsession_autocorrelation_coefficient 0.636556
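
As a rough cross-check of how these summary fields relate, the sketch below assumes (this is not stated here) that variance is the sample variance and CI is the full width of a two-sided 95% Student's t confidence interval for the mean. Under that assumption, the numbers in the example output are consistent with each other. The t critical value is hardcoded from a t-table.

```python
import math

# Values copied from the example output above.
sample_size = 48
variance = 0.073474

# Two-sided 95% Student's t critical value for 47 degrees of freedom
# (sample_size - 1), taken from a t-table.
t_crit = 2.0117

std_error = math.sqrt(variance / sample_size)
ci_width = 2 * t_crit * std_error  # full width: the interval is mean ± ci_width / 2
print(f"{ci_width:.4f}")  # 0.1574
```

If the assumption holds, the 95% interval in this example would be roughly 1.756458 ± 0.0787.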

To get help on using the analyze command, run:

pilot analyze --help

Handling Comma-Separated Values (CSV) Files

Pilot can read data from a comma-separated values (CSV) file. A CSV file stores tabular data as plain text and can be produced by many programs, such as LibreOffice and Excel. The following options help Pilot parse a CSV file:

  • Use -f n to select the field (column) to extract data from. Fields are numbered from 0, so -f n analyzes the (n+1)th field. Only one field can be analyzed per run; to analyze several fields, run Pilot once per field with a different -f n.
  • Use -i n to ignore the first n lines of the input file. This is useful for skipping a CSV header line.
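
The -f n and -i n semantics described above can be mimicked with Python's csv module: skip the first n lines, then take the 0-based column from each remaining row. This is a sketch of the described behavior, not Pilot's parser. extract_field is a hypothetical helper, and the sample CSV content is made up for illustration.

```python
import csv
import io
from typing import List, TextIO


def extract_field(stream: TextIO, field: int, ignore_lines: int = 0) -> List[float]:
    """Skip `ignore_lines` lines (like -i), then read 0-based column `field` (like -f)."""
    # Consume header lines before handing the stream to the CSV parser.
    for _ in range(ignore_lines):
        next(stream, None)
    values = []
    for row in csv.reader(stream):
        if not row:  # skip blank lines
            continue
        values.append(float(row[field]))
    return values


# Made-up CSV with a one-line header.
data = io.StringIO("run,latency_ms,throughput\n1,1.72,580\n2,1.81,552\n3,1.69,591\n")
# Similar in spirit to `-f 1 -i 1`: skip the header and take the 2nd column.
print(extract_field(data, field=1, ignore_lines=1))  # [1.72, 1.81, 1.69]
```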

Autocorrelation Analysis

Pilot performs autocorrelation analysis to check whether the input data are i.i.d. (independent and identically distributed) and uses subsession analysis to mitigate high autocorrelation (see for details). The limit for the autocorrelation coefficient (AC) can be set with --ac n or through a preset. The three presets use the following AC limits:

  • Quick mode (default): AC limit of 0.8
  • Normal mode: AC limit of 0.2
  • Strict mode: AC limit of 0.1
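
The general idea of subsession analysis (similar to the batch means method) can be sketched as follows. The sketch assumes that the coefficient being limited is the lag-1 autocorrelation of the subsession means, and that the smallest subsession size meeting the limit is chosen. This is consistent with the example output above, which reports optimal_subsession_size 1 with a coefficient of 0.636556, below the quick-mode limit of 0.8. This is not Pilot's implementation. The helper names and the min_subsessions safeguard are hypothetical, and the AR(1) input is synthetic.

```python
import random
from typing import List, Tuple


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def lag1_autocorrelation(xs: List[float]) -> float:
    """Sample lag-1 autocorrelation coefficient of a sequence."""
    m = mean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    if denom == 0:  # constant sequence: treat as uncorrelated
        return 0.0
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(len(xs) - 1))
    return num / denom


def subsession_means(xs: List[float], size: int) -> List[float]:
    """Average non-overlapping runs of `size` consecutive samples,
    dropping an incomplete run at the end."""
    count = len(xs) // size
    return [mean(xs[i * size:(i + 1) * size]) for i in range(count)]


def find_subsession_size(xs: List[float], ac_limit: float,
                         min_subsessions: int = 10) -> Tuple[int, float]:
    """Smallest subsession size whose subsession means satisfy
    |lag-1 autocorrelation| <= ac_limit (hypothetical helper)."""
    size = 1
    while len(xs) // size >= min_subsessions:
        ac = lag1_autocorrelation(subsession_means(xs, size))
        if abs(ac) <= ac_limit:
            return size, ac
        size += 1
    raise ValueError("not enough samples to reach the AC limit")


# Synthetic, strongly autocorrelated AR(1) data (illustration only).
random.seed(1)
data, x = [], 0.0
for _ in range(5000):
    x = 0.9 * x + random.gauss(0.0, 1.0)
    data.append(x)

for limit in (0.8, 0.2, 0.1):  # the quick, normal, and strict limits
    size, ac = find_subsession_size(data, limit)
    print(f"AC limit {limit}: subsession size {size}, AC {ac:.3f}")
```

Because any size that meets a stricter limit also meets a looser one, this search never picks a smaller subsession for a smaller AC limit.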

If a precise result is critical, use a smaller AC limit, such as 0.1. The trade-off is that, for the same number of input samples, the confidence interval becomes wider.
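
The trade-off can be illustrated by computing a confidence interval from subsession means at several fixed subsession sizes on a synthetic, positively autocorrelated series. The sketch assumes that a smaller AC limit leads to larger subsessions. Larger subsessions absorb more of the correlation, so the estimated spread of the mean grows and the interval widens. With size 1 the correlation is ignored, and the interval is misleadingly narrow. For simplicity, the sketch uses a normal approximation (z ≈ 1.96) rather than a Student's t value. It is not Pilot's CI computation, and ci_width is a hypothetical helper.

```python
import random
from statistics import NormalDist, stdev
from typing import List


def subsession_means(xs: List[float], size: int) -> List[float]:
    """Average non-overlapping runs of `size` consecutive samples."""
    count = len(xs) // size
    return [sum(xs[i * size:(i + 1) * size]) / size for i in range(count)]


def ci_width(xs: List[float], size: int, confidence: float = 0.95) -> float:
    """Full width of a normal-approximation CI for the mean,
    computed from the subsession means (a simplification)."""
    means = subsession_means(xs, size)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * stdev(means) / len(means) ** 0.5


# Synthetic AR(1) data: positively autocorrelated samples (illustration only).
random.seed(1)
data, x = [], 0.0
for _ in range(5000):
    x = 0.9 * x + random.gauss(0.0, 1.0)
    data.append(x)

for size in (1, 5, 20, 50):
    print(f"subsession size {size:2d}: CI width {ci_width(data, size):.3f}")
```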