Probo2 - Evaluation tool for abstract argumentation solvers

A Python tool that bundles all functionality needed to collect, validate, analyze, and present data in the context of benchmarking argumentation solvers.

Changelog

v1.1 - 22.03.2023

add-solver command:
- The interface test checks all supported tasks and formats of the solver
- ICCMA23 instance format (.i23) added to the solver interface check
- Added progress bars to the interface check
- The name is derived from the path if not specified
kwt-gen command:
- Added instance generation via a config file (see /generators/generator_configs/kwt_example.yaml)
- Added conversion of generated instances to the ICCMA23 (.i23) format
- Added option to generate random query arguments for DS and DC tasks
- Added option to add generated instances as a benchmark to the database
quick command:
- Added new command to get a quick result for a single instance
board command:
- Added new command to create a dashboard for result visualization
Added calculation of node and edge homophily

Setup

Clone repository
Navigate to the project folder and create a virtual environment

python3 -m venv probo2_env

Activate the environment

source probo2_env/bin/activate

Install

pip install -e .

Experiments

In this section, we describe the general workflow of probo2 and how experiments are managed. In probo2, an experiment is fully described by a configuration file. It specifies which solvers to run, which benchmarks to use, how the results should be validated, which statistics should be calculated, and how the results are represented graphically. This standardized pipeline allows easy and scalable experiment management and reliable reproduction of results. All configuration options and their default values can be found in the default_config.yaml file. Configuration files are written in YAML. For further information on YAML, see the official specification. Next, we take a closer look at the default_config.yaml file and how probo2 handles configuration files in general.

Configuration Files

As mentioned before, the default_config.yaml file contains all default configurations. User specifications (via the command line or custom configuration files) overwrite the corresponding default configurations. If an option is specified both on the command line and in a (custom) configuration file, the command-line value takes priority. If the user does not specify any other options, the experiment is performed with the default configurations. However, we recommend creating a separate configuration file for each experiment. In general, options are specified as key: value pairs in YAML files. The following options can be specified:

name

Name/tag of the experiment.
task

Comma-separated list of computational tasks to solve.
solver

Comma-separated list of IDs or names of solvers to run.
benchmark

Comma-separated list of ids or names of benchmarks to run solvers on.
timeout

Instance cut-off value in seconds. If the cut-off is exceeded, the instance is marked as timed out. (Default: 600)
repetitions

Number of times each instance is run. (Default: 1)
result_format

File format for results. (Default: csv)
plot

Comma-separated list of kinds of plots to be created.
statistics

Comma-separated list of statistics to be calculated.
printing

Formatting of the command line output. (Default: 'default')
save_to

Directory to store analysis results in. If not specified, the current working directory is used.
copy_raws

Copy raw result files to the save_to destination. (Default: True)
table_export

Comma-separated list of export formats for tables.

For a list of choices for an option, run the following command:

probo2 run --help

Note: The list is incomplete and constantly being expanded as probo2 is still under development.
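The override order described above (defaults, then a custom configuration file, then the command line) can be pictured as a plain dictionary merge. The following is a minimal Python sketch, not probo2's actual implementation: merge_config and DEFAULTS are hypothetical names, only the timeout, repetitions and copy_raws defaults are taken from this README, and treating None as "option not given on the command line" is an assumption.

```python
from typing import Any, Dict, Optional

# Hypothetical subset of default_config.yaml; only these three defaults
# (timeout, repetitions, copy_raws) are stated in this README.
DEFAULTS: Dict[str, Any] = {"timeout": 600, "repetitions": 1, "copy_raws": True}


def merge_config(defaults: Dict[str, Any],
                 file_config: Optional[Dict[str, Any]] = None,
                 cli_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Effective configuration: defaults < configuration file < command line."""
    merged = dict(defaults)               # copy, so the defaults stay untouched
    merged.update(file_config or {})      # custom configuration file overrides defaults
    # Command-line options win; None is treated as "not given" (an assumption).
    merged.update({k: v for k, v in (cli_options or {}).items() if v is not None})
    return merged


if __name__ == "__main__":
    file_config = {"name": "my_experiment", "timeout": 300, "repetitions": 3}
    cli_options = {"timeout": 120, "statistics": None}
    print(merge_config(DEFAULTS, file_config, cli_options))
    # {'timeout': 120, 'repetitions': 3, 'copy_raws': True, 'name': 'my_experiment'}
```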

Example

This is an example of a configuration file:

name: my_experiment
task: ['DS-PR']
solver: [1,2,'my_solver']
benchmark: all
timeout: 600
repetitions: 3
plot: ['cactus']
statistics: 
- mean
- solved
- timeouts
- coverage

This configuration defines an experiment named "my_experiment". The solvers with the IDs 1 and 2 and the solver named "my_solver" are executed on all benchmarks to solve the DS-PR task. Each instance is run 3 times with a cut-off of 600 seconds per instance. The results are then visualized using cactus plots. Since no path is specified via the save_to option, the plots are saved in the folder "my_experiment/plots" in the current working directory. The raw data is also copied to this folder, since the copy_raws option is true by default. Furthermore, the listed statistics are calculated. The statistics option also shows an alternative YAML syntax for lists.

To run the experiment simply execute the following command:

probo2 run --config /path/to/my_config.yaml

If you want to change the configuration on the fly without editing the file, you can do so from the command line. For example, to calculate an additional statistic (the sum of runtimes), add the following option:

probo2 run --config /path/to/my_config.yaml --statistics sum

Another example can be found in the example_config.yaml file.

Commands

Probo2 - Evaluation tool for abstract argumentation solvers
- Changelog
  - v1.1 - 22.03.2023
- Table of Contents
- Setup
- Experiments
  - Configuration Files
  - Example
- Commands
  - add-solver
  - solvers
  - delete-solver
  - add-benchmark
  - benchmarks
  - delete-benchmark
  - run
  - status
  - last
  - experiment-info
  - plot
  - calculate
  - validate
  - significance
  - board

add-solver

Usage: probo2 add-solver [OPTIONS]

Add a solver to the database.

Options:

-n, --name

Name of the solver [required]
-p, --path PATH

Path to solver executable [required].

Relative paths are resolved automatically. The executable can be a compiled binary, a bash script, or a Python file. As long as the executable implements the ICCMA interface, it should work. For more information, see the ICCMA interface specification.
-f, --format [apx|tgf]

Supported formats of the solver.
-t, --tasks

Supported computational problems.
-v, --version

Version of solver [required].

This option is required so that different versions of the same solver can be added to the database.
-g, --guess

Query the solver for its supported file formats and computational problems.
--help

Show this message and exit.

Example

probo2 add-solver --name MyAwesomeSolver --version 1.23 --path ./path/to/MyAwesomeSolver --guess
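The --guess option pulls the supported formats and problems from the solver itself. Assuming the solver follows the ICCMA 2017/2019 interface, where `solver --formats` and `solver --problems` print bracketed, comma-separated lists such as [apx,tgf], a minimal sketch could look like this. The helper names are hypothetical, this is not necessarily how probo2 implements --guess, and newer ICCMA interfaces may differ.

```python
import subprocess
from typing import Dict, List


def parse_bracket_list(output: str) -> List[str]:
    """Parse ICCMA-style list output such as '[apx,tgf]' into ['apx', 'tgf']."""
    text = output.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return [item.strip() for item in text.split(",") if item.strip()]


def guess_capabilities(solver_path: str) -> Dict[str, List[str]]:
    """Ask a solver for its formats and problems (ICCMA'17/'19-style flags assumed)."""
    formats = subprocess.run([solver_path, "--formats"], capture_output=True,
                             text=True, check=True).stdout
    problems = subprocess.run([solver_path, "--problems"], capture_output=True,
                              text=True, check=True).stdout
    return {"formats": parse_bracket_list(formats),
            "tasks": parse_bracket_list(problems)}
```

For a compliant solver, guess_capabilities("./path/to/MyAwesomeSolver") would return the parsed lists under the keys formats and tasks.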

solvers

Prints solvers in database to console.

Options

-v, --verbose
--id

Print a summary of the solver with the specified ID.
--help

Show this message and exit.

delete-solver

Usage: probo2 delete-solver [OPTIONS]

Deletes a solver from the database. Deleting has to be confirmed by the user.

Options:

--id

ID of solver to delete.
--all

Delete all solvers in database.
--help

Show this message and exit.

add-benchmark

Usage: probo2 add-benchmark [OPTIONS]

Adds a benchmark to the database. Before a benchmark is added, probo2 checks whether each instance is present in all specified file formats. Missing instances can be generated after this completeness check (the user has to confirm the generation) or beforehand via the --generate/-g option. It is also possible to generate random argument files for the DC/DS problems with the --random_arguments/-rnd option.

By default, the following attributes are saved for a benchmark: name, path, format, and the extension of the query argument files. Additional attributes can be specified using custom functions. See section "custom function" for further information.

If no benchmark name is provided via the --name option, the name is derived from the benchmark path. Instance formats and the file extension of the query arguments (used for DS and DC tasks) are set automatically if not specified. To do so, the file extensions of all files in the given path are compared with the default file formats/extensions (see src/utils/definitions.DefaultInstanceFormats and src/utils/definitions.DefaultQueryFormats). The formats are set to the intersection of the found formats and the default formats.
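The automatic format detection and the completeness check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not probo2's code: DEFAULT_INSTANCE_FORMATS is a hypothetical stand-in for src/utils/definitions.DefaultInstanceFormats, and an instance is identified by its path without the file extension.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Hypothetical defaults; the real values are defined in
# src/utils/definitions.DefaultInstanceFormats and may differ.
DEFAULT_INSTANCE_FORMATS: Set[str] = {"apx", "tgf", "i23"}


def detect_formats(filenames: Iterable[str]) -> Set[str]:
    """Intersection of the extensions found and the default instance formats."""
    found = {Path(name).suffix.lstrip(".") for name in filenames}
    return found & DEFAULT_INSTANCE_FORMATS


def missing_formats(filenames: Iterable[str], formats: Set[str]) -> Dict[str, Set[str]]:
    """Map each instance (path without extension) to the formats it lacks."""
    available: Dict[str, Set[str]] = defaultdict(set)
    for name in filenames:
        path = Path(name)
        ext = path.suffix.lstrip(".")
        if ext in formats:
            available[str(path.with_suffix(""))].add(ext)
    return {inst: formats - exts for inst, exts in available.items() if formats - exts}


def list_files(directory: str) -> List[str]:
    """All files below directory; subdirectories are searched recursively."""
    return [str(p) for p in Path(directory).rglob("*") if p.is_file()]
```

For a real benchmark directory, one would call missing_formats(files, detect_formats(files)) with files = list_files(path); an empty result means every instance exists in every detected format.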

Options:

-n, --name

Name of benchmark/fileset [required]
-p, --path

Directory of instances [required]. Subdirectories are searched recursively for instances.
-f, --format [apx|tgf]

Supported formats of benchmark/fileset [required]
-ext, --extension_arg_files

File extension of the query argument files for DC/DS problems. (Default: arg)
--no_check

Skip the check whether the benchmark is complete.
-g, --generate [apx|tgf]

Generate instances in specified format
-rnd, --random_arguments

Generate additional argument files with a random argument.
-fun, --function

Custom functions to add additional attributes to the benchmark.
--help

Show this message and exit.

Example

probo2 add-benchmark --name MyTrickyBenchmark --path ./path/to/MyTrickyBenchmark --format tgf --generate apx -rnd
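For the --random_arguments/-rnd option, one plausible approach for APX instances is to collect all arguments declared as arg(x). and pick one uniformly at random as the query argument. The sketch below assumes exactly that; the function names are hypothetical, and how probo2 actually selects the argument and names the resulting .arg file is not specified here.

```python
import random
import re
from typing import List, Optional

# Matches APX argument declarations such as "arg(a1)."
ARG_PATTERN = re.compile(r"^\s*arg\(\s*([^)\s]+)\s*\)\s*\.\s*$")


def apx_arguments(apx_text: str) -> List[str]:
    """Collect argument names declared as arg(x). in an APX instance."""
    return [m.group(1) for line in apx_text.splitlines()
            if (m := ARG_PATTERN.match(line))]


def random_query_argument(apx_text: str, rng: Optional[random.Random] = None) -> str:
    """Pick one declared argument uniformly at random."""
    args = apx_arguments(apx_text)
    if not args:
        raise ValueError("instance declares no arguments")
    return (rng or random).choice(args)
```

Passing a seeded random.Random makes the generated query arguments reproducible across benchmark setups.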

benchmarks

Prints benchmarks in database to console.

Options

-v, --verbose

Prints additional information on benchmarks.
--help

Show this message and exit.

delete-benchmark

Usage: probo2 delete-benchmark [OPTIONS]

Deletes a benchmark from the database. Deleting has to be confirmed by user.

Options:

--id

ID of benchmark to delete.
--all

Delete all benchmarks in the database.
--help

Show this message and exit.

run

Run solver.

Options

-a, --all

Execute all solvers supporting the specified tasks on specified instances.
-slct, --select

Execute only the solvers selected via the --solver option that support the specified tasks.
-s, --solver

Comma-separated list of IDs or names of solvers (in the database) to run.
-b, --benchmark

Comma-separated list of IDs or names of benchmarks (in the database) to run solvers on.
--task

Comma-separated list of tasks to solve.
-t, --timeout

Instance cut-off value in seconds. If the cut-off is exceeded, the instance is marked as timed out.
--dry

Print results to command-line without saving to the database.
--track

Comma-separated list of tracks to solve.
--tag

Tag for individual experiments. This tag is used to identify the experiment. [required]
--notify

Send a notification to the email address provided as soon as the experiments are finished.
-n, --n_times

Number of repetitions per instance. The runtime is the average of the n runs.
-sub, --subset

Run only the first n instances of a benchmark.
--multi

Run the experiment on multiple CPU cores. The number of cores used is the number of physical cores minus one, but at least one. This heuristic avoids locking up the system.
--help

Show this message and exit.

Example

probo2 run --all --benchmark my_benchmark --task EE-CO,EE-PR --tag MyExperiment --timeout 600 --notify my@mail.de
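Conceptually, a single run applies the --timeout cut-off, --n_times averages repeated runs, and --multi sizes the worker pool with the "physical cores minus one" heuristic. The following Python sketch illustrates these three rules under assumptions: the function names are hypothetical, a run that exceeds the cut-off returns None, any timed-out repetition makes the whole instance count as timed out (the README does not say how probo2 handles this case), and the optional psutil package is used for the physical core count with a fallback to os.cpu_count(), which reports logical cores.

```python
import os
import statistics
import subprocess
import sys
import time
from typing import List, Optional


def run_once(cmd: List[str], timeout: float) -> Optional[float]:
    """Run one solver call; return wall-clock seconds or None if the cut-off is hit."""
    start = time.perf_counter()
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired.
        subprocess.run(cmd, capture_output=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return None  # instance is marked as timed out
    return time.perf_counter() - start


def run_repeated(cmd: List[str], timeout: float, n_times: int = 1) -> Optional[float]:
    """Average runtime over n_times runs; None if any run times out (an assumption)."""
    runtimes = []
    for _ in range(n_times):
        runtime = run_once(cmd, timeout)
        if runtime is None:
            return None
        runtimes.append(runtime)
    return statistics.mean(runtimes)


def worker_count() -> int:
    """Physical cores minus one, but at least one."""
    cores = None
    try:
        import psutil  # optional third-party package for physical core counts
        cores = psutil.cpu_count(logical=False)
    except ImportError:
        pass
    if not cores:
        cores = os.cpu_count() or 1  # fallback: logical cores
    return max(cores - 1, 1)


if __name__ == "__main__":
    # Stand-in "solver": the current Python interpreter doing nothing.
    print(run_repeated([sys.executable, "-c", "pass"], timeout=600, n_times=3))
    print(worker_count())
```

A real solver call would pass an ICCMA-style command such as [solver_path, "-p", "DS-PR", "-f", instance, "-fo", "apx", "-a", argument] as cmd.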

status

Usage: probo2 status

Provides an overview of the progress of the currently running experiment.

Options:

--help

Show this message and exit.

last

Usage: probo2 last

Shows basic information about the last finished experiment.

The information includes the experiment tag, benchmark names, solved tasks, executed solvers, and the time when the experiment finished.

Options:

--help

Show this message and exit.

experiment-info

Usage: probo2 experiment-info [OPTIONS]

Prints basic information about the experiment specified with the "--tag" option.

Options:

-t, --tag

Experiment tag. [required]
--help

Show this message and exit.

plot

Usage: probo2 plot [OPTIONS]

Create plots of experiment results.

The --tag option is used to specify which experiment the plots should be created for. With the options --solver, --task and --benchmark you can further restrict this selection. If only a tag is given, a plot is automatically created for each task and benchmark of this experiment. With the option --kind you determine what kind of plot should be created. It is also possible to combine the results of different experiments, benchmarks and tasks with the --combine option.

Options:

-t, --tag

Comma-separated list of experiment tags to be selected.
--task

Comma-separated list of task IDs or symbols to be selected.
--benchmark

Comma-separated list of benchmark IDs or names to be selected.
-s, --solver

Comma-separated list of solver IDs or names to be selected.
-st, --save_to

Directory to store plots in. Filenames will be generated automatically.
--vbs

Create virtual best solver from experiment data.
-b, --backend

Backend to use. Choices: [pdf|pgf|png|ps|svg]
-c, --combine

Combine results on the specified key. Choices: [tag|task_id|benchmark_id]
-k, --kind

Kind of plot to create. Choices: [cactus|count|dist|pie|box|all]
--compress

Compress saved files. Choices: [tar|zip]
-s, --send

Send plots via email.
-l, --last

Plot results for the last finished experiment.
--help

Show this message and exit.

Example

probo2 plot --kind cactus --tag MyExperiment --compress zip --send my@mail.de
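The --vbs option and cactus plots rest on two simple ideas: the virtual best solver takes, for every instance, the fastest runtime of any solver, and a cactus plot shows the sorted runtimes of the solved instances. A minimal Python sketch, assuming runtimes are available as a nested dictionary with None for unsolved runs (a hypothetical layout, not probo2's result schema):

```python
from typing import Dict, List, Optional

# runtimes[solver][instance] = seconds, or None for a timeout/error (hypothetical layout)
Runtimes = Dict[str, Dict[str, Optional[float]]]


def virtual_best_solver(runtimes: Runtimes) -> Dict[str, Optional[float]]:
    """Per instance, the fastest runtime of any solver (None if no solver solved it)."""
    instances = {inst for per_solver in runtimes.values() for inst in per_solver}
    vbs: Dict[str, Optional[float]] = {}
    for inst in sorted(instances):
        solved = [r[inst] for r in runtimes.values() if r.get(inst) is not None]
        vbs[inst] = min(solved) if solved else None
    return vbs


def cactus_points(per_instance: Dict[str, Optional[float]]) -> List[float]:
    """Sorted runtimes of solved instances; the i-th value is plotted at x = i + 1."""
    return sorted(t for t in per_instance.values() if t is not None)
```

The virtual best solver is then treated like any other solver, so cactus_points(virtual_best_solver(data)) yields its curve.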

calculate

Calculate different statistical measures.

Usage: probo2 calculate [OPTIONS]

Options

-t, --tag

Comma-separated list of experiment tags to be selected.
--task

Comma-separated list of task IDs or symbols to be selected.
--benchmark

Comma-separated list of benchmark IDs or names to be selected.
-s, --solver

Comma-separated list of solver IDs or names to be selected.
-p, --par

Penalty multiplier for the PAR score.
-pfmt, --print_format

Table format for printing to console.

Choices: [plain|simple|github|grid|fancy_grid|pipe|orgtbl|jira|presto|pretty|psql|rst|mediawiki|moinmoin|youtrack|html|unsafehtml|latex|latex_raw|latex_booktabs|textile]
-c, --combine

Combine results on key.

Choices: [task_id|benchmark_id|solver_id]
--vbs

Create virtual best solver
-st, --save_to

Directory to store tables.
-e, --export

Export results in specified format.

Choices: [latex|json|csv]
-s, --statistics

Stats to calculate.

Choices: [mean|sum|min|max|median|var|std|coverage|timeouts|solved|errors|all]
-l, --last

Calculate stats for the last finished experiment.
--compress

Compress saved files. Choices: [tar|zip]
-s, --send

Send files via email.
--help

Show this message and exit.

Example

probo2 calculate --tag MyExperiment -s timeouts -s errors -s solved --par 10 --compress zip --send my@mail.de
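With --par 10, unsolved instances are penalized in the spirit of the usual PAR10 score: each one counts as 10 times the cut-off before averaging. The sketch below assumes that definition, and also that coverage is the percentage of solved instances; both functions are hypothetical illustrations, and probo2's exact handling of errors versus timeouts may differ.

```python
from typing import List, Optional


def par_score(runtimes: List[Optional[float]], timeout: float, par: int = 10) -> float:
    """Penalized average runtime: unsolved runs (None) count as par * timeout."""
    if not runtimes:
        raise ValueError("no runtimes given")
    penalized = [t if t is not None else par * timeout for t in runtimes]
    return sum(penalized) / len(penalized)


def coverage(runtimes: List[Optional[float]]) -> float:
    """Percentage of instances solved within the cut-off."""
    if not runtimes:
        raise ValueError("no runtimes given")
    solved = sum(t is not None for t in runtimes)
    return 100.0 * solved / len(runtimes)


if __name__ == "__main__":
    runs = [12.5, None, 3.0, None]  # None = timeout or error
    print(par_score(runs, timeout=600, par=10))
    print(coverage(runs))
```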

validate

Usage: probo2 validate [OPTIONS]

Validate experiment results.

For validation, you can choose between a pairwise comparison of the results and a validation against reference results. Pairwise validation is useful when no reference results are available. For each pair of solvers, the instances solved by both solvers are identified first. The results are then compared, and the percentage of agreement is calculated and printed as a table. The agreement between the solvers can also be shown as a heatmap with the "--plot heatmap" option. Note: Pairwise validation does not support the SE task.
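The pairwise agreement table can be sketched as follows. The data layout is an assumption: results maps each solver to the instances it solved and a normalized answer (for enumeration tasks the answer would need to be order-independent, for example a frozenset of frozensets). Pairs without common instances get NaN. The function name is hypothetical and not part of probo2.

```python
from itertools import combinations
from typing import Dict, Hashable, Tuple

# results[solver][instance] = normalized answer of a solved instance (hypothetical layout)
Results = Dict[str, Dict[str, Hashable]]


def pairwise_accordance(results: Results) -> Dict[Tuple[str, str], float]:
    """For each solver pair, % of instances solved by both on which the answers agree."""
    table: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(results), 2):
        common = results[a].keys() & results[b].keys()
        if not common:
            table[(a, b)] = float("nan")  # nothing to compare
            continue
        agree = sum(results[a][i] == results[b][i] for i in common)
        table[(a, b)] = 100.0 * agree / len(common)
    return table
```

The resulting dictionary can be pivoted into a solver-by-solver matrix to draw the heatmap.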

For validation against references, specify the path to the reference results with the "--ref" option. Each reference file has to follow a naming pattern so that it can be matched with the correct result instance. The name of a reference file has to include (1) the full name of the instance to validate, (2) the task, and (3) the specified extension. For example, for the instance "my_instance.apx", the corresponding reference file for the EE-PR task would be named "my_instance_EE-PR.out". The order of name and task does not matter. The extension is provided via the "--extension" option.
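The naming rule for reference files can be expressed as a small matching function. This is a sketch under assumptions, not probo2's matcher: it removes the task from the reference file name (without extension), strips common separators, and requires that what remains equals the instance name with or without its extension. This avoids the pitfall of plain substring matching, which would, for example, match my_instance against my_instance_10_EE-PR.out.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_reference(instance_file: str, task: str, reference_files: Iterable[str],
                   extension: str = "out") -> Optional[str]:
    """Return the reference file that belongs to instance_file and task, if any."""
    instance = Path(instance_file)
    accepted = {instance.stem, instance.name}   # "my_instance" or "my_instance.apx"
    suffix = "." + extension.lstrip(".")
    for ref in reference_files:
        name = Path(ref).name
        if not name.endswith(suffix):
            continue
        stem = name[: -len(suffix)]              # e.g. "my_instance_EE-PR"
        if task not in stem:
            continue
        # Drop the task and surrounding separators; order of name and task is irrelevant.
        rest = stem.replace(task, "", 1).strip("_-.")
        if rest in accepted:
            return ref
    return None
```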

Options:

--tag

Experiment tag to be validated
-t, --task

Comma-separated list of task IDs or symbols to be validated.
-b, --benchmark

Benchmark name or id to be validated. [required]
-s, --solver

Comma-separated list of solver IDs or names to be validated. [required]
-f, --filter

Filter results in database. Format: [column:value]
-r, --reference PATH

Path to reference files.
--update_db

Update the status of instances (correct, incorrect, no reference) in the database.
-pw, --pairwise

Pairwise comparison of results. Not supported for the SE task.
-e, --export

Export results in specified format. Choices: [json|latex|csv]
--raw

Export raw validation results in csv format.
-p, --plot

Create a heatmap for pairwise comparison results and a count plot for validation with references. Choices: [heatmap|count]
-st, --save_to

Directory to store plots and data in. Filenames will be generated automatically.
-ext, --extension

Reference file extension
--compress

Compress saved files. Choices: [tar|zip]
--send

Send plots and data via email.
-v, --verbose

Verbose output for validation with references. For each solver, the names of unvalidated and incorrect instances are printed to the console.
--help

Show this message and exit.

Example

Validation with references:

probo2 validate --tag MyExperiment --benchmark MyBenchmark --references /path/to/references --export latex --plot count --compress zip --send my@mail.de --save_to .

Pairwise validation:

probo2 validate --tag MyExperiment --benchmark MyBenchmark --pairwise --plot heatmap --save_to .

significance

Usage: probo2 significance [OPTIONS]

Parametric and non-parametric significance and post-hoc tests.

Options:

--tag

Experiment tag to be tested.
-t, --task

Comma-separated list of task IDs or symbols to be tested.
--benchmark

Benchmark name or id to be tested.
-s, --solver

Comma-separated list of solver IDs. [required]
-p, --parametric

Parametric significance test: ANOVA for multiple solvers and t-test for two solvers. Choices: [ANOVA|t-test]
-np, --non_parametric

Non-parametric significance test: Kruskal-Wallis for multiple solvers and Mann-Whitney U for two solvers. Choices: [kruskal|mann-whitney-u]
-php, --post_hoc_parametric

Parametric post-hoc tests. Choices: [scheffe|tamhane|ttest|tukey|tukey_hsd]
-phn, --post_hoc_non_parametric

Non-parametric post-hoc tests. Choices: [conover|dscf|mannwhitney|nemenyi|dunn|npm_test|vanwaerden|wilcoxon]
-a, --alpha FLOAT

Significance level.
-l, --last

Test the last finished experiment.
--help

Show this message and exit.

Example

probo2 significance --tag MyExperiment --parametric ANOVA -php scheffe
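The options above state the selection rule for the significance tests: ANOVA or Kruskal-Wallis for more than two solvers, the t-test or Mann-Whitney U test for exactly two. A minimal Python sketch of that rule using SciPy (an assumption; probo2 may call different functions), where each sample is the list of runtimes of one solver. SciPy is imported inside run_test so that the selection logic works without it; both function names are hypothetical.

```python
from typing import Sequence


def choose_test(n_solvers: int, parametric: bool) -> str:
    """Multi-group test for more than two solvers, two-sample test for exactly two."""
    if n_solvers < 2:
        raise ValueError("need at least two solvers")
    if parametric:
        return "ANOVA" if n_solvers > 2 else "t-test"
    return "kruskal" if n_solvers > 2 else "mann-whitney-u"


def run_test(samples: Sequence[Sequence[float]], parametric: bool):
    """Run the chosen test with SciPy (imported lazily; must be installed)."""
    from scipy import stats
    name = choose_test(len(samples), parametric)
    tests = {
        "ANOVA": stats.f_oneway,
        "t-test": stats.ttest_ind,
        "kruskal": stats.kruskal,
        "mann-whitney-u": stats.mannwhitneyu,
    }
    return name, tests[name](*samples)
```

For example, run_test([runtimes_a, runtimes_b, runtimes_c], parametric=True) selects ANOVA, and the pvalue attribute of the returned result can then be compared with the --alpha significance level.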

board

Probo2 provides an interactive dashboard to visualize results of experiments. The dashboard contains plots and tables which can be filtered using checkboxes in the sidebar.

Usage: probo2 board [OPTIONS]

Launch dashboard for experiment visualization.

Options:

--tag, -t

Experiment tag
--raw, -r

Full path to a raw results file (raw.csv).

Note: Only needed when no tag is specified.

aig-hagen/probo2

Folders and files

Latest commit

History

Repository files navigation

Probo2 - Evaluation tool for abstract argumentation solvers

Changelog

v1.1 - 22.03.2023

Table of Contents

Setup

Experiments

Configuration Files

Example

Commands

add-solver

solvers

delete-solver

add-benchmark

benchmarks

delete-benchmark

run

status

last

experiment-info

plot

calculate

validate

significance

board

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages