Skip to content

aig-hagen/probo2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Probo2 - Evaluation tool for abstract argumentation solvers

License: MIT

A python tool that bundles all functionalities needed to collect, validate, analyze, and represent data in the context of benchmarking argumentation solvers.

Changelog

v1.1 - 22.03.2023

  • add-solver command:

    • Interface test checks all supported tasks and formats of solver
    • ICCMA23 instance format (.i23) added to solver interface check
    • Added progressbars to interface check
    • Name is derived from path if not specified
  • kwt-gen command:

    • Added instance generation via a config file ( see /generators/generator_configs/kwt_example.yaml)
    • Added parsing of generated instances to ICCMA23 (.i23) format
    • Added option to generate random query arguments for DS an DC tasks
    • Added option to add generated instances as a benchmark to the database
  • quick command:

    • Added new command to get a quick result for a single instance
  • board command:

    • Added new command to create a dashboard for result visualization
  • Added calculation of node and edge homophily

Table of Contents

1.Setup

2.Experiments

3.Commands

Setup

  1. Clone repository
  2. Navigate to project folder and create a virtual enviroment
python3 -m venv probo2_env
  1. Activate enviroment
source probo2_env/bin/activate
  1. Install
pip install -e .

Experiments

In this section we describe the general workflow of probo2 and how experiments are managed. In probo2, a configuration file fully describes experiments. It contains information about which solvers to run, which benchmark to use, how the results should be validated, what statistics should be calculated, and the graphical representation of the results. This standardized pipeline allows easy and scaleable experiment management and reliably reproducing results. All configuration options and their default values can be found in the default_config.yaml file. The file format of the configuration files is YAML. For further information on YAML files see the offical specification. Next, we will take a closer look at the default_config.yaml file and how probo2 handles configuration files in general.

Configuration Files

As mentioned before the default_config.yaml file contains all default configurations. User specifications (via command line or custom configuration files) overwrite the corresponding default configurations. To avoid conflicts between specifications from the command line and those in a (custom) configuration file, the specifications made by the user via the command line have priority. If the user does not specify any other options, the experiment will be performed with the default configurations. However, we recommend creating a separate configuration file for each experiment. In general, options are specified as key:value pairs in YAML files. The following options can be specified :

  • name

    Name/tag of the experiment.

  • task

    Comma-separated list of computational tasks to solve.

  • solver

    Comma-seperated list of ids or names of solvers to run.

  • benchmark

    Comma-separated list of ids or names of benchmarks to run solvers on.

  • timeout

    Instance cut-off value in seconds. If cut-off is exceeded instance is marked as timed out. (Default: 600)

  • repetitions

    Specifies how often an instance should be repeated. (Default: 1)

  • result_format:

    File format for results. (Default: csv)

  • plot

    Comma-separated list of kinds of plots to be created.

  • statistics

    Comma-separated list of statistics to be calculated.

  • printing

    Formatting of the command line output. (Default: 'default')

  • save_to

    Directory to store analysis results in. If not specified, the current working directory is used.

  • copy_raws

    Copy raws result files to save_to destination. (Default: True)

  • table_export

    Comma-separated list of export format for tables.

For a list of choices for an option, run the following command:

probo2 run --help

Note: The list is incomplete and constantly being expanded as probo2 is still under development.

Example

This is an example of a configuration file:

name: my_experiment
task: ['DS-PR']
solver: [1,2,'my_solver']
benchmark: all
timeout: 600
repetitions: 3
plot: ['cactus']
statistics: 
- mean
- solved
- timeouts
- coverage

Here an experiment named "my_experiment" is configured. The solvers with the ids 1 and 2 and the solver with the name "my_solver" should be executed. In addition, all benchmarks should be used and the 'DS-PR' task should be solved. Each instance shall be repeated 3 times and the cut-off per instance is 600 seconds. After that, the results should be visualized using cactus plots. Since no path was specified by the save_to option, the plots are saved in a folder "my_experiment/plots" in the current working directory. In addition, the raw data will also be copied to the folder since the copy_raws option is true by default. Furthermore various statistics are calculated. The "statistics" option also shows an alternative syntax for lists.

To run the experiment simply execute the following command:

probo2 run --config /path/to/my_config.yaml

If you want to change the configuration on the fly without changing the file again, we can do it from the command line. For example, if you want to calculate additional statistics ( sum of runtimes) just add the following Options:

probo2 run --config /path/to/my_config.yaml --statistics sum

Another example can be found in the example_config.yaml

Commands

add-solver

Usage: probo2 add-solver [OPTIONS]

Add a solver to the database.

Options:

  • -n, --name

    Name of the solver [required]

  • -p, --path PATH

    Path to solver executable [required].

    Relative paths are automatically resolved. The executable can be a compiled binary, bash script or a python file. As long as the executable implements the ICCMA interface it should work. More information on the ICCMA interface.

  • -f, --format [apx|tgf]

    Supported format of solver.

  • -t, --tasks

    Supported computational problems

  • -v, --version

    Version of solver [required].

    This option has to be specified to make sure the same solver with different versions can be added to the database.

  • -g, --guess

    Pull supported file format and computational problems from solver.

  • --help

    Show this message and exit.

Example

probo2 add-solver --name MyAwesomeSolver --version 1.23 --path ./path/to/MyAwesomeSolver --guess

solvers

Prints solvers in database to console.

Options

  • -v, --verbose

  • --id

    Print summary of solver with specified id

  • --help

    Show this message and exit.

delete-solver

Usage: probo2 delete-solver [OPTIONS]

Deletes a solver from the database. Deleting has to be confirmed by user. Options:

  • --id

    ID of solver to delete.

  • --all

    Delete all solvers in database.

  • --help

    Show this message and exit.

add-benchmark

Usage: probo2 add-benchmark [OPTIONS]

Adds a benchmark to the database. Before a benchmark is added to the database, it is checked if each instance is present in all specified file formats. Missing instances can be generated after the completion test (user has to confirm generation) or beforehand via the --generate/-g option. It is also possilbe to generate random argument files for the DC/DS problems with the --random_arguments/-rnd option. By default, the following attributes are saved for a benchmark: name, path, format, and the extension of query argument files. However, it is possible to specify additional attributes using your functions. See section "custom function" for further information. If no benchmark name via the --name option is provided, the name is derived from the benchmark path. Instance formats and the file extension of the query arguments (used for DS and DC tasks) are automatically set if not specified. For this the file extensions of all files in the given path are compared with the default file formats\extensions (see src/utils/definitions.DefaultInstanceFormats and src/utils/definitions.DefaultQueryFormats). Formats are set to the intesection between found formats and default formats.

Options:

  • *-n, --name *

    Name of benchmark/fileset [required]

  • -p, --path

    Directory of instances [required] Subdirectories are recursively searched for instances.

  • -f, --format [apx|tgf]

    Supported formats of benchmark/fileset [required]

  • -ext, --extension_arg_files

    Extension of additional argument parameter for DC/DS problems. Default is "arg"

  • --no_check

    Checks if the benchmark is complete.

  • -g, --generate [apx|tgf]

    Generate instances in specified format

  • -rnd, --random_arguments

    Generate additional argument files with a random argument.

  • -fun, --function

    Custom functions to add additional attributes to benchmark.

  • --help

    Show this message and exit.

Example

probo2 add-benchmark --name MyTrickyBenchmark --path ./path/to/MyTrickyBenchmark --format tgf --generate apx -rnd

benchmarks

Prints benchmarks in database to console.

Options

  • -v, --verbose

    Prints additional information on benchmark

  • --help

    Show this message and exit.

delete-benchmark

Usage: probo2 delete-benchmark [OPTIONS]

Deletes a benchmark from the database. Deleting has to be confirmed by user.

Options:

  • --id

    ID of benchmark to delete.

  • --all

    Delete all benchmarks in database

  • --help

    Show this message and exit.

run

Run solver.

Options

  • -a, --all

    Execute all solvers supporting the specified tasks on specified instances.

  • -slct, --select

    Execute (via solver option) selected solver supporting the specified tasks.

  • -s, --solver

    Comma-seperated list of ids or names of solvers (in database) to run.

  • -b, --benchmark

    Comma-seperated list of ids or names of benchmarks (in database) to run solvers on.

  • --task

    Comma-seperated list of tasks to solve.

  • -t, --timeout

    Instance cut-off value in seconds. If cut-off is exceeded instance is marked as timed out.

  • --dry

    Print results to command-line without saving to the database.

  • --track

    Comma-seperated list of tracks to solve.

  • --tag

    Tag for individual experiments.This tag is used to identify the experiment. [required]

  • --notify

    Send a notification to the email address provided as soon as the experiments are finished.

  • -n, --n_times

    Number of repetitions per instance. Run time is the avg of the n runs.

  • -sub, --subset

    Run only the first n instances of a benchmark.

  • --multi

    Run experiment on mutiple CPU cores. The number of cores to use is #physical cores - 1 or 1. This is a heuristic to avoid locking up the system.

  • --help

    Show this message and exit. Example

probo2 run --all --benchmark my_benchmark --task EE-CO,EE-PR --tag MyExperiment --timeout 600 --notify my@mail.de

status

Usage: probo2 status

Provides an overview of the progress of the currently running experiment.

Options:

  • --help

    Show this message and exit.

last

Usage: probo2 last

Shows basic information about the last finished experiment.

Infos include experiment tag, benchmark names, solved tasks, executed solvers and and the time when the experiment was finished

Options:

  • --help

    Show this message and exit.

experiment-info

Usage: probo2 experiment-info [OPTIONS]

Prints some basic information about the experiment speficied with "--tag" option.

Options:

  • -t, --tag

    Experiment tag. [required]

  • --help

    Show this message and exit.

plot

Usage: probo2 plot[OPTIONS]

Create plots of experiment results.

The --tag option is used to specify which experiment the plots should be created for. With the options --solver, --task and --benchmark you can further restrict this selection. If only a tag is given, a plot is automatically created for each task and benchmark of this experiment. With the option --kind you determine what kind of plot should be created. It is also possible to combine the results of different experiments, benchmarks and tasks with the --combine option.

Options:

  • -t, --tag

    Comma-separated list of experiment tags to be selected.

  • --task

    Comma-separated list of task IDs or symbols to be selected.

  • --benchmark

    Comma-separated list of benchmark IDs or names to be selected.

  • -s, --solver

    Comma-separated list of solver IDs or names to be selected.

  • -st, --save_to

    Directory to store plots in. Filenames will be generated automatically.

  • --vbs

    Create virtual best solver from experiment data.

  • -b, --backend

    Backend to use. Choices: [pdf|pgf|png|ps|svg]

  • -c, --combine

    Combine results on specified key Choices: [tag|task_id|benchmark_id]

  • -k, --kind

    Kind of plot to create: Choices: [cactus|count|dist|pie|box|all]

  • --compress

    Compress saved files. Choices: [tar|zip]

  • -s, --send

    Send plots via E-Mail

  • -l, --last

    Plot results for the last finished experiment.

  • --help

    Show this message and exit.

Example

probo2 plot --kind cactus --tag MyExperiment --compress zip --send my@mail.de

calculate

Calculte different statistical measurements. Usage: probo2 calculate [OPTIONS]

Options

  • -t, --tag

    Comma-separated list of experiment tags to be selected.

  • --task

    Comma-separated list of task IDs or symbols to be selected.

  • --benchmark

    Comma-separated list of benchmark IDs or names to be selected.

  • -s, --solver

    Comma-separated list of solver IDs or names to be selected.

  • -p, --par

    Penalty multiplier for PAR score

  • -s, --solver Comma-separated list of solver ids or names.

  • -pfmt, --print_format

    Table format for printing to console.

    Choices: [plain|simple|github|grid|fancy_grid|pipe|orgtbl|jira|presto|pretty|psql|rst|mediawiki|moinmoin|youtrack|html|unsafehtmllatex|latex_raw|latex_booktabs|textile]

  • -c, --combine

    Combine results on key.

    Choices: [task_id|benchmark_id|solver_id]

  • --vbs

    Create virtual best solver

  • -st, --save_to

    Directory to store tables.

  • -e, --export

    Export results in specified format.

    Choices: [latex|json|csv]

  • -s, --statistics

    Stats to calculate.

    Choices:[mean|sum|min|max|median|var|std|coverage|timeouts|solved|errors|all]

  • -l, --last

    Calculate stats for the last finished experiment.

  • --compress

    Compress saved files. Choices: [tar|zip]

  • -s, --send

    Send files via E-Mail

  • --help

    Show this message and exit.

Example

probo2 calculate --tag MyExperiment -s timeouts -s errors -s solved --par 10 --compress zip --send my@mail.de

validate

Usage: probo2 validate [OPTIONS]

Validate experiments results.

With the validation, we have the choice between a pairwise comparison of the results or a validation based on reference results. Pairwise validation is useful when no reference results are available. For each solver pair, the instances that were solved by both solvers are first identified. The results are then compared and the percentage of accordance is calculated and output in the form of a table. It is also possible to show the accordance of the different solvers as a heatmap with the "--plot heatmap" option. Note: The SE task is not supported here.

For the validation with references, we need to specify the path to our references results with the option "--ref". It's important to note that each reference instance has to follow a naming pattern to get matched with the correct result instance. The naming of the reference instance has to include (1) the full name of the instance to validate, (2) the task, and (3) the specified extension. For example for the instance "my_instance.apx", the corresponding reference instance for the EE-PR task would be named as follows: "my_instance_EE-PR.out" The order of name and task does not matter. The extension is provided via the "--extension" option.

Options:

  • --tag

    Experiment tag to be validated

  • -t, --task

    Comma-separated list of task IDs or symbols to be validated.

  • -b, --benchmark

    Benchmark name or id to be validated. [required]

  • -s, --solver

    Comma-separated list of solver IDs or names to be validated. [required]

  • -f, --filter

    Filter results in database. Format: [column:value]

  • -r, --reference PATH

    Path to reference files.

  • --update_db

    Update instances status (correct, incorrect, no reference) in database.

  • -pw, --pairwise

    Pairwise comparision of results. Not supported for the SE task.

  • -e, --export

    Export results in specified format. Choices: [json|latex|csv]

  • --raw

    Export raw validation results in csv format.

  • -p, --plot

    Create a heatmap for pairwise comparision results and a count plot for validation with references. Choices: [heatmap|count]

  • -st, --save_to

    Directory to store plots and data in. Filenames will be generated automatically.

  • -ext, --extension

    Reference file extension

  • --compress

    Compress saved files. Choices: [tar|zip]

  • --send

    Send plots and data via E-Mail.

  • -v, --verbose

    Verbose output for validation with reference. For each solver the instance names of not validated and incorrect instances is printed to the console.

  • --help

    Show this message and exit.

Example

Validation with references:

probo2 validate --tag MyExperiment --benchmark MyBenchmark --references /path/to/references --export latex --plot count --compress zip --send my@mail.de --save_to .

Pairwise validation:

probo2 validate --tag MyExperiment --benchmark MyBenchmark --pairwise --plot heatmap --save_to .

significance

Usage: probo2 significance [OPTIONS]

Parmatric and non-parametric significance and post-hoc tests.

Options:

  • --tag

    Experiment tag to be tested.

  • -t, --task

    Comma-separated list of task IDs or symbols to be tested.

  • --benchmark

    Benchmark name or id to be tested.

  • -s, --solver

    Comma-separated list of solver id.s [required]

  • -p, --parametric

    Parametric significance test. ANOVA for mutiple solvers and t-test for two solvers. Choices: [ANOVA|t-test]

  • -np, --non_parametric

    Non-parametric significance test. kruksal for mutiple solvers and mann-whitney-u for two solvers Choices: [kruskal|mann-whitney-u]

  • -php, --post_hoc_parametric

    Parametric post-hoc tests. Choices: [scheffe|tamhane|ttest|tukey|tukey_hsd]

  • -phn, --post_hoc_non_parametric

    Non-parametric post-hoc tests. Choices: [conover|dscf|mannwhitney|nemenyi|dunn|npm_test|vanwaerden|wilcoxon]

  • -a, --alpha FLOAT

    Significance level.

  • -l, --last

    Test the last finished experiment.

  • --help

    Show this message and exit.

Example

probo2 significance --tag MyExperiment --parametric ANOVA --php scheffe

board

Probo2 provides an interactive dashboard to visualize results of experiments. The dashboard contains plots and tables which can be filtered using checkboxes in the sidebar.

Usage: probo2 board [OPTIONS]

Launch dashboard for experiment visualization.

Options:

  • --tag, -t

    Experiment tag

  • --raw, -r

    Full path to a raw results file (raw.csv).

    Note: Only needed when no tag is specified.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published