Why are Sensitive Functions Hard for Transformers?

This is the official repository of the ACL 2024 paper "Why are Sensitive Functions Hard for Transformers?".

Usage

The code requires Python 3.8. Install the required packages by running pip install -r requirements.txt.

To train models analogous to those used in the paper and track their metrics, run python train.py. We use Hydra for parameter configuration, so you can override training or model parameters on the command line, e.g.

python train.py model.num_layers=12 training.batch_size=512

All parameters with their descriptions are listed in conf/train_config.yaml.
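To make the dotted override syntax concrete, here is a minimal sketch of how keys such as model.num_layers=12 map onto a nested configuration. It is an illustration only, not Hydra's implementation (Hydra's override grammar is richer), and the default values are placeholders rather than the contents of conf/train_config.yaml.

```python
import ast
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk (or create) the nested section, e.g. "model" or "training".
            node = node.setdefault(name, {})
        try:
            value = ast.literal_eval(raw_value)  # "12" -> 12, "0.1" -> 0.1
        except (ValueError, SyntaxError):
            value = raw_value  # anything else stays a string
        node[leaf] = value
    return result


# Placeholder defaults (assumption), NOT the values in conf/train_config.yaml.
defaults = {"model": {"num_layers": 2}, "training": {"batch_size": 64}}
config = apply_overrides(
    defaults, ["model.num_layers=12", "training.batch_size=512"]
)
print(config)
```

The function copies the defaults before modifying them, so the original dictionary is left unchanged.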

To use Weights & Biases (turned on by default), first configure W&B credentials:

export WANDB_ENTITY=your_username
export WANDB_PROJECT=your_project
wandb login
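Before launching a long run, it can help to confirm that both variables are actually exported. The helper below is a hypothetical convenience check, not part of this repository, and it does not verify that wandb login has been completed.

```python
import os

REQUIRED_VARS = ("WANDB_ENTITY", "WANDB_PROJECT")


def missing_wandb_vars(env=None):
    """Return the names of required W&B variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_wandb_vars()
if missing:
    print("Please export:", ", ".join(missing))
else:
    print("W&B entity and project are set.")
```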

If something doesn't work, feel free to leave a GitHub issue or contact us by email!

Replicating experiments

All our experiments are logged using Weights & Biases sweeps due to their easy configuration and scalability. The directory sweeps/ contains all the configuration files. Please refer to the Weights & Biases documentation for details.

Scaling experiment (Figures 1, 5, 6, 7, and 8 in the paper)

Run

wandb sweep sweeps/length_scaling_1.yaml
wandb sweep sweeps/length_scaling_2.yaml

wandb agent your_username/your_project/sweep_id_1
wandb agent your_username/your_project/sweep_id_2

python generate_plots.py --experiment scaling

The first two commands print sweep ids, which are used when starting the agents and generating the plots.

The visualizations are saved in the images/ folder.
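The command sequence above can also be assembled programmatically, for example when scripting several experiments. The sketch below only builds and prints the command lines. The helper names are hypothetical, the entity, project, and sweep ids are the same placeholders used above, and how generate_plots.py receives the sweep ids is not covered here.

```python
import shlex


def sweep_command(config_path):
    """Command that registers a sweep and prints its id."""
    return ["wandb", "sweep", config_path]


def agent_command(entity, project, sweep_id):
    """Command that starts an agent for an already registered sweep."""
    return ["wandb", "agent", f"{entity}/{project}/{sweep_id}"]


def plot_command(experiment):
    """Command that renders the figures for one experiment."""
    return ["python", "generate_plots.py", "--experiment", experiment]


commands = [
    sweep_command("sweeps/length_scaling_1.yaml"),
    sweep_command("sweeps/length_scaling_2.yaml"),
    # Placeholders: replace with your entity, project, and the printed sweep ids.
    agent_command("your_username", "your_project", "sweep_id_1"),
    agent_command("your_username", "your_project", "sweep_id_2"),
    plot_command("scaling"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute it once the placeholders have been replaced with real values.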

Tradeoff experiment (Figures 2, 9, and 10 in the paper)

Same as scaling, but use the sweeps weight_norm_blowup_tradeoff_1.yaml, weight_norm_blowup_tradeoff_2.yaml, weight_norm_blowup_tradeoff_3.yaml, and weight_norm_blowup_tradeoff_4.yaml. Also use the argument --experiment tradeoff for generate_plots.py.

Dynamic experiment (Figures 3, 12, 13, and 14 in the paper)

Same as scaling, but use the sweeps dynamic_metrics_1.yaml and dynamic_metrics_2.yaml. Also use the argument --experiment dynamic for generate_plots.py.

Scratchpad experiment (Figure 11 in the paper)

Same as scaling, but use the sweep scratchpad.yaml. Also use the argument --experiment scratchpad for generate_plots.py.
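The four experiments above follow the same pattern and differ only in their sweep files and the --experiment argument. As a compact summary, the mapping can be sketched as a dictionary. The assumption that every file lives directly under sweeps/ follows from the description of that directory above, and the names SWEEPS and sweep_paths are hypothetical.

```python
# Experiment name (the --experiment argument) -> sweep configuration files.
SWEEPS = {
    "scaling": ["length_scaling_1.yaml", "length_scaling_2.yaml"],
    "tradeoff": [f"weight_norm_blowup_tradeoff_{i}.yaml" for i in range(1, 5)],
    "dynamic": ["dynamic_metrics_1.yaml", "dynamic_metrics_2.yaml"],
    "scratchpad": ["scratchpad.yaml"],
}


def sweep_paths(experiment, sweeps_dir="sweeps"):
    """Return the sweep config paths for one experiment (assumes sweeps/ layout)."""
    return [f"{sweeps_dir}/{name}" for name in SWEEPS[experiment]]


for experiment in SWEEPS:
    print(experiment, sweep_paths(experiment))
```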

Generalization experiment (Figures 4 and 15 in the paper)

1. Run the sweeps generalization_first_step_1.yaml and generalization_first_step_2.yaml.
2. Run python generalization_save_predictions.py.
3. Run the sweeps generalization_second_step_1.yaml and generalization_second_step_2.yaml.
4. Run python generalization_add_train_sharpness.py.
5. Run python generate_plots.py --experiment generalization.

Citation

@article{Hahn2024WhyAS,
  title={Why are Sensitive Functions Hard for Transformers?},
  author={Michael Hahn and Mark Rofin},
  journal={ArXiv},
  year={2024},
  volume={abs/2402.09963}
}
