NormVio Dataset

This repository contains the NormVio dataset, originally introduced in the publication:

Analyzing Norm Violations in Live-Stream Chat

Citation

Please cite the original publication if you use the dataset in your research:

@inproceedings{moon-etal-2023-analyzing,
    title = "Analyzing Norm Violations in Live-Stream Chat",
    author = "Moon, Jihyung  and
      Lee, Dong-Ho  and
      Cho, Hyundong  and
      Jin, Woojeong  and
      Park, Chan  and
      Kim, Minwoo  and
      May, Jonathan  and
      Pujara, Jay  and
      Park, Sungjoon",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.55/",
    doi = "10.18653/v1/2023.emnlp-main.55",
    pages = "852--868",
    abstract = "Toxic language, such as hate speech, can deter users from participating in online communities and enjoying popular platforms. Previous approaches to detecting toxic language and norm violations have been primarily concerned with conversations from online forums and social media, such as Reddit and Twitter. These approaches are less effective when applied to conversations on live-streaming platforms, such as Twitch and YouTube Live, as each comment is only visible for a limited time and lacks a thread structure that establishes its relationship with other comments. In this work, we share the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms. We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch. We articulate several facets of live-stream data that differ from other forums, and demonstrate that existing models perform poorly in this setting. By conducting a user study, we identify the informational context humans use in live-stream moderation, and train models leveraging context to identify norm violations. Our results show that appropriate contextual information can boost moderation performance by 35{\%}."
}

Usage

To access the dataset, follow the instructions provided by the authors or maintainers. Handle the data in line with appropriate ethical and privacy standards.
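As a starting point for inspecting the data, the sketch below reads a JSON Lines file such as the acl_data.jsonl file listed in this repository and counts how often each top-level key appears. The record schema is not documented in this README, so the code assumes nothing about field names. The helper names read_jsonl and field_counts are hypothetical and are not part of the repository's utils.py.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def field_counts(records: Iterable[Dict]) -> Counter:
    """Count how often each top-level key appears across the records."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Assumption: acl_data.jsonl is in the current working directory.
    path = Path("acl_data.jsonl")
    if path.exists():
        for key, n in field_counts(read_jsonl(path)).most_common():
            print(f"{key}: {n}")
```

Listing the keys first is a cautious way to learn the structure of the records before writing any code that depends on specific fields.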

Folders and files

data
original_data
.gitignore
README.md
acl_data.jsonl
data_statistics.ipynb
label_consolidation.py
utils.py
