PWLDS is a public dataset consisting of over 4 million passwords with varying assigned strength levels. The dataset was designed to help researchers, security professionals, and developers analyze password strength and build more secure systems. The dataset contains 5 classes:
- very_weak: Represented by 0 in the dataset
- weak: Represented by 1 in the dataset
- average: Represented by 2 in the dataset
- strong: Represented by 3 in the dataset
- very_strong: Represented by 4 in the dataset
For strength level 4, we used Python's secrets module to generate cryptographically secure passwords, ensuring the robustness and security of these passwords.
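As a rough illustration of what generation with the secrets module can look like, here is a minimal sketch. The alphabet, the default length of 16, and the function name generate_password are assumptions made for this example; the actual generation rules used for PWLDS are not described here.

```python
import secrets
import string


def generate_password(length: int = 16) -> str:
    """Return a random password built from letters, digits, and punctuation.

    Hypothetical helper: the alphabet and default length are illustrative
    assumptions, not the settings used to build the dataset.
    """
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from the operating system's cryptographically
    # secure random source, unlike the general-purpose random module.
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_password())
```

Each call returns a different password, so the printed value will vary from run to run.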
- Security Analysis: PWLDS can be used to study common password patterns and weaknesses, helping to identify vulnerabilities and improve password policies.
- Machine Learning: The dataset can be used to train machine learning models for password strength estimation or prediction.
- Educational Purposes: This dataset is valuable for educational projects and demonstrations related to cybersecurity and data science.
- Benchmarking: PWLDS provides a large, labeled dataset for benchmarking password strength estimation algorithms and tools.
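To make the benchmarking use case concrete, the sketch below scores a simple rule-based estimator against labeled pairs. Everything in it is hypothetical: naive_estimator is a made-up baseline, and the sample passwords and their labels are invented for illustration rather than taken from PWLDS.

```python
def naive_estimator(password: str) -> int:
    """Hypothetical baseline: map length and character variety to a 0-4 score."""
    classes = sum([
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),
    ])
    # One point per character class beyond the first, plus one for length >= 12.
    score = classes - 1 + int(len(password) >= 12)
    return max(0, min(4, score))


def accuracy(rows, estimator) -> float:
    """Fraction of (password, level) pairs where the estimator matches the label."""
    rows = list(rows)
    if not rows:
        return 0.0
    hits = sum(estimator(password) == level for password, level in rows)
    return hits / len(rows)


# Invented (password, strength level) pairs, standing in for dataset rows.
sample = [
    ("password", 0),
    ("letmein123", 0),
    ("Sunshine12", 2),
    ("x9$Lq!7vT#2mZ@4k", 4),
]
print(accuracy(sample, naive_estimator))
```

Any estimator that maps a password string to an integer from 0 to 4 can be passed to accuracy in place of the baseline.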
- No Real-World Data: The dataset is synthetically generated and does not contain real user passwords. While this avoids privacy concerns, it may not fully represent real-world password distribution and usage patterns.
- Bias in Password Generation: Since passwords are generated based on predefined rules and patterns, there may be biases that do not reflect the diversity of passwords used in real-life scenarios.
- Exclusivity to English Words: The weak and very weak password categories rely heavily on English words, which may not be representative of non-English-speaking populations.
- Download the Dataset: You can download the dataset from GitHub or Hugging Face.
- Load the Dataset: Use Python, R, or any data processing tool to load and analyze the dataset. The file is in CSV format for easy use.
- Data Structure: Each entry in the dataset includes a password and its associated strength level (0-4). The column labels are Password and Strength_Level, respectively.
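The loading step can be sketched with Python's standard csv module. The example below reads an in-memory stand-in for the file so that it runs on its own; the sample rows are invented, and only the column names Password and Strength_Level come from the description above.

```python
import csv
import io
from collections import Counter

# Stand-in for the downloaded CSV file; these rows are made up for illustration.
SAMPLE_CSV = """Password,Strength_Level
password,0
sunshine12,1
Blue-Tiger-42,2
"""


def count_by_level(csv_file) -> Counter:
    """Count how many rows fall into each strength level (0-4)."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["Strength_Level"]) for row in reader)


counts = count_by_level(io.StringIO(SAMPLE_CSV))
print(counts)
```

To run this against the real file, pass an open file handle instead, for example open(path, newline="", encoding="utf-8"), where path is wherever you saved the download.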
When using this dataset, please cite as follows:
Dataset Title: Password Weakness and Level Dataset (PWLDS)
Author: Infinitode Pty Ltd
Date: 2024
Source: https://github.com/Infinitode/PWLDS
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENSE file for more details.
For questions or suggestions, please contact us through our website or open an issue on the project's GitHub repository.