# learn_llm

This project demonstrates a simple GPT-style language model implemented in PyTorch. It includes functionality for training, evaluating, and generating text with the transformer-based model.

## Features
- Self-Attention and Multi-Head Attention implementation
- Customizable training parameters through command line arguments
- Parallel processing for text extraction and vocabulary update
- Text generation using the trained model
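
For orientation, the sketch below shows what a minimal causal multi-head self-attention block can look like in PyTorch. The class and argument names (`MultiHeadSelfAttention`, `embed_dim`, `block_size`) are illustrative assumptions and may not match the exact names used in `model_script.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    """Minimal causal multi-head self-attention (illustrative sketch only)."""

    def __init__(self, embed_dim, num_heads, block_size):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)   # joint Q, K, V projection
        self.proj = nn.Linear(embed_dim, embed_dim)       # output projection
        # causal mask so each position attends only to earlier positions
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, num_heads, T, head_dim) for per-head attention
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (self.head_dim ** 0.5)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)
```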
## Requirements

To run this project, you will need:

- Python 3.8+
- PyTorch
- tqdm
- `lzma` and `concurrent.futures` (both part of the Python standard library, so no separate install is needed)
## Installation

Clone this repository to your local machine:

    git clone [repository-url]

Install the required Python packages:

    pip install torch tqdm
## Usage

### Setting up the Dataset

- Place your `.xz` compressed text files in the specified data directory.
- Modify the `data_directory` variable in the script to point to your directory of `.xz` files.
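
As a rough illustration of the parallel text-extraction and vocabulary-update step, the following sketch uses `lzma` and `concurrent.futures`; the helper name `extract_text` and the workflow around it are assumptions, not code taken from the script.

```python
import lzma
import os
from concurrent.futures import ProcessPoolExecutor

data_directory = "data/"  # assumption: point this at your directory of .xz files

def extract_text(path):
    """Decompress one .xz file and return its text plus the set of characters it contains."""
    with lzma.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    return text, set(text)

if __name__ == "__main__":
    xz_files = [os.path.join(data_directory, name)
                for name in os.listdir(data_directory)
                if name.endswith(".xz")]

    vocab = set()
    with ProcessPoolExecutor() as executor:
        # each worker process decompresses one file; results stream back in order
        for text, chars in executor.map(extract_text, xz_files):
            vocab.update(chars)  # vocabulary update from this file's characters
            # ...append `text` to your combined training corpus here...

    print(f"vocabulary size: {len(vocab)}")
```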
### Training the Model

Run the training script with the required batch size:

    python model_script.py -batch_size [your-batch-size]
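
The `-batch_size` flag suggests the script reads its training parameters from the command line; a minimal `argparse` setup that would accept the command above might look like the following (an assumed sketch, not the script's actual parser):

```python
import argparse

parser = argparse.ArgumentParser(description="Train the GPT-style language model")
# single-dash long option, matching the command shown above
parser.add_argument("-batch_size", type=int, required=True,
                    help="number of sequences per training batch")
args = parser.parse_args()

print(f"training with batch_size={args.batch_size}")
```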
### Generating Text

Use the interactive prompt to generate text:

    python model_script.py -batch_size [your-batch-size]
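
Under the hood, text generation with a model like this is typically an autoregressive sampling loop. The sketch below is a generic illustration and assumes the model returns logits of shape `(batch, time, vocab_size)`; the actual interface in `model_script.py` may differ (for example, it may also return a loss).

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Autoregressively sample tokens, feeding each prediction back in as context."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                    # crop context to the block size
        logits = model(idx_cond)                           # assumed shape: (B, T, vocab_size)
        probs = torch.softmax(logits[:, -1, :], dim=-1)    # distribution over the next token
        next_token = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_token], dim=1)          # append and continue
    return idx
```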
## Customization

Edit the parameters in the script to customize the model and training process:

- `batch_size`
- `block_size`
- `max_steps`
- `learning_rate`
- `num_heads`
- `num_layers`
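
For reference, such a configuration block often sits near the top of the script; the values below are illustrative defaults only, not the project's actual settings.

```python
batch_size = 64        # sequences per training step
block_size = 128       # maximum context length in tokens
max_steps = 5000       # total optimizer steps
learning_rate = 3e-4   # step size for the optimizer
num_heads = 8          # attention heads per transformer block
num_layers = 6         # number of transformer blocks
```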