anyGPT
anyGPT is a general-purpose library for training any type of GPT model, with support for GPT-1, GPT-2, and GPT-3 models. Inspired by Andrej Karpathy's nanoGPT, this project provides tools for training and using GPT style large language models. The aim is to provide a tool that is
- production ready
- easily configurable
- scalable
- free and open-source
- accessible by general software engineers and enthusiasts
- easily reproducible and deployable
You don't need a Ph.D. in Machine Learning or Natural Language Processing to use anyGPT.
Installation
NOTE: It is recommended that you set up a Python virtual environment using mamba, conda, or poetry. To install anyGPT:
Using Docker
The Docker image supports GPU passthrough for training and inference. To enable GPU passthrough, follow the guide for installing the NVIDIA Container Toolkit for your OS.
NOTE: On Windows, follow the guide to set up the NVIDIA Container Toolkit on WSL2. The Docker WSL2 backend is required.
Once the NVIDIA Container Toolkit and Docker are set up correctly, build the Docker image:
Use the following command to log in to the container interactively and use anygpt as if it were installed on your local host:
Mounting Volumes
It is recommended to mount a local directory into your container in order to share data between your local host and the container. This will allow you to save trained checkpoints, reuse datasets between runs and more.
The above example mounts /path/to/local/dir to the /data directory in the container, and all data and changes are shared between them dynamically.
Non interactive Docker
The above documentation explains how to run a Docker container with an interactive session of anyGPT. You can also run anyGPT commands to completion using Docker by overriding the entrypoint
$ docker run --gpus=all -v /path/to/your/data:/data --entrypoint anygpt-run -it anygpt /data/test.pt "hello world"
The above command runs anygpt-run with the parameters /data/test.pt "hello world".
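For scripted runs, the same entrypoint override can be assembled programmatically. The following is a minimal sketch, assuming the image is tagged anygpt as in the command above. The helper name docker_run_command is hypothetical and not part of anyGPT. The -it flags are left out because a scripted run usually has no terminal attached.

```python
import shlex


def docker_run_command(entrypoint, data_dir, args, image="anygpt", gpus="all"):
    """Build a `docker run` argument list that overrides the image entrypoint.

    Hypothetical helper for illustration; not part of anyGPT.
    """
    return [
        "docker", "run",
        f"--gpus={gpus}",            # GPU passthrough
        "-v", f"{data_dir}:/data",   # mount the local data directory at /data
        "--entrypoint", entrypoint,  # e.g. anygpt-run or anygpt-serve
        image,
        *args,                       # arguments passed on to the entrypoint
    ]


cmd = docker_run_command(
    "anygpt-run", "/path/to/your/data", ["/data/test.pt", "hello world"]
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it. Building a list rather than a single string avoids shell-quoting problems with prompts that contain spaces.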
Dependencies
- torch >= 2.0.0
- numpy
- transformers
- datasets
- tiktoken
- wandb
- tqdm
- PyYAML
- lightning
- tensorboard
Features
Current
- CLI and config file driven GPT training
- Supports CPU, GPU, TPU, IPU, and HPU
- Distributed training strategies for training at scale
- Easy spin up using Docker
- FastAPI end-points for containerized microservice deployment
- HuggingFace integration
  - Load pre-trained gpt models
Roadmap
- Documentation
- HuggingFace integration
  - push to hub
- Easy VM spin-up/getting started with
- Downloading of pre-trained models
- Gradio ChatGPT style interface for testing and experimentation
- Fine-tuning of pre-trained models
- Reinforcement Learning from Human Feedback and Rules Base Reward Modeling for LLM alignment
- More data format support beyond hosted text files
Usage
Data Preparation
Training
Create a config file. In this example, I'll call it gpt-2-30M.yaml. You can also check out the example configuration files.
model_config:
  name: 'gpt-2-30M'
  block_size: 256
  dropout: 0.2
  embedding_size: 384
  num_heads: 6
  num_layers: 6
training_config:
  learning_rate: 1.0e-3
  batch_size: 8
  accumulate_gradients: 8
  beta2: 0.99
  min_lr: 1.0e-4
  max_steps: 5000
  val_check_interval: 200
  limit_val_batches: 100
io_config:
  experiment_name: 'gpt-2'
  dataset: 'shakespeare_complete'
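The "30M" in the config name can be sanity-checked with some rough arithmetic on the values above. The sketch below is an estimate, not anyGPT's own accounting. It assumes the GPT-2 BPE vocabulary of 50257 tokens (the config does not state a vocabulary size), a standard GPT-2 block with a 4x MLP width, token embeddings shared with the output layer, and that accumulate_gradients batches make up one optimizer step.

```python
# Values copied from the example config above.
config = {
    "block_size": 256,
    "embedding_size": 384,
    "num_layers": 6,
    "batch_size": 8,
    "accumulate_gradients": 8,
}

# ASSUMPTION: GPT-2 BPE vocabulary size; not stated in the config.
VOCAB_SIZE = 50257


def estimate_parameters(cfg, vocab_size=VOCAB_SIZE):
    """Rough parameter count for a GPT-2 style model (biases and layer norms ignored)."""
    d = cfg["embedding_size"]
    # Per block: ~4*d^2 for attention (Q, K, V, output projection)
    # plus ~8*d^2 for an MLP with a 4x hidden width (assumed).
    blocks = cfg["num_layers"] * 12 * d * d
    token_embedding = vocab_size * d  # assumed shared with the output layer
    position_embedding = cfg["block_size"] * d
    return blocks + token_embedding + position_embedding


def tokens_per_step(cfg):
    """Tokens consumed per optimizer step: sequences per step times sequence length."""
    return cfg["batch_size"] * cfg["accumulate_gradients"] * cfg["block_size"]


print(f"approx. parameters: {estimate_parameters(config):,}")       # 30,013,824
print(f"tokens per optimizer step: {tokens_per_step(config):,}")    # 16,384
```

Under these assumptions the model comes out at roughly 30 million parameters, about two thirds of which sit in the token embedding table.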
Inference
$ anygpt-run results/gpt-2-pretrain/version_0/checkpoints/last.pt \
"JAQUES.
All the world’s a stage,
And all the men and women merely players;
They have their exits and their entrances,
And one man in his time plays many parts,"
Inference Microservice
anyGPT supports running models as a hosted microservice with a single endpoint for inference.
To launch the microservice, use the anygpt-serve entrypoint.
Commandline Options
$ anygpt-serve -h
usage: anyGPT inference service [-h] [--port PORT] [--log-level LOG_LEVEL] model

Loads an anyGPT model and hosts it on a simple microservice that can run inference over the network.

positional arguments:
  model                 Path to the trained model checkpoint to load

options:
  -h, --help            show this help message and exit
  --port PORT           Port to start the microservice on (default: 5000)
  --host HOST           Host to bind microservice to (default: 127.0.0.1)
  --log-level LOG_LEVEL
                        uvicorn log level (default: info)
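To make the defaults above concrete, here is a sketch of how options like these could be declared with Python's argparse. It mirrors the documented flags and defaults but is not taken from the anyGPT source. The function name build_parser and the int type for --port are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented anygpt-serve options."""
    parser = argparse.ArgumentParser(
        prog="anyGPT inference service",
        description="Loads an anyGPT model and hosts it on a simple microservice "
                    "that can run inference over the network.",
        # Appends "(default: ...)" to the help text of each option.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("model", help="Path to the trained model checkpoint to load")
    parser.add_argument("--port", type=int, default=5000,
                        help="Port to start the microservice on")
    parser.add_argument("--host", default="127.0.0.1",
                        help="Host to bind microservice to")
    parser.add_argument("--log-level", default="info", help="uvicorn log level")
    return parser


# Only the checkpoint path is required; everything else falls back to a default.
args = build_parser().parse_args(["/data/test.pt", "--host", "0.0.0.0"])
print(args.model, args.host, args.port, args.log_level)
# prints: /data/test.pt 0.0.0.0 5000 info
```

Note that the default host of 127.0.0.1 only accepts connections from the same machine, so a containerized service needs --host 0.0.0.0 to be reachable through a published port.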
Example
Sending Requests
anygpt-serve uses FastAPI to serve the microservice.
To see the available microservice API, go to the /docs endpoint in your browser once the microservice is started.
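The same information shown on the /docs page is also available in machine-readable form, because FastAPI serves an OpenAPI schema at /openapi.json by default. The sketch below lists the routes from that schema instead of guessing the inference route, which this README does not name. It assumes the service runs on the default host and port and that the default schema URL has not been changed. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def summarize_paths(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    pairs = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                pairs.append((method.upper(), path))
    return sorted(pairs)


def fetch_schema(base_url="http://127.0.0.1:5000"):
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


# Usage, with the microservice running:
#   for method, path in summarize_paths(fetch_schema()):
#       print(method, path)
```

Each schema entry also describes the expected request body, which is the safest source for constructing an inference request.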
Spinning up Microservice in Docker
$ docker run --gpus=all -v /path/to/your/data:/data -p 5000:5000 --entrypoint anygpt-serve -it anygpt /data/test.pt --port 5000 --host 0.0.0.0 --log-level info
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
Documentation
Limitations
TBD
License
The goal of this project is to enable organizations, both large and small, to train and use GPT style Large Language Models. I believe the future is open-source, with people and organizations being able to train from scratch or fine-tune models and deploy to production without relying on gatekeepers. So I'm releasing this under an MIT license for the benefit of all and in the hope that the community will find it useful.