# Image Captioning App

## Overview

This application generates descriptive captions for images by combining a CLIP vision model with a large language model (LLM). It processes single images or entire directories and produces contextual, natural-language captions, including for NSFW content.

## Features

- Single image and batch processing
- Multiple directory support
- Custom output directory
- Adjustable batch size
- Progress tracking

## Usage

| Command | Description |
|---------|-------------|
| `python app.py image.jpg` | Process a single image |
| `python app.py /path/to/directory` | Process all images in a directory |
| `python app.py /path/to/dir1 /path/to/dir2` | Process multiple directories |
| `python app.py /path/to/dir --output /path/to/output` | Specify output directory |
| `python app.py /path/to/dir --bs 8` | Set batch size (default: 4) |
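
The commands above suggest a command-line interface with one or more positional paths plus the `--output` and `--bs` options. The following is a minimal sketch of how such an interface could be declared with `argparse`. It is an assumption for illustration, not the actual contents of `app.py`; the names `build_parser` and `inputs` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the usage table (sketch)."""
    parser = argparse.ArgumentParser(description="Generate captions for images.")
    # One or more image files or directories (hypothetical argument name).
    parser.add_argument("inputs", nargs="+", help="Image files or directories")
    # Optional output directory; None means no custom directory was requested.
    parser.add_argument("--output", default=None, help="Directory for caption output")
    # Batch size, defaulting to 4 as stated in the usage table.
    parser.add_argument("--bs", type=int, default=4, help="Batch size")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["/path/to/dir1", "/path/to/dir2", "--bs", "8"])
print(args.inputs, args.output, args.bs)
```

Using `nargs="+"` lets the same positional argument cover a single image, a single directory, or several directories.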

## Technical Details

- **Models**: CLIP (vision), LLM (language), custom ImageAdapter
- **Optimization**: CUDA-enabled GPU support
- **Error Handling**: Skips problematic images in batch processing
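
To illustrate the error-handling behaviour described above, here is a rough sketch of batching that skips images which fail to load. It is not the app's actual code: `load_batches`, the `load` callback, and `fake_load` are hypothetical names, and the real loader would presumably decode the image rather than return a string.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def load_batches(
    paths: Iterable[Path],
    load: Callable[[Path], T],
    batch_size: int = 4,
) -> Iterator[List[Tuple[Path, T]]]:
    """Yield batches of (path, loaded item), skipping paths that fail to load."""
    batch: List[Tuple[Path, T]] = []
    for path in paths:
        try:
            item = load(path)
        except Exception as exc:  # a problematic image should not abort the run
            print(f"Skipping {path}: {exc}")
            continue
        batch.append((path, item))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def fake_load(path: Path) -> str:
    """Stand-in loader (assumption): fails for names starting with 'bad'."""
    if path.name.startswith("bad"):
        raise ValueError("cannot decode image")
    return path.stem


paths = [Path(name) for name in ["a.jpg", "bad.jpg", "b.png", "c.jpg"]]
batches = list(load_batches(paths, fake_load, batch_size=2))
print(batches)
```

Because failures are caught per image, one unreadable file only shrinks the set of captions produced and does not stop the rest of the batch.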

## Requirements

- Python 3.x
- PyTorch
- Transformers library
- CUDA-capable GPU (recommended)
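
Since a CUDA-capable GPU is recommended rather than required, it can help to check which device PyTorch will see before running a large batch. The snippet below is a small sketch of such a check; `pick_device` is a hypothetical helper, and it does not show how the app itself selects a device.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet; report CPU so the check still runs.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```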

## Installation

Windows

```bash
git clone https://huggingface.co/Wi-zz/joy-caption-pre-alpha
cd joy-caption-pre-alpha
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
```

Linux

```bash
git clone https://huggingface.co/Wi-zz/joy-caption-pre-alpha
cd joy-caption-pre-alpha
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the [MIT License](LICENSE).