Pixel-based Pre-training
[EMNLP'24] [Autoregressive Pre-Training on Pixels and Texts](https://arxiv.org/pdf/2404.10710).
This repository contains the official checkpoint for PixelGPT, as presented in the paper Autoregressive Pre-Training on Pixels and Texts (EMNLP 2024). For detailed instructions on how to use the model, please visit our GitHub page.
DualGPT is an autoregressive language model pre-trained on two modalities: pixels and text. By processing documents as visual data (rendered pixels), the model learns to predict both the next text token and the next image patch in a sequence, which enables it to handle visually rich tasks across both modalities.
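The two prediction objectives described above can be sketched in plain Python. This is a minimal, framework-free illustration, not the PixelGPT/DualGPT training code. It assumes a grayscale rendered page split into non-overlapping square patches in raster order, a mean-squared-error loss for next-patch regression, and a standard cross-entropy loss for next-token prediction. The function names and the `patch_weight` mixing coefficient are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical representation: a grayscale image as rows of pixel intensities
# (e.g. normalized to [0, 1]).
Image = List[List[float]]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an image into non-overlapping square patches in raster order.

    Each patch is flattened into one vector, i.e. one "visual token".
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


def next_patch_loss(predicted: Sequence[Sequence[float]],
                    patches: Sequence[Sequence[float]]) -> float:
    """Mean squared error for next-patch prediction (assumed regression loss).

    predicted[i] is the model's guess for patches[i + 1], given patches[:i + 1].
    """
    targets = patches[1:]
    if len(predicted) != len(targets):
        raise ValueError("need one prediction per target patch")
    total, count = 0.0, 0
    for pred, target in zip(predicted, targets):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count


def next_token_loss(logits: Sequence[Sequence[float]],
                    token_ids: Sequence[int]) -> float:
    """Average cross-entropy of predicting token_ids[i + 1] from logits[i]."""
    targets = token_ids[1:]
    if len(logits) != len(targets):
        raise ValueError("need one logit row per target token")
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
    return total / len(targets)


def dual_loss(patch_loss: float, token_loss: float,
              patch_weight: float = 1.0) -> float:
    """Hypothetical weighted sum of the two objectives."""
    return patch_weight * patch_loss + token_loss
```

For example, a 4×4 image with `patch_size=2` yields four 4-pixel patches. A model that predicts each next patch perfectly gets `next_patch_loss` 0.0, and uniform logits over a 4-token vocabulary give a `next_token_loss` of `log(4)`.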
@misc{chai2024autoregressivepretrainingpixelstexts,
title = {Autoregressive Pre-Training on Pixels and Texts},
author = {Chai, Yekun and Liu, Qingyi and Xiao, Jingwu and Wang, Shuohuan and Sun, Yu and Wu, Hua},
year = {2024},
eprint = {2404.10710},
archiveprefix = {arXiv},
primaryclass = {cs.CL},
url = {https://arxiv.org/abs/2404.10710},
}