Dataset-Tools's Collections

Dataset Creation

Spaces and utilities for creating datasets and getting them on the Hub


  • Note This Space extracts embedded text from PDFs and pushes the resulting text to a Hugging Face Hub dataset.


  • Note This Space converts one or more PDFs into a set of images, one per page, and optionally pushes the images to a Hugging Face dataset. This can help generate an initial dataset for annotation or further processing.


  • Note Corpus Creator is a tool for transforming a collection of text files into a Hugging Face dataset, perfect for various natural language processing (NLP) tasks. Whether you're preparing data for synthetic generation, building pipelines, or setting up annotation tasks, this app simplifies the process.
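
For readers who want to see roughly what "text files to a Hub dataset" involves, here is a minimal sketch. It is not the code behind any of the Spaces above. The function names `build_records` and `push_records`, the `file_name` and `text` column names, and the `*.txt` pattern are all assumptions chosen for illustration. The upload step assumes the `datasets` library is installed and that you are already logged in to the Hub.

```python
from pathlib import Path


def build_records(folder: str, pattern: str = "*.txt") -> list[dict]:
    """Collect text files under `folder` into one record per file.

    Hypothetical helper: the column names are illustrative, not a Hub requirement.
    """
    records = []
    # Sort the paths so the resulting row order is deterministic.
    for path in sorted(Path(folder).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        records.append({"file_name": path.name, "text": text})
    return records


def push_records(records: list[dict], repo_id: str) -> None:
    """Upload the records as a Hub dataset.

    Assumes `datasets` is installed and a Hub token is configured.
    """
    # Imported lazily so build_records works without the library installed.
    from datasets import Dataset

    Dataset.from_list(records).push_to_hub(repo_id)
```

A typical call would be `push_records(build_records("my_corpus"), "your-username/your-dataset")`, where both the folder and the repository id are placeholders. Keeping the file-reading step separate from the upload makes it easy to inspect the records locally before anything is published.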