TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Abstract
Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges in industrial scenarios, particularly because real-world tabular data demands more complex reasoning, which underscores a notable disparity between academic benchmarks and practical applications. To address this discrepancy, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose a comprehensive and complex benchmark, TableBench, covering 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TableLLM, trained on our meticulously constructed training set TableInstruct, which achieves performance comparable to GPT-3.5. Extensive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands: even the most advanced model, GPT-4, achieves only a modest score compared with humans.
Community
TableBench encompasses 18 fields across four primary categories of table question answering (TableQA) tasks. Extensive experimentation on TableBench reveals that nearly 30 LLMs, both open-source and proprietary, still have considerable room for improvement to meet the demands of real-world applications. Even the most advanced model, GPT-4, achieves only a moderate score compared to human performance.
Main Page: https://tablebench.github.io/
Leaderboard: https://tablebench.github.io/leaderboard.html
TableBench: https://huggingface.co/datasets/Multilingual-Multimodal-NLP/TableBench
TableInstruct: https://huggingface.co/datasets/Multilingual-Multimodal-NLP/TableInstruct