Pre-trained Web Table Embeddings
The models here represent schema terms and instance data terms in a semantic vector space, which makes them especially useful for representing schema and class information and for ML tasks on tabular text data.
The code for executing and evaluating the models is located in the table-embeddings GitHub repository.
Quick Start
You can install the table_embeddings package to encode text from tables by running the following commands:
pip install cython
pip install git+https://github.com/guenthermi/table-embeddings.git
After that you can encode text with the following Python snippet:
from table_embeddings import TableEmbeddingModel
model = TableEmbeddingModel.load_model('ddrg/web_table_embeddings_plain150')
embedding = model.get_header_vector('headline')
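A common next step is to compare two encoded terms. The following is a minimal sketch of a cosine similarity helper, assuming that `get_header_vector` returns a one-dimensional sequence of floats (for example a NumPy array). The helper name `cosine_similarity` is hypothetical and not part of the table_embeddings package.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors.

    Hypothetical helper, not part of table_embeddings.
    Returns 0.0 if either vector has zero length (norm).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage with the model loaded in the snippet above
# (needs the installed package and a model download, so it is left commented out):
# similarity = cosine_similarity(
#     model.get_header_vector('headline'),
#     model.get_header_vector('title'),
# )
```

Values close to 1 suggest that the two header terms occupy similar positions in the embedding space; how well this tracks semantic relatedness depends on the chosen model.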
Model Types
Model Type | Description | Download Links |
---|---|---|
W-tax | Model of relations between table header and table body | (64dim, 150dim) |
W-row | Model of row-wise relations in tables | (64dim, 150dim) |
W-combo | Model of row-wise relations and relations between table header and table body | (64dim, 150dim) |
W-plain | Model of row-wise relations in tables without pre-processing | (64dim, 150dim) |
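To switch between the model types and dimensions listed in the table, a small helper can build the identifier passed to `load_model`. This is only a sketch: the naming pattern is extrapolated from the single identifier shown in the Quick Start (`ddrg/web_table_embeddings_plain150`), and the short keys `tax`, `row`, and `combo` are assumptions that should be checked against the actual download links. The function name `model_identifier` is hypothetical.

```python
# Model types from the table, mapped to the short keys ASSUMED to appear in
# the identifiers. Only "plain" with 150 dimensions is confirmed by the
# Quick Start snippet; the other combinations are extrapolated.
MODEL_KEYS = {
    "W-tax": "tax",
    "W-row": "row",
    "W-combo": "combo",
    "W-plain": "plain",
}
DIMENSIONS = (64, 150)


def model_identifier(model_type: str, dim: int) -> str:
    """Build a model identifier string (hypothetical helper, assumed naming pattern)."""
    if model_type not in MODEL_KEYS:
        raise ValueError(f"unknown model type: {model_type!r}")
    if dim not in DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dim!r}")
    return f"ddrg/web_table_embeddings_{MODEL_KEYS[model_type]}{dim}"


# Usage (left commented out because it needs the package and a download):
# from table_embeddings import TableEmbeddingModel
# model = TableEmbeddingModel.load_model(model_identifier("W-plain", 150))
```

The 64-dimensional variants are smaller, while the 150-dimensional variants carry more capacity per term; which one fits best depends on the downstream task.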
More Information
For examples of how to use the models, see the GitHub repository.
More information can be found in the paper Pre-Trained Web Table Embeddings for Table Discovery:
@inproceedings{gunther2021pre,
title={Pre-Trained Web Table Embeddings for Table Discovery},
author={G{\"u}nther, Michael and Thiele, Maik and Gonsior, Julius and Lehner, Wolfgang},
booktitle={Fourth Workshop in Exploiting AI Techniques for Data Management},
pages={24--31},
year={2021}
}