johngiorgi committed

Commit: 8d0c385 • Parent(s): d813e97

call tokenizer directly

The instructions call the tokenizer from `self.tokenizer`. This looks like a copy/paste error; the simple fix is to call the tokenizer that was initialized directly.
README.md CHANGED

@@ -123,7 +123,7 @@ papers = [{'title': 'BERT', 'abstract': 'We introduce a new language representat
 # concatenate title and abstract
 text_batch = [d['title'] + tokenizer.sep_token + (d.get('abstract') or '') for d in papers]
 # preprocess the input
-inputs = self.tokenizer(text_batch, padding=True, truncation=True,
+inputs = tokenizer(text_batch, padding=True, truncation=True,
                    return_tensors="pt", return_token_type_ids=False, max_length=512)
 output = model(**inputs)
 # take the first token in the batch as the embedding
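For context, here is a minimal runnable sketch of the corrected README snippet after this fix, assuming a standard Hugging Face `AutoTokenizer`/`AutoModel` pair. The checkpoint name is a placeholder (not taken from this commit), and the final pooling line is inferred from the "take the first token" comment rather than shown in the diff:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint: substitute the model repo this README belongs to.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

papers = [{'title': 'BERT', 'abstract': 'We introduce a new language representation model ...'}]

# concatenate title and abstract
text_batch = [d['title'] + tokenizer.sep_token + (d.get('abstract') or '') for d in papers]

# preprocess the input: call the initialized tokenizer directly (the fix in this commit),
# not self.tokenizer, which only exists inside a class
inputs = tokenizer(text_batch, padding=True, truncation=True,
                   return_tensors="pt", return_token_type_ids=False, max_length=512)

with torch.no_grad():
    output = model(**inputs)

# take the first token of each sequence (the [CLS] position) as the embedding
embeddings = output.last_hidden_state[:, 0, :]
print(embeddings.shape)  # (batch_size, hidden_size)
```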