Wrong output? (#2)
Opened by ceyda
BLEURT's output is said to lie approximately between 0 and 1, but I also get negative scores, for example: {'scores': [-0.9087899327278137, -0.6429446339607239]}
see: https://imgur.com/KIaqIPj
Is this expected?
Here are other examples of scores outside the range [0, 1]:
>>> import evaluate
>>> bleurt = evaluate.load("bleurt", "bleurt-large-512", module_type="metric")
>>> bleurt.compute(references=["this is a test"], predictions=["this is a test"])
{'scores': [1.0118293762207031]}
>>> bleurt.compute(references=["this is a test"], predictions=["this is a boat"])
{'scores': [-1.3691496849060059]}
I think scores slightly above 1 or slightly below 0 are expected (see "Interpreting BLEURT Scores"), but a score of -1.4 for a one-word change seems like an error.
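To make the distinction above concrete, here is a minimal sketch that labels each score in a `compute()` result as inside the nominal [0, 1] range, slightly outside it, or far outside it. The helper name `classify_scores` and the tolerance of 0.1 are hypothetical choices for illustration only; BLEURT's documentation does not define such a threshold.

```python
# Hypothetical helper (not part of BLEURT or evaluate): classify scores
# relative to the nominal [0, 1] range. The 0.1 tolerance is an assumption.
def classify_scores(result, low=0.0, high=1.0, tol=0.1):
    """Label each score as 'in_range', 'slightly_out', or 'far_out'."""
    labels = []
    for s in result["scores"]:
        if low <= s <= high:
            labels.append("in_range")
        elif low - tol <= s <= high + tol:
            labels.append("slightly_out")
        else:
            labels.append("far_out")
    return labels

# Scores quoted in this thread.
print(classify_scores({"scores": [1.0118293762207031]}))   # ['slightly_out']
print(classify_scores({"scores": [-1.3691496849060059]}))  # ['far_out']
print(classify_scores({"scores": [-0.9087899327278137, -0.6429446339607239]}))  # ['far_out', 'far_out']
```

The labels only restate the judgment made in the comment above under an arbitrary tolerance; they do not establish whether the model or the metric wrapper is misbehaving.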