how is this fp16 when filename has q4?
#1 by ucalyptus - opened
As of today, WebGPU only supports fp16 and fp32 ops (int8 support is coming soon), so the model runs in either fp16 or fp32 mode. The weights, however, are quantized to q4; they are dequantized on the fly and the computation is then (in this case) run in fp16.
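To make the "dequantized on the fly" part concrete, here is a rough sketch in TypeScript. Note this is illustrative only: the exact block layout (two 4-bit values per byte plus one per-block scale, recentered around zero) is an assumption for the example, not the precise format used by the library.

```typescript
// Hypothetical q4 block layout (assumption for illustration):
// two 4-bit quantized values packed per byte, plus one scale per block.
interface Q4Block {
  scale: number;        // per-block scaling factor
  packed: Uint8Array;   // two 4-bit values per byte
}

// Expand one q4 block back to full-precision floats — the
// "dequantize on the fly" step that happens before the fp16/fp32
// compute runs on the GPU.
function dequantizeQ4(block: Q4Block): Float32Array {
  const out = new Float32Array(block.packed.length * 2);
  for (let i = 0; i < block.packed.length; i++) {
    const byte = block.packed[i];
    // Low and high nibbles each hold one value in [0, 15];
    // subtract 8 to recenter around zero, then rescale.
    out[2 * i]     = ((byte & 0x0f) - 8) * block.scale;
    out[2 * i + 1] = ((byte >> 4) - 8) * block.scale;
  }
  return out;
}
```

So the q4 in the filename describes how the weights are *stored*, while fp16 describes the precision the ops actually *execute* in after this expansion.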
Hopefully that clears things up!