Garbage output? (Q6_K)

#1
by ThijsL202 - opened

So I'm using the Q6_K quant and my output is utter garbage. I've tried a lot of settings, presets, and story/instruct templates in SillyTavern, but the model doesn't produce a single good message. I mainly use Theia 21B v2, and compared to that model it's night and day. Is this the model itself? (Running with koboldcpp v1.74, context shift on, context size 8192.)

Is it not following instructions, or is it a text output issue (garbled? broken?)?
Can you give me an idea of the temp / rep pen settings used?

Although this model works with three (or more) templates, Alpaca seems the most stable (unclear why).
If you can give me some output from it, I will see what I can do.

Note: This model is composed of fine-tuned models; it is not a fine-tune itself. Theia is a "ground up" fine-tune.

You may want to try:
https://huggingface.co/DavidAU/L3-Dark-Planet-8B
or
https://huggingface.co/DavidAU/L3-Dark-Planet-8B-V2-Eight-Orbs-Of-Power-GGUF

These models are composed differently and retain all the function of their "parents", so to speak.
They do not have the same level of "character" as "Kaboom" (and the Dark Horror series), but they may be better for RP, and they have 8k context windows.

Will gather some examples.

Owner

I re-read your comment; try turning "context shift" off.
This model has a 131k context window.

Will try!

Since I'm horrible at explaining things myself, here's what ChatGPT made for me:

Below are some examples:

Character Used:

Parameters:

  • Preset: universal_light, 120 response tokens, 8k context
  • Story string template: ChatML, with Trim Incomplete Sentences enabled
  • Instruct template: turned off

Model Loading Info:

  • Koboldcpp Version: 1.74
  • Context size: 8192
  • Context shift: turned off
  • Flash attention: enabled
  • GPU layers: 200
  • Use QuantMatMul: enabled
  • BLAS batch size: 1024
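
For reference, here is a rough command-line equivalent of those loader settings. This is only a sketch: the model filename is a placeholder, and the flag names are taken from the KoboldCpp launcher, so double-check them against the 1.74 build.

```python
import subprocess

# Sketch of launching KoboldCpp with the settings listed above.
# "model.Q6_K.gguf" is a placeholder path, not the actual file name.
cmd = [
    "python", "koboldcpp.py",
    "--model", "model.Q6_K.gguf",  # placeholder for the Q6_K quant
    "--contextsize", "8192",       # Context size: 8192
    "--gpulayers", "200",          # GPU layers: 200
    "--flashattention",            # Flash attention: enabled
    "--blasbatchsize", "1024",     # BLAS batch size: 1024
    "--usecublas", "mmq",          # Use QuantMatMul (mmq): enabled
    "--noshift",                   # Context shift: turned off
]
subprocess.run(cmd, check=True)
```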

Example Output:

Excerpt 1:
"Our blades locked together with a resounding clang, sparks showering forth like falling stars. Our eyes locked, inches apart, nos鼻e to nose."

Excerpt 2:
"You speak of greater good yet you've only brought suffering and despair!" I growled back, my arm shaking with the strain of holding back her might. "The price you speak of, is one I'll never accept! I'll see you burn, witch!"

Issues:
1. Japanese kanji inserted into English words: For example, "nos鼻e" instead of "nose." This also happens with words like "lick舐."
2. Missing possessive pronouns: The model sometimes drops possessives and pronouns. For example, "breath hot on face" instead of "her breath hot on his face."

These are the issues I had at first, but they're waaaay rarer now.
I have no idea why the output was really, really bad those few times, but it's much better now. I can't reproduce it as easily; the Japanese characters still show up sometimes (very rarely), but a swipe takes care of it.

Thanks for the merge!

Thank you; this helps a lot.
The token issue can happen with these types of merges.

However, try turning off flash attention; sometimes it works great, other times it causes issues.
The pronoun issue is odd. I will test and see what I can replicate here.

Thanks again.

I fixed the possessive pronouns issue by disabling Exclude Top Choices (XTC).
So the only issue I really have is the model being... less coherent; it tends to use a lot of "slang" and leans a bit toward the angry/bad side when mimicking a personality.
