Feedback/General Discussion

#1
by Clevyby - opened

Thanks for providing the exl quant when you released this. I tried it out a bit and damn, it was good. Despite using a 4bpw quant (with 8-bit cache), practically all of the issues I discussed earlier are gone. You've definitely outdone yourself with this model; it beats your previous 20b models by a landslide. I'll test it more later when I have the time.

Owner

Thank you :)

Clevyby changed discussion title from Great Improvement! to Feedback
Clevyby changed discussion title from Feedback to Feedback/General Discussion

So, before I get to my minimal feedback, here's my setup: Alpaca roleplay formatting, no system prompt.
Example dialogue in the description:

Screenshot_2024-03-15_114800.jpg

Author's Note at depth 1, modified into second person:

Screenshot_2024-03-15_114809.jpg

I use free Colab as usual, running oobabooga with the following flags:
--n-gpu-layers 128 --loader exllamav2_hf --cache_4bit --max_seq_len 6144 --alpha_value 1.75 --no_inject_fused_attention --no_use_cuda_fp16 --disable_exllama
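Side note on `--alpha_value 1.75` with `--max_seq_len 6144`: that flag applies NTK-aware RoPE scaling to stretch the context past the model's native window. As a rough sketch of the commonly used scaling rule (the exponent form and the head dimension of 128 are assumptions on my part, not something confirmed in this thread):

```python
def ntk_scaled_rope_base(base: float, alpha: float, head_dim: int) -> float:
    """Common NTK-aware scaling rule: base' = base * alpha^(d / (d - 2)).

    A higher alpha raises the rotary base, slowing positional rotation so the
    model tolerates longer sequences than it was trained on.
    """
    return base * alpha ** (head_dim / (head_dim - 2))

# e.g. alpha 1.75 on the usual base of 10000, assuming 128-dim heads
scaled = ntk_scaled_rope_base(10000.0, 1.75, 128)
```

With alpha 1.75 this lands around a 1.77x larger base, which is roughly what you want for pushing a 4k-native model to 6k.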

Now for the actual feedback: I was really impressed with how the model handles the personality aspect. It's really good, though not without faults. (Note: this is written based on my experience with the 4.55bpw quant and Q4 cache.)
Cons:

  1. Author's Note leakage: Compared to before, this now happens only rarely, though it still happens, probably because of the Author's Note depth. I didn't experiment with increasing the depth, but there might be a tradeoff between the paraphrased Author's Note appearing in responses and the model just taking 'creative' liberties with it.
  2. The model explaining its 'actions': Happened once or twice, along the lines of [char is currently being blah blah]. It's probably because I set the Author's Note to second person in an attempt to enhance roleplay 'genuineness'.
  3. The model talking or acting for me: Like formatting errors, this is an uncommon semi-problem for all the 20b exl models I've used. This model is the best I've seen so far at not talking for me, though it still happens every now and then. I keep my persona in the Author's Note, so that's probably a contributing factor.
  4. Strange response successions: This happens uncommonly to rarely in long-form roleplays of 30+ messages. For instance: the character summons monsters to kill someone in revenge for user; user gets up from being hit, and in the next response char somehow wakes up dizzily from being hit, even though it was never mentioned that char had been hit by anyone. Probably due to sampling.
  5. Specific GPT-isms: The only one I've seen is the model's obsessive use of 'shivers down their spine' and variations thereof.
  6. Adverb and adjective overuse: While I'm generally okay with the model's responses, I feel they could be better in word choice and grammar, since the model leans heavily on adverbs and adjectives. This is just my opinion and a 'to each their own' thing; if you've read a lot of fiction, both low and high quality, you'll probably know what I mean. I could probably fix this by prompting the model correctly.
  7. Weird dual wording: I'm not sure whether this is the model or something else, but the model sometimes uses two similar words in a single sentence, e.g. a character feels 'violated and violating'.

Pros:

  8. Interesting metaphors: I quite like the way the model sprinkles a touch of personality into its responses, especially the 'metaphors' scattered throughout.
  9. Great personality display: This is a given.
  10. Great conflict handling: I like how conflict is handled, especially in 'nsfw' instances.
  11. Noromaid influence: The model's responses quite remind me of Noromaid 20b.
  12. Recalling responses: This is honestly from the previous version, but I recall that at 6k context the model somehow recalled a tidbit of information from 15 messages back, each message being about 350 tokens (so roughly 5,250 tokens earlier, near the edge of the context window).
  13. Instruction following works: Commands placed in the Author's Note do work, though in my experience the model follows them in about 3 out of 5 gens.

Responses (sampling: Min-p 0.045, Smoothing 0.08. Too low, I know, but I'm a firm believer in word diversity; I took this value from the smoothing visualizations in kalomaze's LLM-sampling-explained videos. Plus, 0.2 and above is a tad too generous toward top tokens in my opinion.):
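For anyone unfamiliar with these two samplers, here's a minimal sketch of how I understand them to work. The quadratic/smoothing formula matches what I've seen in sampler visualizations (factor 0 flattens toward uniform, larger values sharpen toward the top token), but treat the exact form as an assumption rather than any particular backend's verbatim implementation:

```python
import math

def quadratic_smoothing(logits, smoothing_factor):
    """'Smoothing factor' transform: bend logits toward the max logit.

    Each logit becomes max - k * (max - logit)^2, so k = 0 collapses
    everything to the max (uniform sampling) and large k exaggerates
    the gap to the top token (near-greedy).
    """
    m = max(logits)
    return [m - smoothing_factor * (m - x) ** 2 for x in logits]

def min_p_filter(logits, min_p):
    """Min-p: keep tokens whose probability is >= min_p * P(top token)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    cutoff = min_p * max(probs)
    return [(i, p) for i, p in enumerate(probs) if p >= cutoff]
```

With a low min-p like 0.045, tokens only need about 4.5% of the top token's probability to stay in the pool, which is why the word diversity stays high.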

Screenshot_2024-03-15_115008.jpg

Screenshot_2024-03-15_115328.jpg

Screenshot_2024-03-15_120808.jpg

(Weird dual word usage):

Screenshot_2024-03-15_122315.jpg

Owner

I have very similar findings about the model's quirks. Thank you. I'm working on V3, which should remove some of the quirks, maybe 50-70% of them. I'm also preparing a simple evaluation harness based on those quirks (GPT-isms, double wording, Author's Note leakage, etc.), but 'life' is preventing me from committing fully right now, so it's a work in progress. I'll open-source it when finished; others could potentially benefit from it too (e.g. as a last step in a merge pipeline to sanity-check a new model).
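For flavor, a harness like that could be as simple as a few string checks over a batch of generations. This is purely an illustrative sketch of the idea; the phrase patterns, heuristics, and thresholds below are mine, not the owner's actual implementation:

```python
import re

# Hypothetical quirk checks over generated text. Each returns a signal that
# could be aggregated across many generations into a per-model score.

GPTISM_PATTERNS = [
    # "shiver(s) [ran/running/sent] down his/her/their spine"
    r"shivers?(?:\s+\w+)?\s+down\s+(?:\w+\s+)?spine",
]

def count_gptisms(text):
    """Count occurrences of known stock phrases ('GPT-isms')."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE))
               for p in GPTISM_PATTERNS)

def has_double_wording(text):
    """Flag 'X and X-ing'-style near-duplicates sharing a 5+ letter stem,
    e.g. 'violated and violating'."""
    for a, b in re.findall(r"\b(\w{5,}) and (\w{5,})\b", text):
        if a != b and (a.startswith(b[:5]) or b.startswith(a[:5])):
            return True
    return False

def leaks_authors_note(text, note):
    """Crude leakage check: any 5-word fragment of the note copied verbatim."""
    words = note.split()
    return any(" ".join(words[i:i + 5]) in text
               for i in range(len(words) - 4))
```

Real leakage detection would probably want fuzzy matching rather than verbatim 5-grams, since the feedback above mentions the leaked note tends to be paraphrased.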

Cool. Take your time with stuff.
