Llama 3 coping mechanisms - Part 4

#9
by Lewdiculous - opened
LWDCLS Research org
•
edited May 3

The upcoming season. Now scattered across 4 different streaming services for your displeasure. Apologies if this tangents too hard.

This is Part 4, a direct continuation of Part 3 in this thread.

Lewdiculous pinned discussion

凸(⊙▂⊙✖ )

I was doing my "I use Arch BTW" bit, how could you.

LWDCLS Research org

BTW, there's supposed to be a way to set up Arch in WSL: https://wsldl-pg.github.io/ArchW-docs/How-to-Setup/

But I've never tried this. You can also run Arch via Docker and enable all of the virtualization stuff there.

@ABX-AI - It's a bit finicky, but it works; you can install Arch on WSL like that.
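
Besides the wsldl launcher route those docs describe, you can also register an Arch rootfs tarball directly with `wsl --import`. A minimal sketch - every name and path below is a placeholder, and you'd download the rootfs yourself:

```python
# Minimal sketch: registering an Arch rootfs as a WSL2 distro via `wsl --import`.
# Assumptions: you already have an Arch bootstrap tarball downloaded; the
# distro name and paths below are placeholders, adjust to taste.
import subprocess

DISTRO_NAME = "ArchLinux"                       # hypothetical distro name
INSTALL_DIR = r"C:\WSL\ArchLinux"               # where WSL stores the disk image
ROOTFS_TAR = r"C:\Downloads\archlinux.tar.gz"   # placeholder path to the rootfs

# `wsl --import <Distro> <InstallLocation> <FileName> --version 2` is the
# documented way to register a custom distro as WSL2.
subprocess.run(
    ["wsl", "--import", DISTRO_NAME, INSTALL_DIR, ROOTFS_TAR, "--version", "2"],
    check=True,
)

# Drop into the new distro to verify it registered.
subprocess.run(["wsl", "-d", DISTRO_NAME])
```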

I'm good on Windows 10 tbh, is there any real benefit to moving to Linux at the moment?


I can barely handle Ubuntu; I'd go insane trying to set up a non-included distro 😭

For a normal user, not really. The Linux desktop is transitioning to Wayland, and some things just don't work perfectly yet.

LWDCLS Research org

For WSL2, this is pretty convenient; you can install many distros from Docker images:

https://github.com/bostrot/wsl2-distro-manager

I'm preoccupied with training XTTS-v2 on the Baldur's Gate 3 narrator for memes.

Available here: https://huggingface.co/Nitral-AI/XTTS-V2-BG3NV-FT-ST

Now I crawl back into bed and sleep.
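
If anyone wants to poke at the fine-tune, here's a minimal inference sketch - assuming the repo follows the standard XTTS layout (a config.json next to the weights) and loads through the stock Coqui TTS API; the local paths, reference clip, and sample line are all placeholders:

```python
# Minimal sketch: running a fine-tuned XTTS-v2 checkpoint with Coqui TTS.
# Assumptions: a local clone of the repo with the usual XTTS files, and a
# CUDA GPU; all paths below are placeholders.
from TTS.api import TTS

tts = TTS(
    model_path="XTTS-V2-BG3NV-FT-ST/",                 # local clone (placeholder)
    config_path="XTTS-V2-BG3NV-FT-ST/config.json",
).to("cuda")

# XTTS clones the voice from a short reference clip of the target speaker.
tts.tts_to_file(
    text="Ah yes, another reckless adventurer poking at things best left alone.",
    speaker_wav="narrator_reference.wav",              # hypothetical reference clip
    language="en",
    file_path="narrator_out.wav",
)
```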

I-Dont-Get-It.png

I think I get it...

yeah-I-dont-get-it.png

Claude 3 Sonnet can't even get it right, which is why I think it's weird that a 9B can 😭
Screenshot_20240520-180103.png

@ABX-AI - I have to tag you; look how simple and good the reasoning is @_@ (for a little bit)
Dolphin Yi-9B

image.png
Then it goes insane :3
image.png
Edit - Base Yi-9B-Chat gets it right every time, suspiciously well, like 10 out of 10 times

image.png

I'm quite happy with the new Hermes Theta, actually. It runs giga-fast in LMS (50 t/s at Q5_K_M) and keeps answering this correctly even on regeneration of the response.
It answered correctly 7/10 times, which is not bad.

GPT-3.5 gets this wrong all the time as well; I've basically only seen models at the level of GPT-4 get it right every time. Anything below is likely to fail at least a few times out of 10.
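
Here's the kind of rough harness I'd use for this n-out-of-10 check - assuming an OpenAI-compatible local endpoint (LM Studio and koboldcpp both expose one), and with the question wording, model name, and the naive grading heuristic all as stand-ins:

```python
# Rough pass@10-style check: regenerate the same trick question N times and
# count correct answers. Assumes an OpenAI-compatible local server; the
# question, model name, and grading substring are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

QUESTION = "Which weighs more: a kilogram of feathers or a pound of steel?"
N = 10

correct = 0
for _ in range(N):
    reply = client.chat.completions.create(
        model="hermes-2-theta",  # whatever name your server exposes (placeholder)
        messages=[{"role": "user", "content": QUESTION}],
        temperature=0.8,         # keep sampling on, like a real regeneration
    ).choices[0].message.content
    # Naive grading: the correct answer is the feathers (1 kg > 1 lb).
    if "feather" in reply.lower():
        correct += 1

print(f"{correct}/{N} correct")
```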


I've been messing around with Theta too; it's impressive for its size, and when I run it through koboldcpp I see 100 t/s most of the time when the context is below 4k.
I pair it with Maid because it has a native Android app with OpenAI API support.
An interesting new model that popped up while I was messing about on LMSYS was GLM-4 (closed source).
It has flown under the radar (arXiv pages have been appearing since January), but it can answer the weight question right every time and has coding abilities similar to GPT-4.

Zhipu AI Unveils GLM-4: A Next-Generation Foundation Model on Par with GPT-4
I'm waiting to see how it scores on the leaderboard.

When it's closed source, "on par with GPT-4" is not that interesting at the end of the day, especially now that GPT-4o is out and free.

The failed reasoning in my tests with a 7B seems to revolve around determining that steel is denser than feathers and then halting there, rather than chaining through the unit conversions.
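
If it's the kilogram-of-feathers vs. pound-of-steel variant of the question (an assumption on my part), the conversion step being dropped is trivial:

```python
# The step the models skip: density is irrelevant once you compare the two
# masses in the same unit.
KG_PER_LB = 0.45359237          # exact definition of the pound

feathers_kg = 1.0               # a kilogram of feathers
steel_kg = 1.0 * KG_PER_LB      # a pound of steel, converted to kg

print(feathers_kg > steel_kg)   # True: the feathers weigh more
```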

I stumbled onto the fact that this model, which I released with little notice a couple of months back, recently got quanted by two of the current high-volume quanters. I have no idea how this happened, though it was a few days after someone came across my post about it and noted that it was a good model. This was a merge where I took a successful merge and then re-merged it with a higher-benching model, so it appears to support the meta about merging in reasoning, which I will apply to some eventual L3 merges.
https://huggingface.co/grimjim/kunoichi-lemon-royale-v2-32K-7B
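
For anyone curious what "re-merging with a higher-benching model" amounts to mechanically, here's a bare-bones weight-interpolation sketch. It illustrates the idea only; the actual merge was presumably done with proper tooling, and the model names and blend ratio below are made up:

```python
# Bare-bones linear merge of two same-architecture checkpoints: interpolate
# every matching tensor. Idea only - names and ratio are placeholders, not
# the recipe behind the released merge.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("my-successful-merge-7B")  # placeholder
donor = AutoModelForCausalLM.from_pretrained("higher-benching-7B")     # placeholder
t = 0.5  # blend ratio (made up)

donor_state = donor.state_dict()
merged = {
    name: torch.lerp(tensor.float(), donor_state[name].float(), t)
    for name, tensor in base.state_dict().items()
}

base.load_state_dict(merged)
base.save_pretrained("remerged-7B")
```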

I'd been sitting on another 7B merge, and finally got around to releasing it. Starling was never meant to be an RP model, but it seems to have helped in conjunction with Mistral v0.2.
https://huggingface.co/grimjim/cuckoo-starling-32k-7B

LWDCLS Research org
β€’
edited May 21

LLM coping mechanisms - Part 5

Looooong maaaaaan!

Lewdiculous changed discussion status to closed
Lewdiculous unpinned discussion
