I'm personally of the opinion that the larger models, especially technically proficient ones like Claude or 4o, have likely been intentionally 'broken' from storytelling as they have become much more helpful and critical in their role as co-engineers. I have personally conscripted Claude for some testing, and it's given me about 1/3 of an AI model that I basically only had to design and fix, instead of considering every detail without knowing the interactions. This lack of hallucination and skill for deterministic writing likely detracts from any creative elements present. Picture a highly autistic person with a savant-level gift for programming and logic. This person would be a genius at code, but likely poor at creative writing unless instructed. The same would be true of a synthetic mind given only factual and grounded data for much of its training, as Anthropic seems to be doing for ( obvious ) safety reasons.
Samuel L Meyers PRO
MrOvkill's activity
I asked 8 LLMs to "Tell me a bedtime story about bears and waffles."
Claude 3.5 Sonnet and GPT-4o gave me the worst stories: no conflict, no moral, zero creativity.
In contrast, smaller models were quite creative and wrote stories involving talking waffle trees and bears ostracized for their love of waffles.
Here you can see a comparison between Claude 3.5 Sonnet and NeuralDaredevil-8B-abliterated. They both start with a family of bears but quickly diverge in terms of personality, conflict, etc.
I mapped it to the hero's journey to have some kind of framework. Prompt engineering can definitely help here, but it's still disappointing that the larger models don't create better stories right off the bat.
Do you know why smaller models outperform the frontier models here?
I've been in the lab synthesizing captions, with my trusty sidekick Blip, and along the way I had an interesting idea. I thought of designing an incredibly simple model that accepts simple instruction pairs, adjective noun pairs specifically, and outputs 2d vertices.
The current implementation was written by me and then run over with Claude, not because I am incompetent, but because I recognize that tools written by experts may have more technique than my newbie self.
As with all projects, this will be updated in proportion to the feedback received. If someone's using it and wants to keep using it, I'm happy to keep working on anything. Thanks, all!
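To make the input/output shape concrete, here is a minimal sketch of the contract only: an adjective noun pair goes in, a list of 2D vertices comes out. This is not the model from the notebook. The lookup tables, the names `NOUN_SIDES`, `ADJECTIVE_SCALE`, and `vertices_for`, and the choice of regular polygons are all assumptions made up for illustration; the real thing learns the mapping instead of looking it up.

```python
import math

# Hypothetical vocabularies (assumptions, not taken from the notebook).
NOUN_SIDES = {"triangle": 3, "square": 4, "hexagon": 6}
ADJECTIVE_SCALE = {"small": 0.5, "large": 2.0}


def vertices_for(adjective, noun):
    """Return the 2D vertices of a regular polygon for an adjective noun pair.

    The noun picks the number of sides and the adjective picks the radius.
    """
    sides = NOUN_SIDES[noun]
    radius = ADJECTIVE_SCALE[adjective]
    return [
        (
            radius * math.cos(2 * math.pi * k / sides),
            radius * math.sin(2 * math.pi * k / sides),
        )
        for k in range(sides)
    ]


if __name__ == "__main__":
    for x, y in vertices_for("large", "square"):
        print(f"{x:.3f}, {y:.3f}")
```

A learned model would replace the two dictionaries with embeddings and the trigonometry with a small network, but the pair-in, vertices-out interface would look about the same.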
-<3
https://colab.research.google.com/gist/SMeyersMrOvkill/8d4686db803f6c5f43fafc1c94b1c8c6/polypathdelement.ipynb
I've been in the lab, I think one or two of you saw my furtive attempts to create a dolphinized 2b Gemma, which is still waiting for more funding. I get paid in a week.
Once that funding ran out, I dropped my last pinch of API credits to work on this:
DigitalClockwork/spatial_instruct_v1
It is an instruct database for spatial interactions with color tokens; I'm planning to tune a TBD model. I've been experimenting with Gemma, but I welcome ( smaller! ) model suggestions. If you think your favorite 0.5/0.75/1/2b can handle numbers, distances, or colors especially well, most especially community-enhanced models... I'm listening to the comments, intently!
Have a great day, and enjoy! This one was fun!
-<3
You can check out the blog post at https://huggingface.co/blog/not-lain/image-retriever and the associated space at not-lain/image-retriever .
✨ If you want to request another blog post, consider letting me know down below, or you can reach out to me through any of my social media
Happy reading!
You aren't the one flaming. The others though...
Anyway yes, it's being improved now. Been in the lab since that post. The CO-lab...
As did 'takera' author your thoughts, apparently. You're like snowflakes, each of you.
I was testing some plugins, it didn't occur to me the default installations of some of the most commonly used plugins would cause issues. I apologize for the horrifying inconvenience that you may have suffered at the hands of my blog. It does, after all, have such large and pointy teeth. Oh. Wait...
I've been playing with Claude, and we decided to tackle a real thorn in my side.
"The Truthiness Model" - Analyze arbitrary input text for "truthiness", or likelihood of containing true information according to seed text.
P.S. Yes, v1 was broken. I saw the loss rate going down and got excited. Anyway, it just needed some data and a rollback; me and Claude got WAY too carried away trying to tack on features.
Anyway, fixed now, and working! :D
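For anyone wondering what "truthiness according to seed text" could look like as an interface, here is a rough stand-in, not the trained model from the blog post. It swaps in plain bag-of-words cosine similarity, so it only measures word overlap with the seed text and says nothing about actual truth. The function names `tokenize` and `truthiness` are made up for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def truthiness(candidate, seed):
    """Score candidate against seed text with bag-of-words cosine similarity.

    Returns a value in [0, 1]. This is an assumed stand-in for a learned
    scorer: it rewards shared vocabulary, not factual agreement.
    """
    a = Counter(tokenize(candidate))
    b = Counter(tokenize(seed))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    seed = "The sky is blue."
    print(truthiness("The sky is blue today", seed))
    print(truthiness("Bears love waffles", seed))
```

A sentence that negates the seed ("the sky is not blue") still scores high here, which is exactly the kind of gap a trained model is meant to close.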
http://samuelmeyerscode.serveblog.net/?p=49
I'm so glad the data proved helpful! Keep me updated, I'm already a follower, looking forward to seeing more! As always, ask if you need anything.
You got through to me again:
https://huggingface.co/posts/MrOvkill/139983484226395
https://www.youtube.com/watch?v=6NyDkpfNfUs
I had some feedback recently, that perhaps it would be beneficial to expand upon the fallacy dataset. I took this deeply to heart, and exploded it 10x.
MrOvkill/fallacies-fallacy-base
Produced synthetically with *ALL* the Gemini models on Vertex AI.
*phew* This was a rush. I can promise over 8 hours, it might have been like 16, of straight prompt/copy/paste/fix/re-splice/fix/prompt again/chug caffeine/repeat, but we got there! Thanks for egging me on, all! I appreciate being driven to work! So much better than boredom!
Have fun!
Was seriously considering a branch-generation into n samples/workarounds for each row of that dataset. If I can automate a step w/ Gemini, I'm pretty confident I can pull it off. Want it? If so, what should n be?
I've been in the lab playing with various data formats today, and jammed out with some plain text to produce a nice list of fallacies and their solutions from Wikipedia's List of fallacies as JSONL for data processing.
Had some bumps along the way, but me and Gemini 1.5 Pro got there in the end. I must really learn to work with Gemini 1.5 Flash more effectively in future.
MrOvkill/fallacies-list-wikipedia
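If you haven't worked with JSONL before, the format is just one JSON object per line. Here is a small sketch of writing and reading it with the standard library. The field names (`name`, `description`, `solution`) and the sample rows are assumptions for illustration, not the actual schema or contents of the dataset.

```python
import json

# Hypothetical rows; the field names are assumed, not the dataset's schema.
fallacies = [
    {
        "name": "Straw man",
        "description": "Refuting a distorted version of an argument.",
        "solution": "Restate the original argument before responding.",
    },
    {
        "name": "False dilemma",
        "description": "Presenting two options as the only ones.",
        "solution": "Look for alternatives that were left out.",
    },
]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def from_jsonl(text):
    """Parse JSONL text back into a list of dictionaries."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


if __name__ == "__main__":
    text = to_jsonl(fallacies)
    print(text)
    print(from_jsonl(text) == fallacies)
```

Because `json.dumps` escapes newlines inside strings, each record stays on a single line even when a description contains line breaks.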
Enjoy!
-<3
Fixed Moondream 2 Multi-Interrogation, ( Use ZeroGPU correctly, Sam. *doink* )
Located here:
MrOvkill/moondream-2-multi-interrogation
Also, uploaded pdox-reversed to include some new fields, my bad for not putting the Paradox name in from the start. All good now.
MrOvkill/pdox-reversed
I'm so glad you like it! If you need any other datasets, I'm happy to help.
I've made a little evaluation dataset for LLMs that requires advanced and convoluted logical reasoning. It's composed of 81 unique paradoxes, with admittedly a couple in the same category ( absolutes. ) It's available here: MrOvkill/pdox
**Update**: I have upgraded the dataset to v3, ( don't worry about v2, it can be forgotten... ) and placed it in a separate repo here:
MrOvkill/pdox-reversed
Enjoy & Have fun!
-<3
Don't listen to doubt, that sounds like an excellent idea. Could you elaborate?
I was experimenting with multi-bot interactions for practical solutions, such as code synthesis and editing. This has, so far, led to many well-made templates and no working code, but I still feel a template this lovely is worthy of use. Enjoy!
https://colab.research.google.com/gist/SMeyersMrOvkill/2ed6cbc305bc5bd62fcf1f7aab15f7b9/voice_memos.ipynb