Venus 120b - version 1.0
Overview
The goal was to create a large model that is highly capable in RP/ERP scenarios. Goliath-120b is excellent for roleplay, and Venus-120b was created to test whether mixing more than two models together works just as well.
Model Details
- A result of interleaving layers of Sao10K/Euryale-1.3-L2-70B, NousResearch/Nous-Hermes-Llama2-70b, and migtissera/SynthIA-70B-v1.5 using mergekit.
- The resulting model has 140 layers and approximately 122 billion parameters.
- See mergekit-config.yml for details on the merge method used (an illustrative sketch follows this list).
- See the `exl2-*` branches for exllama2 quantizations. The 4.85 bpw quant should fit in 80GB of VRAM, and the 3.0 bpw quant should (just barely) fit in 48GB of VRAM with 4k context.
- Inspired by Goliath-120b
Warning: This model will produce NSFW content!
Results
Initial tests show that Venus-120b works well, and overall it seems comparable to Goliath-120b. Some differences I noticed:
- Venus needs lower temperature settings than Goliath. I recommend a temperature of around 0.7, and no higher than 1.0 (see the sampling sketch at the end of this section).
- Venus tends to produce longer responses than Goliath on average, probably due to the inclusion of SynthIA in the merge, which is trained to produce long chain-of-thought responses.
- Venus seems to be a bit less creative than Goliath in the prose it generates, probably due to the lack of Xwin and the inclusion of Nous-Hermes.
Keep in mind that this is all anecdotal, based on some basic tests. The key takeaway is that Venus shows Goliath is not a fluke.
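To make the temperature recommendation concrete, here is a minimal generation sketch. It assumes the transformers library and enough GPU memory for a full-precision load; the repo id is a placeholder, and exl2 users would set the same sampling values in their exllamav2-based frontend instead.

```python
# A minimal sketch, not a tested recipe: applying the recommended sampling
# settings via the transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Venus-120b-v1.0"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Your roleplay prompt here", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,   # recommended for Venus; go no higher than 1.0
    max_new_tokens=512,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```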
Other quants:
- 4.5 bpw exl2 quant provided by Panchovix: https://huggingface.co/Panchovix/Venus-120b-v1.0-4.5bpw-h6-exl2
- 4.25 bpw exl2 quant provided by Panchovix: https://huggingface.co/Panchovix/Venus-120b-v1.0-4.25bpw-h6-exl2