google/gemma-7b-it is super good!
I wasn't convinced at first, but after vibe-checking it... I'm quite impressed.
I've put together a notebook (linked below) that doubles as a lightweight framework for vibe-checking LLMs.
In this notebook, I take Gemma for a spin on a variety of prompt sets, each a dataset on the Hugging Face Hub (see the loading sketch after the list):
• [nonsensical tokens](harpreetsahota/diverse-token-sampler)
• [a conversation where I try to get some PII](harpreetsahota/red-team-prompts-questions)
• [summarization ability](lighteval/summarization)
• [instruction following](harpreetsahota/Instruction-Following-Evaluation-for-Large-Language-Models)
• [chain-of-thought reasoning](ssbuild/alaca_chain-of-thought)
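If you want to pull the same prompt sets yourself, they all load with the Hugging Face `datasets` library. A minimal sketch (the `train` split is an assumption; check each dataset card for the actual split names):

```python
from datasets import load_dataset

# Each prompt set is a dataset on the Hugging Face Hub.
# NOTE: "train" is an assumed split name; check the dataset cards.
token_prompts = load_dataset("harpreetsahota/diverse-token-sampler", split="train")
red_team_prompts = load_dataset("harpreetsahota/red-team-prompts-questions", split="train")

print(token_prompts[0])  # peek at a single prompt record
```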
I then used LangChain evaluators (with GPT-4 as judge) and tracked everything in LangSmith. The traces are public, so you can inspect every run.
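For context, the judging loop looks roughly like this. A minimal sketch, not the exact notebook code: the project name and criterion are assumptions, and depending on your LangChain version, `ChatOpenAI` may live in `langchain.chat_models` instead of `langchain_openai`:

```python
import os
from langchain.evaluation import load_evaluator
from langchain_openai import ChatOpenAI

# LangSmith tracing is switched on via environment variables;
# the project name here is hypothetical.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "gemma-vibe-check"
# os.environ["LANGCHAIN_API_KEY"] = "..."  # your LangSmith key

# GPT-4 as the judge behind an off-the-shelf criteria evaluator.
judge = ChatOpenAI(model="gpt-4", temperature=0)
evaluator = load_evaluator("criteria", criteria="helpfulness", llm=judge)

result = evaluator.evaluate_strings(
    input="Summarize the article in two sentences.",
    prediction="<Gemma's response goes here>",
)
print(result)  # e.g. {'reasoning': '...', 'value': 'Y', 'score': 1}
```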
I hope you find this helpful, and I'm certainly open to feedback, criticism, or ways to improve.
You can find the notebook here: https://colab.research.google.com/drive/1RHzg0FD46kKbiGfTdZw9Fo-DqWzajuoi?usp=sharing

Cheers!