google/gemma-7b-it is super good!
I wasn't convinced at first, but after vibe-checking it... I'm quite impressed.
I've put together a notebook (linked below) that doubles as a lightweight framework for vibe-checking LLMs.
In this notebook, I take Gemma for a spin on a variety of prompt sets, each a dataset on the Hugging Face Hub (see the loading sketch after the list):
• [nonsensical tokens](harpreetsahota/diverse-token-sampler)
• [a conversation where I try to get some PII](harpreetsahota/red-team-prompts-questions)
• [summarization ability](lighteval/summarization)
• [instruction following](harpreetsahota/Instruction-Following-Evaluation-for-Large-Language-Models)
• [chain-of-thought reasoning](ssbuild/alaca_chain-of-thought)
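If you want to pull the same prompt sets yourself, they all load with the Hugging Face `datasets` library. A minimal sketch (the `train` split is an assumption; check each dataset card for the actual split names):

```python
from datasets import load_dataset

# Each prompt set is a dataset on the Hugging Face Hub.
# NOTE: "train" is an assumed split name; check the dataset cards.
token_prompts = load_dataset("harpreetsahota/diverse-token-sampler", split="train")
red_team_prompts = load_dataset("harpreetsahota/red-team-prompts-questions", split="train")

print(token_prompts[0])  # peek at a single prompt record
```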
I then used LangChain evaluators (with GPT-4 as judge) and tracked everything in LangSmith. The traces are public, so you can inspect every run.
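For context, the judging loop looks roughly like this. A minimal sketch, not the exact notebook code: the project name and criterion are assumptions, and depending on your LangChain version, `ChatOpenAI` may live in `langchain.chat_models` instead of `langchain_openai`:

```python
import os
from langchain.evaluation import load_evaluator
from langchain_openai import ChatOpenAI

# LangSmith tracing is switched on via environment variables;
# the project name here is hypothetical.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "gemma-vibe-check"
# os.environ["LANGCHAIN_API_KEY"] = "..."  # your LangSmith key

# GPT-4 as the judge behind an off-the-shelf criteria evaluator.
judge = ChatOpenAI(model="gpt-4", temperature=0)
evaluator = load_evaluator("criteria", criteria="helpfulness", llm=judge)

result = evaluator.evaluate_strings(
    input="Summarize the article in two sentences.",
    prediction="<Gemma's response goes here>",
)
print(result)  # e.g. {'reasoning': '...', 'value': 'Y', 'score': 1}
```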
I hope you find this helpful, and I'm certainly open to feedback, criticism, or ways to improve.
You can find the notebook here: https://colab.research.google.com/drive/1RHzg0FD46kKbiGfTdZw9Fo-DqWzajuoi?usp=sharing

Cheers!