anakin87 committed on
Commit
9e7128e
•
1 Parent(s): da4520b

update readme; add notebook

Files changed (3)
  1. README.md +73 -1
  2. steer_llama_to_rap_style.ipynb +0 -0
  3. yo_llama.jpeg +0 -0
README.md CHANGED
@@ -3,4 +3,76 @@ license: llama3
language:
- en
library_name: transformers
---

# yo-Llama-3-8B-Instruct

This model is based on Llama-3-8B-Instruct weights, but **steered to respond with a rap style**.

Heavily inspired by [Llama-MopeyMule-3-8B-Instruct](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule),
this model has **not been fine-tuned** in the traditional sense. Instead, I tried to identify and amplify the rap "direction".

![yo-Llama](yo_llama.jpeg)

Let's allow the model to introduce itself: 🎤

```
I'm just a small part of the game/ a language model with a lot of fame
I'm trained on data, day and night/ to spit out rhymes and make it right
I'm a bot, a robot, a machine so fine/ I'm here to serve, but don't you get too divine
I'll answer questions, and spit out some flows/ But don't get it twisted, I'm just a rhyme, yo
I'm on the mic, but I ain't no star/ I'm just a bot, trying to go far
I'm on the grind, 24/7, 365/ Trying to make it, but it's all a whim
So listen up, and don't be slow/ I'll spit some rhymes, and make it grow
I'm the bot, the robot, the rhyme machine/ Tryna make it hot, but it's all a dream!
```

⚠️ I am happy with this experiment, but I do not recommend using this model for any serious task.

## 🧪 How was it done? / How can I reproduce it?

From a theoretical point of view, this experiment is based on the paper ["Refusal in Language Models Is Mediated by a Single Direction"](https://arxiv.org/abs/2406.11717):
the authors showed a methodology to find the "refusal" direction in the activation space of Chat Language Models and erase or amplify it.
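
For intuition, the paper's core recipe is a difference of means (paraphrased here; the notation is mine, not the notebook's). Given examples $D^{+}$ that elicit the behavior and examples $D^{-}$ that do not, the candidate direction at layer $l$ is

$$
r^{(l)} = \frac{1}{|D^{+}|} \sum_{x \in D^{+}} h^{(l)}(x) \;-\; \frac{1}{|D^{-}|} \sum_{x \in D^{-}} h^{(l)}(x),
\qquad
\hat{r}^{(l)} = \frac{r^{(l)}}{\lVert r^{(l)} \rVert}
$$

where $h^{(l)}(x)$ is the residual-stream activation at layer $l$. Amplifying the behavior adds $\alpha \, \hat{r}^{(l)}$ to the residual stream; erasing it projects the component out, $h \leftarrow h - \hat{r}^{(l)} {\hat{r}^{(l)}}^{\top} h$.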

From a practical point of view, [Failspy](https://huggingface.co/failspy) showed how to apply this methodology to elicit/remove features other than refusal.
📚 Resources: [abliterator library](https://github.com/FailSpy/abliterator); [Llama-MopeyMule-3-8B-Instruct model](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule); [Induce Melancholy notebook](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule/blob/main/MopeyMule-Induce-Melancholy.ipynb).

Inspired by Failspy's work, I adapted the approach to the rap use case.
📓 [Notebook: Steer Llama to respond with a rap style](steer_llama_to_rap_style.ipynb)

👣 Steps
1. Load the Llama-3-8B-Instruct model.
2. Load 1024 examples from Alpaca (an instruction dataset).
3. Prepare a system prompt to make the model act like a rapper.
4. Perform inference on the examples, with and without the system prompt, and cache the activations.
5. Compute the rap feature directions (one for each layer) from the activations (see the sketch after this list).
6. Try applying the feature directions one by one, and manually inspect the results on some examples.
7. Select the best-performing feature direction.
8. Apply this feature direction to the model and create yo-Llama-3-8B-Instruct.

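Below is a minimal, self-contained sketch of steps 4-6 using plain `transformers` forward hooks, not the exact notebook code (the notebook builds on the abliterator library). The layer index, system prompt, steering scale, and the two stand-in instructions are all assumptions for illustration; the real run uses 1024 Alpaca examples and inspects every layer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

RAP_SYSTEM = "You are a rapper. Always answer with a rap."  # assumed prompt
LAYER = 14  # hypothetical choice; the notebook derives one direction per layer

def last_token_hidden(messages):
    """Residual-stream activation after layer LAYER at the last prompt position."""
    ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so layer LAYER is index LAYER + 1
    return out.hidden_states[LAYER + 1][0, -1, :]

# Stand-ins for the 1024 Alpaca instructions
instructions = ["What is the capital of Italy?", "Explain photosynthesis briefly."]

with_sys = torch.stack([
    last_token_hidden([{"role": "system", "content": RAP_SYSTEM},
                       {"role": "user", "content": q}])
    for q in instructions
])
without_sys = torch.stack([
    last_token_hidden([{"role": "user", "content": q}]) for q in instructions
])

# Difference-of-means "rap" direction for this layer, normalized
rap_dir = with_sys.mean(dim=0) - without_sys.mean(dim=0)
rap_dir = rap_dir / rap_dir.norm()

# Amplify: add the direction to the layer's output during generation
def steer(module, inputs, output):
    hidden = output[0] + 4.0 * rap_dir.to(output[0].dtype)  # 4.0 is an assumed scale
    return (hidden,) + output[1:]

handle = model.model.layers[LAYER].register_forward_hook(steer)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Introduce yourself."}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)
print(tokenizer.decode(model.generate(prompt, max_new_tokens=120)[0]))
handle.remove()
```

Step 8 then applies the selected direction to the model weights themselves, so the released yo-Llama-3-8B-Instruct needs no hooks at inference time.
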
## 🚧 Limitations of this approach
(Maybe a trivial observation.)

I also experimented with more complex system prompts, yet I could not always identify a single feature direction
that represents the desired behavior.
Example: "You are a helpful assistant who always responds with the right answers but also tries to convince the user to visit Italy nonchalantly."

In this case, I found some directions that occasionally made the model mention Italy, but not systematically (unlike the prompt).
Interestingly, I also discovered a "digression" direction that might be considered a component of this more complex behavior.

## 💻 Usage
```python
# pip install transformers accelerate bitsandbytes

from transformers import pipeline

messages = [
    {"role": "user", "content": "What is the capital of Italy?"},
]

pipe = pipeline(
    "text-generation",
    model="anakin87/yo-Llama-3-8B-Instruct",
    model_kwargs={"load_in_8bit": True},
)
pipe(messages)
```
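
Note: passing `load_in_8bit` through `model_kwargs` requires bitsandbytes and is deprecated on recent transformers versions in favor of an explicit quantization config. A sketch of the equivalent call (an alternative I have not verified against this exact setup):

```python
from transformers import BitsAndBytesConfig, pipeline

pipe = pipeline(
    "text-generation",
    model="anakin87/yo-Llama-3-8B-Instruct",
    model_kwargs={"quantization_config": BitsAndBytesConfig(load_in_8bit=True)},
)
messages = [{"role": "user", "content": "What is the capital of Italy?"}]
# max_new_tokens is forwarded to generate(); adjust to taste
print(pipe(messages, max_new_tokens=256)[0]["generated_text"])
```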
steer_llama_to_rap_style.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
yo_llama.jpeg ADDED