Abstract
In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector θ(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
Community
The results here are about understanding and communicating about ICL, so this paper probably won't get as many upvotes as a paper with a more direct application would.
I think that's a pity, because this paper is great: it builds confidence in how we communicate about ICL and in how we might specialize systems to respond well to it.
I think this is super interesting. I love it when people discover this sort of hidden mechanism inside models. My summary...
A new paper provides some insight into how in-context learning works in LLMs. This study proposes and provides evidence for an elegant structure within the in-context learning process.
The models appear to create a "task vector" that encapsulates the core logic from the demonstration examples, in a way that is independent of any specific query. This vector serves as a compressed representation of the task.
A separate component then takes this task vector and a new query as inputs to generate the output, without directly referencing the original examples.
In essence:
Output = Apply(query, Learn(examples))
Where "Learn" derives the task vector from the examples, and "Apply" utilizes the vector and query to produce the output.
The researchers validated this hypothesis by testing major public models on diverse tasks such as translation and algorithmic reasoning. Key findings:
- Isolating the Learn and Apply components maintained high accuracy, demonstrating the viability of the separation.
- Task vectors clustered by task and remained consistent within tasks, indicating they encode meaningful task representations.
- Injecting another task's vector into the model caused it to override contradictory examples and follow the vector, highlighting the vector's dominance.
- The vectors induced task-relevant token distributions even when those tokens did not appear in the demonstrations, suggesting semantic encoding of the task.
Taken together, these results provide substantial evidence that in-context learning involves creating a task vector that encapsulates the examples' logic to then guide behavior on new queries.
While open questions remain regarding implementation details, this is a significant step towards demystifying an interesting AI capability.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Function Vectors in Large Language Models (2023)
- Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions (2023)
- The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning (2023)
- Do pretrained Transformers Really Learn In-context by Gradient Descent? (2023)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations (2023)
Interesting find. Hard prompts are still important for the no-training case.
Would it be possible to precompute some of this to accelerate inference? It would be interesting to see the code.
We're in the process of cleaning up the code and plan to release it shortly. Stay tuned!
Yes, that's what we're pointing to in our paper when we mention that our findings may have practical implications for the efficient adaptation of LLMs to perform specific tasks.
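For example, building on the hypothetical learn_task_vector / apply_task_vector sketch earlier in this thread (same assumptions: a stand-in model, illustrative layer and prompt format), the expensive pass over the demonstrations would run only once and the resulting vector could be cached and reused across many queries:

```python
# Reusing the hypothetical learn_task_vector / apply_task_vector sketch above:
# the demonstrations are processed once, and each subsequent query is a short
# forward pass with no demonstrations in its context.
demos = [("France", "Paris"), ("Japan", "Tokyo"), ("Italy", "Rome")]
theta = learn_task_vector(demos)  # precompute once; could be cached to disk
for query in ["Spain", "Canada", "Egypt"]:
    print(query, "->", apply_task_vector(theta, query))
```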
How In-Context Learning Creates Task Vectors: Explained!
Links:
- Subscribe: https://www.youtube.com/@Arxflix
- Twitter: https://x.com/arxflix
- LMNT (Partner): https://lmnt.com/