Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Abstract
Algorithmic reasoning refers to the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps towards the solution. This nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use programming languages (e.g., Python) to express the necessary logic for solving a given instance/question (e.g., Program-of-Thought), inspired by their strict and precise syntax. However, it is non-trivial to write executable code that expresses the correct logic on the fly within a single inference call. Also, code generated specifically for one instance cannot be reused for others, even if they come from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In Think, we discover the task-level logic that is shared across all instances of a given task and express that logic in pseudocode; (2) In Execute, we further tailor the generated pseudocode to each instance and simulate the execution of the code. With extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach improves LMs' reasoning more than several strong baselines that perform instance-specific reasoning (e.g., CoT and PoT), suggesting the helpfulness of discovering task-level logic. We also show that pseudocode can guide the reasoning of LMs better than natural language, even though they are trained to follow natural-language instructions.
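To make the two phases concrete, here is a minimal sketch of how the Think-and-Execute flow could be wired up. `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompt wording is illustrative rather than the paper's actual templates.

```python
# Minimal sketch of the two-phase Think-and-Execute flow described above.
# `call_llm` is a hypothetical placeholder for any chat-completion API;
# the prompts are illustrative, not the paper's exact templates.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's API."""
    raise NotImplementedError


def think(task_description: str, example_instances: list[str]) -> str:
    """THINK: run once per task to obtain task-level pseudocode."""
    prompt = (
        "Write Python-style pseudocode that solves every instance of this task.\n\n"
        f"Task: {task_description}\n\nExample instances:\n"
        + "\n".join(example_instances)
    )
    return call_llm(prompt)


def execute(pseudocode: str, instance: str) -> str:
    """EXECUTE: tailor the pseudocode to one instance and simulate its run."""
    prompt = (
        "Act as a compiler: simulate executing the pseudocode below on the "
        "given input, tracking intermediate variables, then state the final answer.\n\n"
        f"Pseudocode:\n{pseudocode}\n\nInput:\n{instance}"
    )
    return call_llm(prompt)


# The pseudocode is generated once per task and reused for every instance:
# pseudocode = think("Determine whether the last person tells the truth.", examples)
# answers = [execute(pseudocode, x) for x in instances]
```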
Community
Promising work!
I have one question that everyone probably has: what is the exact difference between this work and "Chain of Code"?
Thank you for your interest in our work! That's a very good point. The main difference between our work and Chain-of-Code is that we incorporate task-level pseudocode that can be applied broadly to the instances of a task, while Chain-of-Code uses pseudocode that is specialized to each instance. We would also like to highlight that task-level pseudocode enables transfer to smaller LMs: an LLM is needed only once per task to generate the task-level pseudocode, which a smaller LM then tailors to each instance. Additionally, we provide a detailed comparison with Chain-of-Code in our experiments and the related work section. Please let me know if you have any further questions!
Using CoC, could you please provide a prompt that solves this problem:
"Kayley has three brothers. Each of her brothers has two sisters. How many sisters does Kayley have?"
@hyungjoochae - Could you please add the above problem to your GitHub as well.
@Shure-Dev
If you can also provide a CoC example for this, that would be great as well.
""Read an given input text and answer the question in the
input text with "Yes" or "No".
Input Text:
Vina tells the truth. Helene says Vina lies. Kandi says
Helene tells the truth. Jamey says Kandi lies. Ka says
Jamey lies. Does Ka tell the truth?"
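As a rough illustration of the task-level logic the paper's Think step would capture for this kind of truth-teller task (this is not the authors' pseudocode, and the statement encoding below is a hypothetical choice), written as runnable Python for clarity:

```python
# Hypothetical task-level logic for the truth-teller task above; Think-and-Execute
# would express this as pseudocode in the prompt and have the LM simulate it.

def solve_truth_teller(statements):
    """statements: list of (speaker, target, claims_truth).
    The first speaker asserts their own honesty (target is None);
    claims_truth is True for 'says X tells the truth', False for 'says X lies'."""
    honest = {}
    for speaker, target, claims_truth in statements:
        if target is None:
            honest[speaker] = True
        else:
            # A speaker is honest iff their claim matches the target's actual status.
            honest[speaker] = (honest[target] == claims_truth)
    last_speaker = statements[-1][0]
    return "Yes" if honest[last_speaker] else "No"


statements = [
    ("Vina", None, True),
    ("Helene", "Vina", False),    # Helene says Vina lies
    ("Kandi", "Helene", True),    # Kandi says Helene tells the truth
    ("Jamey", "Kandi", False),    # Jamey says Kandi lies
    ("Ka", "Jamey", False),       # Ka says Jamey lies
]
print(solve_truth_teller(statements))  # -> "No"
```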
I personally found that ALL the magic happens in the prompt:

So you can teach the model to use a calculator as a tool, BUT... a non-external tool! It is a neural network. In the prompt we place the function (in Python) that produces the calculation and returns the result, i.e. inside its internal thoughts: this region is its scratch pad for thinking.

We can add even more complex functions, such as n-grams and tokenization, giving example inputs and the expected outputs produced by the function. By introducing functions internally, the model will generate results matching the expected outputs of these functions: visually it would seem as though it is executing these functions internally.

For processes on which a particular function was trained, i.e. tokenization and n-grams etc., we merely have to ask and it will produce the result via that method! Hence the need for external functions has already passed.

To interact with the system, yes, we still need external Pydantic / Instructor to allow for these outputs / templated outputs, which can be caught and executed (a rough sketch of such an in-prompt function is shown below).

So yes, I hear you, but it's not a paper! It's old news and old ideas, and we have already begun doing this. Funny how papers come out about stuff we are already doing! Obviously, that is how far behind the current knowledge universities are!
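For concreteness, here is a minimal sketch of the kind of in-prompt "internal tool" described in the comment above; `call_llm` is a hypothetical stand-in for any chat-completion API, and the function and examples are illustrative:

```python
# Minimal sketch of the in-prompt "internal tool" idea: the function definition
# and worked examples go into the prompt, and the model is asked to simulate the
# function on a new input without any external interpreter.
# `call_llm` is a hypothetical placeholder for any chat-completion API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # replace with your provider's API call


IN_PROMPT_TOOL = '''
You have the following tool. Apply it in your scratch pad and return only its output.

def char_ngrams(text, n):
    return [text[i:i+n] for i in range(len(text) - n + 1)]

Examples:
char_ngrams("cat", 2)  -> ["ca", "at"]
char_ngrams("door", 3) -> ["doo", "oor"]

Now compute: char_ngrams("think", 3)
'''

# Expected completion: ["thi", "hin", "ink"] -- the model simulates the
# function internally rather than calling an interpreter.
# print(call_llm(IN_PROMPT_TOOL))
```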