Step-Back Prompting: Let's Make LLMs Think About Questions

Unveiling a New Horizon in NLP: Enhancing Reasoning through Abstraction and Human-Inspired Techniques

Generated by MidJourney (https://midjourney.com/)

In the ever-evolving landscape of Natural Language Processing (NLP), the introduction of new methodologies and techniques plays a crucial role in pushing the boundaries of what is possible. Today, I am excited to delve into a paper that stands at the forefront of this innovation, exploring a novel technique named STEP-BACK PROMPTING, designed to enhance the multi-step reasoning capabilities of Transformer-based Large Language Models (LLMs). This paper caught my attention for its human-centric approach to problem-solving.

Tackling the Challenge of Complex Reasoning with Step-Back Prompting

Complex multi-step reasoning remains a formidable challenge in NLP, even amid the significant advancements we have witnessed. Existing techniques such as process supervision and Chain-of-Thought prompting have made strides in addressing it, but they fall short of fully overcoming the hurdles.

Step-Back Prompting is proposed as a response to this challenge. It introduces a two-step process of abstraction and reasoning: the model first asks itself a higher-level "step-back" question about the concepts or principles underlying the original question, and then answers the original question grounded in those principles. This is designed to mitigate errors in intermediate reasoning steps and enhance the overall performance of LLMs, and it stands out for its ability to ground complex reasoning tasks in high-level concepts rather than surface details.
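To make the two-step flow concrete, here is a minimal sketch of how one might wire it up. This is my own illustration, not code from the paper: the `llm` callable is a hypothetical stand-in for any model API that maps a prompt string to a completion string, and the exact prompt wording is an assumption.

```python
def step_back_prompt(question, llm):
    """Two-step Step-Back Prompting.

    `llm` is any callable mapping a prompt string to a completion
    string -- a hypothetical stand-in for a real model API.
    """
    # Step 1: abstraction -- derive a higher-level "step-back" question
    # about the principle behind the original question.
    step_back_question = llm(
        "Rewrite the following question as a more generic question "
        f"about the underlying concept or principle:\n{question}"
    )
    # Answer the abstract question to surface the relevant principles.
    principles = llm(step_back_question)
    # Step 2: reasoning -- answer the original question grounded in
    # the high-level principles retrieved above.
    return llm(
        f"Principles: {principles}\n"
        f"Using the principles above, answer: {question}"
    )


# Toy stand-in "model" so the sketch runs without an API key.
def toy_llm(prompt):
    if prompt.startswith("Rewrite"):
        return "Which physics principle governs this situation?"
    if prompt.startswith("Which physics"):
        return "The ideal gas law: PV = nRT."
    return "By PV = nRT, the pressure drops by a factor of 16."


answer = step_back_prompt(
    "What happens to the pressure of an ideal gas if temperature "
    "is halved and volume is quadrupled?",
    toy_llm,
)
print(answer)
```

Swapping `toy_llm` for a real model client is all that is needed; the structure of the two calls is the essence of the technique.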

Visual example of Step-Back Prompting

Empirical Validation and Performance Gains with Step-Back Prompting

The paper provides a robust empirical validation of the Step-Back Prompting technique. The TimeQA test set, known for its challenging nature, serves as the benchmark for this evaluation.

The results are telling. While baseline models like GPT-4 and PaLM-2L reflect the challenging nature of the TimeQA task with accuracies of 45.6% and 41.5% respectively, the introduction of Step-Back Prompting, especially when combined with retrieval augmentation (RAG), marks a significant leap in performance, achieving an accuracy of 68.7%. This is not just an improvement over the baseline models, but also a substantial enhancement compared to other prompting techniques such as Chain-of-Thought and Take a Deep Breath.

The technique’s proficiency becomes even more evident when diving into different difficulty levels of the TimeQA task. While all models struggled more with “Hard” questions, Step-Back Prompting, in conjunction with RAG, showcased a remarkable ability to navigate these complex scenarios, significantly improving performance.

Results of the paper

Conclusion and Future Directions

The paper concludes by positioning Step-Back Prompting as a generic and effective method to invoke deeper reasoning capabilities in large language models through the process of abstraction. The extensive experimentation across diverse benchmark tasks bears testament to the technique’s ability to significantly enhance model performance.

The underlying hypothesis is that abstraction enables models to minimize hallucinations and reason more accurately, tapping into their intrinsic capabilities that often remain hidden when responding to detailed questions directly. This paper is not just a presentation of a novel technique; it’s a call to action, urging the community to embrace human-inspired approaches and unlock the true potential of large language models.

The journey of enhancing the reasoning abilities of large language models is far from over. However, the introduction of Step-Back Prompting marks a significant stride forward, providing a solid foundation and a clear direction for future advancements in the field of Natural Language Processing.

Link to the Paper

Matteo Villosio
AI Lead and Trail Runner

Matteo Villosio is AI Lead at Tinexta Group, where he conceived and launched LextelAI, now Italy’s leading AI assistant for lawyers and legal professionals, and is currently advancing large‑language‑model and agent‑based solutions across the group’s businesses.

In parallel, he co‑founded DatAIMed and drives its AI vision, orchestrating autonomous‑agent pipelines and a multi‑collection MongoDB vector database that indexes more than 150 million scientific papers to deliver real‑time, bias‑checked clinical insights. In this role he recruits and mentors high‑performance AI teams, forges collaborations with hospitals, CROs and universities, and aligns product strategy with clinical and market needs.

Earlier, as the first Data Scientist at Greenomy, Matteo built the firm’s inaugural deep‑NLP system and earned top honours at the Swift Hackathon. He has designed machine‑learning solutions for audit analytics at Generali and data‑engineering pipelines at Flowe, conducted large‑scale social‑media research at SmartData@PoliTO, and led projects at the NGO FAWLTS to narrow the education‑to‑employment gap.

Matteo also serves as a member of GlobalAI, the Swiss‑based non‑profit that represents AI stakeholders before the United Nations and other international bodies, promoting the responsible, sustainable and ethical development of artificial intelligence worldwide.