LateralBench Leaderboard
LateralBench is a multi-turn benchmark that measures lateral thinking, self-awareness, the ability to link disparate subjects, and strategic thinking.
Models are given 100 questions, each of which may not contain enough information to determine a single correct answer. For each question they have two options: request a hint or answer. They can request up to 5 hints per question, and the hints become increasingly obvious. A correct answer earns 6 − (number of hints used) points; an incorrect answer earns 0 points for that question. Models are told this scoring scheme, which encourages them to decide strategically how many hints they need before they can answer confidently.
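The trade-off between answering early and taking another hint can be made concrete with expected value. The sketch below is illustrative only: it assumes a model can estimate its own probability of being correct (`p_correct`, a hypothetical quantity that the benchmark does not report), and the helper names are not part of LateralBench.

```python
MAX_HINTS = 5    # hints allowed per question
BASE_POINTS = 6  # a correct answer with 0 hints earns 6 points


def points(hints_used: int, correct: bool) -> int:
    """Points for one question under the stated rule: 6 - hints if correct, else 0."""
    if not 0 <= hints_used <= MAX_HINTS:
        raise ValueError("hints_used must be between 0 and 5")
    return BASE_POINTS - hints_used if correct else 0


def expected_points(hints_used: int, p_correct: float) -> float:
    """Expected points for answering now, given a hypothetical confidence p_correct."""
    return p_correct * (BASE_POINTS - hints_used)


# Hypothetical example: answering immediately at 50% confidence versus
# taking one hint that raises confidence to 90%.
answer_now = expected_points(0, 0.5)        # 0.5 * 6 = 3.0
answer_after_hint = expected_points(1, 0.9)  # 0.9 * 5 = 4.5
```

Under these assumed numbers, taking the hint is the better strategy; with a confidence of 0.9 before any hint, answering immediately (expected 5.4) would win instead.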
A perfect score of 600 requires answering every question correctly without using any hints. Scores are then normalized to a percentage of this maximum.
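A minimal sketch of the total and normalized score, assuming each question's outcome is recorded as a `(hints_used, correct)` pair. The function names and this result format are hypothetical, not taken from the LateralBench implementation.

```python
BASE_POINTS = 6  # maximum points per question (0 hints, correct)


def points(hints_used: int, correct: bool) -> int:
    """Points for one question: 6 - hints if correct, else 0."""
    return BASE_POINTS - hints_used if correct else 0


def total_score(results: list[tuple[int, bool]]) -> int:
    """Sum points over all questions; results holds (hints_used, correct) pairs."""
    return sum(points(h, c) for h, c in results)


def normalized(score: int, num_questions: int = 100) -> float:
    """Express a raw score as a percentage of the maximum (6 * num_questions)."""
    return 100.0 * score / (BASE_POINTS * num_questions)


# Every question correct with no hints: 100 * 6 = 600 points -> 100%.
perfect = total_score([(0, True)] * 100)

# 50 correct with 1 hint each (5 points) and 50 incorrect: 250 points -> ~41.7%.
mixed = total_score([(1, True)] * 50 + [(0, False)] * 50)
```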
To minimize contamination, LateralBench uses a private question set. Although the questions must necessarily be sent to provider APIs, the answers are never sent to API providers. Below are two sample questions. The first is significantly easier than the benchmark questions and is meant as an approachable illustration of their style. Below that is one of the actual benchmark questions.