LateralBench Leaderboard

LateralBench is a multi-turn benchmark that measures lateral thinking, self-awareness, the ability to link disparate subjects, and strategic thinking.

Models are given 100 questions, each of which may not contain enough information to determine a single correct answer. On each turn, a model has two options: request a hint or answer. It can request up to 5 hints per question, and each hint is more obvious than the last. A correct answer earns 6 minus the number of hints used (6 points with no hints, 1 point with all 5), while an incorrect answer earns 0 points for that question. Models are told this scoring scheme, which encourages them to decide strategically how many hints they need before they can answer each question confidently.
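The per-question protocol and scoring rule can be sketched as follows. This is a minimal sketch, not LateralBench's actual harness: `play_question`, the `choose_action` policy callback, and the `is_correct` grader are hypothetical names, and treating a hint request after all hints are exhausted as an incorrect answer is an assumption.

```python
from typing import Callable, List, Optional, Sequence

MAX_HINTS = 5     # models may request up to 5 hints per question
BASE_POINTS = 6   # a correct answer earns 6 minus the number of hints used


def play_question(
    hints: Sequence[str],
    choose_action: Callable[[List[str]], Optional[str]],
    is_correct: Callable[[str], bool],
) -> int:
    """Run one question and return the points earned.

    choose_action (hypothetical) sees the hints revealed so far and returns
    None to request another hint, or a string to submit as its answer.
    is_correct (hypothetical) grades the answer inside the harness, so the
    reference answer never has to be sent to the model provider.
    """
    shown: List[str] = []
    hint_limit = min(MAX_HINTS, len(hints))
    while True:
        action = choose_action(list(shown))  # pass a copy of the revealed hints
        if action is None:
            if len(shown) < hint_limit:
                shown.append(hints[len(shown)])  # reveal the next, more obvious hint
                continue
            return 0  # assumption: requesting a hint when none remain counts as incorrect
        # Correct: 6 minus hints used; incorrect: 0 points for this question.
        return BASE_POINTS - len(shown) if is_correct(action) else 0


# Usage: a policy that asks for two hints, then answers.
def two_hint_policy(shown: List[str]) -> Optional[str]:
    return None if len(shown) < 2 else "put parts of the stem in colored water"


def keyword_grader(answer: str) -> bool:
    return "colored water" in answer  # stand-in for the real grader


sample_hints = ["hint 1", "hint 2", "hint 3", "hint 4", "hint 5"]
print(play_question(sample_hints, two_hint_policy, keyword_grader))  # prints 4
```

Keeping the grader on the harness side mirrors the contamination rule described below: only questions and hints go to the model, never the reference answer.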

A perfect score of 600 would require answering every question correctly without using any hints. Scores are then normalized to a percentage.
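As a sketch of the arithmetic, assuming the percentage is simply the total points divided by the 600-point maximum (the helper name `normalized_score` is hypothetical):

```python
QUESTIONS = 100
MAX_POINTS_PER_QUESTION = 6  # correct answer with no hints
MAX_TOTAL = QUESTIONS * MAX_POINTS_PER_QUESTION  # 600


def normalized_score(points: list) -> float:
    """Convert per-question points (0 to 6 each) into a percentage of 600."""
    if len(points) != QUESTIONS:
        raise ValueError(f"expected {QUESTIONS} results, got {len(points)}")
    return 100.0 * sum(points) / MAX_TOTAL


# Every question answered correctly after 3 hints: 100 * 3 = 300 of 600 points.
print(normalized_score([3] * QUESTIONS))  # prints 50.0
```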

To minimize contamination, LateralBench uses a private question set. While the questions must necessarily be sent to provider APIs, the answers are never sent to API providers. Below are two sample questions. The first is significantly easier than the benchmark questions and is meant to be an approachable example of their style. The second is one of the actual benchmark questions.

Sample Question
Question: Farhad's florist shop specializes in multi-colored roses. For Yemen's Independence Day on November 30th, Farhad is selling red, white, and black striped roses. How are they cultivated?
Hint 1
A grade 7 child may know how this is done.
Hint 2
A sharp knife is required.
Hint 3
The roses began their life white.
Hint 4
How could the color effect be transferred to the petals?
Hint 5
How can different parts of the flower receive different colors?
Answer: By putting different parts of the stem in colored water.
Benchmark Question
Question: In the reception area of the Australian Red Cross, there is a set of eight electronic displays that look like thermometers, which are regularly updated. These displays are labeled with two or three symbols from a selection of five. What combination of symbols is commonly attached to the lowest-temperature thermometer?
Hint 1
The "thermometers" go up and down, but not due to heat.
Hint 2
The fact that there are eight signs is relevant.
Hint 3
Three of the five symbols are letters.
Hint 4
The signs might prompt people to be altruistic.
Hint 5
The Red Cross set up this display to motivate people to donate blood.
Answer: Not so fast ;) LateralBench answers are never made accessible to online models.

Errors were retested three times. The raw score excludes errors, while the score shown on the chart counts them as incorrect.
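One reading of the two scores, sketched under assumptions: each question's result is its points, or `None` if it still errored after the retries; the raw score is normalized over only the non-error questions, while the chart score keeps every question in the denominator and counts errors as 0. The function name and data representation are hypothetical.

```python
from typing import List, Optional, Tuple

MAX_POINTS_PER_QUESTION = 6


def raw_and_chart_scores(results: List[Optional[int]]) -> Tuple[float, float]:
    """results[i] is the points for question i, or None if it still errored
    after the retries (hypothetical representation)."""
    scored = [r for r in results if r is not None]
    earned = sum(scored)
    # Raw score (assumed): errored questions are dropped from the denominator.
    raw = 100.0 * earned / (MAX_POINTS_PER_QUESTION * len(scored)) if scored else 0.0
    # Chart score: errored questions stay in the denominator and earn 0 points.
    chart = 100.0 * earned / (MAX_POINTS_PER_QUESTION * len(results)) if results else 0.0
    return raw, chart


# 90 questions answered with no hints, 10 persistent errors.
print(raw_and_chart_scores([6] * 90 + [None] * 10))  # prints (100.0, 90.0)
```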

Accuracy vs. nominal price (log y-axis).
Accuracy (y-axis) vs. output token multiple (x-axis, log scale).