There’s this common wisdom: you can often get ~80% of the result with ~20% of the effort. And sometimes 80% is enough. Search is one of those domains where 80% usually isn’t.

Think about it: A lot of online marketing was built on a “small” but crucial difference. Being in the top 3 search results makes a whole lot more difference than being #8 to #10.

That’s a lens we can use for retrieval too: many systems can get you “in the results”, but the user experience often rests on what surfaces first.


In the past months, we’ve seen a recurring failure mode for LLM-based AI applications:

The right document/information is often somewhere in the results, but not near the top.

One characteristic of LLM-driven knowledge work is very high throughput. When humans are in the loop, they focus mostly on the first few results. And in the rare cases where humans can be taken out of the loop, automated workflows tend to behave the same way: system throughput is much higher than a human’s, but quality expectations scale at least as fast. Computational constraints then push workflows either to focus on the top results, or to rely on ad-hoc implementations that ensure quality at the cost of added complexity. That complexity, in turn, makes solutions fragile and hard to extend.

For product engineers, this feels like it boils down to choosing between:

  • “It’s in there somewhere, but not quite there, and that’s… meh.”
  • “I can make it work but how well will we maintain/extend it?”

This is where rerankers come into the picture: they can contribute significantly to moving from 80% closer to 99%.


Core concept: retrieve broad, then rerank narrow

Reranking is a two-step pattern:

  1. First pass (retrieve): do something fast to pull a shortlist of “maybe relevant” results.
  2. Second pass (rerank): do something more thoughtful to put the best ones at the top.

(It’s called re-ranking because you’re ranking results that were already ranked once.)

The key practical constraint in this setup is that the second step costs significantly more per result than the first. So you usually want to run it only on a shortlist. This all fits the 80-20 picture: past the 80%, you run into steeper diminishing returns, and each improvement costs more. Whether that cost is worth paying needs to be evaluated case by case.
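The two-pass pattern is easiest to see in miniature. Here’s a toy sketch: the scoring functions below are stand-ins (word overlap for the fast pass, a slightly costlier positional score pretending to be a cross-encoder for the second), not anyone’s production scorers.

```python
# Toy two-pass retrieval: a cheap first pass pulls a shortlist,
# then a costlier scoring function reorders only that shortlist.

DOCS = [
    "reranking puts the best results first",
    "fast retrieval pulls a broad candidate set",
    "unrelated note about office coffee",
    "rerankers score query-document pairs carefully",
]

def cheap_score(query: str, doc: str) -> int:
    # First pass: crude word-overlap count (fast, imprecise).
    return len(set(query.split()) & set(doc.split()))

def expensive_score(query: str, doc: str) -> float:
    # Second-pass stand-in: pretend this is a cross-encoder call.
    # Here we just weight matching terms by how early they appear.
    words = doc.split()
    hits = [i for i, w in enumerate(words) if w in query.split()]
    return sum(1.0 / (1 + i) for i in hits)

def search(query: str, shortlist_size: int = 3) -> list[str]:
    # Retrieve broad (cheap), then rerank narrow (expensive).
    shortlist = sorted(
        DOCS, key=lambda d: cheap_score(query, d), reverse=True
    )[:shortlist_size]
    return sorted(
        shortlist, key=lambda d: expensive_score(query, d), reverse=True
    )

results = search("reranking results first")
```

The point of the structure is the cost asymmetry: `expensive_score` only ever runs on `shortlist_size` documents, no matter how large `DOCS` grows.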

Reranking in a real pipeline

In a real retrieval pipeline, it often pays to tackle some concerns before you even think about ordering:

  • Basic checks and cleanup (typos, normalization)
  • Interpreting the query (entities, intent)
  • Broadening or relaxing it when needed (expansion / relaxation)

Then:

  • Do your fast retrieval step to gather candidates.
  • Apply a final ordering layer, reranking, so the best candidates rise to the top.

(If you take one idea from this post: treat reranking as the last mile of relevance.)


GreenPT exposes reranking as an API, so you can drop it into whatever retrieval you already have.

The practical win is that you can simply try reranking and see whether you find the returns worth the cost.
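Dropping a rerank API into an existing pipeline mostly means one HTTP call. The sketch below is hedged: the URL, payload field names, and response shape are assumptions in the style of common rerank APIs, not GreenPT’s actual contract — check the Reranker API reference for the real one.

```python
# Hedged sketch of calling a rerank endpoint over HTTP.
# ASSUMPTIONS: the endpoint URL, the payload fields
# (query/documents/top_n), and bearer-token auth are placeholders
# in the style of typical rerank APIs; consult the API reference.

import json
import urllib.request

def build_rerank_request(query: str, documents: list[str], top_n: int) -> dict:
    # Typical rerank payload: a query plus candidate documents.
    return {"query": query, "documents": documents, "top_n": top_n}

def rerank(api_url: str, api_key: str, query: str,
           documents: list[str], top_n: int = 5):
    payload = build_rerank_request(query, documents, top_n)
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:  # network call; not run here
        return json.load(resp)

payload = build_rerank_request("q", ["d1", "d2"], top_n=1)
```

Because the candidates come from whatever retrieval you already run, trying this is a small, reversible experiment: feed your current top-k into the reranker and compare the orderings.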

And you can do that while meeting privacy expectations that your customers, employees, or business partners actually have. We:

  • keep data handling predictable by not sharing your data with anyone
  • avoid shipping your data around the world; everything stays inside an EU-hosted application

If you do try our reranker, we’d love to hear from you:

  • what domain you’re in
  • what queries are hardest
  • what changed after adding reranking

You can also jump to the Reranker API reference.