Building a Generative Web Search Arena
I have mentioned the need for something like this in several posts (listed below), but I'm going to just start building it out myself.[1]
- Dec 10th[2]: On developing informed preferences for models in generative web search
- Dec 4th, 2023: Towards “benchmarking” democratization of good search.
- Aug 31st, 2023: The Need for ChainForge-like Tools in Evaluating Generative Web Search Platforms
My first attempt will be to fork and adapt the LMSYS.org ( website | twitter ) FastChat repo (available on GitHub). While working on that I may attempt to get a notebook running with the various search APIs that I have access to.
Added
I will start with two of Tavily, You.com, Perplexity AI, and Metaphor Systems. (While Phind has a model released, I'm not sure of the status of their API.)
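A minimal sketch of how a notebook might put several search APIs behind one common interface, so the same query can be sent to each and the answers collected side by side. Everything here is an assumption for illustration: `SearchResponse`, `PROVIDERS`, `run_query`, and the two stub functions are hypothetical names, and the stubs return canned text rather than calling any real service. A real version would replace each stub with a call to that provider's own client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SearchResponse:
    """One provider's answer to one query (hypothetical record type)."""
    provider: str
    query: str
    answer: str


# Hypothetical stand-ins for real API clients. They return canned text so
# the sketch runs without network access or API keys.
def stub_provider_a(query: str) -> str:
    return f"[stub A] answer to: {query}"


def stub_provider_b(query: str) -> str:
    return f"[stub B] answer to: {query}"


# Registry mapping a display name to a function that takes a query string
# and returns answer text. The names are placeholders, not real services.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "provider_a": stub_provider_a,
    "provider_b": stub_provider_b,
}


def run_query(
    query: str,
    providers: Dict[str, Callable[[str], str]] = PROVIDERS,
) -> List[SearchResponse]:
    """Send the same query to every registered provider."""
    return [
        SearchResponse(provider=name, query=query, answer=fn(query))
        for name, fn in providers.items()
    ]


responses = run_query("what is a generative web search arena?")
for r in responses:
    print(f"{r.provider}: {r.answer}")
```

Keeping each provider behind a plain `query -> text` function means adding another API later is one more registry entry, and the comparison code never needs to know which service produced which answer.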
Added
It may also be possible to adapt Ian Arawjo ( website | twitter )'s ChainForge ( website | GitHub ) for this.
Added
I worked out a proof of concept for pairwise comparison in ChainForge.
I also made some custom providers, overly simplistic and under-documented at this point: danielsgriffin/ChainForge_SearchProviders
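For readers unfamiliar with the arena idea, pairwise comparison can be sketched roughly as follows: show two answers to the same query without revealing which provider wrote which, record a vote, and tally the results. This is not the ChainForge proof of concept or FastChat's implementation. The function names (`make_battle`, `record_vote`, `win_rates`) and the data layout are hypothetical, and a simple win rate stands in for whatever rating scheme a real arena would use.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


def make_battle(query: str, answers: Dict[str, str], rng: random.Random) -> dict:
    """Pick two distinct providers at random and assign them to sides.

    `answers` maps provider name -> answer text for this query. Only the
    answer texts would be shown to the voter; the names stay hidden.
    """
    left, right = rng.sample(sorted(answers), 2)
    return {
        "query": query,
        "left": left,
        "right": right,
        "left_answer": answers[left],
        "right_answer": answers[right],
    }


def record_vote(battle: dict, choice: str) -> Optional[str]:
    """Map a vote ("left", "right", or "tie") to the winning provider."""
    if choice == "tie":
        return None
    if choice not in ("left", "right"):
        raise ValueError(f"unknown choice: {choice!r}")
    return battle[choice]


def win_rates(results: List[Tuple[dict, Optional[str]]]) -> Dict[str, float]:
    """Compute wins divided by appearances for each provider."""
    wins: Counter = Counter()
    appearances: Counter = Counter()
    for battle, winner in results:
        appearances[battle["left"]] += 1
        appearances[battle["right"]] += 1
        if winner is not None:
            wins[winner] += 1
    return {name: wins[name] / appearances[name] for name in appearances}


# Usage with made-up answers from three hypothetical providers.
rng = random.Random(0)
answers = {"provider_a": "answer A", "provider_b": "answer B", "provider_c": "answer C"}
battle = make_battle("example query", answers, rng)
winner = record_vote(battle, "left")
print(win_rates([(battle, winner)]))
```

Randomizing both the pairing and the side assignment is what keeps the vote blind. Ties count as an appearance for both providers but a win for neither.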
Footnotes
1. I was remotivated to pursue this after recently playing with Tavily ( website | twitter | doc ) and then the You.com API (in "A generative search baseline from the You.com Web Search API, a simple prompt, and gpt-3.5-turbo"). We are in a very different place than we were in August: not only does Metaphor Search have an API, but so do others like Perplexity AI, You.com, and Tavily. I'm sure there are many others that I could work to integrate as well (including a variety of new search APIs). ↩
2. Updated on December 14th. ↩