tooling to support people in making hands-on and open evaluations of search

Tomorrow I'm attending the [Union Square Ventures AI Roundtable #1: AI and Search](https://matthewmandel.com/2024/01/05/ai-roundtable-1-ai-and-search/). I'm looking forward to a dynamic conversation. I am also using it as a forcing function to write down something about what I'm really narrowing in on: developing tooling for user exploration and evaluation of search systems to support a strong new search ecosystem.

Building on my prior research, I am very focused on developing shared tooling and other resources to support people in making hands-on and open evaluations of search systems and responses (particularly for public interest search topics). We need this sort of tooling to better inform individual and shared search choices, including choices about refusing, resisting, repairing, and reimagining search practices and tools. Such tooling might surface distinctions and options and let subject matter experts, community members, and individuals develop (and perhaps share) their own evaluations.

I have been shifting my research statement to engage with this and exploring how to make it happen, whether in academia, with foundation support, in a company, or as something new. I am working on this so that we might be better able to advocate and design for the appropriate role and shape of search in our work, lives, and society.[^1]

There is a lot of related work to build with on evaluating various types of systems, benchmarking, audits, complaints, and so on, but that work is not narrowly aimed at facilitating open evaluation of how new web search tools perform on public interest search topics, or at supporting effective voice and choice in search.

This project is intended to complement existing reporting, benchmarking, and auditing efforts while focusing on helping people develop their own sense of what different tools can, can't, and could possibly do.

This could take the form of a framework and service that supports individual evaluations, collaborative evaluations, and requests-for-evaluations from peers, experts, and public-benefit search quality raters.
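To make that a bit more concrete, here is a minimal sketch, in Python, of the kinds of records such a framework might keep track of. Every class, field, and example value here is a placeholder of my own, not a reference to any existing system or API.

```python
# A rough sketch of possible record types for a shared search-evaluation
# service. All names and fields are hypothetical placeholders.
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SearchResponseCapture:
    """A snapshot of what one search tool returned for one query."""
    tool: str                       # e.g. the name of a generative search engine
    query: str
    captured_at: datetime
    response_text: str              # the result list or generated answer, as text
    cited_sources: list[str] = field(default_factory=list)


@dataclass
class Evaluation:
    """One person's assessment of a captured response."""
    capture: SearchResponseCapture
    evaluator: str                  # a name, pseudonym, or role (e.g. "community librarian")
    criteria: dict[str, int]        # e.g. {"accuracy": 4, "attribution": 2}
    notes: str = ""
    shared_publicly: bool = False


@dataclass
class RequestForEvaluation:
    """A request for peers, experts, or public-benefit raters to weigh in on a topic."""
    topic: str                      # e.g. a public interest search topic
    queries: list[str]
    requested_from: list[str]       # peers, subject matter experts, raters
    evaluations: list[Evaluation] = field(default_factory=list)
```

The point of the sketch is only that captures, evaluations, and requests-for-evaluations could be small, shareable records rather than anything locked inside a single platform.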

I imagine such tooling could be used by an agency or non-profit to issue public complaints and to refine its own content and search work. Or by individuals to decide which new tool to adopt, or whether to keep refusing. Or by content creators to push for better attribution or shared funding models, or to develop their own systems. Or by retrieval-augmented generation (RAG) builders to demonstrate their improvements.

Searchers, publishers, journalists, SEOs, activists, and academics have long made complaints about and to the dominant search system, and much of that is deflected or met with improvements that strengthen its position. We have a chance now to package our evaluations, both the good and the bad that we find in search results and responses, as a broadly shared resource that might advance search in multiple ways in the public interest.


Below I try to roughly connect some of the paths that led me here.

Background

Philosophy undergrad. Intelligence analyst in the US Army. Professional degree program: Master of Information Management and Systems at UC Berkeley, 2016. Continued into the PhD program in Information Science. Started focusing on search practices, perceptions, and platforms in 2017. Dissertation research examined the seeming success of workplace web searching by data engineers.

Earlier

This is rooted in:

Spring 2023

  • The introduction of ChatGPT clearly helped many people see that search could be different. Since it seemed there was an opportunity to influence the shape of search to come, I looked at making a role for myself in industry and started exploring generative web search systems.
  • I've been reflecting on the course I taught at Michigan State University last spring on Understanding Change in Web Search and on what my students taught me. I've been thinking particularly about how we implicitly and explicitly make search quality evaluations, and how we might do well to share more of these and solicit feedback from others as we strive to develop our search practices and identify what we want from search. (Coming out of my dissertation research, and following the lead of @haider2019invisible, I believe it is desperately important that we talk more about search.)

Summer & Fall 2023

December 2023

January 2024

Footnotes

[^1]: See @hendry2008conceptual [p. 277].

[^2]: While it is not comparing search results or responses, LMSYS Org now has 'online' models in its Chatbot Arena; see the Jan 29 announcement. The current 'online' models are from Perplexity AI and Google's Bard.