"Finally, AI can fact-check itself. One large language model-based chatbot can now trace its outputs to the exact original data sources that informed them.
Developed by the Allen Institute for Artificial Intelligence (Ai2), OLMoTrace, a new feature in the Ai2 Playground, pinpoints data sources behind text responses from any model in the OLMo (Open Language Model) project.
OLMoTrace identifies the exact pre-training document behind a response — including full, direct quote matches. It also provides source links. To do so, the underlying technology uses a process called “exact-match search” or “string matching.”
“We introduced OLMoTrace to help people understand why LLMs say the things they do from the lens of their training data,” Jiacheng Liu, a University of Washington Ph.D. candidate and Ai2 researcher, told The New Stack.
“By showing that a lot of things generated by LLMs are traceable back to their training data, we are opening up the black boxes of how LLMs work, increasing transparency and our trust in them,” he added.
To date, no other chatbot on the market provides the ability to trace a model’s response back to specific sources used within its training data. This makes the news a big stride for AI visibility and transparency."
https://thenewstack.io/llms-can-now-trace-their-outputs-to-specific-training-data/
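
For anyone wondering what "exact-match search" means in practice, here is a minimal sketch of the general idea: take spans of words from a model's response and look for them verbatim in a set of documents. This is not Ai2's implementation. The function names, the toy corpus, and the `min_words` threshold are all made up for illustration, and a real system searching trillions of tokens would need a proper index rather than the brute-force scan shown here.

```python
def normalize(text):
    # Lowercase and collapse whitespace so matching ignores case and spacing.
    return " ".join(text.lower().split())


def find_exact_matches(response, corpus, min_words=4):
    """Return (span, doc_ids) pairs for word spans of `response` that appear
    verbatim in at least one corpus document.

    `corpus` is a hypothetical dict mapping a document id to its text.
    """
    words = normalize(response).split()
    # Pad with spaces so a span only matches on whole-word boundaries.
    docs = {doc_id: f" {normalize(text)} " for doc_id, text in corpus.items()}
    matches = []
    i = 0
    while i < len(words):
        found = False
        # Try the longest span starting at word i first, then shrink it.
        for j in range(len(words), i + min_words - 1, -1):
            span = " ".join(words[i:j])
            hits = [doc_id for doc_id, text in docs.items() if f" {span} " in text]
            if hits:
                matches.append((span, hits))
                i = j  # continue after the matched span
                found = True
                break
        if not found:
            i += 1
    return matches


# Toy data, invented for this example.
corpus = {
    "doc1": "The quick brown fox jumps over the lazy dog.",
    "doc2": "Seattle is a city in Washington state.",
}
response = "I read that the quick brown fox jumps high"
print(find_exact_matches(response, corpus))
```

On this toy input the only span of four or more words found verbatim is "the quick brown fox jumps", attributed to `doc1`. Note that this kind of matching only surfaces text that was copied word for word; it says nothing about paraphrased material.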
