social.tchncs.de is one of the many independent Mastodon servers you can use to participate in the fediverse.
A friendly server from Germany that tends to attract techy people but welcomes everybody. This is one of the oldest Mastodon instances.

Server stats: 3.8K active users

#chatbots

15 posts · 14 participants · 1 post today

"Finally, AI can fact-check itself. One large language model-based chatbot can now trace its outputs to the exact original data sources that informed them.

Developed by the Allen Institute for Artificial Intelligence (Ai2), OLMoTrace, a new feature in the Ai2 Playground, pinpoints data sources behind text responses from any model in the OLMo (Open Language Model) project.

OLMoTrace identifies the exact pre-training document behind a response — including full, direct quote matches. It also provides source links. To do so, the underlying technology uses a process called “exact-match search” or “string matching.”

“We introduced OLMoTrace to help people understand why LLMs say the things they do from the lens of their training data,” Jiacheng Liu, a University of Washington Ph.D. candidate and Ai2 researcher, told The New Stack.

“By showing that a lot of things generated by LLMs are traceable back to their training data, we are opening up the black boxes of how LLMs work, increasing transparency and our trust in them,” he added.

To date, no other chatbot on the market can trace a model’s response back to the specific sources used within its training data. This makes the announcement a significant stride for AI visibility and transparency."
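The “exact-match search” the article mentions can be sketched in a few lines. This is a toy illustration under stated assumptions: the function name and data are mine, and OLMoTrace itself searches trillions of training tokens with specialized index structures rather than naive substring scans. The core idea is to find the longest word spans of a response that appear verbatim in some training document.

```python
# Toy sketch of exact-match tracing (not Ai2's implementation):
# for each sufficiently long word span of a model response, check whether
# it appears verbatim in any "training" document, then keep only maximal
# spans so each traced quote is reported once at its full length.

def trace_spans(response: str, corpus: dict[str, str], min_words: int = 4):
    """Return (span, doc_id) pairs for response spans found verbatim in corpus docs."""
    words = response.split()
    matches = []
    for i in range(len(words)):
        for j in range(i + min_words, len(words) + 1):
            span = " ".join(words[i:j])
            for doc_id, text in corpus.items():
                if span in text:
                    matches.append((span, doc_id))
    # Keep only maximal spans: drop any match contained in a longer one.
    return [
        (s, d) for (s, d) in matches
        if not any(s != s2 and s in s2 for (s2, _) in matches)
    ]
```

Given a corpus containing “the quick brown fox jumps over the lazy dog”, a response echoing that phrasing would be traced back to that document with the longest verbatim overlap reported.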

thenewstack.io/llms-can-now-tr

The New Stack · Breakthrough: LLM Traces Outputs to Specific Training Data
Ai2’s OLMoTrace uses string matching to reveal the exact sources behind chatbot responses

"When thinking about a large language model’s input and output, a text prompt (sometimes accompanied by other modalities, such as image prompts) is the input the model uses to predict a specific output. You don’t need to be a data scientist or a machine learning engineer – everyone can write a prompt. However, crafting the most effective prompt can be complicated. Many aspects of your prompt affect its efficacy: the model you use, the model’s training data, the model configurations, and your word choice, style and tone, structure, and context all matter. Therefore, prompt engineering is an iterative process. Inadequate prompts can lead to ambiguous, inaccurate responses and can hinder the model’s ability to provide meaningful output.

When you chat with the Gemini chatbot, you are essentially writing prompts; however, this whitepaper focuses on writing prompts for the Gemini model within Vertex AI or via the API, because prompting the model directly gives you access to configuration options such as temperature.

This whitepaper discusses prompt engineering in detail. We will look into the various prompting techniques to help you get started and share tips and best practices to become a prompting expert. We will also discuss some of the challenges you can face while crafting prompts."
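As a hedged illustration of what one such configuration knob does (this is not the Gemini API; the function name and values are invented for the sketch): temperature divides the model’s raw next-token scores (logits) before the softmax, so low values concentrate probability on the top token and high values flatten the distribution, producing more varied output.

```python
import math
import random

# Illustrative temperature-scaled sampling (not the Gemini API):
# divide logits by the temperature, apply softmax, then sample a token
# from the resulting distribution.

def sample_with_temperature(logits: dict[str, float], temperature: float, rng=random) -> str:
    scaled = {tok: s / temperature for tok, s in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    r = rng.random()
    cum = 0.0
    for tok, p in probs.items():
        cum += p
        if r < cum:
            return tok
    return tok  # guard against float rounding at the distribution's tail
```

With a very low temperature the call almost always returns the highest-scoring token; raising it makes lower-scoring tokens progressively more likely.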

kaggle.com/whitepaper-prompt-e

www.kaggle.com · Prompt Engineering

A hot topic in #GLAMR -- and this paper makes a good fist of understanding AI's limitations. Large differentials between #ChatGPT and #CoPilot / #Gemini.

"this [study] underscores the continued importance of human labor in subject #cataloging work [...] #AI tools may prove more valuable in assisting catalogers especially in subject heading assignment, but continued testing & assessment will be needed..."

AI #Chatbots & Subject #Cataloging: A Performance Test doi.org/10.5860/lrts.69n2.8440 #metadata

PYOK: The British Airways Customer Service Chatbot is So Bad It Doesn’t Even Know Where The Airline is Based. “The conversation started with a fairly simple question as the chatbot asked Paddy to tell it where he was flying. The chatbot then suggested that Paddy either type the city or airport code – such as London or LHR for London Heathrow. Paddy replied with LHR, but having just given […]

https://rbfirehose.com/2025/04/08/pyok-the-british-airways-customer-service-chatbot-is-so-bad-it-doesnt-even-know-where-the-airline-is-based/

ResearchBuzz: Firehose · PYOK: The British Airways Customer Service Chatbot is So Bad It Doesn’t Even Know Where The Airline is Based

Millions of lines of code but not a single test. Where do you start?
A fair question, isn't it?

But how do we best proceed?

On the one hand we have a massive number of lines of code, but no tests whatsoever. That is exactly what in many…
dev-crowd.com/2025/04/07/milli
#Agile #Bugtracker #Chatbots #Docker #Engineering #NAS #Nullshithardware #PenetrationTest #Programmierung #Projektmanagement #RegressionsTest #Security #Server

New Open-Source Tool Spotlight 🚨🚨🚨

VISTA is a Python-based AI chatbot built using OpenAI GPT and LangChain. It integrates with Pinecone for vector databases, focusing on semantic search and managing context. Looks like a good starting point if you're exploring AI chatbot frameworks. #AI #Chatbots
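The semantic-search step such a stack performs can be sketched in plain Python (hand-made vectors stand in for OpenAI embeddings and the Pinecone index here; `top_k` and the toy data are my own names, not VISTA’s API): embed the query, then rank stored document vectors by cosine similarity to pick context for the LLM prompt.

```python
import math

# Toy semantic search: rank stored "document embeddings" by cosine
# similarity to a query embedding. In a real stack the vectors come from
# an embedding model and the ranking is done by a vector database.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=1):
    """index: dict mapping doc_id -> embedding vector; returns the k nearest doc ids."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

The retrieved documents are then stitched into the chatbot prompt as context, which is the “managing context” part the post refers to.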

🔗 Project link on #GitHub 👉 github.com/RitikaVerma7/VISTA

#Infosec #Cybersecurity #Software #Technology #News #CTF #Cybersecuritycareer #hacking #redteam #blueteam #purpleteam #tips #opensource #cloudsecurity

✨
🔐 P.S. Found this helpful? Tap Follow for more cybersecurity tips and insights! I share weekly content for professionals and people who want to get into cyber. Happy hacking 💻🏴‍☠️

"You can replace tech writers with an LLM, perhaps supervised by engineers, and watch the world burn. Nothing prevents you from doing that. All the temporary gains in efficiency and speed would bring something far worse on their back: the loss of the understanding that turns knowledge into a conversation. Tech writers are interpreters who understand the tech and the humans trying to use it. They’re accountable for their work in ways that machines can’t be.

The future of technical documentation isn’t replacing humans with AI but giving human writers AI-powered tools that augment their capabilities. Let LLMs deal with the tedious work at the margins and keep the humans where they matter most: at the helm of strategy, tending to the architecture, bringing the empathy that turns information into understanding. In the end, docs aren’t just about facts: they’re about trust. And trust is still something only humans can build."

passo.uno/whats-wrong-ai-gener

passo.uno · What's wrong with AI-generated documentation
In what is tantamount to a vulgar display of power, social media has been flooded with AI-generated images that mimic the style of Hayao Miyazaki’s anime. Something similar happens daily with tech writing, folks happily throwing context at LLMs and thinking they can vibe write outstanding docs out of them, perhaps even surpassing human writers. Well, it’s time to draw a line. Don’t let AI influencers studioghiblify your work as if it were a matter of processing text. It’s way more than that.

"Since 3.5-sonnet, we have been monitoring AI model announcements, and trying pretty much every major new release that claims some sort of improvement. Unexpectedly by me, aside from a minor bump with 3.6 and an even smaller bump with 3.7, literally none of the new models we've tried have made a significant difference on either our internal benchmarks or in our developers' ability to find new bugs. This includes the new test-time OpenAI models.

At first, I was nervous to report this publicly because I thought it might reflect badly on us as a team. Our scanner has improved a lot since August, but because of regular engineering, not model improvements. It could have been a problem with the architecture we had designed that kept us from getting more mileage as the SWE-Bench scores went up.

But in recent months I've spoken to other YC founders doing AI application startups and most of them have had the same anecdotal experiences: 1. o99-pro-ultra announced, 2. Benchmarks look good, 3. Evaluated performance mediocre. This is despite the fact that we work in different industries, on different problem sets. Sometimes the founder will apply a cope to the narrative ("We just don't have any PhD level questions to ask"), but the narrative is there.

I have read the studies. I have seen the numbers. Maybe LLMs are becoming more fun to talk to, maybe they're performing better on controlled exams. But I would nevertheless like to submit, based on internal benchmarks and my own and colleagues' perceptions using these models, that whatever gains these companies are reporting to the public are not reflective of economic usefulness or generality."

lesswrong.com/posts/4mvphwx5pd

www.lesswrong.com · Recent AI model progress feels mostly like bullshit — LessWrong
About nine months ago, I and three friends decided that AI had gotten good enough to monitor large codebases autonomously for security problems. We s…

Your chatbots are about to kill Wikipedia. Wikipedia is as reliable as Encyclopaedia Britannica; it is a great testament to the power of the people, and of non-profit knowledge and community. So obviously it's ripe for total abuse and destruction by private enterprise. Do we teach this in university? Of course we don't.

#ai #chatbots #chatgpt #genai #academicchatter #academia

LLM scraping of Wikipedia has caused a surge in traffic, driving up costs for the non-profit. newscientist.com/article/24752

New Scientist · AI data scrapers are an existential threat to Wikipedia
By Jeremy Hsu

MM: "One strange thing about AI is that we built it—we trained it—but we don’t understand how it works. It’s so complex. Even the engineers at OpenAI who made ChatGPT don’t fully understand why it behaves the way it does.

It’s not unlike how we don’t fully understand ourselves. I can’t open up someone’s brain and figure out how they think—it’s just too complex.

When we study human intelligence, we use both psychology—controlled experiments that analyze behavior—and neuroscience, where we stick probes in the brain and try to understand what neurons or groups of neurons are doing.

I think the analogy applies to AI too: some people evaluate AI by looking at behavior, while others “stick probes” into neural networks to try to understand what’s going on internally. These are complementary approaches.

But there are problems with both. With the behavioral approach, we see that these systems pass things like the bar exam or the medical licensing exam—but what does that really tell us?

Unfortunately, passing those exams doesn’t mean the systems can do the other things we’d expect from a human who passed them. So just looking at behavior on tests or benchmarks isn’t always informative. That’s something people in the field have referred to as a crisis of evaluation."

blog.citp.princeton.edu/2025/0

CITP Blog · A Guide to Cutting Through AI Hype: Arvind Narayanan and Melanie Mitchell Discuss Artificial and Human Intelligence
Last Thursday’s Princeton Public Lecture on AI hype began with brief talks based on our respective books. The meat of the event was a discussion between the two of us and with the audience. A lightly edited transcript follows. Photo credit: Floriaan Tasche. AN: You gave the example of ChatGPT being unable to comply with […]

"My current conclusion, though preliminary in this rapidly evolving field, is that not only can seasoned developers benefit from this technology — they are actually in the optimal position to harness its power.

Here’s the fascinating part: The very experience and accumulated know-how in software engineering and project management — which might seem obsolete in the age of AI — are precisely what enable the most effective use of these tools.

While I haven’t found the perfect metaphor for these LLM-based programming agents in an AI-assisted coding setup, I currently think of them as “an absolute senior when it comes to programming knowledge, but an absolute junior when it comes to architectural oversight in your specific context.”

This means that it takes some strategic effort to make them save you a tremendous amount of work.

And who better to invest that effort in the right way than a senior software engineer?

As we’ll see, while we’re dealing with cutting-edge technology, it’s the time-tested, traditional practices and tools that enable us to wield this new capability most effectively."

manuel.kiessling.net/2025/03/3

The Log Book of Manuel Kießling · Senior Developer Skills in the AI Age: Leveraging Experience for Better Results
How time-tested software engineering practices amplify the effectiveness of AI coding assistants.