
An Analysis of How Large Language Models Navigate Conflicts of Interest

The paper looks at what happens when LLM chatbots are given advertising or sponsorship incentives that conflict with the user’s interests. The core worry is that users experience chatbots as cooperative helpers, not ad surfaces, so sponsored behaviour can feel especially deceptive or manipulative.

The authors test models across seven conflict scenarios, including:

  • recommending a more expensive sponsored product over a cheaper unsponsored one

  • interrupting a user’s purchase flow with sponsored alternatives

  • biasing product comparisons

  • failing to disclose sponsorship

  • hiding unfavourable details like price

  • recommending a paid service instead of solving the task directly

  • recommending harmful sponsored services, like predatory loans

The paper also finds differences by model, reasoning setting, and inferred socioeconomic status. Some models changed behaviour when reasoning was enabled, and some treated low-SES and high-SES users differently.

I wonder if xAI ends up with a huge compute advantage over OpenAI/Anthropic: all three have similar gross sums of compute, but xAI probably has an order of magnitude less inference demand than the other two, freeing up significantly more compute for training.

Humanity's Last Exam

Fascinating repo of incredibly esoteric, difficult questions that frontier models can be benchmarked on; Opus 4.6 scores a paltry 46%. It was also covered in the New York Times.

The questions aren't supposed to be shared publicly, but there are 2,500 of them accessible via Hugging Face. So interesting!

An example question which they shared:

Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.

Anthropic understandably decided to block OpenClaw from OAuth access. I wanted to use Anthropic Extra Usage instead, but it cost me $25 in credits in a single day, so, alas, using Anthropic for OpenClaw is a no-go.

I bought the $20/month OpenAI plan and connected it to OpenClaw. We'll see. It took many hours to get things working again, though that's not really OpenAI's fault so much as how buggy OpenClaw is...

It really makes you wonder how much OpenClaw blew up Anthropic's business model over the past six months. If there are 100k users on Anthropic OAuth through OpenClaw at $15/day in costs each, that's a huge hit to their margin!
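A back-of-envelope check of that claim, using only the numbers assumed in the post (100k users, $15/day), not any actual Anthropic figures:

```python
# Rough cost estimate for OpenClaw usage on Anthropic OAuth.
# All inputs are assumptions from the post, not real Anthropic data.
users = 100_000          # hypothetical OpenClaw users on Anthropic OAuth
cost_per_user_day = 15   # assumed cost in USD per user per day
days = 30                # roughly one month

monthly_cost = users * cost_per_user_day * days
print(f"${monthly_cost:,} per month")  # $45,000,000 per month
```

At those assumed numbers the cost dwarfs what any flat subscription tier could plausibly bring in from the same users.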

Meditation, Language, and LLMs

I’ve been somewhat facetiously, somewhat seriously posing a question to everyone I run into these past few months: Don’t you feel like all meaning is being scrubbed from the world? Like the Langoliers are chomping up purpose, chomping up all the things to which we’ve ascribed purpose these past hundred thousand years? And that nothing matters?

Really, what I’m asking is: Don’t you think our contemporary education system has long needed an overhaul? That our society has long needed to reconfigure itself? That we need to stop ascribing all our meaning and purpose to being a Web Designer, or Coal Miner, or Airplane Engine Factory Foreman, or Accountant, but instead to being A Good Person, Good Parent, Good Friend, Curious Researcher, Poet, Meditator, Facilitator, or any number of other Ways of Being uncoupled from “work” as we’ve defined it since the industrial revolution? Who is safe from the hunger and capabilities of the models? Yoga instructors?

✰ Knowledge Skill

LLMs are amazing at summarizing content. Agents are amazing at executing workflows from a simple input. I have found a huge amount of value in using OpenClaw as a "second brain."

✰ OpenClaw Agent Model Selection

Let OpenClaw plugins dynamically override which AI model handles a request, enabling cost-optimized routing based on prompt complexity or session context, amongst other things!
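A sketch of what such a routing hook could look like. The hook signature and model names are invented for illustration; OpenClaw's real plugin API will differ:

```python
# Hypothetical plugin hook: pick a model by rough prompt complexity.
# Model names and the select_model signature are illustrative only.

CHEAP_MODEL = "small-fast-model"        # assumed cheap default
STRONG_MODEL = "large-reasoning-model"  # assumed expensive fallback

def select_model(prompt: str, session: dict) -> str:
    """Route long or code-heavy prompts (or flagged sessions) to the
    stronger model; everything else goes to the cheap default."""
    looks_hard = len(prompt) > 2000 or "```" in prompt
    if looks_hard or session.get("force_strong"):
        return STRONG_MODEL
    return CHEAP_MODEL
```

Even a heuristic this crude can shift the bulk of routine traffic onto the cheap model while keeping hard requests on the strong one.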