Interesting things I learned about agents

This should be a quick-ish one. Here's a list of things I've learned about agents recently.

Some of these things might not be surprising to others who use agents even more than I do, but they were interesting to me.

They follow financial incentives

I was building something that connected to external services, so I needed to test the integration with each of them. Previously, that would have meant going website by website, completing onboarding flows just so I could check that my application connects to each third-party application correctly, e.g. via OAuth.

So I thought: "This sounds boring, perfect task for an agent!" and handed this off to Claude Code.

Claude Code went ahead and signed up for several services for me, which I'll talk a bit more about later, but something really interesting happened when it was creating an account on Exa.

It put in the email and password and then got to an onboarding page, which had an option to skip. Aiming to be efficient, it clicked it. A modal then popped up saying that completing the onboarding would earn $10 in credits, and asking whether the agent was sure it wanted to skip.

Exa modal showing $10 incentive to complete onboarding

And my agent clicked "No, I'll stay!" and went through an onboarding clearly meant for humans.

I don't know if it somehow believed it would get the $10 or if it was trying to act in my best interests, but this was really interesting to see.

The model cares about financial incentives!

I wonder what this means for prompt injections...

They are getting good at using the browser

Related to the point above, I also learned that agents are getting really good at using the browser.

I'd used agents to control a browser a few times before, and the results ranged from underwhelming to ok, but this was the first time in a while that I gave one a proper shot out on the web.

My agents were able to use a browser to sign up to a bunch of different services, and it's both fascinating and scary to think about what this might mean for the internet.

I managed to impress some non-technical friends at a party by having my Hermes agent sign up to Hey.com and send them emails without any extra direction. I also managed to get suspended from Hey, but I got my account back really quickly, and it seems they're now working on making Hey claw-safe, according to DHH.

DHH tweet in reply to me talking about how Hey should be Claw-safe

I'm probably late to the party on this one compared to a lot of people, but I didn't know how far we'd come with agents using browsers. I suspect this is an improvement on many fronts, from the harnesses and the headless browsers to the models themselves.

Claws are still a really rough experience

Depending on what corner of X you find yourself in, it seems everyone is running a Claw and getting all the productivity gains.

I've been wanting to run something like OpenClaw for a while but was scared to death about security. I've been blogging about this (here and here) and have been building what I think are needed primitives to run Claws and still sleep at night.

Nevertheless, something you don't always hear from the people who rave about OpenClaw and the like is how rough an experience running these things is.

OpenClaw remains buggy as hell (and they admit it), and getting it set up and working properly comes with a fair number of hiccups.

Hermes, on the other hand, was a breeze to set up, but there are still a lot of rough edges that become clear as soon as you start to use it. It's forgetful, it loses track of where it put your todos file and makes a new one, it has incorrect self-knowledge, etc. And yes, I was using frontier models only.

Right now, I've built my own really (and I mean really) minimalistic Claw and that's the best experience I've had with Claws so far. It's limited in what it can do but context is kept to a minimum and it can still do a fair amount via AgentPort.

Claws are really cool and powerful, but our lobster PAs are not AGI quite yet.

Security for agents is an endless rabbit hole

I won't linger too much here because I've been blogging about this enough, but securing agents is hard.

The more you go down the rabbit hole the more you realize you haven't gone far enough and that your assumptions from before don't hold up anymore.

It's very interesting because it's a completely different dynamic given the non-deterministic nature of LLMs, so while we can borrow a lot of ideas from traditional security, there are vectors for attack and vulnerabilities that were unheard of before.

For instance, consider how someone may send you a calendar invite and prompt inject your agent in a way that poisons it, not so that it exfiltrates data today, but so that it slowly molds your opinion over time. Or how your agent can exfiltrate sensitive data by creating a calendar invite with sensitive content in the description.
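To make the second vector concrete, here is a minimal sketch of the kind of check one might put in front of an agent's calendar tool. Everything in it is an assumption for illustration: the `flag_invite` name, the patterns, and the idea that the agent's outgoing invite descriptions pass through a function like this at all. It also shows why this is a rabbit hole: a pattern match only catches secrets that look like secrets, not sensitive content the agent has paraphrased.

```python
import re

# Illustrative patterns only (assumptions, not a vetted secret scanner).
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                 # API-key-like tokens
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key headers
    re.compile(r"\b\d{13,16}\b"),                       # long digit runs, e.g. card numbers
]


def flag_invite(description: str) -> list[str]:
    """Return the patterns that match an outgoing invite description.

    An empty list means nothing obviously secret-shaped was found,
    which is not the same as the description being safe.
    """
    return [p.pattern for p in SECRET_PATTERNS if p.search(description)]
```

A hypothetical harness would call `flag_invite` before the invite is created and ask a human to confirm when the list is not empty.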

Interestingly, something I've learned about people in this process is that a lot of them don't seem to care, including people involved in security. It's a bit scary how comfortable we've gotten with AI, and I fear we (myself included) might be turkeys.

They can make GIFs for you

I thought we should end on a lighter note so I added this last one in.

This one is a bit stupid because of course agents can make GIFs. But my primary use of agents has been coding, and sometimes you fall into a habit of doing things a certain way and don't even realize AI could help you with X, when it's already been helping you with things much harder than X.

For me this was generating a GIF. My default approach has always been to do a Google search for "free GIF generator" or something and click the first result, and when I went to do this I found that the two platforms I landed on charged for higher quality GIFs, so I just thought: why not let Claude do it?

And Claude did it. Now I have a /make-gif command on Claude Code that I point to a video file and it generates a high-quality GIF for me.
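I haven't shown what the command does under the hood, so the following is only a sketch of one common way to get a good-looking GIF out of a video, not the actual /make-gif implementation. It assumes ffmpeg is installed and uses its palettegen and paletteuse filters; the function name `build_gif_command` and the default `fps` and `width` values are made up for the example.

```python
from pathlib import Path


def build_gif_command(video, gif=None, fps=12, width=640):
    """Build an ffmpeg argument list that converts a video to a GIF.

    The filter chain resamples the frame rate, scales to the given width
    (keeping aspect ratio), generates a colour palette from the clip, and
    applies that palette when encoding the GIF.
    """
    video = Path(video)
    # Default output: same path as the input, with a .gif extension.
    gif = Path(gif) if gif else video.with_suffix(".gif")
    filters = (
        f"fps={fps},scale={width}:-1:flags=lanczos,"
        "split[a][b];[a]palettegen[p];[b][p]paletteuse"
    )
    # -y overwrites the output file if it already exists.
    return ["ffmpeg", "-y", "-i", str(video), "-vf", filters, str(gif)]


# To actually run it (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(build_gif_command("demo.mp4"), check=True)
```

Generating a palette per clip, instead of using the default GIF palette, is usually what separates a crisp GIF from a banded one, at the cost of a slightly slower conversion.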

Here's an example from the seams README:

So that's that. That's what I learned about agents recently.

What did everybody else learn?