Mar 21 · 5 min read

Design Decisions for adding LLMs to your sales product

Updated: Mar 22

Most sales tool makers are thinking about adding “generative AI” features to their product. It makes sense: a lot of sales tasks are text-based, repetitive, and generate tons of data. In addition, the cost of errors is relatively low: a wrongly-worded sales email is not quite the same as an AI doctor prescribing you rat poison.

Still, the use-cases can range from valid to questionable to “let’s throw AI at the wall and see what sticks”. We went through tons of product iterations over the last year ourselves, and had to decide where we sit among a range of choices and trade-offs (will write a companion technical piece next week). If you’re thinking of adding LLM-based agents to your sales tool, buckle up.

The value prop is saving time

AI tools take a job done by a human and automate it. Hence, they save time. This seems like a trivial inference, but it’s surprisingly easy to get wrong. Indeed, what a lot of salespeople want, and what enablement tools pretend to be, is a magic AI money printer that miraculously generates qualified leads.

Image: magic money printer. Caption: What most SDRs expect AI tools to do.

Obviously, there are second order effects. Saving time can increase revenue (same number of sales people doing more) or reduce costs (same outreach with lower headcount). It can also enable new workflows that were previously too costly (like instantly enriching a list of companies by what cloud provider they use).

A good reality check: can a human do the same with sufficient time (and access to the same resources)? If not, we might be in fantasy land, like creating an AI that can infer your personality from your LinkedIn profile.

Workflow integration

Since the value prop is saving time, how the agent integrates into current workflows is just as important as the AI itself. For example, an email writer whose output you have to copy and paste into another tool is inherently less valuable than one that takes care of all the outbound automatically.

In practice this usually means two options: either the agent is good enough to completely take over a given task (for example, prioritizing accounts), or you have to invest heavily in creating the right UX for a copilot.

Target performance

How good does the agent have to be (relative to human workers) to be adopted? The good news is that for sales, the bar is pretty low. GPT-4, given the relevant context, is already better than the average SDR at writing a cold email, although admittedly worse than the top performer. When thinking about target performance, a good mental model is: “our email writer is better than 80% of humans given 5 minutes, but worse than the best salesperson given infinite time”.

One tricky bit about communicating your performance is that users often have a mental model of “I can do this better <given infinite time>”, not “this is better than what I would do <given that I will realistically only spend 5 minutes on this task>”.

Input-output format

How do users interact with your agent? On the one end of the spectrum, they don’t interact at all. You just use the agent to make sense of your internal data, or acquire external data, then show it as structured output to the user.

On the other end of the spectrum, you have a text box / chat interface and anything goes. “Anything goes” is the right mental model in this case. Even if you try to constrain the LLM to answer only specific queries, creative prompting can always get around that. Make sure your use-case does not care too much about such adversarial cases.

Similarly, what output do you want to show the user? AI agents usually have many steps - do you just want to show the final results, or the entire chain that led to them?

Limiting your agent

I’ve been using the term “AI agent” loosely. It generally refers to systems where LLMs can decide what subsequent steps to take / tools to call.

Often, it’s enough to feed a bunch of data to a language model and ask it to synthesize it (RAG in industry lingo), or to go through a predetermined set of steps (usually called chains). The benefit of both is that they’re predictable, both in cost and speed.

Sometimes you need flexibility in decision making - this is when agents enter the picture. For example, say you want to find out if a company has ESG initiatives (you can use Noki for this btw 😉). Your agent first needs to decide how it’s going to find this info. Should it do Google searches, look at the website, or search LinkedIn for sustainability titles? Once done, it has to determine if it found the info; if not, what follow-up steps to take. If it’s still not done, should it give up or continue? And so on.

The downside of agent flows is that your costs and the number of steps are not deterministic. In the worst case, you can end up in infinite loops. To avoid that, you’ll need to think about what limits you want to set on your agent. One natural way is to limit the number of steps / iterations it can do.
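Here is a minimal sketch of what such a limit could look like. It is not how any particular framework does it: the names (StepResult, AgentRun, run_with_limits) and the default limits are made up for illustration, and the step function stands in for whatever your agent actually does on each iteration (an LLM call, a search, a tool call). The sketch caps the cost as well as the number of steps, since cost is the other thing that is not deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    answer: Optional[str]  # set once the agent believes it found the info
    cost: float            # hypothetical cost of this step, e.g. in USD


@dataclass
class AgentRun:
    answer: Optional[str] = None
    steps: int = 0
    cost: float = 0.0
    stopped_because: str = ""


def run_with_limits(
    step: Callable[[int], StepResult],
    max_steps: int = 5,      # assumed default, tune for your use-case
    max_cost: float = 0.50,  # assumed default, tune for your use-case
) -> AgentRun:
    """Run the agent loop until it answers or a limit is reached."""
    run = AgentRun()
    while run.steps < max_steps and run.cost < max_cost:
        result = step(run.steps)  # one iteration of the agent
        run.steps += 1
        run.cost += result.cost
        if result.answer is not None:
            run.answer = result.answer
            run.stopped_because = "answered"
            return run
    # No answer: record which limit ended the run so it can be reviewed.
    run.stopped_because = "step limit" if run.steps >= max_steps else "cost limit"
    return run


# Usage: a stand-in agent that never finds an answer gets cut off.
def never_finds(step_index: int) -> StepResult:
    return StepResult(answer=None, cost=0.01)


stuck = run_with_limits(never_finds, max_steps=3)
print(stuck.steps, stuck.stopped_because)
```

Recording why the run stopped is a deliberate choice here: runs that hit a limit are exactly the ones you will want to look at later.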

User feedback

Getting user feedback makes improving your agents much, much easier. Mistakes often arise in rare corner cases, and without feedback the only way to hunt those down is to review a very large set of results. With feedback, you know where to look.

But you have to think of feedback broadly. Nobody is going to write a report on why the agent fell short. For direct feedback, a thumbs-up / thumbs-down is probably the most you can expect.

And often, you don’t even get that, especially if the agent is autonomous. So you can track implicit feedback or downstream metrics. For example, do users restart their interactions? Something probably went wrong. For chatbots, is the conversation unexpectedly long? Look into it. For email writers, can you collect replies? Think about what implicit behaviour you can use.
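As a rough sketch, the signals above can be turned into a simple "which sessions should I review?" filter. Everything here is an assumption for illustration: the Session fields, the review_reasons helper, and the 12-turn threshold are invented, and in practice the numbers would come from your own product analytics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    session_id: str
    restarts: int = 0  # times the user started the interaction over
    turns: int = 0     # number of messages in the conversation
    thumbs: int = 0    # +1 up, -1 down, 0 when no explicit feedback was given


def review_reasons(session: Session, max_turns: int = 12) -> List[str]:
    """Return the reasons a session deserves a manual look (empty if none)."""
    reasons = []
    if session.thumbs < 0:
        reasons.append("thumbs down")
    if session.restarts > 0:
        reasons.append("user restarted")
    if session.turns > max_turns:  # max_turns is an assumed threshold
        reasons.append("unexpectedly long conversation")
    return reasons


# Usage: only sessions with at least one reason end up in the review queue.
sessions = [
    Session("a", turns=4, thumbs=1),
    Session("b", restarts=2, turns=20),
]
queue = {s.session_id: review_reasons(s) for s in sessions if review_reasons(s)}
print(queue)
```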

Memory or no memory

You can give your agents memory by feeding them previous user interactions or user feedback in some form. You probably don’t want to do this.

You will run into the “I have to give the AI the same instructions every time” issue, but you avoid a whole can of worms: how to choose what to remember, extra cost for extra context length, how to forget previous interactions, etc.

If you do add memory, it’s best to scope it. For example, an email writer might remember all the feedback given in the same campaign, but new campaigns start from a blank slate.
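To make the scoping idea concrete, here is a small sketch of campaign-scoped memory. The CampaignMemory class and its method names are hypothetical, and keeping only the most recent notes is just one simple answer to the "what to remember" and "extra context length" questions, not a recommendation.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CampaignMemory:
    """Keeps user feedback per campaign; campaigns never see each other's notes."""

    def __init__(self, max_notes: int = 20):  # assumed cap to bound context length
        self._notes: DefaultDict[str, List[str]] = defaultdict(list)
        self._max_notes = max_notes

    def remember(self, campaign_id: str, note: str) -> None:
        notes = self._notes[campaign_id]
        notes.append(note)
        # Drop the oldest notes once the cap is exceeded.
        if len(notes) > self._max_notes:
            del notes[: len(notes) - self._max_notes]

    def context_for(self, campaign_id: str) -> str:
        """Text to prepend to the prompt; empty for a campaign with no feedback."""
        return "\n".join(self._notes.get(campaign_id, []))

    def forget(self, campaign_id: str) -> None:
        self._notes.pop(campaign_id, None)


# Usage: feedback from one campaign does not leak into a new one.
memory = CampaignMemory(max_notes=2)
memory.remember("q1-outbound", "shorter subject lines")
memory.remember("q1-outbound", "no emojis")
memory.remember("q1-outbound", "mention pricing")
print(memory.context_for("q1-outbound"))
print(repr(memory.context_for("q2-outbound")))
```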

Strategic considerations

Finally, how future-proof is your agent? You can expect the obvious: LLMs will get better and cheaper.

Less obvious: building a copilot might be a waste of time. If your reason is ‘current LLMs are not good enough for task X, so let’s augment sales people instead’, it’s worth considering how this statement will hold up in a year. Or even 6 months.

One thing is for sure: in a few years, you can expect most sales tasks to be done by robots.
