Your Software Doesn't Remember Anything
Most software stores records. The next generation will compete on memory.
Most software doesn't remember anything. It stores records.
I can see this most clearly in the way I'm rebuilding the operating layer underneath Dolla. The important nouns are not dashboards, forms, and workflows. They are events, decisions, actions, reviews, policies, confidence, memory, and audit traces.
That sounds abstract until you hit the real product problem. An accounts payable system that only stores emails, invoices, and bills still leaves a human to remember what those records mean. This supplier's monthly statement is usually informational. This overdue notice should be matched against an existing bill before anyone panics. This kind of attachment from this organisation is never a bill. The record holds the thing that happened. Memory changes what the system does next.
In "Is Your Software Scaffolding?" I argued that the next architecture treats memory as a first-class layer, not a side effect of logging. This post takes that further: what makes memory hard to build, what makes it valuable, and why most software companies are not built to ship it.
A record can tell you a customer complained last month, a rep changed the opportunity stage three times, or an invoice arrived on Tuesday. The record itself does not know what to do with any of that. Memory says this customer escalates if the first response takes longer than two hours. This rep systematically overstates deal confidence, so the forecast weighting should adjust. The difference is not storage. It is learned behaviour.
The objection lands fast. Every well-run SaaS already has path-dependent learning: validation rules, categorisation defaults, workflow macros that accumulate over years. True, but limited. Those mechanisms work because the underlying decision is rule-shaped. A bookkeeper writes the rule, an admin tunes it, a customer learns to live with it.
What changed is that LLMs handle decisions that were never rule-shaped. The messy ones where the right call depends on context the schema was never built to capture.
Memory is not new. Putting a correction loop around judgement-heavy decisions is.
For years, "system of record" was treated as the end state for valuable software. The software held information; the human supplied the judgement. That breaks down when systems act without a human translating records into judgement. Agents need to carry learning forward, and most software has nowhere to put it.
How memory is earned
Most agents today are clever, but amnesiac. They have a prompt, a context window, some tools, maybe a transcript. That is enough to complete steps. It is not enough to carry responsibility over time.
Memory gets earned through a loop. A decision gets made. An outcome happens. Someone checks whether the decision was right. Corrections get captured. Right calls raise confidence. Over time, stable patterns get promoted into behaviour the system applies next time without being told.
This is not automatic. You do not get memory by logging everything and pointing a model at it. You get it by building systems that observe, correct, and deliberately promote stable knowledge.
The split I keep coming back to is record, review, memory.
A record is passive evidence. A review is the judgement about what that evidence means. Memory is the narrow set of reviewed knowledge that should change future behaviour.
Most things should never become memory. A session transcript, an email, an error log, a corrected invoice, a customer comment, all of that is evidence. Useful evidence. But the system should not treat it as durable truth until it has been reviewed and scoped.
The human in the loop matters here, not as an operator clicking through a dashboard, but as the person deciding which learned behaviours should become durable and which should be reset.
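To make the split concrete, here is a minimal sketch in Python. None of these names come from a real schema, Dolla's or anyone else's; they are just one way to keep evidence, judgement, and promoted knowledge apart, and the promotion threshold is an arbitrary number, not a tuned value.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Record:
    """Passive evidence: an email, a transcript, an error log. Never acted on directly."""
    id: str
    kind: str                  # "email", "transcript", "error_log", ...
    payload: dict
    received_at: datetime

@dataclass
class Review:
    """A judgement about what one piece of evidence means."""
    record_id: str
    verdict: str               # "accept", "override", "ignore"
    reviewer: str              # a person or an agent, but always named

@dataclass
class Memory:
    """The narrow slice of reviewed knowledge allowed to change future behaviour."""
    pattern: str               # what to recognise next time
    scope: str                 # e.g. one organisation; never global by default
    confidence: float          # earned through review, not assumed
    evidence: list[str] = field(default_factory=list)  # record ids that justify it

def promote(record: Record, reviews: list[Review], min_accepts: int = 3) -> Memory | None:
    """Evidence becomes memory only after repeated, consistent review."""
    accepts = [r for r in reviews if r.record_id == record.id and r.verdict == "accept"]
    if len(accepts) < min_accepts:
        return None            # not enough consistent judgement yet; it stays evidence
    return Memory(
        pattern=record.kind,
        scope=record.payload.get("organisation", "unknown"),
        confidence=len(accepts) / (len(accepts) + 1),
        evidence=[record.id],
    )
```

The point is the promotion gate. Nothing becomes memory until it has been reviewed enough times, and even then it carries its scope and its evidence with it.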
What this looks like in a product
The pattern shows up in any workflow where the same kind of decision gets made over and over: support triage, expense coding, lead qualification, content moderation, alert routing. A request arrives, the system classifies it, and either acts or escalates.
Memory kicks in before the system acts. Has this kind of request been seen before from this customer? Did the human accept the last call or override it? Is there something in the payload that changes the answer? How confident should the system be for this exact combination of requester, context, and signal?
The AP inbox in Dolla is a useful example because it looks simple from the outside and gets messy immediately. An email arrives. Sometimes it contains one invoice. Sometimes it contains two invoices and a statement. Sometimes the attachment is a remittance advice, the body contains a supplier query, and none of it should create a bill.
If the system treats that whole email as one thing, it has already lost. The human has to reconstruct what happened.
So the model has to split the parent email from the units of work inside it. The email is the envelope. The discovered items are the work. Each item needs its own intent, route, state, owner, confidence, and downstream link. Is this a bill pipeline item, a statement workflow, a supplier communication, or noise? Is the system waiting, does the customer need to act, or do I need to make a judgement call?
That is the difference between a transport log and an operating system.
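A rough sketch of what that split looks like as a data model, with illustrative field names rather than anything taken from production:

```python
from dataclasses import dataclass
from enum import Enum

class Intent(str, Enum):
    BILL = "bill"                      # bill pipeline
    STATEMENT = "statement"            # statement workflow
    SUPPLIER_QUERY = "supplier_query"  # communication, not a bill
    NOISE = "noise"

@dataclass
class WorkItem:
    """One unit of work discovered inside an email. The email is only the envelope."""
    parent_email_id: str
    intent: Intent
    route: str                    # which pipeline or queue it goes to
    state: str                    # "waiting", "needs_customer", "needs_review", "done"
    owner: str                    # the system, the customer, or a named human
    confidence: float             # how sure the system is about this classification
    downstream_id: str | None = None  # e.g. the bill it was eventually matched to
```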
When a human overrides a suggestion, that override becomes a data point. Three overrides in a row, and confidence for that pattern drops. The system starts routing to review instead of acting autonomously. Six months of consistent decisions with no overrides, and confidence rises enough to handle it without asking.
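The loop itself can be very small. The thresholds below are illustrative, not tuned values from a real system:

```python
def adjust_confidence(confidence: float, recent_overrides: int,
                      months_without_override: int) -> float:
    """Overrides pull confidence down fast; quiet months push it up slowly."""
    if recent_overrides >= 3:
        confidence *= 0.5                          # back to review, quickly
    confidence += 0.05 * months_without_override   # trust is earned slowly
    return min(confidence, 0.99)                   # never fully certain

def should_act_autonomously(confidence: float, threshold: float = 0.9) -> bool:
    """Below the threshold, the system routes to review instead of acting."""
    return confidence >= threshold
```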
Some patterns can eventually become rules. For example, if the same organisation repeatedly receives a specific kind of non-bill email from the same source, the system can propose an organisation-scoped triage rule: match this sender and subject pattern, apply this outcome, keep the evidence, track the match count, and leave a false-positive path if the rule starts being wrong.
That last part matters. The memory is not just "we saw this before." It is "we saw this before, reviewed it, scoped it to this organisation, attached evidence, assigned confidence, and gave ourselves a way to reverse it."
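One way to represent such a rule, again with made-up names and an arbitrary retirement threshold:

```python
from dataclasses import dataclass, field

@dataclass
class TriageRule:
    """An organisation-scoped rule proposed from a repeated, reviewed pattern."""
    organisation_id: str              # scoped, never global
    sender_pattern: str               # e.g. "remittance@supplier.example"
    subject_pattern: str              # e.g. "Remittance advice *"
    outcome: str                      # e.g. "file_without_creating_a_bill"
    evidence_ids: list[str] = field(default_factory=list)  # the emails that earned it
    match_count: int = 0
    false_positives: int = 0          # the reversal path

    def record_match(self, was_correct: bool) -> None:
        self.match_count += 1
        if not was_correct:
            self.false_positives += 1

    def should_retire(self) -> bool:
        """A rule that starts being wrong loses the right to act."""
        return self.match_count >= 5 and self.false_positives / self.match_count > 0.2
```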
Day one, that memory is thin. Over time, each customer's instance develops its own. Not a generic model that works the same for everyone. A specific, earned understanding of how that particular organisation operates. Two customers in the same industry, using the same underlying product, will have different memory because they make different decisions.
The human boundary does not disappear, but it moves. What required review last month might be handled autonomously this month because the system earned enough confidence to act. The product gets better with use, not just fuller.
What this looks like in a company
Once you see that pattern in a product, it is hard not to see it at company level too, with one caveat. The feedback signal is messier. A reopened ticket is clear evidence that the routing was wrong. "We paid attention to the wrong thing this week" is harder to score. Company-level memory has to earn its way against softer ground truth, which means more deliberate review and slower promotion to default.
A department charter is a routing boundary. A job description is the procedure for a role: what it owns, what standards it applies, when to escalate, how to hand off. Some of those roles are human. Some are AI. What matters is that they operate through the same substrate.
In that model, institutional memory stops meaning "somebody on the team remembers how we do this." It starts meaning the system can carry forward the reviewed parts of how the company actually operates. Not everything. That would be fantasy. But enough that the important judgements do not have to be rediscovered from scratch every time.
A simple way to see it is routing. In my own systems, inbound mail does not just get classified once and forgotten. Human or decision-bearing mail gets forwarded conservatively. Obvious system noise gets held. The useful part is what happens after that. Each routing decision gets logged with an expected outcome and reviewed later. If a category of message keeps getting forwarded and turns out never to matter, the rule changes. If two minor notifications from the same organisation arrive close together, that can stop being inbox noise and start becoming a relationship pattern worth surfacing. The system does not just retain the email. It changes how the next email gets handled.
The same thing happens with internal operations. A worker failure, a backlog, or a pattern in audience replies can be routed straight to me, into a daily digest, into a weekly briefing, or just logged, depending on impact, time sensitivity, reversibility, and whether I am actually the bottleneck.
Then that call gets reviewed against whether it mattered. Did the alert correlate with a real problem? Did the digest surface the right thing? Did the weekly briefing change a decision? If a class of alert repeatedly proves to be noise, it stops interrupting. If something that looked minor keeps correlating with real problems, it starts getting surfaced sooner.
The system is not just keeping a log. It is learning how the company should pay attention.
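The mechanics do not need to be exotic. A sketch of that attention loop, with hypothetical signal names and thresholds:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RoutingDecision:
    """One attention decision: where a signal went and what acting on it was meant to change."""
    signal: str                   # e.g. "worker_failure", "backlog", "audience_reply_pattern"
    routed_to: str                # "interrupt_now", "daily_digest", "weekly_briefing", "log_only"
    expected_outcome: str
    decided_at: datetime
    mattered: bool | None = None  # filled in at review time

def next_route(history: list[RoutingDecision], signal: str) -> str:
    """Noise stops interrupting; quiet signals that keep mattering get surfaced sooner."""
    reviewed = [d for d in history if d.signal == signal and d.mattered is not None]
    if len(reviewed) >= 5 and not any(d.mattered for d in reviewed):
        return "log_only"
    if sum(1 for d in reviewed if d.mattered) >= 3:
        return "interrupt_now"
    return "daily_digest"
```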
Where memory goes wrong
This is the part most people skip, and it is probably the most important one to get right. At both the product level and the company level, the failure mode is the same.
Not every stored pattern should become memory.
Some patterns are noise. A spike in one kind of request in one week might be a billing glitch, a viral post, or a seasonal surge. If the system treats it as a new baseline, it's learning the wrong thing.
Some patterns encode bias. If a class of request has historically been handled a certain way because that is how someone did it three years ago, the system will learn that pattern. The pattern might be wrong. It might reflect a shortcut that nobody ever questioned. Learned behaviour that was never correct does not become correct just because it has been repeated.
Some patterns reflect temporary conditions. A customer's preferences during a project, a transition, or a reorganisation are different from their normal state. If the system treats the exception as the new default, it will get every subsequent decision wrong.
Some patterns are conditioned on a model you no longer run. When the underlying LLM gets upgraded, accumulated confidence is partly measured against a model that has been retired. Behaviour that looked stable can shift silently. Memory needs to know which model produced each pattern and revalidate when the model changes. Otherwise an upgrade that should have been an improvement quietly invalidates months of learning.
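The fix is mechanical once you name it: tag every learned pattern with the model that earned it, and discount that confidence when the model changes. A sketch, with an arbitrary discount rather than a recommended one:

```python
from dataclasses import dataclass

@dataclass
class LearnedPattern:
    pattern_id: str
    confidence: float
    model_version: str             # the model whose decisions earned this confidence
    needs_revalidation: bool = False

def flag_after_upgrade(patterns: list[LearnedPattern], new_model: str) -> None:
    """Confidence earned against a retired model is not fully trusted under the new one."""
    for p in patterns:
        if p.model_version != new_model:
            p.confidence *= 0.5            # illustrative discount, not a tuned value
            p.needs_revalidation = True    # route the next matches to review, not autonomy
```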
At the company level, this gets more dangerous. A team that learned to avoid a particular market segment because of one bad experience three years ago might be encoding a pattern that no longer applies. A customer communication style that worked in the early days might be completely wrong for the current customer base. Memory that was once correct can become quietly wrong as conditions change.
The most common technical version of this mistake is treating retrieval as memory. Vector search is useful. A pile of searchable history is useful. But a retrieval layer does not know what should be trusted, what is stale, what is local to one customer, what should override a broader default, or what should be forgotten.
Making old context easier to find is not the same as deciding what should change future behaviour.
This means memory needs governance. Confidence thresholds, review cycles, decay for patterns that haven't been confirmed recently, and the ability for a human to say "the system learned the wrong thing, reset this."
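Two of those governance pieces, decay and reset, fit in a few lines. The half-life is an illustrative number, not a recommendation:

```python
from datetime import datetime

def decayed(confidence: float, last_confirmed: datetime, half_life_days: int = 90) -> float:
    """Patterns that have not been confirmed recently gradually lose the right to act."""
    age_days = (datetime.now() - last_confirmed).days
    return confidence * (0.5 ** (age_days / half_life_days))

def reset(memories: dict[str, float], audit_log: list[str], pattern_id: str, reason: str) -> None:
    """The human override: 'the system learned the wrong thing, reset this.'"""
    memories[pattern_id] = 0.0
    audit_log.append(f"{pattern_id} reset: {reason}")  # the reset itself leaves a trace
```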
In regulated work, this is not optional. "The system learned from its mistakes" is not something an auditor, a regulator, or a board will accept. The reasoning has to be reviewable. The confidence has to be visible. The overrides have to be traceable. When the learned behaviour is wrong, someone still has to own that.
Memory without review is just automation with a longer fuse. It will eventually become confidently wrong. Anyone can store records. Storing memory that knows when it might be wrong is where the real architecture gets built.
The system of record was the last era's moat
There was a time when being the system of record was enough. If you captured the data and made it searchable, you won. Salesforce won CRM by being the place where customer data lived. Xero won small business accounting by being where the financial records were. The data gravity alone created switching costs.
That moat still exists, but it weakens over the medium term. As agents get better at reading and writing across systems through APIs, the data does not have to live in one place to be useful. The switching cost of moving records drops. The records themselves become more portable.
What does not port cleanly is the path-dependent learning around those records. Customer-specific patterns, confidence levels, learned exceptions, validated overrides, tuned review thresholds, the accumulated understanding of how a particular business works. That knowledge is built through use, not through import. You can migrate a database. You can export some rules. What is much harder to recreate quickly is the reviewed learning loop that produced them.
The same applies at the company level. The onboarding time, the ramp-up period, the "it takes six months to really get it" problem are symptoms of a company that stores records instead of building memory.
The next generation of software moats looks different from the last one. The old moat was: we have the data and it's expensive to move. The new moat is: our system has learned things about your business that took time, correction, and use to earn, and starting over means rebuilding that trust and learning from a much thinner base.
Why can't incumbents just bolt on memory? Technically, some can. A sidecar table for decisions, outcomes, and confidence is not hard to ship. The harder problem is structural. Existing systems treat the record as the primary object; the decision, correction, and confidence live outside the data model, in reports, rule engines, and human heads. Shipping the storage is a quarter of work. Reshaping the workflow surface so corrections feed back into autonomous behaviour, with governance an auditor will accept, is years. Most incumbents will ship the storage and call it done. The ones who reshape the surface are the ones who compound.
Memory with governance is a moat. Memory without it is a liability. The hard part is not storing more history. It is building the correction, review, confidence, and reset loops that make history safe to act on.
That cost is the moat, because most companies will not pay it.
I'm Ben Lynch. I write about founders, AI, and what happens next from New Zealand. Say hello at ben@thinkdorepeat.ai.
If you're new here, Start Here is the best place to begin.
If you know someone building software that stores everything but remembers nothing, send this to them.

