Skip to content

A fund-of-funds

Cleaning up after a botched CRM migration

A fund-of-funds came out of a botched Affinity-to-Attio migration with 4,892 notes in its CRM, roughly half of them duplicates. 80x built a cleanup tool that reduced the workspace to 2,358 notes across four passes, with zero erroneous deletions, a full backup and rollback trail, and a human review queue for everything the tool could not prove.

The problem

A CRM migration moves the only copy of a fund's history. This one had been run with less care than that deserves: the same notes were imported repeatedly, and over 1,000 duplicate notes had been dumped onto the fund's own company record. Until the cleanup, a question as basic as "how many touchpoints do we have with this company?" returned numbers that were confidently, meaninglessly double.

Deleting data to fix data is the dangerous direction. Notes hold relationship history that exists nowhere else, so a deduplication tool that guesses wrong destroys exactly the thing it is meant to protect. The design question underneath the whole engagement was: where can the tool prove it is right, and what should it do everywhere else?

What we built

A deduplication tool with two postures.

Where duplication could be proven, it acted on its own: notes whose text was identical character for character, or a copy verified as fully contained inside the note being kept. Later passes caught subtler duplicate types, such as re-imports stamped with a second date, the same meeting captured by two different tools, and twins that differed only in formatting. The final pass sent AI agents through each company's notes to find functional duplicates, the same meeting written up in different words.

Where duplication could not be proven, the tool escalated instead of guessing. Ambiguous cases went to a person, in a queue, with the evidence attached.

Safety design

  • Backup first. A full backup was taken before anything was touched, and every deletion was logged to an audit file, so any run could be rolled back.
  • Dry run by default. The tool reports what it would delete and does nothing until run with an explicit apply flag. Every deletion was rehearsed before it happened.
  • Re-check at the moment of deletion. Each note was re-fetched and re-verified immediately before being deleted, so a stale plan could not delete a note that had changed since it was made.
  • Idempotent re-runs. Running the tool again after a cleanup plans zero further deletions. That is the test that it removes duplicates rather than data.
  • Escalate ambiguity. Anything the tool could not prove went to a review queue instead of the delete list.

What happened

The first live pass reduced 4,892 notes to 3,407: 1,486 deletions, zero errors, and a complete rollback trail. Three further passes brought the total to 2,358. Roughly half the notes in the CRM had been copies.

Sixty groups were ambiguous: the same note text appearing on different companies, which is sometimes duplication and sometimes a legitimate note about a deal involving several parties. All 60 went to a human review queue instead of being deleted, and a deliberately skeptical second review confirmed that none of them could be resolved safely by a machine.

In the agent-driven pass, an adversarial second verification argued against every proposed duplicate before it could become a deletion, and it correctly rejected 5 false candidates. That pass also went furthest in the careful direction: instead of deleting anything, it published its lowest-confidence findings as a review list inside the CRM itself, one entry per affected company with a plain-language summary of what looked duplicated and why, so the person who owns the data could act from the tool they already use every day.

Outcome

The fund's touchpoint counts mean something again, and the record that had absorbed over 1,000 stray notes is clean. The tool remains safe to re-run at any time, and a re-run that finds a healthy workspace changes nothing.

The pattern that made this safe is the one we reuse everywhere: act alone only where the proof is checkable on every single record, and ask a human everywhere else. Same tool, two postures. It deleted 1,486 notes without an error because it never deleted one it could not prove.