Rule-Based vs Machine Learning Deduplication for Salesforce: Which is Smarter?

When we look at most deduplication apps on the AppExchange, they all have one thing in common: they are rule-based tools that create filters to catch duplicates in your Salesforce environment. However, since your Salesforce admins need to spend time creating those rules, we have to wonder whether or not this is the smartest way to go about deduplication.

The alternative would be to use machine learning, which requires almost no setup. Let’s compare these two approaches to determine which one is more efficient, more user-friendly, and smarter overall. Let’s start by taking a closer look at the rule-based approach.

How Does Rule-Based Deduplication Work?

Let’s say that one of your sales reps is going about their daily routine and discovers a duplicate record in Salesforce. They notify their Salesforce admin, who creates a rule to prevent this kind of duplicate from recurring. Later, another sales rep finds a different duplicate, and another rule is created to address it. Such scenarios will keep happening because it is impossible to write a rule for every possible case. Not only does it take time to create all of these rules, but they also need to be maintained to make sure they keep working correctly. For example, if you have Web-to-Lead enabled, some of the rules could block legitimate leads from coming in. All of this raises the question: if rule-based deduplication has so many issues, why do so many deduplication apps use it? We try to answer this question in the next section.
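To make the rule-by-rule pattern concrete, here is a minimal sketch in which each rule is a named predicate added after a rep reported a specific kind of duplicate. The `Record` fields, rule names, and sample email addresses are hypothetical and not taken from any particular AppExchange app. Exact-match rules catch the case they were written for, but a record with small spelling and punctuation differences slips past every one of them.

```python
from dataclasses import dataclass


@dataclass
class Record:
    first_name: str
    last_name: str
    company: str
    email: str


# Rule added after a rep reported two records with the same email (hypothetical).
def same_email(a: Record, b: Record) -> bool:
    return a.email.strip().lower() == b.email.strip().lower()


# Rule added after a rep reported the same person at the same company (hypothetical).
def same_name_and_company(a: Record, b: Record) -> bool:
    return (a.first_name, a.last_name, a.company) == (b.first_name, b.last_name, b.company)


RULES = [("exact_email", same_email), ("exact_name_company", same_name_and_company)]


def matching_rules(a: Record, b: Record, rules=RULES) -> list:
    """Return the names of every rule that flags a and b as duplicates."""
    return [name for name, predicate in rules if predicate(a, b)]


original = Record("Desmond", "Haynes", "IBM", "dhaynes@example.com")
retyped = Record("Desmond", "Haynes", "IBM", "DHaynes@example.com")
variant = Record("Dezmond", "Hayns", "I.B.M", "dhaynes@work.example.com")

print(matching_rules(original, retyped))  # ['exact_email', 'exact_name_company']
print(matching_rules(original, variant))  # []
```

Catching the second pair would require yet another rule, which is exactly the maintenance burden described above.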

If Rule-Based Deduplication is So Problematic, Why is it Popular?

We have already established that rule-based deduplication is time-consuming, ineffective, and ultimately futile, so why are there so many rule-based deduplication apps on the AppExchange? One possible answer is that these apps simply aim to enhance the limited deduplication functionality offered by Salesforce, which is also rule-based. For those who may not be familiar, native Salesforce deduplication only covers standard objects, i.e., leads, contacts, and accounts, and each one can only be deduplicated individually. For example, if you have a contact who is also a lead, Salesforce by itself would not be able to dedupe those records across objects. The same is true for custom objects, which also require a separate app to dedupe. You can read more about the limitations of Salesforce’s native deduplication in this article.
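To illustrate what cross-object matching involves, here is a minimal sketch that pairs leads with contacts sharing an email address. The records are plain Python dictionaries with hypothetical IDs, not objects from any Salesforce API, and matching on normalized email alone is a simplifying assumption.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different emails compare equal."""
    return email.strip().lower()


def cross_object_duplicates(leads: list, contacts: list) -> list:
    """Return (lead_id, contact_id) pairs whose normalized emails match."""
    # Index contacts by email so each lead is a dictionary lookup, not a full scan.
    contacts_by_email = {}
    for contact in contacts:
        key = normalize_email(contact["Email"])
        contacts_by_email.setdefault(key, []).append(contact["Id"])

    pairs = []
    for lead in leads:
        for contact_id in contacts_by_email.get(normalize_email(lead["Email"]), []):
            pairs.append((lead["Id"], contact_id))
    return pairs


# Hypothetical sample data.
leads = [
    {"Id": "L-1", "Email": "dhaynes@example.com"},
    {"Id": "L-2", "Email": "someone@example.com"},
]
contacts = [{"Id": "C-1", "Email": " DHaynes@Example.com "}]

print(cross_object_duplicates(leads, contacts))  # [('L-1', 'C-1')]
```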

Since Salesforce itself uses rule-based deduplication, the companies creating separate deduplication apps decided to build on top of what Salesforce already does. In other words, they are building on something that is already familiar to Salesforce users. While this approach can work to some extent, it is also very time-consuming, as discussed earlier. Let’s now shift to the machine learning-based approach to understand how it offers comprehensive deduplication with little effort on your part.

What Makes Machine Learning Deduplication Smarter?

The biggest advantage of machine learning is that it does the heavy lifting for you. Whenever you label a pair of records as duplicates (or not), the system automatically “learns” from that decision and adjusts its model, with the goal of identifying future duplicates without human intervention. This process, known as “active learning,” has the system ask you about the pairs it is least certain of, then keeps adjusting the weights assigned to each field based on your answers, which steadily improves duplicate detection.
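Here is a minimal sketch of such a loop. It assumes each candidate pair is summarized as four per-field similarity scores in [0, 1] and that the model is a simple logistic regression. The field list, learning rate, and sample pairs are hypothetical, and commercial tools may use different models. On each round the system picks the pair whose predicted probability is closest to 0.5 (uncertainty sampling), asks the user for a label, and nudges the field weights with one gradient step.

```python
import math

# Per-field similarity scores for a candidate pair, in this order (assumed).
FIELDS = ["first_name", "last_name", "company", "email"]


def predict(weights, bias, features):
    """Probability that a pair is a duplicate under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def most_uncertain(pool, weights, bias):
    """Uncertainty sampling: the pair whose probability is closest to 0.5."""
    return min(pool, key=lambda f: abs(predict(weights, bias, f) - 0.5))


def update(weights, bias, features, label, lr=0.5):
    """One logistic-regression gradient step on a single labeled pair."""
    error = label - predict(weights, bias, features)
    new_weights = [w + lr * error * x for w, x in zip(weights, features)]
    return new_weights, bias + lr * error


def active_learning(pool, ask_user, rounds):
    """Ask the user about the most uncertain pair each round and update the weights."""
    weights, bias = [0.0] * len(FIELDS), 0.0
    pool = list(pool)
    for _ in range(min(rounds, len(pool))):
        pair = most_uncertain(pool, weights, bias)
        pool.remove(pair)
        weights, bias = update(weights, bias, pair, ask_user(pair))
    return weights, bias


# Hypothetical candidate pairs; the "user" is simulated by a lookup table.
truth = {
    (0.8, 0.9, 0.7, 1.0): 1,
    (1.0, 0.2, 0.1, 0.0): 0,
    (0.9, 1.0, 1.0, 1.0): 1,
    (0.3, 0.4, 1.0, 0.0): 0,
}
weights, bias = active_learning(truth, truth.get, rounds=4)
print(dict(zip(FIELDS, (round(w, 2) for w in weights))), round(bias, 2))
```

In this toy run the email weight ends up the largest, because email similarity is what consistently separates the labeled duplicates from the non-duplicates.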

Field weights are an important part of machine learning deduplication, so let’s explore them in more detail. Consider the two records below:

First Name | Last Name | Company Name | Email Address
Desmond    | Haynes    | IBM          | (not shown)
Dezmond    | Hayns     | I.B.M        | (not shown)

To a human, these records are obvious duplicates, but could you explain why? Perhaps you are giving greater emphasis to certain fields, such as Email Address. Weighting some fields more heavily is definitely part of the process, but you also understand that the name “Desmond” is sometimes spelled with a “z” instead of an “s”, and that IBM is an acronym for “International Business Machines”, so it’s perfectly reasonable for somebody to write it as either “IBM” or “I.B.M”.
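The intuition about spelling and punctuation variants can be approximated with per-field similarity scores. The sketch below strips punctuation and case before comparing values, using Python’s standard `difflib.SequenceMatcher` as a stand-in for whatever string comparator a real tool uses. It does not handle acronym expansion such as “International Business Machines” versus “IBM”, which would need an additional, hypothetical lookup step.

```python
import re
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and drop everything except letters and digits, so 'I.B.M' -> 'ibm'."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


for left, right in [("Desmond", "Dezmond"), ("Haynes", "Hayns"), ("IBM", "I.B.M")]:
    print(left, right, round(similarity(left, right), 3))
```

“IBM” and “I.B.M” become identical after normalization, while “Haynes” and “Hayns” score 10/11 because five of their eleven combined characters line up.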

Machine learning, and artificial intelligence more broadly, can replicate such human reasoning and go further by performing calculations that humans cannot do reliably by hand. Let’s explore this in the next section.

Augmenting Human Capabilities

Returning to the table above, we already mentioned that some fields are given greater weight when deciding whether records are duplicates. However, could you calculate exactly how much more important the “Email Address” field is than the “Last Name” field? Is it 3 times more important, or 2.7? This matters because everybody’s data is different and comes with its own challenges, so the system uses the active learning process described above to adjust the weights automatically to each situation.
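To see why the exact ratio matters, here is a minimal sketch that combines per-field similarities into a weighted average and compares it against a cutoff. The similarity values, the 0.79 threshold, and the weighted-average scoring rule are all hypothetical choices for illustration. With email weighted 3 times last name the pair crosses the threshold, and at 2.7 times it does not.

```python
def weighted_score(similarities: dict, weights: dict) -> float:
    """Weighted average of per-field similarities, in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[f] * similarities.get(f, 0.0) for f in weights) / total


THRESHOLD = 0.79  # hypothetical cutoff for calling a pair a duplicate

# Hypothetical per-field similarities for one candidate pair.
pair = {"last_name": 0.2, "email": 1.0}

for email_weight in (3.0, 2.7):
    weights = {"last_name": 1.0, "email": email_weight}
    score = weighted_score(pair, weights)
    print(email_weight, round(score, 3), score >= THRESHOLD)
```

The scores are (0.2 + 3.0) / 4.0 = 0.8 and (0.2 + 2.7) / 3.7 ≈ 0.784, so a small change in one weight flips the decision for this pair.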

The machine learning approach is also more scalable, since it does not require every new record to be compared with every existing one to determine whether it is a duplicate. For example, let’s say that you have 100,000 records in Salesforce and you would like to import a spreadsheet that contains another 10,000. A rule-based system that checks every pair would need to conduct 1,000,000,000 comparisons (100,000 × 10,000). Imagine how many comparisons would be needed for a company with millions of records in Salesforce.
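The arithmetic for the import example above, plus the cost of deduplicating a single large set of records against itself, can be checked with a couple of lines. The one-million-record figure in the second call is just an illustrative size.

```python
def import_comparisons(existing: int, incoming: int) -> int:
    """Pairs checked when every incoming record is compared with every existing one."""
    return existing * incoming


def within_set_comparisons(n: int) -> int:
    """Pairs checked when deduplicating n records against each other: n choose 2."""
    return n * (n - 1) // 2


print(f"{import_comparisons(100_000, 10_000):,}")  # 1,000,000,000
print(f"{within_set_comparisons(1_000_000):,}")    # 499,999,500,000
```

The within-set count grows with the square of the number of records, which is why exhaustive pairwise comparison becomes impractical at the scale of millions of records.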

Machine learning takes a smarter approach by grouping similar records into blocks, a technique known as blocking. For example, consider the names below:

  1. Jay Leno
  2. Jayson Williams
  3. Jayson Werth

The first three letters are the same in all of these names, so they would be placed in the same block. Records are then compared only with other records in their block, which significantly reduces the number of comparisons. If the blocks are well defined, there is greater confidence that only likely duplicates are compared. You don’t have to worry about choosing the blocking properties, because the machine learning algorithms take care of this for you.
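Here is a minimal sketch of blocking on the first three letters of a name, the key used in the example above. A real tool would typically learn which keys to use; the fixed prefix key and the extra sample names here are assumptions for illustration. Only pairs that share a block become candidates for the detailed comparison.

```python
from collections import defaultdict
from itertools import combinations


def block_by_prefix(names, length=3):
    """Group names by their lowercased first `length` characters."""
    blocks = defaultdict(list)
    for name in names:
        blocks[name[:length].lower()].append(name)
    return dict(blocks)


def candidate_pairs(blocks):
    """Only pairs that fall inside the same block are compared."""
    return [pair for members in blocks.values() for pair in combinations(members, 2)]


names = ["Jay Leno", "Jayson Williams", "Jayson Werth", "Ann Smith", "Anna Smith", "Bob Jones"]
blocks = block_by_prefix(names)
pairs = candidate_pairs(blocks)
print(len(list(combinations(names, 2))), "pairs without blocking")  # 15
print(len(pairs), "pairs with blocking")                            # 4
```

A single fixed key has a known weakness: “Desmond” and “Dezmond” share only their first two letters, so a three-letter prefix would put them in different blocks. This is one reason deduplication systems often combine several blocking keys.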

Start Using Machine Learning to Dedupe Your Salesforce

Given all of these advantages, machine learning is the smarter approach to deduplication. Start using machine learning-based Salesforce deduplication tools to do the heavy lifting for you.
