Data
Oct 18, 2022

How A.I. Can Help Find New Customers

When businesses combine their first party data, they can use machine learning to find new customers that could be interested in their products

Shanif Dhanani

Introduction

If your job is to drive more inbound leads or sales for your company, you’ve likely been working a lot harder this year. Finding new customers has gotten more expensive, less efficient, and more time consuming. Facebook’s ads aren’t working as well as they used to and Google Chrome will be getting rid of third party cookies in the near future. And while larger companies have a leg up (since they’re not fully dependent on third party data) they’re still competing for attention in a world where customers are seeing thousands of ads a day.

Companies need new customers to keep growing (in general), but finding new customers is tough. Data and A.I. can help. By combining first party data from lots of customers, businesses can use A.I. and machine learning to help them find new customers that may be interested in their products.

Today, we’re seeing some new products that leverage this exact concept. For example, Google is rolling out Topics (you may have heard of its previous iteration, FLoC) to help advertisers use topic and interest-based models to advertise to customers. And on the ecommerce front, Shopify recently rolled out Shopify Audiences, which uses behavioral data from customers across all of its connected merchants to help advertisers improve their social media ads by finding new customers across the entire Shopify ecosystem.

By aggregating large datasets of customer behavior, it’s possible to create models that can identify new customers based on who looks like a company’s existing customers.

Here’s how it could all work.

Finding patterns in customer data

Conceptually, using A.I. to find new customers is relatively straightforward, and it starts with data. Companies have lots of data about their own customers. They know what they look like, who they are, where they live, and how much they’re spending. They might even have information about other important attributes, like their gender, income levels, or website browsing history. These existing customers, whether consumers for ecommerce or businesses for SaaS, can provide the seed for finding new customers. Businesses can use these customers to identify other customers that look similar to them. But to make this work, they first need a large list of potential customers that they can start with, and they also need data about those customers, which they can use to find similarities between prospects and existing customers.

Getting access to this data isn’t easy, which is why we’re currently only seeing these products offered by bigger companies that collect data on lots of people and businesses. Companies like Google and Shopify have information on which customers purchased which products from which stores. They know which websites and products people are looking at. They can tell if someone is male or female, young or old. Smaller businesses, or businesses that don’t have access to this large dataset, could leverage these existing products, or create their own network of businesses that all share data amongst themselves.

Once this data is available, it’s easy to use some standard machine learning tools to identify new potential customers for a particular business. There are a few off-the-shelf algorithms that can help with this. The three that we’ll go over in the rest of this article are the following:

  1. Matrix factorization - this is similar to what Amazon does when they show you other products you might like. You can identify which customers bought from which stores and use that to identify new potential stores that customers might like.

  2. Forecasting purchase probability - in this approach, you could take all the data you have about a customer and train a model to identify the probability that a customer will make a future purchase from a given store.

  3. Clustering - clustering lets you identify similar groups of people (or clusters) who look like your existing customers based on all of the data that you have collected.

In the next section, we’ll go over each of these techniques and how they can be used to identify new high-potential customers.

Approaches to finding new customers

It’s frequently said that the majority of work on a machine learning project lies in the data cleaning stage. That’s undoubtedly true, and as we saw above, even acquiring that data is difficult. And while it’s not always easy to get and clean data, the modeling process isn’t always straightforward either.

In many cases, it’s important to try different approaches to see what works best. Many times the data you have available will also determine the type of modeling you can do. In this case, we have three obvious approaches that we can take. We’ll dive into more details on each one here.

Finding new customers with matrix factorization

Matrix factorization is a well-worn technique for collaborative filtering (a set of techniques in which you find similarities between users and items). You can use the result of a matrix factorization algorithm to find the highest-rated businesses for a potential customer, or for finding businesses that are similar to a given business.

The basic setup is easy - you create a matrix (think spreadsheet) where the rows represent users, the columns represent businesses, and the values in each cell represent whether or not the user bought from a business. Note that, in practice, these data structures are rarely stored as actual grids/matrices. Since most cells hold no data, sparse matrix representations are used instead.
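To make the sparse idea concrete, here is a minimal sketch that stores only the non-empty cells. It assumes purchases arrive as (customer, business) pairs; the IDs and the helper name `build_sparse_matrix` are made up for illustration, and a real system would more likely rely on a dedicated sparse matrix library.

```python
from collections import defaultdict

# Hypothetical purchase log: (customer_id, business_id) pairs.
purchases = [
    ("alice", "shoe_store"),
    ("alice", "book_shop"),
    ("bob", "book_shop"),
    ("carol", "coffee_roaster"),
    ("alice", "shoe_store"),  # repeat purchase
]


def build_sparse_matrix(pairs):
    """Map each customer (row) to the businesses (columns) they bought from.

    Only non-empty cells are stored; each value is the purchase count.
    """
    matrix = defaultdict(dict)
    for customer, business in pairs:
        matrix[customer][business] = matrix[customer].get(business, 0) + 1
    return dict(matrix)


matrix = build_sparse_matrix(purchases)
businesses = sorted({business for _, business in purchases})

# Fraction of cells in the full customers x businesses grid that hold data.
filled = sum(len(row) for row in matrix.values())
density = filled / (len(matrix) * len(businesses))
```

Even in this tiny example fewer than half of the cells are filled, and the share of filled cells shrinks quickly as more customers and stores are added.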

As noted above, most of the values in these matrices will be empty, since most users will only have bought from a handful of stores. A matrix factorization algorithm will then estimate the missing values and allow you to produce rankings for customers or businesses. It’s possible to optimize the accuracy of this algorithm by using weights and rankings based on things like purchase frequency or order value.
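The following is a rough sketch of that idea, not a production recommender. It assumes purchase counts per customer and store (all names are hypothetical), gives purchased cells a weight that grows with order count, and treats empty cells as weak evidence of no interest, which is a modeling assumption rather than something the data tells you. The factors are fit with plain gradient descent.

```python
import random

# Hypothetical purchase counts: customer -> {business: number of orders}.
purchases = {
    "alice": {"shoes": 3, "books": 1},
    "bob": {"shoes": 2, "books": 2, "coffee": 1},
    "carol": {"coffee": 4, "tea": 1},
    "dave": {"tea": 2},
}
customers = sorted(purchases)
businesses = sorted({b for row in purchases.values() for b in row})


def cell(c, b):
    """Return (target, weight) for one cell of the customer x business matrix."""
    count = purchases[c].get(b, 0)
    # Purchased cells: target 1, weight grows with purchase frequency.
    # Empty cells: target 0 with a small weight (an assumption).
    return (1.0, 1.0 + count) if count else (0.0, 0.2)


def predict(P, Q, c, b):
    """Estimated affinity: dot product of customer and business factors."""
    return sum(p * q for p, q in zip(P[c], Q[b]))


def loss(P, Q):
    """Weighted squared error over every cell of the matrix."""
    total = 0.0
    for c in customers:
        for b in businesses:
            target, weight = cell(c, b)
            total += weight * (target - predict(P, Q, c, b)) ** 2
    return total


def factorize(rank=2, steps=500, lr=0.02, reg=0.01, seed=0):
    """Fit low-rank factors P (customers) and Q (businesses)."""
    rng = random.Random(seed)
    P = {c: [rng.uniform(-0.1, 0.1) for _ in range(rank)] for c in customers}
    Q = {b: [rng.uniform(-0.1, 0.1) for _ in range(rank)] for b in businesses}
    for _ in range(steps):
        for c in customers:
            for b in businesses:
                target, weight = cell(c, b)
                err = weight * (target - predict(P, Q, c, b))
                for k in range(rank):
                    p, q = P[c][k], Q[b][k]
                    P[c][k] += lr * (err * q - reg * p)
                    Q[b][k] += lr * (err * p - reg * q)
    return P, Q


def recommend(P, Q, c, top_n=2):
    """Rank businesses the customer has not bought from by estimated affinity."""
    unseen = [b for b in businesses if b not in purchases[c]]
    return sorted(unseen, key=lambda b: predict(P, Q, c, b), reverse=True)[:top_n]


P, Q = factorize()
```

Calling `recommend(P, Q, "alice")` returns the stores Alice has not bought from yet, ordered by the estimated values. Swapping the order count for order value in `cell` is one way to try a different weighting.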

This approach is great for coming up with similar businesses based on what users have done in the past, but it requires you to know which customers purchased from which businesses, and (unless you’re using a more sophisticated approach) it ignores all of the other data you have about your customers.

Finding new customers by forecasting purchase probability

An easy way to leverage all of the data you have about your customers would be to build a model that forecasts the probability that a customer will make a purchase from a new store at some point in the future.

In this case, creating a structured dataset for training an algorithm can be a bit more nuanced. You could set up your data so that each instance (row) in your dataset represents a customer + business, and the label (last column in the spreadsheet, which represents what you want the model to learn) is whether or not a customer purchased from that business. Given a large enough dataset, this setup can provide good predictive accuracy.
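As an illustration of that layout, the sketch below builds one row per customer + business pair with the label in the last field. The attribute names (`age`, `avg_order_value`, `category`) and the helper `build_rows` are hypothetical placeholders for whatever data you actually hold.

```python
from itertools import product

# Hypothetical inputs; attribute names are illustrative only.
customers = {
    "alice": {"age": 34, "avg_order_value": 80.0},
    "bob": {"age": 52, "avg_order_value": 25.0},
}
businesses = {
    "shoe_store": {"category": "apparel"},
    "book_shop": {"category": "media"},
}
purchased = {("alice", "shoe_store"), ("bob", "book_shop")}


def build_rows(customers, businesses, purchased):
    """Build one row per (customer, business) pair; the label comes last."""
    rows = []
    for (c_id, c_attrs), (b_id, b_attrs) in product(
        customers.items(), businesses.items()
    ):
        row = {"customer": c_id, "business": b_id}
        row.update(c_attrs)  # customer features
        row.update(b_attrs)  # business features
        row["label"] = int((c_id, b_id) in purchased)  # 1 = bought, 0 = did not
        rows.append(row)
    return rows


rows = build_rows(customers, businesses, purchased)
```

Note that pairing every customer with every business grows quickly, so on real data you would probably sample the non-purchase pairs rather than keep all of them.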

However, if you have any time-specific data (for example, the number of times a customer visited your website in the past 7 days), you’ll lose a bit of precision by using a setup like the one mentioned above.

An alternative would be to take a snapshot of customers in time (or with respect to a particular action, like a purchase) and use that snapshot for the data used in the instance (row). This way, you’ll be able to capture changes over time and how those lead to future purchases.
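A snapshot row could be built along these lines. This is a sketch under assumptions: the event log format, the 7-day lookback, and the 30-day label horizon are all illustrative choices, and `snapshot_row` is a hypothetical helper. The key point is that features only use events before the snapshot, while the label only uses what happens afterwards.

```python
from datetime import date, timedelta

# Hypothetical event log: (customer_id, event_type, business_id, day).
events = [
    ("alice", "visit", "shoe_store", date(2022, 9, 1)),
    ("alice", "visit", "shoe_store", date(2022, 9, 5)),
    ("alice", "purchase", "shoe_store", date(2022, 9, 10)),
    ("bob", "visit", "shoe_store", date(2022, 8, 1)),
    ("bob", "visit", "shoe_store", date(2022, 9, 6)),
]


def snapshot_row(events, customer, business, snapshot_day,
                 lookback_days=7, horizon_days=30):
    """Build one training row describing a customer as of `snapshot_day`."""
    window_start = snapshot_day - timedelta(days=lookback_days)
    horizon_end = snapshot_day + timedelta(days=horizon_days)
    recent_visits = 0
    label = 0
    for c, kind, b, day in events:
        if c != customer or b != business:
            continue
        # Feature: visits inside the lookback window, strictly before the snapshot.
        if kind == "visit" and window_start <= day < snapshot_day:
            recent_visits += 1
        # Label: a purchase on or after the snapshot, within the horizon.
        if kind == "purchase" and snapshot_day <= day < horizon_end:
            label = 1
    return {
        "customer": customer,
        "business": business,
        "snapshot": snapshot_day,
        "recent_visits": recent_visits,
        "label": label,
    }


snapshot = date(2022, 9, 7)
alice_row = snapshot_row(events, "alice", "shoe_store", snapshot)
bob_row = snapshot_row(events, "bob", "shoe_store", snapshot)
```

Taking several snapshots per customer at different dates gives the model examples of how recent behavior relates to later purchases.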

With your data set up this way, you could then use any standard supervised learning approach to build the best model that you can (just be sure to use time series cross-validation when evaluating the model to ensure you don’t bias the evaluation metrics).
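One simple form of time series cross-validation is an expanding window, where each fold trains on earlier snapshots and tests on the ones that follow. The sketch below is a hand-rolled illustration (the function `time_series_splits` is hypothetical); scikit-learn’s `TimeSeriesSplit` offers a similar scheme if you already use that library.

```python
from datetime import date


def time_series_splits(snapshot_days, n_splits=3):
    """Yield (train_indices, test_indices) pairs using an expanding window.

    Rows are ordered by snapshot day, so no test row is earlier than any
    training row in the same split.
    """
    order = sorted(range(len(snapshot_days)), key=lambda i: snapshot_days[i])
    fold = len(order) // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough rows for the requested number of splits")
    for k in range(1, n_splits + 1):
        train = order[: k * fold]
        # The last split also takes any leftover rows at the end.
        end = len(order) if k == n_splits else (k + 1) * fold
        test = order[k * fold : end]
        yield train, test


# Hypothetical snapshot dates, one per row of the training data.
snapshot_days = [date(2022, m, 1) for m in (5, 1, 3, 2, 8, 7, 4, 6)]
splits = list(time_series_splits(snapshot_days))
```

A randomly shuffled split would let the model train on rows from the future relative to its test rows, which tends to make the evaluation metrics look better than they would be in practice.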

Finding new customers with clustering

Both of the approaches above require you to know if someone has made a purchase from a particular store at some point in the past. That sort of data can be hard to come by, and you’ll need enough examples of customers that have made purchases to ensure that your algorithm can learn properly.

A great alternative to those approaches is to use clustering algorithms, which don’t require you to know if someone has already made a purchase at a particular store. With clustering, you can group customers together based on how similar they are, given the data you have for them.

Setting up the data for a clustering approach to find new customers is a bit more nuanced than setting up the data for a general clustering approach. Normally, when it comes to a clustering algorithm, you have a raw list of items (in this case, customers) that you want to separate into distinct clusters.

However, in this case, the goal is a bit different. For any particular business, you have a group of existing customers, which can be represented as a single cluster. From there, you want to find new customers that are similar to the cluster. You could still set up your data in a grid format, where rows represent potential customers and columns represent data attributes about those customers, but when you start the modeling process, you’ll likely want to find the top 5-10% of new potential customers that are most similar to your existing customers. This means you may have to simply sort new potential customers by the distance of each customer to the center of the cluster represented by your existing customers.
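The sorting step could look something like the sketch below. It assumes every attribute is already numeric, rescales each column so that no single attribute dominates the distance, and uses Euclidean distance to the centroid. The attribute choice (age, monthly spend), the sample values, and the helper names are hypothetical.

```python
import math

# Hypothetical numeric attributes per person: (age, monthly_spend).
existing_customers = [(30, 120.0), (34, 150.0), (28, 110.0), (32, 140.0)]
prospects = {
    "p1": (31, 130.0), "p2": (33, 135.0), "p3": (65, 20.0), "p4": (19, 15.0),
    "p5": (45, 300.0), "p6": (50, 60.0), "p7": (22, 40.0), "p8": (70, 500.0),
    "p9": (40, 10.0), "p10": (58, 90.0),
}


def column_stats(rows):
    """Return per-column means and standard deviations."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def rank_prospects(existing, prospects, top_fraction=0.10):
    """Return the prospects closest to the centroid of existing customers."""
    means, stds = column_stats(list(existing) + list(prospects.values()))

    def scale(row):
        return [(x - m) / s for x, m, s in zip(row, means, stds)]

    scaled_existing = [scale(row) for row in existing]
    centroid = [sum(col) / len(col) for col in zip(*scaled_existing)]
    distances = {
        pid: math.dist(scale(row), centroid) for pid, row in prospects.items()
    }
    ranked = sorted(distances, key=distances.get)  # nearest first
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep]


top_prospects = rank_prospects(existing_customers, prospects)
```

A single centroid works best when existing customers form one reasonably tight group. If they fall into several distinct groups, measuring the distance to the nearest of several centers may be a better fit.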

Marketing to new customers

Regardless of the approach you take, you’ll have a data-driven list of new potential customers that you can target. The fastest way to start marketing to them is to set up paid ads on social media to target the highest-probability potential customers with your most aggressive campaigns, and then work your way down to the least similar customers, using lower daily budgets as you go.

Currently, if you’re using a third party tool like Shopify Audiences, you can’t target individual customers directly through email or SMS, since individual contact data isn’t provided. However, if you have a network of other businesses where you’re sharing data, and all of those businesses have opted in to sharing individual contact information (make sure to update your ToS and Privacy Policy!), it’s possible you could reach out to new prospective customers individually.

This approach, of identifying new potential customers based on your existing customers, isn’t new. Facebook has been doing this for years. But using your own first party data and creating networks of shared business data is a relatively untapped approach that can result in a win-win for businesses and customers.

I believe we’ll start to see many more of these offerings in the future, from banks that have credit card data, to fintech companies that have data about all of a customer’s purchases, to web tracking companies that track customer behavior online.

Marketing is getting tougher, and relevant marketing has never been more important. With the loss of third-party tools and the growth in awareness of the importance of tracking first party data, this is an obvious area of growth.

---

Photo by Nathana Rebouças on Unsplash

About the author
Shanif Dhanani
Co-Founder and CEO, Apteo

Shanif Dhanani is the co-founder & CEO of Apteo. Prior to Apteo, Shanif was a data scientist and software engineer at Twitter, and prior to that he was the lead engineer and head of analytics at TapCommerce, a NYC-based ad tech startup acquired by Twitter. He has a passion for all things data and analytics, loves adventure traveling, and generally loves living in New York City.