Data lakes are repositories that typically hold vast amounts of data. They can contain very large sets (for example, with maximum set sizes that may be in the tens of millions) and very large dictionaries (for example, with hundreds of millions of distinct values). Examples of modern data lakes include private enterprise data lakes and public data lakes such as Open Government Data or Web Tables. Generally, when locating data in a data lake, a user's computing device inputs a table and identifies a join column. A searching system then attempts to find tables that can be joined with the user-provided table on the largest number of distinct values.
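The search objective described above can be illustrated with a minimal sketch. It assumes each candidate column fits in memory as a plain collection of values and compares the query column against every candidate by brute force. The function name `top_k_joinable`, the parameter `k`, and the sample table and column names are hypothetical and chosen only for illustration. An exhaustive scan like this would not be practical at the set and dictionary sizes mentioned above; the sketch only shows what "joinable on the largest number of distinct values" means.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def top_k_joinable(
    query_column: Iterable[Hashable],
    candidate_columns: Dict[str, Iterable[Hashable]],
    k: int = 3,
) -> List[Tuple[str, int]]:
    """Rank candidate columns by distinct values shared with the query column.

    Hypothetical helper: brute-force comparison against every candidate.
    """
    # Duplicates in the join column do not count, so reduce it to a set.
    query_values = set(query_column)

    scored = []
    for name, column in candidate_columns.items():
        # Overlap = number of distinct values present in both columns.
        overlap = len(query_values & set(column))
        if overlap > 0:
            scored.append((name, overlap))

    # Largest overlap first; ties are broken by column name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical query column and a tiny stand-in for a data lake.
query = ["Toronto", "Ottawa", "Boston", "Toronto"]
lake = {
    "census.city": ["Toronto", "Ottawa", "Boston", "Chicago"],
    "weather.city": ["Toronto", "Boston", "Boston"],
    "parks.name": ["High Park", "Stanley Park"],
}

results = top_k_joinable(query, lake, k=2)
print(results)  # [('census.city', 3), ('weather.city', 2)]
```

In this example the query column has three distinct values. The `census.city` column shares all three and `weather.city` shares two, while `parks.name` shares none and is left out of the ranking.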