Correlated Data

Correlated data refers to columns in the same row whose values are related to one another. Consider an example:

drewpwtm@yahoo.com,Andrew,Smallhouse
qwerty@yahoo.com,Alan,Green

In this example, the first column contains an email address, the second contains a first name, and the third contains a last name. We consider two columns to be correlated if they share a common substring of at least 3 characters. In our example, the first row has two such columns: the email (drewpwtm@yahoo.com) and the first name (Andrew) both contain the substring drew. The second row does not contain correlated columns.
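The correlation rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not the tool's actual implementation: the function name share_substring is hypothetical, and the comparison is assumed to be case-sensitive, which the description does not specify.

```python
def share_substring(a: str, b: str, min_len: int = 3) -> bool:
    """Return True if a and b share a substring of at least min_len characters.

    Hypothetical helper; case-sensitive matching is an assumption.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Any common substring longer than min_len contains a common substring
    # of exactly min_len characters, so fixed-size windows are sufficient.
    windows = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[i:i + min_len] in windows
               for i in range(len(b) - min_len + 1))


# First row: the email and the first name share "dre", "rew" (and "drew").
row1_correlated = share_substring("drewpwtm@yahoo.com", "Andrew")   # True
# Second row: the email shares no 3-character substring with either name.
row2_first = share_substring("qwerty@yahoo.com", "Alan")            # False
row2_last = share_substring("qwerty@yahoo.com", "Green")            # False
```

Checking windows of exactly the minimum length keeps the sketch short; a real implementation might normalize case or trim whitespace first.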

The tool for finding correlated data filters rows according to whether they contain correlated columns: it either removes such rows (Remove the rows with correlated fields) or keeps them (Keep the rows with correlated fields).

Select the Input Field number (e.g., 1), then list the numbers of the columns to check against it in Correlated to one of these fields, separated by commas (e.g., 2,3).

Next, set the minimum length of the common substring in Number of correlating characters, for example, 3.
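As a rough sketch of how these settings combine, the following Python applies them to the two sample rows. The names filter_rows, input_field, compare_fields, min_len, and keep are hypothetical stand-ins for the Input Field, Correlated to one of these fields, Number of correlating characters, and Keep/Remove options; the tool's real behavior (for instance, whether matching ignores case) may differ.

```python
import csv
import io


def share_substring(a, b, min_len=3):
    """Case-sensitive check (an assumption) for a common substring of min_len chars."""
    windows = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[i:i + min_len] in windows
               for i in range(len(b) - min_len + 1))


def filter_rows(rows, input_field, compare_fields, min_len=3, keep=True):
    """Keep (keep=True) or remove (keep=False) rows whose input field is
    correlated with at least one of the compare fields.

    Column numbers are 1-based, as in the tool's settings.
    """
    result = []
    for row in rows:
        source = row[input_field - 1]
        correlated = any(share_substring(source, row[n - 1], min_len)
                         for n in compare_fields)
        if correlated == keep:
            result.append(row)
    return result


data = ("drewpwtm@yahoo.com,Andrew,Smallhouse\n"
        "qwerty@yahoo.com,Alan,Green\n")
rows = list(csv.reader(io.StringIO(data)))

# Input Field = 1, Correlated to one of these fields = 2,3, characters = 3
kept = filter_rows(rows, 1, [2, 3], 3, keep=True)
removed = filter_rows(rows, 1, [2, 3], 3, keep=False)

print(kept)     # [['drewpwtm@yahoo.com', 'Andrew', 'Smallhouse']]
print(removed)  # [['qwerty@yahoo.com', 'Alan', 'Green']]
```

With keep=True only the first sample row survives, because its email and first name share drew; with keep=False only the second row remains.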