Kirix Support Forums
filter duplicates
4 posts
Ken/Ben
I am just getting my head around the program. I have a large list of emails in one column and first names in another, and I want to filter out any duplicate emails.
Is there a simple way of doing this?
thx
tim
Re: filter duplicates
Hi Tim,
If it's just a two-column table, I'd probably do the following:
1) Select the grouping tool (sigma icon or Data > Groups > Group Records).
2) Drag in your "Email" field so that it says "Group By" Email.
3) Drag in a <count> field.
4) Click OK.
5) In the resulting table, sort Descending on the Count field.
6) Now you'll see which emails are duplicated in your list. Tile the tables vertically so you can see both at once, then quick filter your main table on each email address you know is duplicated and delete the records you don't want.
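If you want to check the same thing outside Strata, the grouping steps above can be sketched in Python: count each email, keep the ones that appear more than once, and sort by count in descending order. This only illustrates the idea and is not how Strata works internally. The function name `find_duplicate_emails` and the sample rows are hypothetical, and emails are compared exactly as written (no lowercasing or trimming).

```python
from collections import Counter

def find_duplicate_emails(rows):
    """Return (email, count) pairs for emails seen more than once,
    highest count first.

    `rows` is assumed to be an iterable of (email, firstname) tuples
    (hypothetical layout matching a two-column table).
    """
    # Group by email and count the records in each group.
    counts = Counter(email for email, _firstname in rows)
    # Keep only the groups with more than one record.
    duplicates = [(email, n) for email, n in counts.items() if n > 1]
    # Sort descending on the count.
    duplicates.sort(key=lambda pair: pair[1], reverse=True)
    return duplicates

# Hypothetical sample data.
rows = [
    ("tim@example.com", "Tim"),
    ("ann@example.com", "Ann"),
    ("tim@example.com", "Timothy"),
    ("bob@example.com", "Bob"),
    ("ann@example.com", "Ann"),
    ("tim@example.com", "Tim"),
]
print(find_duplicate_emails(rows))
# prints [('tim@example.com', 3), ('ann@example.com', 2)]
```

If your list mixes upper and lower case, you might count on `email.strip().lower()` instead. That is an assumption about your data, not something the grouping steps do.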
A longer version of this is here: http://www.kirix.com/stratablog/removin ... -form-data
Hope that helps,
ken
Ken Kaczmarek
Kirix Support Team
Ken - Kirix Support Team
- Posts: 147
- Joined: Mon Dec 19, 2005 10:36 am
Re: filter duplicates
Is there a way to automate the process? I have 20 million records of which 5 million are duplicate entries. Is there a way to have the program remove the duplicates so that I am not filtering and deleting them manually?
- Andrew S
- Registered User
- Posts: 1
- Joined: Sat Jun 26, 2010 1:05 pm
Re: filter duplicates
Hi Andrew,
If you're only looking for distinct email addresses, I'd do the following:
1. File > New > Query
2. Drag your 20-million-record table from the project tree into the upper section of the query dialog.
3. Select the fields you want in the output table by highlighting them in the table and dragging them down to the bottom section of the query.
4. Find your "email" field and, in the Function section, select "Group By".
Your resulting table will show you all the fields you selected, grouped by the email field (so you should end up with about 15 million records). See this help page for more info about the query builder: http://www.kirix.com/help/docs/creating_queries.htm
Please note that this query selects the "first" record it finds for each email address and brings that record's other fields along to the output table. So if a later record with the same email has different data in it (say, the name field is "Jon" instead of "John" in the first record), you won't see it; the query doesn't combine data from the other fields.
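For anyone who wants to do the same "keep the first record per email" step outside Strata, here is a minimal Python sketch that streams rows from a CSV and drops any row whose email has already been seen. It mirrors the behavior described above (first record wins, later differing data is discarded), but it is not Strata's implementation. The function name `dedupe_by_email`, the column names `email` and `firstname`, and the sample data are hypothetical. It keeps every distinct email in memory, which for roughly 15 million addresses could take a gigabyte or more of RAM.

```python
import csv
import io

def dedupe_by_email(reader, email_field="email"):
    """Yield each row whose email has not been seen before.

    The first row for each email wins; later rows with the same email
    are dropped, so any differing values in their other fields are lost.
    """
    seen = set()
    for row in reader:
        key = row[email_field]
        if key not in seen:
            seen.add(key)
            yield row

# Hypothetical sample data: the "Jon" row shares John's email.
data = io.StringIO(
    "email,firstname\n"
    "john@example.com,John\n"
    "mary@example.com,Mary\n"
    "john@example.com,Jon\n"
)
rows = list(dedupe_by_email(csv.DictReader(data)))
for row in rows:
    print(row["email"], row["firstname"])
# prints:
# john@example.com John
# mary@example.com Mary
```

Because it is a generator, you could pass its output straight to a `csv.DictWriter` to write the deduplicated file without holding all 20 million rows in memory at once.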
Best,
ken
Ken Kaczmarek
Kirix Support Team