sometimes you're caught up in a situation where you have multiple entries against an id and you want to keep only single entry against each.
Table structure would be something like this:
Table `temp`
Entries:
user_id tstamp source
100 4/26/2015 app
100 4/26/2014 web
100 4/25/2015 app
101 4/26/2015 app
100 4/23/2015 app
101 4/20/2015 web
100 4/1/2015 app
200 4/26/2015 app
200 4/2/2015 app
Desired Output:
user_id tstamp source
100 4/26/2015 app
100 4/26/2014 web
101 4/26/2015 app
200 4/26/2015 app
Now, I want to delete the rows where source is `app` and keep the entry with max tstamp only with in a group.
Here's the sample query to achieve the same:
DELETE tmp FROM `temp` tmp join (select user_id, max(tstamp) as ts from `temp` where user_id in (select user_id from `temp` where source = 'app' group by user_id having count(user_id)>2) and source = 'app' group by user_id )t on tmp.user_id = t.user_id and tmp.tstamp != t.ts and tmp.source = 'app'
Exercise for you: Replace that inner query with join and check the performance benefits ;)
Table structure would be something like this:
Table `temp`
Entries:
user_id tstamp source
100 4/26/2015 app
100 4/26/2014 web
100 4/25/2015 app
101 4/26/2015 app
100 4/23/2015 app
101 4/20/2015 web
100 4/1/2015 app
200 4/26/2015 app
200 4/2/2015 app
Desired Output:
user_id tstamp source
100 4/26/2015 app
100 4/26/2014 web
101 4/26/2015 app
200 4/26/2015 app
Now, I want to delete the rows where source is `app` and keep the entry with max tstamp only with in a group.
Here's the sample query to achieve the same:
DELETE tmp FROM `temp` tmp join (select user_id, max(tstamp) as ts from `temp` where user_id in (select user_id from `temp` where source = 'app' group by user_id having count(user_id)>2) and source = 'app' group by user_id )t on tmp.user_id = t.user_id and tmp.tstamp != t.ts and tmp.source = 'app'
Exercise for you: Replace that inner query with join and check the performance benefits ;)
Comments
Post a Comment