My rabbling on Joins

The most popular and default style of joining is: Inner Join

this returns all rows where the column on which the join happened matches.

Visitor.joins(:marketings)
SELECT "visitors".* FROM "visitors" INNER JOIN "marketings" ON "marketings"."visitor_id" = "visitors"."id"

This does not give the visitors which dont have a marketing record.

To get all the visitors whether they have marketing records or not, we need to do Left join

This will give all records of the table on the left.

Visitor.joins("LEFT join marketings ON marketings.visitor_id = visitors.id")
SELECT "visitors".* FROM "visitors" LEFT join marketings ON marketings.visitor_id = visitors.id

Now we want to know the number of marketing records on each visitor. To do this we need to first use the group_by command. This is like a pivot point for the table. We are grouping all records on visitors id. Now each visitor has many visitors, so we can run another aggregate command on the visitors which produced a singular result for each visitor. In this case we will use the count command, so we can get the number of marketings record on each visitor

Visitor.joins("LEFT join marketings ON marketings.visitor_id = visitors.id").group("visitors.id").select("visitors.id, count(marketings.id)")
SELECT visitors.id, count(marketings.id) FROM "visitors" LEFT join marketings ON marketings.visitor_id = visitors.id GROUP BY visitors.id

Now we only want those visitors whose number of marketing records is greater than 2. Usually things like this is done with the where command. For example if we want visitors whose name was Sandeep we would do where name ='Sandeep' but in this case the count of marketings is not a visitor attribute instead it is something being calculate in an aggregate command which is working with a group by command. So now to do that comparison and filter we need to use the having command. I like to think of it as the where command equivalent for the group_by command

Visitor.joins("LEFT join marketings ON marketings.visitor_id = visitors.id").group("visitors.id").select("visitors.id, count(marketings.id)").having("count(marketings.id) = 2")
SELECT visitors.id, count(marketings.id) FROM "visitors" LEFT join marketings ON marketings.visitor_id = visitors.id GROUP BY visitors.id HAVING count(marketings.id) > 2

Now lets say we want to further filter on a column on the marketings table. the marketings column has an excluded column. we are only interested when excluded column is false

Visitor.joins("LEFT join marketings ON marketings.visitor_id = visitors.id").where("marketings.excluded = false").select("visitors.id, count(marketings.id)").group("visitors.id").having("count(marketings.id) = 0")

Here used the where claused which filtered all visitors out where there was no match. so basically all results are with a count of 1 or higher. so in a sense it is no longer a left join. it has become an inner join even though we had it as a left join.

Now if we place the condition as part of the join, it will still keep all left record and just drop the records on the right if they dont match, thus once again making it a left join.

Visitor.joins("LEFT join marketings ON marketings.visitor_id = visitors.id and marketings.excluded = false").select("visitors.id, count(marketings.id)").group("visitors.id").having("count(marketings.id) = 0")