Grouping the output
These questions are all about grouping your data together.
To complete them you will need to know The following commands: nth_value, ntile, lag, lead,
Read on to Learn about Grouping
Lag and Lead
These functions allow you to examine the next or previous row. You may then compare this value to the current row
Consider a race where we wanted to see the time of the person in front of us and the amount of time we beat the person behind us by
select name, time, lag(time, 1) over (order by time) as time_of_person_infront_of_me, lead(time, 1) over (order by time) - time as how_much_i_was_infront_of_ther_person_behind_me FROM runners order by time
name | time | time_of_person_infront_of_me | how_much_i_was_infront_of_ther_person_behind_me |
---|---|---|---|
andy | 101 | None | 2 |
bob | 103 | 101 | 1 |
cedric | 104 | 103 | 0 |
dave | 104 | 104 | 4 |
eric | 108 | 104 | None |
nth_value (& first_value) and ntile
Unsurprisngly nth_value will return the nth_value, but if we do not specify a range it will return null if the current value is less than the nth. If we always want something displayed we need to specify a range
ntile(n) divides the group into n equal partitions and denotes which partition each row is in.
Let us go back to our race and print how much faster a runner needs to go to finish on the podium (1st, 2nd or 3rd), the time of the second runner and finally we will use ntile(2) to determine if they are in the top half of runners
select name, time, nth_value(time, 3) over (order by time) - time as to_go_faster_to_make_podium, nth_value(time, 2) over (order by time RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as time_of_second_runner, ntile(2) over (order by time) as which_half FROM runners order by time;
name | time | to_go_faster_to_make_podium | time_of_second_runner | which_half | |
---|---|---|---|---|---|
andy | 101 | None | 103 | 1 | |
bob | 103 | None | 103 | 1 | |
cedric | 104 | 0 | 103 | 1 | |
dave | 104 | 0 | 103 | 2 | |
eric | 108 | -4 | 103 | 2 |