SQL Interview Problem asked during Amazon Interview
Vložit
- čas přidán 22. 06. 2024
- In this video, let us solve an SQL Problem asked during the Amazon Interview.
OdinSchool: hubs.la/Q02CX94v0
Download the scripts used in the video:
techtfq.com/blog/sql-intervie...
Thanks for watching!
Check out the upcoming Data Science bootcamp on OdinSchool: hubs.la/Q02CX94v0
In this bootcamp you're gonna give training or it will be done by someone else?
My solution in PostgreSQL:
WITH CTE AS
(SELECT *
, dates - (ROW_NUMBER() OVER(PARTITION BY employee, status ORDER BY dates)::INT) AS grp
FROM emp_attendance)
SELECT employee, MIN(dates) AS from_date
, MAX(dates) AS end_date, status
FROM CTE
GROUP BY employee, grp, status
ORDER BY employee, from_date
In my ROW_NUMBER function, I have partitioned by employee and status, and ordered by dates. For each employee, the data is grouped by status and ordered by dates. The row number resets to 1 whenever the status changes (from present to absent or vice-versa) within each employee's partition.
I then subtracted the row number from the date to create a group identifier (grp) to identify consecutive dates within the same status for each employee.
CTEs and window functions are new to me in learning stage, but I got this very clearly thanks for the in detail explanation
Hi thoufiq! Here is my simple solution using SQL server:
with cte1 as(select employee, dates, dateadd(day,-1*(row_number()over(partition by employee,status order by dates)),dates) as date_grp,status
from employee)
select employee,min(dates) as from_date,max(dates) as to_date, status
from cte1
group by employee,date_grp,status
order by employee,from_date;
can you explain below part 😅😅
dateadd(day,-1*(row_number()over(partition by employee,status order by dates)),dates) as date_grp
@@mahivamsi9598 -1*(row_number()over(partition by employee,status order by dates) this value will give positive value so he decided to multiply with -1 so that it gets negative value so that difference can be created
My solution:
WITH cte AS (
SELECT *,
CASE WHEN status = LAG(status, 1, status) OVER (PARTITION BY employee ORDER BY dates) THEN 0 ELSE 1 END AS flag
FROM emp_attendance
),
cte2 AS (
SELECT employee, dates, status, SUM(flag) OVER (PARTITION BY employee ORDER BY dates) AS flag_sum
FROM cte
)
SELECT employee, MIN(dates) AS from_date, MAX(dates) AS to_date, MAX(status) AS status
FROM cte2
GROUP BY employee, flag_sum
ORDER BY employee, from_date;
Sir, Is there will be any difference i use iif inplace of case Statment???
Perfect Explanation, Thanks!
Thank you TFQ
I really appreciate .
Man you are legend....great explanation 😮
solving challenging queries from top mnc with nice explanation, great thoufiq keep it up.
This is very usefull information Bro!
Tqs For giving Valueble Infomation.
As always 👍
nice to see u again, bro the last Line of your given Data is a little 0 too much
insert into emp_attendance values('A2', '2024-01-010', 'ABSENT');
the source is alsmost the same
with cte as
(select *, row_number() over(partition by employee order by employee, dates) as rn
from emp_attendance),
cte_present as
(select *, row_number() over(partition by employee order by employee, dates) AS RN2
, rn - row_number() over(partition by employee order by employee, dates) as flag
from cte where status='PRESENT'),
cte_absent as
(select *, row_number() over(partition by employee order by employee, dates) as rn3
, rn - row_number() over(partition by employee order by employee, dates) as flag
from cte where status='ABSENT' )
select employee
, first_value(dates) over(partition by employee, flag order by employee, dates) as from_date
, last_value(dates) over(partition by employee, flag order by employee, dates
range between unbounded preceding and unbounded following) as to_date
, status
from cte_present
union
select employee
, first_value(dates) over(partition by employee, flag order by employee, dates) as from_date
, last_value(dates) over(partition by employee, flag order by employee, dates
range between unbounded preceding and unbounded following) as to_date
, status
from cte_absent
order by employee, from_date
with specification rn2, rn3 in MS SQL Server
@TFQ can we use min and max instead of first_value and last_value in the window function?
How difficult sql queries are to write on real job senario? Intermediate or hard ?
have a question for you @techTFQ , how much time u have taken to come up for this solution ? just curious to know an approximate time
I struggled with this. The rn - rn where status = X is a cool pattern.
You can achieve the same differentiation between employees based on status by simply using rank() and partitioning it by employee, status (as in the first cte).
WITH rank_cte AS (
SELECT
*,
rank() OVER(partition by employee, status order by dates) as r
FROM emp_attendance
ORDER BY employee, dates
),
consec_cte AS (
SELECT
*,
r - row_number() OVER() AS consec
FROM rank_cte
)
SELECT
employee,
MIN(dates) AS start_date,
MAX(dates) AS end_date,
status
FROM consec_cte
GROUP BY employee, status, consec
ORDER BY employee, start_date;
Here's my take on it via MS SQL server for given dataset
=================================================
with cte as (select *,
day(dates) - row_number() over (partition by status, employee order by dates) grp
from emp_attendance)
select employee, MIN(dates) as from_date, MAX(dates) to_date, status
from cte
group by grp, employee, status
order by employee, from_date
=================================================
Can you please start a Snowflake Bootcamp ? Will be really helpful.
Share some tips to get into a product based company
Sir may we solve this problem using lag() window function?
If you have any time gap
Please make a video about
Frequently asking interview questions in sql for Capgemini interview...
Hello my solution in Sql Server:
WITH FLO AS (
SELECT *, CASE WHEN STATUS LAG(STATUS,1,'OPOUI')OVER(PARTITION BY EMPLOYEE ORDER BY DATES)THEN 1 ELSE 0 END
AS FLAG
FROM EMP_ATTENDANCE
), FLO1 AS (
SELECT * , SUM(FLAG)OVER(PARTITION BY EMPLOYEE ORDER BY DATES)AS GRP
FROM FLO
)
SELECT EMPLOYEE, MIN(DATES)AS FROM_DATE, MAX(DATES)AS TO_DATE, STATUS
FROM FLO1
GROUP BY EMPLOYEE, STATUS,GRP
ORDER BY EMPLOYEE, FROM_DATE
Hope it helps.
MySql solution: with cte as (
select *, row_number() over (partition by employee, status order by dates ) as rw,
dates - row_number() over (partition by employee order by employee) as diff from emp_attendance
order by employee, dates
)
select employee, min(dates) as from_date, max(dates) as to_date, status from cte
group by employee, status, diff
this one is a very tough question, for what level role was this question asked 😰
very hard to think about this question and finish in 30 mins
Hi Taufiq ,Please confirm my solution is how optimal?
with cte as(
SELECT *,lead(status,1,null) over(partition by employee order by dates) as next_day,min(dates) over(partition by employee) as start_day FROM emp_attendance)
select employee,date_add(LAG(DATES,1,DATE_SUB(START_DAY,1)) OVER(PARTITION BY EMPLOYEE order by dates),1) AS FROM_DATE, dates as TO_DATE,status from cte where status!=next_day or next_day is null;
MYSQL Solution
Select employee,Min(Dates) as From_date,Max(Dates) as End_Date,Status from
(Select *,subdate(Dates,interval Row_Number() over
(Partition by Employee,Status Order by dates) Day) as Seg from
Emp_Attendance)N group by employee,Seg order by Employee, Dates;
Where we can find the dataset?
In the description box click on script link and download that script you’ll get all queries
with cte as (select *,
Date - INTERVAL '1' DAY * (row_number() over(partition by Employee, Status order by Date asc)) as rnk
from EMP_ATD)
select
Employee,
min(Date),
max(Date),
Status
from cte
group by Employee, rnk, Status
order by Employee, min(date);
I have tried to solve this in MYSQL.
with cte as (
select *,row_number() over(partition by employee order by dates) as rn,
row_number() over(partition by employee,status order by dates) as rn1
from emp_attendance
)
select employee,min(dates) as from_date,max(dates) as to_date,status
from cte
group by employee,rn-rn1,status
order by 1,2
nice
Someone can pls solve this infosys interview question,
Text1 3
Text2 5
Text3 4
Output should be
Text1
Text1
Text1
Text2
Text2
Text2
Text2
Text2
Text3
Text3
Text3
Text3
Query should be single line query.
MS SQL approach
with a as (
SELECT *, ROW_NUMBER() over(partition by employee order by dates) rn
from emp_attendance)
,b as (
select *,rn - ROW_NUMBER() over(partition by employee order by dates) rn2
from a
where status like 'PRESENT')
,c as (
select *,rn - ROW_NUMBER() over(partition by employee order by dates) rn2
from a
where status not like 'PRESENT')
select employee, status, min(dates) from_date, max(dates) to_date
from b
group by rn2, employee, status
union
select employee, status, min(dates) from_date, max(dates) to_date
from c
group by rn2, employee, status
order by 1, 3
Sir but there should be a query related to MSSQL,because there are people who are using MSSQL only.Need a Practice session on MSSQL
with cte as(
SELECT
*,
ROW_NUMBER()OVER(PARTITION BY employee, status ORDER BY dates, status) - ROW_NUMBER()OVER(PARTITION BY employee ORDER BY dates, status) as rnk1
FROM attendance
ORDER BY 1,2
)
SELECT
employee,
min(dates) as from_date,
max(dates) as to_date,
status
FROM cte
GROUP BY employee,status ,rnk1
ORDER BY 1, 2
with cte as(
select employee,dates,status,DAY(dates)-ROW_NUMBER()OVER(PARTITION BY employee order by dates)rn1
from emp_attendance
where status = 'PRESENT'),
cte2 as(
select employee,dates,status,DAY(dates)- ROW_NUMBER()over(partition by employee order by dates)rn2
from emp_attendance
where status = 'ABSENT')
select employee,MIN(dates)as FROM_DATE,MAX(dates)TO_DATE,MAX(status)as status from cte
group by employee, rn1
UNION ALL
select employee,MIN(dates),MAX(dates),MAX(status) from cte2
group by employee,rn2
ORDER BY employee,FROM_DATE,TO_DATE
Solution Given by claude 3.5 Sonnet:
WITH grouped_attendance AS (
SELECT
*,
DATE_SUB(date, INTERVAL ROW_NUMBER() OVER (PARTITION BY employee, status ORDER BY date) DAY) AS group_date
FROM employee_attendance
)
SELECT
employee,
MIN(date) AS FROM_DATE,
MAX(date) AS TO_DATE,
status
FROM grouped_attendance
GROUP BY employee, status, group_date
ORDER BY employee, FROM_DATE;
Isn't this approach more straight forward?
WITH grouped_attendance AS (
SELECT
employee,
dates,
status,
DATE_SUB(dates, INTERVAL ROW_NUMBER() OVER (PARTITION BY employee, status ORDER BY dates) DAY) AS group_date
FROM emp_attendance
)
SELECT
employee,
MIN(dates) AS from_date,
MAX(dates) AS to_date,
status
FROM grouped_attendance
GROUP BY employee, status, group_date
ORDER BY employee, from_date;
I guess you are always overcomplicating things don't know why!
WITH cte AS(
SELECT
EMPLOYEE
,DATES
,STATUS
,rownum - SUM(CASE WHEN STATUS = 'PRESENT' THEN 1 ELSE 1 END) OVER(PARTITION BY EMPLOYEE, STATUS ORDER BY DATES) AS test
from emp_attendance
--where EMPLOYEE = 'A1'
ORDER BY EMPLOYEE, DATES
),SUMMARY AS(
SELECT
EMPLOYEE
,status
,test
,MIN(DATES) AS FROM_DATE
,MAX(DATES) AS TO_DATE
FROM cte
GROUP BY EMPLOYEE ,status,test
ORDER BY FROM_DATE
)
SELECT
EMPLOYEE
,FROM_DATE
,TO_DATE
,status
FROM summary
ORDER BY EMPLOYEE ,FROM_DATE;
create table emp_attendance(employee varchar(200), Dates date, status varchar(200));
Insert into emp_attendance values
('A1','2024-01-01','PRESENT'),
('A1','2024-01-02','PRESENT'),
('A1','2024-01-03','PRESENT'),
('A1','2024-01-04','ABSENT'),
('A1','2024-01-05','PRESENT'),
('A1','2024-01-06','PRESENT'),
('A1','2024-01-07','ABSENT'),
('A1','2024-01-08','ABSENT'),
('A1','2024-01-09','ABSENT'),
('A1','2024-01-10','PRESENT'),
('A2','2024-01-06','PRESENT'),
('A2','2024-01-07','PRESENT'),
('A2','2024-01-08','ABSENT'),
('A2','2024-01-09','PRESENT'),
('A2','2024-01-10','ABSENT');
select * from emp_attendance;