Longest Duplicate Substring | TRIE | Rolling Hash | Binary Search | Leetcode
Vložit
- čas přidán 18. 06. 2020
- This video explains a very important programming interview problem which is to find the longest duplicate substring in the given string.There are many ways to solve this problem.I have explained 3 methods to solve this problem.The first method is by using dynamic programming because this problem is a variant of LCS or longest common substring.DP fails because we can't make a large table.The second approach is to solve it using TRIE.In this approach, we keep inserting all substrings of given size and whenever during insertion in our trie, we finnd that the substring is already present then that must be a common substring.We apply binary search on length of substring to find the maximum length for which there are two duplicate substrings.The third approach is by using rolling hash, binary search and hashmap.This is making use of rabin karp algorithm.This is the fastest among the 3 methods explained.The fastest method to solve this problem is by using Ukkonen's suffix tree, which solves this problem in two steps in just O(N) time.I have explained all the methods with intuition and proper examples.I have also shown the code walk through at the end of the video.
CODE LINK is present below as usual. If you find any difficulty or have any query then do COMMENT below. PLEASE help our channel by SUBSCRIBING and LIKE our video if you found it helpful...CYA :)
=================================================================
INSTAGRAM: / surya.pratap.k
LinkedIn: / surya-pratap-kahar-47b...
=================================================================
CODE LINK: gist.github.com/SuryaPratapK/...
USEFUL LINKS:-
Longest Common Substring (LCS): • Longest common substri...
Basics of trie: • Basics of trie
TRIE Insertion & Search: • Trie insertion and search
TRIE Deletion & Search: • Trie deletion and search
Rolling Hash (Rabin Karp algo): • Rolling hash | Rabin k...
how does this channel not have more subs? some of the most detailed and simplified lc explanations here
:)
agreed
he is awesome man
Every morning I wake, I watch your video! YOu became my morning routine! Thank you so much
😂Hahahaa....Always nice to gather knowledge.
Thanks for the Series, I watched all the videos in this series!
Great explanation, thank your, "The man , the myth , the legend" learned this and use it, :)
Thanks :)
This is the Best LeetCode problem channel on CZcams by miles.
You declared me a leetcode 😅
This video has prerequisite, a lot to learn
This is brilliant, kudos to your dedication!
Thanks :)
This is too good... thank you. I hope the solution can be further improved if you calculate the hashes using two hash function in order to avoid rechecking the string when collision occurs . If the hashes of two string are same then compare the hashes which was calculated using other hash function. So the probability will be (1/m1 * 1/m2). I will try to solve and post the solution.
Nice :)
Hi Folks,
Is there any reason behind to choose 10000007 number?
Before posting this question I did a little bit of Googling and what I found about 10000007 is that 10000007 is used to avoid overflow and it's a first 8 digits prime number. And as mentioned in the video as well that he is taking the prime number as 10000007. But actually, 10000007 is not a prime number, 10000007 is divisible by 941.
So what is the significance behind using 10000007 number? There are prime number bigger than 10000007 for example(10000019), why we are not using those numbers?
I'm asking this question just out of curiosity, there is nothing wrong in your algorithm. Your code is working absolutely fine.
Thanks in advance.
@@MinhNguyen-fs6qh Can you please post the link here.
The number to take is 1e9+7 (1000000007) not 1e8+7 (what you have mentioned). The number 1e9+7 is a really common modulo and is prime too Also since approx 10^9 (2^32) is limit for long int (or long in c++).
Best explanation so far for LDS problem..
The man , the myth , the legend!
What is this dialogue 😂
Scott Sterling!!
thank you so much for detailed explanation!! :))))
Very clean and direct explanation, thanks a lot!
Welcome :)
Wow. This is a great explanation. Thanks. Keep doing the good work.
Welcome :)
Thanks for posting. I would like to specially appreciate your effort to explain the DP approach and the down side of it for this problem ( given the constraints). Infact, I haven't yet gone thru full video.
Go through man. It took me 6 hours to create this after long day at office. I din't sleep the entire night working on this :(
@@techdose4u I did go thru, thanks for wonderful explanation.
Amazing, i wait for your video, No one explain better that you❤
Thanks :)
Excellent video quality, detailed explanations and dedication!
Thanks :)
Sir I have no words for how much I appreciate this
Thanks agile :D
Thank you for your great video! What tools are you using to draw your explanation?
This video is much better than the video that you made on rabin karp ...So thankful to you
😅
Well explained
Feels like i ve arrived at right place to master DS and Algo
😅
Nice explanation
GOAT 🙌
Hats off to your dedication and explanation
What's a GOAT 😅
@@techdose4u Greatest of all time :)
Its 4.30 am and I don't care .... I'm going to watch it.
😂 That's the spirit :)
It's 5:12am but even I'm gonna watch it !!
Night Apes strong together xD
It's 8:04 and i don't care ,I'm gonna Watch it and Memorize Everything😎😎
😁
😂
good work .......your approach always helps to all the beginners.....
Nice :)
The Extraordinary
Thank you very much for your video❤
Welcome :)
Its 5:25 and i am gonna start my day with this video . : )
Nice attitude 😂
who all here from today's leetcode challenge XD
😂
You deserve many more likes and subscribers!
Thanks :)
puri video ek tarf last ka subscribe krne wala part ek taraf!
😂
Sir you are an inspiration!
Thanks :)
What software do you use for your sketches?
Loved it!
Thanks :)
Your explanation is really awesome...👏
Thanks :)
Can anyone please explain why dont we consider string of length 4 while doing binary search?
Awesome video. Can you also tell how you came up with the solution, what ticked you in right direction?
Your videos are awesome
Thanks :)
Your dedication👍👍👍
😅
Nice explained!!! Keep it up!!
Thanks :)
Thanks for the explanation. Unfortunately, your rolling hash solution gives TLE in one test case. I am unable to find a proper cpp solution for this problem to get ac :(
amazing concept!
:)
You deserve million subscribers
Thanks bro :)
@@techdose4u & soon it will be hit 1 Million
Sir there is an space optimization approach in lcs can we solve with that approach
In which we take only lcs[2][n+1]
I was also trying this way,but seriously messed up my code and left it😅
Btw,Did u try solving this way?
This approach is O(n) time complexity
Will give tle
Sir your explanations are awesome. Kindly put videos for leetcode biweekly contest and weekly contest too so that we can understand the problems which we were unable to solve during the contest.
I don't get time bro...otherwise I would. Maybe sometime in future I will try :)
@@techdose4u Okie bro. May be you can solve only the hard problem and post if you have time.
I was just talking about the hard one 😅 First 3 are generally simple. Hard ones takes a lot of effort to prepare and make video. 4th question preparation will take more time than first 3 combined 😅
Dhanyawad Dada
Thanks ☺️
Great vedio. I have a doubt why are you adding the prime number p in line no 18 in the code shown in vedio 28:34??
Great explanation, but I have a little bit confused that why do we need + p here?
hash = ((hash-roll[size-1]*(s[j-size]-'a'))%p + p)%p;
Same Confusion for me also bro
whenever we are subtracting two numbers and take module, the chances are result can be negative. negative hash value will give wrong answer. So to get assured +ve answer we add modulo to it and then take modulo.
Excellent explanation.
Thanks :)
You can get rid of the logn of the map by using a hash map, and handle the collison by using pairs of integers with different mods and bases
nice video, but if we can add some details like why the ml = 3 it's important to understand what is the logic behind it.
Finally aa gyi video!Hats off to u sir
You still up 😂
@@techdose4u yup,aapse motivate hua hu😅
😂 Ye kaisa motivation hai bhala!!
@@techdose4u ur commitment towards work serves as motivation for us😅
Nice :)
Mass 🔥🔥🔥
😅
Thank you so much sir
Welcome :)
Why is the Time Complexity of the Rabin-Karp algorithm O(log(N) * N * log(N)), specifically why is the last factor log(N)? We're inserting into a hash with O(1) speed and need to check collisions with O(K) speed.
Most sites say that the time complexity of Rabin-Karp is O(N + K) on average and O(NK) at worst. Shouldn't this mean that the total time complexity of binary search for substring length and Rabin-Karp combined is O(log(N) * (N + K)) on average and O(log(N) * N * K) at worst?
sir in the trie approach if on the very first iteration we didnt got any string which is already present then where do we move. beacuse we dont know what shuld be length of string. so sir can you please give me the code for trie+bs approach. pls sir.
I tried trie + bs approach, it's giving MLE. :(
Let me know if u want the source code.
I have tried submitting your solution given on github on leetcode but it is giving TLE... could you please check?
Excellent explanation sir. Thanks a lot. But sir actually when I'm trying to implement the trie-based approach in LeetCode, it's showing memory limit exceeded for some larger inputs. Please help me with how to optimize the trie method. Thanks a lot, sir.
Avoid memory leaks
This approach was exactly what I thought yesterday, but I was not able to convert it into code...
Must have got stuck in hash 😅
Can you please upload the video a bit early. BTW, the man, the myth, the legend!. Kudos to your consistency and dedication.
Office days are really hectic. On top of that if we get a hard problem to implement then it takes time. Yesterday I started at 10 PM and ended at 4:30 PM. Pain of making this video was tremendous. But I don't wanna rush. Quality of explanation is more important for me. Moreover, you can watch the video the next day morning as well.
@@techdose4u Yeah, I get it. Such quality content is not available anywhere on the internet. Love your work and upload the video whenever you're comfortable. :)
Can we use same type of rolling hash which is used in Rabin-Karp algorithm?
Yes
Why is time complexity for rolling hash approach O(logn * n * logn)? I'm not sure how the second logn comes. You said in the video it's for map insertion and search. But isn't it O(n) in the worst case that we have to search through all start index in the list? Thanks
But that will be well distributed because of your hash function. We took a strong hash. It is extremely rare to collide when strings are different. If you wanna include that then exclude Map key search because logN is worst case. You can take LogN on avg for both collision and map key search. Sounds better?
@@techdose4u Still confused about second log n term. Do you have a video explaining this?
Thanks for the clear explanation. Can't we use java String classes hashcode method and store it in map instead of using custom hashcode logic ?
No idea on java :)
@@techdose4u Thanks for responding :) . After thinking a bit I realized the java String hascode function will scan all characters so the rolling hash has better time complexity.
@@harigovind11 great ❤️
Gajab
😅
if we take % on each term as (1*26^2)%p + (2*26^1)%p + (3*26^0)%p , then also it will overflow?
No. But try to add two values and take mod again. Don't go on adding all values and then take mod. That will not give correct result.
@@techdose4u okay. Thank you!
instead of rolling hash can't we just slide the window and save in hash (unordered_map) and when it matches return the substring? code will be very short with same time complexity
This is very costly operation. We assumed logN for key search because key was INT. If you took key as string then if map is implemented using string as it is then time will go to NlogN just in map. Instead of rolling, you can simply take a roll varialble and subtract last hash weightage and add new hash. This is rolling hash by using 1 variable. But, debugging is easier taking utility array roll. That's why I took it.
@@techdose4u Thanks for the explanation. I really appreciate your work.
Can Alpha be an value instead of 10 and 26?
can we use sliding window approach here??
Rolling hash is sliding window technique.
i have doubt that last method you told we multiply for "ban => ( (26^2)+b + (26^1)+a + (26^0)+n )"
but here you added in hash var above loop calculate hash => "ban => (26^0)+b + (26^1)+a + (26^2)+n )"
why this is so??????????????????????
at 18:52 as your first hash value for "ban" is 1352 but in your code "ban" hash value is 689 why this happen?
We are sliding incrementing the the weightage. 1st time you have only B with weightage 2. Next time when you include A then B's weightage will increase by 1 digit. So hash will be multiplied by 26 (base value) and then add new char weightage 1 for A. Now once you calculate the entire window then you apply sliding window.
@@techdose4u This step is little confusing..It will be great if you could take an example and walk thru the code with the example
same question. did you get the ans?
did not understand this your first hash value for "ban" is 1352 but in your code "ban" hash value is 689 why this happen @@techdose4u
I used a set and applied binary search just like your second approach using a Trie, but I get TLE. Any ideas why this happens?
I also early stop my my inner for loop, when there is a first match.
Here is my code:
class Solution:
def longestDupSubstring(self, S: str) -> str:
if not S or len(S) == 1:
return ''
d = set()
left, right = 0, len(S)-1
result = ''
while left
I hit the MLE with a similar approach
I tried by trie data structure but stuck ..
I've a doubt that if i check on mid = 4
and I didn't get any Duplicate Substring of size = 4 then left = mid - 1 ???
No. Right = mid-1.
@@techdose4u yeh sorry right = mid - 1....but it's failed to some test cases :(
Trie with Binary Search approach in C++ seems to fail TC 3...
I'm a little confusing... Why would we take the middle length instead of going through the whole String and build a very big Trie? Does it mean it's impossible to have a duplicate substring which length is longer than half the string?
I am doing binary search on length. 1st we will try by half length. If we find duplicate substring then we will increase our length like we do in binary search remember! So, there will be no problem. Did you understand?
@@techdose4u Thanks!!! This problem is way too hard...
I got asked this question as Uber interview today. Sad that I could not crack it 😞😞 This seems too difficult to crack during 1 hr of interview
Don't worry. Keep learning for upcoming interviews 👍🏼
plz make a video on median of two sorted arrays
👍
Bro,A small request can u please explain mockvita questions in cpp
Bro I am not getting time for it. Daily I am uploading something and I also go to office for 9 hrs.
Hello, thanks for the video.
I have a question: I used Binary search for the length, then sliding window + a map where the key is a substring and the value is a list containing the beginning positions of this substring.This gave me TLE even though it has the same complexity as the algorithm with the Hash (I am jut using a string as Key instead of a Hash...)
Can you help please? thanks :)
Here's the code (Python):
class Solution(object):
def count(self, window, S):
m1 = defaultdict(list)
left = 0
right = window
sub = S[left:right]
m1[sub].append(left)
while (rightlen(maxStr)):
answer = ans[0]
maxStr = ans[1]
left = mid+1
return maxStr if answer!=-1 else ""
I have the same doubt . Do you know why its exceeding the time limit .
For comparison of keys in the map when we use a substring, the time taken is about O(n) . Whereas for comparing hash keys the time is constant.
holy crap
Take dose..!! Haha
Not able to pronounce is correctly even though I know the correct pronunciation 😂 I hope trollers will spare me 😂😂
You’re very kind my friend. You’re videos make hard problems easy. No more trolling. People love to ‘Take dose of Tech dose’.
Sir your actual code exceeds the time limit. What might be the issue??
2 reasons:-
1. Maybe new test cases require more optimization
OR
2. New test cases might be covering extra boundary cases so that the code falls in infinite loop.
Please take the latest code from leetcode discussions and try to compare or try with that. That will be helpful.
@@techdose4u ok sure. I will go through the discussions.
slide window me hash kaise nikal rhe....nhi smjha :( koi smjhayega kya
bhaiya suffix trie valla bhi smjha do...us trick se bhi bht questions solve hotte hai..
pura youtube me kisi ne kraya bhi nhi hai suffix trie
Bro I din't get time. Already yesterday I din't sleep. Ukkonen suffix tree will be explained later when I finish covering all the basics.
@@techdose4u koi nhi bhaiya .... Aapka content vaise bhi bht tagda hota hai....
LCS can be done in 0(n) space
:)
5:08 can we do it on our local machine?
Will it work?? 😅
No. First thing is, you want to know how much memory your OS allocates for your compiler. You can't go beyond that even though you have more memory. If somehow you manage then you need more than 40GB ram. If you take care of these steps then it might take very long to run 10^10 computations. So, it's not feasible.
@@techdose4u 40GB RAM is my dream 😂😂
@@techdose4u Understood 🙏🏼
So I need a, supercomputer to run this algorithm 😂
((hash-roll[size-1]*(s[j-size]-'a'))%p + p)%p;
Can someone explain why we are adding +p in the above code.
Could anyone tell why use number 10000007 ?
Use 1e9+7. I took it unknowingly that 1e7+7 is not a prime. It is divisible by 2 numbers. But it won't create any problem. Read Universal Hashing.
@@techdose4u Thanks for reply!
Could you provide your codeforces Id
No Id on CP platform as I never did CP.
Just for the record it is actually way more difficult to walk on a cake. Just saying. xD
How do you get so smart?
Just keep coding daily. Those who practice more gets better. See William lin and Errichto. I am nowhere compared to them but I am trying to reach there someday. So, I code daily to take a step forward.
please provide code for trie solution.
Please that in discussions section. It's present there.
@@techdose4u it's by using rolling hash concept.
leave a cm before watching!
:)
Can u help me with this problem by making video or sending code,
Lazy Student
Problem Description
There is a test of Algorithms. Teacher provides a question bank consisting of N questions and guarantees all the questions in the test will be from this question bank. Due to lack of time and his laziness, Codu could only practice M questions. There are T questions in a question paper selected randomly. Passing criteria is solving at least 1 of the T problems. Codu can't solve the question he didn't practice. What is the probability that Codu will pass the test?
Constraints
0 < T
Understanding rolling hash algorithm in this question is very difficult 😕
Bhai why we are taking window of size 3, there can be duplicate substring of size 4 or more as well.Please clarify
I already showed in the video that low=1 high=6 and mid=3. So we will search for 3. If found then we will update low and mid and search for higher values. Please rewatch.
@@techdose4u Ok thanks bro great tutorial
still tle 60/66 passed in python.
why not store the string in map rather than hash of string
storing string for each window takes more time, for each window we need time proportional to the length of string. By rolling hash this step is done in O(1) time.
Why are you searching smaller substrings when you have found one match because any search after that would have less length than the last successful search???
You totally skipped the time/space inefficient brute force:
seen = set()
longest = ""
for i in range(len(S)):
for j in range(i+1,len(S)+1):
w = S[i:j]
if w in seen:
longest = w if len(w) > len(longest) else longest
else:
seen.add(w)
return longest
The man , the myth , the legend!
Is it a famous dialogue I have never heard 😂
When someone does some kind of extraordinary work then we use this phrase :)
😅 Got it.