Word Break - Dynamic Programming - Leetcode 139 - Python
Vložit
- čas přidán 2. 07. 2024
- 🚀 neetcode.io/ - A better way to prepare for Coding Interviews
🐦 Twitter: / neetcode1
🥷 Discord: / discord
🐮 Support the channel: / neetcode
Twitter: / neetcode1
Discord: / discord
💡 CODING SOLUTIONS: • Coding Interview Solut...
💡 DYNAMIC PROGRAMMING PLAYLIST: • House Robber - Leetco...
🌲 TREE PLAYLIST: • Invert Binary Tree - D...
💡 GRAPH PLAYLIST: • Course Schedule - Grap...
💡 BACKTRACKING PLAYLIST: • Word Search - Backtrac...
💡 LINKED LIST PLAYLIST: • Reverse Linked List - ...
Problem Link: neetcode.io/problems/word-break
0:00 - Read the problem
5:05 - Drawing Explanation
12:48 - Coding Explanation
leetcode 139
This question was identified as a google interview question from here: github.com/xizhengszhang/Leet...
#word #break #python
Disclosure: Some of the links above may be affiliate links, from which I may earn a small commission. - Věda a technologie
🚀 neetcode.io/ - A better way to prepare for Coding Interviews
In Java same algo not working
import java.util.*;
class Solution {
public boolean wordBreak(String s, List wordDict) {
boolean[] dp = new boolean[s.length()+1];
Arrays.fill(dp, false);
dp[s.length()] = true;
System.out.println(Arrays.toString(dp));
for(int i = s.length(); i >= 0; i--)
{
for( String w: wordDict)
{
if( i + w.length()
@@sunilgadhetharia2926 you need to use equals in Java to compare strings
s.substring(i, i + w.length()) == w should be s.substring(i, i + w.length()).equals(w)
if you use equal, it is like comparing the pointers of the variables, not their contents
var findRepeatedDnaSequences = function(s) {
let count={};
for(let left=0;left=2)
res.push(k)
}
return res;
};
Since when is a sliding window o(n*n)?!!!
@NeetCode can you explain the time complexity of the dp solution ?
Doing great, soon you will be the first leetcoder to have a "teamblind 75" playlist!
you mean a neetcoder 😃
Awesome Explanation. You are the only guy who explains DP solutions with ease. I'm slowly getting comfortable with DP just because of your tutorials. Thank you so much.!!!
These are the best leetcode videos, no doubt. Thank you for the consistent and clear explanations.
Amazing videos. You've changed how I understand and think through problems instead of just memorization... Thank you!!!!
i was confusion till i watched this back a few times, I finally get the idea of why you are doing it this way. THANK YOU!!!
In my opinion, even the first brute force approach will require backtracking. Assume the wordDict = ["l", "leet", "code"], s = "leetcode". So now if we start matching then the first character "l" will match with wordDict[0] and now we will start searching for remaining "eetcode" and we won't find a match which is wrong. Hence we will have to backtrack and select "leet" instead of "l" to start with.
exactly what I thought
For eetcode dp[0+1] will be false so loop will continue until it finds leet where dp[0+4] was previously set true as code was found.
Wow!! Amazing explanation. Thank you so much! I learned a lot from your videos. You're a hero!! Please keep uploading these quality videos!
Super crisp explanation. I understand your videos in just 1 go.
These days I am searching for 'Leetcode # neetcode' in YT.
Thanks dude.
same my thoughts
same!!
typing "neetcode #" is easier 😄
Great job! Please do Word Break II as well. Curious to see if we can use the same bottom up dp approach to store all possible sentences.
def wordBreak(s: str, wordDict: List[str]) -> bool:
# Convert the wordDict to a set for faster lookup
wordSet = set(wordDict)
# Create a dp array of size s + 1
dp = [False] * (len(s) + 1)
# Set the first element of the dp array to True
dp[0] = True
# Iterate through the dp array Bottom Up
for i in range(1, len(dp)):
# Iterate through the dp array again
for j in range(i):
# If the dp[j] is True and the substring from j to i is in the wordSet
if dp[j] and s[j:i] in wordSet:
# Set the dp[i] to True
dp[i] = True
# Break out of the loop
break
# Return the last element of the dp array
return dp[-1]
Amazing, no other video on youtube explains this as crisp!
Thanks!!You saved my day, i was struggling for many hours until i watched this video!
Complexity appears incorrect (for the brute force). Each time a word is encountered, we have to break into 2 subproblems - we can either take the word, or continue without it. So it's exponential (2^n)
It gets split into two substrings, but only the right side is a subproblem which continues splitting. The left substring just gets checked against the dictionary as is, and only when it is present do we take the right substring as a subproblem. So it isn't 2^N in brute force.
@@toolworks It is 2^N. There's a proof in the discuss section in Leetcode Website for this problem
@@mandy1339 It is not 2^n.
You can make all substrings in O(n^2). You can make all sub sequences in O(2^n).
This problem is not 2^n
@@jacoballen6099 You can certainly get all substrings in n^2. However, each time you will be comparing two substrings only, while the string can be formed of 3 or more words
Thank you! Awesome thinking here!
You're so smart, I learned a lot from your channel.
How do we decide whether to start the DP table from the end or from the start? Is there anything particular in this problem that made you think we have to start from dp[8] as the base case and not dp[0] for the word "neetcode"?
can start from anywhere, it doesn't matter
I came up with a recursive top-down solution, using the substrings as the keys in the dp cache. It seems to be pretty much equally efficient.
This is a great explanation - best I've seen. Thanks!
omg you are the best. my saviour. tears of joy. please always make these vids.
the best explanation, as always!
awesome and clear explanation!
Very interesting solution. Great job!
Phenomenal explanation. Thank you
your explanation clarifies all my doubts about the solution. Thanks for creating this awesome resource.
btw what tool do you use to explain with free hand sketching in different colors, is it done through any specific software? thanks again
Nice. This is similar but just explicitly codes in how we're marking valid word breaks with 'True'.
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
dp = [False] * (len(s) + 1)
dp[0] = True
for i in range(1, len(s) + 1):
for w in wordDict:
if dp[i - len(w)] and s[i - len(w) : i] == w:
dp[i] = True
return dp[-1]
Thanks this helped!
Great job!! Thanks a lot!
very clear explanation!!
Adding following line at the start will improve total time required
wordDict = set(wordDict)
As there are so many duplicate words in some test cases
It only slightly improves computation time. Doing this will change wordDict lookup from O(w) to O(1), where w is the size of our wordDict. However for this problem, 'w' is restricted between 1 and 12, so at worst wordDict lookup as a list will be O(12) = O(1).
There are no duplicate words in the wordDict. This is specified in the problem statement.
@@kockarthur7976 Odd because it says now word dict length is 1 to 1000, and that also means the initial proposition of neetcode of going over possible substrings O(n^2) which is actually O(nm) because the length bound on a dict word is 20
So the initial ends up with O(nm^2) vs. O(nwm) and m is much smaller than w
Maybe they changed the question/tests recently?
Thank You So Much NeetCode Brother...........🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻
Neetcode you are god. The way you teach is next level. Harvard should hire you
Thank you for your explanation! Could you also make a video of Leetcode 140 Word Break II ?
4:25, actually the problem description says the opposite. wordDict is longer.
1
wordsDict[i] matters in this case not worddict.length
well explained! thank you very much good sir
The brute force approach is 2^n , not n^2
thank you, i dont see how someone could do this without knowing the solution already
Brilliant. Took me two days to understand this. One thing I do not get, how do we know for certain we do not need to compare all the words in the dictionary -- we return immediately from dp[i] break;
You are doing god's work here. Thank you.
Very nicely explained
I think the runtime for brute-force without memoization would be O(2^n), not O(n^2). String search would cost another O(m).
okk I think no is taking about my apporoach. What i did was use a trie data structure which has EXTREMELY good lookup for word prefixes. Thus at the cost of some space, i was able to solve this problem in a very efficient way
Such a fantastic problem!
I can only come out with brute forced solution, and get stucked what should i cache to speed up, and you explained it in a clear way, thanks.
Interestingly, looking at the bounds of the question they may have changed.
Now the length of the string (n) is 300
No. of dict words is 1000 (w)
Length of a dict word is 20 (m)
This means the initial proposition of finding every substring O(n^2) which is actually O(nm) because you only need substring of length 1 to m
vs.
Going through every dict word at every index O(nw)
Both also need an extra factor of m to actually check/get the substring but still technically the first (whilst also using a set) is technically better
Does it matter if I solve a problem in Top-Down approach and not Bottom-Up approach in an interview if the approach is not specified by the interviewer?
great video
1 note: you can also start from the beginning, you just need to slightly adjust the logic
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
words = set(wordDict)
sizes = set([len(ele) for ele in words])
n = len(s)
dp = [False] * (n + 1)
dp[0] = True
for i in range(n):
for size in sizes:
if i + size
Great explanantion!!
Hey correct me if I am wrong but the approach that you did would be O(n^2*m) since you are slicing the string s as well, if you were to slice from 0 to len(s), wouldn't that be running in O(n) time? so the outer for loop is O(n) then inside you iterate through the word dict which is O(m) then inside the word dictionary you are slicing the string with worst case slicing be O(n) => O(n*m*n). Does it sound correct with you?
I think this is just O(n*m) because the description says that the max length of a word is 20 characters. This means that s[i:i+len(word)] is considered O(1) time.
5:30 what decision tree looks like and how we can cache it
There is no advantage in going from len(s) to 0 in this case. You can do exactly the same thing with normal iteration from 0 to len(s).
No because you set dp[i] = dp[i+lenW] so you need to calculate top down
@@jacobcutts4099You can just return dp[len(s)-1] if starting from 0.
What a god, you are my hero dude.
Would a suffix trie also be a good solution here?
Thank you so much. Your explanation was very helpful. I almost thought of giving up on this one; then saw your video.
I need this video!
what if words in dict are of different lengths? would it help to sort array by length in that case?
Great explanation
Superb! Thank you again for your videos!!
Could you share what drawing tool do you use for your videos? It looks quite nice
You sir are a saviour, well explained and neat solution. The LeetCode solution isn't as clean as this IMO.
the main thing that confused me was `dp[i] = dp[i+len(w)]`, but now i understand it: it's just:
If the word at index i to i +len(w), matches with some word w in dictionary: -->
Then the T/F of dp[i] degrades to/depends on the T/F of dp[i+len(w)]
I don't understand why we break it if dp[i] , what happens if two words match the suffix?
PS: if anyone didnt understand like me
I think once we have dp[i] as True we dont need to check for rest of the words in wordDict since its anyways True, so its sort of a little optimization to break the for loop early
Same. Trying to understand this part
@@tonyiommisg I think once we have dp[i] as True we dont need to check for rest of the words in wordDict since its anyways True, so its sort of a little optimization to break the for loop early
Updated the main comment with the answer as well
Potential reason to use a Trie:
- narrow down the list of words to iterate over
As the list of words is considerably small the optimization might be an overkill. However the case where a Trie would make perfect sense is if we were operating on the same set of words but checking different strings.
Eg: wordDict = Entire english dictionary (approx. 700K words), checking multiple strings against the same dictionary.
Thank you so much...!!
Thank you for sharing. Could you please explain more about why dp[8] is True?
This is a bit late, but the point is that if you add the length of the string, in this case "code" from index 4, it should end up on length of the string. Meaning it's +1 greater than array. So it would be dp[4] is equal to dp[4+4] which is dp[8], our base case. so then we match dp[0] with neet which gets it's value from dp[4]
An optimization would be to first check if dp[i+len(w)] is true before you go the compare. This way you might save some time comparing strings that even if you find it much, you are going to put a false in dp[i]
Thankyou so much
Great video
i think it is better taking first letter and size from the selected word of the dictionary ,then check the word in the string .. instead of checking all the strings of size n
not necessarily. You can still hash the comparison...
Very nice explanation. I'm just confused with 1 thing how come time complexity is O( n^2 m)
so ideally there are just 2 loops one iterates over input string and inner loop iterates over each dic word and does the comparison, so complexity should be O(nm) (n= len of string) (m = max length of word in dict). What am I missing here?
once u get a match with a dict word(which is O(m*n) ), u have to repeat the entire process again with the remaining part of the word. so n * (m*n).
dang that is hard, thanks!
i hate this problem
I don't think solution will work for every case. What if the word dictionary for the same neetcode would be [nee, leet, code, ode, ne, etcode]. As you can see the word can be made from ne+etcode but the algo will look "code" and then rest of the "neet" or word break of neet. I think the problem is that once it find a smaller substring "code" in the word dic it stops looking for any super set eg "etcode" that could also be found in the dictionary. Is my understanding correct.
I was thinking about the exact same thing. like how this code can avoid that
If you use that as a test case in leetcode it works. Becuase j is starting at 0 every time, eventually i will be at s.length and j will be at 0. Then j will increment until it reaches "ne" which will be true since it's found in the dict. Then the next check will see that a substring from j-i is also found in the dict, "etcode", and will set the last value in the dp array to true. Remember it doesn't truly care about what words it has found so far, only if the last value in the dp array is true or not. The other words only serve as places that the algorithm can check against. You would be right if we were removing words from the dict as we found them in the string, but we're checking the whole dict every time and the algorithm checks every combination of substrings.
There are test cases for this in leetcode already.
@@joereeve2569 The solution from the video does not pass on Leetcode, they must have fixed the test case. You state that it checks every word in the dictionary everytime, but we can see clearly in the code @neetcode provided he breaks the loop when a match is found.
When I run this algorithm with this input:
"cars"
["car","ca","rs"]
it matches "rs" at index 2, then it gets down to index 0 and matches "car", which causes the test case to fail since the "r" cannot be used twice. He is not keeping track of the index that was last matched. If I modify his algorithm to keep track of that information, it passes the above test case, but then in turn fails on this test case:
"aaaaaaa"
["aaaa","aaa"]
@@jshpro3 Hey Josh, I went and tested my algorithm again to see if it failed against the fixed test cases you mentioned and it still passed. It also returns true for both test cases you just described. It's been so long since I did this I can't remember if I followed this video exactly, but I'll paste my code (it's in kotlin) here for you to see. Let me know if I'm missing anything.
class Solution {
fun wordBreak(s: String, wordDict: List): Boolean {
val dp = Array(s.length + 1) {false}
dp[0] = true
for (i in 0..s.length) {
for (j in 0 until i) {
if (dp[j] == true && wordDict.contains(s.substring(j until i))) {
dp[i] = true
break
}
}
}
return dp[s.length]
}
}
Thanks!
Thank you so much, I really appreciate it! 😃
What's the time complexity for recursive approach with and without caching?
hey neetcode, in 4:32 you said the max size of word dictionary is smaller than max size of string. but i looks like its the opposite:
from lc:
1
i have the same question. the answer may not be a good answer
Actually he meant to say that each word in wordDict is smaller than the string given. But he rather said wordDict
Is it a good idea to further optimize this algorithm by using a hashmap of arrays to store the list of words hashed by the starting alphabet? This would reduce the search complexity
you can use a trie
Isn't the runtime of the DP solution still technically O(n * m * n), where n is length of s and m is length of wordDict
Not sure if helpful but here is the solution flipped around so its top down, this might be even more confusing:
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
dp = [False] * (len(s) + 1)
dp[0] = True
for i in range(1, len(s)+1):
for w in wordDict:
if (i - len(w)) >= 0 and s[i - len(w): i] == w:
dp[i] = dp[i - len(w)]
if dp[i]:
break
return dp[len(s)]
what do you use for writing on computer and drawing?
Binary search tree camera next!
Great explanation. I have one question, I still am not sure why
for i in range(len(s)-1, -1, -1):
for w in wordDict:
has a time complexity O(n*m*n) where n=len(s) and m=len(wordDict). Why is it not O(n*m)?
Late response here, so you may have already figured it out (and I hope someone corrects me here if I am wrong):
I believe that last n in the big O notation is because that is the time it takes to compare both strings against each other. Comparing strings is a linear time operation I believe. So when we are checking to see if one of the words in wordDict is contained within our input string s, its possible that the word in wordDict could potentially have the same length as s. So in the worst case if the word in wordDict does in fact have the same length as s, (and we are implying that n = s.length), then we are comparing two strings of length n against each other, which would equate to the last n you are seeing in the Big O notation. Because we could potentially be doing that computation/comparison n*m times.
Hope that makes sense. :)
Bruteforce can be to check for each substring .
backtrack & caching were enough to get all test case passed 😎
better approach for memoization is to start from beginning and check from a set of worddict
If we start solving the problem from the first word then there’s no need of the cache step. One directly jumps to the 5th word and then solve the remaining recursively. Why force dp on this problem?
any reason Tries can 't be used in this problem , apart from cost of building the Trie,, ,, I can see the code implementation trying to match word from last index , do we have any advantage in this approach over starting from the first index ?
This man is a god people.
You are my leetcode jesus
What is the time complexity of the final solution??
Can we use a Trie ro solve this, if we build a Trie from the word dict, the. Check and eliminate the chars from the given string where endofWord is true?
Good idea!
Hi, I think the brute force complexities calculated in the first part of the video is not correct. It is going to be exponential. I think it is O(n^n) for the first brute force approach. The reason is that after finding the match of every possible sub-sequence in the dictionary (which takes n^2), you have to find a match for the remainder of the string (which again would take O(n^2)). So overall it takes O((n^2)^(n^2)) = O(n^n). thats what i think. and another sign that O(n^2) is just wrong is that the second brute force approach should intuitively take less time but according to your calculations, it took O(m*n^2) which is more than O(n^2)
Can you check your solution for command LGR. Direction changed still no circle
uffff blew my mind away
U a God
can someone explain why we go backwards seems like quite the trend in Neetcodes DP solutions? couldnt we do this very similarly just from index 0?
Yeah, most people usually start from i=0, that def works. I go backwards because it's closer to the drawing explanation.
@@NeetCode thank you for replying absolute honour, your videos are a life saver for an aussie trying to get a grad job! did you prefer to start from i=0 when doing dp personally?
What if the word dictionary in your example had another word "nee". In this case, dp[0] would match neet and nee. How'd we go about such cases?
@Soccer Var Yep
Tysm. Next can u do a video on this ...1504. Count Submatrices With All Ones
Can you please help me understand something??. What if instead of "leet", the word in the dictionary was "eet", or "et ", and the same in case of "code ", like say "de", or, "ode"..?
How would we approach then?
Using B U approach TC still o(nmn) ?
Why dont we put the wordDict into a set and get constant lookup? The question states that len(wordDict) > len(s),
So isn’t it better to optimize for worddict lookup?
top-down solution I made after watching this vid since bottom-up felt unintuitive for me.
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
dp = [False] * (len(s)+1)
dp[0] = True
for i in range(1,len(s)+1):
for w in wordDict:
if s[i-1:i-1+len(w)] == w and dp[i-1]: dp[i-1+len(w)] = True
return dp[len(s)]
You are a prodigy :)
do you mind mention time and space complexity at the end. Thank you!