Essentials: Brian Kernighan on Associative Arrays - Computerphile
- Added 10 Aug 2017
- The 'Swiss Army Knife' of data structures, Professor Brian Kernighan talks about the associative array with beer & pizza.
EXTRA BITS: • EXTRA BITS: Essentials...
"Code" Books: • "Code" Books (Prof Bri...
Many thanks to Microsoft Research UK for their support with the 'Essentials' mini-series.
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
Pizza: 10 POUNDS!
Beer: 20 POUNDS!
Coffee: 2 POUNDS!
Beer: 20 POUNDS!
You go Kernighan, that's the spirit!
I see a Computerphile video featuring Brian Kernighan, I must drop everything and watch and "thumb-up". I'm a simple guy.
tg
I love Brian's voice, and how gentle and methodical he is when explaining things
I definitely fell in love with associative arrays in my Data Structures class in college. Between these and linked lists you can build just about everything.
@auroraborealis5565 That’s an imported library.
Built under the hood with associative arrays and linked lists.
It makes me so happy to get some more lectures from my favorite prof even all this time after graduating. Not many people can be this entertaining and this informative at the same time!
This guy's shopping list:
Beer
Pizza
Coffee
Beer
£134 worth of coffee at that, hooooly
Classic Kernighan examples :D
Eh. Sounds like your average programmer.
you forgot beer
+noredine Sorry I forgot, I'm blaming this one on the beer.
Larry Wall: Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi.
A legend that truly understands 'the programmer'
Map is also a common name for this data structure
Sebastian Schrader he did mention that associative arrays can be referred to as [Hash]maps.
I just rewatched it and didn't hear him say it. He mentioned only hash table, hash and dictionary.
Sebastian Schrader my bad, I must've misheard!
Although for C++, it's important to remember that map is usually some form of binary search tree and unordered_map is a hash map.
object. B)
Very interesting. I'd never studied how these structures were stored internally, and now I finally understand why data stored in a hash is stored in a somewhat random looking order.
One thing in common between most if not all of these videos is that it is such a delight to listen to these experts talking about things in their respective areas.
40th
I love this legendary man...
truly legendary, too bad I'm never gonna meet him in person...
I'd have loved to have him as a professor! Very clear explanation :)
This was a really great video! The way I get it, the value of a hash table is that it's flexible and, as Professor Kernighan noted, has almost constant access time. You can use any type of data as the indexing element, thanks to the hashing function, and you almost always go through the same number of steps to access any data in the array, which is very different from, for example, a search function. And it's probably easier to read and understand in code. The only downside I see is that a hash table can be inefficient in terms of how much memory is used.
It is the classic "cpu time" versus "memory used" trade-off in computer science.
Access time in terms of caching seems inefficient as well
Hash maps are one of many ways to implement the associative array abstract data type. Some of the best-known alternatives are tree maps, implemented using self-balancing or unbalanced binary search trees, and association lists, implemented using linked lists.
Awesome idea to bring this "Essentials" series, specially for us who have seen all this some time ago at University.
When he was talking about pounds, I initially wasn't sure if he meant weight or currency, so I was thinking "he buys 20lb of beer and pizza?!; programmer for life"
Maybe that's my problem, I don't like beer.
Too bad this series came out too late to interview Dennis Ritchie. RIP.
Ken Thompson is still with us...
@treyquattro Ken doesn't like interviews.
I'm using hash tables all the time in my code. In C# they are called Dictionary. Very useful collection type indeed.
he's a young Dumbledore of programming wizardry
" Maybe beer collides with pizza. I mean they go well together! "
We were doing this type of algorithm back in the early '80s to manage memory allocation for paging systems.
Let's make a hash table JESSY!! -Misteeeer Kernighan, this is the purest blue linted code i've seen!
Loved this!
Can Kernighan please explain the Lin-Kernighan heuristic?
An episode about character sets and encoding algorithms would be interesting.
Great video!
7:35 the marker pen makes its sound even when not being used :-o
How the heck did you catch that!? Please teach me how to sorcerer.
Are you claiming that it's a magic marker?
Please also make a video on Open Addressing, which is another way to implement associative arrays.
Thank you!
When BK looks into the camera I feel as if he's speaking directly to me.
As if I'm Neo from The Matrix.
The foundation of many efficient algorithms :)
In Perl it's actually '%', not '#'. '#' is for comments instead.
But yeah perl has hash tables as a basic data type. That always seemed very weird to me, but now I get it. Up until now, I simply could not understand how something seemingly so elaborate could be said to be efficient or quick. I get it now.
I have used PERL hashes before but I don't think I really grasped the inner workings of them until watching this 10 minute video.
Some "administrative" programming languages have "temporary database tables". They are not committed to disk, they are private, they do not bother much about the overhead of behaving like a database table. But they do such a job just fine (or better) and you do not have to invent a hash function or copy data when things get crowded.
typograf62 These days all languages that have sqlite bindings automatically get “temporary database tables”. In .net you also get DataSet.
I work with these every day. Very common in the medical industry.
As a programmer myself, I figured I might not learn much, but I didn’t realize hash tables utilized linked lists under the hood.
Key-value pairs, oft derided by comp sci and database guys, are a natural way to handle data.
What assoc. array library should I use for C? If I don't want to implement it each time, what do you suggest?
I wish I had a tenth of his knowledge.
I came across hashes in PERL and thought wow as they are so logical but I never thought about how they worked under the hood.
It's one step further when your associative array can have different types of key. At that point you can model OOP at some level. :)
Not that it's the most efficient way to do it. But it's a fun diversion.
Beer, Pizza, Coffee, and Chips... A programmer's grocery list for sure!
These are so essential that in Lua hash tables (called tables in the language) are the only data structuring mechanism, i.e. there are no lists, sets etc., only hash tables.
Are tuples implemented in the same way by programming languages that have them?
The master has spoken: associative array it is.
You guys should do a video on, Network on Chip! :P
We only know that the value for pizza is in some location because the hash of pizza gives the "address" (not sure if it's literally the address), right? So if there is a collision with another value and we expand the linked list how exactly would we differentiate between the two values?
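One way to picture the answer to the collision question above: each bucket stores the key alongside the value, so a lookup hashes to the bucket and then compares the stored keys one by one. The following is only a minimal sketch, not how any particular language implements it. The class name `ChainedTable`, the character-sum hash and the bucket count are assumptions made for illustration, and Python lists stand in for the linked lists described in the video.

```python
class ChainedTable:
    """Toy hash table using separate chaining (illustrative only)."""

    def __init__(self, n_buckets=8):
        # Each bucket holds a chain of (key, value) pairs.
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # Toy hash for string keys: sum of character codes,
        # reduced to a bucket number.
        return sum(ord(c) for c in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key (possibly a collision)

    def get(self, key, default=None):
        # Walk the chain and compare the stored keys themselves.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


table = ChainedTable(n_buckets=1)     # one bucket forces every key to collide
table.put("pizza", 10)
table.put("beer", 20)
```

With a single bucket, "pizza" and "beer" land in the same chain, yet `table.get("pizza")` and `table.get("beer")` still return their own values because the key comparison tells them apart.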
For anyone just getting into the Java world: if you are going to use a Hashtable somewhere, it's probably better to use a HashMap instead. More details can be found on Google/Stack Overflow.
So, in real-world applications for a layman like me, would this kind of hash be behind such functions as, "users also bought" and "suggested for you"? Or is it more useful for 'categorising' items like "baking utensils" which can have *multiple* other categories like "cutlery", "bowl", "glassware", and more.
Coffee is essential. I like this guy :D
02:15 I love the £0 spent on juice! *lol*
In Java, they're called HashMap. In Javascript, plain, anonymous objects are used for this purpose. (Also, fun fact: in Ruby, the operator that associates a key with a value, =>, is called a "hash rocket".)
IceMetalPunk in JavaScript, there's been Map and WeakMap for a couple years.
Thanks. While debugging a Java Hashtable I saw the effect of collisions, but I didn't recognize it for what it was; I believed it was a strange Eclipse bug!
That small hesitation before 'Javascript "programmer"' makes me giggle.
Are you trying to tell me that HTML is not a programming language?
Hmmm!
Shared the same sentiment, until I started to program in React + Redux. It's as sophisticated as anything else really :)
People who use things like C++ and such hate to call people who use "scripting languages" like JavaScript actual programmers.
Yeah... I was on a group project in college that managed to, in one semester, add a whole 7 lines to node.js
that was a mistake... Javascript is hellish, and I feel sorry for the people that have to look at it for their jobs.
the only thing wrong with javascript is the few remnants of java in it. :P
how do you loop through an associative array?........like in a traditional array, you can just start a for loop as (i=0;i
You use an iterator, as you can't index memory sequentially like with arrays.
Something like
iter = map.keys(); // or values directly
while((elem = iter.next()) != null) {...}
The details differ slightly between languages, but this is in general the way to do it.
Blueluelueluelue It depends on how you implement the hash function. Usually the hash function takes a key and produces a number that corresponds to that key. So what you do is make a normal array of n elements, where insertion happens at the index that corresponds to the key. That means the developer of the table can walk through the whole array just as you described, but the user can't.
Blueluelueluelue they're typically linked lists i believe. or you can also just use an iterator
The correct way is to use a foreach loop if your language supports it. It should automatically get the iterator for you and iterate through each element in the array.
foreach() is the easiest, IMO, way to loop through associative array. And by using associative arrays you don't have to loop through it to find the one you are looking for. For example if you need to find price of coffee, you just use that associative index. echo $data_array['coffee'];
php example follows:
foreach($data_array as $key => $data) {
// your code here
}
Inside that foreach loop, there are two variables, $key and $data, $key is the current array index and $data is anything that current index of $data_array holds. It can be anything that variable can be, another array perhaps :D
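For comparison, the same two operations (direct lookup by key, and a loop over every key/value pair) might look like this in Python. This is just a sketch; the variable names are made up for illustration and the prices are the ones quoted from the video near the top of the thread.

```python
prices = {"pizza": 10, "beer": 20, "coffee": 2}

# Direct lookup by key; no loop needed.
coffee_price = prices["coffee"]

# Loop over every key/value pair, like the PHP foreach above.
lines = []
for item, price in prices.items():
    lines.append(f"{item}: {price}")
```

Since Python 3.7 a dict iterates in insertion order, but that is a property of Python's dict, not of hash tables in general.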
This is THE Brian Kernighan. 27 dislikes?! Are those people nuts?!
If you care about performance you should consider not using collision lists, but keep the array flat (each element contains the actual (key,value) pair instead of a pointer to a list) and use linear probing. It's usually faster. You only need to be careful where to insert new elements and how to remove elements.
You can then even separate the (key,value) array in two arrays, one for the keys and one for the values which is especially useful if you're iterating a lot and you're mostly interested in the keys for example.
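A rough sketch of the flat layout described above, with keys and values in two parallel arrays and linear probing on collisions. Everything here is an assumption for illustration (the name `ProbingTable`, the fixed capacity, the use of Python's built-in `hash`); it never grows, does not support deletion, and cannot store `None` as a key.

```python
class ProbingTable:
    """Toy open-addressing table with linear probing (fixed size, no deletion)."""

    def __init__(self, capacity=8):
        self.keys = [None] * capacity     # flat key array
        self.values = [None] * capacity   # parallel value array

    def _slot(self, key):
        # Start at the hashed position and step forward until we find
        # the key itself or an empty slot.
        n = len(self.keys)
        i = hash(key) % n
        for _ in range(n):
            if self.keys[i] is None or self.keys[i] == key:
                return i
            i = (i + 1) % n
        return None  # every slot holds some other key

    def put(self, key, value):
        i = self._slot(key)
        if i is None:
            raise RuntimeError("table is full")
        self.keys[i], self.values[i] = key, value

    def get(self, key, default=None):
        i = self._slot(key)
        if i is None or self.keys[i] is None:
            return default
        return self.values[i]
```

Removal is the part that needs the care mentioned above: simply clearing a slot would break the probe sequence for keys stored after it, which is why this sketch leaves deletion out.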
(Or even better, just use the builtin)
Associative arrays are especially useful when trying to conserve time and space.
Otherwise, you'd be enumerating local variables quite a bit
Oh my.
I think you should write specific hashing functions for specific applications, like making a hash out of a string by adding up only the positions of the letters in the alphabet instead of their Unicode IDs.
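The alphabet-position idea could be sketched like this. The function name `alphabet_hash` and the `n_buckets` parameter are made up for illustration, and non-letters are simply skipped, which is an assumption the comment does not spell out.

```python
def alphabet_hash(word, n_buckets):
    """Sum of 1-based alphabet positions of the letters, modulo n_buckets."""
    total = sum(
        ord(c) - ord("a") + 1          # 'a' -> 1, 'b' -> 2, ..., 'z' -> 26
        for c in word.lower()
        if "a" <= c <= "z"             # ignore anything that is not a-z
    )
    return total % n_buckets
```

One caveat with any plain sum of letters: anagrams such as "beer" and "bree" always land in the same bucket, so such a function is only as good as the keys it is fed.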
Why don't you split associative arrays into an associative key array and a data array, so that the key array can be reused with other data arrays? It's like a struct in C(++): the key used to access a specific member is "inlined" into the code by the compiler and is not stored within the struct.
The Perl sigil for hash tables is %, not #.
I first learned about associative arrays when I learned Tcl and I thought, "that's magic!"
Why is the symbol for "pound" that (strange to Americans) upside-down 7 with a line through it?
For some reason I always thought associative arrays would be complicated to implement.
The complexity is in making them efficient for the maximal numbers of use cases. An associative array that only expected strings as keys can be optimized better than one that has to handle many disparate kinds of keys.
The problem with them is choosing the number of buckets. Choose too many and you have wasted space. Choose too few and you have long lookup times. Then adjusting the number of buckets, as Brian talked about, takes a fair bit of work, so it's not something you want to do often.
At their simplest, they are simple. But then there's the implementation choices and optimisations about the hash function, numbers of buckets, re-allocation strategies etc., and they suddenly become complicated.
The most complicated of them minimize overhead either in the space-complexity sense, or the time-complexity sense. The simple implementations fall right in the middle.
Topic suggestion: persistent data structures.
Brian could describe his breakfast for 2 hours and it would still be interesting
How many interviewees learn the crews' names? Cool guy.
for some reason hearing that marker really kills me inside
Why use a linked list to deal with collisions? Why not use a second-level hashtable with a different hash function? The chances that two items will collide in two hash functions is vanishingly small.
This video is more about Hash tables than associative arrays, and even then it only looks at one way of doing collision resolution.
5:55 Well, in that case you might need to *undrink* some beer, diplomatically speaking. That was a very common occurrence during my university years.
how did he get my shopping list
in bash you can create an associative array with:
declare -A array
array[pizza]=20
Weirdly I refer to them as hashmaps or just maps when talking about them in general, even though my two main languages calls them dictionaries (Python) and objects (JavaScript)...
Wonder where I got that from, maybe back in programming class... Are they called hashmaps in C++ maybe?
The advantage of calling them "Dictionaries" or "Arrays" is that you abstract the problem away. After all, whether a Collection uses a fixed array or a hash table should be entirely an implementation detail, usually dependent on the number of elements in the collection, and whether uniqueness is required. The programmer typically shouldn't care about the implementation detail, only the boilerplate description, and big O characteristics.
Whats the algorithm that decides when to increase N?
It can vary. The associative array would keep track internally of both how many table "slots" are used, and also how long the longest collision list is for any one hash slot. When some cost function (which combines the two in some way) reaches some cutoff value, a growth process occurs. Bear in mind that growing these tables is expensive though, as each table entry must be rehashed. So the cost/benefit between growing and not growing (but having longer search lookups) can be tricky to get right. (If you grow too often, you waste cpu growing unnecessarily. If you don't grow enough, you waste cpu on table lookups due to more collisions).
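As a concrete illustration of the trade-off, here is a sketch that uses a simpler policy than the combined cost function described above: it only tracks the load factor (entries divided by buckets) and doubles the bucket count once that passes a threshold. The class name `GrowingTable`, the 0.75 threshold and the doubling strategy are all assumptions picked for the example.

```python
class GrowingTable:
    """Toy chained table that doubles its bucket count when it gets crowded."""

    MAX_LOAD = 0.75  # assumed threshold: entries per bucket before growing

    def __init__(self, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]
        self.count = 0

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:               # existing key: overwrite, no growth
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.MAX_LOAD:
            self._grow()

    def _grow(self):
        # The expensive step: every stored entry is rehashed into a
        # table with twice as many buckets.
        old_entries = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k, v in old_entries:
            self._bucket(k).append((k, v))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

Doubling keeps growth events rare: each one rehashes everything, but the number of inserts between two growth events also doubles each time.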
"Buying beer and, pizza, and coffee, and chips" - yeah, 100% programmer confirmed, lol.
Did Brian Kernighan just make an off-by-one array length error??? So... Much... Irony...
Why aren't there just two arrays: one with the keys (so on an access you loop through it until you find the index where the key is) and another with the values (which you would access by that same index)?
The "mission" critical issue which Brian didn't really get around to is reducing the lookup for any one element. You don't want your algorithm to have to traverse the entire structure in order to find what 'could' be the 'last' element in a very very long list. Too inefficient. So the modified hash table is superior to an array or standard linked list or doubly linked list.
Yeah, your two array solution is O(n) to access an element, a hashmap is O(1)
The search cost for table lookups with that approach is very expensive. For string keys (as in this video), you end up comparing the strings for equality to find the matching key in the table. With a table of size K, you can expect to have to check K/2 keys on average to find a match. With hashing you still have to scan the lookup key string once to produce a hash value, but you then only have to search a much smaller subset of keys (the collisions) for a match. Much, much faster. However, as with all things, there is a worst-case scenario, the one where ALL keys collide in one hash slot, which then requires comparing the search key against the same K/2 key strings (as above). But this is very unlikely to occur in practice.
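To make the cost of the two-array approach visible, here is a small sketch that counts key comparisons during a linear scan. The function name `parallel_lookup` is invented for the example, and the prices are the ones quoted from the video.

```python
def parallel_lookup(keys, values, wanted):
    """Linear scan over a key array; returns (value, comparisons made)."""
    for comparisons, k in enumerate(keys, start=1):
        if k == wanted:
            return values[comparisons - 1], comparisons
    return None, len(keys)   # not found: every key was compared


keys = ["pizza", "beer", "coffee"]
values = [10, 20, 2]

first = parallel_lookup(keys, values, "pizza")    # found on the 1st comparison
last = parallel_lookup(keys, values, "coffee")    # found on the 3rd comparison

# The hashed equivalent jumps (almost) straight to the entry.
prices = dict(zip(keys, values))
```

The number of comparisons grows with the position of the key in the array, which is where the K/2 average comes from.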
Is Dr. Kernighan in Nottingham or something?
Good
Well... C# has both Dictionary and Hashtable
I'm confused now
they essentially serve the same purpose, but have some internal differences
In essence, an array without index numbers?
In essence, an array with index numbers converted from actual keys.
How about doing some YouTube magic and making "Essentials" an actual YouTube series, like Tom Scott did with the fizzbuzz video recently +Computerphile? Anyways, nice miniseries.
3:25 *THE CAP*
0:58 he almost said Perl. He did. What happened to Perl is damn tragedy.
What happened to Perl?
Could you please add English subtitles??? It's very hard for non-native English speakers like me to understand everything you say. I've seen other videos from this channel support this feature or at least allow Google auto-captioning.
It's probably queued up to be auto-captioned by Google. Likely it depends on the number of views a video has before it gets put into the queue.
I thought hash collisions were exceptionally rare... do they really come up that much in associative arrays?
It depends on your hash function. Collisions need to be rare for cryptographic hash functions, but hash functions for hash tables only really need to be balanced: occasional collisions are okay as long as your hash values are spread out over the entire table.
2:15, why is juice free?
In Python they're called dictionaries; the name explains itself.
I feel uncomfortable at the sound of the marker grinding on paper :(
just noticed that Dr. Kernighan is a lefty -- why am i not surprised haha
I've only ever heard "associative array" used in PHP, a language I try to stay as far away as possible.
web devs (cringes)
Avoiding PHP is to your credit. It was the '90s version of node.js ;-)
The first one I saw was written in IBM-360 ASM. They are very useful when making compilers and interpreters. Programmers are notorious for using variable names that are similar.
The only way for humans to express meaning is language, which uses words as its building blocks. So instead of a meaningless number, address in memory or numeric position in a list, you use meaningful words instead in associative arrays ... very easy to use ... enhancing readability of code greatly ... BUT ... it is hard to implement in any programming language in terms of compilers ...
3:47 X["beer"] += 15
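The `+=` idiom from that timestamp can be sketched in Python with a dictionary whose missing keys start at zero. The purchase list below reuses the amounts quoted from the video near the top of the thread; the variable names are only for illustration.

```python
from collections import defaultdict

x = defaultdict(int)   # missing keys start at 0

purchases = [("pizza", 10), ("beer", 20), ("coffee", 2), ("beer", 20)]
for item, amount in purchases:
    x[item] += amount  # creates the entry on first use, then accumulates
```

After the loop `x["beer"]` holds the running total for both beer purchases, without any explicit "have I seen this key before?" check.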
Clicked because of the cans on the thumbnail
really?
"I take pizza and run it through a hash..." Yeah, nobody will eat that pizza anymore...
No love for C++ map?
C# Dictionary Yay
They Are Awesome
IIITD SHOW UP!
> Coffee is essential
SAVE 418!!