What is Big Data? - Computerphile
- Added on 27 July 2024
- With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
MapReduce: • MapReduce - Computerphile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
You can make any data big data by exporting it in XML
It's hoooman-readable format though amirite? :>
big data is anything that is too large to be opened in excel
True.😂
You say that, but many companies who say they use 'big data' really mean a huge spreadsheet.
big data is anything too large to fit in pandas
For that there is Microsoft Access. It's just large Excel :^)
_"big data is anything that is too large to be opened in excel"_
So big data is any table with more than 100 rows?
Hands down best explanation of big data I have seen. I‘m coming from a business degree where we often learn about the 5Vs but don’t really touch on what infrastructure is actually used or needed for using/handling big data. Now I definitely have a better perspective on this!
According to management all big data can be reduced to one nice coloured 3D-pie chart!
And if you don't have a nice upsloping line graph, well... I'm sorry but I'd like to speak to you in my office when you are finished.
@@Walleggwp hockeystick!
mmm, pie... I keep suggesting it but my team starts ignoring me after that.
More V's of data!
*Volatility*: How likely is it that this data is received intact? How often do the bits get flipped?
*Velociraptors*: How much would this data scare xkcd?
*Vaingloriousness*: How hard is the creator of this data trying to shove it in your face despite repeated attempts to get them to shut up?
*Vanity*: How likely would the data be to win a beauty pageant?
*Vampiricism*: When mirrored, does this data delete itself?
*Vaccination*: Has the data been protected from viruses?
*Vuvuzela*: Honestly, this one should describe itself.
this is glorious! nearly died here XD
Superb haha
Vastness: Does 'huge volume' not even begin to describe the sheer size of the data?
Verse: Is the data in verse form?
Viscosity: Does the data flow effortlessly, or does it lump up like blood clots?
Vikings: Does the data contain false information about vikings, such as them wearing horned helmets?
Vendetta: Is the data vengeful? Viciously vindictive?
Vincent van Gogh: Is it art?
Vortex: Does the data rotate in ever more violent circular motions around the data center?
Vulgarity: Must the data be censored for people in the US?
Violas: Would a symphony orchestra make fun of the data?
@@letMeSayThatInIrish Holy data, this is even more ridiculous. The beauty is that each of them makes so much sense by itself and represents an actual (kinda) valid query!
Virginity: is it new and pure
Violence Level: how likely is it to destroy other data
Vocal: how easy is it to be heard
Viagraity: can it give the reader a hard on
I picture a computer scientist somewhere thinking "Hmm gravity of the data is an important aspect that should define big data." and his friends are like "It doesn't start with a 'V' it won't work"
Value of importance - how important the data is
Based on the value you can manage its position in a data pipeline - e.g what dataset you process first, how much computation power going into processing it, what data is sent to nodes in a network first, etc.
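The idea of ordering work by value can be sketched with a priority queue. This is only a minimal illustration: the dataset names and value scores below are made up, and `process_order` is a hypothetical helper, not part of any particular pipeline tool.

```python
import heapq


def process_order(datasets):
    """Return dataset names ordered from highest to lowest value score."""
    # heapq is a min-heap, so negate each score to pop the most valuable first.
    heap = [(-value, name) for name, value in datasets.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical datasets with made-up value scores (higher = more important).
datasets = {"clickstream": 0.2, "payments": 0.9, "sensor_logs": 0.5}
print(process_order(datasets))  # ['payments', 'sensor_logs', 'clickstream']
```

The same score could just as well decide how much compute a dataset gets or which node receives it first; the queue only fixes the order.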
I'm pretty sure them using all v's is to appeal to people who don't have a computer science background (aka managers and execs), or maybe people taking a first course in data science. I don't know that for sure but just the fact they used "velocity" instead of throughput makes me think that. If it was for people with a cs/IT background, that would just confuse them.
8:32 Sean Riley is an awesome editor. Used the word Process to add Pre Process in the video.💯🔥
TIL a little bit about Big Data, but also learned that in England a truck is called a lorry.
And a highway is called a motorway.
In India too.
Ande Yashwanth well yeah, cause england invaded india
And in the us, you park on driveways and drive on parkways
That's nothing, they come in different colours (with a u) too! Try saying "red lorry yellow lorry red lorry yellow lorry red lorry yellow lorry" really fast.
my modded skyrim is big data
too much for one computer to handle
I know what you mean. I literally have to run the game at my local rendering farm to get anything over 10 fps.
@@hattrickster33 At least you have a local rendering farm.
Every Computerphile video deserves a like.
Change my mind.
nope. you're right sir!
"Big Data" is the confusion that follows after marketing people end up describing technical stuff.
Could not agree more.
Did you know? The term "Machine Learning" was an invention of the marketing team at IBM in 1959. Machines don't learn, silly. Well, neither do people, much of the time.
@@cmonkey63
Machine learning describes precisely what it's about. Really, I cannot think of any better term for it. Computer aided reverse deduction? Knowledge discovery in databases? Automated stochastical analysis? Practical function fitting? Those are all obscurantist, *learning* is what it's about. And who learns? A machine.
@@MrCmon113 Statistical model estimation/fitting would be more accurate IMO. Optimization has been around for ages, why call it learning all of a sudden? (hint: money)
@@alkis2407 algorithms learn. They adapt without code being rewritten, and produce outcomes that haven't been preprogrammed, and get better with experience. That's learning.
This channel is super informative. I'm super pleased that I was able to stumble upon it. Broadens my knowledge of Computer Science.
That montage at the end is such a wax museum.
Thank you for making these and sharing these lovely videos. They're a fantastic resource.
She does a good job of covering many of the important basic concepts.
Let's take a moment and say that computerphile never disappoints.
Great to see more of Rebecca!
This one was much better presented, seems like she's getting some practice (and confidence). :-)
...and is being patronised slightly less.
I started researching that topic today and was even on this yt channel to search for stuff ... and tadaaah I see this upload in my subbox, perfect timing :)
It’s not the size of your data that matters, rather how well you process it ...
No, it's both.
We knew about lots of the best machine learning algorithms more than thirty years ago, but we didn't have the datasets to train them sufficiently.
Deep neural networks are comparatively simple, but they perform miracles if you throw tons and tons of data at them.
Taxtro I see humour isn’t lost on you ... thanks for playing along!
@@MrCmon113 wow you're cool
Looking forward to the next video, thanks!
Great video and really well explained. Ms Tickle is one of my two fav presenters on this channel.
the other being?
Long overdue...thank you
"This data is small, but the data over there is far away."
Thanks Ted.
Best / most unexpected comment I’ve ever laughed at. I can see him looking so confused.
Good stuff. While I knew each of the concepts, I'd not heard of the "5 Vs" (let alone the 10/whatever)... cool!
And wait, is this map/reduce video out already? Must find it. I've been wanting a refresher, because I haven't used it in a while, but it could be useful for me soon.
Can we also get videos on big data using non-Spark-based technologies?
Every time I hear "data" as a singular noun ("data is") instead of a plural ("data are"), it seems like such a welcome change. The old plural usage seems so stilted and it's simply not how I hear most people talk unless they're very prescriptivist.
"Data are as Data is."
Datum
@@TheSam1902 That's the stilted usage I refer to which no actual person under 70 uses unless they're deliberately trying to sound awful.
Whilst grammatically, the singular is "datum" and the plural is "data", and by linguistic pedantry it ought to be "data are", this ignores the intrinsically "uncountable" nature of data.
A single bit could be legitimately described as a "datum", as you can't further decompose it. But, for anything more than that, the problem with the notion of singular and plural on data is that it's always composite.
Is a byte a singular piece of data? Or is it 8 bits of data? Or is it 2 nibbles?
Well, yes, exactly. The answer is "yes".
So we've already hit the issue with any notion of plurality on "data". Any amount of it, beyond a singular bit, could be viewed as singular or plural. Depends on your metrics.
Information = data + structure.
"Data", by this definition, is without structure. So you cannot logically impose singularity / plurality onto it without implicitly providing structure, which makes it cease being "data" and become "information".
Moreover, it's worth noting that, in English, "information" is uncountable. You can't have "informations". Information plus more information is still information - there's just more of it.
It's a linguistic quirk. Shouldn't really be there. "Data" is, by nature, uncountable - whether English grammar wishes to agree or not.
Therefore, for me, it's always "data is". Data plus more data is still data - there's just more of it. Exactly as uncountable as "information" already is.
(And this ends up being even more so, if you actually spend any time with assembly language programming. As you're quite often doing things like grabbing the upper nibble of a byte to test for flags, or - to, for example, swap endianness - grab the individual bytes in, say, a 64-bit value and then swap the byte order around. The fluid interchangeability of how you interpret data - that, indeed, at the machine level, code is data too and you can create confusing self-modifying code that rewrites itself, even - becomes very apparent. Data, as data, has no inherent structure. No intrinsic plurality. Code implicitly provides the structure from how it treats the data, which turns it into useful information. In this view, data is, by nature, intrinsically uncountable - even if, by a quirk of history, the English language appears to disagree with this.)
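The nibble and endianness operations mentioned above can be sketched in a few lines. This uses Python rather than assembly, and the function names are illustrative only; the point is that the same bits read differently depending on how the code chooses to treat them.

```python
def upper_nibble(byte):
    """Return the high 4 bits of an 8-bit value."""
    return (byte >> 4) & 0x0F


def lower_nibble(byte):
    """Return the low 4 bits of an 8-bit value."""
    return byte & 0x0F


def swap_endianness_64(value):
    """Reverse the byte order of an unsigned 64-bit integer."""
    # Serialize as 8 big-endian bytes, then re-read them as little-endian.
    return int.from_bytes(value.to_bytes(8, "big"), "little")


flags = 0xA7
print(hex(upper_nibble(flags)), hex(lower_nibble(flags)))  # 0xa 0x7

raw = 0x0102030405060708
print(hex(swap_endianness_64(raw)))  # 0x807060504030201
```

One byte is "one value", "two nibbles", or "eight bits" here only because the code says so, which is the comment's argument in miniature.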
@@klaxoncow buried.
More than 16,384 columns = Big Data.
Love the use of old dot matrix printer paper to try and explain the basics of big data.
Would it be possible for you to do a video on the piece of art that is called Wireguard?
01:28 I haven't seen that wide of a continuous paper in decades
Please bring this one back
Thanks a lot for this explanation, very clear!
Clean and clear explanation
i can listen to u all day :)
2:52 That lorry is heading NNW, not NNE.
Thank you Rebecca!
Big data for me is when any text editor I try crashes while opening it...
A quote I heard last week about big data: "We are drowning in data but starved for information." (Paraphrasing John Naisbitt, 1982).
Information is just the complexity of the data. What you are looking for is knowledge.
It's the opposite of ˢᵐᵃˡˡ data
rofl
How do you make YouTube render small text?
@@PaulaJBean Google tiny text
@@noredine ᵀʰᵃⁿᵏˢ ᶠᵒʳ ᵗʰᵉ ᵗᶦᵖᵎ
YouTube views and likes are tracked by traditional databases. YouTube recommendation algorithms use "big data" (although they use views and likes as raw input)
"Big Data" systems are mainly interested in the _patterns_ in the data (data = whatever information is fed into the system), and the integrity, or confidence in, the individual atom of data is not very important. OTOH, in traditional databases (bookkeeping, inventory, payroll) the integrity of each atom of data is (with some exceptions) very important indeed.
Candy crush is big data for my Amiga 500😉
Kafka is really easy to use in node.js. I like it.
Can someone explain to me the difference between Big Data, ETL (Data Warehouse), and Data Engineer?
I'm really confused
Big Data is a great band 👍
How many Apache projects are there?
At the moment... exactly 367
As much as the number of feathers on a peacock.
A long time ago I was thinking that we could, in theory, use Big Data to create new electrical energy that could feed other machines or even the Big Data system itself. When we have a huge amount of data, some of it is relevant information (used for processing), a second type is secondary relevant data (used to train the Big Data system to improve itself), and the last type is total garbage data (still data made of 0s and 1s). Now, we know that when digital information is deleted from a machine, the actual bits of information are not lost but transformed via thermodynamic effects into heat, which raises the temperature of the machine, so when digital data is deleted the machine heats up a little. So we channel all the heat from all the machines and, instead of disposing of it, reuse it to produce electricity. In effect we recycle the "heat" from the machine.
But you forgot something, it's not the heat that is valuable, it's the heat **differential**. Some datacentres in northern countries use the temperature difference between the inside of the server room and the outside air to power Stirling engines and produce electricity, but it's still not very efficient.
Also iirc the Swedish military won a wargame against the US because their submarines were (partially) powered by these Stirling engines, making them stealthier than nuclear/diesel powered submarines.
@@TheSam1902 I agree with the inefficiency. Another way to improve this is by increasing the information density. But I still believe that this will be possible if the system is large enough. I am thinking about an interplanetary internet where you need to process all the data of an entire planet. Also, we know that information at a quantum level is stored on the surface, not in the volume, so I am thinking of using black holes as memory.
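For a sense of scale on the deletion-to-heat idea in this thread, here is a rough calculation using Landauer's principle, which puts the minimum heat released by erasing one bit at k·T·ln 2. The principle is not named in the comments, so bringing it in is an assumption about what the original comment refers to; the 300 K temperature and the 1 TB figure are arbitrary illustrative choices.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit_joules(bits, temperature_kelvin=300.0):
    """Minimum heat (in joules) released by erasing `bits` bits at the given temperature."""
    return bits * BOLTZMANN * temperature_kelvin * math.log(2)


one_terabyte_bits = 8 * 10**12  # 1 TB = 10^12 bytes = 8 * 10^12 bits
energy = landauer_limit_joules(one_terabyte_bits)
print(f"Erasing 1 TB at 300 K releases at least {energy:.2e} J")
```

Under these assumptions the lower bound comes out on the order of 10^-8 J per terabyte. Real hardware dissipates vastly more than this bound, and that heat comes from the electricity supplied to the machine, not from the erased data.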
Big Data: The lifeblood of Big Brother
"How big is big data?"
Me: big
Productive video
0:02 Ron Graham is the right person to answer this.
How times have changed: in my day it was the 4 F's, now it's the 5 V's.
That's data but, like, really big.
Does size really matter?
It is how the data is used.
edit: or "data are"
It depends on whether you are referring to the data individually or collectively.
@@michaelsommers2356 wouldn't you use datum if it was singular, and data otherwise, using "is" for both?
@@thomaspearson8782 Sure, but I was mostly joking.
Ok so you have to tell me what distribution produced the following input-output pair: A -> 0
Do you think your chances of guessing the right function improve if I give you more examples? If not, why do you think learning is even possible?
-> Rotate/Move the rocket
->Light
It's funny how people think of bug data: the company I work for can produce 100s of TBs every few hours, yet we went to a 'big data' conference and got told we didn't count as it was a small problem :-/
"Bug data" 🦗🤔
There are a lot of smug assholes in the industry. And there are a lot of people who push buzzwords for no reason. Don't mind them.
@@terohannula30 🐛 🐞 bug data
There's only three Vs, the last two were clearly added on because somebody wanted five "Vs" but they really have nothing to do with whether something is big data.
640 kB
That's a big excel file
"how big is big"
giggles
Big data is the study material folder in the d drive
Thanks for enabling transcriber... oh, it's disabled...
@@jamiecropley I don't know why they don't enable the transcriber, it's free and it helps people like me whose mother language is not English. It's too hard for me to follow people talking in English; I understand some words and a few phrases, but not all. On the other hand, I understand written English very well.
I'm not lucky like others who were born in countries where English is the first language, or where the education system cares about teaching English to students.
I won't be content until you have more V's than the speech from V for Vendetta. Voila!
They could solve their problem with the simple expedient of not collecting data.
Step 1: Rotate/hone the rocket
Step 2: Light
Step 3: ...
Step 4: Profit!
I think it's "rotate / move the rocket" but I hope we learn more about Rebecca's Rockets in a future computerphile video!
@@NoseyNick oh yes, on looking again you're right.
Hopefully we will find out what it's about!
3 inches is pretty big right?
It's not the size of the data that matters, but how you use it.
The size of the data matters a lot. Some things you can only learn from incredibly huge sets of data.
I am fully functional, programmed in multiple techniques and now *big*
I think data has to be at least this >| |< big... maybe even this >| |< big...
Very clear video and explanation, but is Big Data still a relevant issue in 2019?
Just ask the NSA. They are listening to everything and everybody, all that data has to go somewhere.
mipmipmipmipmip That depends on what you mean by relevant issue. The problem of handling and analyzing big data is pretty much solved. With tons of working different solutions available, we are past the phase of making it possible and at the phase of improving. So the problem of whether big data can be put to use is not relevant anymore. However, all fields are trying to become data driven to increase profits, thus generating big data. So I would say yes, big data is a relevant issue, perhaps more than ever.
That's like asking: "Do we still want to learn things about the real world?"
The only way you are going to learn things about the world is to collect data and the more data you have the more you can, in principle, learn. If you knew everything, you'd already know everything and wouldn't need a theory. As long as you don't know everything, you need a theory to predict what you don't know. And that theory can only be improved via training and testing examples.
Thank you.
Lorries are awesome.
Mind tickled :p
Only the first 3 Vs given are actually particular to defining Big Data. If the data is of such Volume, Velocity, and/or Variety that traditional data management can't handle it well, then it's Big Data. Value and Veracity apply just as well to a single data point. If the data (no matter its size or shape) has no value, then there is no reason to collect or store it. If the data (no matter its size or shape) lacks veracity, then its value is questionable.
Thumbnail a+
As a programmer.. Can someone please tell me how to meet girls like this?
1TB.
Ah the adorable one is back :D
Generally, more than 10 terabytes is Big Data.
When I was a kid, I used to say things priced at $30 or greater were expensive, regardless of context. ;)
@@Alex1891 most things in life are subjective and don't have a definitive answer. But it's definitely possible to give a generalized average answer. In this case you can say the AVERAGE server can only process less than 1 terabyte of typical data and thus you'll need multiple computers to process the data. The most unhelpful and pedantic answer you can give is something annoying like "it depends. It's subjective. It varies from problem to problem".
I've realized that 5 V makes for a very nice mnemonic. V is 5 in roman numerals, so you can pretty easily remember that there are 5 Vs.
It's probably just an accident, but it makes things a little easier to remember c:
Rebecca is so cute!! ❤
@MichaelKingsfordGray What's your address and credit card number? Wouldn't want to be anonymous and cowardly, big man.
Use your inside voice. It's not a problem to find somebody attractive, but did that really need to be in a comment on this video?
I hate how the sound of the pen lags behind the actual pen
I didn't realize but now I can't not realize, you monster
A megabyte.
Big if true
hi !
I like to pretend i'm smart enough to understand what's going on in this video :)
so where's that map reduce video?
Hey, she's back! The cute nerdy chick!
How I love me some pretty, intelligent women in STEM. Great Video, was always wondering what big data really is.
YouTube IS BIG.
In before your entire life and your rights are represented in a 5 star rating system;
And yes I have seen that black mirror episode ( ' ', )
In before killer robot bees
00:01 "How big is big?"
LOL
big data > small data
Always like for gals in tech!
Well she's sweet
coool
Jumbo big data.
Isilon
Splunk
Building Brainiacs Brain
Vig data
Is it just me who cringes to the sound of the marker writing on that paper
Data storage is now totally separate physically from the computers that access it. The idea of defining big data as the max that a single computer can process is laughable.
I don't think so, given that you mostly need the "computer" to process the data. Also, it's just a metaphorical definition, not an absolute one, like Big Data itself.