Hadoop is dying. And it's happening fast. Learn why in the latest Intricity 101 video. Here's the link referenced in the video: www.intricity.com/readseq/ www.intricity.com
Did I really watch this all the way to the end just to realize it was an ad?
Josh Nicholson sorry you didn’t like it. I’ll try to do better next time.
Intricity101 Good. I think the way that this is framed as an educational video is misleading and disingenuous.
Josh Nicholson I disagree completely with that sentiment. The fact that we describe how we’re helping people solve the problem which we educate about in the video doesn’t take away from its educational nature. The video could stand completely on its own as an educational video.
No. You watched it to understand the framing of the problem (which, I noticed, you neither agree nor disagree with) along with a value-add solution that aids in the migration.
Hadoop is not dying but evolving from the traditional definition of Hadoop.
@Mane what is hadoop evolving to?
Nah... it's dying.
@@Intricity101 is Hadoop dead or not?
@@shawnz9833 Data lakes on cloud?
Is Hadoop used on a cluster of an organization's own computers that the org has to maintain and manage, and is the "cloud" the same thing on somebody else's computers that you pay to rent based on your dynamic scale requirements and have them maintain and manage? Is that the argument being had here?
Can you store big data on Amazon S3?
You can store anything on S3
YES!!!
Very clear, thank you!
Google technically didn’t make the initial setup of MapReduce. Very nice vid though, thanks.
In my opinion, Hadoop's high underlying operating costs are toxic. But Hadoop is evolving, unless ASF stops it.
Don't give up, elephant!
I'm studying Hadoop and I wonder why I should have it in my company.
Hadoop is designed to keep files. But what is the sense in keeping data in files? Nowadays we have dozens of databases, SQL and NoSQL. They all have fantastic speed and clustering abilities. What is the sense in playing with some old-fashioned Linux fans' ideas?
Really, I'm asking. Can someone give me real-life examples where Hadoop can be useful?
Watch this czcams.com/video/MfF750YVDxM/video.html
IGNORE THIS VIDEO OK I'M GETTING THOUSANDS OF CALLS FOR HADOOP POSITIONS LOLZZ
"is dying" is not the same as "is dead"
Lol no positions on reading comprehension
The truth is Hadoop was never very good. It was not really new technology but revitalized OLD technology. Also, the claims made around it were never going to be realized. It was never going to replace robust traditional databases, nor should it ever have been considered for such a thing. A batch-oriented technology is NEVER going to replace the need for thousands of simultaneous users running queries. If Hadoop had not been oversold it would probably be in a better place today, but because it was oversold and many Fortune 500 companies have had high-profile, expensive failures, many have soured on Hadoop. When the ROI of "free" is costly consulting and failed projects that often rival the cost of just leveraging large traditional data platforms that simply work, it's not a hard choice. That said, there will always be another Hadoop, because people are intellectually lazy and drawn to buzzwords and shiny objects; they will always fall for it. The cloud and Spark have both helped put the nails in the coffin of Hadoop. I never believed in Hadoop or other proposed "warehouse killers". Big data existed long before Hadoop, and in many cases Hadoop was sold in such a way that it was really more of a scam. So for me it's good riddance. It merely sucked IT dollars down a black hole because so many executives and CIOs are too lazy to really understand a technology and too egotistic to stand alone against a hype machine.
Gem
I felt this video was bullshit back when I was managing a large Hadoop cluster to run ETL and OLAP. But by now I have to admit that Hadoop is no longer suitable given the rise of cloud computing, and no one is going to build their own cluster anymore. Using the same old Hadoop is clunky and wasteful; most people just choose another solution rather than stick with Hadoop, unless you have infinite money to pay for the traffic and storage. Hadoop is still great, and there are a lot of things in the Hadoop stack that can't be replaced (like Apache Hive, HBase and Kylin). If Hadoop somehow managed to simplify its architecture, I think people would head back to it.
Yeah I got so much hate mail when I released this video. I did release it quite early in the cloud takeover, but the uptake in cloud at the time was so intense I had to release it.
@@Intricity101 you really predicted the future. 2020 is the year of cloud computing; the cost of using a cloud service is much smaller than managing multiple clusters of bare metal. Some great tools from the Hadoop family are even making their way out of the family, like Apache Kylin. Hadoop is too slow and too complex to maintain and manage. A lot of companies I know have dumped Hadoop and moved their data to a cloud-hosted service like Cassandra or Dynamo, since most data nowadays can be saved as some kind of JSON.
I think the only reasons some companies are still using Hadoop are that their systems rely too much on it, the data is too big to migrate, or they just don’t want to change. Since they have an infinite amount of money, managing a large Hadoop cluster is not a problem for them. But in 2021, there is no reason to spin up a Hadoop cluster when we can simply put some data into Apache Spark on Amazon and extract it to Cassandra.
@@aperture147 could you recommend an actual roadmap to become a data engineer? I find many recommendations to learn Hadoop, but reality suggests the opposite, so I am quite unsure about the learning path.
I stopped where you said Hadoop is based on MapReduce. Nah boy, it was primarily two things, GFS and MapReduce :-) Those were the Google papers.
Bye, not watching.
RAJAT KANTI Bhattacharjee are you really splitting hairs that small on a 4 minute video, and not even on the main point?
@@Intricity101 You already have my view; the above was just my opinion. Plus it's true there are enough object storage systems today acting as alternatives, but then again Hadoop remains an alternative too, whether on-prem or in the cloud. So it ain't dying, it has competition 🤷🏼♂️
Next: why the Internet is dying.
What's stated here has happened in spades. Instances of Hadoop are being migrated en masse to the cheap storage offerings provided by AWS/Azure/GCP. This is actually old news at this point, so the folks hanging on are really just lagging the trend.
wrr
Hadoop can't end
so basically you are saying "local distributed storage" is dying because of "cloud distributed storage"
alright! so what's new!
Well even Hadoop deployments in the cloud are asking, "Why not use S3/Azure Blob?". The cost of storage is cheaper and the performance is comparable. Also there are native services that make the query performance light-years faster.
@@Intricity101 Yup, it's all down to cost of storage: a no-brainer. The speed concern of querying in "Hadoop proper" (like Hive and Pig) is long gone, with Apache Spark doing the analytics over HDFS. And now there's something that even challenges Spark's speed and ease: Google's Apache Beam! An exciting new world in data analytics and MPP (massively parallel processing).
Hmm. Well, maybe. Seems like there's always someone who wants to be the first to declare a demise. I remember over ten years ago reading an article that insisted Bluetooth was already finished.
I am from a traditional database background and was planning to learn Hadoop and Spark as a career switch. After watching this video, I need to rethink. Could you please help?
SQL and Python are key right now.
If I am sticking to the Azure ecosystem, do you recommend learning stuff like U-SQL in ADLA and Scala/PySpark for Azure Databricks or Azure HDInsight Spark clusters? The new Data Flows in Azure Data Factory V2 are very interesting too: the ETL processes are constructed visually with a nice GUI and are then compiled into Spark executables that run on Azure Databricks clusters. Edit: I'm talking Data Engineering here and some analytics, but mainly engineering.
What you want is efficient use of compute because that is where your costs are going to come from. So when compute is not being used you want it to shut down. Look into Snowflake on Azure for something like this.
@@Intricity101 Thanks for the reply, will definitely look more into Snowflake, but it is just a data warehouse right? Microsoft have just announced the SQL Database serverless compute tier, I feel like that is very similar.
O Vallack if Data Warehousing is the purpose then that’s my recommendation. Feel free to test both.
Financial projects generally go on premises like Bigdata
Hadoop and big data, in general, are evolving. Using the right additional tools is certainly helpful. It really depends on your data needs.
You can evolve the wrong way - Darwin taught us this.
People were saying the mainframe was dead some 20 years ago lol. But the truth is it'll be alive for at least the next 30 years. These naysayers will always be there. Btw, the world ended in 2012 because the Mayans couldn't calculate beyond 2012.
sirrrrrrrrrrrrrr
Hadoop is more than a Distributed storage system. It is a comprehensive ecosystem of dozens of specific software for dozens of types of usage combined.
Let's just say that VERY few startups are considering Hadoop as their platform of choice. My personal friends that work for the big Hadoop vendors are getting crickets in their offices, and have their resumes out.
and dozens of problems.
What tools should we be learning then, if Hadoop is dying? Please suggest.
AWS S3/Athena, Azure Blob Storage/Azure DLA, Google BigQuery, and Snowflake (which is one of my favs). Depends on the use case. Bottom line is, storage/query provided by native cloud vendors works just fine as an alternative to Hadoop, and it's usually cheaper.
@@Intricity101 thank you... just one question: are "Google BigQuery, and Snowflake" similar to Hadoop? Is Java needed for these, or is any other programming knowledge required as a prerequisite?
@@sunayanakalekar2162 Good old SQL my friend. However, you can also use Spark with these platforms. Additionally you can do orchestration in Python.
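To make the "SQL plus Python orchestration" suggestion concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a local stand-in for a cloud warehouse such as BigQuery or Snowflake; the `events` table, its columns, and the `run_pipeline` function are hypothetical names chosen for illustration, not anything from the video or the vendors' APIs.

```python
import sqlite3


def run_pipeline(conn, rows):
    """Two-step job: load raw rows, then aggregate them with plain SQL."""
    # Step 1 (load): create the hypothetical table and insert the raw rows.
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    # Step 2 (query): the analysis itself is ordinary SQL, no Java required.
    cur = conn.execute(
        "SELECT user_id, SUM(bytes) AS total_bytes "
        "FROM events GROUP BY user_id ORDER BY total_bytes DESC"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
    sample = [("a", 10), ("b", 5), ("a", 7)]
    print(run_pipeline(conn, sample))
```

With a real warehouse the SQL would stay much the same; only the connection object would come from that vendor's own Python connector.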
1. Spark, Impala, Tez and Hive LLAP make queries much faster now.
2. By the time you upload a terabyte of data to the Amazon cloud for analysis, the information is already old and useless, especially for organizations that have a daily change of more than 100 GB.
I'd admit that small organizations may benefit from your cloud argument, as they don't need to maintain the infrastructure, but Hadoop doesn't require that much love once the initial setup is done properly.
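The upload-time concern in point 2 comes down to simple arithmetic, sketched below. The link speeds and the 80% efficiency factor are assumptions picked for illustration, not measurements; `upload_hours` is a hypothetical helper name.

```python
def upload_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to push size_gb (decimal GB) over a link of link_mbps megabits/s.

    efficiency is an assumed fraction of the nominal bandwidth actually achieved.
    """
    megabits = size_gb * 1000 * 8                    # GB -> megabytes -> megabits
    seconds = megabits / (link_mbps * efficiency)    # effective throughput
    return seconds / 3600


if __name__ == "__main__":
    for label, size_gb in [("1 TB backlog", 1000), ("100 GB daily change", 100)]:
        for mbps in (100, 1000, 10000):
            hours = upload_hours(size_gb, mbps)
            print(f"{label:>20} @ {mbps:>5} Mbit/s: {hours:6.2f} h")
```

Whether the data is "already old" therefore depends heavily on the uplink: under these assumptions the same terabyte takes over a day on a 100 Mbit/s line and well under an hour on a 10 Gbit/s one.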
Jomphop, it's actually the opposite now. The larger you are, the faster you want to get off Hadoop, especially since the only justifiable location to store such large volumes of information is now the cloud anyway. So why pay the licensing above and beyond the native cloud storage? Makes no sense. I can tell you, the business prospects of the Hadoop distributions are a shadow of their previous glory days.
Intricity101 there is also a push to decentralize cloud solutions. Not everyone is interested in hosting their data with a third-party company, let alone in another country. So for the case of decentralizing the cloud, or creating a regular private cloud, I will disagree with your statement, both here and in your video :-)
Ha amazing! Thanks for the update ;)
An open source solution like Hadoop will always carry some drawbacks compared to the licensed versions. The fight is not about which technology is better than the others, since Hadoop will always be the base of big data solutions and technologies, and it gives good insight into them. Besides, with technology moving this fast, no one can tell what is going to happen in 5 years.
And no offense, but your statement "no one cares about Hadoop" at the start of the video sounded amateurish.
Samant, I do mean it: people that don't have their careers locked into Hadoop really don't care about it. They're open to really understanding what's better. And Hadoop isn't better anymore. Compared, for example, to Amazon S3, Hadoop is less elastic, more expensive (by a large margin), less available, and less durable. I can only ask people to give the cloud a try and see for themselves. Here's a great head-to-head comparison done by Databricks: databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html
Agree. :-)
What do you have to say now?
Goutham Anush Aenugula It’s dead the same way mainframes are dead. It doesn’t mean that they will be unfindable, just that there are few new adopters. That started 2 years ago. One of my good friends who worked at one of the largest distributions of Hadoop called me asking for references. He wanted to join a cloud data warehousing company. “We’re not adding new customers, we’re just trying to find new use cases for the customers we have.”
It is true not every company will be willing to go to the cloud. Just like not every person puts their money in the bank.
Some people want to attract views, hence headings like this.
Just got a call last week from another one of my colleagues who works at one of the largest Hadoop vendors. (I'm sure you can guess which.) "It's a freakin' ghost town here, is there any chance you could be a reference for me? I'm interviewing at Snowflake."
I definitely got a lot of hate mail for this video, especially on LinkedIn. But I have no problem standing behind it, because it's true.
Hadoop isn't dead.
It's slowly dying. Whatever will be left will be native services within cloud vendor solutions. This is already the case.
Let's just say the vendors of Hadoop are a graveyard. My friends that have been in that space have been in a mass exodus towards born-in-the-cloud solutions.