Pig Tutorial | Apache Pig Script | Hadoop Pig Tutorial | Edureka

  • Published on 28 Dec 2016
  • 🔥 Edureka Hadoop Training: www.edureka.co/big-data-hadoo...
    Check out our Pig Tutorial blog: goo.gl/z3dahy
    Check our complete Hadoop playlist here: goo.gl/ExJdZs
    This Edureka Pig Tutorial will help you understand the concepts of Apache Pig in depth. Below are the topics covered in this Pig Tutorial:
    1) Entry of Apache Pig
    2) Pig vs MapReduce
    3) Twitter Case Study on Apache Pig
    4) Apache Pig Architecture
    5) Pig Components
    6) Pig Data Model
    7) Running Pig Commands and Pig Scripts (Log Analysis)
    Subscribe to our channel to get video updates. Hit the subscribe button above.
    -------------------Edureka Big Data Training and Certifications-----------------------
    🔵 Edureka Hadoop Training: bit.ly/2YBlw29
    🔵 Edureka Spark Training: bit.ly/2PeHvc9
    🔵 Edureka Kafka Training: bit.ly/34e7Riy
    🔵 Edureka Cassandra Training: bit.ly/2E9AK54
    🔵 Edureka Talend Training: bit.ly/2YzYIjg
    🔵 Edureka Hadoop Administration Training: bit.ly/2YE8Nf9
    Instagram: / edureka_learning
    Facebook: / edurekain
    Twitter: / edurekain
    LinkedIn: / edureka
    #PigTutorial #WhatisApachePig #PigLatin #PigScript
    How it Works?
    1. This is a 5-week instructor-led online course with 40 hours of assignments and 30 hours of project work.
    2. We provide 24x7 one-on-one LIVE technical support to help you with any problems you might face or any clarifications you may require during the course.
    3. At the end of the training you will undergo a 2-hour LIVE practical exam, based on which we will provide you a grade and a verifiable certificate!
    - - - - - - - - - - - - - -
    About the Course
    Edureka’s Big Data and Hadoop online training is designed to help you become a top Hadoop developer. During this course, our expert Hadoop instructors will help you:
    1. Master the concepts of HDFS and MapReduce framework
    2. Understand Hadoop 2.x Architecture
    3. Setup Hadoop Cluster and write Complex MapReduce programs
    4. Learn data loading techniques using Sqoop and Flume
    5. Perform data analytics using Pig, Hive and YARN
    6. Implement HBase and MapReduce integration
    7. Implement Advanced Usage and Indexing
    8. Schedule jobs using Oozie
    9. Implement best practices for Hadoop development
    10. Work on a real life Project on Big Data Analytics
    11. Understand Spark and its Ecosystem
    12. Learn how to work in RDD in Spark
    - - - - - - - - - - - - - -
    Who should go for this course?
    If you belong to any of the following groups, knowledge of Big Data and Hadoop is crucial for you if you want to progress in your career:
    1. Analytics professionals
    2. BI /ETL/DW professionals
    3. Project managers
    4. Testing professionals
    5. Mainframe professionals
    6. Software developers and architects
    7. Recent graduates passionate about building a successful career in Big Data
    - - - - - - - - - - - - - -
    Why Learn Hadoop?
    Big Data! A Worldwide Problem?
    According to Wikipedia, "Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." In simpler terms, Big Data is the term given to the large volumes of data that organizations store and process. However, it is becoming very difficult for companies to store, retrieve and process this ever-increasing data. Any company that manages its data well has every chance of becoming the next BIG success!
    The problem lies in using traditional systems to store enormous amounts of data. Though these systems were a success a few years ago, with the increasing amount and complexity of data they are fast becoming obsolete. The good news is Hadoop, which is nothing less than a panacea for companies working with BIG DATA in a variety of applications. It has become an integral part of storing, handling, evaluating and retrieving hundreds of terabytes, and even petabytes, of data.
    For more information, Please write back to us at sales@edureka.in or call us at IND: 9606058406 / US: 18338555775 (toll-free).
    Customer Review:
    Michael Harkins, System Architect, Hortonworks says: “The courses are top rate. The best part is live instruction, with playback. But my favorite feature is viewing a previous class. Also, they are always there to answer questions, and prompt when you open an issue if you are having any trouble. Added bonus ~ you get lifetime access to the course you took!!! Edureka lets you go back later, when your boss says "I want this ASAP!" ~ This is the killer education app... I've taken two courses, and I'm taking two more.”

Comments • 66

  • @edurekaIN
    @edurekaIN  6 years ago +1

    Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For Edureka Hadoop Training and Certification Curriculum, Visit our Website: bit.ly/2Ozdh1I

  • @sumansetty3574
    @sumansetty3574  6 years ago +1

    Vineeth was really a fabulous presenter; the way he explained things was amazing and went straight into my head without any confusion. Thanks a lot, sir... expecting more from you, and I need more Pig videos.

  • @ketanpatil3489
    @ketanpatil3489  7 years ago +2

    Good presentation. Thanks Edureka team!!

  • @filipesan
    @filipesan  6 years ago +2

    Thank you, from Portugal; I am studying for my exam on "Big Data Systems", and I missed the class on Hadoop/Pig (the problem of being a working student). Now I think I've got it clearly!

    • @edurekaIN
      @edurekaIN  6 years ago +1

      Hey Filipe, thank you for watching our video. We are glad to have helped you here. You should check out the courses we provide on our website: www.edureka.co
      Hope you find this useful as well. Cheers :)

  • @shubhambhatnagar007
    @shubhambhatnagar007  7 years ago +1

    very good presentations thank you so much edureka....

  • @SrijanChakraborty
    @SrijanChakraborty  5 years ago +2

    Brilliant. Just what I needed

  • @aartichugh5975
    @aartichugh5975  6 years ago +1

    Thanks for explaining every bit of running PIG script.

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey Aarti, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)

  • @gokulr94
    @gokulr94  7 years ago +2

    very helpful thanks to edureka

  • @greatmonk
    @greatmonk  4 years ago

    great video sir!! really enjoyed the class!!!!!

  • @niloychatterjee1603
    @niloychatterjee1603  4 years ago +1

    Brilliant presentation...

  • @harshiniprasad7738
    @harshiniprasad7738  6 years ago +1

    I am very thankful to this team. I thought Big Data was a very boring subject and no one was going to make it easy for me to grasp, but edureka did 😃

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey Harshini, thank you for appreciating our work. Do subscribe and stay connected with us. Cheers :)

  • @kunjalsujalshah1992
    @kunjalsujalshah1992  3 years ago +1

    Excellent teaching

  • @rhce2120
    @rhce2120  7 years ago +1

    Thanks a lot.....Sir

  • @sarojsahu539
    @sarojsahu539  5 years ago

    superb sir!!

  • @rpattnaik2000
    @rpattnaik2000  5 years ago +1

    Good one !!

  • @anirbansarkar6306
    @anirbansarkar6306  3 years ago

    Thanks edureka, This was really a great tutorial.

    • @edurekaIN
      @edurekaIN  3 years ago +1

      Hi :) We are really glad to hear this! It truly feels good that our team is delivering and making your learning easier :) Keep learning with us and stay connected with our channel and team :) Do subscribe to the channel for more updates, and hit the bell icon to never miss an update from our channel :)

  • @srinivasvemula1963
    @srinivasvemula1963  5 years ago +1

    thank you edureka

  • @sudhanshumathur725
    @sudhanshumathur725  6 years ago +1

    very well explained

    • @edurekaIN
      @edurekaIN  6 years ago

      Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)

  • @Dipenparmar12
    @Dipenparmar12  5 years ago

    Great explanation.. keep it up.. thanks.

    • @edurekaIN
      @edurekaIN  5 years ago

      Thanks for the compliment! We are glad you loved the video. Do subscribe, like and share to stay connected with us. Cheers!

  • @sharonrosy9519
    @sharonrosy9519  5 years ago +1

    Tq sir

  • @abhishekpandey2148
    @abhishekpandey2148  7 years ago

    happy new year to dear trainer :)

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Abhishek, thanks for checking out our tutorial and for the wishes. Happy New Year to you too, from the trainer and from Team Edureka! :) Also, do check out this tutorial: czcams.com/video/4zXsgyN4ZDo/video.html. We thought you might like it too. Cheers!

  • @ravijariwala9758
    @ravijariwala9758  7 years ago +1

    yes

  • @maryjain1762
    @maryjain1762  3 years ago

    good class

  • @thepriestofvaranasi
    @thepriestofvaranasi  1 year ago +1

    Sir, can you share the version of the Cloudera QuickStart VM that you're using? It would also be helpful if you could share a video of how to install it.

    • @edurekaIN
      @edurekaIN  1 year ago

      Thanks for showing interest in Edureka! Kindly visit the channel for more videos; our content creators are eagerly waiting for your suggestions so they can make new videos on topics that interest you :) Do subscribe for video updates.

  • @avnish.dixit_
    @avnish.dixit_  5 years ago +1

    Nice video

  • @himbisht08
    @himbisht08  7 years ago +1

    Very nice video. Can you please tell which is more popular in the market, Pig or Hive, from a job perspective?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Himanshu, thanks for checking out our tutorial! We cannot say for sure which one is the most popular. For example, Facebook uses Hive, whereas Yahoo, which has the biggest cluster in the world, uses Pig.
      If you know SQL, then Hive will be very familiar to you. Since Hive uses SQL, you will feel at home with the select, where, group by, and order by clauses, similar to SQL for relational databases. You do, however, lose some ability to optimize the query by relying on the Hive optimizer. This seems to be the case for any implementation of SQL on any platform, Hadoop or traditional RDBMS, where hints are sometimes ironically needed to teach the automatic optimizer how to optimize properly.
      Compared to Hive, however, Pig needs some mental adjustment for SQL users to learn. Pig Latin has many of the usual data processing concepts that SQL has, such as filtering, selecting, grouping, and ordering, but the syntax is a little different from SQL (particularly the group by and flatten statements!). Pig requires more verbose coding, although it's still a fraction of what straight Java MapReduce programs require. Pig also gives you more control and optimization over the flow of the data than Hive does.
      Hope this helps you make the right decision. Cheers!

  • @sanjeevpandey2753
    @sanjeevpandey2753  6 years ago +1

    Thanks Sir

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey Sanjeev, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)

  • @ankitsaxenamusic
    @ankitsaxenamusic  7 years ago +1

    This is a wonderful tutorial with a detailed explanation. I just have a query about the sample.log file. What are the parameters in REGEX_EXTRACT? Can you please explain in detail what $0 is and what the 1 is in REGEX_EXTRACT?
    Thank you so much for your videos. Keep the good work going :)

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Ankit, thanks for the wonderful feedback! We're glad you found our tutorial useful.
      Here's the explanation as requested.
      REGEX_EXTRACT
      Performs regular expression matching and extracts the matched group defined by an index parameter.
      Syntax
      REGEX_EXTRACT (string, regex, index)
      Terms
      string : The string in which to perform the match.
      regex : The regular expression.
      index : The index of the matched group to return.
      Use the REGEX_EXTRACT function to perform regular expression matching and to extract the matched group defined by the index parameter (the index is 1-based). The function uses Java regular expression form.
      The function returns a string that corresponds to the matched group in the position specified by the index. If there is no matched expression at that position, NULL is returned.
      Example
      This example will return the string '192.168.1.5'.
      REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
      Hope this helps. Cheers!
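      To make the two parameters concrete, here is a short, hypothetical Pig Latin sketch built around the same '192.168.1.5:8020' example (the file path and alias names are illustrative, not from the video):

      ```pig
      -- Assume each row of /endpoints.txt is a single chararray like '192.168.1.5:8020'.
      endpoints = LOAD '/endpoints.txt' AS (addr:chararray);

      -- $0 is a positional reference to the first field of each tuple;
      -- the final argument selects which parenthesised group to return.
      hosts = FOREACH endpoints GENERATE REGEX_EXTRACT($0, '(.*):(.*)', 1) AS host;  -- '192.168.1.5'
      ports = FOREACH endpoints GENERATE REGEX_EXTRACT($0, '(.*):(.*)', 2) AS port;  -- '8020'
      ```

      In the log-analysis script from the video, $0 is therefore the whole log line, and the 1 asks for the first (and only) capture group: the log level.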

  • @abhishekbhatia8887
    @abhishekbhatia8887  7 years ago

    Nice explanation. Can we get an advanced Pig tutorial?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Abhishek, thanks for checking out our tutorial! Could you please let us know which Pig topics you are looking for so we can help you better? Cheers!

  • @priyankagauda4420
    @priyankagauda4420  7 years ago

    Great video, sir,
    but I cannot find the sample.log file. Can you please help?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Priyanka, thanks for checking out our tutorial! We're glad you liked it.
      The files used in this tutorial are Edureka course artifacts that you can obtain by enrolling in our course here: www.edureka.co/big-data-and-hadoop.
      Please feel free to get in touch if you have any questions or need any assistance. Hope this helps. Cheers!

  • @agodavarthy
    @agodavarthy  5 years ago

    Can we do data processing like creating a dictionary (as in Python) using Pig?

    • @edurekaIN
      @edurekaIN  5 years ago

      Yes, to an extent. Pig's data model includes a map type, which stores key/value pairs much like a Python dictionary: map constants are written as [key#value], and a value is looked up by its key with the # operator.
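      For reference, the closest analogue in Pig itself is the map type from Pig's data model. A minimal sketch, with an illustrative file name and keys:

      ```pig
      -- Assume each row of /users.txt has a name and a Pig map literal, e.g.:  alice  [city#Pune]
      users = LOAD '/users.txt' AS (name:chararray, props:map[]);

      -- The # operator looks a value up by key, much like props['city'] on a Python dict.
      cities = FOREACH users GENERATE name, props#'city' AS city;
      DUMP cities;
      ```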

  • @lakshmans779
    @lakshmans779  7 years ago

    Hi team, is there any PDF document for Hadoop from Edureka?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Lakshman, thanks for checking out our tutorial.
      Could you please elaborate on what you need in PDF? If it's the PPT, you can check out related PPTs here: www.slideshare.net/search/slideshow?searchfrom=header&q=pig+tutorial+edureka&ud=any&ft=all&lang=**&sort=
      You can access our complete training by enrolling in our course here: www.edureka.co/big-data-and-hadoop.
      Hope this helps. Cheers!

  • @user-bo7iz1mi6h
    @user-bo7iz1mi6h  6 years ago

    How did you move the data into Hadoop? I did not get it.

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey, sorry for the delay. We used hdfs dfs -put to copy the file from the local file system into HDFS. Hope this helps. Cheers!
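      For anyone following along, a hedged sketch of the idea (the local and HDFS paths here are illustrative): HDFS shell commands can even be run from Pig's Grunt shell with the fs keyword.

      ```pig
      fs -put /home/hduser/sample.log /sample.log;  -- copy the local file into HDFS
      fs -ls /;                                     -- confirm it arrived
      log = LOAD '/sample.log';                     -- the script can now read it
      ```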

  • @tejaswinisana1405
    @tejaswinisana1405  7 years ago

    Hello sir,
    which is better, Pig or MapReduce, in terms of processing speed?

    • @edurekaIN
      @edurekaIN  7 years ago +3

      Hey Tejaswini, thanks for checking out our tutorial. Here's the answer to your query:
      The two are different things. Pig is a data analytical language used to create MapReduce jobs that run on large datasets; both work in a distributed environment, hand in hand.
      Pig is a data flow language: its key focus is managing the flow of data from the input source to the output store. A Pig script is written specifically for managing the data flow of MapReduce-type jobs; most, if not all, jobs in a Pig script are MapReduce jobs or data movement jobs. Pig also allows custom functions to be added for processing, alongside built-in ones like ordering, grouping, distinct, count, etc.
      MapReduce, on the other hand, is a data processing paradigm: a framework for application developers to write code in so that it easily scales to petabytes of data. This creates a separation between the developer who writes the application and the developer who scales it. Not all applications can be migrated to MapReduce, but a good few can, from complex ones like k-means to simple ones like counting uniques in a dataset.
      Pig commands are submitted as MapReduce jobs internally. An advantage Pig has over MapReduce is that it is more concise: 200 lines of Java code written for MapReduce can be reduced to about 10 lines of Pig code.
      A disadvantage: Pig is a bit slower than hand-written MapReduce, as Pig commands are translated into MapReduce prior to execution.
      Hope this helps. Cheers!
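      To illustrate the conciseness point, here is the classic word count, which takes well over a hundred lines as hand-written Java MapReduce but only a handful in Pig Latin (the input path is illustrative):

      ```pig
      lines  = LOAD '/input.txt' AS (line:chararray);
      words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
      grpd   = GROUP words BY word;
      counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt;
      DUMP counts;  -- each distinct word with its frequency
      ```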

    • @tejaswinisana1405
      @tejaswinisana1405  7 years ago +1

      edureka! thanks a lot sir

  • @vishwajitbhagat9515
    @vishwajitbhagat9515  3 years ago

    Great stuff. Can I get that log file?

    • @edurekaIN
      @edurekaIN  3 years ago

      Hi, kindly drop in your email id to help us assist you with the required files for your reference. Cheers :)

  • @kashishkhetarpaul3214
    @kashishkhetarpaul3214  6 years ago

    How can we get this log file?

    • @edurekaIN
      @edurekaIN  6 years ago +1

      Hey Kashish! Send in your email ID here and we will send you the log files.

  • @jenijohn876
    @jenijohn876  6 years ago +1

    Sir, very good presentation, very clear to understand. Where can I find the log file? Can you please send it to my mail ID?

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey John! You can mention your email address in the comments and we will mail it to you.

  • @vivekkvr
    @vivekkvr  7 years ago

    Hi, it's a nice tutorial about Pig. I just want to know in which cases Pig is best used over Hive in real-time scenarios?

    • @edurekaIN
      @edurekaIN  7 years ago +2

      Hey Vivek, thanks for checking out our tutorial! We're glad you liked it.
      You can use Pig in cases where your data is unstructured (it does not have a schema). Pig does not require you to give the schema of a file at the time you are loading (writing) it onto HDFS; it follows schema on read. Hive, on the other hand, simulates SQL-like behaviour over HDFS (which implies schema on write). Suppose you have to process a novel written by Shakespeare or a speech given by Donald Trump. In that case you will need Pig, as these text files are not structured and you can't fit a novel into a table (which requires you to provide a schema). But if you have a table with fixed column names, and in each column the data type remains constant, then you will use Hive.
      Hope this helps. Cheers!
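      A small sketch of the schema-on-read point (the paths and field names are illustrative, not from the video):

      ```pig
      -- Pig will happily load raw, unstructured text with no schema at all;
      -- each row is then a single untyped field referred to as $0.
      raw = LOAD '/speeches/raw.txt';

      -- A schema can be supplied only when and where it helps.
      tweets = LOAD '/tweets.tsv' AS (user:chararray, text:chararray);
      ```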

  • @shivkumar70
    @shivkumar70  7 years ago

    Thanks for posting informative videos.
    I tried the Pig script as it was explained in the video, but it failed. Can you please let me know how to make it succeed?
    Content of sampleLog.pig:
    log = LOAD '/sample.log';
    LEVELS = foreach log generate REGEX_EXTRACT($0,'(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) as LOGLEVEL;
    FILTEREDLEVELS = FILTER LEVELS by LOGLEVEL is not null;
    GROUPEDLEVELS = GROUP FILTEREDLEVELS by LOGLEVEL;
    FREQUENCIES = foreach GROUPEDLEVELS generate group as LOGLEVEL, COUNT(FILTEREDLEVELS.LOGLEVEL) as COUNT;
    RESULT = order FREQUENCIES by COUNT desc;
    DUMP RESULT;
    hduser@ubuntu:~$ pig /home/hduser/HDFS_Practice_Dir/new_edureka/sampleLog.pig
    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_1491887529789_0011 FILTEREDLEVELS,FREQUENCIES,GROUPEDLEVELS,LEVELS,log GROUP_BY,COMBINER Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/sample.log
    .
    .
    .
    .
    .
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
    Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/sample.log
    Input(s):
    Failed to read data from "/sample.log"
    Output(s):
    Counters:
    Total records written : 0
    Total bytes written : 0
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0
    Job DAG:
    job_1491887529789_0011 -> null,
    null -> null,
    null
    2017-04-10 23:24:32,688 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
    2017-04-10 23:24:32,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias RESULT
    Details at logfile: /home/hduser/pig_1491891860556.log
    Log file content:
    Pig Stack Trace
    ---------------
    ERROR 1066: Unable to open iterator for alias RESULT
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias RESULT
    .
    .
    .
    Caused by: java.io.IOException: Couldn't retrieve job.
    at org.apache.pig.PigServer.store(PigServer.java:1083)
    at org.apache.pig.PigServer.openIterator(PigServer.java:994)
    ... 13 more
    ================================================================================

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Shiva Kumar, thanks for checking out our tutorial. We're glad you liked it.
      The error is self-explanatory: "Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/sample.log" clearly states that your input file path is wrong and sample.log does not exist at that location. The reason it did not give an error when you entered log = LOAD '/sample.log'; is that Pig only starts a MapReduce job when it reaches a DUMP (or STORE) statement. When it reached DUMP, it started a MapReduce job and hit the error from the first line of your Pig script. Check whether the file really exists at "hdfs://localhost:9000/sample.log".
      Hope this helps solve the issue. Cheers!
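      A short sketch of how to verify and fix this from the Grunt shell (the local path is illustrative):

      ```pig
      fs -ls /sample.log;                           -- fails if the file is missing from HDFS
      fs -put /home/hduser/sample.log /sample.log;  -- upload it from the local file system
      log = LOAD '/sample.log';                     -- LOAD itself only executes at DUMP/STORE
      DUMP log;
      ```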

  • @sumitarora6429
    @sumitarora6429  5 years ago +1

    Thanq so much sir

  • @pranupranup8285
    @pranupranup8285  6 years ago +1

    yes

  • @TheMrTalliban
    @TheMrTalliban  6 years ago

    yes

  • @RafaelDuarte
    @RafaelDuarte  6 years ago

    yes