Learn Python - Pandas merge two CSV files - Questions from the comments episode 2

Sdílet
Vložit
  • čas přidán 20. 05. 2020
  • Good by Python for Scott, hello Python for Pete (you!)
    In this video, I walk you through how to import two .CSV files with pandas and merge on a column and also merge on the index and compare values for any data errors.
    If you would like a video answering your question, please leave a comment below!
    #Python #Pandas #Data

Komentáře • 53

  • @joshuaabok3329
    @joshuaabok3329 Před 3 lety +1

    This is just awesome, man. Big ups.

  • @mohammedafsanuddin6067
    @mohammedafsanuddin6067 Před 4 lety +5

    keep up the great work! i think you should keep the series where you go on upwork(or similar freelance websites) and do jobs and show your work throughout! It provides real encourage for developers who want to create some work experience. Also your an excellent teacher ! I think this will be great for your channel

    • @MakeDataUseful
      @MakeDataUseful  Před 4 lety

      Thank you! I like your thinking, an upwork/freelance series is a smart idea. I'll get to work planning out the series.

  • @SignInWithGarry
    @SignInWithGarry Před 4 lety +1

    Hi adam, the video content is very helpful and i enjoyed learning it. I am into a job where these things run at back-end (matching of data coming in different files format from different sources and then parsing and matching it) and i have a very big curiosity to learn it and hands-on these things. Your video give me a spark(fire) to start it. A BIG THANKS for delivering this content.

    • @MakeDataUseful
      @MakeDataUseful  Před 4 lety

      Glad it was helpful! New content coming out every week now :)

  • @srijithscientia8614
    @srijithscientia8614 Před 2 lety

    Thanks for the Awesome video 🥰🥰🥰

  • @jagdishjoshi6052
    @jagdishjoshi6052 Před 3 lety

    Very useful 👌👌👌

  • @maxquin2137
    @maxquin2137 Před 4 lety +4

    Same problem as above with different number of rows. e. g one file has 70 rows with the other file has 120 rows same number of columns in both files.

  • @Jayu345
    @Jayu345 Před 4 lety +1

    An alternative to adding prefixes, the merge function has the parameter "suffixes". This will identify overlapping column names in the left and right side and allow you to define the suffix you wish to include. e.g. pd.merge(df1, df2, suffixes=("_df1", "_df2")).
    A time saver I find myself using often

    • @MakeDataUseful
      @MakeDataUseful  Před 4 lety

      I love that! Thank you for the tip, very helpful :)

  • @7Rodp
    @7Rodp Před 3 lety

    thanks really helpful.

  • @harishkonakandla
    @harishkonakandla Před 3 lety +1

    Sir where can I find this code? Have you created any separate link for uploading code?

  • @vinothkumar9796
    @vinothkumar9796 Před 3 lety

    Hello, I have a two datasets with each dataset has a different column names. How can I merge those?

  • @vijaybabaria3253
    @vijaybabaria3253 Před rokem

    Thanks for sharing these tips, is it possible to compare all columns to display only data that has mismatches without hard coding column names in the compare function? thanks a lot

  • @videofyph
    @videofyph Před 3 lety

    How can i combine 2 csv files with different columns bro?

  • @pramodhsuryajr2467
    @pramodhsuryajr2467 Před 3 lety

    Hi @Make Data Useful
    I have an Issue while comparing two CSV Files below is the case.
    1. The Structure are same for the both CSV
    2. Different number of Records in both CSV ( ex CSV1- 50 Records , CSV2- 40 Recods)
    3. I dont have a single Key column (Its a Composite key )
    Please provide me a solution I'm new to Python.
    I Tried the below way but its working if the CSV has same number of rows but its not working if it has different number of rows:
    I'm storing the data is CSV1 and CSV2 in a data frame df1 and df2 correspondingly andthe converting then to objects so that it can be easily compared.
    Source_file = str('a.csv')
    Target_file = str('b.csv')
    Source = pd.read_csv(Source_file)
    df1 = pd.DataFrame(Source)
    df1 = df1.astype(str)
    print(df1)
    #df1.dtypes
    Target = pd.read_csv(Target_file)
    df2 = pd.DataFrame(Target)
    df2 = df2.astype(str)
    print(df2)
    Getting the list of headers in df1
    header_list = df1.columns.tolist()
    for x in range(len(header_list)) :
    df3[header_list[x]] = np.where(df1[header_list[x]] == df2[header_list[x]], 'True', 'False')
    print (df3)
    df3.to_csv('Output', index=False)
    please helpout in this scenario

  • @prasannaprakash7940
    @prasannaprakash7940 Před 3 lety

    Hi, Thanks for this great video. I have few questions on how to achieve the following.
    1. For compare code, instead of I compare only one field with other field of csv files, is there a way that I can compare all the fields running in a array. If so how can I do it.
    2. This code has a limitation that columns of both csv files should be same. I am getting exception in case if I add more columns to any one of the file. How can I achieve comparison by ignoring headers/columns? Can you please explain???
    3. How can I compare if values in the fields are integers as == does not work when values in field are integers.
    Ex: df(price_field1) == df(price_field2)
    Price_field1 = 4.8
    Price_field2=5.3
    I have this questions, can u please explain...

  • @vinayakchikkorde8151
    @vinayakchikkorde8151 Před 2 lety

    I have the source file and target file. so in that, I have to compare 140 columns and show the result if it matches or not. for example, there is a column as Country1 in source and in target as Country2. to compare that i will use if(source['country1]==target['country2])return True else return false. to compare 140+ columns it will take time to compare 140 columns. and in both of the file columns are not in ordered. so how can I solve this?

  • @reinekeerthi9448
    @reinekeerthi9448 Před 2 lety

    hii, i have a doubt while comparing one column with other column in the data frame, but the column name is different, for eg: 1st column name is "Column1.subLot", the other column name is "subLot". both the columns are in different dataframe. the dataset one in excel other in json file
    can you please a video for my question please.

  • @kingofthesummer5180
    @kingofthesummer5180 Před rokem

    What would be different if the number of columns did not match ?

  • @esargf
    @esargf Před 3 lety

    Really enjoy this tutorial! This is my question:
    I want to merge 2 csv files. Each file has some different columns and an id column. When merge, each row must have matching id.
    The first csv file's id is in order, the second csv file's id is not in order and there's some rows with the same id.
    If there's multiple rows with the same id, copy the same row from csv file 1. Then, sort merged csv file 1 & 2 by id.
    csv file 1:
    id gender
    1 M
    2 F
    3 M
    4 M
    5 F
    csv file 2:
    id order_num color
    1 20 blue
    3 30 red
    2 10 blue
    3 40 pink
    4 50 blue
    5 60 red
    5 70 red
    merged csv file 1 & 2:
    id gender order_num color
    1 M 20 blue
    2 F 10 blue
    3 M 30 red
    3 M 40 pink
    4 M 50 blue
    5 F 60 red
    5 F 70 red
    I hope you'll make the tutorial on this. Thanks!!

  • @ankurmalik7388
    @ankurmalik7388 Před 3 lety

    hi I want to scrap top mba colleges from (Shiksha website) but the problem is that its page changes automatically like from page 0 - 1 by itself by scrolling down
    can you help me out how to scrap all pages, it has around 100 colleges but I can scrap only 40 colleges at a time

  • @MrTonie469
    @MrTonie469 Před 4 lety

    Is it possible to make a video to compare two CVS and throw in a config file to exclude certain columns? And possibly option to ignore headers.

    • @MakeDataUseful
      @MakeDataUseful  Před 4 lety

      Of course, have you got a couple of sample files you can link me to? I will make a Q&A video

  • @erykbazela7281
    @erykbazela7281 Před 3 lety +2

    Very nice video :) I am struggling with something rn. I have two input files with different data, indexes, column names etc. I extracted them from original tables(they were split, partially uppercase) with original position in column(indices). I have to check if the names are the same. If they are I have to save their index in original file row to the opposite so eg. csv1: 0 Bob Marley, [2,5,7] ; csv2: 0 Elon Musk[4,11]. Pls tell me if it's something difficult and worth showing or maybe I'm just inexperienced... Thank you.

    • @MakeDataUseful
      @MakeDataUseful  Před 3 lety +1

      Hey Eryk, do you want to drop me a sample of the two files with an expected output?

  • @anjanabhaskar417
    @anjanabhaskar417 Před 3 lety

    How can we combine or join two datasets without any common fields?
    For eg: I have one dataset bank.csv and another dataset salary.csv.
    For both datasets there are no common columns. So how can we join them in Python?

    • @MakeDataUseful
      @MakeDataUseful  Před 3 lety

      How would you look to join them without Python? Once we know that we can look at how to do that using Python.

  • @filo013
    @filo013 Před 3 lety

    Thanks for the video it help checked 18000 lines of data. Is there a way to check multiple rows at the same time.

    • @MakeDataUseful
      @MakeDataUseful  Před 3 lety

      There is a couple of parallel processing approaches out there, checkout dask as one example.

  • @sriharshadevraj968
    @sriharshadevraj968 Před 4 lety +1

    Thank you Adam. I am learning python and this is really helpful. In your compare function you pass in the account_type as the column to be compared, how can we modify this to compare all the columns in the file? In other words, I need to compare and list the differences for each column.
    Example:Compare Column Account_type
    Expected Output:
    Account_num Statement_1_account_type Statement_2_account_type
    1 Checking Savings
    Compare Column Statement_balance
    Expected Output
    Account_num Statement_1_Balance Statement_2_balance
    3 10 15
    and so on till we find differences in each column?

    • @MakeDataUseful
      @MakeDataUseful  Před 4 lety

      Hi Sriharsha,
      There is a couple of different ways to go about it. One approach would be to:
      Create a new column on each of your input dataframes made up of the three columns you want to compare then use these two new columns to compare.

    • @KhushbooSingh-it6pi
      @KhushbooSingh-it6pi Před 2 lety

      @@MakeDataUseful It might work for few columns, what about a dataframe with 55 columns, how would you compare?

  • @kishorekumar2743
    @kishorekumar2743 Před 4 lety

    First i need to create a excel sheet with three columns ' file1', 'file2', 'match/mismatch' like this
    and in python i need to get two csv's as input then compare those csv's , if the data's in both the csv's match the result has to be redirected to the excel file created earlier like,
    file1 file2 match/mismatch
    sample1.csv sample2.csv match
    Thank you.

    • @MakeDataUseful
      @MakeDataUseful  Před 4 lety

      That sounds completely doable, if you have some sample data drop me a link and I'll put together a video.

  • @swagatikarout8227
    @swagatikarout8227 Před 3 lety +1

    Hay can you help me to fetch multiple data through remote.