Convert Excel Files to CSV using Python | Working with Large Excel Files in Power BI

Sdílet
Vložit
  • čas přidán 5. 09. 2024

Komentáře • 72

  • @tasfarsowad7612
    @tasfarsowad7612 Před měsícem

    This is exactly how i want to learn programming.. thanks mannnnn

  • @bardiakhorshidi5711
    @bardiakhorshidi5711 Před 2 lety +1

    pandas is my home and excel is my room, a great journey in the native environment
    thanks for that

  • @victor_wang_1
    @victor_wang_1 Před 2 lety +4

    Nice video. A couple remarks: If you're going to use scripting to convert the files to csv, why not just combine them all into 1 csv? Also, why not just use vba which requires no installation or IT privileges?

  • @shubhabratadey
    @shubhabratadey Před 2 měsíci

    Great learning. Can you please start another channel teaching us python from the very beginning and its uses so that people like us can learn from you as well? Just a thought though...🙂
    2 questions with respect to this video.
    1. Can this be used on a SharePoint folder? If yes, how to do that?
    2. Can this code be run automatically? If my system is shut down, can it run following a specific schedule like using Power Automate or any other means?

  • @HachiAdachi
    @HachiAdachi Před 2 lety +1

    Power Automate would be another option especially if no-code-low-code is your jam. I have no idea about performance, though.

  • @emilmubarakshin49
    @emilmubarakshin49 Před 2 měsíci

    Great to see you diversifying your offerings

  • @tasfarsowad7612
    @tasfarsowad7612 Před měsícem

    well it worked, but i had to do some modification.
    First, i needed to install pip install xlrd
    In the code, I had to add one more line.Here's the code:
    import os
    import pandas as pd
    path = "F:\\python\\pythn csv project"
    files = os.listdir(path)
    for eachfile in files:
    cleanfilename = eachfile.replace(".xls", "")
    if eachfile.endswith(".xls"):
    full_path = os.path.join(path, eachfile)
    xlfile = pd.ExcelFile(full_path)
    sheets = xlfile.sheet_names
    for eachsheet in sheets:
    sheedata = xlfile.parse(eachsheet)
    csvname = cleanfilename + "-" + eachsheet + ".csv"
    sheedata.to_csv(csvname, index = False)
    the code used in this video was giving "file not found error"
    then i add the line with the help of chat gpt "full_path = os.path.join(path, eachfile)"

  • @vashisht1
    @vashisht1 Před 2 lety

    And all these years I used good old vba to get list of filename in folder and then run another macro to change the filename...I also wanted to share that there are still a lot of people in this world who do all this manually.....thank you sir for helping all ....my client does not allow python to run on citirx that why I use vba....but I am myself a bit skeptical whether vba will be able to handle such large data set..

  • @Raymclau
    @Raymclau Před 2 lety

    Chandeep, I appreciate your videos very much
    Thank you so much you are just an awesome teacher

  • @tak0331
    @tak0331 Před 2 měsíci

    This is really good Chandeep, thank you!

  • @yossiper7221
    @yossiper7221 Před 2 lety +1

    What kind of performance problem do you have with excel as data source ? Is it the refresh time of the report taking too long ? Or is it when you interact with the report ( slice etc) ?

    • @GoodlyChandeep
      @GoodlyChandeep  Před 2 lety

      See this - blog.crossjoin.co.uk/2018/08/02/comparing-the-performance-of-csv-and-excel-data-sources-in-power-query/

  • @SWor-tj2nb
    @SWor-tj2nb Před 2 lety +1

    I try to covert excel size 40 mb to csv that using more time about 4-5 mins
    in my opinion, this solution might be suitable for excel small size

    • @ryansodhi1815
      @ryansodhi1815 Před rokem

      Same I have a 100 mb file and I'm not sure how much time it would take to make that into csv

  • @bhagyapante3022
    @bhagyapante3022 Před 6 měsíci

    Hi ..Could you please let me know how we can handle if there are filters on columns..I used a code similar to yours but it is not working on the files with filters

  • @MaddyMugunth
    @MaddyMugunth Před 2 lety

    Just wow, Thanks a bunch Chandeep.

  • @nikhilmartha4832
    @nikhilmartha4832 Před rokem

    That's a great video. I have a doubt, like if we wanted to do the same with the file in aws S3 instead of local file. How to proceed.

  • @tmaris
    @tmaris Před 2 lety

    Great! Definitely trying this. Thanks!

  • @upmdosadno
    @upmdosadno Před 2 lety +1

    i would recommend using path its core module and easier to work. In my opinion

    • @GoodlyChandeep
      @GoodlyChandeep  Před 2 lety

      Can you please share the exact script!

    • @Crow2525
      @Crow2525 Před 2 lety

      I think path from pathlib.
      A - avoids double "//" in your path,
      B - allows you to chose the basename, rather than replace the ext,
      C - allows you to loop through the extensions rather than whole filename.

  • @manideepak6820
    @manideepak6820 Před 2 lety

    Thank you upgrading our knowledge chandeep
    A small doubt
    Can I convert a structureed table in a Excel sheet to a CSV using python

    • @GoodlyChandeep
      @GoodlyChandeep  Před 2 lety

      I haven't tried it but I'll try to follow this approach.
      - Using OpenPyXl library I'll try to get the table pulled up in Python
      - Then convert that table to a csv.

    • @manideepak6820
      @manideepak6820 Před 2 lety

      @@GoodlyChandeep thank you chandeep

  • @christopherlawes9286
    @christopherlawes9286 Před rokem

    Super good instruction. Thank you!

  • @sauravkragrawal
    @sauravkragrawal Před 11 měsíci

    How to loop through sub folders ?

  • @X_x_kingfisher_x_X
    @X_x_kingfisher_x_X Před rokem

    MashaAllah you were good in there.

  • @vinaykhankari6703
    @vinaykhankari6703 Před rokem

    Dear sir please share video on creating pivot table reports using pandas.

  • @bhuvanashri-yp5cw
    @bhuvanashri-yp5cw Před rokem

    nice video and superb explanation

  • @daphnednls
    @daphnednls Před 11 měsíci

    Great solution!

  • @josericardo3015
    @josericardo3015 Před rokem

    Many Tks for the video, Chandeep!
    If the data inside the sheet starts in line 6, how to adjust the pyhton code?

    • @GoodlyChandeep
      @GoodlyChandeep  Před rokem

      You can use the Table.Skip function in Power Query

  • @NavinKumar-tv9hg
    @NavinKumar-tv9hg Před rokem

    beautiful explanation

  • @mfachry3012
    @mfachry3012 Před rokem

    AttributeError: 'ExcelFile' object has no attribute 'to_csv'
    How do i fix this ?

  • @BigtsGameTavern
    @BigtsGameTavern Před 2 lety

    Awesome, this was very helpful 👍

  • @krishnapatel3140
    @krishnapatel3140 Před 2 lety

    Pls make a video for creating XML files.

  • @elyorkeldiyorov3641
    @elyorkeldiyorov3641 Před rokem

    Hello sir, this was really helpful! Thank you for that, but while I am converting excel files to csv I am losing zeros before the number. For example if I have '00356' in excel file, after converting it into csv it is becoming '356'.

    • @codingwannabe
      @codingwannabe Před rokem +1

      Here are two possible scenarios for that, either you need to specify and change the data type to string before converting to csv or you're viewing the data in excel. When you view data in csv format using excel it will not display the leading zeros even though they still exist in the file itself . You can test this by importing the csv file into a new file using "get data" under the "Data" tab and making sure the format of the data in the power query is in string format. Or you can read that same CSV file back into python and print the data. You'll notice that, if the data itself has the right format, the zero still exists.

  • @einoconsult5563
    @einoconsult5563 Před 2 měsíci

    I see that the fist row of my data is : Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8
    because I have 9 columns, any clue ?
    Can it handle multiple headers files ? Does the data has to start in A1 ? In the comments @upmdosadno offered a different method to use path , @Crow2525 shared the script
    After a year, can we have a feedback or an updated version
    Anyway to make the variable path dynamic that it picks up the path where the script is ?

  • @hemant5757
    @hemant5757 Před 2 lety

    Hi chandeep can you also show how much good performance you get when you loaded that csv data instead of xl data...you said earlier it was taking 2.5 hours in processing xl data in power query

    • @GoodlyChandeep
      @GoodlyChandeep  Před 2 lety +1

      See this - blog.crossjoin.co.uk/2018/08/02/comparing-the-performance-of-csv-and-excel-data-sources-in-power-query/

  • @abhijeetshetye8785
    @abhijeetshetye8785 Před 2 lety

    Very helpful video👍🏻🤘🏻 Thnx

  • @techuser876
    @techuser876 Před rokem

    Very good tutorial👍

  • @KuldeepSingh-nq1vi
    @KuldeepSingh-nq1vi Před 2 lety +1

    Nice Video Bro, pls upload next video of M Language.

    • @GoodlyChandeep
      @GoodlyChandeep  Před 2 lety +1

      Next up is List.Accumulate 😉

    • @KuldeepSingh-nq1vi
      @KuldeepSingh-nq1vi Před 2 lety +1

      Thanks Bro, After this ,pls make a video on "each" and underscore (_) statements in M Language Series.

    • @manideepak6820
      @manideepak6820 Před 2 lety

      @@GoodlyChandeep thank you and most waiting topic

  • @venkateswaram2285
    @venkateswaram2285 Před rokem

    xlfile = pd.ExcelFIle(eachfile) is not working throwing error as Filenot found error eventhough we installed latest openyxl

  • @johnpineda7692
    @johnpineda7692 Před 11 měsíci

    Thanks Bro!

  • @marcosoliveira8731
    @marcosoliveira8731 Před 9 měsíci

    Nicely done sir. +1

  • @thebhaskarjoshi
    @thebhaskarjoshi Před 2 lety

    New Learning!

  • @inaammuaz2284
    @inaammuaz2284 Před 2 lety

    Thank you

  • @aliyaqoob750
    @aliyaqoob750 Před 2 lety

    Nice but why you did not go for vba?

  • @smartwork4768
    @smartwork4768 Před 2 lety

    Thank you dear.

  • @elicesroman7500
    @elicesroman7500 Před 2 lety

    Thx for your video. I'm running the code in Jupyter lab and I had a couple of issues for example I could not produce the csv files and I could not use the argument 'eachfile' in the pd.excel.File. Any idea? Do you think it is because I'm running the code in jupyter lab when it should be run on a proper IDE?

    • @GoodlyChandeep
      @GoodlyChandeep  Před 2 lety

      I am not sure.. try doing that in VS Code.

    • @vigneshwarannallasamy
      @vigneshwarannallasamy Před 2 lety

      Have you tried using the exact name pd.ExcelFile(eachfile)?
      You said pd.excel.File. If you used this then it's wrong. Try the above one.
      Generally using IDEs will show suggestions as you type. Jupyter lab is great to work on data visualization but for writing scripts it is better to use IDEs like Pycharm, VS....

    • @ulearning4life364
      @ulearning4life364 Před 2 lety

      It is not because of using a different IDE,
      Pd.excelfile is a pandas method and not a python base method. Therefore, You must install Pandas package before you can import pandas. CHANDEEP had done the pip install but was not covered in the video

  • @mohamedchatar4393
    @mohamedchatar4393 Před 2 lety

    Hello there, I would like to thank you for this great video but I also wanted to ask you about something: when I did this python script at the line where I had to convert the eachfile variable to an Excel by using xlfile = pd.ExcelFile(eachfile) I had this error : FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\$company_sales.xlsx'
    It is beacause of the hidden files , how can I avoid this ?

  • @AMIN-ub8ct
    @AMIN-ub8ct Před 2 lety

    Hello Guys, Even after Power Query with all its solutions. How really its needed to learn Python or R. I know excel can't handle more than 1M row, but are there any other reasons. I am asking because with Excel hypnotically I can feel the data with my hand

    • @StefanoVerugi
      @StefanoVerugi Před rokem +1

      with Excel & PQ you don't have 1M limitation, for data handling R+tidyverse library give you lots of power, Python is always good to learn because is so widespread. If you manage to pass the initial learning curve you won't regret it.

  • @Aviji2525
    @Aviji2525 Před 2 lety

    Any new batch coming ??