What Is the Fastest Way To Do a Bulk Insert? Let’s Find Out

  • Published July 26, 2024
  • ☄️ Master the Modular Monolith Architecture: bit.ly/3SXlzSt
    📌 Accelerate your Clean Architecture skills: bit.ly/3PupkOJ
    🚀 Support me on Patreon to access the source code: / milanjovanovic
    What is the fastest way to do a bulk insert into a SQL database? We're going to find out in this video. We will compare EF Core, raw SQL with Dapper, EF Core Bulk Extensions, and the SqlBulkCopy class. I'll show you the results for inserting 100, 1,000, 10,000, 100,000, and 1,000,000 records. (A rough sketch of the benchmark setup appears after the chapter list below.)
    Fast SQL Bulk Inserts With C# and EF Core
    www.milanjovanovic.tech/blog/...
    Join my weekly .NET newsletter:
    www.milanjovanovic.tech
    Read my Blog here:
    www.milanjovanovic.tech/blog
    Chapters
    0:00 What is a Bulk Insert?
    2:28 Implementing the Bulk Insert benchmarks
    9:54 Examining the Bulk Insert benchmark results
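
    The exact benchmark code is available to Patreon supporters; as a rough, hypothetical illustration of the kind of comparison being made, a minimal BenchmarkDotNet sketch could look like this. The User entity, table name, and connection string are placeholders (not the video's actual code), .NET 6+ implicit usings are assumed, and table cleanup between runs is omitted:

    using System.Data;
    using BenchmarkDotNet.Attributes;
    using Microsoft.Data.SqlClient;
    using Microsoft.EntityFrameworkCore;

    public class User
    {
        public int Id { get; set; }
        public string Email { get; set; } = string.Empty;
    }

    public class AppDbContext : DbContext
    {
        public DbSet<User> Users => Set<User>();

        protected override void OnConfiguring(DbContextOptionsBuilder options) =>
            options.UseSqlServer("<connection-string>");
    }

    [MemoryDiagnoser]
    public class BulkInsertBenchmarks
    {
        [Params(100, 1_000, 10_000, 100_000, 1_000_000)]
        public int Count;

        private List<User> _users = new();

        [GlobalSetup]
        public void Setup() =>
            _users = Enumerable.Range(1, Count)
                .Select(i => new User { Email = $"user{i}@test.com" })
                .ToList();

        [Benchmark]
        public async Task EfCoreAddRange()
        {
            await using var context = new AppDbContext();
            context.Users.AddRange(_users);
            await context.SaveChangesAsync();
        }

        [Benchmark]
        public async Task SqlBulkCopyDataTable()
        {
            // Materialize the rows into a DataTable for SqlBulkCopy.
            var table = new DataTable();
            table.Columns.Add("Email", typeof(string));
            foreach (var user in _users)
            {
                table.Rows.Add(user.Email);
            }

            await using var connection = new SqlConnection("<connection-string>");
            await connection.OpenAsync();

            using var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.Users" };
            bulk.ColumnMappings.Add("Email", "Email");
            await bulk.WriteToServerAsync(table);
        }
    }
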
  • Science & Technology

Comments • 71

  • @MilanJovanovicTech
    @MilanJovanovicTech  2 months ago +4

    Want to master Clean Architecture? Go here: bit.ly/3PupkOJ
    Want to unlock Modular Monoliths? Go here: bit.ly/3SXlzSt

    • @IgMa-tc1oj
      @IgMa-tc1oj 1 month ago

      You are using the wrong tool. Try linq2db and you will see how easy it is to do BulkCopy, Merge, Update, left/right joins, cross/outer apply, WITH hierarchies, aggregate functions, window functions, template tables, arrays, and many more SQL capabilities.

    • @ksmemon1
      @ksmemon1 1 month ago +1

      Subscribed 👍

  • @user-yj5gr6wc9e
    @user-yj5gr6wc9e 2 months ago +12

    Thank you for the video. I'd like to see a video on bulk upserts/merging data

  • @alexanderst.7993
    @alexanderst.7993 1 month ago +2

    Milan, just wanted to say: thanks to you and your C# videos, I managed to land a job. Appreciate ya, pal ;)

  • @10Totti
    @10Totti 1 month ago +1

    Another great tutorial!
    Thanks!

  • @vasiliylu8054
    @vasiliylu8054 1 month ago

    Thank you, Milan! This video deserves to be at the top.

  • @-INC0GNIT0-
    @-INC0GNIT0- 1 month ago

    Thanks for doing the research!
    Very insightful investigation.

  • @anonymoos
    @anonymoos 1 month ago +3

    I like using the BulkCopy function; it is amazingly fast for importing large datasets. One thing to note is that the column names specified for the database side are case sensitive. If there's a mismatch in case on column names, the import will fail. You can also eke out even more performance by tweaking the batch size in BulkCopy using `bulk.BatchSize = 10_000;`. Actual performance will vary based on how many columns you're inserting.
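
    A minimal, self-contained sketch of that kind of setup (the table and column names here are hypothetical, not from the video):

    using System.Data;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    public static class BulkCopyExample
    {
        // Assumes a dbo.Users(Email, CreatedOnUtc) table exists.
        public static async Task BulkInsertAsync(string connectionString, DataTable rows)
        {
            await using var connection = new SqlConnection(connectionString);
            await connection.OpenAsync();

            using var bulk = new SqlBulkCopy(connection)
            {
                DestinationTableName = "dbo.Users",
                BatchSize = 10_000 // rows sent to the server per batch
            };

            // Destination column names are matched case-sensitively, as noted above.
            bulk.ColumnMappings.Add("Email", "Email");
            bulk.ColumnMappings.Add("CreatedOnUtc", "CreatedOnUtc");

            await bulk.WriteToServerAsync(rows);
        }
    }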

    • @lolyasuo1235
      @lolyasuo1235 1 month ago

      What about primary keys? Can you import data into a table with an incremental PK, or do you have to specify the PK?

    • @MilanJovanovicTech
      @MilanJovanovicTech  1 month ago

      That's an excellent suggestion. Will try it with different batch sizes.

    • @pilotboba
      @pilotboba 1 month ago

      @@lolyasuo1235 SQL Server will generate identity values just like it does on a normal insert. There is an option, though, to "keep identity", so if you did have PK values you wanted retained, you could use that option. Of course, if those values already exist in the table, then you will get duplicate insert errors.
      I believe it will update the table's next identity value after it is done if you use that option.
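
      A short, hypothetical sketch of the "keep identity" option being described (SqlBulkCopyOptions.KeepIdentity writes the source Id values instead of letting the identity column generate new ones; the table name is a placeholder):

      using System.Data;
      using System.Threading.Tasks;
      using Microsoft.Data.SqlClient;

      public static class KeepIdentityExample
      {
          public static async Task InsertKeepingIdsAsync(string connectionString, DataTable rows)
          {
              using var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity)
              {
                  DestinationTableName = "dbo.Users"
              };

              // Source Id values are preserved rather than regenerated by the identity column.
              await bulk.WriteToServerAsync(rows);
          }
      }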

  • @FolkoFess
    @FolkoFess 1 month ago +1

    There is another way to speed up bulk inserts for some specific cases (for example, a periodic ETL process where the entire table or partition should be fully cleaned and recreated from scratch using a bulk insert). In this case the bulk insert can be slowed down by indexes, constraints, concurrent access, etc. The solution would be to bulk insert into a temporary table that does not have any indexes or constraints, using a ReadUncommitted transaction, then build all the needed indexes, then do a partition-swap operation with the main table/partition. Another advantage of this approach is that up until the last step, the data in the original table stays fully available, and the partition swap is an almost instant and atomic operation.
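
    A rough, hypothetical sketch of the finishing step of that pattern, after a staging table (dbo.Users_Staging here) has been bulk loaded without indexes. The index built on the staging table must match the target table's before the switch, and all names are placeholders:

    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    public static class StagingSwapExample
    {
        public static async Task FinalizeStagingAsync(SqlConnection connection)
        {
            const string finalize = @"
                -- Build the index(es) the target table has, so the schemas match.
                CREATE UNIQUE CLUSTERED INDEX IX_Users_Staging_Id ON dbo.Users_Staging (Id);

                BEGIN TRANSACTION;
                TRUNCATE TABLE dbo.Users;                          -- target must be empty for SWITCH
                ALTER TABLE dbo.Users_Staging SWITCH TO dbo.Users; -- near-instant metadata operation
                COMMIT;";

            await using var command = new SqlCommand(finalize, connection);
            await command.ExecuteNonQueryAsync();
        }
    }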

  • @pilotboba
    @pilotboba 1 month ago +1

    It also looks like there is a library called Dapper Plus that has a bulk insert feature as well. It's also a commercial, paid library.

  • @giammin
    @giammin 1 month ago

    Really interesting, thanks!
    I think you can pass the array directly, without converting to anonymous objects in Dapper. Anyway, it will not change the benchmark results much.
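
    For reference, a hypothetical sketch of passing the collection straight to Dapper (Dapper still executes the INSERT once per element; it just binds @Email from each item's property). The row type, table, and connection string are placeholders:

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Dapper;
    using Microsoft.Data.SqlClient;

    public sealed record UserRow(string Email); // placeholder row type

    public static class DapperDirectExample
    {
        public static async Task InsertAsync(string connectionString, IEnumerable<UserRow> users)
        {
            await using var connection = new SqlConnection(connectionString);

            const string sql = "INSERT INTO dbo.Users (Email) VALUES (@Email);";
            await connection.ExecuteAsync(sql, users); // no anonymous-object projection needed
        }
    }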

  • @harshakumar6890
    @harshakumar6890 1 month ago

    Is it possible to test Dapper executing a stored procedure that accepts a UDT table as a parameter?
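
    It should be; a hypothetical sketch of calling a stored procedure with a table-valued parameter through Dapper (the procedure dbo.usp_InsertUsers and the type dbo.UserTableType are placeholders, not from the video):

    using System.Collections.Generic;
    using System.Data;
    using System.Threading.Tasks;
    using Dapper;
    using Microsoft.Data.SqlClient;

    public static class DapperTvpExample
    {
        public static async Task InsertViaTvpAsync(string connectionString, IEnumerable<string> emails)
        {
            var table = new DataTable();
            table.Columns.Add("Email", typeof(string));
            foreach (var email in emails)
            {
                table.Rows.Add(email);
            }

            await using var connection = new SqlConnection(connectionString);

            await connection.ExecuteAsync(
                "dbo.usp_InsertUsers",
                new { Users = table.AsTableValuedParameter("dbo.UserTableType") },
                commandType: CommandType.StoredProcedure);
        }
    }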

  • @pilotboba
    @pilotboba 1 month ago +1

    A few things.
    You never adjusted the batch size for EF Core. It is possible to speed up inserts by increasing the batch size; I think by default it is 100.
    Also, bulk copy has a way to set the batch size. By default I believe it is set to 0, which means all rows, but it's recommended to use it.
    Bulk copy by default does a non-transacted insert, so if there is an issue there is no way to roll it back. There is an option to have it use a transaction, but I assume that will slow it down a bit.
    I'm curious whether the speeds would be closer if you matched the bulk copy and EF Core batch size settings and enabled internal transactions in bulk copy.
    I'm not sure, but did your code create the collection each time? Perhaps to remove that overhead you could create the user collection in the constructor?
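
    For reference, hypothetical sketches of the two settings mentioned above (the values, table name, and connection string are assumptions, not anything measured in the video):

    using Microsoft.Data.SqlClient;
    using Microsoft.EntityFrameworkCore;

    // EF Core: raise the maximum number of statements batched per round trip.
    var options = new DbContextOptionsBuilder()
        .UseSqlServer("<connection-string>", sqlOptions => sqlOptions.MaxBatchSize(1_000))
        .Options;

    // SqlBulkCopy: explicit batch size plus an internal transaction per batch.
    using var bulk = new SqlBulkCopy("<connection-string>", SqlBulkCopyOptions.UseInternalTransaction)
    {
        DestinationTableName = "dbo.Users",
        BatchSize = 10_000
    };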

    • @MilanJovanovicTech
      @MilanJovanovicTech  1 month ago

      That is great constructive criticism. I think I'll do a Part 2 of this video in a few weeks, with these remarks + some others I got. I wanted to include data creation for some reason, but I can also do a benchmark without it.

  • @rsrodas
    @rsrodas 1 month ago

    Another alternative, if you already have files ready to import, is to use the OPENROWSET command inside SQL Server:
    INSERT INTO Table (
        Col1, Col2, ...
    )
    SELECT
        Col1, Col2, ...
    FROM OPENROWSET(
        BULK 'c:\myfile.txt', FORMATFILE = 'c:\format.xml'
    ) AS rows;
    In the XML format file, you define how the file you want to import is formatted (fixed width, comma-separated, etc.).

    • @MilanJovanovicTech
      @MilanJovanovicTech  1 month ago

      Does it have to be an XML file? Can it work with CSV? What about JSON?

    • @rsrodas
      @rsrodas 1 month ago

      @@MilanJovanovicTech Another option for the format file is to use a non-XML one...

    •  1 month ago

      Can you do this in an Azure SQL database?

  • @islandparadise
    @islandparadise 1 month ago

    Love this. One quick question: for the EF Core approaches, would the performance be consistent on Postgres as well as SQL Server?

    • @MilanJovanovicTech
      @MilanJovanovicTech  1 month ago +1

      Hmm, I'm pretty sure the performance wouldn't change dramatically. However, I didn't test that.

    • @islandparadise
      @islandparadise 1 month ago

      @@MilanJovanovicTech got it. Thanks mate you're a champ!

  • @musaalp4677
    @musaalp4677 1 month ago

    Have you tried OPENJSON or other JSON structures with a raw SQL query?

  • @xtazyxxx3487
    @xtazyxxx3487 1 month ago

    Can you try concatenating the insert queries, then running it as a raw SQL query with EF, and see the results?

  • @antonmartyniuk
    @antonmartyniuk 1 month ago +1

    Surprisingly, Dapper doesn't perform well. Still, I would like to see the results when using Dapper with a SQL bulk insert command.
    I personally have used the EFCore.Extensions library, which is a paid one, to do bulk inserts. My company bought a license for this library, and it saved many development days for things such as bulk merge and bulk synchronize operations.
    It would be interesting to compare its performance to the SqlBulkCopy class.

    • @MilanJovanovicTech
      @MilanJovanovicTech  1 month ago

      It's not surprising if you understand how that specific SQL statement works with Dapper.

  • @dy0mber847
    @dy0mber847 1 month ago

    Will the results be different when using Postgres? 🤔

  • @user-yj5gr6wc9e
    @user-yj5gr6wc9e 2 months ago

    😊

  • @way_no6810
    @way_no6810 1 month ago

    Can you test Dapper Plus?

  • @belediye_baskani
    @belediye_baskani 1 month ago

    What do you think about bulk updates? Can you run a benchmark for us?

  • @lolyasuo1235
    @lolyasuo1235 1 month ago +2

    How can Dapper be 5 times slower than EF Core AddAll at 1M records? This doesn't make sense at all.

    • @MilanJovanovicTech
      @MilanJovanovicTech  1 month ago

      Because Dapper has to unwrap the (collection) loop and run the SQL commands one by one. :)
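
      Conceptually, passing a collection to Dapper's Execute behaves like the following sketch: one command execution, and therefore one round trip, per element (table and parameter names are hypothetical):

      using System.Collections.Generic;
      using Dapper;
      using Microsoft.Data.SqlClient;

      public static class DapperUnrollSketch
      {
          public static void Insert(SqlConnection connection, IEnumerable<string> emails)
          {
              const string sql = "INSERT INTO dbo.Users (Email) VALUES (@Email);";

              foreach (var email in emails)
              {
                  connection.Execute(sql, new { Email = email }); // executed once per row
              }
          }
      }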

  • @MatthewCrawford
    @MatthewCrawford 1 month ago

    My repository creates a DataTable, inserts 25K records from a passed collection, then sends that DataTable to a sproc.
    I use this same pattern for all select, delete, and upsert operations.
    Dynamic SQL is slower than sprocs.

  • @Ivang017
    @Ivang017 1 month ago

    Hey Milan, any discount incoming for The Ultimate Modular Monolith Blueprint course? I bought your Clean Architecture course and I loved it. Just wondering if there is a sale or discount coming soon for the Modular Monolith course. Thanks

  • @EzequielRegaldo
    @EzequielRegaldo 1 month ago

    So DataTable is EF Core without a paid lib?

  • @ExtremeTeddy
    @ExtremeTeddy 1 month ago

    All the methods shown are slow compared to "load data from file". Whenever possible, use LOAD DATA FROM FILE for large data imports; it will load gigabytes of data within seconds. One of the best approaches in my experience is to create a temporary table for the data source, run the load-data-from-file command, and then perform the inserts into the entity tables on the database server.
    The only issue/drawback can be the network connection when loading large datasets.

    • @MilanJovanovicTech
      @MilanJovanovicTech  1 month ago

      What if we don't have a file? Would it be worthwhile storing the file locally before calling that command?

    • @ExtremeTeddy
      @ExtremeTeddy 1 month ago

      @@MilanJovanovicTech Care to elaborate on that? Large data imports without a file or source material don't make any sense to me. LOAD DATA FROM FILE requires a file.
      When a database is the source, I recommend using raw SQL rather than writing application logic.