I had VDEV Layouts all WRONG! ...and you probably do too!

  • Published June 4, 2024
  • Look, it's hard to know which VDEV layout is best for #TrueNAS CORE and SCALE. Rich has built numerous different disk layouts and never really saw much of a performance difference between them, so he's endeavoring to find out why. In this video, we show you the results of our testing on which data VDEV layout is best and whether read/write caches (L2ARC & SLOG) actually make a difference. Spoiler alert: we had to go get answers from the experts! Thank you again @TrueNAS, @iXsystemsInc, and Chris Peredun for helping us answer the tough questions and setting us straight!
    Head over to the TrueNAS website for more details on the Mini-R here: www.truenas.com/truenas-mini/
    *GET SOCIAL AND MORE WITH US HERE!*
    Get help with your Homelab, ask questions, and chat with us!
    🎮 / discord
    Subscribe and follow us on all the socials, would ya?
    📸 / 2guystek
    💻 / 2guystek
    Find all things 2GT at our website!
    🌍 2guystek.tv/
    More of a podcast kinda person? Check out our Podcast here:
    🎙️ www.buzzsprout.com/1852562
    Support us through the YouTube Membership program! Becoming a member gets you priority comments, special emojis, and helps us make videos!
    😁 www.youtube.com/@2GuysTek/mem...
    *TIMESTAMPS!*
    0:00 Introduction
    0:40 OpenZFS and VDEV types (Data VDEVs, L2ARC VDEVs, and Log VDEVs, OH MY!)
    2:21 The hardware we used to test
    3:09 The VDEV layout combinations we tested
    3:30 A word about the testing results
    3:53 The results of our Data VDEV tests
    5:22 Something's not right here. Time to get some help
    5:42 Interview with Chris Peredun at iXsystems
    6:00 Why are my performance results so similar regardless of VDEV layout?
    7:30 Where do caching VDEVs make sense to deploy in OpenZFS?
    10:04 How much RAM should you put into your TrueNAS server?
    11:37 What are the best VDEV layouts for simple home file sharing? (SMB, NFS, and mixed SMB/NFS)
    13:27 What's the best VDEV layout for high random read/writes?
    14:09 What's the best VDEV layout for iSCSI and Virtualization?
    16:41 Conclusions, final thoughts, and what you should do moving forward!
    17:10 Closing! Thanks for watching!
  • Science & Technology

Comments • 160

  • @TrueNAS
    @TrueNAS 1 year ago +116

    Chris's expertise is astounding, we're glad to have him in the iX Family!
    This video is a must watch for anyone looking to expand their TrueNAS Knowledge. As always, fantastic job on the video 2GT Team!

    • @jonathan.sullivan
      @jonathan.sullivan 1 year ago +1

      Send me a mini please

    • @RobSnyder
      @RobSnyder 1 year ago +2

      Shoutout to my presales engineer Tyrel, can't wait for my M60 to show up in sunny FL.

    • @TrueNAS
      @TrueNAS 1 year ago

      @@RobSnyder Tyrel is the best! We cannot wait for it to land in your hands!

    • @MattiaMigliorati
      @MattiaMigliorati 11 months ago

      May I ask about a detail?
      Why, with 12 drives in RAIDZ2, are there 2 R/W queues? (Minute 14:07)

    • @chrisparkin4989
      @chrisparkin4989 11 months ago +1

      @@MattiaMigliorati Chris says 2 x 6-disk Z2, so 2 vdevs = 2 concurrent read/write requests. The more vdevs, the faster you go.

  • @ScottPlude
    @ScottPlude 1 year ago +26

    This is fantastic stuff!
    I have noticed the performance improvements by throwing more RAM in a box. My friends complain that ZFS is a RAM hog. I just tell them that if you spent money on memory, wouldn't you want the server to use it all? By watching ZFS use up all the memory, I know that it is respecting my hard-earned dollars and putting that memory to work. I never want my resources sitting idle while performance suffers.
    Thanks for this video!

    • @charliebrown1947
      @charliebrown1947 10 months ago +1

      ZFS will only use the amount of memory you specify; by default, this is 50% of RAM.

    • @chrisparkin4989
      @chrisparkin4989 7 months ago

      @@charliebrown1947 That only applies to ZFS on Linux.
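      For reference, a minimal sketch of how that cap is inspected and adjusted on a plain ZFS-on-Linux box (TrueNAS manages its own tuning, so treat the paths and the 32 GiB figure as illustrative assumptions):

      # Show the current ARC ceiling in bytes; 0 means "use the default" (roughly 50% of RAM on Linux)
      cat /sys/module/zfs/parameters/zfs_arc_max
      # Cap the ARC at 32 GiB until the next reboot
      echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max
      # Make the cap persistent via a module option
      echo "options zfs zfs_arc_max=34359738368" >> /etc/modprobe.d/zfs.conf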

    • @inderveerjohal7218
      @inderveerjohal7218 6 months ago

      So I’m currently running a 13500 with 64GB of DDR5… I have a 100TB ZFS pool for my Plex media and my video and photo editing footage, which I want to edit directly off the NAS… would I benefit from doubling up to 128GB?

    • @chrisparkin4989
      @chrisparkin4989 6 months ago

      @@inderveerjohal7218 Run arc_summary | more from the CLI, look for 'Cache hit ratio', and check your percentage. If it's approximately 95% or higher, the extra RAM is probably not worth it; much lower, then yeah. The hit ratio shows you, as a percentage, how many of your reads are coming from RAM. This counter resets at reboot, so it works best on a system that has been running a typical workload for a while.
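      As a quick illustration of that check (a sketch; the exact label wording varies a little between OpenZFS releases):

      # Print just the hit-ratio lines instead of paging through the whole report
      arc_summary | grep -i "hit ratio"
      # Or watch ARC hits and misses live, sampled every 5 seconds
      arcstat 5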

    • @ScottPlude
      @ScottPlude 6 months ago

      without a doubt. @@inderveerjohal7218

  • @lumpiataoge9536
    @lumpiataoge9536 7 months ago +8

    This is probably one of the best videos on TrueNAS on YT. I've learned a lot, thanks!

  • @artlessknave
    @artlessknave 1 year ago +14

    Note that, like most of these videos, the ZIL/LOG is being confused with a write cache.
    It's frustrating to see iX reps reinforce the misunderstanding.
    It is not a write cache, it's a write backup. As long as the write completes normally, the ZIL will never be read at all. ZFS's write cache is in RAM (transaction groups); the ZIL, or SLOG, backs that up for sync writes in case the RAM copy is lost.
    A poorly designed SLOG will slow your pool down: since it's NOT a write cache, you are adding another operation.
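    To make the distinction concrete, here is a minimal sketch (the pool/dataset name "tank/vms" is hypothetical) of the property that decides whether the ZIL/SLOG is even in the write path:

    # Show whether a dataset honours sync requests (standard), forces them (always), or skips them (disabled)
    zfs get sync tank/vms
    # Only sync writes touch the ZIL/SLOG; forcing everything through it makes the effect visible
    zfs set sync=always tank/vms
    # Revert to the default behaviour
    zfs set sync=standard tank/vms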

    • @Burnman83
      @Burnman83 11 months ago

      Exactly. The benefit of the SLOG is that the system can reply "sync write done" as soon as this backup is written, whereas it will not send that reply as long as the data is only in RAM.
      So the physical write process is not accelerated, but the overhead imposed by sync writes and the required replies is mitigated by a very fast SLOG device that allows much quicker replies to be sent.

    • @charliebrown1947
      @charliebrown1947 10 months ago +1

      @@Burnman83 Except it is accelerated, because it's been written to your fast SLOG device, which is just as good as writing it to the spinners. It IS effectively working as a cache also.

    • @Burnman83
      @Burnman83 10 months ago

      @@charliebrown1947 Yes, it accelerates the ability to reply positively to a sync write, because writing to the SLOG already triggers the "ack", but it is NOT acting as a cache, because in that case it would delete the written data from RAM and then transfer it from SLOG to disk. Afaik this is not what is happening. Data is kept in RAM and written from RAM to SLOG and disk in parallel. The only advantage is the delta between the SLOG SSDs acknowledging the write vs the disks, which means the smaller the files the more acceleration and vice versa, but no, it is still not a cache ;)

    • @charliebrown1947
      @charliebrown1947 10 months ago

      @Burnman83 Once the data is written to the SLOG it is as good as written to the pool. It can be purged from ARC and it is literally behaving as a cache. I don't know why you're trying to say it isn't. The data is cached in the SLOG and written to the pool later.

    • @Burnman83
      @Burnman83 10 months ago

      @@charliebrown1947 ARC is the RAM-based read cache, but I know what you wanted to say.
      From all I know, this is not the case: the SLOG is never read, only written, and only in case of a disaster, power outage or the like is it read later during recovery. Otherwise the data remains in RAM, effectively blocking it and slowing down the transfer when you run out. That means a big SLOG is not worth it, as it will only help for bursts anyway.
      If you have a piece of vendor documentation telling otherwise, please go ahead and prove me wrong, but until then I'll stick with what I learned, and not with what I think would make sense.
      This is by no means meant as an offense. I'd love it if you were right; it would give me all the tools I need to build a lightning-fast SLOG, force sync on all writes, and have permanently insane write performance, but I'm afraid this is indeed not how the SLOG works.

  • @SteveHartmanVideos
    @SteveHartmanVideos 3 months ago +1

    Bro.... this cleared up a TON of the same questions I also had.... one big thing that I learned was that the write cache (SLOG) is not used for SMB shares, which is how I use my NAS at home. But if my VMs were writing to the same pool, then maybe it would benefit from that. The other thing I learned was that the "write every 5 seconds" is really "up to 5 seconds unless something tells it to write". I found this super helpful. I'm glad you asked all the same questions.

  • @foxale08
    @foxale08 1 year ago +12

    FYI: Optane NVMe (U.2) is basically ideal for SLOG. I use two of the cheap 16GB NVMe versions in a stripe to decent effect (boosted ~30MB/s sync writes to ~250MB/s, until they fill). I am more concerned about unexpected power loss than about those drives failing. Obviously a mirror would be wiser.

    • @2GuysTek
      @2GuysTek  1 year ago +6

      The biggest risk you have is power loss and the uncommitted SLOG data. Get yourself a UPS just to give yourself some breathing room!

    • @gwojcieszczuk
      @gwojcieszczuk 5 months ago

      @@2GuysTek One should get SSDs (like NVMe) with PLP (Power Loss Protection).

  • @terrylyn
    @terrylyn 3 months ago

    Huge thanks to you and Chris for explaining all this for me, exactly what I needed as I am setting up my first TrueNAS.

  • @temp50
    @temp50 7 months ago

    Thank you! The interview was gold!

  • @Traumatree
    @Traumatree 11 months ago +4

    Did those tests a few years ago with NVMe and got to the same conclusion and explanation. For enterprises with big data access you will need those caches, but for labs or small enterprises, the best optimization to get is ADD MORE RAM. That's it! Once you understand this, everything is quite simple.

  • @wagnonforcolorado
    @wagnonforcolorado 1 year ago +4

    Concise and straightforward explanations. My TrueNAS server has 24GB of RAM, and I am using it primarily for backups of my home lab VMs and a network share for the occasions I decide to stash data away. Now I know that I could probably pull that SSD and use it for something else. Unfortunately, you have now planted the seed that I might want to reconfigure my VDEVs into 2 mirrors, versus a Z1. May have to dedicate an afternoon to a new project!

    • @2GuysTek
      @2GuysTek  1 year ago +1

      I know I made changes to my pools after going through this! Thanks for the comment!

  • @murphy1138
    @murphy1138 9 months ago +2

    Great video, and so thankful for the interview with Chris. Amazing.

    • @2GuysTek
      @2GuysTek  9 months ago

      Chris is awesome! All of the people at iX are fantastic! We really appreciated their insight on this video!

  • @adam872
    @adam872 1 month ago

    Back in my Oracle DBA + Unix sysadmin days, we had an acronym: SAME (stripe and mirror everything). It kinda still applies, especially with spinning disks. Disk is cheap; mirror everything if you're concerned about write performance and fault tolerance. Of course, if your data sets can fit on SSD or NVMe then do that and get on with your life (I would still mirror it, though).

  • @LackofFaithify
    @LackofFaithify 9 months ago +14

    TLDR: the average user doesn't go hard enough to notice the fact that, yes, there is a difference between reads depending on your vdev layouts.

  • @misku_
    @misku_ 18 days ago

    Great insight, thanks Chris!

  • @mariozielu
    @mariozielu 8 months ago

    Wow! Such an informative video! Thank you so much!

  • @jeffhex
    @jeffhex 1 year ago +1

    Wow! That was EXCELLENT!

  • @fredericv974
    @fredericv974 7 months ago +1

    This was extremely helpful.
    I'm in the process of doing my homework to migrate my home server to TrueNAS and was planning on spending on SSDs for caching. Now I know to get more memory instead.

    • @2GuysTek
      @2GuysTek  7 months ago

      So happy it helped! Thanks for watching!

  • @blademan7671
    @blademan7671 11 months ago +1

    Did you retest with data sizes that exceeded the memory cache?

  • @bobjb423
    @bobjb423 6 months ago

    Dang! Thanks Rich. I definitely got so many of my burning questions answered.

  • @blackryan5291
    @blackryan5291 4 days ago

    This was a very informative video. Kudos for sharing

  • @CalvinHenderson
    @CalvinHenderson 10 months ago +2

    This was an interesting discussion.
    I am interested in the 20-100 TB realm of storage and RAM.
    Also, this was focused more on the scale side of the discussion and not so much on the NAS side.
    Helpful as a starting point for discussion, but there are more questions to discuss.
    Also, I would like to see TB transfers rather than MB and GB.
    Former DAS user wandering in the darkness of NAS.

  • @zoranspirkoski1342
    @zoranspirkoski1342 11 months ago

    Excellent video, thank you!

  • @kungfujesus06
    @kungfujesus06 7 months ago +2

    5:03: That's because a SLOG is not exactly a write cache. It will only speed up synchronous workloads, and will mostly benefit random writes, not sequential.

  • @user-bt2om7nf1x
    @user-bt2om7nf1x 2 months ago

    Thanks both for explaining ZFS, really a great video.

  • @TechnoTim
    @TechnoTim 9 months ago +2

    Great discussion!

  • @manicmarauder
    @manicmarauder 1 year ago

    This is some great info. Awesome vid.

  • @ciaduck
    @ciaduck 16 days ago

    ARC, Sync TGXs, SLOG, and L2; the rabbit hole is very deep. I've spent a lot of time understanding how all this works. I wonder how many hours I've sat watching `zpool iostat -qly 10` trying to actually understand my workload.

  • @VenoushCZ
    @VenoushCZ 1 year ago +1

    Is there a size limit by design for the SLOG cache? I read somewhere that storage bigger than 8GB is not used in the process.

    • @TheRealJamesWu
      @TheRealJamesWu 1 year ago +2

      That's not a size limit by design, but a practical limitation of real-world use. The amazing Lawrence Systems goes into the mathematical details of how those numbers came about in this video: czcams.com/video/M4DLChRXJog/video.html
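      The usual back-of-the-envelope sizing (assuming a 10GbE link and the default transaction-group flush of roughly every 5 seconds mentioned in the video): 10 Gbit/s is about 1.25 GB/s of incoming writes, and 1.25 GB/s x 5 s is about 6.25 GB of dirty data per transaction group. Keeping two or three groups' worth in reserve lands you in the 8-20 GB range, which is why anything much bigger than that can never fill up before the data is flushed to the pool.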

  • @maciofacio364
    @maciofacio364 9 months ago

    fantastic!!! thank you 😙

  • @SyberPrepper
    @SyberPrepper 1 year ago

    Great info. Thanks.

  • @sevilnatas
    @sevilnatas 9 months ago

    Question: I have 2 x 2GB NVMe sticks that I am attaching via carrier boards to SlimSAS ports (the motherboard only has 1 M.2 slot, used for the OS drive, and I don't have any PCIe slots left). Can I mirror them and then partition them, so I can use a small part of them for discrete caches (read & write) and the rest for SLOG? The primary use of this NAS is going to be VM hosting with a smidge of file sharing. Also, what is the suggested block size for a VM hosting scenario vs a fileshare scenario?

    • @2GuysTek
      @2GuysTek  9 months ago

      You cannot carve up parts of disks for caches in ZFS unfortunately, only whole disks. If you're going to be running VMs, get as much RAM in your host as you can, and build out a SLOG for sync writes out of the two NVMe SSDs you've added. I think a read cache is less valuable in that scenario.

    • @sevilnatas
      @sevilnatas 9 months ago

      @@2GuysTek Will do, thanks! Was hoping to be able to put small-file reads on NVMe for a specific reason, but oh well. The storage disks are all SSDs anyway. I can put in 64GB of RAM right now. Will see if I can boost that later. Thanks again...

    • @mdd1963
      @mdd1963 8 months ago

      A pair of SATA SSDs is more than enough for the OS; no need to waste NVMe on it.

  • @davidtoddhoward
    @davidtoddhoward 11 months ago

    So helpful.. thanks 🙏

  • @DangoNetwork
    @DangoNetwork 1 year ago

    There is a very old but still relevant blog post testing up to 24 drives in different configurations for HDD and SSD performance.

  • @Fishfinch
    @Fishfinch 6 months ago

    I've got 8x 4TB drives in RAIDZ2 and 2x 500GB SSD. I need NAS mostly for Plex (movies and music) and Nextcloud/Piwigo (photo sync with iPhone). Should I use 2xSSD as ZFS L2ARC read-cache or maybe make a mirror with these SSDs and use as a pool for plugins?

    • @florentandelenarobineau4413
      @florentandelenarobineau4413 4 months ago

      I had the exact same question as you. My conclusion is that I'm almost certain you don't want to use it as L2ARC.
      My reasoning:
      1) Are you sure you would benefit from an L2 read cache anyway? Is your typical working dataset that is accessed repeatedly larger than what can fit in memory? If so, can you increase the RAM? It will be much faster, and the cache algorithm for data in RAM is more sophisticated than data in L2ARC.
      2) you will probably kill your SSDs quickly, unless they are high endurance (enterprise grade SSD or Optane)

    • @BoraHorzaGobuchul
      @BoraHorzaGobuchul 3 months ago

      You don't need to mirror the L2ARC. Its data is already present in the pool, so if it fails you don't face any data-loss risk.
      I personally don't think adding L2ARC will improve your performance enough to notice, but I'm not a ZFS guru.
      It's generally useful when you're caching HDDs with fast NVMe SSDs, for heavy-IOPS stuff like video editing.

  • @beardedgaming3741
    @beardedgaming3741 1 year ago

    So what is Z1 best for?

  • @leozendo3500
    @leozendo3500 6 months ago

    Is it possible to do one on encryption? VDEV-level encryption vs dataset encryption vs two layers of encryption. I feel like it would be relevant to many people.

  • @NavySeal2k
    @NavySeal2k 1 year ago +5

    Sadly they didn’t talk about special vdev…

  • @whitedragon153
    @whitedragon153 6 months ago +1

    Great video!

  • @sanitoeter4743
    @sanitoeter4743 5 months ago

    thanks for the video!

  • @rolling_marbles
    @rolling_marbles 4 months ago

    Nice video, and hearing from iXsystems directly helps reinforce things for me.
    Mirrored VDEVs for storage using high-capacity HDDs are fine as long as you have one or more high-end, PLP-enabled SSDs in front of them as Log VDEVs. I don’t have an L2ARC because I have 128GB RAM and ARC is only consuming 62GB.
    I run iSCSI for vSphere on 10Gb networking and have never had a problem with disk performance.

    • @florentandelenarobineau4413
      @florentandelenarobineau4413 4 months ago

      If I'm not mistaken, your ARC is 62 GB more or less by definition - by default, the ARC will occupy (up to) half your memory (this is something that can be tuned).

  • @Jessehermansonphotography
    @Jessehermansonphotography 1 month ago

    So, for one photographer working on small projects and regular file transfers, it really doesn’t matter which RAID layout I choose, as long as I have as much RAM as I can fit in there to speed up the transfers? Makes sense if that is accurate.

  • @geesharp6637
    @geesharp6637 1 year ago

    Wow, great information.

  • @MikelManitius
    @MikelManitius 2 months ago

    Great video, thanks. But one correction. You keep referring to a “write cache” when talking about the SLOG. But it’s really a ZIL (ZFS Intent Log) and doesn’t work the way you might think that a “write cache” might. Chris glossed over it. Going into that in more detail would be useful because this makes a big difference when sizing for the ZIL, which is mostly based on your network throughput.

  • @neccron9956
    @neccron9956 6 months ago

    What about Special device/cache attached to a Pool?

  • @MarkHastings-mu8dm
    @MarkHastings-mu8dm 1 year ago

    The other factor I’d like to hear more about with vdev design decisions is future expansion potential. Home users may start small (2-4 disks) then add more over time. Are mirrors really the only way to go to expand an existing pool? Aside from potentially creating an entirely new pool, which has other trade offs.

    • @2GuysTek
      @2GuysTek  1 year ago +2

      In 2021 there was talk about OpenZFS adding single-disk expansion to existing RAIDZ VDEVs; however, from everything I've read it sounded kinda kludgy and not very easy. I'm not sure where they're at with it today, I'll dig into that. That being said, you can expand a pool with RAIDZ(1,2,3) VDEVs by adding another VDEV to the pool. This will expand your pool size and give you better performance as well. It's still not an individual disk add, you'd need at least 3 disks (or more depending on RAIDZ type) to build another RAIDZ VDEV. So at least you're not locked into mirror-VDEV-only expansion, which is better on parity cost.
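      For anyone curious what "adding another VDEV" looks like in practice, a minimal sketch (pool and disk names are hypothetical; the -n dry run shows the resulting layout without changing anything):

      # Preview the change first
      zpool add -n tank raidz2 sde sdf sdg sdh sdi sdj
      # Grow the pool by a second 6-disk RAIDZ2 vdev; new writes are striped across both vdevs
      zpool add tank raidz2 sde sdf sdg sdh sdi sdj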

    • @2GuysTek
      @2GuysTek  1 year ago +1

      To add more to this. I just heard back that the single-disk add feature is still not available, so your only way to expand an existing pool is to add additional VDEVs to it.

    • @JohnSmith-iu8cj
      @JohnSmith-iu8cj 9 months ago

      @@2GuysTek or change every drive to a bigger one

  • @ZoeyR86
    @ZoeyR86 14 days ago +1

    I built out my TrueNAS system with my old PC lol
    I have an AMD 5950X, 128GB of ECC DDR4, and a pair of LSI 9305-16i cards hosting 28x H550 20TB drives.
    I have a 990 Pro 1TB for boot.
    And a pair of 118GB Optane in a mirror for SLOG.
    It's tied to the network with dual 10Gbps links. I use it for VMs, but mostly for Plex.

  • @zyghom
    @zyghom 4 months ago

    I just built my TrueNAS with 4x HDD and 2x SSD, and I went with a pool of 2 mirrors (HDD) and a plain mirror (SSD), as I read somewhere that mirrors are the best. So to speed up day-to-day tasks I use the fast SSD pool, and for backup and media consumption the slow HDD pool. 64GB of RAM seems to be more than OK to support 28TB of my total storage.

    • @BoraHorzaGobuchul
      @BoraHorzaGobuchul 3 months ago

      The only thing about mirrors is that if they are 2-disk mirrors, 2 disks failing in one mirror vdev will kill the whole pool.

  • @Burnman83
    @Burnman83 11 months ago +4

    Hm, watching this video there are actually more questions afterwards than before =)
    1. When your initial tests already almost capped out the full 10G line speed, how did the host expect the speeds to improve through caching? =)
    2. Why does the iXsystems sales engineer explain the SLOG wrong and reinforce the common misunderstanding that the SLOG acts as a write cache for sync writes?!
    3. The system I am running SCALE on is an old Dell R730 with 512GB of RAM and lots of different variations of pools of HDDs and SSDs/NVMe. The network speed is 25G. How come that over Samba I barely ever get close to the numbers 2GuysTek get in their tests here over SMB, even if I test against a pool of very potent NVMe disks only and, as said, 8 times as much RAM (let alone much more CPU power)? Have these tests been conducted with actual real-life testing, or just with some synthetic test tools that don't tell you anything about real-world performance anyway?
    It'd be great to have another video where the impact of a metadata pool is tested, and also some SMB tuning that enables you to actually reach these kinds of speeds in any real-world setup, rather than capping out at around 500-700Mb/s all the time due to the flaws of SMB.
    All the best! =)

    • @charliebrown1947
      @charliebrown1947 10 months ago +1

      You'll understand if you spend the time to research and think logically instead of acting like you know everything.

    • @Burnman83
      @Burnman83 10 months ago +4

      @@charliebrown1947 That is funny to read, considering there are literally flow diagrams in the official documentation proving me right and, as said, I tested it in the lab and, surprise, the official documentation is correct.
      Explain one thing to me: which of us is acting like he knows everything? The guy who actually tested all this after reading the official documentation, explained the test series that was run, and proved his point in a way you can easily replicate, or the guy who managed to write 6 comments or so without any info beyond "you better educate yourself"?

    • @charliebrown1947
      @charliebrown1947 10 months ago +1

      @Burnman83 okay boss! You're so smart!

    • @Burnman83
      @Burnman83 10 months ago +4

      @@charliebrown1947 Thanks, champ. Appreciate it.

    • @camaycama7479
      @camaycama7479 1 month ago

      @@charliebrown1947 Can you stop trolling this guy? I'm interested in what he's saying, but you keep cutting him off with irrelevant bashing.

  • @FinlayDaG33k
    @FinlayDaG33k 1 month ago

    I threw 2 old NVMe SSDs into my server as a read cache but performance didn't really go up in a noticeable way.
    Now I know why: wrong workload (lots of small reads, but very few bigger than my RAM).

  • @wizpig64
    @wizpig64 27 days ago

    Would have been nice to put a single mirrored pair up against the 4x pairs to see how well they scale.

  • @andrewr7820
    @andrewr7820 6 months ago

    It seems to me that if you're trying to compare purely the relative raw performance of the different layouts, you would want to run the benchmark program LOCALLY on the NAS. Something like 'iozone' would be one such choice. Measuring the over-the-network performance would be a separate analysis.
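    As an example of that kind of local test, a sketch using fio (iozone works too; the mount path and job parameters here are just illustrative assumptions):

    # 60-second random read/write run against a file on the pool, bypassing the network entirely
    fio --name=vdev-test --filename=/mnt/tank/fio-testfile --size=8G \
        --ioengine=libaio --rw=randrw --bs=4k --iodepth=32 --numjobs=4 \
        --time_based --runtime=60 --group_reporting
    # Note: reads can still be flattered by the ARC unless the file size is much larger than RAM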

  • @David_Quinn_Photography
    @David_Quinn_Photography 10 months ago

    I guess caring about performance is important for workflow. As a home user I have been running a 2-HDD mirror and no cache for a few years now and I don't find it to be a big deal, but then again my largest file is 15MB.

    • @2GuysTek
      @2GuysTek  10 months ago

      That's fair - It's all about use case and what you're looking to get out of your gear.

  • @biohazrd
    @biohazrd 1 year ago +9

    The Log VDEV doesn't exactly function as you describe here. Chris touches on this a bit, but there's an extra little caveat to how the Log VDEV works:
    Async writes are cached in RAM and not written directly to disk by default, but sync writes must be committed to disk before being considered complete. A power failure or crash will result in the RAM write cache being lost. That's generally fine for async writes, but for sync writes (like Chris says, something like a database or virtual machine, etc.), lost writes could really screw things up and corrupt your data. The Log VDEV fills this gap by providing non-volatile storage for the write intent log. Without it, sync writes have to wait on the spinning disks.

    • @2GuysTek
      @2GuysTek  1 year ago +2

      Fair point. It would also be fair to say that if you're using ZFS, it would be smart to have a battery backup to protect against that situation regardless though. It's my understanding that even with a Log device, ZFS is going to use the RAM _first_ which exposes you to the same issue. Would you agree?

    • @biohazrd
      @biohazrd 1 year ago +8

      @@2GuysTek It helps to understand the exact steps the OS takes for sync writes:
      1) An application of some kind requests to write data synchronously to ZFS
      2) The data sent by the application is stored in RAM to be written to disk in a transaction
      3) The data sent by the application is written to the *intent log* which exists on non-volatile storage (regular disks, not RAM)
      4) The application is informed that the data it has requested to be written has been successfully saved
      5) The transaction of writes in RAM is successfully written to the storage pool (this is the thing that "happens every 5 seconds" but not really that Chris talks about)
      6) Now that the data is safely in the pool, the copy of the data we made in step 3 is deleted from the intent log
      So yes, RAM first. The only time the intent log is actually read from is if a power loss or crash happens sometime after step 4 and before step 5 is finished.
      It's important to remember, the intent log ALWAYS exists. When you set up a Log VDEV you're just telling ZFS specifically where to put that data. Without a Log VDEV, it just lives on your regular storage VDEVs.
      A few things to keep in mind when choosing disks for your Log VDEV:
      1) You don't need a ton of storage. You're committing your writes to your disk every few seconds. So your log really only needs enough storage to hold a few seconds worth of data.
      2) You really don't need a ton of storage. Even ignoring #1, remember we cache in RAM first. You can't cache more than that. A Log VDEV that is larger than your RAM is wasted space.
      3) Most SSDs cache their writes in their own RAM inside the SSD itself. It is possible for power to fail at the exact moment when ZFS thinks the data is safe but the data is only in the SSD's RAM cache. Always use enterprise SSDs that have *power loss protection* for your Log VDEV.
      It's too bad Intel killed off Optane, because it is the ideal log drive. They're incredibly low latency and a lot of them write directly to the flash cells (no SSD RAM). In fact, most of it is super marked down right now if you want to pick some up for later use.
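      A minimal sketch of what dedicating that intent log to its own device looks like (pool and device names are hypothetical; the mirror guards against losing in-flight sync writes if one SLOG device dies):

      # Move the ZIL onto a mirrored pair of power-loss-protected SSDs
      zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
      # Confirm the log vdev now shows up alongside the data vdevs
      zpool status tank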

    • @miroslavstevic2036
      @miroslavstevic2036 1 year ago +1

      @@biohazrd Good points, but I would like to further clarify the size of the Log VDEV in the case of flash-based media. Although it seems a waste to use large SSDs for a Log VDEV, that is not necessarily true in all cases. In heavy-write environments, it's wise to use larger SSDs or drives from different series/manufacturers. Flash wear-out (TBW) can kill smaller drives from the same series both fast and, theoretically, near simultaneously. So don't skimp on Log drive capacity.

    • @biohazrd
      @biohazrd 1 year ago +1

      @@miroslavstevic2036 Yeah, good point. I didn't think about over provisioning but you can really stretch your endurance with a larger SSD in that scenario.

    • @DrDingus
      @DrDingus 1 year ago

      @@biohazrd Also, smaller drives these days are not very good in terms of GB/$. It's like, do I pay $40 for 512GB or $50 for 1TB? I might as well just get double the TBW for $10 more.

  • @Saturn2888
    @Saturn2888 6 months ago

    I found that mirrors are way slower than dRAID with multiple vdevs. I have a 40-mirror SSD pool that runs at less than half the speed of my 60-HDD 4x dRAID pool in another enclosure, with 3 SAS expanders going into one SAS controller maxing out at 4.8GB/s after overhead. One helper for my mirrors was going from a 128K to a 1M recordsize, but mirrors were still slower! I also disabled the ZFS cache when running tests. dRAID is incredible with as few as 2 redundancy groups.

  • @praecorloth
    @praecorloth 2 months ago +1

    10:44 "mostly it's coming down to the performance you're after for your workload."
    Ttttthhhhhhhhaaaaaannnnnnkkkkkkk you! I have been shouting my damn lungs out for nearly a decade now. Cache in memory is meant to act as a way to not have to reach down into the disk. This was really irksome when people were like, "You need 1GB of memory per 1TB of total storage on your ZFS pool!" No. You don't. That's dumb. Closer to the truth would be you need 1GB of memory per 1TB of DATA in your pool. Because it takes exactly 0MB of memory to track empty storage. A better metric would be, you need more memory if your ZFS ARC hit ratio drops below about 70% regularly, and the performance hit is starting to irritate you.
    One thing I don't like is how people, even people in the know, talk about the SLOG. It's not a write cache, it's a secondary ZFS intent log. The ZFS intent log (ZIL) exists in every pool, typically on each data disk. It's essentially the journal in every other journaling file system. Before you write data to the disk, you write that you're starting an operation, then you write the data, then you tell the journal that you've written the data. ZFS does the same thing, though it actually writes the data to the ZIL as well.
    When people talk about caching writes, they're usually thinking about something like battery-backed storage on RAID controllers. ZFS will never do this exact thing. When you add a SLOG to a pool, your ZIL is basically moving over there. This takes IO pressure off of your spinning rust data drives, and puts it on another drive. To that end, if you have a pool of spinning rust drives, and you add another spinning rust drive as the SLOG, you will see write performance increase, just not the massive increases that you might expect from a typical write cache. Your spinning rust data drives will absolutely thank you in the long run.
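    One easy way to watch that shift happen (a sketch; "tank" is a hypothetical pool name) is per-vdev I/O statistics while a sync-heavy workload runs:

    # Per-vdev activity every 5 seconds; with a SLOG attached, sync-write traffic appears under the "logs" section instead of hammering the data disks
    zpool iostat -v tank 5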

  • @bertnijhof5413
    @bertnijhof5413 1 year ago +1

    My poor man's usage of OpenZFS 2.1.5 runs on a minimal install of Ubuntu 22.04 LTS on a Ryzen 3 2200G; 16GB; 512GB NVMe; 128GB SSD; 2TB HDD.
    All my applications run in, say, 6 VMs, of which the Xubuntu one with the communication stuff (email, Whatsie, etc.) is always loaded. I have 3 data pools:
    - one with the 11 most-used VMs on the NVMe SSD (3400/2300MB/s), running with primarycache=metadata. Boot times of e.g. Xubuntu are ~6.5 seconds with caching, while without caching it takes ~8 seconds. I'll wait ~1.5 seconds more if I can save, say, 3GB of memory.
    - one with 60 more VMs on the first, faster partition of the HDD. Here I have 2 levels of caching
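    For anyone wanting to replicate that trade-off, a minimal sketch (the dataset name is hypothetical) of the property being described:

    # Cache only metadata in ARC for this dataset, saving RAM at the cost of slower repeat data reads
    zfs set primarycache=metadata tank/vms
    # Check the current setting (default is "all"; "none" disables ARC caching for the dataset entirely)
    zfs get primarycache tank/vms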

    • @DrDingus
      @DrDingus 1 year ago

      How is the wear on those SSD and NVMe drives?

  • @AnnatarTheMaia
    @AnnatarTheMaia 2 months ago

    ...And you'd lose that bet. ZFS in production one week after it came out in Solaris 10 6/06 (u2) here.

  • @terry5008
    @terry5008 9 months ago

    What about metadata??????

  • @vikasv9687
    @vikasv9687 11 months ago

    Spreadsheet, please.

  • @knomad666
    @knomad666 10 months ago

    Chris did an excellent job of explaining how ZFS works. Rich, great job getting him on a call for all of us to hear a proper and thorough explanation from an expert! excellent video. At 10 mins and 20 seconds in, masterful answer! TrueNAS is a fantastic storage product.

    • @StephenDeTomasi
      @StephenDeTomasi 10 months ago +1

      Actually, he really didn't. He didn't explain caching correctly. I expected better.

  • @blahx9
    @blahx9 3 months ago

    Are the results similar because they are hitting the L1 ARC (RAM)? Edit: hah, should have kept watching!

  • @Catge
    @Catge 6 months ago

    Chris was great

  • @KC-rd3gw
    @KC-rd3gw 10 months ago

    There's also dRAID now, which is an abstraction of RAIDZ with hot spares. I use it for my VMs and LXCs since resilvering is much, much faster and the performance is adequate for my uses. Hot-spare capacity is distributed throughout the vdev, so on a resilver, data is read from all drives and written to all drives simultaneously, instead of all drives mobbing the one hot spare being resilvered.
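    For reference, a dRAID vdev is declared at pool-creation time; a minimal sketch (pool and disk names are hypothetical, and the full draid[parity][:datad][:childrenc][:sparess] syntax is described in zpoolconcepts(7) for your OpenZFS version):

    # Double-parity dRAID across 12 disks with one distributed spare
    zpool create tank draid2:1s /dev/sd[a-l]
    # Verify the layout, including the distributed spare
    zpool status tank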

    • @2GuysTek
      @2GuysTek  10 months ago

      My understanding is that dRAID isn't available until the future version of SCALE comes out. We'll certainly be evaluating the new RAID type when it comes to production!

  • @JasonsLabVideos
    @JasonsLabVideos 1 year ago +1

    ME FIRST !!!! Watching > Coffee In Hand !

  • @kalef1234
    @kalef1234 6 months ago

    I just got four 4TB WD Red SSDs, imma just run RAIDZ1 I guess, should be fine.

  • @mspencerl87
    @mspencerl87 9 months ago

    Before watching the video I'm going to go ahead and say mirrored VDEVs.
    Then come back and see if I'm right 😂

  • @InSaiyan-Shinobi
    @InSaiyan-Shinobi 1 year ago

    Basically I can't do anything lol, I only have three 14TB HDDs and like 5 SSDs that are 1TB, I don't know what I should do still lol

  • @briceperdue7587
    @briceperdue7587 6 months ago

    You can have more than 1 L2ARC per pool

  • @philippemiller4740
    @philippemiller4740 1 year ago

    No, you cannot combine different VDEV types in a single pool though.

    • @exscape
      @exscape 9 months ago

      Sure you can! Try this:
      cd; mkdir -p zfstest; truncate -s 100M zfstest/disk{1..5}; zpool create -f mixedpool raidz /root/zfstest/disk{1..3} mirror /root/zfstest/disk{4,5}
      It works just fine (the -f is needed because zpool otherwise refuses the mismatched redundancy levels).

    • @philippemiller4740
      @philippemiller4740 9 months ago

      @@exscape Thanks, "can mix" and "should mix" are very different then :P

  • @petersimmons7833
    @petersimmons7833 9 months ago

    Testing this over SMB ruins any real performance testing. It has huge, unpredictable overhead. You should have used something that will not warp the test results, like NFS.

  • @seannugent8141
    @seannugent8141 1 year ago +1

    Shame he doesn't understand how a SLOG works (not that that's unusual). A SLOG is NOT a cache. It's always written to and never read from (in a steady state).

  • @blender_wiki
    @blender_wiki 9 months ago

    Perfect demonstration of how little a random YouTuber understands about ZFS and TrueNAS. You just failed the basic ZFS understanding exam.
    Next time read the docs; they're clear enough.
    🤷🏿‍♀️🤷🏿‍♀️🤷🏿‍♀️ 🤦🏿‍♀️🤦🏿‍♀️🤦🏿‍♀️

  • @charliebrown1947
    @charliebrown1947 10 months ago

    You can't compare the performance of fresh pools.

    • @2GuysTek
      @2GuysTek  10 months ago

      I disagree, however I also understand where you're coming from in terms of ARC. I think it's completely fair to test a fresh pool's performance because that's a normal state of a NAS in its functional life. Saying the only way to test performance is on existing, cached data isn't correct either. Maybe a compromise is to run perf tests on fresh and warm data instead.

    • @charliebrown1947
      @charliebrown1947 10 months ago

      @2GuysTek I'm not talking about ARC or cache. I'm speaking to the actual use case of a NAS, which is not a brand-new empty pool.

  • @MarcHershey
    @MarcHershey 1 month ago

    The definition of this video almost makes me uncomfortable. lol It's soo clear I can see every single beard stubble and chest hair.

  • @jonathan.sullivan
    @jonathan.sullivan 1 year ago +1

    Tom from Lawrence Systems did a video on it, so it's just that you didn't know the best-performing layout. It's ok, you've only been using TrueNAS for....errrr.... years 😢

  • @dmsalomon
    @dmsalomon 10 months ago +1

    Stopped watching once this dude said a SLOG is a write cache. And his explanation was even worse, totally inaccurate.

  • @joetoney184
    @joetoney184 9 months ago

    What an Ad….

  • @ewenchan1239
    @ewenchan1239 1 year ago

    So....the TL;DR is if you do ALL of that (you have a server that does everything), then you're effectively "screwed" in the sense that there is NO optimal configuration for you because ANY configuration that you will deploy will be the "less-than-optimal" configuration for the different types of things that you are doing with the system that does everything.

    • @2GuysTek
      @2GuysTek  1 year ago

      No. In my opinion the TL;DR is, generally speaking, to add as much RAM as you can to your host. Adding more RAM will give you the most noticeable improvement in performance over any VDEV layout in a 'does everything' use case. I think that knowing the 'easy button' is to add more RAM and go with a RAIDZ2 for a general purpose NAS takes a lot of the confusion over what VDEV config you should use.

    • @ewenchan1239
      @ewenchan1239 1 year ago

      @@2GuysTek
      Two things:
      1) Adding more RAM isn't a vdev configuration (per the title of your video).
      2) Re: using a RAIDZ2 layout -- whilst that might be the overall "average" layout that you can deploy for a "does everything" case, as your own data shows (and also based on your discussion with Chris from iXsystems), different use cases have different recommended vdev layouts. But if you use one layout for a "does everything" system, then it is NOT the optimal vdev layout for the different use cases that a "does everything" system will need to serve up.
      i.e. optimal is "here", and the actual deployed vdev performance is "here" -- at a less-than-optimal level of performance, for that use case.
      In other words, no single workload "wins", and they ALL lose some (level of performance), and that is the optimized solution where there is no clear winner and everybody loses some (performance).
      (i.e. there isn't a vdev layout that's a clear winner for the different use cases, in a "does everything" system.)

    • @nadtz
      @nadtz 10 months ago

      The optimal thing to do would be to have multiple pools/VDEV for your workloads. Whether someone who is using a ZFS 'do it all' server has the resources and wants or actually needs to do that is a totally different matter but the implication here seems to be that you are limited to one VDEV type on a server which isn't the case.

    • @ewenchan1239
      @ewenchan1239 10 months ago

      @@nadtz
      That will depend on the capacity requirements, how much capacity you are willing and/or able to sacrifice for redundancy/fault protection, etc., as a function of cost.
      There is ALWAYS the perfect "scientist" solution where, if money weren't an object, you could deploy the theoretically perfect solution.
      But if you had a budget of only $1000 and there weren't any changes to the statement of requirements, what you'd end up deploying, based on that fixed, finite budget, would be very different from your theoretical ideal solution.

    • @nadtz
      @nadtz 10 months ago

      @@ewenchan1239 Obviously. The point is you said
      "...you're effectively "screwed" in the sense that there is NO optimal configuration..."
      and this is not true, 'scientist' or not. I clearly stated that having the resources, wanting, or needing to deploy the optimal configuration is different from the fact that it is possible, so you are just reiterating what I already said.

  • @mmobini1803
    @mmobini1803 4 months ago

    Great video, thank you!