Our data is GONE... Again - Petabyte Project Recovery Part 1

  • Added on 28 Jan 2022
  • Configure your own workstation at lambdalabs.com/linus
    Check out Hetzner Cloud and use code LTT22 for $20 off at linustechtips.hetzner.com/en/c...
    It's been a long time since we've had any serious data loss, but on this episode, we're discussing a software misconfiguration that has resulted in us losing an unknown amount of data on our petabyte project storage clusters.
    Discuss on the forum: linustechtips.com/topic/14077...
    Check out 45Drives at the links below
    Website: lmg.gg/eGo2K
    YouTube: lmg.gg/6ModQ
    Buy Seagate 20TB Exos Drives
    On Amazon: geni.us/WFfs
    On Newegg: geni.us/XhkNI
    Purchases made through some store links may provide some compensation to Linus Media Group.
    ► GET MERCH: lttstore.com
    ► AFFILIATES, SPONSORS & REFERRALS: lmg.gg/sponsors
    ► PODCAST GEAR: lmg.gg/podcastgear
    ► SUPPORT US ON FLOATPLANE: www.floatplane.com/
    FOLLOW US ELSEWHERE
    ---------------------------------------------------
    Twitter: / linustech
    Facebook: / linustech
    Instagram: / linustech
    TikTok: / linustech
    Twitch: / linustech
    MUSIC CREDIT
    ---------------------------------------------------
    Intro: Laszlo - Supernova
    Video Link: • [Electro] - Laszlo - S...
    iTunes Download Link: itunes.apple.com/us/album/sup...
    Artist Link: / laszlomusic
    Outro: Approaching Nirvana - Sugar High
    Video Link: • Sugar High - Approachi...
    Listen on Spotify: spoti.fi/UxWkUw
    Artist Link: / approachingnirvana
    Intro animation by MBarek Abdelwassaa / mbarek_abdel
    Monitor And Keyboard by vadimmihalkevich / CC BY 4.0 geni.us/PgGWp
    Mechanical RGB Keyboard by BigBrotherECE / CC BY 4.0 geni.us/mj6pHk4
    Mouse Gamer free Model By Oscar Creativo / CC BY 4.0 geni.us/Ps3XfE
    CHAPTERS
    ---------------------------------------------------
    0:00 Intro
  • Science & Technology

Comments • 8K

  • @The_Keeper
    @The_Keeper 2 years ago +11761

    Linus: "Right, *Now* we won't ever lose data again!"
    Data storage: "How many times do we have to teach you this lesson, old man?"

  • @loadnabox1943
    @loadnabox1943 2 years ago +2142

    Linus, I have over a decade of experience in managing multi-petabyte ZFS with five nines uptime in large ISPs. I think you may have the wrong cause of the data loss and it may not (MAY NOT) be as lost as you think.
    Please reach out to me.

    • @JosephGamacheKD0AHS
      @JosephGamacheKD0AHS 2 years ago +151

      Upvoting this to get it seen.

    • @B0A2
      @B0A2 2 years ago +87

      Tweet at him

    • @Jumalten001
      @Jumalten001 2 years ago +17

      No, you don't.

    • @philb5593
      @philb5593 2 years ago +172

      I would recommend that you reach out to them as well.
      Linus does read a lot of comments, but YouTube isn’t a good way to get a response.

    • @lordelliott42
      @lordelliott42 2 years ago +109

      Email their business email address.

  • @ashleymc5599
    @ashleymc5599 2 years ago +927

    "We never hired a full-time IT person" was stated and I immediately had the urge to bust out the popcorn and look at IT pros in the comment section.

    • @onceuponaban
      @onceuponaban 2 years ago +40

      To be fair many of the LMG staff do qualify as IT pros in skill, if not in formal credentials.

    • @fulcrum7082
      @fulcrum7082 2 years ago +91

      @@onceuponaban no. Just no.
      The constant fuckups show that they are not.

    • @applepie9806
      @applepie9806 2 years ago +16

      The funniest thing is the next two comments under this are from the IT pros.

    • @fulcrum7082
      @fulcrum7082 2 years ago +50

      @@applepie9806 Been in IT for 10 years: worked as an infrastructure engineer for a hospital, technical lead for an MSP supporting SMEs, and finally solutions architect for a £250m company. I'm also a freelance consultant. I love LTT's vids; I normally just have them on in the background whilst I'm working. They do make some big mistakes, but it's all part of the drama :L

    • @SirNarax
      @SirNarax 1 year ago +3

      I am a bit of an IT professional myself.

  • @OnlineWerds
    @OnlineWerds 2 years ago +132

    As a data center engineer your storage content is my favorite content. I'm terribly sorry for your issues here.

  • @markclayton8977
    @markclayton8977 2 years ago +3984

    The irony of a cloud storage provider sponsoring this segment is not lost on Linus. I like that.

    • @Electrex8
      @Electrex8 2 years ago +146

      The most amazing part is a backup provider also sponsored the first video on losing their data. Incredible timing.

    • @Time4Technology
      @Time4Technology 2 years ago +183

      @@Electrex8 "Hi we would like to sponsor your next data loss video, can you put us on your waiting list?"

    • @legominimovieproductions
      @legominimovieproductions 2 years ago +20

      I mean, backing up a petabyte of stuff on a cloud provider is so fucking expensive, and you need to pay huge amounts for bandwidth (even with 200MBps it will take forever), so it's not a realistic option.
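For a rough sense of what "forever" means here, the transfer time can be estimated with simple arithmetic. This is a back-of-envelope sketch, not a measurement: it assumes a decimal petabyte (10^15 bytes), reads "200MBps" as 200 megabytes per second sustained, and ignores protocol overhead. The function name `transfer_days` is made up for illustration.

```python
def transfer_days(total_bytes: float, bytes_per_second: float) -> float:
    """Return the number of days needed to move total_bytes at a sustained rate."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    seconds = total_bytes / bytes_per_second
    return seconds / 86_400  # 86,400 seconds in a day


PETABYTE = 10**15   # assumption: decimal petabyte
RATE = 200 * 10**6  # assumption: 200 megabytes per second, sustained

print(f"{transfer_days(PETABYTE, RATE):.1f} days")  # prints "57.9 days"
```

If the figure was meant as 200 megabits per second instead, the same calculation gives a time eight times longer, roughly 463 days.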

    • @ZerotheWanderer
      @ZerotheWanderer 2 years ago +5

      @@legominimovieproductions If they built the drive and sent it to the host already loaded/backed up/ready to go, I wonder what the service would run.

    • @pyjama9556
      @pyjama9556 2 years ago +1

      Generously negotiated for future f***ups no doubt!!

  • @DarrynJones
    @DarrynJones 2 years ago +1826

    "I'm the highest ranking person in the company, the highest ranking person in the IT team, and the person who decided not to hire a dedicated IT staff. There is no way to determine who's accountable here" - Linus 2022

    • @connorwilliams9285
      @connorwilliams9285 2 years ago +99

      Bet he still might not hire one since he 'learned his lesson'. Oh well live and learn!

    • @Dimmers
      @Dimmers 2 years ago +16

      @Connor Williams but by that logic it means he will fix what they failed at and not for anything that may arise. If they don't have a full time or part time IT person then the same or similar issues are doomed to happen again

    • @connorwilliams9285
      @connorwilliams9285 2 years ago +10

      @@Dimmers that's my point, hopefully we see a video posted asking for applications soon so this doesn't happen again!

    • @KP3droflxp
      @KP3droflxp 2 years ago +13

      @Connor Williams it would be quite dumb for them to hire an IT specialist because a good portion of their content is working on their own IT systems.

    • @RalphInRalphWorld
      @RalphInRalphWorld 2 years ago +50

      @@KP3droflxp they need an IT specialist to schedule and perform regular preventative maintenance. Otherwise, their team will just fix things when they break, like in this video.

  • @ulbuilder
    @ulbuilder 2 years ago +374

    Your backups must be tested
    So you know they work as expected
    Offline is best
    So you can rest
    When lightning strikes unexpected

  • @LabGecko
    @LabGecko 2 years ago +231

    Tech Tips' data loss is due to one thing - quantum variability. :D
    The data was in a state of flux until someone audited, at which point it was forced to exist or not exist. Some were observed to be the latter.

    • @HilbertXVI
      @HilbertXVI 1 year ago +1

      Tf are you on about?

    • @LabGecko
      @LabGecko 1 year ago +48

      @@HilbertXVI if you don't like quantum jokes then I'm half-certain there is a dimension on which you didn't comment.

    • @omary5439
      @omary5439 1 year ago +26

      Schrodinger's hard drive?

    • @ThomasGroshong
      @ThomasGroshong 1 year ago +1

      😂

  • @Lmpy
    @Lmpy 2 years ago +2766

    LTT never ceases to amaze me on how professional and unprofessional they actually are at the same time.

    • @hambo76
      @hambo76 2 years ago +251

      You just described every corporation and Government in the world.

    • @HiddenChin
      @HiddenChin 2 years ago +24

      Do as I say, not as I do.

    • @forresthopkinsa
      @forresthopkinsa 2 years ago +20

      Definitely. But minus the professional part.

    • @bubbaandy89
      @bubbaandy89 2 years ago +53

      Right? I've worked in infrastructure for years, the consumer end videos are awesome and insightful, but the server/infrastructure videos frustrate me so much sometimes...

    • @paulb4334
      @paulb4334 2 years ago +17

      Yet 100% entertaining which is the only metric by which to value an entertainment business ;)

  • @HAWKF305
    @HAWKF305 2 years ago +4807

    Linus: Hates how USB and HDMI are being named.
    Also Linus: New new new vault

    • @mjtt12
      @mjtt12 2 years ago +204

      If you can't beat them, join them.

    • @robertt9342
      @robertt9342 2 years ago +270

      Well it’s pretty clear. It’s not like it’s named new old vault.

    • @marqs37
      @marqs37 2 years ago +129

      @@robertt9342 Don't give him ideas.

    • @adamjurak708
      @adamjurak708 2 years ago +16

      @@robertt9342 that was my first thought when he talked about reusing the old vault. It would be a new vault built from the old vault. So... New Old Vault [NOV for short]

    • @Pico2199
      @Pico2199 2 years ago +46

      At least new vault and new new vault aren't being renamed vault 2.0 and vault 2.0 + new

  • @RobertCrawfordRobert4049
    @RobertCrawfordRobert4049 2 years ago +15

    As soon as they switched from Storage Spaces I kind of saw this coming; I've got a 912 TB S2D cluster that serves as storage for about 200 or so virtual machines, and it's been rock solid, and performance with NVMe cache has been solid too. One of the things I saw on Spiceworks was a warning about over-engineering infrastructure.

  • @verdantia
    @verdantia 1 year ago

    You and your bunch give us so much of yourselves, thank you for putting so much time and precision in all your work.

  • @shwolverine2300
    @shwolverine2300 2 years ago +2606

    Linus: "the way they name HDMI generations is so confusing"
    also Linus: "we move the data from the old vault to new new vault and then name the old vault new new new vault with a bit of upgrade"

  • @KoSiNeK
    @KoSiNeK 2 years ago +815

    I don't know why, but "server issues" episodes are my favourite LTT videos. Content like this just doesn't exist anywhere else.

    • @rfitzgerald2004
      @rfitzgerald2004 2 years ago +47

      That's what I like too, there's only so many gaming hardware reviews I can stand to watch, they're all much of a muchness to me, but I really enjoy the infrastructure and unusual project videos the most

    • @ryanq.4799
      @ryanq.4799 2 years ago +35

      IMO It feels more real, and a lot like old LTT did, just overall more entertaining to watch than the usual formula

    • @noxious8
      @noxious8 2 years ago +14

      Me too. One of the first LTT videos I watched was the one years ago where Linus, Anthony and Jake were doing stuff in the server room on the weekend.

    • @UpSideDownTech
      @UpSideDownTech 2 years ago +8

      Right?! The Whonnock Server died video is one of my favorites to watch! I have no reason why, but I just like watching it for some reason😂

    • @Mesmaroth_
      @Mesmaroth_ 2 years ago +11

      Check out Craft Computing if you like home lab server videos. Techno Tim as well for homelab hosting tutorials.

  • @cromulence
    @cromulence 2 years ago +19

    I’m responsible for our SANs at work and there’s something else that wasn’t touched on in this video - make sure you configure email reporting from your storage nodes! The sooner you’re notified about issues, the sooner corrective action can be taken. Additionally, if possible, keep hardware spares at each site where the hardware is, so if a drive has failed (or even if it’s in a predictive failure state), you can swap a new drive in ASAP. Same goes for other hardware, such as controller cache batteries; these too can fail, and can do so silently, allowing the node to continue working, but with degraded performance.
    TL;DR - Keep an eye on your infrastructure and monitor it!

    • @Nickwilde7755
      @Nickwilde7755 10 months ago

      This. If they had been notified from the first drive, this most likely would've been prevented

  • @gwheeler1609
    @gwheeler1609 2 years ago +1

    Mate, I really appreciate the honesty of this video. Eating humble pie in order to educate your viewers shows real dedication to your mission.

  • @obedulloa6219
    @obedulloa6219 2 years ago +3263

    If Linus manages his data the way he manages hardware... it's no surprise the data dropped

  • @TJ-vh2ps
    @TJ-vh2ps 2 years ago +609

    Postmortem reports like this are hugely valuable, but companies don’t usually share them. This is a great service to the community.

    • @AegisHyperon
      @AegisHyperon 2 years ago +9

      Because companies don't let their storage get to this situation

    • @MajesticBlueFalcon
      @MajesticBlueFalcon 2 years ago +5

      @@AegisHyperon exactly. Companies from the get-go have an official IT dept. or outsource it to a competent MSP.

    • @LG1ikLx
      @LG1ikLx 2 years ago +18

      @@MajesticBlueFalcon you would be surprised how many companies mess up. What about if the IT dept didn't do their job properly and skipped over certain things in order to save time?

    • @jonasdatlas4668
      @jonasdatlas4668 2 years ago +7

      @@AegisHyperon not true. I do sysadmin for small and midsize businesses, and you wouldn't believe the kinds of things I've had to take over. Usually it's either some guy who does something else at the company and thinks he knows stuff but doesn't, or the work of some usually very mediocre external company.

    • @johngangemi1361
      @johngangemi1361 2 years ago

      @@AegisHyperon oh yes they do.

  • @timoonitamarooni
    @timoonitamarooni 2 years ago

    I'm so sorry you've had to deal with this! I haven't watched the full video yet (I'll get back to it), so I'm not sure if this is mentioned, but in terms of operational controls to prevent scope creep, creepback, and operational swiss-cheesing, RACI matrices are good tools when used correctly. They might seem unnecessary or tedious, but for defining infrastructure maintenance (or other) tasks and roles they are super effective. The point is not to be overly prescriptive or to take away from a lax culture, but to document who does backups, who does audits, who makes sure SSL certs are up to date, and how often, so there's no confusion.
    That being said, technical failures are ultimately unavoidable; hopefully some of that loss was transferred via insurance? Best of luck going forward, y'all.

  • @MerlinsBeard91
    @MerlinsBeard91 2 years ago +4

    As someone who works in the IT field for a small company I will be following this very closely. Anything that you guys do like this I absolutely love and try to implement it if it is appropriate for my company.

    • @CommanderRiker0
      @CommanderRiker0 1 year ago

      Most enterprise NAS already do all this for you, for example Synology.

  • @leodoz1016
    @leodoz1016 2 years ago +1044

    Alternate title: The LMG group MIGHT hire an actual IT person

  • @Cluesman
    @Cluesman 2 years ago +622

    "a lot of power outages" + "transferring that much data might take months" sounds like a recipe for another video in this series.

    • @Carcinogenic2
      @Carcinogenic2 2 years ago +19

      Yeah, on how bad a power grid can be and how important a UPS becomes in such situations.

    • @gorkskoal9315
      @gorkskoal9315 2 years ago +4

      I'll hazard a guess that they keep blowing a fuse and don't have a generator for the building or a UPS for the servers.

    • @gorkskoal9315
      @gorkskoal9315 2 years ago +1

      LOL I can see the title: "Ever try to back up a few sextibytes? Or even just a few exabytes? No? Well, funny thing happened..."
      Or "This is awkward...newcubed16 ...."
      Please tell me they have fiber to the new vault and aren't trying to do this over a normal connection.

    • @namAehT
      @namAehT 2 years ago +15

      @@gorkskoal9315 They do have a UPS for their server room, but for a few months they didn't because their UPS caught fire. Also it sounds like they never configured the servers to _safely_ shutdown when the UPS was running low, instead the UPS ran out of power and the servers got plug pulled.

    • @larrylentini5688
      @larrylentini5688 2 years ago +4

      Natural gas backup generators aren't very expensive relative to petabytes of hard drives, they should probably invest in one.

  • @emeraldmorningmist
    @emeraldmorningmist 2 years ago +1

    First off, I am sorry for LTT about the data loss. Secondly, I am glad it wasn't "active" or current data but rather old YouTube videos, and those can be recovered (but only the uploaded videos and not any extra material/footage you had stored). Good luck on the project!

  • @justindacosta3d
    @justindacosta3d 2 years ago +2

    Thanks for doing this video. I'm sure this made a LOT of people go back and check their home servers, or the servers they support, to make sure they are not vulnerable.

  • @lucasmenchone2826
    @lucasmenchone2826 2 years ago +709

    HR meeting with Linus: “All our data has been lost, i’m gonna fire someone…
    But not before i fire up our segway to our sponsor…”

    • @cogYo
      @cogYo 2 years ago +5

      🤣🤣🤣🤣

    • @klaasmuller9663
      @klaasmuller9663 2 years ago +17

      *Segue

    • @DailyCorvid
      @DailyCorvid 2 years ago +1

      Linus is the only person whose adverts I enjoy. Angry Joe started putting tonnes of effort into his, but they are so forced!! I think Linus actually gets a laugh-kick out of saying LTTSTORE where it's crowbarred into something lol. I know I do, but not as much kick as the coffee in this LTTSTORE FLASK WILL HAVE.
      Linus dude, over all the years I have watched you I don't think I ever credited you properly. Well done man, this thing you've all created is really cool :)

    • @UncleKennysPlace
      @UncleKennysPlace 2 years ago +5

      @@Avendesora Except, of course, it's totally wrong to use one of those words for the other. Unless your server room is so large that you must use a Segway to get to the sponsor.

    • @SuperNGLP
      @SuperNGLP 2 years ago +1

      Gotta make up for that loss of money somehow.

  • @jstadler
    @jstadler 2 years ago +356

    As a full-time sysadmin, I always wondered how you guys sustained your data without a real backup plan. As it turns out now, you didn't. Really sorry to hear that, guys!
    That's exactly why people like me get hired. Companies think they can do it on their own until they lose critical data to misconfigs and missing maintenance. Hurts to learn it the hard way.
    I really recommend you guys create offline backups to tape storage for all your archived content.
    And respect for admitting you did it wrong so others can learn!
    Keep on making such great content!

    • @heavyq
      @heavyq 2 years ago +16

      I'm not a sysadmin, just a network guy that dabbles in sysadmin stuff and yeah, it blew my mind to hear what happened here. If they open a spot to hire an IT guy I think I'm gonna apply :D

    • @TheGruwy10
      @TheGruwy10 2 years ago +4

      Get this dude hired, quick!

    • @GrayMatter70
      @GrayMatter70 2 years ago +12

      I'm not a sysadmin either, but I'm also surprised they didn't catch the offline drives earlier. Even without the regular data scrubs, basic monitoring should have caught that. As for tape backups, I agree but also advise caution that tape backups can fail too, so they need to be planned properly. I've done tape backups myself but that was a long time ago.

    • @StarFireG3
      @StarFireG3 2 years ago +6

      Yep. I've been doing this for 25 years now. I worked for a couple of companies with big RAID systems but no backup. It's a struggle to get the responsible persons to buy sufficient backup systems. In one case, only one week after installing the backup solution and having the first full backup, the main RAID system failed and died. Without backup this company would have gone out of business completely. I have seen this happen to companies before.

    • @brighton_geek
      @brighton_geek 2 years ago +2

      You would need one hell of a tape array to back up that kind of data, not to mention it would take forever! I don't see tapes as a practical offline backup solution for this quantity of data for a company of LTT's size. It is better to have a duplicate server in a DC with clean power and resilient backups and replicate the data; that would act as backup and be a suitable DR solution.
      Without backups I do wonder if they have a BCDR plan in place also?

  • @aquarianage3953
    @aquarianage3953 2 years ago +3

    Thank you for sharing, Linus. This is a sobering heads-up video for all of us who seek future dealings with our own DIY servers. Peace.👍

  • @nicksdinosforkids6001
    @nicksdinosforkids6001 2 years ago

    A lot to unpack and quite the cautionary tale! Thanks again.

  • @QualityDoggo
    @QualityDoggo 2 years ago +856

    Just hearing "never hired a full time IT person" makes me go "uh oh... I don't like where this is going..." a good sysadmin who can help protect systems is a valuable part of any modern company

    • @danielgomez7236
      @danielgomez7236 2 years ago +154

      The world's biggest IT youtube channel, there's no IT guy

    • @darthkarl99
      @darthkarl99 2 years ago +179

      Classic case of responsibility creep. As Linus and others have become responsible for more stuff as the company has grown their ability to handle routine IT maintenance duties has dropped off, and because it's happened slowly over time it's never quite shown up on anyone's radars as a matter of concern.

    • @uwirl4338
      @uwirl4338 2 years ago +23

      Yeah, because just so you know, only other sysadmins value sysadmins. It's an extremely simple job, so the rest of us think we can do it, and we sure can until our real job prevents us. If only we could teach monkeys a couple of bash commands and have them be sysadmins for a couple dozen bananas.

    • @chrismcveigh4498
      @chrismcveigh4498 2 years ago +39

      As a sysadmin/sysengineer: unfortunately these guys, although knowledgeable, aren’t professionals, and "works" doesn’t always mean "works properly" :/

    • @Habdabi
      @Habdabi 2 years ago +5

      That's why the sys admin job is dying out and most mid sized companies pay less to move it to cloud based systems that are more reliable (for now, until the price gets hiked)

  • @SimonPoirier
    @SimonPoirier 2 years ago +944

    Other pro tip: if building such large-scale storage, make sure your disks are from different manufacturing batches. Imagine the nightmare of having disks with consecutive serials wearing out and failing almost at the same time.

    • @lostintechnology1851
      @lostintechnology1851 2 years ago +102

      Or they could just buy a professional backup solution and get proper training operating it plus a maintenance contract. You know the way every real enterprise would do it :D

    • @entelin
      @entelin 2 years ago +168

      @@lostintechnology1851 It's a different situation, he said this is non essential archival footage, the creation of these servers created content, the failure of it created content, and yeah, backing that stuff up would cost a lot of money... so risk/reward. The best option isn't necessarily always the right option.

    • @gamingbud926
      @gamingbud926 2 years ago +6

      That is... a pretty smart idea.

    • @lupsik1
      @lupsik1 2 years ago +4

      @@entelin Didn't watch the video yet, but it sounds like something that RAID 5 would solve instantly and would cost them barely any storage with that many hard drives.

    • @KingSvenDeluxe
      @KingSvenDeluxe 2 years ago +33

      Or just never use Seagate.

  • @karenwang313
    @karenwang313 2 years ago

    Mad props for coming out and saying you guys screwed up. All of us can learn from this and hopefully not lose any data of our own.

  • @mbgdemon
    @mbgdemon 1 year ago

    These videos about your big fuckups are by far the most informational and educational videos on your channel... I have a little checklist of shit not to do when I set up a storage system, wouldn't have heard about these pitfalls anywhere else.

  • @OfficialSamuelC
    @OfficialSamuelC 2 years ago +557

    I feel Jake holds a lot more of LTT together with his expertise than we think. Underrated!

    • @riks.1773
      @riks.1773 2 years ago +40

      The fact that he takes the time to actually look and uncover this is enough to be praised as employee of the month.

    • @romanbaranovichi5375
      @romanbaranovichi5375 2 years ago +20

      It also helps that he's worked there from when they were getting serious about their data storage, so he knows the reasoning behind why the things are set up the way they are

    • @kstenders
      @kstenders 2 years ago +6

      @@riks.1773 usually you set up monitoring with alerting to check the health of your storage.

    • @riks.1773
      @riks.1773 2 years ago +2

      @@kstenders yes, but I never assumed they configured that... because I've seen other simple things get overlooked.

    • @VanlockFR
      @VanlockFR 2 years ago +3

      @@riks.1773 as Linus explained, it's routine checks that they should have been doing monthly, for years. AND they didn't set any email alerts so they never got notified of the failures !

  • @normandabald6501
    @normandabald6501 2 years ago +691

    The second most important thing to consider about backups, behind actually having them in the first place, is TESTING THEM!
    If you don't test your backups then you don't have backups.
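One simple way to "test" a restore is to compare checksums of the original file and the restored copy. The snippet below is only a minimal sketch of that idea: the helper names `sha256_of` and `restore_matches` are invented for illustration, and the temporary files stand in for real data and a real restore job.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """Return True when the restored copy is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Throwaway files standing in for real footage and a real restore.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.bin"
    good = Path(tmp) / "restored_ok.bin"
    bad = Path(tmp) / "restored_bad.bin"
    source.write_bytes(b"project footage" * 1000)
    good.write_bytes(source.read_bytes())      # faithful restore
    bad.write_bytes(b"project footage" * 999)  # truncated restore
    print(restore_matches(source, good))  # True
    print(restore_matches(source, bad))   # False
```

In practice the original may no longer exist when a restore is needed, so checksums would have to be recorded at backup time and compared later; that bookkeeping is left out here.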

    • @jonathanbuzzard1376
      @jonathanbuzzard1376 2 years ago +6

      Only if you have shit backup software. Last year I did a restore of our main HPC file system after an upgrade, and everything came back. The only "testing" necessary is the occasional restore when users have done daft stuff and deleted files by accident. Then again I have a "proper" backup system in IBM Spectrum Protect (née TSM). If you use toy backup systems (aka everything else in my view) then yeah, test them regularly.

    • @zazethe6553
      @zazethe6553 2 years ago +16

      This is not a backup system, it's live storage.
      But you are right.

    • @johngangemi1361
      @johngangemi1361 2 years ago +2

      Agreed

    • @jacquesb5248
      @jacquesb5248 2 years ago +3

      yeah actually checking that the backups are running

    • @jonathanbuzzard1376
      @jonathanbuzzard1376 2 years ago +10

      @@jacquesb5248 Nope, if you have to "check" that your backups are running then you are doing it wrong. This should be integrated into your monitoring system so you get told that your backup *DIDN'T* run. Checking manually is prone to someone forgetting or being on holiday or insert a thousand other reasons. Also, getting told daily that your backup ran becomes an issue where it is seen as background noise and you get bored checking the same report day in, day out. Basically, being notified that something is as expected is the wrong way to do anything. You need to be notified that something is *NOT* as expected, in this case that the backup didn't run to completion without errors.
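The "only tell me when something is wrong" rule described above can be sketched in a few lines. Everything here is hypothetical: the `BackupRun` record, the `backup_alert` function, and the 26-hour window are illustrative assumptions, not part of any particular monitoring or backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BackupRun:
    finished_at: datetime  # when the last backup job completed
    error_count: int       # errors reported by that job


def backup_alert(
    last_run: Optional[BackupRun],
    now: datetime,
    max_age: timedelta = timedelta(hours=26),  # assumption: daily job plus slack
) -> Optional[str]:
    """Return an alert message only when something is wrong, otherwise None."""
    if last_run is None:
        return "backup has never completed"
    if now - last_run.finished_at > max_age:
        return "backup did not run within the expected window"
    if last_run.error_count > 0:
        return f"backup finished with {last_run.error_count} error(s)"
    return None  # healthy: stay silent, no daily "all OK" noise


now = datetime(2022, 1, 28, 12, 0)
print(backup_alert(BackupRun(now - timedelta(hours=3), 0), now))  # None
print(backup_alert(BackupRun(now - timedelta(days=3), 0), now))   # missed-window alert
```

The design choice is that a healthy run produces no message at all, so any message that does arrive is worth reading.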

  • @_GhostMiner
    @_GhostMiner 2 years ago +56

    Linus being so calm while talking about one of his/their biggest oopsies is so cool 😄

  • @ericd4mation
    @ericd4mation 2 years ago +2

    Thanks for pointing out needing to manually schedule a parity check!
    I've been using Unraid and I assumed that it would have scheduled _something_ by default. Nope. Parity hasn't been checked since I set it up in October.

  • @makingtechsense126
    @makingtechsense126 2 years ago +591

    Tape (LTO-9) is still an affordable option for backups. Especially for data that doesn't change. Yeah, it's old tech but it still works.

    • @mkastelovic
      @mkastelovic 2 years ago +59

      Yep, completely agree with you, Tape library with LTO 9 tapes will be much safer. And it isn't so slow as people think. :)

    • @jspafford
      @jspafford 2 years ago +28

      @@mkastelovic 250-300MBps. And they have worm tapes. And by using a dual drive tape robot, it makes backups completely automated. Restores too. Backing up to individual LTO drives having to load tape after tape is too much labor. Backups will never get done.

    • @unlink1649
      @unlink1649 2 years ago +71

      Modern tape storage has INSANE capacity. We are talking 32 petabytes per rack. ETERNUS DX600 S5 is one such system.

    • @mkastelovic
      @mkastelovic 2 years ago +14

      @@jspafford Well, if you have the library, the backup is done automatically. Plus, in their case, we are speaking about incremental backups, where most of the old videos don't change at all ;), so the backup will be done during the night.

    • @jojojojo4332
      @jojojojo4332 2 years ago +7

      I agree with all of you, except for one thing. Linus has expressed that he has quite a lot of data that he says isn't that important, meaning that buying a tape robot would be quite an expensive investment. Maybe not even worth the trouble.

  • @waveformdistortion
    @waveformdistortion 2 years ago +49

    Well if you hadn't made this video, I never would have known to check if automatic scrubbing was enabled on my storebought NAS. It wasn't. I don't believe it's ever suffered a power failure, being connected to a UPS and configured for automatic shutdown when the UPS drops below 50% battery since day one, so no automatic scrub on resume either. It's now set to automatically scrub once a month, so thanks!

    • @linusnexus9000
      @linusnexus9000 2 years ago +1

      Same here on a Synology box, thanks to your comment I checked and noticed it wasn't enabled either. I also activated a monthly schedule :)

  • @dronepilot-jrf-w1381
    @dronepilot-jrf-w1381 2 years ago

    It occurred to me during this video how much LTT has taught me over the years. The first video I watched was the first petabyte project video, and all I really understood was that it is more data than I am ever going to need. Ever. But now I understood everything he was saying without having to look things up for myself. Didn't think that would ever happen. I would still have zero clue what I am doing while building my own storage server (I am the throw-it-at-the-wall-and-see-what-sticks kind of IT person), but I have learnt so much from you guys. Thank you.
    On another note... I hope you do hire an actual IT person, just in case.

  • @nicolasmorey-chaisemartin9795

    Had the exact same issue in my first RAID 5 NAS (Thecus). Managed to save most of it at work with some spare hardware and some tuned kernel modules. Luckily it was mostly the Plex storage, so nothing too important was lost.
    Raid scrubbing is the first thing I enable/check now :)

  • @Jordan_C_Wilde
    @Jordan_C_Wilde 2 years ago +116

    "We lost a sh*tload of video data, lets make an educational video about it" - Most Linus thing ever

  • @technogamer18
    @technogamer18 2 years ago +917

    “This caused the array to offline itself to prevent further degradation”
    …Been there, array. Been there.

  • @ouija-board1
    @ouija-board1 1 year ago

    Love your videos. I just got into building and learning from you. Thank you for taking me out of a dark place. Would love to work for you.

  • @danielcobia7818
    @danielcobia7818 2 years ago

    I guess this highlights the need to set things up right at the beginning. Of course, unless this is your job 100% of the time, it's sometimes difficult to know all the things that will come back to bite you. Valuable lessons learned here.

  • @andrewnotmyrealname7827
    @andrewnotmyrealname7827 Před 2 lety +735

    All techs: "Follow this advice!"
    Those same techs: "YOLO"

    • @placate9051
      @placate9051 Před 2 lety +45

      Ay gotta know the rules before you break them

    • @K-----
      @K----- Před 2 lety +3

      To be fair it's more, follow this advice if X and then the same techs don't really have X. He basically said that at 9:27

    • @snowysysadmin59
      @snowysysadmin59 Před 2 lety +4

      Ok but we all know linus has said before "do as i say, not as i do"

    • @typerightseesight
      @typerightseesight Před 2 lety

      DO WORK!

  • @billhollinshead7843
    @billhollinshead7843 Před 2 lety +535

    A data *recovery* policy abides with this: "The only 'known-good backup' is one that you *have* successfully restored." 😀
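    One way to act on "a backup only counts once it has been restored" is to restore it somewhere and compare checksums against the source. The following is a minimal sketch under assumptions, not anyone's actual tooling: the function names `tree_digests` and `verify_restore` are made up for illustration, and it assumes both trees are reachable as ordinary directories.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large video files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_restore(source_root, restored_root):
    """Return (missing, mismatched) relative paths.

    Both lists empty means every source file was restored bit-for-bit.
    """
    src = tree_digests(source_root)
    dst = tree_digests(restored_root)
    missing = sorted(p for p in src if p not in dst)
    mismatched = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, mismatched
```

    Calling `verify_restore("/path/to/source", "/path/to/test-restore")` (both paths hypothetical) after a trial restore gives a concrete pass/fail signal instead of trusting the backup software's own "success" message.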

    • @klaernie
      @klaernie Před 2 lety +12

      There is even the question, if the old Premiere projects are still loadable in current software versions..

    • @khatarin
      @khatarin Před 2 lety +12

      Former Data Protection Product Manager here for some 30-40k servers at my old job: Yes. :)

    • @ProTechShow
      @ProTechShow Před 2 lety +1

      This is the way

    • @DAndyLord
      @DAndyLord Před 2 lety +7

      When discussing redundancy, one is none, two is one. That's how I discuss backup options with my clients. If it's mission critical you need a layered backup system.

  • @beaniiman
    @beaniiman Před 2 lety

    This has to be one of the best LTT videos of all time.

  • @eternalko
    @eternalko Před 2 lety +2

    A very practical piece of advice: store your "old" archival data (like photos) on a hard drive that is not connected to power or a server. Use other cloud storage all you want, but keep one disconnected, low-tech option.

    • @klebdotio3284
      @klebdotio3284 Před 2 lety

      Suddenly I feel smart for keeping my backup drives in an antistatic bag unplugged

  • @Kblender798
    @Kblender798 Před 2 lety +125

    Please adopt LTO tape backups into your workflow! It's indispensable as a deep storage solution, especially within my field of work (film industry).

  • @gvfc
    @gvfc Před 2 lety +774

    In my first months as a sysadmin I learned a lesson: always keep a secondary backup that isn't on-premise. Power can go out, and you'll have a few bad sectors on your drives. But if there's a fire and your server goes with it, all of a sudden giving a few bucks to Jeff Bezos doesn't sound that bad of a deal after all.

    • @radical_dog
      @radical_dog Před 2 lety +62

      Yeah, not paying for cloud storage basically confirms that they wouldn't cry to sleep if they lost the whole lot. Which is a reasonable decision since it's not mission critical data.

    • @TonytheEE
      @TonytheEE Před 2 lety +27

      They had a remote server in a previous VLOG a year or two back. I wonder what's up with that?

    • @tpmeredith
      @tpmeredith Před 2 lety +10

      Heck anyone with a 5 or more user office 365 tenant can get unlimited onedrive backup. Yes it's slow to backup, yes it's full of details like 25TB sharepoint sites that you have to subdivide, but it IS unlimited for very cheap and an offsite backup.

    • @radical_dog
      @radical_dog Před 2 lety +64

      @@tpmeredith No such thing as "unlimited", it just means "we haven't written down a hard limit". 720TB would definitely be knocking on that door!

    • @Kevin-jb2pv
      @Kevin-jb2pv Před 2 lety +9

      I think they've covered this in the past, and the problem is that they just have so much at this point that the upload will take forever. But that doesn't mean you're not right. If anything, they should do it _now_ because every day they wait is going to just be more they have to upload. I'm sure there's something out there that will just start uploading everything in the background until it catches up.
      Also, IDK if it would only be "a few bucks" for the amount they need. IDK what that kind of enterprise level storage costs, but it's probably not cheap and I'll bet that even on "unlimited" cloud storage plans there's probably a catch written in the fine print with some way of restricting the storage in practice, like restricting the upload bandwidth past a certain amount of data uploaded to such a slow rate that they would never be able to upload faster than they create new data...
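      The "upload will take forever" worry can be put into rough numbers. This is a back-of-the-envelope sketch, not a statement about LTT's actual connection: the 720 TB figure comes from the reply above, while the link speed and daily ingest rate are hypothetical inputs, and protocol overhead is ignored.

```python
def upload_days(data_tb, link_mbps, new_data_tb_per_day=0.0):
    """Days needed to upload data_tb over a link of link_mbps (megabits/s)
    while new_data_tb_per_day of fresh footage keeps arriving.

    Uses decimal units (1 TB = 10**12 bytes). Returns None when the link
    can never catch up with the rate at which new data is created.
    """
    link_tb_per_day = link_mbps * 1e6 / 8 * 86400 / 1e12
    net_tb_per_day = link_tb_per_day - new_data_tb_per_day
    if net_tb_per_day <= 0:
        return None
    return data_tb / net_tb_per_day


# Hypothetical example: 720 TB over a fully saturated 1 Gbit/s uplink.
print(upload_days(720, 1000))  # roughly 67 days
```

      A saturated 1 Gbit/s link moves about 10.8 TB per day, so anyone creating data faster than that never finishes the initial upload, which is exactly the catch described above.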

  • @77biologyteacher
    @77biologyteacher Před 2 lety

    You have a great future brother and I hope that you channel will progress coz you upload great and interesting tech content on CZcams

  • @mikeromero4162
    @mikeromero4162 Před 2 lety

    Congrats... you are getting pretty close to my life's book.

  • @paulbrooks4395
    @paulbrooks4395 Před 2 lety +66

    I worked for an MSP where they had fired the previous person in charge of backups. I was on the infrastructure team. We found that 65% of our customer backups were no good and something like 85-90% of offsite replication was failing. It was 8 months before we could return all the backups to normal and reduce the backup checks workflow to less than a few hours per week. During those 8 months, I spent the first 4 working to get the backups all straightened out with almost every hour of my workday.
    Suffice to say, having an ops team with competent people who are organized and themselves redundant and able to check each other’s work without judgement is absolutely paramount for a team in charge of critical systems.
    I personally love working on backups because it’s a silent way to ensure continuity while working with amazing technologies.

    • @capps1994
      @capps1994 Před 2 lety +5

      As someone in IT I know the pain. One thing I go by is that you don't have a valid backup unless you have tested it. I've had times (granted, back in the day, like 8-9 years ago) where the software would say it's a good backup, and god forbid you needed to restore, because it would just fail. Very fun times, they were.

    • @Phoen1x883
      @Phoen1x883 Před 2 lety +3

      Good worker! Providing billable service with no ongoing expenses like "maintenance" or "checking the backups".
      -most MSP management, probably

    • @aravindpallippara1577
      @aravindpallippara1577 Před 2 lety

      @@Phoen1x883 well if you can do something in company time to help the company bottom-line... You should?
      There are things like loyalty and goodwill even in corporates

  • @BiffaPlaysCitiesSkylines
    @BiffaPlaysCitiesSkylines Před 2 lety +533

    Up to 80tb myself and needing more soon....! This hoarding raw footage is a nightmare 🤣

    • @Briceronie
      @Briceronie Před 2 lety +33

      hey i watch your cities skylines videos. hope your day is going well. much love

    • @TheMallaclllypse
      @TheMallaclllypse Před 2 lety +24

      Hello everybody and welcome back to the next episode of fix my NAS.

    • @StrokeMahEgo
      @StrokeMahEgo Před 2 lety +4

      Consider cloud, or tape based backups that you mail to a trusted friend or put it in a safety box at a bank.

    • @BiffaPlaysCitiesSkylines
      @BiffaPlaysCitiesSkylines Před 2 lety +10

      @@Briceronie hi, thanks 😊

    • @BiffaPlaysCitiesSkylines
      @BiffaPlaysCitiesSkylines Před 2 lety +7

      @Malaclypse The Elder yes, that'll be me soon lol 😆

  • @hiphophippi2646
    @hiphophippi2646 Před 2 lety

    Sorry for your loss, I hate losing data... Now, I don't have that much information to store, but I have about 8TB of files, media and photos I have saved over my life. I use a hard drive dock that sits on my desk and I have 4 external drives, and I back those up after every major project, or once a week. I just write the new stuff into the folders and skip the data already saved, trying to keep it going with clean backups. I would cry if I lost all my music and photos. I love going back to my projects and seeing how far I have come, and it's like a portfolio. I would like to eventually build a server, but for now I'll do it the ratchy way. I love your videos, great stuff.

  • @emilemcgee6031
    @emilemcgee6031 Před 2 lety

    Honestly the first CZcamsr I regularly watched. And still do

  • @brodur
    @brodur Před 2 lety +331

    I am very interested to see how the recovery process goes. As someone who has only ever done disaster recovery in the realm of terabytes... yikes. Good luck friends.

    • @FireWyvern870
      @FireWyvern870 Před 2 lety +20

      Damn, these bots
      #CZcamsKilledTrustedFlagging

    • @theluigifan42
      @theluigifan42 Před 2 lety +2

      these bots out here calling youngboy "extravagant"

    • @leexgx
      @leexgx Před 2 lety +1

      What I don't understand is why auto mod isn't capturing them (whenever I post a link, 90% of the time my post gets auto modded, it disappears)

    • @FireWyvern870
      @FireWyvern870 Před 2 lety +1

      @@marcogenovesi8570 both are problems. One is not higher than the other.

  • @wesrihn
    @wesrihn Před 2 lety +261

    Ahhh, the reason I originally subbed to LTT, insane server builds and configs.

    • @theairaccumulator7144
      @theairaccumulator7144 Před 2 lety +13

      Insanely bad and mismanaged server builds

    • @UrielZeptim
      @UrielZeptim Před 2 lety +5

      @@theairaccumulator7144 the point still stands

    • @anona1443
      @anona1443 Před 2 lety

      And lots of dropping expensive hardwares

  • @ahmedanssaien6449
    @ahmedanssaien6449 Před 2 lety

    Sorry to hear that, Linus.
    I learned not to get too attached to my data a long time ago, so I don't use RAID or disk mirroring or back up any of my data in any way, since most of it - if not all - I can download again, so I only store over 14TB of data for ease of access. 😅

  • @onlnagent
    @onlnagent Před 2 lety +13

    It's amazing that a company in the tech field can take such a YOLO approach to backups and still be credible to some.

    • @johnathanera5863
      @johnathanera5863 Před 2 lety

      Because its frankly unimportant for their company. Get that stick out your ass bud.

  • @GTRShaun
    @GTRShaun Před 2 lety +223

    In the takeaways at the end of the video, there was no mention of monitoring. If zfs zed was configured to email somebody/service desk on events like drive failure, this disaster could have been averted by replacing failing drives one at a time as they failed instead of accidentally finding the house of cards your enterprise is built on. Monitoring for failure should have been the most prominent takeaway.
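    As a rough illustration of the monitoring point, here is a small sketch of a periodic health check. It is not ZED configuration (ZED, the ZFS event daemon, is the built-in route the comment refers to); it is an independent, assumed cron-style check that parses the output of `zpool list -H -o name,health`. The pool names in the example are invented, and a pool reporting ONLINE can still have per-device errors, so this is a coarse signal only.

```python
import subprocess


def unhealthy_pools(zpool_list_output):
    """Parse `zpool list -H -o name,health` output.

    Returns {pool_name: health} for every pool whose health is not ONLINE.
    """
    problems = {}
    for line in zpool_list_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, health = fields[0], fields[1]
        if health != "ONLINE":
            problems[name] = health
    return problems


def check_pools():
    """Run zpool on this machine and return the unhealthy pools."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    )
    return unhealthy_pools(result.stdout)


# Example with invented pool names; real output is tab-separated like this.
sample = "vault1\tONLINE\nvault2\tDEGRADED\n"
print(unhealthy_pools(sample))  # {'vault2': 'DEGRADED'}
```

    Whatever sends the alert (email, chat webhook, ticket) should itself be tested end to end, since an alert that never arrives is as bad as no monitoring.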

    • @davidbubble6863
      @davidbubble6863 Před 2 lety +8

      My take away is no system is safe from hard drive failure and owner of system this big should hire someone dedicated to take care of it.

    • @yensteel
      @yensteel Před 2 lety +15

      Thought it was weird too. An email as soon as one drive fails could reduce response time. The number of drives they are handling means the chance of 2 or more failing at the same time is pretty high.
      What about reserve drives to automatically repair when one degrades? Not foolproof but a good start. For bit rot, more frequent scrubbing?
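      The intuition that more drives means a higher chance of overlapping failures can be checked with a simple binomial model. This is a sketch under loud assumptions: drive failures are treated as independent (drives from one batch sharing one power event often are not), and the per-drive failure probability and drive counts below are hypothetical, not measured values for these servers.

```python
from math import comb


def prob_at_least_k_failures(n_drives, p_fail, k=2):
    """Probability that at least k of n_drives fail within some window,
    given an independent per-drive failure probability p_fail for that window.
    """
    below_k = sum(
        comb(n_drives, i) * p_fail**i * (1 - p_fail) ** (n_drives - i)
        for i in range(k)
    )
    return 1.0 - below_k


# Hypothetical: 2% chance per drive of failing before anyone notices.
for n in (15, 60, 120):
    print(n, round(prob_at_least_k_failures(n, 0.02), 3))
```

      The probability grows quickly with the drive count, which is why fast notification and hot spares matter more as an array gets larger.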

    • @glenby2u
      @glenby2u Před 2 lety +3

      even a post power outage check or weekly job for an intern... oh well. once is a mistake, twice is a problem, thrice = low value asset.

    • @rosen9425
      @rosen9425 Před 2 lety

      My thoughts too. File it under "mistakes were made", it's the big locker you can't miss 😁

    • @NumptyMcNumptyface
      @NumptyMcNumptyface Před 2 lety +5

      Not just configure it, also test that configuration. I've worked at a place where the storage system was set up to send an email in case of pending doom. Problem was it wasn't configured correctly, so the emails never reached their recipient.
      How did they find out about the impending doom? Well, the system also gave off a sound alert as well as flashing an LED, which were only noticed when I was given a tour of the server room.

  • @DangerousDac
    @DangerousDac Před 2 lety +195

    Well this "presentation" format certainly has a different energy to it than Whonnock died.

    • @philb5593
      @philb5593 Před 2 lety +31

      The vault is hardly the beating heart of the company that whonnock was, and sounds like this unfolded over the course of days and weeks as Jake found the issues and they are still working on rebuilding the data.
      The vault is just archive data. Whonnock is the in progress projects, and I think at that time Linus said there was no backup.

  • @whitey4986
    @whitey4986 Před 2 lety

    That's rough! As a sysadmin, I winced a lot. Good luck guys!

  • @andreasbrand3191
    @andreasbrand3191 Před 2 lety +1

    that is exactly the reason why I stopped building my own storage servers and got my first Synology like 10 years ago!
    Obviously I have far less storage demand (I've got 4TB of triple-backed-up data and 25TB of nice-to-have original videos and RAW photos backed up once). All secured via parity, auto-scrubbing, snapshot deduplication etc. I've never run into any issue and I've basically distributed more than 20 DiskStations in my family and close friends' circle to people with far less IT know-how than me... and I'm a different kind of scientist with OK-ish hobby IT knowledge.
    There is no way on earth I can build something half reliable and convenient as purchasing a Synology or maybe QNAP and put another one up as backup at my parent's place!

  • @bencoomer2000
    @bencoomer2000 Před 2 lety +227

    You know. It's nice to see someone that handles things like an adult, admit mistakes, acknowledge that some failures aren't simple "that person screwed up", and use it to constructively fix problems

    • @beermarket9971
      @beermarket9971 Před 2 lety +8

      If he was handling this as an adult he would have hired a full-time IT person a long time ago. This is childish.

    • @alias_not_needed
      @alias_not_needed Před 2 lety +7

      @@beermarket9971 Why? It is everyone's own choice how important their data is. If they can live with the loss of some old footage, I see no problem in their actions...

    • @beermarket9971
      @beermarket9971 Před 2 lety

      ​@@alias_not_needed There are plenty of reasons why this is childish in my POV:
      For one, you should value what belongs to you and protect it from predictable breakdown, otherwise you come across as a spoiled child.
      Second, as a CEO you have a duty to protect and save your employees' work. While accidents do happen, when they are caused by a lack of prevention, the people in charge (or the CEO) come across as childish.
      Finally, when a CEO cannot hold someone accountable for data loss (or work loss), it's ultimately his fault and he should just own it. Maybe I missed it, but it didn't quite come across like that.
      I don't want this to come across as negative. I like LTT, it looks like an amazing place to work, and I admire Linus. But this is frustrating to watch...

  • @JeffGeerling
    @JeffGeerling Před 2 lety +571

    I think we all have a lot of cases of 'didn't follow our own advice' in the storage/DR world. Unless it affects your bottom line, backups and DR tend to be lower on the priority list.
    And lower on the priority list usually means either "not configured at all" or at minimum "never been tested before" :(

    • @SodaWithoutSparkles
      @SodaWithoutSparkles Před 2 lety +4

      Always test your backup and fail-safe. There is no use in having a backup if it doesn't work at all.
      Don't just do backups, TEST your backups

    • @Deerhunter360
      @Deerhunter360 Před 2 lety +6

      @Nimki rafa 8 shut up bot

    • @lostphotographs3936
      @lostphotographs3936 Před 2 lety

      As a fellow Repair and Recovery guy in the SS world we sell hundreds of drives globally to guys in that very situation. TRUST ME !
      new new vault...... " vault 3 " ..... 😇

    • @ImAManMann
      @ImAManMann Před 2 lety +1

      I always follow my advice for backing up data because there is a simple rule... if you back up your data, you won't need the backup, if you don't back up your data you WILL need the backup.

    • @waspennator
      @waspennator Před 2 lety +1

      Backups and UPS should be essentials at this point, lost drives on my old comp cause I had the "bright" idea to use it in the middle of a bad wind storm with only a surge protector.

  • @jasonlevi7030
    @jasonlevi7030 Před 2 lety +15

    Sure sounds like a good time to make a video about how tape drive systems aren't as obsolete as many might think and maybe even get yourself a super cool tape robot!
    You could also dig into data reconstruction/recovery software to see what you can pry out of the drives you've pulled and maybe try out the old "HDD in a freezer" trick.
    There you go. Two new video ideas (that I'd love to see presented by Jake and Anthony respectively) to hopefully recoup some of the costs of this oversight.

    • @mrmotofy
      @mrmotofy Před 2 lety +1

      I used to think tape drives were old... I recently saw a tape drive with TBs of capacity or something... guess I was wrong

    • @pof1857
      @pof1857 Před 2 lety +2

      @@mrmotofy LTO-9 is 18TB/tape.
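      For a sense of scale, the tape count for an archive follows directly from the per-tape capacity. A small sketch: the 18 TB default is the native (uncompressed) LTO-9 capacity quoted above, while the 1000 TB archive size and the two-copy policy are illustrative assumptions.

```python
from math import ceil


def tapes_needed(data_tb, tape_capacity_tb=18.0, copies=1):
    """Number of tapes needed to hold data_tb, written `copies` times.

    Assumes native (uncompressed) capacity; video rarely compresses further.
    """
    return ceil(data_tb / tape_capacity_tb) * copies


print(tapes_needed(1000))            # 56 tapes for one copy of 1 PB
print(tapes_needed(1000, copies=2))  # 112 tapes for two copies
```

      Even with a second copy kept offsite, a petabyte-class archive fits in a modest number of cartridges.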

  • @gueroloco8687
    @gueroloco8687 Před 2 lety

    I like Anthony, but everyone has a job to do and does a wonderful job on the videos!! Thanks so much for the honest reviews as well!!!

  • @moralapostel
    @moralapostel Před 2 lety +402

    Big mistake to immediately replace the drives that weren't even dead and just showed some failures. By removing them, LTT removed all the (still good) parity data on those. Probably should've run a scrub first, and then removed the possibly malfunctioning drives.

    • @hallif7295
      @hallif7295 Před 2 lety +2

      Wouldn't that take a long time tho?

    • @bkrich
      @bkrich Před 2 lety +2

      Yeah I was thinking the same

    • @AyoKeito
      @AyoKeito Před 2 lety +8

      I'm pretty sure those wouldn't survive a scrub either.

    • @bkrich
      @bkrich Před 2 lety +29

      @@AyoKeito we wouldn’t know that for sure but we do know it didn’t survive the replacements

    • @dracotrapnet
      @dracotrapnet Před 2 lety +6

      If they are offlined, they are already dirty parity data.

  • @TristensMadness
    @TristensMadness Před 2 lety +322

    Please be server room related. I’ve been craving some of that content recently

    • @toxicxshotsx
      @toxicxshotsx Před 2 lety +13

      Me too man!! Also ^s/o to the milfs in the 20 mile radius comments ahah

    • @williamprimeee
      @williamprimeee Před 2 lety +2

      yeah we all wana see his server ;)

    • @gabrielrojasg.3180
      @gabrielrojasg.3180 Před 2 lety +1

      I started following Linus by server content haha

    • @SuperNGLP
      @SuperNGLP Před 2 lety +4

      You just have to wait until something goes wrong and boom new server content!
      Maybe we pay seagate to send Bad drives, so we get new content sooner?
      Sounds like a good, reasonable idea.

    • @frozenturbo8623
      @frozenturbo8623 Před 2 lety +1

      Wait until Seagate fails again in Vault 3 then we have Vault 4 until we got into Vault 76 and That marks the End of Seagate.

  • @DrTune
    @DrTune Před 2 lety

    Thankyou for your honest clowning around with this stuff

  • @arsixorus
    @arsixorus Před 2 lety

    im sharing this video with my team, this needs to be seen by all in IT

  • @captdev
    @captdev Před 2 lety +523

    As an operations engineer, the amount of red flags that the process you followed here brought up was terrifying. Please write processes for this sort of stuff and test them - it's all fun and games till you lose something essential because of a stupid decision from 5 years ago

    • @williameldridge9382
      @williameldridge9382 Před 2 lety +32

      Not to mention they used Seagate drives. They are just completely unreliable. I wouldn't trust them in any circumstance. I've replaced hundreds of Seagate drives due to failure, but only a handful of WD/Hitachi. It isn't surprising, as Seagate purchased the worst hard drive company that ever existed, Maxtor. And they didn't learn their lesson, they got even more Seagate drives.

    • @jrdemasi
      @jrdemasi Před 2 lety +13

      Why anyone trusts this guy for basically anything is beyond me. Lol.

    • @mikex4941
      @mikex4941 Před 2 lety +7

      @@williameldridge9382 Got a different experience. I'm still rocking Seagate and WD drives while all of my Hitachi drives from the same era as all my other drives died. But not sure right now though.

    • @esbekay
      @esbekay Před 2 lety +2

      seriously, its hard to watch

    • @JLeYang
      @JLeYang Před 2 lety +20

      @@williameldridge9382 Hard drive manufacturers have all had bad batches, it's just the nature of the beast now. I have had failures from all brands in usage. You should see hard drives as a consumable (especially as a storage array), run SMART and replace when health is detected as bad. The bigger issue is people not doing backups, that's a failure on you and your users to not enforce that.

  • @perrygolden
    @perrygolden Před 2 lety +169

    When your downtime and data loss is measured in lost $, hiring full time systems engineer becomes a very attractive value proposition.

    • @gabrielenitti3243
      @gabrielenitti3243 Před 2 lety +13

      I don't think any of this will produce any downtime for his company. The petabyte worth of data he may lose, as he said, is just a "nice to have". It's not the actual production server where they store the current projects and videos. His employees may not even know about this data loss.

  • @brice0403
    @brice0403 Před rokem +11

    When Linus says that something is "nobody's fault" it usually means it was his fault 😂

  • @ebonhawkarmory9681
    @ebonhawkarmory9681 Před 2 lety

    Been watching since the NCIX days. Just bought a WAN hoodie, I've always wanted to support this channel. Matches my business colors for Comic-Con! If you offered the Stealth in an orange/black color scheme I would totally buy a bunch for my crew!

  • @Unreasonable0ne
    @Unreasonable0ne Před 2 lety +386

    I'm just wondering why LTT didn't go for tape storage for their servers, since, as Linus said himself, it was for archival purposes and more of a fun project to test out the tech they got. They even got a tape drive some time ago afaik. It doesn't make sense to keep the drives spinning for years if they are not actively used or maintained.

    • @PanKosiu
      @PanKosiu Před 2 lety +44

      Basically this. it was the first thing I thought of. If the archive data never changes, tapes will be crazy cheap way of backing up old videos.

    • @Stasiek_Zabojca
      @Stasiek_Zabojca Před 2 lety +29

      Because they probably want to have quick access to it, I think... To cut something out of old video and things like that? As far as I know, tape storage does not give you that luxury.

    • @aoeuable
      @aoeuable Před 2 lety +27

      @@Stasiek_Zabojca You could store lower-bitrate stuff on fast storage for browsing and only get the tape out when you need access to the original files.

    • @666Tomato666
      @666Tomato666 Před 2 lety +14

      Tape storage is cost competitive on the level of multiple petabytes, not single petabytes.
      So it's nothing that any significant minority of viewers will ever see in person, let alone be part of decision making process to buy, install or configure.

    • @beid777
      @beid777 Před 2 lety +8

      Because he'd rather have "dope hardware" instead of using tape. If they need access to it that's fine, every week or month or time frame you do a fresh backup to tape and keep your servers running for access and have tapes as backup. He failed to implement backup in depth which is basically industry standard.
      Archive is not backup. Redundant and separated storage of data is backup.

  • @MOLINE7708
    @MOLINE7708 Před 2 lety +35

    Bro, hire a dedicated sys admin. You have too many employees that rely on your server infrastructure to yolo everything yourself. You mention that you, Anthony, and Jake work on it, but they also are writers. You have enough data and infrastructure to warrant a dedicated and experienced sys admin at this point

    • @peterpain6625
      @peterpain6625 Před 2 lety +9

      I wouldn't want that job. They'll go behind his/her/their back at any opportunity anyway because "it's faster that way" or "reasons". The way LTT grew, the IT-guy job is a surefire way to get PTSD now ;) No way they can establish any structure now.

    • @outofahat9363
      @outofahat9363 Před 2 lety +2

      @@peterpain6625 they know enough to be dangerous

    • @peterpain6625
      @peterpain6625 Před 2 lety

      @@outofahat9363 They know a lot in some areas and go full Dunning-Kruger in others ;)

  • @mattpallotta
    @mattpallotta Před 2 lety

    Ran into a similar scrub issue at the end the of the last year, took over a month to get through the rebuild and scrub.

  • @LiveNobin
    @LiveNobin Před rokem

    @LinusTechTips It feels very sad to hear about the data loss. I can understand your feelings. Don't worry, everything will be fine with your new server 😊

  • @seriphim8542
    @seriphim8542 Před 2 lety +124

    At that density and the infrequency of the older data being updated you really should consider acquiring a tape library. A couple iSCSI targets and a 250 slot LTO library would keep you until you more than double your current use. But considering the increasing file sizes of the raw files you're ingesting I would recommend going for a 3-3.5X scaling.

    • @grrkaa8450
      @grrkaa8450 Před 2 lety

      A 250 slot library for what? 3 PB of direct access tape storage?

    • @killer2600
      @killer2600 Před 2 lety +5

      Tape is slow. I think the whole point of their setup is for fast access to footage new or old for editing purposes. If they were just hanging on to it for keep sake then Tape is an option but I think they keep it so they can retrieve previous footage on-demand to splice into the current video being edited.

    • @joross8
      @joross8 Před 2 lety +11

      ​@@grrkaa8450 Tape is slow, but much cheaper per TB.
      Typically you would have a hybrid system where users interacting with the data would hit high speed disk storage of some sort, and that disk storage would be running software that would migrate copies of files, or just less accessed files to tape.
      It's effectively the best of both worlds, users have the speed and accessibility of high speed storage, but the high speed pool is much smaller, and most of the archival data is on less expensive tape drives. The only time you hit a slow down is when a user has to access the stuff on tape which would be normally pulled when the user accessed a stub file representing the file on the disk pool.

    • @animefreak5757
      @animefreak5757 Před 2 lety +5

      @@killer2600 so do both? use tape as a economical backup option.

    • @MDKAOD
      @MDKAOD Před 2 lety +8

      @@grrkaa8450 Why keep the data in hot storage at all? Archive to tape (not backup) toss it in a fire safe.

  • @sasidharasarma8625
    @sasidharasarma8625 Před 2 lety +150

    Team: Our data is gone
    Linus: So we got our content for today’s video

    • @schmitt00
      @schmitt00 Před 2 lety +4

      and quite a couple more

    • @JamezMartinez
      @JamezMartinez Před 2 lety +2

      as long as they do not lose the data for that video too...

    • @ProTechShow
      @ProTechShow Před 2 lety +1

      I do like this about LMG. I've been called in to help with several incidents of a similar nature and the level of stress as people see their livelihoods on the line can be pretty extreme. The fact that LMG can just make lemonade out of it is quite refreshing (pun not intended).

  • @dctech4432
    @dctech4432 Před 2 lety +2

    Y'all spend A LOT of money on redundancy for data, how about allocating "a reasonable amount of money" to redundant power backup strategies? Generators, solar panels, enterprise UPS with some SLA battery banks, or a nice LiPo/LiFe array. Buy yourself some time, with a big enough buffer for power outages. Do an energy audit of what absolutely must never lose power, and consider your options. Custom automating your alternative power sources, or even offloading your grid expenses with alt energy, would pay off in MANY ways. You have a roof on that building, load it up with some panels. It would make a supreme video series as well!

  • @ReverseCity777
    @ReverseCity777 Před rokem +1

    You can tell he just finished yelling at all the staff, got pissed and did this video :)

  • @Tetraknot
    @Tetraknot Před 2 lety +90

    Love your show! Just wanted to chime in here coming from an IT background supporting large companies in datacenters as well as being a content creator. Trying to maintain an accessible RAID of ever growing content only gets more difficult and expensive over time. You will eventually need a full time employee to manage your content if you go this route and at some point you will need to migrate your entire content to a new RAID when 1 petabyte isn't enough anymore and that's not going to be fun.
    The alternative cheaper and simpler solution is to archive your content to tape which will have a much higher chance of surviving the years to come as it's not on spinning platters that run 24/7. Yes, getting access to a piece of content you want to grab on short notice will be more annoying but you can always keep a smaller RAID with your completed videos and archive your raw content via tape as it's the RAW video content that really eats up the TB which is why you might want to consider archiving your raw video.

    • @pixelmaster98
      @pixelmaster98 2 years ago +8

      just build a giant data center that uses robots to automatically fetch & read tapes, so it's at least automated, even if it still takes half an hour. Building a data center is probably also great content for the channel ^^
      /s

    • @alextraska
      @alextraska 2 years ago +6

      @@pixelmaster98 yea until crash override and acid burn have a hacking battle with your tape robots

    • @zicklane
      @zicklane 2 years ago

      Ok no one asked

    • @blademan7671
      @blademan7671 2 years ago +3

      This response from a pro is why you would leave a job like this to pros. As this pro demonstrated, #1 is identifying and understanding the requirements. Do you really need all your old content available online, or is offline good enough? Then design the solution to fit the needs.

    • @geoff_cline
      @geoff_cline 2 years ago

      This could also be done with AWS Glacier

  • @viridisdraco
    @viridisdraco 2 years ago +229

    Linus, I used to tell my loved ones, "There are two kinds of people in the world: those who have a backup and those who wish they had." I used to work in storage rack support and I've seen the worst of the worst, including a 24-hour straight marathon to restore a super critical one. But I've also seen a storage rack with all its capacitors blown due to a lightning strike that fried a little unprotected datacenter.
    So... are you hiring a full-time IT person now? :P

    • @JoeBlow-ub1us
      @JoeBlow-ub1us 2 years ago +25

      lol this guy is like, "Where do I send my resume?"

    • @calebdevore3395
      @calebdevore3395 2 years ago +11

      @Telleva You deleted their data, and blamed them for not backing it up..?

    • @AlexAlex-jk2tn
      @AlexAlex-jk2tn 2 years ago +5

      Actually, there are three kinds of people in the world: those who have a backup, those who wish they had, and those who check that it is possible to restore data from the backup. Lots of companies think they have backups, but they have never tried to restore data from them, and it is possible that their "backups" are not recoverable. Just try to restore data from your backup and you might be unpleasantly surprised.
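
A restore test like the one described here can be as simple as restoring into a scratch directory and comparing checksums against the originals. The sketch below is only an illustration: the function names `file_digest` and `verify_restore` are made up for this example, and it assumes both the source and the restored copy are reachable as local directories.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        # A file counts as a problem if it was not restored at all
        # or if its contents no longer match the original.
        if not restored.is_file() or file_digest(src) != file_digest(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `verify_restore(Path("/data"), Path("/restore-test"))` (both paths hypothetical) would mean every source file came back byte-for-byte; anything else is the unpleasant surprise.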

    • @ToothlessSnakeable
      @ToothlessSnakeable 2 years ago +1

      @Telleva I have my stuff saved on iCloud and Google Photos

  • @glock21guy
    @glock21guy 2 years ago +15

    I don't really think this was an issue of not having a "tech person", or "not having time" to set things up. It was simply an oversight. Setting up scrubs and SMART alerts doesn't take long, and you certainly don't need a full-time person sitting around waiting for trouble notices from monitoring applications.
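
To give a sense of how little is involved, here is a rough sketch of a periodic health check. It assumes the pools are ZFS and that the `zpool` and `smartctl` command-line tools are installed, which this comment does not state. The function names and device paths are hypothetical, and the matched strings are the usual healthy-state messages from `zpool status -x` and `smartctl -H`.

```python
import subprocess


def pools_healthy(zpool_status_x_output: str) -> bool:
    """True if `zpool status -x` reported no problems."""
    return "all pools are healthy" in zpool_status_x_output


def smart_passed(smartctl_h_output: str) -> bool:
    """True if `smartctl -H` reported a passing overall health status."""
    return ("test result: PASSED" in smartctl_h_output      # ATA/SATA wording
            or "SMART Health Status: OK" in smartctl_h_output)  # SAS/SCSI wording


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout; non-zero exit codes are ignored."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def check(devices: list[str]) -> list[str]:
    """Collect alert messages for unhealthy pools or drives."""
    alerts = []
    if not pools_healthy(run(["zpool", "status", "-x"])):
        alerts.append("zpool reports a problem")
    for dev in devices:
        if not smart_passed(run(["smartctl", "-H", dev])):
            alerts.append(f"SMART check did not pass on {dev}")
    return alerts
```

Run from cron or a systemd timer with something like `check(["/dev/sda", "/dev/sdb"])`, with the result mailed to someone who will read it, this covers the alerting half. The scrub itself would be scheduled separately (for ZFS, `zpool scrub <pool>`).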

  • @ma_er233
    @ma_er233 2 years ago +2

    Linus: USB naming scheme sucks!
    Also Linus: New New New New New Vault

  • @henningbutz2289
    @henningbutz2289 2 years ago +178

    Let's set this straight: There are more backup options than local spinning disks and cloud storage. The cheapest way would be an LTO tape library. An LTO-8 tape (12 TB of uncompressed storage) is about 50-100€, which is only a fraction of the cost of spinning disks. Also, tapes are archival grade and can be labelled and stored on a shelf somewhere. As their backup files don't really change, you could just put a few projects on one tape and chuck it in the warehouse.
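
For a rough sense of scale, the tape figures in this comment can be turned into a quick calculation. The 1 PB archive size is an assumption made for this example (decimal terabytes), and the result covers media only, not the tape drive or library.

```python
import math

TAPE_CAPACITY_TB = 12        # LTO-8 native (uncompressed) capacity, from the comment
TAPE_PRICE_EUR = (50, 100)   # low and high price per tape, from the comment
ARCHIVE_TB = 1000            # assumption: a 1 PB archive, in decimal TB

# Whole tapes needed to hold the archive without compression.
tapes_needed = math.ceil(ARCHIVE_TB / TAPE_CAPACITY_TB)

# Media cost per TB and for the whole archive, at the low and high price.
cost_per_tb = tuple(round(price / TAPE_CAPACITY_TB, 2) for price in TAPE_PRICE_EUR)
media_cost = tuple(price * tapes_needed for price in TAPE_PRICE_EUR)

print(f"tapes needed: {tapes_needed}")                                  # 84
print(f"cost per TB: {cost_per_tb[0]}-{cost_per_tb[1]} EUR")            # 4.17-8.33 EUR
print(f"media cost for {ARCHIVE_TB} TB: {media_cost[0]}-{media_cost[1]} EUR")  # 4200-8400 EUR
```

Under those assumptions a petabyte fits on 84 tapes for roughly 4,200-8,400€ in media.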

    • @thomasphillips885
      @thomasphillips885 2 years ago +10

      Yeah he's done a video about tape storage before

    • @7eis
      @7eis 2 years ago +14

      This is not the logic channel

    • @bostjanko
      @bostjanko 2 years ago +2

      You must be old :-), like me.

    • @markm4120
      @markm4120 2 years ago +16

      Yep, the system my team and I designed included LTO with 2 robotic libraries. Archival data doesn't belong on a hard drive.

    • @SierraLimaOscar
      @SierraLimaOscar 2 years ago +16

      While I agree that the archive shouldn't be on spinning disks, long-term storage of tapes is an issue in itself. It requires regular maintenance, climate-controlled warehousing and copying every few years. I work in broadcasting and I have only seen deep archives done correctly maybe once in my career. I have quoted archival systems several times, and the face customers make when they see the numbers and are then informed that the quote does not include any recurring and ongoing operational costs is always funny (not really).

  • @tkirchmann
    @tkirchmann 2 years ago +280

    (oversimplified) Summary: The power dropped out a bunch of times and LTT dropped the ball on configuring the servers so the servers dropped a bunch of errors before dropping physical drives out of the servers resulting in the servers permanently dropping some data... I see a familiar pattern here.

    • @RippahRooJizah
      @RippahRooJizah 2 years ago +3

      HOLD IT!
      I'm not sure what you are getting at.

    • @ZNotFound
      @ZNotFound 2 years ago +20

      At least they get to drop a new video about it.

    • @sushimshah2896
      @sushimshah2896 2 years ago +5

      Would've been nice if (Mass)Drop sponsored then as well

    • @Thefreakyfreek
      @Thefreakyfreek 2 years ago +4

      Linus drop tips

    • @fallenscsl
      @fallenscsl 2 years ago +5

      how are we supposed to trust linus' tech tips if they keep dropping the ball :(
      But at least they show us!

  • @taleg1
    @taleg1 2 years ago

    I've noticed bit rot on my 50 TB system after only a few years, and with the RAID hardware being older I'm not even sure if there is a recovery-scrub type of function. I would love for the RAID to check and fix things in the background, but sadly it's a money and available-hardware issue, plus the sad fact of being self-taught out of need.
    This video, though, just showed me that I need to dig into it a lot more and actually start to look for a better solution. Thankfully we rarely see power loss in this area: at worst maybe a few outages a year due to lightning hitting a box that serves a lot of houses, or a construction crew accidentally digging up and breaking an important buried wire.
    I'm now at a point where I need more space, and the plan was to add another 100 TB to the system. It's already got the connections to handle it, so there's no worry there, but the bit rot issue is a worry, and if there's a fix, maybe I should be thinking about going another way entirely. The only question is what other option is available in my country?
    The sad truth here is that a lot of tech is hidden behind an "only professionals need this" limit. It's like you can only buy certain things if you use said thing in your work. It's completely daft, but also a tiny bit understandable when it comes to 240 V systems and a few other things.
    Then there are the local prices, where rack-type disk storage costs at least 4 times more than me jury-rigging my own solution. It may not be perfect, but it gets the job done, and I can barely afford doing it myself. Right now I'm stuck waiting for the local online supplier to get in enough hard drives of the type I need. The issue there is that due to covid it's been on a year-long estimated delivery date... ARGH

  • @adonayperezmorejon785

    goes up every boy the day to channel

  • @NoProHarrie
    @NoProHarrie 2 years ago +408

    Moral of this story: hire an IT specialist already, Linus.

    • @InventorZahran
      @InventorZahran 2 years ago +17

      Linus: "I am the IT specialist."

    • @misham6547
      @misham6547 2 years ago +7

      Or a cybersecurity expert, would make for really interesting videos

    • @gorkskoal9315
      @gorkskoal9315 2 years ago +2

      ^^^^^^^^^^^^^^^^

    • @ticler
      @ticler 2 years ago +5

      They can very well afford midrange EMC or NetApp storage systems that will be more stable and may be as performant as these toy storage servers.

    • @Carcinogenic2
      @Carcinogenic2 2 years ago +3

      @@ticler
      They can rot as badly as the 'toy' storage servers do. All it takes is for them not to get attention. And where would the many hours of fun content about it go?

  • @Uhn_Tis_Uhn_Tis_Uhn_Tis_Baby

    Linus, “We need a full time IT person, we keep losing data”
    Linus - Doesn’t hire a full time IT person.
    Also Linus, “I’m going to build another storage server with EVEN more storage.”
    IT professionals, “hey, I’ve seen this one before!”

    • @gorkskoal9315
      @gorkskoal9315 2 years ago

      lol well anyone really. lol. Something something something something insanity.

    • @astronemir
      @astronemir 2 years ago

      The IT challenges are content

  • @aa664_
    @aa664_ 2 years ago

    sorry for your loss

  • @dschwartz783
    @dschwartz783 2 years ago

    and now hopefully everyone understands the value of a good IT department. Much of each day is spent monitoring equipment, dealing with alerts, making sure things are up to date, etc. You gotta have someone doing that full-time if you want your stuff to stick around.

  • @jblyon2
    @jblyon2 2 years ago +43

    I've been through a number of mergers and acquisitions over the past 10+ years. On every single one the IT dept/employees who do IT tasks for the other entity have been running without viable backups, server monitoring, out of band management, or alerting. Most also lacked UPS units (or working UPS units), and one was even running RAID0 on a production server and couldn't figure out why it kept failing on them. It's a scary world out there.

  • @myname7021
    @myname7021 2 years ago +149

    10:30 and most importantly: monitor your environment! SNMP, Syslogs and even specialized monitoring agents are an easy way to monitor your environment.

    • @grrkaa8450
      @grrkaa8450 2 years ago +3

      PRTG has entered the chat

    • @towel2473
      @towel2473 2 years ago +15

      The irony is that they advertise these products in segues but, it seems, don't implement them.

    • @BTMikeMan
      @BTMikeMan 2 years ago +6

      @@towel2473 I was going to say, did they not have Pulseway deployed :)

    • @szt1980
      @szt1980 2 years ago +1

      Rather messages from SMART and HBA utilities.

  • @nevertakeadayoff
    @nevertakeadayoff 1 year ago

    i will refrain from saying anything negative because i appreciate your honesty.

  • @Consequator
    @Consequator 2 years ago

    Please document all that recovery effort! It'd make for an interesting video for sure.