Why Japan's Moon Lander Crashed Due to An Unbelievable Computer Bug

Scott Manley

902,577 views

Added on 25 May 2023
The investigation into why the Hakuto-R lander crashed into the moon last month after an otherwise perfect mission has revealed the answer: The software encountered unexpected terrain and didn't believe what the sensors were showing, so it started ignoring them.
iSpace Report
ispace-inc.com/news-en/?p=4691
LROC images of the crash site:
lroc.sese.asu.edu/posts/1302
Follow me on Twitter for more updates:
/ djsnm
I have a discord server where I regularly turn up:
/ discord
If you really like what I do you can support me directly through Patreon
/ scottmanley
Science & Technology

Comments • 3.1K

@mcarpenter2917 1 year ago ⁺³²¹⁵
That's what happens when you keep changing the software specs of a project. It's a bit hard to believe that they changed the landing site without rerunning the simulations.
@MarlinMay 1 year ago ⁺¹³³
This! This all day.
@pjotrtje0NL 1 year ago ⁺¹²⁰
Your first remark is very true, and not just in an aerospace environment!
@Powertampa 1 year ago ⁺²¹⁸
That's like releasing software without doing unit tests right after the remote guy pushed ten thousand lines of code
@ailivac 1 year ago ⁺²²⁸
I feel like something in the sensor processing design isn't fundamentally robust enough if it can be this easily confused by real terrain features. Maybe they can add a second radar or lidar sensor for dissimilar redundancy or to differentiate unexpected yet real inputs from sensor faults.
We all know what happens when you run a safety-critical algorithm on a single AoA sensor...
@Ni999 1 year ago ⁺⁵⁸
Exactly! Mission creep eats into the project timeline and system tests degrade into delta testing for success instead of system testing for non-failure.
@kjgoebel7098 1 year ago ⁺⁹⁰²
I'd like to see an episode of "Things KSP Doesn't Teach" about instrumentation. How air/spacecraft instruments work, their limitations and quirks, and how they can fail.
@JohnWilliamNowak 1 year ago ⁺³⁹
I'll second that.
The Soviets had a number of uncrewed vehicle losses because they used ionic sensors to determine the orientation of the vehicle, which would fail on occasion. On the other hand, the gyroscopes aboard Apollo 13 held true despite being pushed well outside their comfort zone. Some sort of video about orientation sensors would be very enlightening.
@BabyMakR 1 year ago ⁺⁴
Yes please!!! We need more of those videos please Scott.
@ferdievanschalkwyk1669 1 year ago ⁺³
Another vote. I see it in formula racing where drivers are having to "fail" various sensors to address issues with the power train.
@eekee6034 1 year ago ⁺²
Yep, me too. I think I'm aware of the issues already, but I'd like to know how different real sensors would be.
@Spacedog49 1 year ago ⁺⁶
@ferdievanschalkwyk1669 As a former Formula 500 driver, the fastest lap times were NOT the shortest distance lines around the track. A computer simulation takes the shortest distance, while the faster drivers took a slightly longer, but faster path that defied logic.
@miroslavhoudek7085 1 year ago ⁺¹¹¹⁹
In my personal experience, people insufficiently care about aerospace software. I worked in a software company that worked for ESA and we were always pretty much ignored (e.g. in all presentations of our local space agency). But when some other company made a screw for a satellite, it was plastered all over their presentations. There were literal delegations going to take a look at the space-screw-producing machine. Such an interesting visit, you see, to a hall with machining equipment, clean rooms - that's the "space stuff" in people's minds. Something you can touch and see. How do you brag about a company with people sitting at their PCs? Nobody cares. Even if these are the guys whose work ultimately decides whether these magical screws end up doing something or are splattered over the moon.
I don't care about the publicity but it's the mindset. Everyone focuses on the aluminum this and titanium that - and software is always the afterthought. We can change that anytime. We can even send an update to space ... so, why should we think about it too hard? Bam!
@deang5622 1 year ago
Good point.
And I think it is because it takes a higher level of intelligence and technical knowledge to understand software systems and the media and others can't understand it.
You only have to look at any news article published by the mainstream media, on television, in newspapers and you will see the errors the journalists make, the incorrect use of terminology, the lack of detail and you walk away realising the news article has told you almost nothing.
@jayasuriyas2604 1 year ago ⁺²⁰
oof
@windywaz 1 year ago ⁺¹³⁷
Boom! As a retired architect for space sensor payloads, I can say you are spot on. I watched management spend all sorts of money on convenience tooling but if SW wanted licenses for software production and testing tools, oh God, you got run through the gauntlet.
So how many times must a company learn these lessons? Simple, once per program.
@Mernom 1 year ago ⁺⁵⁵
It's the same attitude all over the place. Games no longer ship out as completed projects... 'we can just patch it later'.
Many other fields also do shit like this.
@B_dev 1 year ago ⁺¹⁹
Software in general too
@rhymereason3449 1 year ago ⁺⁴⁰⁰
It fascinates me that as you look at the history of disasters how many of them are ultimately caused by cutting corners to meet time pressures or budget targets. In this case you have to wonder (A) why the target zone was changed late in the game, and (B) why simulations with the new target zone weren't run. I would bet a dollar that engineers thought of it, but they were overruled because of time pressures or a budget target.
@aarondavis8943 1 year ago ⁺⁷
Your question (A) is a great one.
It could be that the new landing site could be reached with less expenditure of propellant or something like that. They thought it was a lower margin of error. Or was it the opposite? Was there a "better" more ambitious site with more interesting geography?
@rhymereason3449 1 year ago ⁺²
@aarondavis8943 It is interesting to speculate. IMHO "less expenditure of propellant" would fit into the theory about disasters and cutting corners to meet budget targets. On the "better geography" thought... unless an asteroid suddenly impacted an area close to their original site, one would think that the geography question would have been settled long ago... the lunar surface is pretty well documented (at least the front side).
@pierQRzt180 1 year ago ⁺⁷
Proverbs have a sort of statistical truth. "Haste makes waste" exists exactly due to that.
The sad part is that seemingly we keep making the same mistakes
@rhymereason3449 1 year ago ⁺³
@pierQRzt180 Yes it is sad to think about all the people who have lost their lives due to decisions on someone's part to save a few bucks by cutting corners. One of the latest examples appears to be that partial collapse of the apartment building in Davenport. It looks like the owner went with a cheaper contractor who would forego shoring up the building before proceeding.
@Beregorn88 1 year ago ⁺⁷
And (C) why there weren't redundant systems with a majority check before deciding to discard the most vital part of your data...
@maurice_walker 1 year ago ⁺⁴⁸¹
In their official debriefing, ispace actually admitted that it's primarily a (project / program) management issue, not an engineering issue. That gives me hope that they might actually learn something from this.
@rspawn 1 year ago ⁺²²
most underrated comment
@curtislowe4577 1 year ago ⁺²²
Life imitates art: a common problem in the Dilbert comic results in utter failure.
@philkarn1761 1 year ago ⁺⁵⁴
It's almost *ALWAYS* a project/program management issue, not an engineering issue. This was also true for Mars Polar Lander and for Mars Climate Orbiter (the one that famously mixed up imperial and metric units).
@tomhenry897 1 year ago ⁺²
Don’t bet on it
@SayAhh 1 year ago ⁺¹⁴
@Josh_728 Get with the program: in 2023, we measure things in bananas
@ReverendTed 1 year ago ⁺⁵⁶³
It continues to amaze me that we managed to safely land astronauts on the moon AND have them take off from the lunar surface and return home, several times. Obviously, having actual humans present makes a ton of difference, but the number of things that could have gone wrong but didn't is mind-boggling.
@MarlinMay 1 year ago ⁺²³¹
The brain is a wonderful flight computer.
Lander: I'm going to land here.
Human: Dummy, there's a rock the size of a McMansion there! Gimme manual control.
@a4d9 1 year ago ⁺¹⁶⁹
The first moon landing was saved by the astronauts: the automation on the lander was going to put it down in a field of big boulders.
@unflexian 1 year ago ⁺⁹²
think about it like this: humans have managed to control powered airplanes since the start of the 20th century, while autonomous aircraft have only just appeared in the last decade or two. humans are just that versatile
@raifikarj6698 1 year ago ⁺³¹
@MarlinMay I am howling. I pictured this in my head with an astronaut slapping their computer and calling it dumb.
@technocracy90 1 year ago ⁺¹¹⁰
One of the NASA research reports justified the cost and risk of sending human astronauts to the Moon with a line that says "Human brain is the most lightweight and easy-to-acquire real-time non-linear computer"
@robertbarron7660 1 year ago ⁺⁹⁹
It's very interesting that this is almost the exact reverse of the famous 1201 alarm on Apollo 11.
In that case the computer restarted and generated errors on the astronauts' control panels. But because they knew that they were at the right altitude per the flight plan they had confidence that they were still flying correctly and Neil Armstrong brought the lander down safely.
@warrenpierce5542 1 year ago ⁺⁸
The source of the 1202 and 1201 alarms was traced to the rendezvous radar, used for rejoining the command/service module, which was inadvertently left on while the landing radar, the only one needed for the descent phase, was running. This overwhelmed the lunar module computer, but mission control knew it was still safe to land because of one man at Houston.
@robertbarron7660 1 year ago ⁺⁵
@warrenpierce5542 yes, when you go into the details then these are different cases. But in the abstract, in both cases the computer was confused because it got signals which were unexpected and didn't handle them well. In Apollo's case, the human was able to use additional information to recognize that the problem wasn't severe and in this case - there was no human.
@larrybud 1 year ago ⁺¹
@warrenpierce5542 In Mike Collins' excellent book, he mentioned the 1201 and 1202 weren't exactly "well known" issues. Took a bit of "looking up" (quickly, albeit)
@richardmogie9675 1 year ago
That second antenna wasn't inadvertently left on. In an interview, I saw Buzz sheepishly confess that the engineers didn’t think the same way he did.
@purnachandran87 10 months ago ⁺²
Just realized that manned missions are technologically easier (skill of pilot) than unmanned soft landings that are possible now due to the progress of software systems.
@subhakantagmail 9 months ago ⁺⁶
Finally the software bug is fixed and the Vikram lander from Chandrayaan-3 landed safely on the lunar surface by ISRO. Hope most of the space agencies share data among themselves so that space progress is accelerated faster, instead of each one reinventing the wheel. Knowledge for Humanity...👍
@aparnagadde6542 9 months ago
Not a software bug... Many things are going on during landing...
@henrikibjensen3869 9 months ago
Sorry, Humanity doesn't land on the Moon, nations do - or don't.
@_Mentat 1 year ago ⁺⁶²¹
My experience of being a software engineer is that the code has to be tested every time. It's amazing how often things that can't go wrong do go wrong.
@hanskloss7726 1 year ago ⁺⁶
It is not the software that changed but the parameters of the flight. You may of course argue that the software was made for the particular landing zone, which I do not buy.
I may be mistaken as the video is the only source of my knowledge of the situation - sort of like this radar was. So you take a peek at the surface with radar and see this crater with it or rather a human having visual would have seen the crater - the landing module saw just a point on the surface which was 3km higher than the previous point it peeked at. I suspect what they would have needed to do is to have more points that radar is measuring especially from distance and make an average out of it or use some other technique to see where one is. When much lower this would also need to be done to see if there is no big stone occupying part of the landing zone. I suppose this last thing was eliminated by assumption that the landing is going to be done on the flat empty surface by choice of the mission control. I suspect if they were landing on the water/liquid surface this radar error could only occur due to a massive tsunami - well no water surface and no tsunamis but hard landing.
Interesting to know all this tho, ain't it?
@simonmultiverse6349 1 year ago ⁺²⁷
Been there! Written lots of software... made some unbelievable bone-headed mistakes, which are all *BLINDINGLY OBVIOUS* in retrospect. "This change is SOOOOOOOOOO OBVIOUS that we don't need to test it" ... ha ha ha... this is when reality bites you on the backside, informing you that you definitely *DO* need to test it again.
@simonmultiverse6349 1 year ago ⁺⁴
@hanskloss7726 HA! Then you discover it's high tide instead of low tide... maybe you simulated it with mean sea level but a mile away someone opened the sluice gates and there was a large wave from the reservoir... etc.
@roguedrones 1 year ago
This moon lander crash is an example of space sabotage. Deliberate.
@hanskloss7726 1 year ago ⁺¹
@simonmultiverse6349 low tide v. high tide does not cut it here - the surface is mostly flat still at least from a 5km perspective. The crater is a different story so you need to have many points possibly also a map? Not sure what is easier here but their method obviously failed.
We know this is not a shame - we all have been there....
@martinmacphee3262 1 year ago ⁺⁵³⁵
Scott - great video as usual - thank you!
But really this is not a software 'bug', is it? It's a systemic design and control failure. The software was designed to work as it did, but the specifications do not seem to have included passing over a crater like this. In other words the initial flight plan was intended to avoid this situation, and the software was designed to work within that flight plan.
The first error was changing the flight plan without checking if the software could still function with the new one. The second error was not testing the software under the revised conditions it would have to work in.
Both errors are symptomatic of inadequate control over change management.
In other words, the flaw did not lie in the programming, but the organization's approach to change management.
@anotheruser676 1 year ago ⁺⁵¹
...and perhaps a Third error of the program disregarding the radar altimeter instead of querying it again. 'Say what? That result is outside of parameters. Please take your reading again'
@LezamaDamian 1 year ago ⁺⁵⁷
I agree this probably shouldn't be called a bug. Requirements were not properly validated, so it's a failure in their systems engineering process.
@nosuchanimal6947 1 year ago ⁺¹³
came here to say that!
also, even if the result later on would be inside parameters again: the device has already been proven to be unreliable. it might be an intermittent error, or it might be a bias that only on this occasion was noticed but existed all the time. revalidating system reliability would be a tough cookie to crack on its own if it didn't come with a redundant 2nd and 3rd system, though it should have notified ground control and gotten an update/patch. to my understanding that is how generally system failures are resolved. i don't know if their mission profile put an artificial time delay on that to prepare for longer ranged versions, or what happened.
@TheSheepwall 1 year ago ⁺²²
Haven't read the report so might be wrong, but if they use something like a Kalman filter, it is likely that they did not simply stop querying the sensor, but that the variance the filter associates with the sensor readings spiked. In that case, the sensor would still be queried, but is _effectively_ disregarded since its effect on the output would be so low (due to the change in the assigned variance). Someone can correct me if I am wrong here.
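For readers who have not met Kalman filters, the point about variance can be sketched in a few lines of Python. This is a generic one-dimensional measurement update, not ispace's flight software; the function name `scalar_update` and all the numbers are made up for illustration. It only shows that the same 8000 m radar reading moves the estimate a lot when the assigned measurement variance is small and hardly at all when that variance is huge.

```python
def scalar_update(estimate, est_var, reading, meas_var):
    """One scalar Kalman measurement update; returns (new_estimate, new_variance)."""
    # Kalman gain: close to 1 when the sensor is trusted, close to 0 when it is not.
    gain = est_var / (est_var + meas_var)
    new_estimate = estimate + gain * (reading - estimate)
    new_variance = (1.0 - gain) * est_var
    return new_estimate, new_variance


# Hypothetical numbers: the filter believes 5000 m and the radar suddenly says 8000 m.
trusted, _ = scalar_update(5000.0, 100.0, 8000.0, meas_var=25.0)
distrusted, _ = scalar_update(5000.0, 100.0, 8000.0, meas_var=1e8)

print(f"radar trusted:    estimate moves to {trusted:.1f} m")
print(f"radar distrusted: estimate moves to {distrusted:.3f} m")
```

In the second call the radar is still read on every cycle, but its reading changes the estimate by only a few millimetres, which is what "effectively disregarded" means here.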
@sciencecompliance235 1 year ago ⁺³⁵
There's also the design of the spacecraft that has to be called into question, specifically the AD&C architecture. Relying on a single altimeter means that you can't verify the data with a redundant sensor. Since accelerometers and gyroscopes can't really capture things like topography from orbit, it's like flying with one eye. I don't know how much mass, power, and space another altimeter would have taken up, but perhaps a redundant altitude sensor, possibly one with a lower resolution and/or sample rate, could have been used to verify the data coming from the primary one.
@sharizabel2582 1 year ago ⁺³⁴
I flew fighters for over 20 years. The Kalman filter was the bane of the navigation and bombing solution. It would actually discount most of the updates I would insert. It thought it knew more than I did … it didn’t.
@peterweston1356 1 year ago ⁺¹²
Makes the Apollo landings even more amazing, considering the precision of the sensors and the computational resources available, both to simulate and to support the landing.
@ksbs2036 1 year ago ⁺⁴⁵⁶
About 30 years ago I had a single page photocopied from Computer World or some such industrial publication taped to the outside of my cubicle. On that page was listed the ten most expensive software defects (bugs). I was astounded when the most expensive defects caused hundreds of millions of dollars of loss. When you read the list, you found out that the top five defects (again, multimillion losses) were all losses of spacecraft and/or their payload. Flight software is tremendously complex and a single error will cost you your whole vehicle and years of effort. Now that page would have to be scaled to near billions of loss, I expect.
@a.p.2356 1 year ago ⁺⁴⁰
Maybe not most expensive, but Therac-25 should be on that list somewhere. Ya know, because it ended up maiming and killing a bunch of people with intense doses of radiation.
@RoryMacdonald-pfff 1 year ago ⁺⁴⁴
There you go Scott - that’s an epic video right there. Top 10 most expensive Astro/Software defects.
@o0alessandro0o 1 year ago ⁺³¹
@a.p.2356 In a way, that is possibly the most expensive software bug ever; in another, it's quite cheap. Consider: we know for a fact that cars kill people, all the time, in every way, yet we do not ban cars.
The value of a human being's life has been calculated, and apparently it's cheaper than you would expect. Electricity production has a cost measured in lives per TWh. You can look it up. Biofuel has a cost of 12 people per TWh. Solar is 0.44. Wind is 0.15, and new/clear is 0.07.
The average American consumes 0.1-0.2 GWh per year. In other words, over the course of your entire life you will likely kill less than one fiftieth of a person in order to keep the lights on. This does stack with the people you kill while driving, however - I'm talking about tyre particulates and excess death from pollution, not running somebody over.
Ain't that grand?
@travelbugse2829 1 year ago ⁺¹⁰
@o0alessandro0o It's not easy to respond to that kind of information. I do know that training and regular checking of pilots contributes to a high level of safety for commercial aviation (ignoring mechanical failures). For drivers, I reckon that similar processes should be followed. It would not be popular among the general public, but I have said for years that licenses should be graded, based on years of experience and how many training courses a driver takes. Governments balk at the idea, however, and go on putting up cameras and roadside radars, more draconian speed limits, but never addressing the fact that poor situational awareness, slow and inappropriate reactions, and limited skills are the biggest factors in car accident rates. But I'm going down a rabbit hole!
@malbacato91 1 year ago ⁺¹¹
Not strictly a bug, rather bad design; but implicit nullability - first introduced in ALGOL W in 1965 and later copied into most programming languages - was famously called a billion-dollar mistake by its creator.
I think I read somewhere that at the time the estimate was quite accurate, but that was 2009 so by now it wouldn't be surprising if it is an order of magnitude too low.
@Papershields001 1 year ago ⁺⁵⁰⁰
I feel such compassion for the Hakuto-R team. They are going to accomplish it!
@serronserron1320 1 year ago ⁺²²
I hope that they can make a new one and land it on the moon in the next few years
@emileriksson76 1 year ago ⁺¹⁸
I watched the landing live stream and I felt so bad for them. Their nervous faces really hurt me too. I bet they do it next time!
@abarratt8869 1 year ago ⁺¹⁷
They may not accomplish it. Very often such incidents reveal a whole load of issues that have been swept under the carpet, and the necessary organisational change required to address them all can easily break a small team / organisation.
Even big companies can be killed by this. This is what is going on in Boeing right now. They caused the crashes of two 737MAXes and killed people. Since then they've tried to institute root and branch reform of how they run their business. Yet, they're still having problems. The most recent one was a fuselage manufacturing defect (they were building them wrong) that had gone unnoticed for approx 700 airframes (yep they're flying, possibly with Southwest today!). Fine, they've found it, repairs needed, not immediately dangerous, but cannot be ignored.
Trouble is the manner of them finding it was accidental; someone was in the right place, at the right time and realised what was going wrong. The issue is that, if despite the introduction of a root and branch reform about how they approach quality (= safety, reliability) they're still finding major issues by chance, then the root and branch reforms are junk and are not working. They should be finding such problems as part of a systematic continuous improvement process, and they're not. So the bet-your-life question is, what else have they missed, given that they've essentially admitted that they've not been looking hard enough?
It's similar with 787 (fuselage barrel joints), brand new 737MAXs with FOD and rodent damage, etc.
This suggests to me that Boeing are in no way adequately reformed following the MAX crashes, the problem most likely being in the senior management who never understood it before and are still there today. It's worryingly possible that they're going to make another fatal mistake. Ok, the FAA is now (belatedly) keeping a much beadier eye on Boeing, but they can't see and check everything; certification engineers / inspectors are not there to do basic QC and basic QC improvement.
The Hakuto team's best bet, if they're to try it again, is to just fix that one core issue and try again, and do as much simming as they can muster. Unlike Boeing, crashes are just disappointments and money.
@99guspuppet8 1 year ago
❤❤❤❤❤❤❤❤❤❤ Yes they will succeed…… After they spend a lot of someone else’s money……… Let’s all go to Sugar rock Candy Mountain
@thePronto 1 year ago ⁺¹
But they launched knowing that their testing was invalid. Kinda like practicing parachute landings in a field, then jumping over water. I hope they don't ask me for a donation, because polite refusal often offends.
@henrymalone422 1 year ago ⁺⁷
Been watching you since 2015! You have helped keep me interested in space flight! Thank you for doing what you do Mr. Manley.
@ezequielblanco8659 1 year ago ⁺⁴¹
Being a software developer, I have seen this happen countless times in multiple companies. Software is often overlooked. Testing is usually considered redundant and a waste of time/money. Developers' warnings and requests are normally disregarded or displaced by other departments' concerns which are non-technical and even non-functional.
@old_guard2431 11 months ago
In my experience the software developers/engineers are kept out of the decision-making inner circle. Actually, this goes for engineering/tech in general. It’s fine, just change this, this and that: what’s the worst that can happen?
(Changing the Moon’s landscape to more closely resemble a seedy neighborhood in Brooklyn, one spacecraft at a time.)
@harshu2651 9 months ago
Even after full testing, I still fear my code will break in some case that we have not looked at 😂. It's scary for a space mission.
@Nioub 1 year ago ⁺²⁸⁶
There was a similar bug in the LEM: if the module had flown above a circular crater of a certain size, the radar altimeter would have shut off all propulsion, probably leading to a crash. Fortunately the bug was never triggered (mainly because the onboard crew had taken over manual controls at this point) and was found decades after the landings.
@alamrasyidi4097 1 year ago ⁺⁷
why are lunar manned missions not done anymore these days?
@jessepollard7132 1 year ago
@alamrasyidi4097 Congress dropped funding, so NASA had no money for going to the moon (canceled the last planned 4 trips).
@vast634 1 year ago ⁺⁸⁰
@alamrasyidi4097 No Soviets to beat
@dr.cheeze5382 1 year ago ⁺¹¹
@alamrasyidi4097 isn't NASA planning to go back? Starting with an (unmanned?) mission sometime after 2024?
@alamrasyidi4097 1 year ago ⁺⁴
@dr.cheeze5382 so i've heard. but compared to the alternative of having to lose these spacecraft to software error, i think "no soviet to beat" is a ridiculous excuse. so i still really don't understand why lunar exploration has been strictly rover based these past few years...
@johnbuchman4854 1 year ago ⁺¹⁵⁷
This is why you also have timers for expected milestones (earliest and latest time a milestone can be validly sensed). My background is that I worked on the Attitude and Articulation Flight Software for the Galileo and Cassini spacecraft when I worked at JPL. For a very simple and solid method they could have used what the Surveyor landers did.
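The milestone-timer idea can be illustrated with a small Python sketch. It is only a guess at the general shape of such a check, not the Galileo, Cassini or Surveyor logic; the class name `MilestoneWindow`, the four outcome strings and the touchdown window of 100 to 140 seconds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MilestoneWindow:
    """Earliest and latest mission-elapsed time at which a milestone is plausible."""
    name: str
    earliest_s: float
    latest_s: float

    def check(self, elapsed_s: float, sensed: bool) -> str:
        if sensed and elapsed_s < self.earliest_s:
            return "ignore"   # too early to be real, treat as a spurious signal
        if sensed:
            return "accept"   # sensed at a plausible time
        if elapsed_s > self.latest_s:
            return "force"    # never sensed, so act as if it happened
        return "wait"         # not sensed yet, still inside the window


# Hypothetical window: touchdown expected between 100 s and 140 s after ignition.
touchdown = MilestoneWindow("touchdown", earliest_s=100.0, latest_s=140.0)
for t, sensed in [(60.0, True), (120.0, True), (120.0, False), (150.0, False)]:
    print(f"t={t:5.1f} s sensed={sensed!s:5} -> {touchdown.check(t, sensed)}")
```

The same pattern would apply to any event with a known time budget: a sensor claim outside the window is not allowed to change the vehicle's mode on its own.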
@danrbarlow 1 year ago ⁺¹⁰
Thanks for your awesome contribution to space science!
@nocturnal6863 1 year ago ⁺⁹
I'm sure mission control had a plot of the expected altitude changes, the lander may have had one as well. Problem is that the expected rate of change of the altitude was outside what had been set as acceptable for the altitude radar. It was probably written in the specs somewhere. Proper simulation of the landing would have caught this, it could possibly even have been dealt with after launch. It's changing the landing site without simulating it that screwed them.
@nocturnal6863 1 year ago ⁺²
What did Galileo and Cassini use for altitude readings? and would they have been equally screwed if forced to switch over to gyro / accelerometer readings with an apparent failed altitude radar?
@u1zha 1 year ago ⁺⁷
@nocturnal6863 John's point was that "forced" switch is averted, if the switch algorithm is completely disabled at such an early phase of flight. Reread about "earliest time.. a milestone can be validly sensed".
@nocturnal6863 1 year ago ⁺²
@u1zha except you wouldn’t disable the software monitoring a sensor for failure. Not unless you knew in advance it might give faulty readings at that point.
Further thinking, I think I see what you are suggesting. That it should have been expecting the dip in altitude, and its failure to see it means it should have known its altitude was off.
@dandeprop 1 year ago ⁺³
Hi Scott: Very nicely done! (but then, I say that a lot about your stuff...). This scenario is directly reminiscent of the situation on the Apollo landings where passing over a crater (or any other feature like that) would cause a 'jump' in the Radar Altimeter-portrayed altitude, and it would 'jump' from the PGNCS altitude. Remember 'Delta H'? The difference between RA and PGNCS altitudes. In order to keep things from diverging in the PGNCS, they had to incorporate a 'terrain map' into the software that accounted for local differences in surface elevation. Remember the landing of Apollo 17? At some time in the PDI maneuver, one of the crewmen (I can't tell which one--they sound a lot alike) said 'We went over the hump, and Delta H just jumped'. It sounds (at least at first blush) like a feature similar to the Apollo 'terrain map' might have been appropriate here (?) Thank you.
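As a rough illustration of why a terrain map helps with 'Delta H', here is a Python sketch. It is not the Apollo PGNCS model, whose details are not given in this thread; the profile tables, the function names and every number are invented. The idea is simply that the radar reading is compared with what the inertial altitude predicts after subtracting the mapped terrain elevation under the vehicle.

```python
from bisect import bisect_right

# Hypothetical terrain profiles: (downrange distance in m, elevation in m).
FLAT = [(0.0, 0.0), (5000.0, 0.0)]
CRATER = [(0.0, 0.0), (1000.0, 0.0), (1100.0, -3000.0), (5000.0, -3000.0)]


def terrain_height(profile, downrange_m):
    """Linearly interpolate terrain elevation along a sorted profile."""
    xs = [x for x, _ in profile]
    i = bisect_right(xs, downrange_m)
    if i == 0:
        return profile[0][1]
    if i == len(profile):
        return profile[-1][1]
    (x0, h0), (x1, h1) = profile[i - 1], profile[i]
    return h0 + (h1 - h0) * (downrange_m - x0) / (x1 - x0)


def delta_h(radar_alt_m, inertial_alt_m, profile, downrange_m):
    """Radar altitude minus the altitude expected from inertial data and the map."""
    expected = inertial_alt_m - terrain_height(profile, downrange_m)
    return radar_alt_m - expected


# Past the crater rim the radar reads 8000 m while inertial altitude is 5000 m.
print("flat map:   delta H =", delta_h(8000.0, 5000.0, FLAT, 2000.0))
print("crater map: delta H =", delta_h(8000.0, 5000.0, CRATER, 2000.0))
```

With the flat map the jump looks like a 3 km sensor error; with the crater in the map the same reading is exactly what was expected.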
@yashrajb5251 9 months ago ⁺³
India's Chandrayaan-3 has finally soft-landed on the moon's south pole successfully. 🎉
@dmacpher 1 year ago ⁺²⁴³
Such a bummer that an error correction filter with an edge case nailed them. Lots of amazing data and at least it’s a software fix!
@sliceofbread2611 1 year ago ⁺⁴¹
Cliff case*
@dmacpher 1 year ago ⁺³
🎢
@thePronto 1 year ago ⁺¹²
Edge case? A crater on the moon? But it's not just a software fix is it? Or are we talking about a KSP do-over?
@slcpunk2740 1 year ago ⁺¹⁷
Seems a pretty basic error, in what universe did they think they could figure the exact altitude without the radar? Even if it was broken, too bad: damned if you do, damned if you don't.
@dmacpher 1 year ago ⁺⁵
@thePronto They moved their landing site to align with NASA South Pole targets super late in development (post validation). The threshold for culling/re-baselining seems to be the issue. The sudden change in relative altitude wasn’t expected from their simulations.
@BeardyBaldyBob 1 year ago ⁺²⁷
I'd argue it's due to inadequate testing and making assumptions they shouldn't make rather than just blaming the software.
To move the landing site and NOT run a series of full simulations for the new site is just an astonishing degree of incompetence!
@mcgilliman Před rokem ⁺³
This.
@BeardyBaldyBob Před rokem ⁺⁵
@@mcgilliman I like to think of an F1 analogy... Imagine if you set your car up to race in good sunny weather in Monaco at sea level, and they changed the race to be in Mexico in soaking wet weather at 2,260m above sea level... You would NEVER just race the car with the exact same set up and no testing before the race!!
@Myndale Před rokem ⁺³
True, but if history has taught us anything it's that the incompetence almost certainly wasn't the software engineers themselves and was instead a cumulative effect of multiple levels of beurecracy repeatedly ignoring the recommendations and pleas of the people who actually knew what they were doing and what additional work had to be done. I suspect this is a scaled-down version of Challenger all over again, albeit thankfully with no loss of life this time.
@mikeburch2998 Před rokem
I'm so sorry to hear that this happened. I hope they try again and maybe send back some remarkable pictures. Don't give up. Greetings from Arizona.
@ytashu33 Před rokem ⁺⁸
Love this! Thanks you for reminding me of Kalman Filters, i studied those in my M. Tech., loved them but never thought i would ever hear of them again. I still remember how the "location estimation" part, based on current velocity and direction integrated over time (aka: dead reckoning) can provide smooth and accurate predictions over short durations, but errors tend to accumulate in a physics based predictor like this and needs to be augmented with an independent measurement (ie: the radar), even if the radar data is not accurate. Amazing to see how stuff like that led to this outcome. It is a tough one though... I wish you had shared your thoughts on how should a "faulty sensor" be detected then? I mean, you could say that a 3 Km sudden jump in the sensor output means the the sensor is probably broken, right? If not, how else would you do that and handle the case when the sensor actually is broken?
@Beregorn88 Před rokem ⁺²
Redundant systems and majority check: if all three of your radar sensors report a sudden altitude change, then that's what actually happened. What surprises me is that the sudden altitude change eventuality was never accounted for...
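A note on the question above about how a filter decides that a sensor is "faulty". One common approach is an innovation gate: each radar reading is compared against the dead-reckoning prediction and rejected if it lies too many standard deviations away. The sketch below is a scalar Kalman filter for altitude with such a gate. It is an illustration only, not the lander's actual software; the function name, noise values, and gate width are all assumptions.

```python
import math


def kalman_step(x, p, rate, dt, z, q=1.0, r=25.0, gate=3.0):
    """One predict/update cycle of a scalar Kalman filter for altitude.

    x, p : prior altitude estimate (m) and its variance (m^2)
    rate : vertical speed from inertial data (m/s, negative when descending)
    dt   : time step (s)
    z    : radar altitude measurement (m)
    q, r : process and measurement noise variances (illustrative values)
    gate : reject measurements more than `gate` sigmas from the prediction

    Returns (new_x, new_p, accepted).
    """
    # Predict: dead reckoning. Uncertainty grows on every step.
    x_pred = x + rate * dt
    p_pred = p + q

    # Innovation: how far the measurement is from what was expected.
    innovation = z - x_pred
    s = p_pred + r  # expected variance of the innovation
    if abs(innovation) > gate * math.sqrt(s):
        # Reading treated as a sensor fault: keep the prediction only.
        return x_pred, p_pred, False

    # Update: blend prediction and measurement by their uncertainties.
    k = p_pred / s
    return x_pred + k * innovation, (1.0 - k) * p_pred, True
```

With these numbers, a reading a few meters off the prediction is blended in, while a reading 3 km off is rejected on every step. If the jump is real terrain rather than a fault, the filter then runs on dead reckoning alone and its error is never corrected, which is the failure mode described in the video.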
@regolith1350 Před rokem ⁺¹⁰⁴
Software may have been the proximate cause but you can argue the real problem was somewhere in the development and quality control procedures. How can you not re-run a full landing simulation after changing the landing location? It reminds me of Starliner's problems in 2019. The software glitch where the flight computer grabbed the wrong "time" was the proximate cause, but the real problem was Boeing never ran a full end-to-end launch simulation.
@srinitaaigaura Před rokem ⁺³
Actually these days so much of manufacturing and coding is outsourced that the management, hardware and software teams are no longer next to each other - quality control begins to suffer massively. The more people outsource stuff, the more the work gets into the hands of rookies paid on cheap wages, who then end up making rookie mistakes that then require even more time and energy to fix. Boeing turned from an engineering firm to a management firm and the rest is history - 787, 737 max, 777x, Starliner.
And as more and more automation comes in there's less and less human intervention to take care of the times where the computers reach their limits.
@user-cr4sc1ht9t Před rokem
Feels like they might not have great CI indeed, probably more like a bunch of artifacts in git LFS type of management. But the Starliner glitch might be a slightly different topic IMO
@BubblefishOfTrem Před rokem
I was also wondering how expensive such a simulation would be. If they aren't too expensive, I was wondering if you couldn't run landing simulations from randomized positions and flag anomalies from there. Not so much that you can just fling the lander at the moon arbitrarily, but more so you can find starting conditions which result in something weird.
IDK, maybe we're getting into a space where "moon lander software testing" and later "asteroid lander software testing" might be a market, that would be amazing. With the costs of these missions, there might be some money on the line for a testing company - especially if they end up with a body of "known problematic situations" like the one from the video.
@MrJdsenior Před rokem ⁺²
How can you put a tank that has experienced both problems and damage in test into Apollo 13? Exactly like that, only different. Or get km and miles crossed up and smash a probe into Mars (IIRC), or ... ad infinitum.
You can run all the simulations in the universe and still have problems, but not running ANY sims to cover a deviation in the program...yeah, that's just begging for it. I would think, in this day and age, that you could pretty much run that sim real time in parallel with the mission, for the problem they had there, knowing the path and surface profile, I'm guessing, and have it fire up a quick "do not ignore the damned properly functioning radar" command, or some such. It might even be good to have REAL TIME simulations running against the truth of the mission.
Having done some aerospace hardware design, I'm guessing that there were schedulers and/or bean counters directly in the problematical loop. Or maybe idiotic MBA wielding managers that think they are engineers, or worse know BETTER than the engineers, because they know a few buzz words, and then maybe hold people's feet to the fire to get them to sign off on VERY cold Shuttle launches, or what have you. That's the sort of feedback you do NOT want in, say, a servo. :-/ Sometimes I look back and am glad I am retired, frankly. Some of it was fun, some of it SUCKED.
Doc requirements come to mind as some of the latter. I had one junior documentation fiefdom wannabe tell me that the real output of a program was the documentation. When I finally quit laughing I told her that if she actually believed that she should go talk to some F16 pilot and ask them which they'd rather have with them on a mission, a working LANTIRN pod, or the documentation that describes it. She wasn't happy, because then a couple of people standing around laughed too. She wasn't a nice person (that's putting it mildly), or I wouldn't have said it that way. My bad, I guess.
@i-love-space390 Před rokem ⁺¹
Armchair quarterbacks are a dime a dozen. You can certainly crow if you ever land a vehicle on the moon or even achieve orbit. Perhaps we can talk about "how obvious" the solution was when we stop whining about how LONG it takes to build and fly a vehicle and how the contractors are "milking the American public" for so much money.
I thank Providence every day for Kathy Lueders and NASA for riding herd on SpaceX to make the Dragon 2 safe. Everyone had lots of criticism for NASA for being conservative and "delaying" the first launch of the manned spacecraft. But all that effort kept the astronauts safe. (Also SpaceX had a real leg up on Boeing, because they had a working cargo spacecraft in Dragon 1 to build on. The last time Boeing designed a manned spacecraft was the 1970s and the Space Shuttle. All those engineers are long since retired.)
@user-jz1su8bh5t Před rokem ⁺¹¹⁴
Another outstanding episode Scott! Being a software safety engineer for the last 39 years, I have to agree with previous comments that point out this is not a software bug, but more of a people problem during design, testing, management, etc. I believe the first Ariane 5 launch was a similar issue where the software worked perfectly per its specifications (from Ariane 4) and doomed the flight to failure. Like in this case, proper testing would have prevented the, expensive, tragedy. Also wanted to give a shout to "How To Destroy Wayward Rockets - Flight Termination Systems Explained". My 39 years were all spent on Range Safety Software with the last 13 years working on autonomous flight termination systems. That was another outstanding episode! Keep up the awesome work!
@Icowom2 Před rokem
Pop op o99⁹9th kiwi's😊
@xGOKOPx Před rokem ⁺¹
It is a software bug though. People problem is that the bug wasn't caught
@vast634 Před rokem ⁺¹
Have you ever experienced a flight termination system not working instantly, but 50 seconds late, as with the Starship launch?
@user-jz1su8bh5t Před rokem ⁺⁴
@@vast634 Depends on the type of Flight Termination System (FTS). For solid rocket motors, they use a shaped linear charge that opens the casing and exposes the fuel which burns up quickly in an impressive display. (I think Scott mentioned that in his previous video.) For chemical fuels, things are different. You have more choices. The basic idea is to stop thrusting the vehicle so it falls into an unpopulated area, such as a broad ocean area in the case of SpaceX. Based on the video of the flight, the FTS worked properly and detonated explosive devices that created holes in the fuel tanks. That reduced or stopped the fuel flow to the engines. The FTS did its job. After that, it's all physics. If the fuels are hypergolic, they will combust on contact and you get a near-instant explosion. Otherwise, you need combustible fuel, oxygen, and an ignition source. Guessing, it took about 40 seconds before the three elements came together in the right quantities in the case of SpaceX. An FTS doesn't need to create an explosion. Rather than connect to explosives, the FTS can connect to fuel valves that terminate fuel flow.
@user-jz1su8bh5t Před rokem ⁺²
@@xGOKOPx I understand your perspective. My point is that the bug should have been avoided during design or implementation, and if not, then detected during development testing. Find and correct all the bugs before deployment. Since their development testing failed to react properly to "unexpected terrain" (kind of a silly term considering the moon's terrain is pretty stable), the people failed in the software development cycle and left in a failure mode (i.e., the bug) so it could be exposed during execution. The software did what it was designed to do so it worked properly. The people failed to account for something. The same thing happens with hardware but folks don't usually blame the hardware. The failure of Galloping Gertie wasn't blamed on the bridge. The people who designed and built it were blamed for not accounting for potential wind loading.
@kaineis Před rokem
I love the ksp2 animations you added. That was really nice to watch.
@caturlifelive Před 11 měsíci
That's why I love Scott Manley's videos, so detailed
@bobboonstra3484 Před rokem ⁺⁷¹
Not a software bug, it was a design bug. The software functioned as specified.
@pigsnoutman Před rokem ⁺²
How do you know? Did you read the design spec? If the design spec stated it should be able to handle multiple lunar landing locations, then it's not a design spec issue.
@simongeard4824 Před rokem ⁺²
Definitely a process bug that this wasn't picked up in testing - but premature to say that it wasn't also a software bug.
@marcusdirk Před rokem ⁺¹
@@pigsnoutman 6:17
@DavidEsp1 Před rokem ⁺¹
Mismatch at Requirements and/or Expectation levels. Activated by beyond test envelope operation. Needed a calm (seasoned?) "captain" to hold a steady, pre-planned course.
@Spillerrec Před rokem ⁺⁴
@@simongeard4824 I think the video was quite clear on that the software started ignoring that sensor because it was programmed to do so. An intentional feature that behaved differently than expected *because* it was put into a situation that was not considered while designing it. And that this only happened because they changed the mission plan after the software was developed and did not test it again with the new landing site, because their tests would have detected the issue. That last part really hurts because they reasonably could have avoided the crash.
@perishmokrat8257 Před rokem ⁺¹⁸
Working as a Software Tester, I often see managers accept the risk of malfunctioning SW to save some money, especially when it comes to error handling.
@Henglaar Před rokem
Which is a shame, really. The more expensive the project, the less management should feel like cutting corners on error handling and verification. Ah, well, what "should" happen in the real world doesn't agree closely with what actually happens in the real world.
@Anacronian Před rokem ⁺⁶
It's crazy to me that they didn't redo the simulations when a new landing site was chosen.
@rayoflight62 Před rokem
Thank you for all the detailed explanation!
Greetings,
Anthony
@connecticutaggie Před rokem ⁺⁶
Yea, that is the challenge of small projects with limited resources. It is great that this is not a problem for larger projects (cough-cough-Starliner) that have the money and resources to allocate to proper SW verification.😆
@IsMaski Před rokem ⁺¹²⁸
Unfortunate to see what led to the failure of this mission. But glad to see that they have found the issue. Really hoping they succeed on their next attempt. Thanks Scott for the comprehensive explanation on this!
@MrPaxio Před rokem ⁺³
They didn't find the issue, they made the issue
@MonkeyJedi99 Před rokem
Sounds like the software took the path of flat-Earth "science".
What I see doesn't fit my preconceptions, ignore it!
@togowack Před rokem
People need to wake up, controversy surrounded moon landings because there is stuff there. The issue / bug was in there on purpose. They will probably never let us see the real moon.
@davidbeppler3032 Před rokem ⁺¹
They did not find the issue. The issue was management. The software was fine. Software did not change the landing location, management did.
@togowack Před rokem
@@davidbeppler3032 The whole thing was planned, as it is every time with every country. Why do people not see this? Every single machine that lands on the moon has issues - #1 because the surface is covered in glass domes and other hanging debris, #2 to cover up such things from the public in a convincing way.
@wChris_ Před rokem ⁺⁴
It's amazing how Apollo didn't have such bugs, despite it being written in pure Assembly!
@PMA65537 Před rokem
They chose tamer landing sites.
@phloxie Před rokem
@@PMA65537 Apollo 15 would like to have a word with you
@castafioreomg Před 9 měsíci
Apollo missions had some issues but they handled them well. The engineers couldn't even visit their families because of the work pressure
@AleXsSpaceXTalks Před rokem ⁺²
Very good explanation and top video! I guess also the loss of the Mars Polar Lander was caused by a software issue, telling the landing thrusters to ignite too early, causing the probe to run out of fuel...
@hjalfi Před rokem ⁺⁴⁶
There's an argument to be made that if a sensor is critical enough that if it fails you're going to land on non-existent terrain 5km up, then you just assume it won't fail. If you handle failure gracefully but then don't have enough data to avoid crashing, what's the point of handling it gracefully?
Of course, ideally you'd have a backup. Like another radar, or GPS, or a video camera capable of estimating height using machine vision and a map, so you can sanity check it. The next best thing is to just have a map: the vehicle knows where it is, so if it knows the terrain it can estimate what the radar values _should_ be, so instead of going 'eek, a delta of 3km in ten seconds is clearly wrong' you go 'the radar has shown a delta of 3km in ten seconds, what does the map say the delta should be? Right, 3km, moving on'.
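The map idea above can be sketched in a few lines. The radar measures height above the local surface, so the expected reading is the navigation altitude (above some reference datum) minus the terrain elevation the map gives for the current ground position. The function names, the 200 m tolerance, and the example elevations are hypothetical and chosen only for illustration.

```python
def expected_radar_altitude(alt_above_datum_m, terrain_elevation_m):
    """Height above local terrain that the radar should report (m)."""
    return alt_above_datum_m - terrain_elevation_m


def radar_matches_map(radar_m, alt_above_datum_m, terrain_elevation_m,
                      tolerance_m=200.0):
    """True if the radar reading agrees with the map within the tolerance."""
    expected = expected_radar_altitude(alt_above_datum_m, terrain_elevation_m)
    return abs(radar_m - expected) <= tolerance_m


# Over a plain at datum level, 5 km up: radar and map agree.
over_plain = radar_matches_map(5000.0, 5000.0, 0.0)

# Just past a crater rim whose floor the map puts 3 km below the datum:
# the radar jumps to 8 km, and the map says that is what it should read.
over_crater = radar_matches_map(8000.0, 5000.0, -3000.0)

# The same reading checked against a map with no crater looks like a fault.
no_crater_on_map = radar_matches_map(8000.0, 5000.0, 0.0)
```

The check is only as good as the map and the position estimate feeding it, so in practice the tolerance would have to cover both map error and navigation drift.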
@stoic.little Před rokem ⁺⁴
You can have a video camera that is very good at finding the distance by using phase detect autofocus, same principle as a rangefinder.
@driedurchin Před rokem ⁺⁵
I work in flight software and you're right. At a certain point if a system is so critical and irreplaceable you just have to trust it won't fail because as you said, detecting the failure isn't helpful if you're SOL.
@Spillerrec Před rokem
There is an argument to be made that if a $90 million project can go up in smoke due to a single sensor failure, and you expect that the sensor could potentially fail, you should really have some sort of redundancy, even if failure is unlikely. Or some other form of backup plan. The question is whether anyone actually considered that this sensor could fail, or whether it just got the same failure detection and handling as any other sensor without further consideration.
@CodeKujo Před rokem ⁺⁵³
My reaction to just the title is "There are no unbelievable computer bugs".
Now that I've watched the video: *very* believable. Accumulation of error is nasty and dead reckoning is very hard. Changing something that "can't possibly affect the outcome" late in the process and not doing a full test happens often enough that it's a subject of comic strips and many high profile failures.
@Hebdomad7 Před rokem
Except the one that flew into one of the first computers and caused a short circuit.
@Ergzay Před rokem ⁺²
Scott's been moving to more and more clickbait titles of late. It's unfortunate to see him doing it.
@winebartender6653 Před rokem
When you're using accelerometer and gyroscopic data alone for position on a 2D plane, it can become hilariously inaccurate quickly, no matter how good your algo is.
Doing this in 3D space would be basically impossible if I'm being honest.
As an example, there is a reason VR relies so heavily on video processing for limb positioning. Obviously these aren't in the same ball park of cost/importance, but the same rules apply.
@VarenRoth Před rokem
The unbelievable part here, honestly, is how someone expected this to work without simulating the actual final flight plan at least once.
@CodeKujo Před rokem
@@winebartender6653 US missile submarines can pull it off, but their inertial navigation hardware is larger than the entire lunar probe and submarines experience much smaller accelerations.
It does seem like it was selected as a fallback with rather optimistic expectations of how well it would stay accurate. In hindsight, it would have been better to try turning the radar off and back on, relying on inertial navigation only as long as it took the radar to come back on. Also, redundant radar.
@AMeierhoefer Před rokem ⁺¹³
Scott, I am surprised that you did not touch on redundancy. I was a fighter jet aviator and one of the things we always did was use multiple sensors to allow the software to compare and then estimate probability. If they had three radar altimeters they could see the rate of change of the surface as the spacecraft travels. Even if each would have shown the cliff, the probability calc would have told it that it is virtually impossible that all three are suddenly bad. Redundancy would be one answer in my book.
@thierrybriand2413 Před rokem ⁺¹
Agree and also on my part, I always thought that radar altimeters were used « closer » to the surface.
@drill_fiend1097 Před rokem
Probably budget constrained.
@AMeierhoefer Před rokem
@@drill_fiend1097 This is a commercial effort so they could have just gotten one normally used in aircraft. It's not NASA where they cost $750K each just because...
@i_Kruti Před rokem ⁺²
7:50 Yeah, the VIKRAM lander from CHANDRAYAAN-2 had lost communication and went out of control, but with improvements in software, dampers, etc., we are again ready for CHANDRAYAAN-3 to attempt it in July, according to the official message...
@dust1209 Před rokem ⁺²⁵
This reminds me of an Alastair Reynolds novel where an automated system recorded the sudden vanishing of a planet but disregarded the data because the event was so far out of expected results that it assumed there was some kind of fault.
@letsburn00 Před rokem ⁺⁵
It then accidentally creates a cult.
@yogiwp_ Před rokem
Which novel is this?
@dust1209 Před rokem ⁺⁷
@@yogiwp_ Absolution Gap, it's the third book in the Revelation Space series which is kind of weird. If you're looking to check out the author, I'd recommend Pushing Ice!
@ShoeTheGreyCat Před rokem
@@letsburn00 And also liquefying the poor guy's wife stuck in the scrimshaw suit
@letsburn00 Před rokem
@@ShoeTheGreyCat I forgot about that bit. Given that series largely relates to characters that are functionally aging immortal, it's wild how easily they torture and kill each other.
@thePronto Před rokem ⁺⁹
A lunar lander encountered a crater and got confused. Total freak accident: one in a million. I can totally relate: today, I encountered a Starbucks in a strip mall.
@AllAmericanGuyExpert Před 11 měsíci ⁺²
My Dad helped design the Apollo lunar landing software ... and curiously enough, it was never used due to a sensor overload ... the famous DSKY error 1202. When Neil Armstrong disabled my Dad's software for Apollo 11, that was the end of it. The LM landing program was always over-ridden by future LM pilots and the LM was landed manually. The fault was in a completely unrelated system ... I guess a lot of people wonder if it would have done its job. My dad says it was pretty robust and he never saw a simulation that it would have failed if given the chance to run to completion.
It's a good thing Armstrong was a good pilot!
My dad would go on to be famous for mockups, and then later, he worked on the avionics of the world's most capable fighter jet. He's getting old, but still with us. I wish he was more of a storyteller ... but the one he thought was the funniest (and most irrelevant) was meeting the president in the restroom at NASA ... as in, _um, nice day, isn't it Lyndon?_ as they conducted their business. I am guessing it was during LBJ's visit to Houston in 1968, the same time frame that my dad was working there.
@PT-xi5rt Před 10 měsíci
You still believe in this fable? Open your eyes
@AllAmericanGuyExpert Před 10 měsíci
You @@PT-xi5rt didn't know that LBJ was president? Or that he used the bathroom like the rest of us?
@bretthoffstadt Před rokem ⁺⁴
I can't believe they didn't simulate their final landing site but that's what you are saying. Thanks for the explanation. Such a shame, they picked the wrong thing for a shortcut!
@firefly4f4 Před rokem ⁺⁹⁹
By, "unbelievable", I'm pretty sure you meant, "Completely realistic, very common scenario when the software is put in an untested environment."
Note that I am saying this as a software developer myself. I actually just identified a scenario where our existing tests were thought to be sufficient, but then some surrounding parameters changed and a bug was found.
@jarisundell8859 Před rokem ⁺¹⁹
As a software developer myself, I'm actually asking myself why those simulations were not set up to run like a CI.
@firefly4f4 Před rokem ⁺⁷
@@jarisundell8859
Good question. Seems like actually running the sim again once the final site was chosen should have caught this, maybe allowing them to upload the fix.
For the record, CI is how the one I looked at was caught... prior to release 👍
@danstenger1 Před rokem ⁺²
Scott is also a dev by trade, too, lol, he works at Apple.
@cinquine1 Před rokem ⁺⁸
@@firefly4f4 I think it's a joke, since the bug happened because the computer didn't "believe" the radar
@scottmanley Před rokem ⁺⁷⁶
By unbelievable, I mean the software stopped believing the radar
@kennethng8346 Před rokem ⁺⁹⁶
I've never done it, but from what I have read, sensor fusion is an enormously complicated and fuzzy technique. You have to take a bunch of sensors, account for nonlinearities and malfunctions, and you need to figure out which ones are correct, which ones are sorta correct and by how much, and which to ignore. On top of this you have enormous weight and power constraints. And there must be a million fudge factors that have to be played with. Move it one way and you get a false positive, move it the other way and you get a false negative.
@andrewahern3730 Před rokem ⁺³
I wonder if this would be a good application for AI? A computer would definitely be able to interpret way more inputs than a human pilot ever could and in real time
@JKa244 Před rokem ⁺¹
It's a satisfying problem to work on.
@Niosus Před rokem ⁺²⁸
@@andrewahern3730 AI isn't a magic fix. Those sensor fusion algorithms are supported by a deep understanding of the system and statistics. Like with the Falcon 9, they are extremely reliable once properly tuned.
Obviously an advanced enough AI system can always do the job. But if, like in this case, you simply didn't test the system with enough variations of inputs, you're not going to get good results either. The amount of simulations needed to properly train the AI would also have been plenty to find this bug in the old control code.
The lesson here is that more robust testing is needed. I have a feeling that spaceflight is often seen as hardware-first. That's understandable, but without proper software the hardware is useless. I think more modern software engineering practices could be useful here.
@Orieni Před rokem ⁺⁴
IRL, nothing says you can’t have false positives and false negatives at the same time, while you struggle to understand the data. That’s no fun at all.
@GeorgeTsiros Před rokem ⁺⁶
Kalman filtering is pretty damn straightforward. It's a basic method, not something extraordinary. Known for more than 50 years and optimal for typical sensors (i.e. those with a common noise distribution).
@carlwill5009 Před 11 měsíci
😅 A buddy's could hear from you again. Thanks for your good update videos.
@Songfugel Před rokem ⁺⁵³
Having seen in person how Japanese programmers work, how specialized and narrow their programming skills are and how ridiculously rigid their management approaches are, how many non-unified standards they use, this sort of thing doesn't surprise me at all
ps. the Ron Burgundy clip was priceless and so on the money xD
@goodlife1302 Před rokem ⁺³
I actually did not get your point. Could you please explain a little bit more?
@JosePineda-cy6om Před rokem ⁺³
The point being this was a bug that should've been relatively easy to find, if they had simulated a couple of "landing site changed at last minute" scenarios that included heavily cratered areas or craters with steep walls. Just doing some tests on random landing sites would've triggered this. But nobody thought of this, and because of corporate culture, everybody was disincentivized to even raise the question
@StudioVRM Před 11 měsíci ⁺¹⁰
The software was built by Astrobotic, an American company. Not sure how stereotypes of Japanese corporate culture come into this.
@goodlife1302 Před 11 měsíci ⁺²
@@JosePineda-cy6om Oh ok. Thanks a lot for the explanation
@Dr.Kraig_Ren Před 11 měsíci ⁺⁴
They outsource programming.
It happened due to budget and time constraints. I'm pretty sure engineers wanted to rerun the simulation
@dorsetdumpling5387 Před rokem ⁺¹⁰³
Unbelievable that they had only one method of determining altitude!
@manuelsilva8640 Před rokem ⁺³
My thought exactly.
@theqwert3305 Před rokem ⁺⁵⁰
And that that one method could be turned off for the rest of the landing!
@EmpereurHector Před rokem ⁺⁶
I guess that's part and parcel for those very small landers.
@GlutenEruption Před rokem ⁺²²
I mean to be fair, even the Apollo lunar module only had a single non-redundant landing radar altimeter for determining exact altitude. The astronauts were fairly confident they could manage to land without it but if it failed, mission rules called for an immediate abort. The weight constraints for landers are so tight, engineers have no choice but to make those trade offs.
@dorsetdumpling5387 Před rokem ⁺⁴⁷
@@GlutenEruption Ah, but they had the backup that was the Mk. 1 Eyeball and its associated biological computer!
@chouseification Před rokem ⁺⁷⁰
hey Scott - thanks for the analysis. I remember this one (as well as the Israeli and Indian ones) and seeing the disbelief in the control room was sad. It is easy to tell who has a clue and who is a bureaucrat by their expressions, etc. :P
@adarsh4764 Před rokem ⁺³
Hope there's no software issue when Nasa lands back on the moon!😂
@chouseification Před rokem
@@adarsh4764 agreed - one would have thought that even a small lander would have a pretty robust navigation system these days, but obviously they met an edge condition they hadn't properly tested for... and a sad oversight too as nearly all landing trajectories will have the radar return affected by craters you're passing over. There are many of them after all, and although most are small, many are large/deep and you need to keep their profile in mind as you use the radar/laser/etc surface measurement.
The state vector routine needs a sanity check to make sure the drift never disagrees from projected too much without it doing some form of reliable recheck.
@markmarco2880 Před rokem
Way to go, Scott. Fly safe, indeed.❤
@LightsEnd304 Před rokem
Your explanation reminded me quite a bit of dynamic positioning systems on ships / oil rigs
@Hagop64 Před rokem ⁺²⁹
If it stopped to a speed of 0, then fell to a speed of 500 km/h, then it would have had to fall for ~86 seconds. Moon gravity acceleration = 1.62 m/s^2. That means it was in free fall for a distance of about 6.0 km. That's all based off of the "500 km/h" crash speed given.
@scottmanley Před rokem ⁺⁴⁶
Actually, I figured out 500 km/h based upon the amateur radar measurements of 88 seconds of freefall.
@Hagop64 Před rokem ⁺¹¹
@@scottmanley Love how reliable basic physics equations are! With either bits of data it still comes up with the same results! If only the rest of landing on the moon were that simple.
@travelbugse2829 Před rokem
What I want to know is how that equates to a violent impact on earth. Do I divide by six, which comes to 83.3km/h or just under 52mph? That's bad enough for it to need airbags...
@highdefinist9697 Před rokem ⁺³
@@travelbugse2829 You multiply by the square root of six - assuming there is no air resistance, so with air resistance you might end up with something not too different from 500 km/h for this type of vehicle.
@Kromaatikse Před rokem ⁺⁴
@@travelbugse2829 When it comes to the moment of impact, 500kph is 500kph. It's about Mach 0.5. You know those old war movies where they show fighters shot down and augering in? *That.*
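The numbers in this thread can be checked with constant-acceleration kinematics, ignoring air resistance and the small change of lunar gravity with height. This is only a worked check of the figures quoted above; the function names are made up for the example, and the 9.81 m/s^2 value for Earth is a standard figure not given in the thread.

```python
import math

G_MOON = 1.62   # m/s^2, as quoted in the thread
G_EARTH = 9.81  # m/s^2, standard value (assumption, not from the thread)


def fall_time_s(speed_ms, g):
    """Time to reach `speed_ms` from rest under constant acceleration g."""
    return speed_ms / g


def fall_distance_m(speed_ms, g):
    """Distance fallen from rest when `speed_ms` is reached: v^2 / (2 g)."""
    return speed_ms ** 2 / (2.0 * g)


def impact_speed_ms(height_m, g):
    """Speed after falling `height_m` from rest: sqrt(2 g h)."""
    return math.sqrt(2.0 * g * height_m)


v = 500.0 / 3.6                       # 500 km/h in m/s
t = fall_time_s(v, G_MOON)            # roughly 86 s
h = fall_distance_m(v, G_MOON)        # roughly 6 km
v_88 = G_MOON * 88.0 * 3.6            # km/h after 88 s of lunar free fall
v_earth = impact_speed_ms(h, G_EARTH) * 3.6  # km/h, same drop on Earth, no air
```

Going the other way, 88 seconds of free fall gives a little over 500 km/h, which matches the figure in the video. The same drop height on an airless Earth would give about sqrt(9.81 / 1.62), or roughly 2.5 times, the lunar impact speed, which is the "square root of six" factor mentioned above.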
@glennpearson9348 Před rokem ⁺³⁵
Excellent explanation, Scott. Thanks for putting it all together for us to easily digest. Nice Kerbal recreation, too!
@gonun13 Před rokem ⁺⁴
Putting aside changes in mission plans, missing redundant systems or even software bugs, I think the main issue here is overly strict programming. Assuming something is defective just because of a sudden change that is out of scope is a bit extreme. Baffles me how it could hover waiting for the moon while letting propellant go to zero without at some point trying to salvage itself with something like "this is not working, maybe I should take another look at that system I think is dead".
@ahadsuleymanli9572 Před 11 měsíci ⁺²
what you're describing is human decision making, and you're ready to scratch this plan and try something better when the moment comes. you can't just imagine every scenario branching out at every step and hard-coding solutions to each. At some point you'll realize you need a generic decision making algorithm. In fact the mission failed due to them having a specific solution of switching off a reading since that allowed them success in previous simulations.
@bosqueblanco3744 Před rokem
great presentation
thanks for the follow up
@Alex-og3ev Před rokem ⁺⁵⁵
Similar thing happened in 2017 with the second launch from the new cosmodrome Vostochny, old software logic applied to new geography without a double check. Didn't happen at first because they used the very rare Volga upper stage, but the second launch was in the default configuration that flew for decades from launch pads everywhere including South America. So after final separation, the Fregat upper stage was scheduled to make a 10 degree turn counterclockwise, but due to the geography of the new cosmodrome and the flight trajectory, the software decided that it needed a 350° clockwise turn instead. Didn't end well. Turned out that there was a narrow set of input parameters that could make the upper stage behave like this, and the new launch pad won the jackpot.
@JohnMullee Před rokem
Wasn't there something about thermal modelling and pipes freezing in the fregat upper? Or am I misremembering
@Alex-og3ev Před rokem
@@JohnMullee No, that was definitely some other story
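The 10° versus 350° turn described above is the classic angle wrap-around problem: subtracting two headings directly can give the long way round when they sit on opposite sides of the 0°/360° boundary. The sketch below shows the naive difference next to a wrapped one. It illustrates the general pitfall only and is not a reconstruction of the actual Fregat flight software; the function names are invented for the example.

```python
def naive_turn_deg(current_deg, target_deg):
    """Direct subtraction: correct only when no wrap-around is involved."""
    return target_deg - current_deg


def shortest_turn_deg(current_deg, target_deg):
    """Signed smallest rotation from current to target, in [-180, 180).

    Python's % always returns a non-negative result for a positive
    modulus, so shifting by 180 before and after wraps correctly.
    """
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


# Heading 5 degrees, target 355 degrees.
long_way = naive_turn_deg(5.0, 355.0)      # 350 degrees one way
short_way = shortest_turn_deg(5.0, 355.0)  # 10 degrees the other way
```

For most pairs of headings the two functions agree, which is how a bug like this can stay hidden for years until a new launch site produces inputs that straddle the boundary.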
@ns219000 Před rokem ⁺⁷⁵
Japan, sorry for your loss, but thanks for the software design lesson. Rockets are hard and this is how we learn. Thanks for sharing this one, Scott!
@elleryhorton44 Před rokem ⁺⁴
Redundant systems to help the mission don't matter if the mission never starts. I worked on a Single/Dual/Triple redundancy system a long time ago. I think the probability of a single incorrect signal per million samples for each device was 75/93/98 percent (roughly, I don't recall the exact number). A huge bonus from single to dual redundancy but rarely worth the extra 33% in cost between Dual and Triple. However, each module had to boot up on its own and if they did not, then the system wouldn't run anyway.
@user-if1tz7uj5f Před rokem
Thanks Scott!
@joelcorley3478 Před rokem ⁺⁶⁶
But what if the radar altimeter actually did fail around the time it passed over that crater? It sounds like it would have produced the same result. I think the only way to deal with this in the design is to have at least one redundant sensor for something this mission critical. Of course the problem with just one redundant sensor is that you need to try to figure out which one is actually the broken sensor. That's why there are often 3 sensors or 3 computer systems used in this kind of redundancy...
@sonaxaton Před rokem ⁺¹³
Sounds like a redundant sensor wouldn't have helped this particular issue though, because it would have just gotten the same confusing measurements of the cliff wall. I think they just need to thoroughly run simulations of the actual mission to catch edge cases like this early.
@a4d9 Před rokem ⁺³
On a vehicle like this, without humans onboard, the space and weight requirements might be too costly compared to the risk of a failed sensor.
@SashaNaronin Před rokem ⁺⁴
@@sonaxaton Exactly. A proper simulation campaign would've caught that.
@Damien.D Před rokem ⁺¹⁶
@@sonaxaton 3 redundant sensors and a voting system is the way to go. It has worked flawlessly in many aeronautical systems, from the Concorde autopilot to missile guidance systems.
@travcollier Před rokem ⁺¹⁹
The dead reckoning system combined with prior knowledge (a map of roughly what is expected) should have been enough of a redundant system. Seems like they should have included a reassessment/recovery routine to check if that apparent altimeter glitch (which wasn't a glitch of course) cleared and the instrument was giving reasonable data.
This stuff is really tricky without a human in the loop.
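A minimal sketch of the three-sensor voting idea raised in this thread, assuming mid-value selection (taking the median of three readings). The function name `vote` and the `tolerance` parameter are hypothetical, chosen only for illustration; this is not a description of any flown system.

```python
from statistics import median


def vote(readings, tolerance):
    """Mid-value selection across three redundant sensors.

    Returns the median reading plus the indices of any sensors that
    disagree with it by more than `tolerance` (hypothetical threshold).
    """
    if len(readings) != 3:
        raise ValueError("this sketch expects exactly three readings")
    chosen = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - chosen) > tolerance]
    return chosen, suspects


# Two altimeters agree near 5 km, one reports something wildly different.
value, suspects = vote([5012.0, 5008.0, 1200.0], tolerance=50.0)
print(value, suspects)
```

As pointed out above, voting only helps against an actual sensor fault. Three identical altimeters looking at the same crater rim would all agree with each other and still disagree with the navigation estimate.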
@RogHawk Před rokem ⁺⁶
Thank you, Scott! You answered questions I've had for the last few years about the landers crashing on the moon.
@noahserio4182 Před rokem ⁺²
I’m surprised they didn’t have a redundant altimeter to verify the suspect altimeter reading against.
@thetooginator153 Před rokem ⁺⁸
It would be interesting to try an optical parallax system to verify the radar readings. If both systems agree, then the data is correct. Cameras could be a few meters apart, so, the parallax would be measurable from pretty far away.
@EnricoGolfettoMasella Před rokem ⁺¹
That’s a very creative solution! ✌🏼✌🏼Pretty sure would work!
@xonx209 Před rokem ⁺¹
If they don't agree, then what do you do?
@4k8t Před rokem
@@xonx209 In sci-fi it would usually be three independent systems, with two having to agree as to what they were seeing. A two-system setup would require that both systems agree, and if one system cut out a sensor as malfunctioning and the other didn't, something would have to be present to break the deadlock.
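The parallax idea in this thread can be sketched with the standard pinhole stereo relation, range = baseline × focal length / disparity. This assumes two parallel, calibrated cameras; the function names and the 10% agreement tolerance are hypothetical.

```python
def stereo_range(baseline_m, focal_px, disparity_px):
    """Range to a feature seen by two parallel cameras.

    baseline_m: separation between the cameras in meters
    focal_px: focal length expressed in pixels
    disparity_px: horizontal shift of the feature between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px


def cross_check(radar_m, optical_m, rel_tol=0.1):
    """True when the two range estimates agree within rel_tol (assumed 10%)."""
    return abs(radar_m - optical_m) <= rel_tol * max(radar_m, optical_m)


optical = stereo_range(baseline_m=3.0, focal_px=2000.0, disparity_px=2.0)
print(optical, cross_check(3050.0, optical))
```

With a baseline of a few meters the disparity at several kilometers is only a pixel or two, so a small disparity error becomes a large range error. That limits how far out such a check would be useful, and it still leaves the "what if they disagree" question open.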
@Anvilshock Před rokem ⁺³⁶
Hardly a "bug" when it worked correctly for the data input it was programmed to handle. At best, it encountered data it _wasn't_ programmed to handle, which makes this more a missing feature.
@mikehartsough489 Před rokem ⁺³
I was thinking same thing. Sounds like the software did exactly what it was supposed to do.
@1224chrisng Před rokem ⁺¹¹
well, a bug is just unintended behaviour. The computer did exactly what you told it to do, just not what you wanted it to do
@ddnguyen278 Před rokem
Can't imagine why they didn't run simulations of this. It's not like the moon's topography isn't known down to the meter. Stick it in Kerbal and run simulations.
@RemyPorter Před rokem ⁺⁷
@@ddnguyen278 Uh, the moon's topography *isn't* known down to the meter. Some areas of the moon are, but generating meaningful maps of the moon is actually quite hard and time consuming. There are folks whose entire job is to take lower res digital elevation maps and apply reasonable interpolations to generate higher fidelity maps than we actually have.
Not saying they shouldn't have done more sims, but it's harder than it sounds.
@davidwright7193 Před rokem ⁺¹
Repeat after me “That’s not a bug it’s a feature”
@0x8badbeef Před rokem ⁺⁹
6:20 Planned landing site change? That would normally require a revalidation of the software in the industry. I would blame this on the people who decided not to do that. I would investigate those guys and why the change. I would not blame the software as the software was not designed to be used that way.
@Aditya-gp2ih Před 10 měsíci ⁺¹
Came here after the successful landing of India's Chandrayaan-3... best of luck to Japan for future projects...
@joszandstra2044 Před rokem
Thanks for the interesting video. Hopefully next time they will make sure the software is working ok
@codediporpal Před rokem ⁺⁸
I'm very impressed with the abilities to diagnose what went wrong. Even amateurs helped! Another case study for future designers of "fail-safe" systems.
@rorykeegan1895 Před rokem ⁺⁴
Seems pretty sloppy not realising a change in landing site might cause the craft problems ... Sounds like bad project management to me.
@krispockell685 Před rokem
So sad! My heart goes out to the team.
@therealzilch Před rokem
Another fascinating and instructive example of Robert Burns' "The best laid schemes o' mice an' men / Gang aft a-gley."
Cheers from sunny Vienna, Scott.
@tertiaryobjective Před rokem ⁺⁵
Like when you're walking down the stairs and miss that last step.
@BILLY-px3hw Před rokem ⁺¹¹
It tore me apart watching the team coming so close, it really has to weigh on the people who didn't catch the glitch, I am sure some are still laying in bed awake at night, can't wait to see the team bounce back with a flawless mission
@OhNiceMatt Před rokem
Those software engineers were laid off, hence the laying in bed awake at night
@scvcebc Před rokem ⁺²
Neil Armstrong took over the controls and manually landed on the moon when he saw rougher terrain than expected at the final approach of the first manned landing in 1969. He was a true test pilot who was able to think fast and take action without losing his nerve. He barely had enough fuel for the extra maneuver, so he was also lucky. The problem with depending on robotics is that software doesn't have "common sense" and enough experience to handle the unexpected. However, these crashed robot landers are much cheaper than manned missions, so with trial and error they will eventually work.
@ClickClack_Bam Před rokem
And then a unicorn ran up & they rode the unicorn all around the Moon going 240,000 miles back to Earth.
The Unicorn didn't run 28,000mph like they would've had to go in the pop rivet aluminum can they brought them there.
@c.ishikawa6346 Před 11 měsíci
Thank you for the lucid explanation. Japanese media did not seem to convey the essence of why the failure occurred, yet (?).
I understand the problem with the software. I often insert "this can't happen" handling in my code. Though, it was somehow not remedied by a secondary sensor or some sort of re-calibration effort. I wish the project success next time.
@lyoha5028 Před rokem ⁺⁸⁴
I wonder what all these people in mission control were doing during the landing. Were they analyzing the telemetry in real-time? I assume they were supposed to notice that the radar altimeter was considered faulty and disabled. If so, perhaps they could have reviewed its readings and realized that after passing the edge of the crater, the readings returned back to normal. In that case, they could have just manually reenabled the radar altimeter. Since it is not Mars, the signal delay is small enough to allow for manual corrections during the landing.
@katho8472 Před rokem ⁺¹
Word!
@ooooneeee Před rokem ⁺¹¹
They lost telemetry. If they had a connection they could have saved it.
@pavanshetty9806 Před 11 měsíci
There might also be delay in communication.
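A rough check of the delay discussed in this thread, assuming the mean Earth-Moon distance and counting only light travel time (no ground processing or human reaction time). The 30 m/s descent rate is a hypothetical figure for illustration, not a value from the ispace report.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
MEAN_EARTH_MOON_KM = 384_400.0


def one_way_delay_s(distance_km=MEAN_EARTH_MOON_KM):
    """Light travel time over the given distance, in seconds."""
    return distance_km / SPEED_OF_LIGHT_KM_S


def distance_fallen_m(descent_rate_m_s, delay_s):
    """Altitude lost at a constant descent rate while waiting delay_s seconds."""
    return descent_rate_m_s * delay_s


round_trip = 2 * one_way_delay_s()
print(f"round trip: {round_trip:.2f} s")
print(f"altitude lost at 30 m/s: {distance_fallen_m(30.0, round_trip):.0f} m")
```

The round trip comes out at a bit over two and a half seconds, which is short compared with Mars but is only the floor: someone still has to notice the problem, decide, and send the command.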
@skougi Před rokem ⁺³
After watching both India and Israel do the same thing (live) I decided it must be tradition to crash on the moon at least once before landing there.
Seriously though, both of those crashed at the last minute too. It's like their up becomes down and they rocket full speed into the moon trying to avoid, well, just that. I think one Chinese probe even had to resort to using optical recognition tech to get around the weird landing issues. Thanks for posting!
@jeechun Před rokem ⁺²
This story reminds me that once I planned to make a simulation for spaceships/probes, where the simulation goes down to almost hardware level, where the subsystems (sensors) could be configured to have a certain precision, sampling rate, processing delay, and the way how they communicate with the CPU, the flight computer, so the design of such a vehicle architecture would be closer to reality. Also, the propulsion units could be configured to have delay to start/stop/change working, and a function, how it is done.
May be, in KSP3? :D
(Feel free to use this idea, most probably I won't have time to develop it.)
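One way the sensor part of this idea might look, as a minimal sketch: a sensor model with a configurable sampling rate, processing delay, resolution, and noise. The class name `SimulatedSensor` and all of its parameters are hypothetical.

```python
import random
from collections import deque


class SimulatedSensor:
    """Toy sensor model driven by fixed simulation ticks."""

    def __init__(self, sample_every, delay_samples, resolution, noise_std, seed=0):
        self.sample_every = sample_every  # take one sample every N ticks
        self.resolution = resolution      # quantization step of the output
        self.noise_std = noise_std        # standard deviation of Gaussian noise
        self._rng = random.Random(seed)
        # Samples wait in this queue for `delay_samples` sampling periods.
        self._pipeline = deque([None] * delay_samples)
        self._tick = 0
        self._latest = None

    def step(self, true_value):
        """Advance one tick; return what the flight computer currently sees."""
        if self._tick % self.sample_every == 0:
            noisy = true_value + self._rng.gauss(0.0, self.noise_std)
            quantized = round(noisy / self.resolution) * self.resolution
            self._pipeline.append(quantized)
            self._latest = self._pipeline.popleft()
        self._tick += 1
        return self._latest


altimeter = SimulatedSensor(sample_every=1, delay_samples=2, resolution=0.5, noise_std=0.0)
print([altimeter.step(v) for v in [100.2, 99.9, 99.4, 99.0]])
```

The flight computer sees `None` until the first sample clears the delay pipeline, which is the kind of startup behavior such a simulation would let you design around.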
@deeplearning7097 Před rokem
Thanks Scott.
@riccardob9026 Před rokem ⁺⁴⁰
To be honest (and a bit philosophical), I would not call this a "bug," in the sense that a bug usually means an error in the software that makes it behave differently from the behavior specified at the design phase. In this case the software had to face a situation that was not expected, that is, a sudden increase in altitude due to a deep crater. It was not an error introduced at implementation time (that is, when they wrote the software), but at design time. Like a bridge that collapses, not because of some error during construction, but because of a strong wind that was not considered at design time.
@DrDeuteron Před rokem ⁺¹⁰
I agree. This was a planning error, or a failure-to-test error, or changing the landing into a regime that had not been tested, or all of the above. It's been known for a long time that radar altimeters can be spoofed by terrain; it is nothing new.
@serronserron1320 Před rokem ⁺¹
An engineering oversight
@aspuzling Před rokem
As a software engineer I agree lol but that's not to say it is not also partly the responsibility of software engineers to raise potential bugs in the design.
@chaz720 Před rokem ⁺³
Agreed, and came to write this. As a space systems engineer, this was a systems engineering failure, not a software bug.
@bbgun061 Před rokem
They should have tested their software with real data.
@seann4678 Před rokem ⁺¹³
Hi Scott, during the iSpace debriefing, they reported that their velocimeter did not start reporting data when it expected to be 2km above the surface (event 9 in the schedule).
Do you know if this is a separate issue or a consequence of being too high from the ground?
@vrendus522 Před 10 měsíci
Thanks Scott
@bobdionne4625 Před rokem
YOU fly safe little brother.
Your program is very instructional and informative. And it must be a blast to produce. But don't be distracted while you're behind the yoke.
Always fly the airplane 👍
@jbirdmax Před rokem ⁺⁷
Enjoyed hearing you on NSF, Mr. Manley.
@antonioloma2327 Před rokem ⁺⁸
If they tested by simulating landings on other spots but not on the selected one, then they didn't test! This isn't a software bug but a project management issue (specifically testing). It's like "testing" your computer program on your desktop, then deploying it on a server where the faster hardware exposes a race condition that borks the system. Testing is expensive, testing is hard, but not testing the actual flight plan is dumb.
@RobertBlair Před rokem
Software engineering does not start at the keyboard, and end when it gets sent to a testing team that is somehow not software engineering.
Engineers have a responsibility to work with testing crew, to validate the test scenarios. The teams failed to run enough variations of realistic input, so inputs outside the limited sets caused a fault.
Specifically, several bugs in the system as a whole:
1. Spacecraft is unable to land without altimeter inputs. Relying on inertial guidance alone cannot be accurate enough to land, due to inherent input noise. If the altimeter signal is discarded more than X seconds before touchdown, error margins cause failure rates approaching 100%.
2. Guidance system (apparently) had no way to recover confidence in sensors.
3. Guidance system would erroneously flag valid altimeter inputs as coming from a broken sensor.
4. Testing was not done to cover the new landing site (and yes, a senior engineer should have balked at the change).
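Point 1 in the list above can be made concrete with a back-of-the-envelope model, assuming the only error sources are a constant accelerometer bias and an initial velocity error. The function names and all numbers below are hypothetical, not values from the ispace report.

```python
import math


def altitude_error_m(accel_bias_m_s2, velocity_error_m_s, seconds):
    """Altitude error after coasting on inertial data alone.

    Integrating a constant acceleration bias twice gives 0.5 * b * t^2;
    an initial velocity error adds v * t on top of that.
    """
    return velocity_error_m_s * seconds + 0.5 * accel_bias_m_s2 * seconds ** 2


def max_coast_time_s(accel_bias_m_s2, tolerance_m):
    """Longest altimeter outage that keeps bias-only error under tolerance_m."""
    return math.sqrt(2.0 * tolerance_m / accel_bias_m_s2)


# Hypothetical: 0.01 m/s^2 bias, no velocity error, one minute without altimeter.
print(altitude_error_m(0.01, 0.0, 60.0))
# Hypothetical: how long can we coast and stay within 2 m of the true altitude?
print(max_coast_time_s(0.01, 2.0))
```

Because the error grows with the square of time, the "X seconds" in point 1 is short even for a small bias, which is why losing the altimeter early in the descent is so much worse than losing it near touchdown.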
@zrohit Před 10 měsíci
Maybe multiple countries could drop beacons around common landing areas that everyone could use during landing. Not a foolproof but can help.
@mrpocock Před rokem
FYI if the landing location and approach is part of the software spec then a change to the landing site and approach is a change to the software spec and requires a full end-to-end revalidation of the software.
@bobbun9630 Před rokem ⁺¹¹
From this description it sounds like the software worked as intended based on the circumstances. It sounds more like they need to rethink the system level design to have more inputs that can be used to sanity check one another, and perhaps have a means for a one-time instrument glitch (at least in the design interpretation) to be "forgiven" if later sanity checks pass.
@u1zha Před rokem ⁺¹
Yes, that makes sense, and the "forgiving" part is commonly solved by Kalman filtering, which Scott also mentioned. Here it sounds like Ispace overengineered a little bit, overeagerly dropping sensor data on the floor before giving the filter a chance.
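A minimal sketch of the "forgiveness" idea from this thread, assuming a distrusted sensor is reinstated once its raw readings have changed smoothly for several consecutive samples. The class name `ForgivingGate` and every threshold are hypothetical, and this is a stand-in for illustration rather than a Kalman filter or the lander's actual logic.

```python
class ForgivingGate:
    """Reject a sensor on a large jump, then re-trust it if it behaves."""

    def __init__(self, max_jump_m, max_rate_m_s, dt_s, recover_after):
        self.max_jump_m = max_jump_m        # largest believable estimate/reading gap
        self.max_step_m = max_rate_m_s * dt_s  # largest believable change per sample
        self.recover_after = recover_after  # smooth samples needed to re-trust
        self.trusted = True
        self._streak = 0
        self._last_raw = None

    def update(self, estimate_m, reading_m):
        """Return the altitude to use: the reading if trusted, else the estimate."""
        smooth = (
            self._last_raw is not None
            and abs(reading_m - self._last_raw) <= self.max_step_m
        )
        self._last_raw = reading_m
        if self.trusted:
            if abs(reading_m - estimate_m) > self.max_jump_m:
                self.trusted = False
                self._streak = 0
                return estimate_m
            return reading_m
        # Sensor is distrusted: count consecutive self-consistent readings.
        self._streak = self._streak + 1 if smooth else 0
        if self._streak >= self.recover_after:
            self.trusted = True
            return reading_m
        return estimate_m


gate = ForgivingGate(max_jump_m=500.0, max_rate_m_s=100.0, dt_s=1.0, recover_after=3)
readings = [100.0, 3100.0, 3090.0, 3080.0, 3070.0]  # sudden jump, then smooth
print([gate.update(100.0, r) for r in readings])
```

A real fault that produces noisy or frozen output would need extra checks; the sketch only shows that a one-time jump does not have to be a permanent verdict.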
@bobblum5973 Před rokem ⁺⁶
This sort of situation is actually a good point to use against those who claim the Apollo moon landings were impossible back then because of the limited computer power available. Having a human (or two!) at the controls made the landings difficult but not impossible. Comparing it to the success/failure rate of the even earlier Surveyor unmanned landers shows it can be hard to do, and losing a lander is expensive but you can try again.
@kendokaaa Před rokem ⁺²
Interestingly, this is the kind of problem that could also occur if you make a kOS (or kRPC) landing program in KSP if you use instruments and not the game's data
@DJA-BOMB Před rokem
Love that music at the end!
@snwendland Před rokem ⁺⁸
I seem to recall the University of Wyoming having a "Missile Guidance for Dummies" audio description of a guidance system for knowing where the missile is by knowing where it isn't - it seemed pretty rock solid. I have to wonder why this method hasn't been adapted for spacecraft yet.
@frodo9649 Před rokem ⁺⁷
It subtracts where it should be from where it wasn't.
@H-S. Před rokem ⁺²
Exactly. It would be especially helpful in this case; if the lander knew where it isn't, it would not waste fuel by trying to land as if it was just above the surface. :)
@u1zha Před rokem ⁺³
I believe that's just a sentence for lulz, engineers expressing themselves purposefully obtuse. Kalman filters are exactly the "knowing" part, and a closed loop control system is exactly the "subtracting" part.
@frodo9649 Před rokem
@@u1zha czcams.com/video/bZe5J8SVCYQ/video.html This video is full of these sentences, that are close to how control loops work, but not quite, which I find quite funny, especially if you know how it works
@simongeard4824 Před rokem ⁺²
@@u1zha Unfortunately, it also inspires a lot of morons to quote that line constantly on CZcams, perhaps under the mistaken impression that it makes them look smart.
@brentboswell1294 Před rokem ⁺⁵
Didn't Neil Armstrong have to do some on-the-fly recoding to overcome the 1202 error when the Eagle lunar module was getting overwhelmed with input? (Which was fixed on later Apollo missions through code fixes and turning off an un-needed radar as part of the checklist?) Seems like they could have used an altimeter, but on the moon the altimeter setting is always "00.00" 😅
@robertst-laurent6452 Před 8 měsíci ⁺¹
Mr. Manley, for the whole planet you are our 21st century Eugene Kranz.
At 01:13 your video proves that we now have available:
‘A da Vinci World of Creativity at Home’
The video shows that they used a $170 Airspy R2 receiver (with a $620 LNC + antenna) with the mind blowing power of the software available for the Airspy, so for less than $900 USD you can have the same setup at home !
Your use of the Kerbal simulator, to help us better understand the sequence of events, is of jaw dropping beauty.
@kenhelmers2603 Před rokem
Thanks Scott :)
@rnilu86 Před rokem ⁺³
They suddenly changed the landing site and didn't test the simulation? wow. What can I say. Always test your software before production. Hard lesson learned.
@SashaNaronin Před rokem ⁺⁴
Sounds like they tightened the Mahalanobis check margins in the Kalman filter. It's the check that the real measurement at each time step, the expected measurement, and the estimated measurement errors are all in accordance with each other. You usually hardcode the acceptable margins for that, e.g. the difference between the real and expected measurements must be < 4 times the expected error. If it isn't, the measurement is considered bad (e.g. the accelerometer physically fell off its mounting). Unfortunately the margins are often set too tight.
It could've been another problem though, related to algorithms similar to simultaneous localization and mapping, but I don't have enough experience with them to judge.
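The check described in this comment can be sketched for a single scalar state, where the Mahalanobis distance reduces to the innovation divided by its standard deviation. The function name `gated_update` and the numbers are hypothetical; the 4-sigma gate is taken from the comment, and nothing here is claimed to match the lander's actual filter.

```python
def gated_update(x, P, z, R, gate_sigmas=4.0):
    """One scalar Kalman measurement update with an innovation gate.

    x, P: prior estimate and its variance
    z, R: measurement and its variance
    Returns (new_x, new_P, accepted).
    """
    innovation = z - x              # real minus expected measurement
    S = P + R                       # variance of the innovation
    distance = abs(innovation) / S ** 0.5  # scalar Mahalanobis distance
    if distance > gate_sigmas:
        # Measurement is treated as faulty and ignored entirely.
        return x, P, False
    K = P / S                       # Kalman gain
    return x + K * innovation, (1.0 - K) * P, True


# A reading close to the estimate passes the gate and refines the estimate.
print(gated_update(x=1000.0, P=100.0, z=1010.0, R=25.0))
# A reading kilometers away from the estimate is thrown out.
print(gated_update(x=1000.0, P=100.0, z=4000.0, R=25.0))
```

The gate width depends on `P` and `R`, so a filter that is overconfident in its own estimate will reject large but genuine changes, which is the "margins set too tight" failure the comment describes.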
@dsewtz3139 Před rokem ⁺²
Good point. However, I believe (hope?) Scott was only using it as a well-known example and they don't really use a "naive"/textbook implementation of either Kalman filters or Mahalanobis distance... 🤷🏻 I mean, the moon's surface is NOT a hidden probabilistic distribution - so at least four dimensions could use Euclidean distance, verified against the same using surface maps - not some exhaustive search in simulation training data for planned approach vectors 🧐 ...if they need help implementing that - I actually wanted to visit a friend in Japan for some time now 🤣 (just kidding, compared to them I also don't have enough experience - but I'm 100% sure, if the control loop "broke", it was more in-depth than a wrong geometry of the probability space)
@synergy021 Před rokem ⁺²
Is there a requirement that all titles must be clickbait and include one of these words: Unbelievable, Shocking, Terrifying? No the reason wasn't unbelievable, it's actually quite believable and simply just an oversight.
@scottmanley Před rokem ⁺¹
It’s unbelievable because the navigation software stopped believing the altimeter.
@synergy021 Před rokem
@@scottmanley Lol, good save. It wasn't really directed at your video per se, that's just the titling trend among YouTubers on CZcams these days. Although yours is actually technically accurate hah 🙂
@kiereluurs1243 Před rokem
What was the 'REAL TRUTH?!!'
@dpratt2000 Před rokem
Why didn't it just use Google Crater View? Seriously though, it does seem like that would be a useful product for something so close as the moon. All of those craters have been so well mapped out that this information could have been pre-installed on the lander as background info for any possible final landing trajectory. Thanks for another great video Mr. Manley. Much appreciate your content.
