Everything Wrong with Ump Scorecards

  • Date added: 11 Sep 2023
  • Twitter - / dead_baseball
  • Gaming

Comments • 262

  • @tiger4thewin
    @tiger4thewin 8 months ago +336

    I always skipped overall consistency since the EUZ never looked completely right to me. Thank you for being able to articulate what seemed missing and suggesting some interesting changes!

  • @joshknapp7455
    @joshknapp7455 8 months ago +118

    I think having a 'total distance missed' number is a good idea but bad math. I think having an average distance missed would show a better picture because a total distance for one game vs another could vary wildly because of the difference in number of pitches thrown

    • @nmappraiser9926
      @nmappraiser9926 8 months ago +8

      Except that total distance increases with the number of missed calls. Average could be skewed by one particularly bad call. I think for a full picture you'd want both numbers included.

    • @rick809
      @rick809 8 months ago +14

      @nmappraiser9926 yeah that's the issue. Total distance increases with the number of missed calls, so an ump who misses on a lot of really close calls might have a higher total distance missed than an ump who does a bad job with an easier game, even though the former probably had the better game overall. Normalizing it based on the average makes it a better metric to compare two umps directly without having to look at and understand the rest of the information on the card. Also, total distance missed could also increase drastically by 1 really bad call, so I'm not sure what your point is there. Regardless, to get rid of that issue, you could always just remove outliers.
      The problem I have with average distance missed is that it's just a number and the viewer has to look at a lot of scorecards to get a feel for what the typical range is for that value and what "good" and "bad" look like. Everyone understands percentages, and having the average there and the expected average and deviation from the expected average makes it very clear for someone looking at an ump scorecard for the first time that this ump did a good or bad job.

    • @user-se4rr3rs3m
      @user-se4rr3rs3m 8 months ago +2

      Maybe some kind of weighted average depending on how close the missed call was. A 0.1-inch missed call would have a low weight compared to a 2-inch missed call. I think the main idea is to evaluate the ump's performance depending on how close their missed calls were.

    • @SalvoSLR
      @SalvoSLR 8 months ago

      @nmappraiser9926 agreed. Average would be missing the intent of showing a distance missed

    • @letsmakeit110
      @letsmakeit110 8 months ago +1

      @user-se4rr3rs3m now that sounds like relative accuracy.

  • @hardyworld
    @hardyworld 8 months ago +205

    Great video. I agree with all your suggestions except replacing relative accuracy. I agree with you that it is an important data point and it needs to remain as part of the scorecard. If you want to ADD the total distance missed, that's fine, but I'm not convinced it's a meaningful addition. Perhaps average distance missed on all missed balls and average distance missed on all missed strikes could be better than the total distance missed?

    • @finnb2318
      @finnb2318 8 months ago +19

      I think the wording of the poll was unfortunate (~Which number do you look at first). Even I might have answered Total Accuracy, despite considering Relative Accuracy more important. If instead he had asked a number of people to rank them from most to least important or indicative of an umpire's performance (say four points for first, one for fourth) he might find that he would have gotten a very different result.
      It's also a relatively new addition. Even with the internet moving as fast as it does, and him speaking to a digital audience, I would wait some more before, in effect, claiming that the measurement has failed as it's not understood.

    • @mwal223
      @mwal223 8 months ago +1

      @finnb2318 I absolutely agree with you there. Total accuracy is usually the first metric I look at (because attention is drawn to it on the scorecard) but I consider expected/relative accuracy to be at least as important, if not more

    • @junct
      @junct 8 months ago +2

      in statistics, mean squared error is a common indicator of how close a thing is to the model. so rather than averaging the distance missed, you square each distance before averaging.
      the idea is that you punish big misses more heavily and let small misses off more lightly.
      so mean squared error would probably be a more meaningful stat to look at
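
A minimal sketch of the mean-squared idea above, with made-up miss distances (in inches) for two hypothetical umpires. Both have the same plain average, but squaring before averaging separates them. The function names and the numbers are illustrative assumptions, not anything Umpire Scorecards publishes.

```python
def mean_miss(misses):
    """Plain average distance missed, in inches."""
    return sum(misses) / len(misses)


def mean_squared_miss(misses):
    """Average of the squared distances, so large misses dominate."""
    return sum(d * d for d in misses) / len(misses)


# Hypothetical data: four misses each, same average distance.
steady_ump = [1.0, 1.0, 1.0, 1.0]       # four one-inch misses
erratic_ump = [0.25, 0.25, 0.25, 3.25]  # three near-misses, one big miss

print(mean_miss(steady_ump), mean_miss(erratic_ump))
print(mean_squared_miss(steady_ump), mean_squared_miss(erratic_ump))
```

The plain averages tie, while the squared version scores the umpire with the single large miss as clearly worse, which is the behaviour the comment is asking for.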

  • @dominicpancella3012
    @dominicpancella3012 8 months ago +114

    This is a great video! Umpire Scorecards is still a project in its relative infancy, so it's always good to get this kind of constructive feedback from people who regularly use the site. As a data person myself, I've got a couple comments on the way the scorecards are presented. All your concerns are very well founded and well-explained.
    1) A kernel density map is definitely not the right tool for the job as far as the EUZ is concerned, for all the reasons you mentioned. What would be better is your distance idea, where the weight of how a bad call affects the shape of the zone is determined by the square of the distance from the edge of the zone. But that doesn't tell the whole story either: the reason KD is used is to potentially identify common areas of misses, which as you say is more useful for much larger samples that would allow you to clearly see, for example, that a particular ump tends to call balls just off the outside corner strikes more than the average ump (or more than the True Zone). For single games that usually isn't feasible though, and I don't have a terribly good answer as to the best solution for this problem other than "the algorithm is a work in progress."
    2) I like your Total Distance Missed idea, and I think it could benefit from the inclusion of some other metric like average distance per miss or average distance missed per call. A more elegant solution could even include what SABRmetricians have grown accustomed to, the + stat where 100 is average. I also agree with your sentiment that this is fundamentally the most important part of the scorecard: how does this umpire compare to other human umpires calling the same game? How does he compare to an electronic umpire? They ought to keep the "correct calls above expected" line in there though, that's insight with a tremendous punch.
    3) I have an idea brewing as far as favor is concerned, and it wouldn't even be that difficult a result to achieve. Replace the net runs for with a slightly more complex calculation where you take all the missed calls that benefited each team, multiply the count by the median in each case, and subtract one expected run value from the other. This would strongly decrease the relative strength of any outliers in either direction. You could also step it up a notch and, instead of using the median, weight each call based on the distance missed such that more egregious misses are counted as such and close calls don't have as much of an impact. But in order not to skew the favor metric too much by introducing different units, it might be better to simply apply those weights to the impactful calls list, for which run expectancy changes are not shown anyway (but perhaps should be, in a simple way as above, like "+0.19 runs for TEX" or something).
    4) How do we measure consistency? US measures it based on how many correct and incorrect calls fall within their estimation of what the umpire's zone is. And in theory this makes sense, but in practice it turns out kind of weird and janky. Another way to look at consistency would be to look at other games this umpire has called and determine how many more makes/misses he would have than usual and/or how much farther/closer his misses are on average to his typical missed calls. The primary reason to do this is that players and coaches would be looking primarily at an ump's overall tendencies rather than his in-game tendencies unless it was clear from the get-go his zone was wider than usual or that he was calling one corner differently from another or something.
    5) Where's the distillation of all these numbers into an overall umpire rating? EA has been doing this kind of thing for decades, and it would ostensibly be pretty simple to add an overall grade to the top of a game scorecard to let you know how this umpire's performance actually was without you having to dig into or combine any of the other numbers yourself. Even make it a letter grade so it's simpler, e.g. for XYZ game Livensparger earns a B+ and for ABC other game Wendelstedt earns a C-. Or whatever. All of the other information is useful, but people with short attention spans want either a single number or scalable identifier that they can use to tell everyone their least favorite umpire is horrible and should be fired and incarcerated.
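
One way to read the favor suggestion in point 3 is sketched below: for each team, multiply the number of missed calls that helped it by the median run value of those calls, then subtract. The run values and the helper name `median_favor` are hypothetical, chosen only to show how a single outlier stops dominating the result.

```python
from statistics import median


def median_favor(helped_a, helped_b):
    """Net favor toward team A: count * median run value, A minus B."""
    def damped_total(values):
        # An empty list means no missed calls helped that team.
        return len(values) * median(values) if values else 0.0

    return damped_total(helped_a) - damped_total(helped_b)


# Hypothetical run values (in runs) of missed calls that helped each team.
helped_sea = [0.05, 0.10, 1.20]  # one high-leverage outlier
helped_tex = [0.05, 0.10, 0.15]

raw_favor = sum(helped_sea) - sum(helped_tex)        # dominated by the 1.20 call
damped_favor = median_favor(helped_sea, helped_tex)  # the outlier no longer matters

print(round(raw_favor, 2), damped_favor)
```

With these numbers the raw difference is a little over one run, while the median-based version comes out even, since both teams were helped by three calls with the same median value.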

    • @dash4800
      @dash4800 8 months ago +3

      But you would think that the brilliant statisticians coming up with this would immediately see how flawed their system is. It really looks like a system that they put through no actual real world testing before trotting it out there. Like many baseball stats, they came up with a formula that gave them a number and never stopped to think if that number was either helpful or actually reflective of what's happening.

    • @panner11
      @panner11 8 months ago +7

      @dash4800 Things like this are always a work in progress. Models that simulate human judgements are inherently flawed and are built up over time. In this case, I don't think the people who wrote the paper are the same people that run Ump Scorecard. So you shouldn't think of it as the people who came up with it didn't know the flaws. They are well aware, of course. It's possible Ump Scorecard just went with it, there's no reason to believe the creators had much to do with it being used in production.

    • @MrTheboffin
      @MrTheboffin 8 months ago +1

      @dash4800 I suspect that they probably did but couldn't find a better one. After all, the one proposed in the video has flaws as well, since it gives no weight to correct calls, which based on the rest of the card represent the vast majority of the data.

    • @laartwork
      @laartwork 8 months ago

      It's a project that will end soon. Second year of my AAA team here using the robo ump and no one notices. Just imagine 100% accuracy every game.

    • @jamesknapp64
      @jamesknapp64 14 days ago

      As a mathematician this is a great comment.

  • @Uncle_Benny
    @Uncle_Benny 8 months ago +38

    I love your idea for total inches missed!
    I'd recommend also adding some sort of average or "Inches per pitch". Because again, if an ump has a game with 30 close calls, and misses by 1 inch on just 10 of those, he'd still have a total of 10 inches missed, whereas if an ump didn't have many close calls, but missed 2 or 3 by 3 inches, he's only at 6" or 9" total, even though he actually had a worse game
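
The arithmetic in that example can be checked with a short sketch. The two games below use the numbers from the comment (ten 1-inch misses versus three 3-inch misses); the function name `summarize` is just an illustrative choice.

```python
def summarize(misses_in_inches):
    """Return (total distance missed, average distance per missed call)."""
    total = sum(misses_in_inches)
    average = total / len(misses_in_inches) if misses_in_inches else 0.0
    return total, average


close_game = [1.0] * 10  # ten misses by one inch each
rough_game = [3.0] * 3   # three misses by three inches each

print(summarize(close_game))  # larger total, smaller average
print(summarize(rough_game))  # smaller total, larger average
```

The totals (10 vs 9 inches) rank the games one way and the per-miss averages (1 vs 3 inches) rank them the other way, which is why the comment asks for both numbers.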

  • @IsYouAWizard
    @IsYouAWizard 8 months ago +32

    The Total Distance Missed is cool, definitely think if it were to be put into practice it would need to be divided by the number of pitches that the ump had to make a call on (no swing).

    • @1868JG
      @1868JG 8 months ago +3

      Or a per 100 pitches version.

  • @panner11
    @panner11 8 months ago +17

    I'm very impressed with this video. EUZ sparks a lot of confusion and I was skeptical if you'd present the algorithm correctly, but you nailed it. The message being that it's not that the EUZ is bad, it just isn't very suited for use with small sample sizes like a single game. Over the course of a career, you can get a good sense of an ump's tendencies and whether they are consistent or not. But when there's a lack of data in certain areas of the zone, the zone becomes a crapshoot. Thanks for presenting the information accurately and not hamming up the issues.
    As for a better model, your progressive deformation model is pretty good. Probably they should just aggregate a bunch of models including these like most modern models do. But people also like transparency especially in sports so I understand them wanting to stick with it.

  • @kyokyo718
    @kyokyo718 8 months ago +8

    Even if Ump scorecards never changes, you demonstrating how to read the data presented in its current form is extremely valuable information. Kudos to you for managing both sides of analysis.

  • @Mason.Becker
    @Mason.Becker 8 months ago +9

    Amazing video. The only thing I sometimes wish was on the scorecards is 2 separate zones, one for right handed hitters and one for left handed. Just to see the differences, if any, for things like a pitch inside to a righty that is called a strike but outside to a lefty called a ball

  • @rowlofobro2
    @rowlofobro2 8 months ago +4

    Overall favor is one of the most misleading stats since it's only based off of expected runs and misses a lot of context. I remember a Mariners game earlier this year where there were 2 on, 2 out, and Julio hitting with a 2/1 count. The ump misses a call and calls a ball a strike, making it 2/2. The run value went towards the other team because a 3/1 count is much better for a hitter than a 2/2. On the next pitch the ump also misses, but this time he calls a strike a ball, and the expected run value explodes in Seattle's favor because it thinks that should have been strike 3 and inning over, and all of a sudden Seattle gets like a full run in value. Had the pitches played out in the opposite order, with a strike called a ball to make it 3/1 and then a ball called a strike, the favor would have shifted heavily towards the other team for avoiding a walk, when in reality the end result is more or less the same and the ump messed up badly for both teams.

  • @brp5121
    @brp5121 8 months ago +1

    This is one of the most helpful videos I've ever seen. Not only did you make great points, I also learned a ton about Umpire Scorecards.

  • @Liwet.
    @Liwet. 8 months ago +11

    For Total Distance Missed, you should instead square each individual distance, find the average of all these values, and then take the square root of that average. You aren't going to have umps that exclusively have 'good' misses and umps that exclusively have 'bad' misses; you'll be comparing umps that have a mixture of the two. Squaring will make the bad misses harder to overcome in the average. Otherwise one bad miss will look the same as 3 good misses.
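
The recipe described above (square each distance, average, take the square root) is the root mean square. Here is a small sketch with invented miss distances for two umpires whose totals are identical; the names and numbers are assumptions for illustration only.

```python
import math


def rms_miss(misses):
    """Root mean square of the miss distances, in inches."""
    return math.sqrt(sum(d * d for d in misses) / len(misses))


# Hypothetical games with the same total distance missed (6 inches).
even_misses = [1.0] * 6               # six one-inch misses
mixed_misses = [0.5] * 4 + [4.0]      # four small misses and one bad one

print(sum(even_misses), sum(mixed_misses))
print(rms_miss(even_misses), rms_miss(mixed_misses))
```

Because the result is back in inches, it can sit next to a total or an average on the card, while still penalizing the game with the single 4-inch miss (roughly 1.84 against 1.0 here).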

    • @karolrafalski3419
      @karolrafalski3419 8 months ago +2

      Was just about to comment that root mean square of distances would probably be a better metric. That combined with total amount (and type) of missed calls would paint a decent picture of consistency with the true zone.

  • @jeffroitero4266
    @jeffroitero4266 8 months ago +2

    Dude, you're amazing. I've wanted to do a deep dive into this ump zone thing... but I haven't had the time, which is a slightly artful way of saying that I didn't actually want to do it that badly. And I'm rewarded for my laziness by the fact that you did the work for me. Only so much better than I would have. You rock. And you know all this, but comments are good for youtube, so... comment comment comment.

  • @mrmikejsteele
    @mrmikejsteele 8 months ago +4

    I’m not sure I’ve ever learned more about something I’m familiar with in such a short time. This was clear, fair, and interesting. I’ll never read an Ump Scorecard the same way again. Thank you!

  • @jameskingsbery3644
    @jameskingsbery3644 8 months ago +1

    Some math thoughts:
    1. When dealing with rare events, such as happens for calls high in the strike zone in the KDE used for EUZ, one approach is to use a Bayesian method. The simplified-for-YouTube-comments version is: they should add some "fake" pitches that are correctly classified (or, classified as the "average" ump would call) just for the purposes of the KDE. As there are more pitches in a part of the zone, the actual pitches overwhelm the fake data. If there aren't a lot of pitches in a part of the zone, the KDE can anchor on the added data for what the ump probably would have done.
    2. The total distance missed is an interesting idea. I don't know how it would be made simple to understand but summing the squares (that is, the distance missed of each pitch times itself, all summed together) of the errors would highlight bad calls more. If you have 8 pitches that miss by a quarter of an inch and a pitch that misses by two inches, the total distance missed would be the same (2 inches), but the one bad call seems worse than a bunch of close ones. By summing the squares, the two cases would score 0.5 (for a lot of small misses) vs. 4 (for the one big miss).
    In any case, great video!
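
The "fake pitches" idea in point 1 can be illustrated without a full KDE. The sketch below swaps in a much simpler stand-in: a called-strike rate for one region of the zone, pulled toward a prior by a fixed number of pseudo-pitches. The function name, the `prior_weight` of 10, and the sample counts are all assumptions made for illustration, not the method Umpire Scorecards uses.

```python
def smoothed_strike_rate(called_strikes, pitches, prior_rate, prior_weight=10):
    """Called-strike rate for one region, anchored by pseudo-pitches.

    prior_rate   -- strike rate we expect before seeing data
                    (1.0 inside the true zone, or a league-average rate)
    prior_weight -- how many "fake" pitches the prior is worth (assumed value)
    """
    fake_strikes = prior_rate * prior_weight
    return (called_strikes + fake_strikes) / (pitches + prior_weight)


# Sparse region: one pitch at the top of the zone, called a ball.
sparse = smoothed_strike_rate(called_strikes=0, pitches=1, prior_rate=1.0)

# Busy region: 40 pitches, only 10 called strikes.
busy = smoothed_strike_rate(called_strikes=10, pitches=40, prior_rate=1.0)

print(sparse, busy)
```

With one observed pitch the estimate stays close to the prior (about 0.91 rather than the raw 0.0), while 40 real pitches pull it most of the way toward the observed 0.25. That is the "actual pitches overwhelm the fake data" behaviour the comment describes.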

  • @kadensadich1311
    @kadensadich1311 8 months ago +6

    Incredible video, well put together. Always get happy when I see another video put out by you. Your editing is so good and you present everything so well and use so many great sources and back up everything. You don't just give one side of the argument, you give both. So yeah, thanks for this video and keep up the great work :D

  • @taylorb5039
    @taylorb5039 8 months ago +1

    Particularly good video topic, and a very well executed video. I hope it's not condescending to say I'm seeing your growth as a creator! It will pay off. Thanks for making vids

  • @harrisonkarp7406
    @harrisonkarp7406 8 months ago +2

    One of the smartest YouTube videos ever, fantastic job

  • @paulframe85
    @paulframe85 8 months ago

    I love the intro sequence on your videos. It's just so much fun!

  • @Busanjingu.popularrapper
    @Busanjingu.popularrapper 8 months ago

    Wow. This is one of the most impressive videos ever. Mentioning issues and coming up with clear solutions is such a rare thing on YouTube.

  • @inline885
    @inline885 8 months ago

    Wow this video totally changed the way I look at this. Great work!

  • @donelec5955
    @donelec5955 8 months ago +2

    Finally someone was able to explain the ump consistency to me

  • @aaronlee9784
    @aaronlee9784 8 months ago

    Glad to see an analysis/scripted video! You're easily one of my favorite if not my #1 baseball creator

  • @hardhatlunchpal
    @hardhatlunchpal 8 months ago +23

    Wouldn't it be a better stat with overall distance if you made it per missed call. Like you take the overall distance divided by the number of pitches used

    • @fellow456
      @fellow456 8 months ago +9

      Yea, if you just have total distance missed, then an ump could get shafted just by virtue of having to call more pitches over the course of a game.

    • @smoceany9478
      @smoceany9478 8 months ago +1

      nah i think per pitch called is better. per missed pitch would make an ump who misses just 1 pitch by 3 inches look worse than an ump who misses 10 pitches but averages 2.9 inches per miss

    • @anthonyregier9649
      @anthonyregier9649 8 months ago

      @smoceany9478 you could have every correct strike or ball count as 0. Would be a low number but you could multiply it by 100 inches or something for a grade ranking.

    • @smoceany9478
      @smoceany9478 8 months ago

      @anthonyregier9649 yea that's what i'm saying, add the inches missed for every incorrect pitch and divide it by every pitch called
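
The difference between the two normalizations discussed in this thread shows up clearly with the numbers from the reply above. The sketch assumes both umpires called 150 pitches, a hypothetical figure the comments do not give, and scales the per-pitch value to "per 100 called pitches" as suggested elsewhere in the comments.

```python
def per_miss(misses):
    """Average distance per missed call, in inches."""
    return sum(misses) / len(misses) if misses else 0.0


def per_100_called(misses, called_pitches):
    """Inches missed per 100 called pitches; correct calls count as 0."""
    return 100 * sum(misses) / called_pitches


one_miss = [3.0]           # a single 3-inch miss
ten_misses = [2.9] * 10    # ten misses averaging 2.9 inches
CALLED = 150               # assumed number of called pitches per game

print(per_miss(one_miss), per_miss(ten_misses))
print(per_100_called(one_miss, CALLED), per_100_called(ten_misses, CALLED))
```

Per missed call, the one-miss umpire looks slightly worse; per called pitch, he looks far better, since his other 149 calls contribute zero to the numerator.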

  • @62Trevor2199
    @62Trevor2199 8 months ago +1

    Like the refreshed intro! Great vid!

  • @IKER0718
    @IKER0718 8 months ago

    damn the intro alone deserves a lot more subscribers!!!
    love that a lot, now i'm going to enjoy the video!

  • @raschticky
    @raschticky 8 months ago +1

    I have never agreed more with a quote than I have with “Joe West is the Yuniesky Betancourt of umpires”

  • @terri2rial
    @terri2rial 8 months ago

    dont know much about baseball, i’m an *extremely* casual fan, but this video was so eloquent and well researched it makes me wanna become a baseball nerd. well done!!

  • @EDF1919
    @EDF1919 8 months ago +2

    Always a good day when BND uploads.

  • @Sean-uh6te
    @Sean-uh6te 8 months ago +9

    The strike zone box on TV is not the strike zone. It rarely gets the top of the zone right. It's just a TV graphic for us at home, like the yellow first-down line you see in football. It's not perfect. This is why MLB has asked teams to stop looking at the replay in the dugout on their iPads. It needlessly pisses them off, thinking it's the true strike zone. If the broadcasters were to remind people of this on air, like the football broadcasters do, it would clear up a lot of confusion and negativity.

    • @1uckedout
      @1uckedout 8 months ago +1

      The problem of viewers thinking the box is perfectly accurate is so bad even a baseball youtuber like this believes it's always right. He even called it the "true zone" in this video lol

    • @CharlesFreck
      @CharlesFreck 8 months ago +3

      I hate that people think it's real. It's usually about a full ball short high, meaning high strikes look a mile out of the zone when they're easily completely within the actual zone. People also think the zone is based on where the batter is when they've swung/the ball passes the plate, but it's not. It's based on their stance when in the box and ready, i.e. the top of the zone is much, much higher than on TV. People need to remember the zone changes FOR EACH BATTER. If you're tall, you're going to have a bigger strike zone than a tiny guy, and the tiny guy's zone will be lower than the tall guy's. The zone is not fixed. There's no true zone. It's situational for each individual that steps up to the plate.

    • @Sean-uh6te
      @Sean-uh6te 8 months ago

      @CharlesFreck you said it better than me. I’m not a conspiracy guy but it feels like one to swing favor towards a robo ump. Probably not, but it feels like things are going in that direction and it's made easier when the audience is shown a false strike zone.

  • @LeaminOwnsAll
    @LeaminOwnsAll 8 months ago

    Great breakdown on the umpire scorecard! Learned a lot watching this!

  • @jeffroitero4266
    @jeffroitero4266 8 months ago +3

    Angel Hernandez will be the first ump with a total distance missed that exceeds his own height.

    • @SvanMagic
      @SvanMagic 8 months ago

      CB Buckner may beat that total.

  • @nickbuchholz6841
    @nickbuchholz6841 8 months ago

    This was really good, great presentation and points. 10/10

  • @8stormy5
    @8stormy5 8 months ago +4

    Two criticisms.
    First, EUZ is a tool to determine whether misses are arbitrary or follow a pattern. It's obviously not going to work when there are no misses.
    Second, overcomplicating the model of fit to get "everything right" risks overfitting the model to the data. The model would simply collapse into a simple and literal description of reality, which means it can't meaningfully predict at all whether a missed call was missed arbitrarily or due to bias.

    • @panner11
      @panner11 8 months ago +1

      His criticism of EUZ was pretty spot on though, it's extremely volatile when there is a lack of data. In a single game sample size that happens often. Just on inspection we can see how wonky the EUZs are for single games.
      You are right about the progressive deformation model he suggests, it is prone to overfitting and assumes the zone is just the real zone. But aggregate it with the other models and normalize it and I'm sure it would be fine.

  • @Falllll
    @Falllll 8 months ago +2

    Love this video. I've always been a bit suspicious of certain things on the scorecards, but have never taken the time to dive into how exactly some of those things are calculated, so I appreciate this being explained here.

  • @NickyQuesne13
    @NickyQuesne13 8 months ago +1

    Great vid. Gotta say, the dig at Cinemasins was the cherry on top...

    • @panner11
      @panner11 8 months ago +1

      I never understood that channel. Only saw a few videos, but never saw a single sin that was legitimate criticism like a review would do. Seemed like just 100 random observations about the movie labeled as sins. I assume it's satire but it's not even funny so idk.

  • @komiteunofficialaccount9224
    @komiteunofficialaccount9224 8 months ago +1

    Great explanation of KDE, and how it went wrong. I miss *some* math classes.

  • @a-a-ron3542
    @a-a-ron3542 8 months ago +1

    One thing I keep thinking about is that pitchers are consistently throwing harder than ever with greater movement than ever. It's literally harder to call balls and strikes than ever. Furthermore, part of that velo consistency is the increased use of bullpens, which means they are going to see a greater range of pitchers with more diverse release points and styles, so it's harder to get into a groove with a single pitcher. The Phil Cuzzi's and the Angel Hernandez's are always going to be terrible, but I don't know that umps are worse; I genuinely think it's a harder job than it used to be.
    Edit: we also didn't have the luxury of the little box 10 years ago. We would just say things like, "Oh, it was two strikes, he should have been protecting."

  • @andrewlauer4030
    @andrewlauer4030 8 months ago +3

    You don't suggest a fix for the Favor category, but I think it's actually fairly easy. Since they already have a metric for expected accuracy on certain calls, they could apply that as a weighting factor to the win probability added. So that way a call that is essentially a toss up can't swing the favor too much.
    Something like WPA*[2*(Expected Accuracy - 0.5)]. That way if the expected accuracy on a call is only 55%, then only 10% of the win probability added by the call would count towards the umpire's bias for the game. It wouldn't be a perfect fix, but it would go towards addressing the problem you are talking about.
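
The weighting proposed above is simple enough to write out directly. In this sketch the function name and the sample WPA value are hypothetical; the clamp at zero for calls with expected accuracy below 50% is an added assumption, since the comment does not say what should happen in that case.

```python
def weighted_wpa(wpa, expected_accuracy):
    """Scale a missed call's win probability added by how 'gettable' the call was.

    weight = 2 * (expected_accuracy - 0.5), so a 50/50 call counts for nothing
    and a call umpires always get right counts in full.
    """
    weight = 2 * (expected_accuracy - 0.5)
    weight = max(0.0, weight)  # assumption: never let a call count negatively
    return wpa * weight


# Hypothetical missed call worth 0.20 of win probability.
toss_up = weighted_wpa(0.20, expected_accuracy=0.55)  # only 10% of it counts
obvious = weighted_wpa(0.20, expected_accuracy=1.00)  # all of it counts

print(round(toss_up, 3), obvious)
```

This reproduces the comment's own example: at 55% expected accuracy the weight is 0.10, so a 0.20 swing contributes about 0.02 to the favor total.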

  • @cosmoid
    @cosmoid 8 months ago

    I never really understood the EUZ in the first place, but I love getting to see the accuracy of ball/strike calls.

  • @JDawg12329
    @JDawg12329 8 months ago +4

    What if consistency took the average number of differing calls in an area? So you could split the zone up into four quadrants, then have a 3 inch box in each corner just outside the zone and then finally 4 more boxes well outside in the corners. So they have 12 zones that they can measure whether the calls were all the same within that zone. If you have to make 8 calls in the bottom right quadrant of the zone and you call 7 strikes and 1 ball, you would have a consistency rate of 0.88 for that particular section. Then you average out the zones to get an overall consistency.

    • @hb-robo
      @hb-robo 8 months ago

      This would be amazing, sort of like an accuracy heatmap. I would probably suggest the more common 3x3 grid inside the zone, which would lead to 16 cells outside the zone (3 per side + 4 corners) for a total of 25. Just to get more granular, since we already know the precise XY coordinates of every pitch
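
A rough sketch of the per-zone consistency idea from this thread: each called pitch is tagged with a zone label, the consistency of a zone is the share of its calls that match the majority call, and the overall figure is the plain average across zones. The zone labels, the function name, and the sample calls are made up; how pitches get assigned to the 12 (or 25) cells is left out.

```python
from collections import defaultdict


def zone_consistency(calls):
    """calls: iterable of (zone_label, call), where call is 'strike' or 'ball'.

    Returns (per-zone consistency rates, unweighted average over zones).
    """
    totals = defaultdict(int)
    strikes = defaultdict(int)
    for zone, call in calls:
        totals[zone] += 1
        if call == "strike":
            strikes[zone] += 1

    rates = {}
    for zone, n in totals.items():
        majority = max(strikes[zone], n - strikes[zone])
        rates[zone] = majority / n

    overall = sum(rates.values()) / len(rates) if rates else 0.0
    return rates, overall


# Hypothetical game: 7 strikes and 1 ball in one quadrant, 4 balls off the plate.
sample_calls = (
    [("bottom_right", "strike")] * 7
    + [("bottom_right", "ball")]
    + [("off_low_away", "ball")] * 4
)
rates, overall = zone_consistency(sample_calls)
print(rates, overall)
```

The bottom-right quadrant comes out at 0.875, the 0.88 from the original comment. One caveat of this simple version is that a zone with a single pitch always scores 1.0, so sparse cells would probably need a minimum-pitch threshold or weighting by pitch count.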

  • @Jay-gb9pi
    @Jay-gb9pi 8 months ago +1

    I always wondered if there was a way to change the bottom and top of the true zone depending on the height of the batter... because the true zone does not change up or down, I tend to look at the inside/outside calls first and with more scrutiny than missed calls at the top/bottom of the zone, since you don't know if the batter was 5'6 or 6'6... and knowing how different batters are called would be really interesting... getting to see the different zones Altuve and Judge are working with would be cool

  • @andrewszaflarski5379
    @andrewszaflarski5379 8 months ago

    I guess a question that I also had re: Ump Score Cards is this: Is one side of the shown strike zone considered to be "inside" and the other "outside" regardless of the left/right handedness of the batter, or is one side inside for right handed batters and outside for left handed batters and vice versa? I'm pretty sure that it's the former, but I don't know for certain and hadn't found the explanation.

  • @Dockie27
    @Dockie27 8 months ago

    Great video, but I got completely distracted on an hour long rabbit hole (hour deep?) looking at the Galle Crater, Mars, and asteroid impacts. Thanks for the new space stuff to learn about!

  • @sirgermaine
    @sirgermaine 8 months ago

    If we already have inches off, the simplest piece of context you could put on impactful missed calls is to put (amount of impact) and (distance from correct) along with the context for the call. There is already room, you just throw it right below as +1.25 SEA / .34 inches out or +1.2 NYY / 1.8 inches in

  • @sealeo5772
    @sealeo5772 8 months ago

    When I heard kernel density estimation my ears perked up and I got a bit excited that finally something I know about from using GIS software and learning about mapping statistics in school is relevant to nerdy baseball stats.

  • @josephtaylor5077
    @josephtaylor5077 8 months ago

    Never heard of Umpire Strike Zones. I’ll have to look them up. Great analysis of the numbers.

  • @blakestoudt2131
    @blakestoudt2131 8 months ago

    Excellent breakdown. The EUZ always confused me as well. I can tell how much you care about drawing the right conclusions from this data that is relatively new to the fans.

  • @mclew1234
    @mclew1234 8 months ago

    I agree that as data builds, having an EUZ of each umpire based on their previous calls would be great. I've always said as a ball player that while having the true zone would be great, humans are always going to be slightly off; I'd much prefer a consistent zone that's consistently wrong in a certain way than a zone that's all over the shop. If I know as a player that an ump will call a ball just outside a strike but a pitch on the inside corner will get given a ball, I can adjust my approach appropriately, and in the pros players can do this pre-game by knowing we have ump X today, so we are gonna have to swing at a pitch just outside but don't have to go after balls on the inside corner, etc.

  • @darkbreaker9767
    @darkbreaker9767 8 months ago

    I have an idea to fix the favor metric. Multiply the favor swing by the distance from the zone, or set blocks of distance as different favor scores. Especially for high-swing situations like bases loaded full count.

  • @jcorn12
    @jcorn12 8 months ago +8

    I'm struggling to see how total distance missed is any more intuitive than relative accuracy. Otherwise great video

    • @JGPRSNJ
      @JGPRSNJ 8 months ago +1

      Relative accuracy is much more straight to the point.
      It would also need to be a distance missed per missed call or something because it would be very difficult to compare games on just a flat distance number

    • @stevedomique9278
      @stevedomique9278 8 months ago

      A thousand percent, agree with every other point in the video. Relative accuracy seems like a great stat, it's just misunderstood and underemphasized in the umpire scorecard.

  • @ColumbiaSCRealEstate
    @ColumbiaSCRealEstate 8 months ago

    Total baseball geekdom... I love it!

  • @itsokthen
    @itsokthen 8 months ago

    A change to help show the favor better is to have the impactful calls section show how much each individual call was worth.
    If an ump has a +2 favor but i see one call was +1.7 I would be able to understand it better

  • @user-dg9ki6vo6r
    @user-dg9ki6vo6r 7 months ago

    I really like your suggestions.
    Can we put a "top 5 misses by distance" or "top 3..." if we need to fill out the space with additional relevant things as well? It would be nice to know how much of the total distance missed was from the individual worst calls.

  • @capraagricola
    @capraagricola Před 8 měsíci

    It's actually fairly easy to implement your idea for EUZ -- you can initialize the states of the EUZ to be the exact strike zone and instead of comprehensively solving for the EUZ at the end of the game you can iteratively solve it with each pitch as input.

  • @matthiasm4299
    @matthiasm4299 Před 8 měsíci

    I think support vector machines (SVM) might be used to construct a better strike zone estimate. It should not have the problem of high balls / low strikes biasing the zone, since only the points close / over the boundary are used to construct it. Therefore, consistency should work fine.
    However, it would still have to be somehow combined with the theoretical strike zone for a visual representation that is fair to the umpire.

  • @madaman6556
    @madaman6556 Před 8 měsíci

    Total distance missed would be great. If umpscorecards adds it, I would think they should complement it with average distance missed (total distance missed / # calls missed). Plus something like xDistance Missed where they calculate the average size of zone for MLB players (because of different heights and such) and determine what the missed call distance would be for the average batter. Great video!

  • @simonthegreat527
    @simonthegreat527 Před 8 měsíci +5

    Is it just me, or have the batters purposefully made it harder and harder on these umpires as they have (for the most part) gotten better and better? When I was young, a hitter would never, ever, never, ever take a pitch right on the corner with two strikes while expecting the ump to call a ball. It was called protecting with 2 strikes. Today, hitters rarely seem to be able to protect with 2 strikes and instead use eagle eyes to either milk a walk or hit a mistake. In my short time playing baseball, basketball, football, or any form of competition the coaches always said "Do not let the referees or umpires decide the game." You swing the bat if you have two strikes and the pitch is potentially a strike, don't stand there and get mad that a borderline call goes one way or the other.

    • @1uckedout
      @1uckedout Před 8 měsíci +3

      That's a product of the three true outcome way of playing. They're looking to hit it over the fence or not hit it at all. They'll take borderline pitches and hope for a call to go their way on 2 strikes because they don't want to offer at it if they don't have the potential of extra bases. I think it's a great approach when there's less than two strikes but I hate watching hitters take close pitches on 3 strikes.

    • @CharlesFreck
      @CharlesFreck Před 8 měsíci +2

      @@1uckedout Nailed it. Strike, Walk or Homer (realistically, extra base hit). The idea is that it, on average, forces pitchers to throw more, and thus, make more mistakes, opening up more chances later. The Japanese play more like Simon is talking about, training to foul off anything they don't want to hit. But the problem is, Major League pitchers are the best in the world, and it's significantly harder to foul a ball off. You're risking a miss, ground ball or fly out every time you swing, you just can't expect to beat a pitcher every pitch. So anything that isn't exactly what you're looking for, you leave, and hope they threw a ball.

  • @coreygroh654
    @coreygroh654 Před 8 měsíci

    Love the addition of JoRam's knockout in the intro

  • @tonychen3628
    @tonychen3628 Před 8 měsíci

    😂Intro now includes JRam knocking out TA.
    That should be worth 1 million likes.

  • @matrixphijr
    @matrixphijr Před 8 měsíci +4

    Maybe changing the name of ‘Favor’ to something else would help, because that word definitely implies a purposeful act, which is probably why people associate it with rigging the game.

    • @hb-robo
      @hb-robo Před 8 měsíci

      agreed, something that indicates less intention would be better, maybe "Net Benefit [LAD +04]"

  • @josecarrera6519
    @josecarrera6519 Před 8 měsíci

    actually very informative and will no longer read ump scorecards like a noob, except if the favor goes against my team!

  • @josephalvarez5315
    @josephalvarez5315 Před 8 měsíci +1

    Commenting to boost algorithm. This is a great video

  • @AliceYobby
    @AliceYobby Před 8 měsíci

    Great video, thank you so much!

  • @user-se4rr3rs3m
    @user-se4rr3rs3m Před 8 měsíci

    Great video! Well explained. I enjoy that you do criticize in a respectful and constructive manner.

  • @saccharide
    @saccharide Před 8 měsíci

    There's something called RMS (root mean squared) that can be considered as well. It is a weighted average with the squared - so something way off would be counted more. Mathematically, it's the average of the square of the error and then the average is square rooted to bring it back to the right dimensions
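As a rough sketch of the RMS idea described here, assuming the input is simply a list of miss distances in inches for one game (the function name and sample numbers are invented for illustration):

```python
import math


def rms_miss(distances):
    """Root mean square of miss distances: square, average, then square-root."""
    if not distances:
        return 0.0
    mean_of_squares = sum(d * d for d in distances) / len(distances)
    return math.sqrt(mean_of_squares)


# Two made-up games with the same plain average miss (1.0 inch)...
steady = [1.0, 1.0, 1.0, 1.0]
one_bad_call = [0.5, 0.5, 0.5, 2.5]

# ...but RMS is higher for the game with the one call that was way off.
print(rms_miss(steady))
print(rms_miss(one_bad_call))
```

Because the errors are squared before averaging, the result stays in inches but a single egregious miss pulls it up more than several small ones.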

  • @charlie-wf3bn
    @charlie-wf3bn Před 8 měsíci

    It's so interesting, because the formula they use for overall consistency makes sense and is good statistics, even though it doesn't map onto the tendencies of baseball players well. such a niche bit of statistics for this game.

    • @panner11
      @panner11 Před 8 měsíci

      It's just too volatile for a small sample size like a single game. It maps the tendencies of baseball players well with larger samples.

  • @darthjaxrevan
    @darthjaxrevan Před 8 měsíci

    In reference to the three biggest favor calls, could just add how much each favored a team.

  • @a6340
    @a6340 Před 8 měsíci +1

    Can't sneak that updated intro by us... rip TA7

  • @AggieRinse
    @AggieRinse Před 8 měsíci

    Maybe one could use your distance metric as a scaling factor for the favor resulting from missed calls. I can see the reasoning for releasing the measurements as is, but that could be one idea to refine the overall favor concept.

  • @Dave__AC
    @Dave__AC Před 8 měsíci +1

    Total distance missed is a cool idea but I wonder how much it is affected by the number of calls eg if you make 100 calls and they all miss by 0.1 that's the same as 25 calls that miss by 0.4 even though that game really should be significantly "worse" imo. I guess it would depend on how consistent the number of calls per game is, if it's relatively stable then that's fine but if not then it might make sense to say average distance missed and just add a denominator of the total number of calls.

  • @henryst5
    @henryst5 Před 8 měsíci

    What if they had more data for the EUZ by counting all pitches from that umpire’s last 5 games? Or 10 games, whatever gives enough data points?

  • @BeefPapa
    @BeefPapa Před 8 měsíci +1

    What I do is look for the names Hernandez, Diaz or Bucknor and just laugh my ass off.

  • @crschoop
    @crschoop Před 8 měsíci

    A way to normalize the total distance missed, as stated by others, would be useful. And, again as stated by others, Ump Scorecards does not differentiate between inside and outside pitches. The perception is that umpires will give the outside edge to pitchers and call the inside edge for hitters. They could make a slightly different symbol for missed calls to show if it was on the inside or outside for that at-bat.

  • @kato547
    @kato547 Před 8 měsíci

    Good Video, Good Ideas, Just Good!

  • @davrosthecreator1660
    @davrosthecreator1660 Před 8 měsíci

    9:15 I’ve been saying this for ages. This is the best way to hold terrible umps accountable. Decide what percentage of umps would make the right call based on certain pitches. If the ump makes the right call, award them points based on that percentage. Easy calls get hardly any points, tough calls get more points. And then vice versa for blown calls. If it’s a strike right down the middle called a ball, they are deducted a bunch. But if it dots the corner and it’s called a ball, they won’t be punished as much.
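One possible way to sketch the points scheme from this comment, assuming we already have, for every called pitch, an estimate of the share of umps who would get it right. That estimate, the function name, and the sample values are hypothetical; the comment does not say how the percentage would be derived.

```python
def difficulty_weighted_score(calls):
    """Score a game from (share_of_umps_correct, was_correct) pairs.

    A correct call earns (1 - share): tough calls earn more, easy calls hardly any.
    A blown call loses `share`: blowing an easy call costs a lot, a tough one less.
    """
    score = 0.0
    for share_correct, was_correct in calls:
        if was_correct:
            score += 1.0 - share_correct
        else:
            score -= share_correct
    return score


# Made-up examples:
print(difficulty_weighted_score([(0.99, True)]))    # easy call made: tiny reward
print(difficulty_weighted_score([(0.55, True)]))    # tough call made: bigger reward
print(difficulty_weighted_score([(0.99, False)]))   # pitch down the middle missed: big penalty
print(difficulty_weighted_score([(0.55, False)]))   # corner pitch missed: smaller penalty
```

Under this scheme an umpire who performs exactly like the average ump would expect to score about zero, which makes the sign of the total easy to read.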

  • @MRConvex8
    @MRConvex8 Před 8 měsíci

    Distance Missed is a nice way to present the information contained in Relative Accuracy, but it requires normalization. In your example you conveniently use two games with the same number of pitches. The value of distance missed is lost if we're using it to compare games with vastly different pitch counts.

  • @G.Aaron.Fisher
    @G.Aaron.Fisher Před 8 měsíci

    Honestly, you could combine your "inches missed" idea with the Overall Favor metric to create a new stat measured in inch-runs.

  • @sawmill035
    @sawmill035 Před 8 měsíci +1

    Excellent video, however, I must say that total distance missed is a very bad idea. Let's take an example:
    Game 1: Home team wins 15-13. 400 pitches were thrown in the game, and the umpire had to make 200 calls. He missed 10 calls by an average of 1 inch each, for a total distance missed of 10 inches.
    Game 2: Home team wins 1-0. 200 pitches were thrown in the game with 100 calls made by the ump. That umpire also missed 10 calls by an average of 1 inch each, for a total distance missed of 10 inches.
    You see the problem here?
    The solution is "average distance missed per call". In game 1, it would be 0.05 inches. In game 2, it would be 0.1 inches. So, the umpire in game 1 was actually better, as we expected.
    However, imo this is more confusing than relative accuracy, which very clearly indicates the expected accuracy a normal umpire would have given the pitch distance from the zone. I think a percentage from 0-100% is much easier to read than something like 0.00894 inches/call.
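The arithmetic in the two-game example above can be checked with a few lines of code. The function name is made up; the inputs are the numbers from the comment.

```python
def avg_distance_missed_per_call(total_missed_inches, calls_made):
    """Total distance missed divided by all calls the umpire made."""
    return total_missed_inches / calls_made


# Game 1: 200 calls, 10 misses of about 1 inch each (10 inches total).
game1 = avg_distance_missed_per_call(10 * 1.0, 200)
# Game 2: 100 calls, the same 10 inches missed.
game2 = avg_distance_missed_per_call(10 * 1.0, 100)

print(game1, game2)  # game 1 has the lower per-call figure
```

The totals are identical, but dividing by calls made separates the two games, which is the normalization the comment is asking for.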

  • @Zyrchin
    @Zyrchin Před 8 měsíci

    Look, it's a great video and all but *hot damn* do I love that intro - down goes Anderson 👌🏻

  • @CMCFLYYY
    @CMCFLYYY Před 8 měsíci

    One thing to keep in mind. You rightfully brought up how their "Kernel Density Estimation" can create wonky estimated zones, because the algorithm they're using to do the estimation has issues with small sample sizes. So I would keep that in mind when leaning on Relative Accuracy so hard - if the algorithm for KDE estimations can produce such wonky results, how do we know the algorithm they use for Expected Accuracy isn't similarly flawed and doesn't similarly produce wonky results?
    Honestly I think the best metric to use would be the Total Distance Missed you mentioned in the video.
    What we want to know is...how often did this ump miss calls and by how much did he miss them. That's it. Just looking at Accuracy can be misleading because it doesn't factor in by how much he missed on those misses. And Relative Accuracy (based on whatever algorithm they use) is flawed in the same way if it's all you look at, because it too ignores how little or egregious the misses are. Both treat all misses the same.
    So IMO, Accuracy is good but I think Total Distance Missed is just as if not more important. And break it down by balls and strikes for each team. So you could say the ump missed 4 balls for this team by 2 inches, and 12 strikes by 18 inches. But for the other team he only missed 1 ball by an inch and 2 strikes by 3 inches etc.
    And then look specifically at the worst calls by distance to see if those came in key situations where he could've been favoring one team over the other, instead of using expected runs. Meaning, if an ump missed 18 calls by 12 inches but 10 of those inches came on 2 calls in key 2-out situations where he called obvious balls strike-3s to end innings, that could possibly point to a situation where he was favoring one team over the other.
    Great stuff though.

  • @gator1dl
    @gator1dl Před 8 měsíci

    You're a genius. And you're hired! Go fix this!

  • @christianhall3051
    @christianhall3051 Před 8 měsíci

    I'm curious as to whether the intro was a remix of Labyrinth or if the background from the intro and Labyrinth are from a common source.

    • @BaseballsNotDead
      @BaseballsNotDead  Před 8 měsíci

      I tracked the synth when doing a cover of The Mind Electric (it's the same synth as Labyrinth), really liked the Castlevania vibe it gave, decided to add the Labyrinth drums... then switched up the synth enough so it wouldn't trigger a copyright claim and added a bassline I just made up.

  • @ligomi
    @ligomi Před 8 měsíci

    I thoroughly enjoyed the Naked Gun clip

  • @CYMotorsport
    @CYMotorsport Před 8 měsíci

    Maybe a pipe dream but can you explain to me why baseball has yet to test out leveraging sensors and accelerometers? The NFL uses them for the pylon. Formula 1 uses them but our cars go twice as fast as baseballs. It’s a static plate; I do not understand why you wouldn’t rig up some type of real time hyper accurate true zone monitoring. They already do it with cameras and the sensors are reliable and hidden. They are exponentially more accurate than an ump who can still manage the game with their work load on batted ball plays. They can also be there as back up in case of tech failure early on while they implement.

  • @zachmoney716
    @zachmoney716 Před 8 měsíci

    This is a sick video

  • @prestonk6271
    @prestonk6271 Před 8 měsíci

    Maybe in the “Impactful Calls” section there’s something in parentheses stating how many runs that play accounted for. Ex: (MIA +2.1 R)

  • @joepiazza3756
    @joepiazza3756 Před 8 měsíci

    EUZ is meant to show what a zone usually is called for the ump over a career. It's like an ump scouting report for teams so they know where they can get away with pitching or laying off a swing. So in that first game shown, the calls may all be correct but one that was close but the players expected something else based on his history and thus was inconsistent this game compared to how he normally calls it.

    • @BaseballsNotDead
      @BaseballsNotDead  Před 8 měsíci

      That is not how EUZ works. I explain it fully in the video. If what you're saying was the case, every individual ump would have the same EUZ for each game, which they don't.

  • @freedbygsus
    @freedbygsus Před 8 měsíci

    This is a really great video, but I think you have a significant blind spot: the Strike Zone does not have a static size.
    The top and bottom of the strike zone are defined by the height of 3 points on the batter's body relative to the ground *when the pitch is delivered*. Even before you consider how batters adjust their stance for the pitch delivery, the Strike Zone still varies a good amount from one batter to another. A lot of computerized systems set the zone boundaries based on some percentage of the batter's total height, but two 6' 1" batters can have two differently sized strike zones based on their body proportions. A 6' 1" batter with a taller torso will have a larger strike zone than a 6' 1" batter with a shorter torso and the batter with the shorter torso will have a strike zone that is higher off the ground than the other batter's zone. Now factor in that batters have different stances at pitch delivery and you see even more dramatic variation.
    All of that is to say that umpire consistency should be a measure of the consistency of their *zone accuracy* from one batter to another. An umpire calling an accurate zone for Altuve and Judge in the same game is demonstrating good consistency (with the rules) and should be appreciated more than an umpire who basically gives up on adjusting to certain batters. That kind of consistency matters because that's what makes batters question whether they can rely upon their own sense of their own strike zone at the plate which significantly affects their approach for a PA.

  • @blue17echo
    @blue17echo Před 8 měsíci

    You actually probably want something like the sum of the squares of the missed distance, then maybe normalized to a percentage scale for readability.-- kinda like in linear regression where you seek to minimize the sum of squares of distances.

  • @jacobs7424
    @jacobs7424 Před 8 měsíci

    "Average distance missed" plotted against "catcher framing score" would be the best judge of consistency vs bias.

  • @RipleySawzen
    @RipleySawzen Před 8 měsíci

    This would be downright easy to fix, as a programmer.
    1. No correct call should count against the ump, unless that correct call is locally surrounded by incorrect calls.
    2. If you can draw a box around all of the strikes, and there are no balls inside that box, it's an automatic 100 for consistency.
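Rule 2 above can be sketched in a few lines, assuming each called pitch is given as a horizontal location, a vertical location, and the call. The tuple format, function name, and sample coordinates are assumptions for illustration, and the box here is the smallest axis-aligned rectangle around the called strikes.

```python
def strikes_form_clean_box(pitches):
    """Return True if no called ball falls inside the bounding box of called strikes.

    pitches is a list of (x, z, call) tuples where call is "strike" or "ball".
    """
    strikes = [(x, z) for x, z, call in pitches if call == "strike"]
    balls = [(x, z) for x, z, call in pitches if call == "ball"]
    if not strikes:
        return True  # no strikes called, so there is no box to violate

    min_x = min(x for x, _ in strikes)
    max_x = max(x for x, _ in strikes)
    min_z = min(z for _, z in strikes)
    max_z = max(z for _, z in strikes)

    return not any(min_x <= x <= max_x and min_z <= z <= max_z for x, z in balls)


# Made-up pitches: every ball is outside the rectangle spanned by the strikes.
consistent_game = [(-0.5, 2.0, "strike"), (0.6, 3.2, "strike"), (1.2, 2.5, "ball")]
# Here a ball was called in the middle of the area where strikes were called.
inconsistent_game = consistent_game + [(0.0, 2.6, "ball")]

print(strikes_form_clean_box(consistent_game))
print(strikes_form_clean_box(inconsistent_game))
```

A game that passes this check would get the automatic 100 for consistency; anything else would fall back to whatever scoring is used otherwise.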

  • @jayball820
    @jayball820 Před 8 měsíci

    My biggest issue with the ump score card is that it is fighting against frame rate. Especially since the only reason I believe we should keep a human ump behind the plate is that if we removed them we would remove the benefit that a catcher can provide by framing a pitch well. Correct me if I'm wrong, but there is nothing on the ump score card that gives them a benefit for having a great catcher that has amazing framing skills. Those "missed calls" on the ump shouldn't be counted against them while the catcher is celebrated.
    Besides that problem I personally have with ump scorecards, you brought up a lot of good points I'd never been able to articulate. I always knew when looking at it there was something off, but I could never understand why I had that feeling. As always great video, keep it up!

    • @panner11
      @panner11 Před 8 měsíci

      Doing that would raise a bit of an issue, like a bit of a feedback loop. It's like if you adjusted pitcher ERA based on how good the batter is. But then you adjust the batter's stats to how good the pitcher is. If you don't count the missed call when the catcher fools the ump, then you're not awarding the correct call if the catcher wouldn't have fooled a better ump. Things start feeding back until it normalizes to the mean.
      This type of feedback loop is why we generally don't adjust for these types of things. Just take the flat stats.

  • @Kirk00077
    @Kirk00077 Před 4 měsíci

    If we assume that umpires aren’t actually biased toward or against a particular team (which I think is a bit silly) then one interesting interpretation of favor is that it reflects the difficulty of calling one team’s pitch mix correctly compared to the other: if Aaron Nola starts against Dustin May, I might expect the umpire to “favor” the Phillies.

  • @kongsbeard
    @kongsbeard Před 8 měsíci

    U need Tim Anderson and Jose Ramirez on the intro 😂😂

  • @grife3000
    @grife3000 Před 8 měsíci

    Honestly I have no use for "relative accuracy" or "overall consistency". Just move ball and strike accuracy to the top section, and that leaves room for another couple of "impactful calls", maybe like a top 5?
    And if they just added the "runs favored" stat to each "impactful call" it would show so much more about what you were worrying about -- the one really super impactful call compared to the others. Imagine if it went "1. +1.71 Runs for SEA 2. 0.23 Runs for OAK 3. 0.21" you'd have a much better idea if there was a massive bias or not. And then it would be up to you to interpret that as you will.
    And I second your request to have a season-long strike zone map shown for each umpire. While it would still be subject to the same biases you mention (missing high on 4 seamers intentionally, trying to hit the lower pitches more), at least the sample size could show an ump's general consistency.
    Great video, I love this information age. Can't wait for robo umps to be consistent enough to use on an every pitch basis. I'm dreading the stupid challenge system that will occur first though.

  • @TinoMartinez20
    @TinoMartinez20 Před 8 měsíci +1

    Miss your videos bro

  • @540058
    @540058 Před 8 měsíci

    Total distance missed-->Distance missed per ball
    Amazing video.

  • @zelandakhniteblade5436
    @zelandakhniteblade5436 Před 8 měsíci

    There is a really easy solution to the Overall Favor stat - adjust it by the expected error. So if a bad call was close, the change in OF would be reduced, whereas if it were obvious the change in OF would be increased. This seems obvious.
    On the EUZ I agree completely. Your suggested approach is a good one. If the league wants to stick with the current approach though, it would be very easy to use a weighting to the edge line, so that instead of being drawn halfway between, it would instead be weighted more towards the true zone. That ends up being a compromise between the 2 methods.
    Finally, I rather disagree on the Total Distance metric, at least as a replacement for Relative Accuracy. RA is derived statistically and so is a true measure of performance. TD would depend heavily on the number of called pitches in a game, so I think you would at the very least need to use Average Error Distance or some such to achieve this but even then it would be less useful than RA. The real answer here is for the league to explain RA better and promote it as the primary measure of plate umpire performance. If TV broadcasts regularly promoted the best umpires in their pre-game show by their very high RA scores, that might go a long way to introducing the metric to a wider audience.

  • @olivialambert4124
    @olivialambert4124 Před 8 měsíci +1

    The metric of distance missed should be squared imo. Not only would distance squared be the default norm for statistics but to me at least it makes sense. If he's missing by half an inch that's quite significantly better than missing by an entire 1 inch. Squaring the distance missed accounts for that. I'd also drop the favour entirely and just do a simple ratio of how many calls were correct for side A vs side B. Anyone who wants to know how much it impacted the game can look in depth at the specific calls. Anyone who won't spend the time looking likely won't be using that metric correctly and wants to see a 2% bias for team x as an easier representation of the data.