Ground balls (even though the OF looks silly here I think it makes sense – there really are three dead spots that sorta blend together on the fringes, one behind each defender):
Pop ups (no, I’m not changing the name of the Y-axis):
SLG BY BALL TYPE
I really should merge pop ups with fly balls, but for now just merge them in your mind. Also, if you look really closely you’ll notice that fly balls and line drives extend out to 480 ft in the y direction, while ground balls and pop ups only go to 420. I’m going to re-run everything tonight, extending out to 500 ft (but probably won’t share those, unless someone asks).
So I haven’t posted here in much, much too long. But that doesn’t mean that I haven’t been doing things! Without any additional commentary, here are plots of balls hit by batted ball type (fly ball/ground ball/line drive/pop up) for 2007-2009. Note that to smooth the data, the count shown on the right is the number of balls within a 5 foot radius of that point. Also, I have not done any work yet to clean the data, so there are a few points here and there that don’t make much sense.
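For anyone curious how the smoothed count works, here is a minimal sketch in Python (not the R code behind the plots). The function name and the sample coordinates are made up for illustration; the only thing taken from the post is the 5 foot radius.

```python
import math

def count_within_radius(points, center, radius=5.0):
    """Count batted balls whose landing spot is within `radius` feet of `center`."""
    cx, cy = center
    return sum(1 for (x, y) in points if math.hypot(x - cx, y - cy) <= radius)

# Hypothetical landing spots in feet (x, y), not real Gameday data.
balls = [(100.0, 200.0), (103.0, 204.0), (110.0, 200.0)]

# The first ball is at the center and the second is exactly 5 ft away,
# so both count; the third is 10 ft away and does not.
print(count_within_radius(balls, (100.0, 200.0)))  # → 2
```

Running this for every point on a grid gives the value plotted on the right-hand scale.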
Cliff Lee is dealing tonight: 9 Ks/0 BBs through 7 IP (and maybe more to come soon). No doubt he’s one of the best pitchers in the game at the moment, if not the best. He’s throwing nasty pitches but also mixing them up effectively and hitting his spots. Now I think it’s fair to say that his pitch quality and precision are “talents.” Maybe more debatable are his pitch selection and general location – I’ll contend that those are a strategy, instead. Could he possibly be performing even more effectively if he had Greg Maddux in his ear calling pitches?
Lee is still really good by any measure – now 10 Ks/0 BBs through 8.
This post has been revised on account of being written while highly distracted by baseball and after a few beers. It needs to be expanded upon anyway.
Spent the weekend in the Bay Area visiting soon-to-be family and already-been friends, including your beloved Niv. On the flight out I played around with TTL in order to get the layout down for the time being. Obviously it’s nothing fancy, but it’ll do for now.
Now the fun begins – actually getting some content together. I think I’ve got the basics down for what I need to do, and really the goal is only to get dummy hitter and pitcher pages up in time for Phoenix. I think I’m going to run with Shin-Soo Choo on the hitter side and Max Scherzer on the pitcher side – for no other reason than they’re both pretty fun players.
Off to do that now. Who knows? Maybe there will be another update later tonight if things really work/really don’t.
I’ve been playing around with 2010 Gameday data all night (courtesy of Niv, thanks as always!). Downloaded the R RMySQL plugin to help run the show, but unfortunately the only documentation I can find is this. I’m sure it’s got everything I need… but for some reason it’s all arranged in alphabetical order. So as someone just getting started, I have to guess which command is relevant for where I am in the process (connecting, importing data, running a query, saving output) and go from there. Certainly doable, but not exactly the easiest.
For a bit of a break I figured I’d jump over here to talk about what I’m up to. First step in the process is this – calculating an AVG and SLG for each spot on the field by batted ball type. Using the XY data from Gameday and these park adjustments (I’m starting with the 2008 figures from here – even though I’m using 2008-2010 data – and will eventually tweak them as I see necessary as I become more familiar with the data) I’m coming up with a total list of all balls hit into play and the rate at which the possible outcomes occur for that location on the field.
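As a rough illustration of that first step, here is a minimal Python sketch (not my actual R/MySQL code). It assumes each ball in play is a hypothetical tuple of `(x, y, ball_type, outcome)`, buckets the field into square cells of an assumed size, treats every ball in play as an at bat, and leaves out the park adjustments entirely.

```python
from collections import defaultdict

# Total bases credited for each outcome (the outcome labels are an assumed encoding).
TOTAL_BASES = {"out": 0, "single": 1, "double": 2, "triple": 3, "home_run": 4}

def rates_by_cell(balls, cell_size=10.0):
    """Return AVG and SLG for every (ball_type, cell_x, cell_y) bucket."""
    totals = defaultdict(lambda: [0, 0, 0])  # [balls in play, hits, total bases]
    for x, y, ball_type, outcome in balls:
        key = (ball_type, int(x // cell_size), int(y // cell_size))
        bases = TOTAL_BASES[outcome]
        bucket = totals[key]
        bucket[0] += 1
        bucket[1] += 1 if bases > 0 else 0
        bucket[2] += bases
    return {
        key: {"n": n, "avg": hits / n, "slg": bases / n}
        for key, (n, hits, bases) in totals.items()
    }

# Hypothetical sample, not real Gameday data.
sample = [
    (12.0, 203.0, "GB", "single"),
    (15.0, 207.0, "GB", "out"),
    (18.0, 201.0, "GB", "double"),
    (250.0, 300.0, "FB", "home_run"),
]
rates = rates_by_cell(sample)
```

Here the three ground balls land in the same cell, so that cell has two hits and three total bases on three balls in play.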
Once I get all that in, I’ll have to run a loop to help smooth out the data – for each spot/type, I’ll look at all the balls hit within an X foot radius (weighting balls on the periphery of that area less than ones at the center) and calculate a weighted average of all of the relevant stats.
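A minimal sketch of that smoothing loop in Python, under stated assumptions: the weight falls off linearly from 1 at the center to 0 at the edge of the radius (just one possible choice, since I haven’t settled on a weighting scheme), and each ball carries a single stat value such as total bases. The function name and sample data are hypothetical.

```python
import math

def smoothed_value(balls, center, radius):
    """Distance-weighted average of `value` over balls within `radius` feet of `center`.

    `balls` is an iterable of (x, y, value). Weights fall linearly from 1 at the
    center to 0 at the edge (an assumed kernel). Returns None if no ball is in range.
    """
    cx, cy = center
    weight_sum = 0.0
    value_sum = 0.0
    for x, y, value in balls:
        distance = math.hypot(x - cx, y - cy)
        if distance < radius:
            weight = 1.0 - distance / radius
            weight_sum += weight
            value_sum += weight * value
    return value_sum / weight_sum if weight_sum > 0 else None

# Hypothetical balls: (x, y, total bases).
nearby = [(0.0, 0.0, 1.0), (5.0, 0.0, 0.0), (20.0, 0.0, 4.0)]
slg_here = smoothed_value(nearby, (0.0, 0.0), radius=10.0)
```

With a 10 foot radius the ball at the center gets weight 1, the ball 5 feet away gets weight 0.5, and the ball 20 feet away is ignored, so the smoothed value is 1.0 / 1.5.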
Now there are a million and a half different reasons that this isn’t optimal (data quality, adjustment factors, player positioning, the way different parks play, etc.) but it’s a reasonable start. This is the best data that I have available to me for now, and I think at the very least I can use it to create some fun maps in R (something I’ve been wanting to do for some time now).
Not that I wasn’t motivated to be working on this project before, but now we’ve kicked it up another notch. Niv was in town for the weekend and we had a whole lot of time to talk about where things are going and what it all could lead to. And I’m very much convinced it can lead to big, big things.
So I’ve begun to think about some tables for TTL and as of right now I’m leaning toward showing everything per plate appearance (or at bat/ball in play when appropriate). Now for hitting stats this is how it’s done and it makes perfect sense. However for pitching stats, rate stats tend to be per inning or per 9 innings (ERA, WHIP, K/9, HR/9, etc.). Now it doesn’t make intuitive sense to talk about “earned runs per plate appearance,” but for everything else I don’t know why we still don’t use the same denominators as when looking from a hitter’s perspective.
For instance, this year Carlos Marmol is certainly having an outstanding season, striking batters out at nearly 16 per 9 innings (which would be a record). However, while much more than half of his outs are coming via the K, he has struck out 41.1% of the batters he has faced. This is slightly ahead of the 38.5% posted by Billy Wagner this season and trails the 44.8% posted by Eric Gagne in his 2003 season and the 42.5% posted by Brad Lidge in 2004. No doubt Marmol is having a very good season, but the choice of denominator here is what really counts. Excluding walks and especially hits from the equation doesn’t really make much sense in this case; why should the fact that a ground ball sneaks through the hole rather than become an out make a difference on how successful a pitcher is at striking guys out?
Therefore, what I’ve decided to do is display parallel stats for hitters and pitchers. These will typically be on a per plate appearance/at bat/ball in play basis, although I do want to show one of ERA/FIP/xFIP for pitchers and RC/27 for hitters (again to be parallel). Not only does this make the process easier by keeping the same fields for both groups, to me it makes sense as the way we should be looking at stats. Hitters try to score runs, pitchers try to prevent runs; hitters try to get on base and pitchers try to make outs; hitters try to help their team win and pitchers do the same. Since it’s two sides of the same coin, why do we currently treat them differently in so many cases, even when looking at “advanced” metrics?
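To make the denominator point concrete, here is a small Python sketch using two made-up pitching lines (these are not Marmol’s, Wagner’s, Gagne’s, or Lidge’s actual numbers). Both hypothetical pitchers have the same strikeouts and innings, but one allows more baserunners and therefore faces more batters.

```python
def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings."""
    return 9.0 * strikeouts / innings

def k_rate(strikeouts, batters_faced):
    """Strikeouts as a share of all batters faced."""
    return strikeouts / batters_faced

# Hypothetical lines: (strikeouts, innings pitched, batters faced).
pitcher_a = (16, 9.0, 36)
pitcher_b = (16, 9.0, 45)  # same Ks and innings, nine more batters reached base

for name, (k, ip, bf) in (("A", pitcher_a), ("B", pitcher_b)):
    print(name, round(k_per_9(k, ip), 1), round(k_rate(k, bf), 3))
```

Both lines come out to 16.0 K/9, but the per-batter rate drops from about 44% to about 36% once the extra baserunners are counted.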
There are a whole lot of things that I like about my job. It’s interesting and rewarding work, and normally it’s not too hectic. However, the past 10 days or so have involved a lot of frantic coding so that we can publish one of our main products by October 1. Not that there’s a choice… it has to be out by October 1. And as Niv can attest, I’ve pretty much gone AWOL so that I can finish up everything for our website.
Working at this pace at work has really killed all desire to come home and write even more code, even if this code is much more fun. However, I did just now sit down and get True Talent Level looking a bit more presentable. Not that there’s any content there, but at least there’s a link over to everything that we’ve got going on over here (including Niv’s post, which was well worth the wait!).
I’ve been doing a few things behind the scenes, but really hope to step things up if possible this week. I’m seeing Niv the next two weekends in a row, and hopefully there will be a great deal of discussion about our respective projects and where we need to go next with them.
Remember this little guy? It only took two additional weeks but I can officially check #1 off the list! And by me checking it off, I mean Niv really checked it off by doing everything necessary to get it set up.
It’s obviously not much as of yet – it’s only been around for a few hours – but I’m going to start to play around with it a bit and work towards making some progress on goal #2. Obviously really digging into the data is the main priority before November, but I just want to do some of the web stuff up front so that I can add to it as I progress with the real work.
luck (noun) a: a force that brings good fortune or adversity b: the events or circumstances that operate for or against an individual
Chad and Niv brought up some points in the comments to this post, and then I discussed things a bit further with them (in person in the case of Chad!). The biggest problem with that post is that I just basically categorized events as being lucky/unlucky without any real definition of the term. Well, I guess I was using the above definition, but there is way too much room to interpret how that may apply to baseball. In addition, there is a big distinction to be made between luck on an individual play and luck over the course of an entire game – I was confusing the two a bit in that post, and will try to be a bit more distinct here.
Let us take a step back first and think about a game which is much simpler than baseball: darts. Let’s say you hit a bullseye; if you were to recreate the same exact physiological motion from the same exact location (location meaning physical structure and placement within that structure) and environment (same noise level/people moving around you) with the same exact grip that you used to hit that bullseye, you would with 100% certainty hit it again. You only depend on your own talents for an individual throw, so there is no luck involved whatsoever. Because the game itself is practically unchanging (you might have to aim for a 17 instead of the bullseye, but you are still the same distance away and using the same darts) and is turn-based (as opposed to interactive between the various players), you aren’t really competing against someone else as you play but rather you are competing against yourself relative to someone else competing against themselves (assuming you are not playing with teams). Your own performance can be neither lucky nor unlucky; rather, you as an individual reap what you sow. However, because you yourself have no control over the performance of your opponent, the final outcome of the game may be lucky or unlucky based on how they perform relative to what is expected of them: if they hit numbers more quickly than they are expected to then you are unlucky, and if they hit numbers less quickly than they are expected to then you are lucky.
A game that adds an additional layer of complexity to darts is pool: it is turn-based, played in a climate-controlled environment, and uses a consistent set of equipment. However, unlike in darts the placement of the balls creates an infinite set of possible situations that one can face. A player can repeat the same motion given the same situation and get the same result as in darts – therefore there is still no luck involved on an individual shot – but unlike darts they must be prepared to deal with any number of situations. Since these situations can range in difficulty from simple to impossible, it is possible for one player to face a much easier set of shots than the other. However, since leaving your opponent difficult shots is a skill in itself, this must be taken into account when trying to determine the role of luck. Therefore, one is lucky over an entire game of pool if they face easier shots than expected given opponent quality and/or have more opportunities at shots given opponent quality (due to unexpected misses by the opposition).
Golf seems to be a mixture of these two games to me – while the required shots change significantly over the course of a hole, each player is responsible for where their ball lies and plays no role in the shots of their opponents. However, weather is brought into play for golf – an unexpected gust of wind blowing a shot off the green, for instance – and this means there is the potential for a slight bit of luck on an individual shot (I would say this is slightly more common than someone dropping a glass while you are about to take your shot in pool, but overall it is pretty similar).
However, over the course of an entire tournament golf is a bit different from the other two games because: 1) there are many more holes of golf in a tournament, reducing the volatility of the performance of each individual competitor; 2) there are many more competitors playing, reducing the volatility of the performance of all competitors in aggregate; and 3) the weather can change, potentially creating different conditions for the various competitors. Points 1 and 2 lead to the conclusion that, for the most part, you can have a pretty good idea about how the tournament scores will be distributed before it even starts. However, these may be affected by point 3. Also, while the overall distribution of scores is what matters for a tournament, when it comes to winning it is those scores on the end of the distribution that matter, and these will still be a bit volatile. Nevertheless, one can be considered to be lucky over an entire golf tournament if their competitors have performed worse than expected given the weather which they have faced and/or if the competitor has played in more favorable weather conditions than the rest of the competitors overall.
Finally, baseball. Let’s skip over the whole pitch selection part of the equation (which deserves a whole different post in itself) and just deal with balls in play. If you throw the dart/hit the pool ball/hit the golf ball where you want to throw/hit/hit it, then on an individual basis there is very little luck involved (except for dropped glasses/wind gusts). In baseball, however, there is no such guarantee. If you hit the ball exactly where you want to hit it in play, there is some non-zero probability it will turn into an out. And due to the nature of the game, it is much more difficult to hit a baseball exactly where you want it than to throw a dart/hit a pool ball/hit a golf ball exactly where you want it. That means that, on an individual play in baseball, there is some luck involved. A batter is lucky on a single play if the value of the outcome of the play exceeds the expected value of that ball in play. As samples increase in size this luck should even out somewhat – over a career there should be very little luck, but over the course of a game it certainly still plays a role in the outcome.
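The single-play definition can be written down directly. This is a minimal Python sketch, assuming “value” is measured in total bases (a run-value scale would work the same way) and using made-up outcome probabilities for one hypothetical ball in play.

```python
def expected_value(outcome_probs, outcome_values):
    """Probability-weighted average value of a ball in play."""
    return sum(prob * outcome_values[outcome] for outcome, prob in outcome_probs.items())

def luck_on_play(actual_outcome, outcome_probs, outcome_values):
    """Positive if the batter got more than expected, negative if less."""
    return outcome_values[actual_outcome] - expected_value(outcome_probs, outcome_values)

# Assumed value scale (total bases) and hypothetical probabilities for one location/type.
values = {"out": 0, "single": 1, "double": 2}
probs = {"out": 0.7, "single": 0.2, "double": 0.1}

ev = expected_value(probs, values)                   # 0.2 * 1 + 0.1 * 2, about 0.4
lucky_single = luck_on_play("single", probs, values)  # about +0.6
unlucky_out = luck_on_play("out", probs, values)      # about -0.4
```

Weighted by the probabilities, these differences sum to zero, which is the sense in which luck on individual plays should even out as the sample grows.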
Note: I extracted some of this post in its original form for this so just imagine this as all being a prequel to that post.