Dan Vacanti
Transcript
[00:00:07]
Hello, ladies and gentlemen. Hello, my friends. Hello everyone. Thank you for being here. And that's about it; that's all the French you're going to get out of me today.
[00:00:23]
Thank you very much for the kind introduction. This session is very challenging for me for, I was going to say, one reason, but I've just come up with two more, so for three reasons. Number one, first and foremost, this is the session right after lunch. I saw a lot of you had wine during lunch, so I'm hoping I can keep you all awake for the next hour or so; we'll see how that goes. Number two, I just got introduced by somebody who does statistics for a living, and we are going to be talking a lot about data, so I'm almost certainly going to get everything wrong. I'm sure Fabrice is going to correct me after the session is over; he's taking notes right now. Last, and potentially most important: I hate to break it to you, but we're going to be talking a lot about basketball today. So hopefully all of you know a little something about basketball. If not, maybe you will by the end of the session. To start off, though, I want to say a special thanks to Dr. Donald Wheeler. If you don't know who Donald Wheeler is, please make sure to go look him up; I've got a link to his website at the end of the presentation. I've talked to Dr. Wheeler several times about this presentation, and I even stole the title of this talk from one of his papers. He was gracious enough to let me actually steal it, so that's not my idea, that's his. Special thanks to Dr. Wheeler. More about Wheeler as we go through this.
[00:02:02]
Does anybody know the American author Malcolm Gladwell? Anybody? Yeah. I'm just going to come clean upfront: I personally am not a big fan of Malcolm Gladwell, if you've read him. I'm not saying he's a bad person. I'm not saying the work he does is terrible. I'm not even saying he's wrong, although we will talk about that as well. He's just never done anything for me. But you may or may not know that Malcolm Gladwell does a podcast called Revisionist History. I don't listen to the podcast. So a friend of mine called me up one day and said, hey Dan, there's this great episode on Malcolm Gladwell's podcast called The Big Man Can't Shoot. It was published in June 2016. You can imagine how dubious I was when my friend said, oh, you absolutely have to go listen to this podcast. But I gave it a shot: Revisionist History, The Big Man Can't Shoot. It is about, supposedly (these are Malcolm Gladwell's words, not mine), the greatest basketball game ever played in America. I'm sure there have been great basketball games played over here. But in America, the greatest basketball game ever played happened in a place called Hershey, Pennsylvania, in March of 1962. I don't know if anybody is familiar with Hershey, Pennsylvania. They're known for chocolate. I'm not even sure I would call what comes out of Hershey chocolate, though. Has anyone ever had Hershey's? Especially compared to what you get over here, Hershey's is terrible; it's just awful. But in Hershey, Pennsylvania, in March of 1962, a gentleman by the name of Wilt Chamberlain scored 100 points in a single game. It's the highest scoring game in NBA history. It never happened before, and it hasn't happened since. A couple of people have come close-ish in the 80s.
[00:03:58]
100 points. Wilt Chamberlain scored 100 points in this game, playing for the Philadelphia Warriors; it's right there on the slide. A couple of things about Wilt in 1962, if you haven't heard the name Wilt Chamberlain. At the time, he was considered the greatest basketball player in the league. He was the dominant player, the player that teams hated to play. He had won several awards up to this point. I think he'd won a Rookie of the Year award, if rookie means anything to you; I don't know what the French translation of rookie is, I'll have to go look that up. It means his first year playing. I think he'd won a couple of MVP awards too. Anyway, he was the dominant player in the league. Teams hated playing Wilt Chamberlain. But there was one thing against him: he was renowned for being an absolutely terrible free throw shooter. He was just not good at shooting free throws, at all. However, in March of 1962 in Hershey, Pennsylvania, according to Malcolm Gladwell (again, his words, not mine), Wilt Chamberlain is a great free throw shooter. He actually says he's an incredible free throw shooter. His career free throw percentage is 51%. This is the evidence, by the way, the evidence Gladwell gives us: Wilt Chamberlain's career free throw percentage is 51%, and on that night, on the 100-point game night, he shoots 87.5%.
[00:05:37]
Somebody, I've got a note here, and I hope I will remember, but if I don't, somebody remind me: I need to come back and talk about why this comparison is just a little bit dubious. But career free throw percentage 51, and that night he shoots 87.5. Like any good story, there's a twist. Right? Anybody who's watched professional basketball knows that the vast majority of players, when they shoot free throws, shoot using what's known as an overhand motion, like what is seen in this picture here. Wilt Chamberlain, trying to shake things up, trying something different to get his free throw percentage higher, switches in 1962 to what's known as an underhand motion for shooting his free throws. In America, we use the pejorative: this is a "granny shot". I don't know if anyone's heard the term granny shot, which I think is offensive to grannies everywhere, but that's what they call it. So in 1962 he's not shooting overhand anymore, he's shooting underhand. And Gladwell says it's that switch that allows Wilt Chamberlain to shoot his 87.5% on that night. He says Wilt Chamberlain switches to a better free throw shooting technique, and it pays off in the greatest basketball game ever played.
[00:07:01]
But were free throws really the reason that Wilt Chamberlain did so well that night?
[00:07:07]
That's what we're going to have to talk about. Before we can make that judgment of whether free throws were responsible or not, we have to look at the data. Of course we have to go look at the data. This is what's known as a run chart; we don't necessarily need to know that. What I've done here, sequentially, with time progressing from left to right, is plot how many points Wilt Chamberlain scored in every single game that season. Now it's pretty obvious, hopefully just at a glance, what happened on that March night. That dot is way out there. That's pretty obvious to see. However, just to be clear, that graph was his points per game. This graph is showing his free throw percentage per game. Same thing, games sequentially from left to right, but now what we're plotting is his free throw percentage. I would argue it's a little bit harder to see that March 2nd night here. But when I first did this and looked at this chart, it wasn't the dot I've got highlighted right now that drew my attention. Anybody have any guess which dot Dan really zeroed in on? Which one made me go, wait, what about this one? Anybody? Shout it out.
[00:08:43]
The lowest one. Yeah. If you look, just four games prior to his 100-point game, Wilt Chamberlain shoots somewhere around 30%. Less than a third of his free throws go in. Now, I went and tried to do research on this particular game. If you go look up Wilt's 100-point game, there are all kinds of articles, all kinds of interviews; I think there have even been books written about that 100-point game. When you go try to find information about this game on February 24th: nothing, crickets.
[00:09:19]
In fact, I think the only information I found was on some really obscure NBA statistics website, which is where I got this data. There's absolutely no information, no story, about this point here. Which brings us to what I was going to call the main thing we need to talk about, though that comes a little bit later. Before we can dig into what's really going on here, we need to take a step back and talk about data analysis for a second. I'm going to try to do this at a fairly high level. I was going to promise there's no math involved, but there might be; maybe, maybe not. Hopefully we'll just do pretty pictures and keep it at that. But before we talk about data analysis, we need to talk about data itself. The first principle of data, period, of any data you're collecting, especially measurement data, although this really applies to any data, is that all data have noise. I don't care what data set you're looking at: all data have noise. Some data might have signals. Right? All data have noise; some data might have signals. So the primary purpose of data analysis, this is why we talk about data analysis at all, is to be able to separate signal from noise.
[00:10:47]
If you cannot separate signal from noise, you are liable to make one or both of two mistakes. The first mistake is to mistake noise for signal. If you're looking at a data set and you're not able to separate signal from noise, you might be looking at some very noisy data and saying, hey, that's signal. Likewise, you can probably guess the second mistake: you might mistake signal for noise. So if you don't have a way to separate noise from signal, you might be making one or both of these mistakes. Which is why the trick is this: we need to figure out a way to separate signal from noise in our data set, if there is signal, to separate possible signal from probable noise, in a way that minimizes the economic impact of making one or both of those mistakes. You can't eliminate the possibility of making those mistakes, but we can probably minimize the impact. Right? That's the background. Oh, by the way, I should mention: hopefully there will be time for questions at the end, but if anybody has a question as I'm going through this, let's keep it very informal; I'd like to get you involved. Shout out questions, raise your hand, throw something at me, whatever you need to do. If anything doesn't make sense, by all means stop me and we can talk about it. But like I said, there will also be an opportunity at the end, and I'd better not be talking for the whole 60 minutes or we're both going to be in trouble. Okay, enter this gentleman. By picture alone, does anybody know who this is? Has anyone seen this picture before?
[00:12:39]
I think I'm hearing the name. This is not who this is, but if I were to say the name W. Edwards Deming: has everybody heard of the name Deming? Yeah.
[00:12:52]
This is Deming's mentor.
[00:12:55]
This is the person who taught Deming everything there is to know about quality, about statistical process control and things like that. This gentleman's name is Walter Shewhart.
[00:13:05]
And I bring up Walter Shewhart because nothing I'm going to tell you right now is new. I am regurgitating work from literally 100 years ago; I think Walter Shewhart started his work on data analysis 100 years ago. For whatever reason, in the Agile community (the Lean community is a little bit better), nobody really talks about Shewhart. But if you read Deming, Deming talks about Shewhart all the time. There are people who call the PDSA cycle the Deming cycle, and Deming was always like, no, that's the Shewhart cycle, right? Deming would talk about Shewhart all the time.
[00:13:40]
So what I'm going to be presenting here is Shewhart's approach, because he was the one who studied the problems I was just talking about in terms of data analysis. He's the one who really pioneered all of this work. And what Shewhart said is: if you have a data set like this, the first thing we need to do, if we can, which we can, is draw some boundaries around that data, such that it gives us a clear idea of what is signal versus what is noise. Shewhart would say anything outside of those boundaries is possible signal. He called that assignable cause. By the way, assignable cause, special cause, exceptional variation: all of those really mean the same thing. Anything within those boundaries, he would say, is just noise in the data. He called that chance cause, I think, but you can think of it as routine variation; Deming called it common cause. I have to keep all these terms straight in my head. But that's just noise in the data. If it's outside the boundaries, it's potential signal; anything within those boundaries is just noise.
[00:14:58]
You're going to have to trust me; like I said, I'm not going to get into the math of how these boundaries are drawn. If anybody wants to get into that math, by all means go read Dr. Wheeler's stuff. You're just going to have to trust me that all the charts I've drawn here follow Shewhart's approach. We'll maybe talk about Lean Six Sigma a little later; we're not following their approach, we're following Shewhart's approach specifically. Speaking of Lean Six Sigma, most of you might know a chart like this as a control chart. Maybe you've heard it called a control chart before. That's because Shewhart's method has been perverted over the years.
[00:15:39]
Usurped, maybe, by movements like Lean Six Sigma. Wheeler calls these things process behavior charts. So if you go read Dr. Wheeler and you see something called a PBC, or a process behavior chart, this is what it is, okay? Technically it's the same thing as a control chart; I would argue it's what a control chart more correctly should be. But just know that the language for it now is process behavior chart. And the idea is simple, like I just said.
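[Editor's note] If you do want to peek under the hood, the arithmetic behind those boundaries is short. Below is a minimal Python sketch of Wheeler's XmR ("individuals and moving range") calculation; the per-game percentages are invented for illustration, and 2.66 is the standard XmR scaling factor.

```python
def natural_process_limits(data):
    """Lower and upper natural process limits for a time-ordered
    series of individual values, via the average moving range."""
    centre = sum(data) / len(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / 1.128, the standard XmR scaling factor
    return centre - 2.66 * avg_mr, centre, centre + 2.66 * avg_mr

def possible_signals(data):
    """Points outside the limits: possible signal (assignable cause).
    Everything inside the limits is treated as routine noise."""
    lower, _, upper = natural_process_limits(data)
    return [(i, x) for i, x in enumerate(data) if x < lower or x > upper]

# Invented free throw percentages per game, with one extreme night:
pcts = [50, 52, 48, 55, 51, 47, 53, 100, 49, 51]
print(natural_process_limits(pcts))
print(possible_signals(pcts))  # only the one extreme game stands out
```

Wheeler also describes a variant that uses the median moving range instead of the average, which is more robust when an extreme value inflates the limits.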
[00:16:09]
If you're able to create one of these charts, then any time you have potential signal, that's a pretty clear indication that your process is unpredictable. And by unpredictable, we mean that the past data you've got will not necessarily be a good predictor of the future you're trying to forecast. Right? Shewhart would say: if you've got one of those signals, you're unpredictable, and therefore what you should do is go figure out why that dot, or those dots, are outside the limits, and fix that first, because any other process improvement you might want to do while those assignable causes are going on is premature.
[00:16:55]
Right?
[00:16:57]
Again, shout out if you have anything. If you've controlled all of your assignable cause variation, then what you're left with, like I said, is just noise in the data. If you want to improve your system at that point, that's going to require some fundamental systemic change. Right? There's no one thing you can go look at and say, oh, we have to go fix this. Now you actually have to go change the process; you have to go change the system. Like Wilt Chamberlain: we have to change from shooting overhand to shooting underhand, right? That's pretty much what you have to do. All right.
[00:17:32]
Okay. This is a cycle time run chart from a Scrum team that was doing two-week sprints. And by the way, this is not a commentary on Scrum. I'm not saying Scrum is bad; I'm not saying don't do Scrum or your data will look like this. I'm not saying that at all. I'm saying this was a team that we went and collected a bunch of data from, and this was their cycle time. If you were to draw those, I'll call them the natural process limits, for this team, that top line is at 65 days. The bottom line is zero, and that's going to happen a lot, by the way, when we talk about cycle time data: it's zero-bounded data, so that lower line will almost always be zero. 65 days. A Scrum team doing two-week sprints, and that top line is 65 days. Is this process predictable?
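[Editor's note] To make the "lower line at zero" point concrete, here is a rough sketch of the same limit arithmetic applied to cycle times (the numbers are invented, not this team's data), with the lower limit clamped at zero:

```python
def cycle_time_limits(cycle_times_days):
    """Natural process limits for cycle time data. Cycle times can't be
    negative, so a negative computed lower limit is clamped to zero."""
    centre = sum(cycle_times_days) / len(cycle_times_days)
    mrs = [abs(b - a) for a, b in zip(cycle_times_days, cycle_times_days[1:])]
    spread = 2.66 * (sum(mrs) / len(mrs))  # standard XmR scaling factor
    return max(0.0, centre - spread), centre, centre + spread

# Invented cycle times (in days) for a team with a wide spread:
times = [3, 12, 1, 25, 7, 40, 2, 18, 9, 30]
lower, centre, upper = cycle_time_limits(times)
print(lower, centre, upper)  # the lower limit comes out clamped at 0
```

With spreads like these, the raw lower limit is negative, so the bottom line lands at zero, which is exactly the pattern the talk describes for cycle time charts.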
[00:18:27]
Somebody go out on a limb. Anyone? I heard a yes.
[00:18:34]
Anybody want to vote no?
[00:18:37]
No.
[00:18:39]
Should we do a raise of hands?
[00:18:42]
Is everybody sleeping already? Let's just do a show of hands: who says this process is predictable? Yes? Looks like a third, maybe a quarter. Who says it is not predictable? The vast majority of people say this is not predictable. Believe it or not, this process is predictable.
[00:19:01]
According to Shewhart's rules, according to Shewhart's definition of predictable, this process is predictable. This data is showing us that this team is operating exactly the process that they've designed. Right? Now, this might not be very useful, but this process is predictable. When this team is operating, we have to expect that anything within 65 days is to be expected. Right? If they want to make things better, they're going to have to make a fundamental change to the process. Okay. What does this mean in terms of data analysis? Enough looking at data; well, we need to look at data a little bit more. What does this mean in terms of data analysis? What it really means is that you're going to have to throw out a lot of advice that maybe you've been given when you're looking at your data. You go out and read a whole bunch of stuff, and maybe you've read some of the things we're about to go over; you might want to question them a little bit. The first thing I want to talk about is this word "outliers". Interestingly enough, Malcolm Gladwell also wrote a book called Outliers. I haven't read it, but I don't think the outliers he's talking about are what I'm going to be talking about. I think he's talking about outliers that are people; I'm talking about outliers that are data, I think. Somebody correct me on that, I don't know. So we have to ask: what is an outlier? If we go back to Wilt's free throw percentage: remember, Mr. Gladwell wanted to call that dot an outlier. If we draw natural process limits around the data, you'll see that that point is not an outlier. But that one maybe is.
[00:20:50]
Right. You also may have heard: hey, whenever you get extreme outliers in your data, one of the first things you want to do is remove those extreme outliers, because they're statistically insignificant. That may be true, but it probably isn't. If you know for sure it is bad data, then yes, absolutely remove it. But from Shewhart's perspective, those extreme outliers are where all the signal is. There is a reason you got that dot, again, assuming it's good data. So instead of just ignoring it, instead of blindly removing it, what we really need to be doing is asking: what caused that thing? What's causing this quote-unquote outlier? Because again, if we draw those lines, by definition that one is assignable cause. So first, with anybody who talks to you about outliers, question what they mean by outliers. Secondly, if they tell you, ah, that's an outlier, just remove all outliers: that should make your skin crawl a little bit and cause you to ask some questions. This next one is the most controversial: trends. What is a trend? If I say the word trend and put this chart up again (this is Wilt Chamberlain's free throw percentage per game), and you look at this data, what comes to mind? Do you see any trends in this data?
[00:22:25]
Kind of. Does anybody kind of see a trend there? Doesn't it look like the middle part of the data is kind of trending upward? Maybe, maybe not. But again, if we were to draw those boundaries around the data, notice all of that is well within the noise. A pattern like that is actually to be expected. There's not necessarily a trend there. Right? That's all noise. This brings us to the title of the talk. This is teen tobacco use percentage. If I were to ask you: is there a trend in this data?
[00:23:05]
You're gun-shy now, so you're probably not going to say anything. But most people would say: yeah, doesn't it look like, in general, over time, teen tobacco use is going down? It looks like it's trending down, right? If we draw those boundaries again, you notice that all of that is well within the noise.
[00:23:24]
If we add in 2018, by the way, this is data to 2017, if we add in the dot for 2018, would we say that use is trending back up?
[00:23:36]
Because that's exactly what USA Today said. They said, hey, look, teen tobacco use is trending up. Is that what it really means?
[00:23:47]
Moral of the story: when we're talking about noise, for the most part there really is no such thing as a trend in noise. I haven't come up with a really great way of explaining this, so hopefully this makes sense, but the best way I've come up with so far is this: imagine you had a fair six-sided die.
[00:24:07]
And imagine you started rolling that die and the first roll was six and the second roll was five, and the third roll was four, and the fourth roll was three. Would you say that your die rolls are trending downward?
[00:24:22]
I would say no; all those outcomes are to be expected. Just like that, this data is telling us all of these outcomes are to be expected. There's really no trend there, for the most part. We'll see if you agree or disagree with me maybe a little later. This next one is just for fun. Remember we talked about comparing a single point to an average? This is the thing you're probably going to see most. Don't get me started about Scrum teams and velocity and comparing a single sprint to long-term velocity, do not get me started, but hopefully some of this will resonate. Comparing a single point to an average: USA Today, remember, when it said teen use trends upward, gave as evidence: well, you know what, the long-term average was 16.2%.
[00:25:07]
The 2018 value was 18.3%. That means things are getting worse.
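[Editor's note] The die analogy from a moment ago is easy to check with a quick simulation (an illustration with made-up parameters, not anything from the talk): how often does a run of four strictly decreasing rolls, a seeming downward "trend", appear in a session of pure noise?

```python
import random

def has_descending_run(rolls, length=4):
    """True if the sequence contains `length` strictly decreasing rolls."""
    for i in range(len(rolls) - length + 1):
        window = rolls[i:i + length]
        if all(a > b for a, b in zip(window, window[1:])):
            return True
    return False

random.seed(0)  # fixed seed so the sketch is repeatable
trials = 10_000
hits = sum(
    has_descending_run([random.randint(1, 6) for _ in range(50)])
    for _ in range(trials)
)
print(f"{hits / trials:.0%} of 50-roll sessions show a 4-roll 'downtrend'")
```

In runs like this it comes out around four sessions in ten: a "trend" of that length is exactly what noise looks like.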
[00:25:15]
Why do you think it's so dangerous to compare a single value to an average? Why is that dangerous? Anybody have any ideas?
[00:25:22]
There are actually two reasons, maybe more, but at least two.
[00:25:27]
Let me ask an easier question. If you're comparing a single point to an average, what are your chances of being right?
[00:25:38]
What's that? Sorry. The two values are... oh, the two values are quite close. Yeah, we're going to get to that one in a second; that's the second point I want to make. But if I'm comparing a single point, if I say, hey, my long-term average is this, and the next point I get is going to be above that average, what are my chances of being right? About 50%. It's not necessarily exactly 50%, we'd have to get into statistics and distributions and things like that, but it's probably pretty close. I've got a 50% chance of being right, a 50% chance of being wrong. Big deal. Right? What have I really learned there? But more importantly: how close is it to the average? Just because it's above or below the average, it's probably, as Shewhart would say, almost certainly within the noise. So that forecast gives us almost no information whatsoever. I live in Miami. You might be wondering: Dan, if you live in Miami, what are you doing in Paris in October? To which I answer: it's Paris, and that's Miami. I don't know if anyone's ever been to Miami. But the real reason is that it's the middle of hurricane season right now in Florida, and I like to run away whenever there are hurricanes; I like to get out of the state as quickly as possible. This is a headline from, I believe, Forbes. I might have the article here. Forbes, yep.
[00:26:59]
At the beginning of hurricane season, they said experts predict an above-average 2022 hurricane season.
[00:27:06]
What does that tell you?
[00:27:09]
Okay. What if I told you the long-term average of hurricanes, for all of the North Atlantic, let's not just talk about Florida, is 14 storms? So they're saying it's going to be above 14. What does that tell you?
[00:27:26]
Nothing. Almost nothing.
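[Editor's note] The "about 50%" claim is also easy to sanity-check with a simulation (invented numbers, roughly in the hurricane range): for a stable, noisy process, how often does the next value land above the long-run average?

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable
trials = 10_000
above = 0
for _ in range(trials):
    # 30 years of history from a stable, noisy process (e.g. storms/year)
    history = [random.gauss(14, 3) for _ in range(30)]
    average = sum(history) / len(history)
    # Does the next year's value land above the long-run average?
    if random.gauss(14, 3) > average:
        above += 1
print(f"above the long-run average {above / trials:.0%} of the time")
```

It comes out within a whisker of 50%, so "next year will be above average" is close to a coin flip, and, as the talk notes, an above-average value is almost always still inside the noise.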
[00:27:30]
By the way, do you think they're right or wrong this year?
[00:27:34]
Anybody know how many storms there have been so far? You'd really have to be following this. Anybody want to guess?
[00:27:42]
Not three.
[00:27:45]
That's a good guess. It's more than three.
[00:27:48]
It's actually 11. Eleven, and hurricane season wraps up in a few weeks. So they're probably going to be wrong on this. They had a 50% chance of being right, and they're going to be wrong. And not only that, but the forecast they gave was well within the noise anyway. Where have we seen this false comparison before?
[00:28:08]
Somebody was supposed to remember this and remind me: where have we seen this before, comparing a single point to a long-term average? Yeah. Remember, Mr. Gladwell said: well, look, his long-term average is 51% and he shot 87.5%. If we go look at the data, and I don't have that chart handy, remember that that's within the noise; not well within the noise, but it is within the noise. Just because he shoots really, really well that night is not necessarily evidence that free throw shooting was the reason he played such a great game. Histograms: probably my biggest pet peeve when it comes to data analysis in general, especially time series data analysis. Let's just say I used to have a full head of hair, until people started talking about histograms. This is where most of the egregious mistakes are made. This is a cycle time histogram for a different Scrum team. Again, I know it feels like I'm picking on Scrum, but I promise I'm not. Anybody who's studied cycle time might look at this and say: oh, look, this is what it's supposed to be. This is supposed to be a lognormal distribution, or a Weibull, or however the heck you pronounce that name, or whatever it is. Look, that's what it's supposed to look like, right? This is cycle time data.
[00:29:25]
If we throw this into the kind of chart that Shewhart would expect, we get something more like this. What do you see? This, by the way, is now cycle time over time. I apologize, these are American dates across the bottom... oh no, sorry, no, they're European dates; I did change them for you. So that is October 1st, and that is January 1st, although that one's the same as in American.
[00:29:55]
What do you see different about this chart versus that one?
[00:30:02]
That one? That one? It's like an eye test: number one, number two, number one. Anybody shout out some things that you see. What's that? The divide in here?
[00:30:15]
Yeah, yeah. To me, it looks like something happened right about October 1st.
[00:30:21]
If I were to say Dan came in and did Kanban training for this team on October 1st, you might be like, ah, yeah. But if I were to go back to this chart and say: hey, when did Dan come and do Kanban training for this team?
[00:30:38]
Your argument might be: well, you know what, Dan, we can still draw boundaries around a histogram, just like we can draw boundaries around this chart. What's the problem with that approach?
[00:30:49]
Do you think the natural process limits we would draw for this section of the data should be the same as the natural process limits you would draw for that section of the data? Probably not. There was a fundamental change in the process, and from the histogram you cannot tell where that fundamental change happened.
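[Editor's note] The histogram pitfall can be shown in miniature. The two invented series below contain exactly the same values, so their histograms are identical, but one is stable noise and the other has a clear shift halfway through; only the time-ordered view can tell them apart.

```python
# Two invented series with exactly the same values, hence identical
# histograms, but very different behaviour over time.
stable = [5, 30, 8, 25, 12, 20, 7, 28, 10, 22]    # noise throughout
stepped = [25, 30, 28, 22, 20, 5, 8, 12, 7, 10]   # big values first, then a drop

assert sorted(stable) == sorted(stepped)  # histograms can't tell them apart

def half_averages(series):
    """Average of the first and second half: a crude step-change check
    that needs the time ordering a histogram throws away."""
    mid = len(series) // 2
    return (sum(series[:mid]) / mid,
            sum(series[mid:]) / (len(series) - mid))

print(half_averages(stable))   # halves look alike: just noise
print(half_averages(stepped))  # second half collapses: a process change
```

A process behavior chart built on the time-ordered data would show the same thing; the histogram, by construction, cannot.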
[00:31:11]
Histograms. For extra credit, and you've already kind of seen my disdain for this: just never fit a probability distribution to your data. A probability distribution did not generate your data; your process generated your data. It wasn't some fancy mathematical thing out there, it was your process. At best your data is an approximation of one of those distributions, but it's probably not even that. More importantly, it doesn't even matter, because you don't need to know any of that stuff to do any of this type of analysis. Lastly, don't forget your get-out-of-jail-free card. How are we doing on time? I'm going through this very quickly. And that card is: data have no meaning apart from their context.
[00:31:57]
I don't care what data analysis technique you're using. I don't care who's told you what. You understand your context, where your data was collected. You understand that. And so again, if you know there's a reason why that dot happened, you know whether that's telling you a signal or not. If you know it's signal, then it's signal. If you know it's not signal, then it's not signal. Right? Data have no meaning apart from their context.
[00:32:21]
So let's get back to Wilt.
[00:32:26]
Hopefully everybody can understand where I'm going with this. In that '62 season, yes, there is possible signal there. If we look at the points per game, there's possible signal there. Something did happen that caused that 100-point game. Something exceptional did happen. That it was caused by his free throw shooting technique is dubious.
[00:32:52]
Because what the data might suggest is that underhand free throw shooting actually made him worse. I think that's potentially a better conclusion to draw than that he got better using the underhand free throw. Does anybody know what Wilt Chamberlain did in the 1963 season? Did he keep shooting underhand? Anybody happen to know this story? Anybody want to guess? He did not. Was it because he went and looked at this data? Anybody want to know why Wilt Chamberlain switched back to overhand?
[00:33:29]
Any guesses?
[00:33:31]
No, they didn't change the rules. That's a good guess. They didn't change the rules; you could still shoot it underhand if you wanted to. Injury? That's another good guess. No, it's because he just felt silly.
[00:33:44]
He said he looked stupid. I don't want to shoot that way. I don't care if it's making me better or worse, I don't care. I look stupid doing it, so I'm going to go back to overhand.
[00:33:54]
Turns out that was probably a reasonable thing to do. Reasonable outcome, wrong analysis. Right?
[00:34:03]
If anybody's interested... Not that I usually recommend you go look at Wikipedia for anything, because having done a lot of research in my time, I found, believe it or not, Wikipedia gets things wrong. Can you believe this? It actually gets things wrong from time to time. But it is a pretty interesting story. It turns out there were a lot of other things going on that night that had almost nothing to do with his free throw shooting. And what's interesting to me is that this story is much more interesting than what Gladwell presents in his podcast. So if anybody's interested, go look up that game on March 2nd, 1962, and you can see all the things that were happening. None of it really had anything to do with free throw shooting. So, to sum up: Mr. Gladwell, he's making that mistake. He's mistaking noise for signal. Right? Happens to the best of us.
[00:34:52]
Okay, so what? There's always got to be a "so what" section of the keynote. I know this is not your typical keynote, but there's got to be a "so what."
[00:35:03]
We need to recognize that our process world is dominated by randomness. That's just the world we live in. I know a lot of people like to take a deterministic approach to our data analysis; it doesn't work. That's why I'm so glad that we got introduced by a statistician who probably understands this stuff. Our world is dominated by randomness. Which means that for a given set of data, the very first thing that we need to do is to ask if your process is predictable. Is your process exhibiting any signal? Because if it's exhibiting signal, what we need to do is go inspect why that signal is happening.
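The "is your process exhibiting any signal" question can be sketched the same way a process behavior chart answers it: compute the natural process limits, then flag any point that falls outside them as a candidate signal worth investigating. The data and the function name here are illustrative assumptions, not a prescription:

```python
# Sketch: flag candidate signals as points outside the XmR-chart limits.
# The data below is made up; one value is deliberately exceptional.

def detect_signals(values):
    """Return (index, value) pairs that fall outside the natural process limits."""
    mean = sum(values) / len(values)
    mrs = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(mrs) / len(mrs)
    upper = mean + 2.66 * avg_mr
    lower = mean - 2.66 * avg_mr
    # Points beyond the limits are signal candidates; everything inside is
    # routine variation (noise) and should not be chased individually.
    return [(i, v) for i, v in enumerate(values) if v > upper or v < lower]

data = [12, 14, 13, 15, 14, 13, 41, 14, 15, 13]
print(detect_signals(data))  # → [(6, 41)]
```

Points inside the limits are exactly the noise Gladwell's mistake is about: they look interesting, but the chart says they need no special explanation.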
[00:35:39]
If your process is not behaving the way that you expect it to, if it's not predictable, do you know where to go look for the signal? I'm suggesting one approach; this is one approach, by the way. I'm not saying it's the approach, but this is one approach to going and doing that. Do you know where to go look for that signal? Right. Once you've made a change, how do you know if that change resulted in an improvement or a non-improvement? Did we fundamentally change the system? How do we know that? Again, I would argue a process behavior chart is a wonderful way to see that. And then what do you do if we've already got a process that is, quote unquote, predictable? If we already have that process,
[00:36:21]
well, now what do we do?
[00:36:24]
The answer to that is
[00:36:26]
If everybody was paying attention, what's the answer to that? If our process is predictable, but it's really not what we want, what do we need to do?
[00:36:33]
A fundamental systemic change to the process. Right?
[00:36:38]
Going after little or specific assignable causes ain't going to get it done. Right?
[00:36:45]
Finally, and maybe most importantly, once you have all of that, what you really need to do is design a process that can answer: when will it be done? Kind of a shameless plug there. A special thanks to a good friend of mine; you may have heard his name. Hopefully you haven't. I apologize if you have. His name's Prateek Singh. We don't like to show pictures of Prateek, but we love his dog. So this is Nisha, by the way. Special thanks to Prateek, but especially to Nisha, for helping with some of the research and the analysis on this topic. For next time, maybe, I don't know. I keep saying I'm not going to get invited back to these things; they keep inviting me back. I don't know. But for next time, now that we kind of understand randomness and outcomes and things like that: yes, it really is output over outcomes. You're reading that correctly.
[00:37:36]
You hear a lot of people in Agile talk about, no, it's outcomes over output, it's outcomes over output. Mm, not really. It's technically output over outcomes. For next time, if anybody's interested. And of course, don't use story points, but I don't want to necessarily get into that. This is the information that I was promising you. Dr. Wheeler's website is SPCpress.com. You can find out more at any of these other places, but if you want to hear any more about process behavior charts and what they are and the history of them and all that stuff, Dr. Wheeler's website, SPCpress.com, is the place to go.