Lies, Damn Lies, and Statistics

You are correct.

For some things (weight, height, age, etc.), average measures are indeed useful. That is because the range from low to high is rather limited. In the case of family income, where the range could be from less than 10K to over a billion, summary measures are much less useful.
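A quick way to see the difference is to compare the mean and the median on a narrow-range sample and on a wide-range one. The sketch below uses made-up numbers (the heights and incomes are hypothetical, chosen only to illustrate the point), and `summarize` is just a helper name for this example.

```python
from statistics import mean, median

# Hypothetical figures, for illustration only (not real survey data).
heights_cm = [160, 165, 170, 175, 180]
incomes = [9_000, 35_000, 52_000, 68_000, 90_000, 1_000_000_000]


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


height_mean, height_median = summarize(heights_cm)
income_mean, income_median = summarize(incomes)

# Narrow range: the mean and the median tell the same story.
print(f"heights: mean={height_mean:.1f}, median={height_median:.1f}")

# Wide range: a single billionaire drags the mean far from the typical family.
print(f"incomes: mean={income_mean:,.0f}, median={income_median:,.0f}")
```

With a sample like this, the median stays near the typical family while the mean is dominated by the one extreme value, which is why a single summary number says so little about income.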

For these cases one really needs to look at much more data, but people do not want to have to dig into things.

I think the problem is not with statistics as such but rather with the fact we want quick and easy answers and life is just not that neat and simple.

Somebody better tell reddit that their simple up and downvoting system is flawed. And tell Slickdeals, and also tell Twitter that retweets are inaccurate.

Come on people. There is a simple button that says whether you like or found a post helpful. Sure there is going to be noise, but after enough posts it speaks for itself. The conflating of something so simple with the broader subject of statistics, numbers, analysis is intellectually dishonest.

At the very least, look at the numbers themselves and what you know about the poster, and offer some nuanced interpretation of what you see.

I suspect that some are taking the idea that there is validity to the helpful vote ratio at a personal level. But it actually needs little interpretation. The bottom line is that some people are clearly more helpful on forums than others. And being helpful is defined by the person voting. That's just a simple fact and it's reflected in helpful votes.

Ok, let's test the hypothesis you advanced.

The data for the 4 members I posted represent my "hunch" about which members would have been viewed as being most "helpful" by the membership collectively. Since the forum data were not available to me in a downloadable form I looked only at a limited sample so there may be others who have higher scores.

Now do the following:

  1. ask yourself who the 4 most "helpful" members on the forum were in your experience

  2. look at their statistics on social and see if they are anywhere close to the ones I posted.

  3. if your group from 1) does not contain all of the 4 from my post, see if you can correctly identify each of the missing members.


In order to make that judgement, you would have to ask why they were voted up, or liked, or whatever the term is. If it's because your friend likes your posts, that is not useful data. If it's because you genuinely helped someone, sure, it's useful data. But we don't know that. And no, it does not necessarily average out. It's being used to promote what a great poster person X is. That may indeed be true. But you cannot make that judgement and say person A was more helpful than person B.

I will give you a second example. There were FAQs created by various folks. Being labelled FAQs, those might have been read by and helped ten thousand people, almost none of whom upvoted the FAQ entry since it's a FAQ. Does that mean the creator was less helpful than someone who helped individuals? Possibly, but doubtful.

I know what you are saying, you are trying to look at it from a top level view, more in aggregate. But given the example of friends up-voting each other, which BTW I saw a lot of and did myself, I'd have to say that drawing conclusions about who was more helpful than whom cannot be done with any certainty.

In your case, you seemed to be trying to compare yourself to someone you disagreed with, i.e., taking it too personally. I myself would not deny that you were indeed helpful.

But it is not dishonest given that stats are misused all the time, each and every day, to try and make a point. I would disagree on that, and think that's pretty obvious and we've seen many examples of it in this thread.

I don't understand why we would do this analysis. Who I find helpful is different from who you find helpful.

If we aggregate everyone's helpful votes, then we have a sample size of voters that is meaningful.

Apart from the cases where friends are rigging the vote, why do I need to ask why a post was marked as helpful?

The whole point about being helpful is that it is being helpful in the eyes of the consumer. Questioning their reasons means you are placing the value of your reasoning over theirs.

As to friends voting up each other's posts, I have to say that if we posit that there were broadly two camps on Social, there seemed to be more friends voting for friends on the "pro without reason" R+ side than on the "trying to help R+ with realism" side.

The point is you can still interpret the data if you apply these filters. What you seem to be doing instead though is throwing out its entire validity. Based on the above, I can still draw conclusions. Even you might at least say that people on either side of the opinion divide were comparable to each other. Indeed, on that basis alone, what I suspected to be true has been confirmed by looking at the numbers.

Sorry but you've just revealed extremely flawed thinking here. Firstly, the votes, either a surfeit or a lack of, on a single faq make no difference to a person with a high post count. Secondly, it could equally be said that somebody who does more faqs is going to get disproportionate helpful votes. In fact, that's what I thought you were going to say. I personally find well written faqs more helpful and far more deserving of a helpful vote. Lastly, again, it is all about what the consumer finds helpful, not your judgement of how they should treat different posts. If somebody marks one post helpful and another one not helpful, I don't question the validity of that vote because it was their decision to make.

My motivation is not to compare myself to someone I disagreed with for personal reasons. Part of participating in a forum is to be helpful to others. Others marking contributions as helpful helps me do that. It gives a feel for what contributions are useful to others.

My original purpose for bringing it up was to refute the statement by an individual that the "forum" found my contributions unhelpful. The comparison was made because one needs context to state such things. You can't say that a ratio of 101% is proof unless you have some sort of comparison.

Furthermore, I'm totally able to say that I found certain moderators to clearly be very helpful. No doubt that Chelle and KentE were very helpful although I disagreed with them on certain points and issues. It is interesting that by and large, the helpful ratios of the moderators reflected what I would have expected to see based on my observation of how frequently their comments were actually helpful to others. Both the upper end and the lower end in particular really reflect that.

"how the second US invasion of Iraq was justified."

I do not want to get into that hornet's nest but just take up the idea of "justifying" although my experience has nothing to do with any aspect of US foreign or domestic policy. It is in a much more down-to-earth area.

One of the things we did routinely in the office was to prepare releases that would be made available almost immediately after big meetings concluded.

Since it was never known until the meeting ended whether a particular position would be adopted or rejected, it was necessary to have two statements ready. Each statement was based on exactly the same set of data, much of which was freely available to the public.

Any of my colleagues or I could write up two completely different statements, one saying why on the basis of these data it had been possible to reach an agreement and another saying the exact opposite. There is really no problem doing that--one simply changes the weight attached to a particular variable in "shaping" the outcome.:lol:

I suppose in a thread with this title it is not out of place to mention last night's pronouncement of "La La Land" winning the Oscar for Best Picture.

It certainly qualifies under the Lies, Damn Lies rubric and seems destined to be enshrined in the upper echelons of statistical data on the most embarrassing public mess ups ever.:lol:

How the system is supposed to work:

and how the error happened:

Yes, I had followed Nate as well. This time though, it was clear something was up. I found it equally stunning that Michael Moore was so emphatic that something was up, as JTSR71 has said. How did he get this info (I don't know), and from where? Then, I heard the Donald say it was going to be much closer than the polls predicted in certain key states like Michigan, and he even went into some detail. Now, in every election, the candidate losing always says they are going to win, etc. But this was different; if you heard his statements, it was clear it wasn't the usual BS all the candidates say. So, here were two guys from opposite ends saying the same thing. How did they both know or at least suspect, certainly in Michigan's case, that things were not as the polls showed? And if they knew that, why couldn't someone else? Did they have statistics that others had as well, just interpreted differently? Did they have better statistics?

State polls are what matters of course in the electoral college. I don't see how there was some foreknowledge of the result by two prominent people who rarely would agree on anything. Yet, there was.

I am curious what Nate wrote. I will have to dig it up and read.

Were you referring to this? Why FiveThirtyEight Gave Trump A Better Chance Than Almost Anyone Else | FiveThirtyEight

If so, his very last statement is what I attempted to say earlier about bias, and I agree with him: "There was widespread complacency about Clinton’s chances in a way that wasn’t justified by a careful analysis of the data and the uncertainties surrounding it." I think this cost her quite a bit. Complacency is not good when you are "ahead" by razor margins, and minimizing that fact and over-stating the facts doesn't help either. It tends to make some people not vote; after all, their candidate is certainly going to win. This is often the problem with what I perceive as biased news (on all sides).

There was an interesting case in Maryland in 2014 where against all expectations the Republican candidate won the Governor's race.

The Democratic Lt. Governor had been expected to win easily especially since Governor O'Malley was very popular and was expected to announce for the party's nomination for the Presidency. The theory was that it would simply be a continuation of policies already approved by voters.

The Monday morning quarterbacks are convinced that it was largely complacency on the part of voters that led to the outcome. Many registered voters said they did not bother to actually vote because they thought there was no need to stand in line when the outcome was already "known."

Yeah, I heard about that as my brother was living there at the time. Sometimes, news bias for their candidate backfires.

O.T. for oldbooks1, Mr. stock market from the sounds of it, what is the reason for the massive market surge since the election? I've sure made a ton of money since then. Not complaining of course, but, why? Have any thoughts?

Well, if you want to get the whole forum engaged you certainly picked a good topic because everybody will have a view on this one.:lol:

First, let's look at the numbers.

Since November 7, 2016, the market (measured by the S&P 500 which is a good proxy) is up about 14% in less than 4 months. Over long periods of time, it normally takes about 15 months or so to move that much.
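To put those two paces on the same footing, one can annualize them. This is only a rough sketch: it assumes simple compounding, rounds "less than 4 months" up to exactly 4, and `annualized_return` is a helper name made up for the example.

```python
def annualized_return(total_return, months):
    """Compound a return earned over `months` up to a 12-month rate."""
    return (1 + total_return) ** (12 / months) - 1


# Assumption: the 14% gain took exactly 4 months (the post says "less than 4").
rally_pace = annualized_return(0.14, 4)

# The same 14% spread over the more typical 15 months.
typical_pace = annualized_return(0.14, 15)

print(f"rally pace:   {rally_pace:.1%} per year")
print(f"typical pace: {typical_pace:.1%} per year")
```

The typical pace lands in the low double digits per year, while the post-election pace, if it could be sustained for a full year (which nobody expects), would be several times that.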

The other interesting thing is that it was already "expensive" pre-election and is now pretty pricey.

Microsoft, for instance, is trading at a P/E of 30 and AT&T is at around 20. What this is supposed to mean (given their respective dividends) is that investors are not expecting much capital appreciation over the next 12 months or that the current belief is we are at a plateau.

The "fear index" which is measured by the VIX is pretty much at historic lows (around 10-12%). That is another strong sign of complacency.

As to the "experts", there are of course many views but the general drift is that the Administration is going to make it a lot easier for the business sector to thrive through a variety of policy actions especially less regulation and tax cuts. That is what is supposed to have sparked the rally.

Traders (who have no interest in anything other than the very short term) are bored out of their minds because there is nothing to do and no way to make much money.

So on the surface, it is clear sailing as far as the eye can see. Mr. Market is saying that right now the most likely outcome at the end of the year (1 standard deviation move) will be up another 4% or down about 4%. Of course, Mr. Market is known to change his mind without warning.
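For anyone curious how a volatility reading is usually turned into a one-standard-deviation range, here is a rough sketch. It assumes the common square-root-of-time rule of thumb, calendar days, and a VIX level of 12; `one_sd_move` is a name invented for the example. The result depends heavily on the horizon and the level plugged in, so this is an illustration of the mechanics rather than the calculation behind the 4% figure above.

```python
import math


def one_sd_move(vix_level, days, days_per_year=365):
    """Approximate one-standard-deviation move (in percent) over `days`.

    Assumes volatility scales with the square root of time, which is a
    rule of thumb rather than a guarantee.
    """
    return vix_level * math.sqrt(days / days_per_year)


# Illustrative horizons: about one month, one quarter, and ten months.
for days in (30, 90, 300):
    move = one_sd_move(12, days)
    print(f"{days:>3} days: roughly +/- {move:.1f}%")
```

Swapping in a different VIX level or using trading days (`days_per_year=252`) changes the numbers, which is part of why two people can quote different "expected" ranges from the same index.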

Historically, that is the classic set up for a major event with serious downside potential. Anything could cause this: an incident with another country, a scandal at a financial institution, or something else. It really would not matter what it might be; all that is needed is a reason to head for the exits.

Will this happen? That is anybody's guess. However, if one believes in probability then the odds are good that the next move of 20% or more will be down rather than up and it could happen rather quickly.

For the normal investor, there is nothing whatever to do now except continue what one is doing. It is probably not a good time to be taking on much additional risk and a good time to start preparing psychologically for some red ink to start appearing.

That wisdom and $5 will get you a cup of coffee at Starbucks.:slight_smile:

Mine is up 30% over the past year (depending on the account; I have 5), most of it in the past 4 months, so for me it is a massive increase. That by itself would seem to make it more likely to go down. But I don't pump and dump. These days incidents seem to spook people more than ever, so I totally agree with that. I definitely would not be buying at this time. Of course there are methods to minimize downside risk as well (other than taking all the money out), such as bonds. I've done that for the most part, and I have shorter-term money (since I am about to retire at the end of the year) and longer-term money that I won't need for 20 years or so. In fact, I did the re-allocation this month.

Michael Moore said that he knew these people and they'd suffered, had been ignored by Obama but stayed quiet and had had enough. I believe he felt that either the polls were not including them or that they weren't the type of people who would reveal what their vote would be. He also talked about how having been ignored for so long, these people seemed to collectively feel that they actually had a chance of sticking it to the establishment because they knew they were in swing states.

I'm a big follower of Nate Silver but the fact is that he is only as good as the polls he incorporates in his model. I saw a BBC documentary before the Brexit vote where they asked him to review the polls. They took him to a pub where he got to hear opinions that were all over the place. He looked out of his element, or maybe it's just his normal data nerd look after half a pint. But he was forced to admit that a one-off election has more uncertainty.

And in retrospect, apart from the statistical reason that the Trump voter area in swing states was underrepresented, that's probably part of the miss as well. Thinking that the 2016 election was like any election beforehand - Silver's modeling I believe uses the past for adjustment and margin of error calculations / predictions.

The Trump story is different. Apparently, Trump and others did not believe they would win. However, do not forget that Kellyanne Conway is a pollster. Indeed, it was when I first saw her interviewed that I became worried. Trump's campaign used Cambridge Analytica in the UK. The same people who worked on Brexit. The focus on the states that swung was a direct result of analysis.

It was still a long shot, but it was the route that their analysis told them was the most possible because they had an understanding of the Michael Moore voter in those states. Clearly, the Clinton campaign totally ignored those voters.

As to how Trump knew. He didn't. But his lifelong way of operating is to believe in what he wants to be true. And he also goes by what he sees and hears. So if he's in those swing states and is getting a crazy level of enthusiasm at his rallies while Clinton and the Democrats are ignoring those places, he knows he's in with a chance.

Yes, I got the Michael Moore knowing these people part; he said that, and it was true. I know Trump didn't know for sure, but, as you said, they had different data that was providing them with a path that others were not considering. So in that respect, while he may not have known (no one does for sure), he of course had some confidence based on some data and some actions they had taken. It was that focus on those swing states that proved very useful for them. I believe that was the source of some of the claims. This was different. I watch all the elections, and it was clear they had something that the common analysts did not, and it wasn't the usual boast (to me). Perhaps it was the Cambridge Analytica folks. The candidate always says they are going to win, I know that. But there was no doubt in my mind this came from a different place than the usual claim. I remember, for example, until the day was over, the Clinton campaign saying something about 100k votes left to count in Detroit that would go massively their way and it would be their state, no problem. That did sound like the usual claim of confidence (to me, and disinformation to me too), and it proved to be a phantom set of votes from what I saw.

But you may have it, the Brexit set of folks. It wasn't just Michigan of course, but that was a big one. Seems like pollsters and analysts have not been doing as well as they used to. I get that it's hard to predict exact vote counts in very close states. Of course it is. But when they were saying daily that she had a 97% or more chance of winning, it's over, let's get ready to celebrate, etc., how can you claim even that if the margins are razor thin in battleground states? That makes no sense statistically, does it, given the electoral college? Then we all remember the media calling Florida for Gore. Another fubar, etc.

[b]Were you referring to this?[/b]

Sorry, I did not see your post earlier.

I think he did a separate piece that broke down where he thought his approach had gone astray. Unless I am mistaken, he was fairly objective about what he had missed.

Unfortunately, I did not save a copy and cannot seem to locate it on the site.

It is an interesting question as to whether all the coverage and never ending analysis of where a race stands might in itself have an impact on the outcome by causing people to stay away from the polls.

Of course, it is impossible to show this happened in a given case but there have been a few cases where the margin was deemed so large some voters may have stayed away and caused an upset as a result.

This would be a truly perverse situation if the media, which is supposed to simply inform (although in fact it does much more), is through its actions actually affecting outcomes. I have no idea how one might address this but it definitely is a concern.

Perhaps one approach would be some strongly worded disclaimer being provided during such reports.

Sadly, I think we are beyond that. There is no news any more, it's all bias, on all sides. They have to tell us what to think.