Humanist Discussion Group

Humanist Archives: July 6, 2021, 6:58 a.m. Humanist 35.127 - spurious correlations and oppositional AI

				                  Humanist Discussion Group, Vol. 35, No. 127.
        Department of Digital Humanities, University of Cologne
                   		Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2021-07-05 12:12:02+00:00
        From: Henry Schaffer 
        Subject: Re: [Humanist] 35.123: mashing up spurious correlations with oppositional AI

Maurizio asks an interesting question - but the answer depends on a lot of
undefined aspects. How does it happen that "for us the spurious
correlations are obviously spurious"? What do we know that allows us to
reach that conclusion, and from where do we get that knowledge? Tyler Vigen
clearly picked those examples because it is obvious that they are spurious
- and so my question about "obvious" still stands.

If, e.g., the same high correlation was between the diameter of tree trunks
in Lake Waccamaw and the mean weight of alligators living in that lake over
time, is that obviously spurious? It depends on information outside of the
subject matter being discussed. For a Tyler Vigen example, we have enough
(outside) knowledge about Per capita consumption of mozzarella cheese and
Civil engineering doctorates awarded to conclude that the high correlation
is spurious. But what about the tree trunks and alligators? We don't know
much about them and certainly there could be cause/effect or common causes.
But does AI "know" about cheese and doctorates?

The statistical question to ask is about the larger project, not just about
the correlation itself. What were *all* of the variables studied? Were all
of the correlations calculated? If so, then the discussion is really about
"multiple comparisons". The significance level of a correlation is usually
found by looking it up in a table - but the assumption made in construction
of the table is that *one* correlation coefficient was calculated for a
specific reason and its significance level is given. If multiple
correlation coefficients were calculated, then there needs to be a
correction, e.g. the Bonferroni correction. (See, e.g.,
https://en.wikipedia.org/wiki/Multiple_comparisons_problem - note that the
example shown at the top of that web page is from Tyler Vigen.)
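
To make the size of the correction concrete, here is a minimal sketch in
Python. The function names and the figures of 20 variables and a 0.05 level
are my own illustrative assumptions, not taken from any particular study, and
the family-wise error formula assumes the tests are independent, which
pairwise correlations among the same variables are not, so treat it as a
rough guide only.

```python
from math import comb


def n_pairwise(n_vars: int) -> int:
    """Number of distinct pairwise correlations among n_vars variables."""
    return comb(n_vars, 2)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive with no correction.

    Assumes independent tests (an idealisation).
    """
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    k = n_pairwise(20)  # hypothetical project with 20 variables
    print("pairwise correlations:", k)
    print("Bonferroni per-test level:", bonferroni_alpha(0.05, k))
    print("uncorrected family-wise error:", familywise_error(0.05, k))
```

With 20 variables there are 190 pairwise correlations, so each one would
have to pass a level of 0.05/190 rather than 0.05 before being called
significant.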

Do we know if a high correlation is the result of multiple comparisons? For
the ones labelled "spurious" we do know that both because we have "outside"
knowledge of the variables and because of the name of the website/book.

Does AI have that outside knowledge?

For the tree trunks and alligators, the outside knowledge (if not given in
the presentation) concerns what the investigators did. If they ran multiple
comparisons, didn't mention that, and then presented just the largest
correlation(s), we can't know that and neither can AI.
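
The scenario of running many comparisons and presenting only the largest can
be sketched with a small simulation. Everything here is an assumption for
illustration: 40 series of 10 points each, pure independent Gaussian noise,
and a fixed seed. By construction no series has anything to do with any
other, yet the code reports only the strongest correlation it finds.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def largest_correlation(n_series=40, n_points=10, seed=0):
    """Correlate every pair of unrelated noise series; keep only the largest.

    Returns ((r, i, j), number_of_pairs_examined).
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    results = [(pearson_r(series[i], series[j]), i, j)
               for i, j in combinations(range(n_series), 2)]
    best = max(results, key=lambda t: abs(t[0]))
    return best, len(results)


if __name__ == "__main__":
    (r, i, j), n_pairs = largest_correlation()
    print(f"examined {n_pairs} pairs; largest is r = {r:.2f} "
          f"between series {i} and {j}")
```

A reader shown only that one reported r, without the count of pairs
examined, has no way to tell it from a correlation that was calculated for a
specific reason.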

--henry

On Sat, Jul 3, 2021 at 1:27 AM Humanist wrote:

>                   Humanist Discussion Group, Vol. 35, No. 123.
>
>         Date: 2021-07-02 12:36:52+00:00
>         From: maurizio lana 
>         Subject: Re: [Humanist] 35.120: phantoms of Big Data
>
> hi Willard,
>
> if I do a mashup of the messages of François who cites Geoffrey
> Rockwell and Stéfan Sinclair
>
> With enough data one can get spurious correlations, as there is always
> something that has the same statistical profile as the phenomenon you are
> studying. This is the machine equivalent to apophenia, the human tendency
> to see patterns everywhere, which is akin to what Umberto Eco explores in
> _Interpretation and Overinterpretation_ (1992).
>
> and of Henry Schaffer who cites Tyler Vigen
>
> I'll end with citing my favorite book/website on correlation
> https://www.tylervigen.com/spurious-correlations
>
> with the thread about Artificial Intelligence, I end observing that
> for us the spurious correlations are obviously spurious and I wonder
> if an AI software would equally be able to spot them as spurious.
>
> could someone among us manage to submit to an AI system some of the
> correlations described in the book by Tyler Vigen and to ask the
> system to identify the spurious ones?
>
> Maurizio
>
>
> Giulio Regeni, Mohammed Mahmoud Street, Cairo
>
> https://alwafd.news/images/thumbs/752/new/027f918bb62bf148193d5920ca67ded7.jpg
> https://www.bbc.com/news/world-middle-east-20395260
>
> Maurizio Lana
> Dipartimento di Studi Umanistici
> Università del Piemonte Orientale
> piazza Roma 36 - 13100 Vercelli
> tel. +39 347 7370925
>

