Humanist Discussion Group, Vol. 35, No. 127.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

Date: 2021-07-05 12:12:02+00:00
From: Henry Schaffer <hes@ncsu.edu>
Subject: Re: [Humanist] 35.123: mashing up spurious correlations with oppositional AI

Maurizio asks an interesting question - but the answer depends on a lot
of undefined aspects. How does it happen that "for us the spurious
correlations are obviously spurious"? What do we know that allows us to
reach that conclusion, and where do we get that knowledge?

Tyler Vigen clearly picked those examples because it is obvious that
they are spurious - and so my question about "obvious" still stands. If,
e.g., the same high correlation were found between the diameter of tree
trunks in Lake Waccamaw and the mean weight of alligators living in that
lake over time, would that be obviously spurious?

It depends on information from outside the subject matter being
discussed. For a Tyler Vigen example, we have enough (outside) knowledge
about "Per capita consumption of mozzarella cheese" and "Civil
engineering doctorates awarded" to conclude that the high correlation is
spurious. But what about the tree trunks and alligators? We don't know
much about them, and there could certainly be cause and effect or common
causes. But does AI "know" about cheese and doctorates?

The statistical question to ask is about the larger project, not just
about the correlation itself. What were *all* of the variables studied?
Were all of the correlations calculated? If so, then the discussion is
really about "multiple comparisons". The significance level of a
correlation is usually found by looking it up in a table - but the
assumption made in constructing the table is that *one* correlation
coefficient was calculated for a specific reason, and the table gives
the significance level for that single coefficient. If multiple
correlation coefficients were calculated, then a correction is needed,
e.g. the Bonferroni correction. (See, e.g.,
https://en.wikipedia.org/wiki/Multiple_comparisons_problem. Note that
the example shown at the top of that web page is from Tyler Vigen.)

Do we know if a high correlation is the result of multiple comparisons?
For the ones labelled "spurious" we do know that, both because we have
"outside" knowledge of the variables and because of the name of the
website/book. Does AI have that outside knowledge?

For the tree trunks and alligators, the outside knowledge (if not given
in the presentation) concerns what the investigators did. If they ran
multiple comparisons, didn't mention that, and then presented just the
largest correlation(s), we can't know that and neither can AI.

--henry

On Sat, Jul 3, 2021 at 1:27 AM Humanist <humanist@dhhumanist.org> wrote:

> Humanist Discussion Group, Vol. 35, No. 123.
>
> Date: 2021-07-02 12:36:52+00:00
> From: maurizio lana <maurizio.lana@uniupo.it>
> Subject: Re: [Humanist] 35.120: phantoms of Big Data
>
> hi Willard,
>
> if I do a mashup of the messages of François who cites Geoffrey
> Rockwell and Stéfan Sinclair
>
> With enough data one can get spurious correlations, as there is always
> something that has the same statistical profile as the phenomenon you
> are studying. This is the machine equivalent to apophenia, the human
> tendency to see patterns everywhere, which is akin to what Umberto Eco
> explores in _Interpretation and Overinterpretation_ (1992).
>
> and of Henry Schaffer who cites Tyler Vigen
>
> I'll end with citing my favorite book/website on correlation
> https://www.tylervigen.com/spurious-correlations
>
> with the thread about Artificial Intelligence, I end observing that
> for us the spurious correlations are obviously spurious and I wonder
> if an AI software would equally be able to spot them as spurious.
>
> could someone among us manage to submit to an AI system some of the
> correlations described in the book by Tyler Vigen and to ask the
> system to identify the spurious ones?
>
> Maurizio
>
>
> Giulio Regeni, Mohammed Mahmoud Street, Cairo
>
> https://alwafd.news/images/thumbs/752/new/027f918bb62bf148193d5920ca67ded7.jpg
> https://www.bbc.com/news/world-middle-east-20395260
>
> Maurizio Lana
> Dipartimento di Studi Umanistici
> Università del Piemonte Orientale
> piazza Roma 36 - 13100 Vercelli
> tel. +39 347 7370925
>

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
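
A minimal sketch, in Python, of the multiple-comparisons point made in the
reply above. It assumes 40 unrelated random series of 10 points each (both
numbers are arbitrary choices, not taken from the thread), counts the
pairwise correlations among them, and computes the Bonferroni-adjusted
threshold. All function names are hypothetical, and the family-wise error
formula assumes independent tests, which pairwise correlations only
approximate.

```python
import math
import random
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def n_pairwise(n_variables):
    """Number of distinct pairs, i.e. correlations, among n_variables."""
    return n_variables * (n_variables - 1) // 2


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests.

    Assumption: the tests are independent.
    """
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall level near alpha."""
    return alpha / n_tests


def largest_abs_r(n_variables, n_points, seed=0):
    """Largest |r| among all pairs of unrelated Gaussian noise series."""
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)]
        for _ in range(n_variables)
    ]
    return max(abs(pearson_r(a, b)) for a, b in combinations(series, 2))


if __name__ == "__main__":
    variables, points, alpha = 40, 10, 0.05  # illustrative values only
    tests = n_pairwise(variables)
    print("pairwise correlations:", tests)
    print("family-wise error rate:", familywise_error_rate(alpha, tests))
    print("Bonferroni threshold:", bonferroni_threshold(alpha, tests))
    print("largest |r| in pure noise:", largest_abs_r(variables, points))
```

With 40 variables there are 780 pairwise correlations, so the Bonferroni
threshold for an overall 0.05 level is 0.05/780, roughly 0.000064. Reporting
only the largest of those 780 correlations without saying so is the
situation Henry describes: the single reported number gives neither a human
reader nor an AI any way to detect it.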