Past, Present and Future of Statistical Science

by Lin et al. (eds.)

Rating: ★★★

A collection of articles from fifty COPSS award winners. Content and delivery vary by author, so I've reviewed each article below. There are 52 articles, so you could in theory read one a week. I didn't do this; I just took a long time to read the book in incredibly uneven bursts, sometimes forgetting about it for months, and other times burning through several articles in a day.

Part 1: A brief history of COPSS

Absolutely no idea why this section was considered worth including. Incredibly dry administrative history of COPSS, delivered in the form of a list of everything COPSS has ever done, with no indication of why any of these things might be significant or relate in a broader sense to anything in the history of statistics. The paragraphs read like material copied from a low-effort organisation website, and unless you want to know empty facts like who won the Elizabeth L. Scott Award in 2004, are completely without value. It seems like someone wanted to promote COPSS in the book they were producing, and thought that this was somehow a way to do that. It is at least short, and easily skipped.

Part 2: Reminiscences and personal reflections

Ingram Olkin: Very spotty, like listening to someone ramble, and not always clear what the point of an anecdote was, but a few were amusing. Quite a lot of name-dropping that went over my head.

Herman Chernoff: Prodded me to look up the Neyman-Pearson lemma. Contains an interesting discussion of estimating and applying confidence measures to third-grade students' multiple-choice test answers, plus developments in optimization and asymptotics which were beyond me. Amusing to hear of Chernoff and Bather developing a method for correcting rocket flight midway to the moon under the assumption of poor instrumentation, and shortly after their derivation hearing that such an event had taken place without it, expensive instrumentation rendering their work unnecessary.

David Brillinger: Commencement speech on the topic of Tukey, very brief but somewhat entertaining.

Juliet Popper Shaffer: The first genuinely accessible and interesting biographical sketch. Shaffer became a statistician despite some significant hurdles in her background, and writes with impressive clarity about fundamental issues in multiple comparisons.

Peter Bickel: Lists the important figures in early statistical science, before narrowing to his own background. Some interesting overlap with the development of machine learning.

Donna Brogan: A quite relatable and open account of her early life, including the difficulties of getting into statistics as a woman, even with an obviously high aptitude.

Bruce Lindsay: A very lively and readable biography, bringing along with the usual background details some examples of an applied introduction to mixture models and the EM algorithm.

Dennis Cook: Another good writer, demonstrating how expertise was developed through hands-on experience (actually running agricultural experiments on the ground, and hand-calculating ANOVAs), and with some strange fortuity (Cook only took mathematics education because it had the shortest line when he was signing up). Segues gracefully into a discussion of historic and current developments in regression diagnostics.

Kathryn Roeder: Discussion of the importance of finding good collaborators, delving into the highly interesting field of population genetics, with some humanising anecdotes about rejections and imposter syndrome.

Jeffrey Rosenthal: Extracts a series of lessons and morals from his life for early-career researchers. I usually find this sort of thing a bit condescending, but somehow Rosenthal's modesty and energy pulls it off, and it's actually genuinely inspiring. Stresses collaboration, the unexpected twists of a career, and being open to exploration.

Mary Gray: A rather strident message on equal remuneration for women. Like Gwern, I struggle to see how she reads the results she presented about longevity as supporting her position on life insurance rather than undermining it. She generally talked more about her activism than academia, and her tone left a bitter taste compared to Rosenthal's open encouragement.

Part 3: Perspectives on the field and profession

Stephen E. Fienberg: Opens by putting forward what seems to be a rather controversial claim to (unpublished) precedence in the development of hierarchical Bayesian models. Gets a little lost in some scattered detail, but circles around the notion of statistics being something that can be carried out in public service, and how applied work leads to methodological improvements.

Ian M. Johnstone: Amusing data-driven little ditty, albeit quite clearly an 'I just about had time to write this' contribution. Statistics seems to be an uncommonly unpopular subject, but perhaps mostly because it is taught as-well-as rather than as a focus in itself.

Peter G. Hall: Although he apologises for it very much, Hall's discussion of his personal history in statistics was far more attractive content than his fuzzy discussion of concerns in statistics (I'm still not sure exactly what these are, other than funding issues) or the particulars of the development of function estimation -- though his theory-first background is very distant from my own.

Rafael A. Irizarry: A very upbeat attitude to the field of statistics at the moment (in contrast to Johnstone's faint worries), with some personal history identifying the importance of support and collaboration.

Nilanjan Chatterjee: Well-constructed piece, tying a contemporary news event into their personal history in statistics, and following the thread of this topic in sufficient detail that some real content is communicated. If Chatterjee is well-published, I think it is deserved.

Xihong Lin: An overview of the field of genomics, with a lightly personal angle, mostly focused on some pitches for current problems.

Mary E. Thompson: Focuses on the topic of 'women in statistics', taking a historical perspective. A lot of this is a blur of names, sometimes with short biographies hanging limply off them, but there is an attempt at narratisation in there as well, and the final section shows some reflection.

Nancy M. Reid: Weird editorial choice, to put two 'women in statistics' articles one after another. Reid's compares very favourably in construction and flow, with a strong message about gender bias which does a good job of balancing perspective on progress with the need for still more. Her discussion of the topic is valuable, touching on many recurring issues (quotas, a prize 'for women' equating to 'not real') in a manner which agitates without needing to be inflammatory.

Louise M. Ryan: Passionate essay on work in racial diversity programmes, unfortunately dropping a number of references to now-undermined priming effects, and a short section on gender issues which seems to flag Australian academia as lagging the US.

Part 4: Reflections on the discipline

Donald A.S. Fraser: Analyses the divergences between the major schools of statistical thought by starting from a point where all approaches agree and seeking out the divergences from that common ground. Detailed enough to become too heavy for me.

James O. Berger: Another article on the interactions between frequentist and Bayesian thought, via conditional probabilities, and including a discussion of sequential hypothesis testing which was quite enlightening.

Arthur P. Dempster: Takes the opportunity to sell his own Dempster-Shafer theory of statistics. So far as I understand it, the core shift is providing a third-position "don't know" probability in a binary probability situation, such that all three probabilities sum to unity. The discussion of implications wasn't really concrete enough for me.

David B. Dunson: Attempts to introduce nonparametric Bayes, and could have succeeded with someone less dense than me. The discussion of broader issues in the field, and the connection between stats and ML was at least approachable, though.

Andrew Gelman: An approachable, high-level discussion of how we might choose statistical theories, with reference to Gelman's own history in Bayesianism.

T. W. Anderson: Extremely dry, with no attempt to explain the significance of his topic in a manner a non-specialist would understand.

Pascal Massart: A discussion of non-asymptotic approaches to model selection. Written in a lively and approachable style, but the material is inherently at a level beyond me (unfortunately common in this collection), so I only got the faint outline of the novelty.

Norman E. Breslow: Interesting discussion of bad statistical practices being found in the wild in medical science. Accessible and entertaining.

Nancy Flournoy: Describes the history of the discovery of CMV transmission by blood donation, with some detailed footnotes on experimental design.

Ross L. Prentice: Punts for the importance of public health research, highlighting particular areas in need of development from a statistical or applied perspective. A little wordy, but has a clear enough message to make up for it.

Tze Leung Lai: Outreach opportunities for stats in finance and health care. Particularly, comparing the effectiveness of medical treatments (good trials are hard to design and often ignored because of null findings) and addressing the risk modelling flaws that led to the 2008 financial crisis.

Nan M. Laird: Quite poorly written, to say Laird is presumably a well-published scientist. Discusses meta-analysis methods (particularly Laird's), and their use in genetic epidemiology.

Alice S. Whittemore: Engaging discussion of personalised medicine and risk profiling. Biases in covariates for subgroups selected for analysis can be quite important, as can handling multiple outcomes (like risk of both breast cancer and stroke), an issue that seems oddly poorly addressed at the moment.

Michael A. Newton: Reflects on the value of using old results in a new context, finding a use for Tukey's K-functions in standardising sample variances, an old nonparametric testing result in the weighted bootstrap problem, and a solution for the overlap of dust particles in the expected overlap of cancers from different origins.

Roderick J. A. Little: Campaigns for the field of survey sampling, which goes beyond the simple random sampling I'm familiar with to demonstrate population estimates from stratified sampling, and discusses the debate over model-based and design-based inference, with a model-based compromise.

Noel Cressie: Poster-piece for the field of environmental informatics -- think bioinformatics but for environmental science -- which focuses on how uncertainty is modelled at different conceptual layers within the discipline. Concrete examples from remote sensing of atmospheric CO2 by satellites.

Elizabeth A. Thompson: Provides a rather good overview of the history of statistical genetics, humbly incorporating acknowledgement of the environmental factors that led to the field, and closing on a personal note of general advice to young statisticians.

Mark van der Laan: Rather heavy and dry description of an area of statistical estimation, with some amusing if rather bruising remarks about current practices.

Grace Wahba: Describes a series of academic 'ah-ha' moments from her life, many of which relate to Reproducing Kernel Hilbert Spaces (RKHS), and seem to chart out her research career (and early developments of SVM as a classification tool). The device works very well, breaking up topics naturally and avoiding you having to remember too much detail. It helped that her area was somewhat closer to the machine learning problems with which I'm most familiar.

Robert J. Tibshirani: Lively general-audience introduction followed by something of a transition into lasso-penalised regression, which becomes more accessible as you read on, with an example from a cancer diagnosis project.

Jianqing Fan: Addresses the 'big data' hurdles and opportunities -- the latter being particularly for finding heterogeneous small groups whose members would be disregarded as outliers in smaller analyses. Shockingly well-communicated, with examples that slotted perfectly into the holes in my comprehension.

Larry A. Wasserman: Tackles the overlap between machine learning and statistics, and in particular what statisticians need to learn from machine learning research in order to stay relevant. Comes out strongly in favour of the conference culture of machine learning driving fast-paced innovation, and the value of thinking of problems computationally (i.e., having results which are actually applicable). Also reflects on what stats can bring to machine learning (more uncertainty quantification).

Xiao-Li Meng: Starts by urging statisticians to better market their subject, so that something like the COPSS award attracts at least as much attention as a Fields medal, then outlines three current grand challenges for ambitious students. His problems are well stated, and capture the imagination.

Part 5: Advice for the next generation

C. F. Jeff Wu: Rather cliched and poorly-written cobbling-together. It opened with dictionary quotes for inspiration, aspiration and ambition, then described various notables as using one or the other of these to support their achievements. I'm not sure what the point was here, but the anecdote about Pearson and Fisher was at least amusing. Worries about citation-counting, and appends a very short and seemingly directionless speech.

Raymond J. Carroll: Refreshingly uncensored personal reflections about academia and the field of statistics in particular, with advice of the sort you more generally find from a senior colleague with their guard down. Absolute gold, and exactly what young academics need to hear.

Marie Davidian: A solid blast of clear, practical advice for new academics, including protecting your research time, the importance of good scientific writing, and how you carry that out.

Donald B. Rubin: Amusing reflections on some rejected papers and what Rubin learnt from getting them rejected, and from withdrawing them to rewrite them. Handy for steeling your nerves.

Donald B. Rubin: A second helping! This time Rubin writes a more typical personal history, centred on his mentors, who are a surprisingly well-known cast.

Terry Speed: Strongly warns against taking or giving advice, suggesting the uncertain statistician explore randomisation instead. Disclaimer out of the way, he then advises making mistakes, accepting mediocrity (after all, most people are mediocre), and being enthusiastic.

Bradley Efron: Critical advice for giving bad talks.