New Theory Cracks Open the Black Box of Deep Learning; Natalie Wolchover; In *Quanta Magazine*, also ~~syndicated out to~~ copied onto *Wired.com*; 2017-10-09; pdf.

Teaser: *A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.*

tl;dr → an explainer of the “information bottleneck,” used as the central metaphor.

and → <quote><snip/> that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts.</quote>
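The squeezing metaphor has a compact formal statement in the 1999 Tishby–Pereira–Bialek paper cited below: compress the input X into a representation T while keeping T informative about the relevant variable Y. As a sketch (β is the Lagrange multiplier trading compression against relevance):

```latex
% Information-bottleneck objective (Tishby, Pereira & Bialek, 1999):
% choose the stochastic encoding p(t|x) that minimizes the compression
% term I(X;T) while preserving the relevance term I(T;Y).
\min_{p(t \mid x)} \; \mathcal{L} \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```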

## Mentions

- Deep Neural Networks (DNN)
- “deep-learning” algorithms
- <buzzzz>the architecture of the brain</buzzzz>
- architectures of networks
- Information is about…
- semantics, information is about semantics.
- relevance → information is about relevance.

- “deep belief net”
- renormalization
- “critical point”
- “stochastic gradient descent”
- “back-propagated”
- Whereas
- “certain” very large deep neural networks don’t seem to need a drawn-out compression phase in order to generalize well.
- use: early stopping, to cut off memorization.

- Naftali Tishby *et al.* contra Andrew Saxe *et al.* disagree on approaches, classifications & capabilities of DNN algorithms; *e.g.*, the applicability of early stopping.
- The two-phase learning model of “fitting & compression” is not similar to “the way” that children learn, attributed to Brenden Lake.
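The early-stopping dispute above is about when to halt the fitting phase. For reference, a minimal generic early-stopping loop; the function names and the toy loss curve are illustrative, not from the article:

```python
def train_with_early_stopping(run_epoch, val_loss, patience=3, max_epochs=100):
    """Halt training once validation loss stops improving for `patience` epochs."""
    best_loss, best_epoch, stale = float("inf"), -1, 0
    for epoch in range(max_epochs):
        run_epoch(epoch)              # one epoch of fitting (caller-supplied)
        loss = val_loss()
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1                # no improvement this epoch
            if stale >= patience:
                break                 # stop before the network memorizes noise
    return best_epoch, best_loss

# Toy validation-loss curve that bottoms out, then climbs (overfitting).
curve = iter([5.0, 4.0, 3.0, 2.0, 2.5, 3.0, 3.5, 4.0])
best_epoch, best_loss = train_with_early_stopping(lambda e: None, lambda: next(curve))
# → stops at epoch 6, reporting best_epoch=3, best_loss=2.0
```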

#### Phases of Deep Learning

- “fitting” or “memorization”
- Is the shorter phase. The network learns labels for the training data.
- “compression” or “forgetting”
- Is the longer phase. The network observes new data, to generalize against it; it optimizes (“becomes good at”) generalization, as measured by the differential with the (new) test data.
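Shwartz-Ziv & Tishby distinguish the two phases by tracking the mutual information between a layer’s (binned) activations T and the inputs X or labels Y over training. A minimal sketch of the discrete estimator such plots rest on, assuming activations have already been histogram-binned; the function name is illustrative:

```python
import numpy as np

def mutual_information(x_ids, t_binned):
    """I(X;T) in bits, from paired discrete samples (e.g. binned activations)."""
    # Build the joint distribution from co-occurrence counts.
    xs, x_inv = np.unique(x_ids, return_inverse=True)
    ts, t_inv = np.unique(t_binned, return_inverse=True)
    joint = np.zeros((len(xs), len(ts)))
    np.add.at(joint, (x_inv, t_inv), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    pt = joint.sum(axis=0, keepdims=True)   # marginal p(t)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ pt)[nz])).sum())

# Toy check: a representation that copies X retains all of H(X);
# a constant representation has forgotten everything.
x = np.array([0, 0, 1, 1, 2, 2])
full = mutual_information(x, x)            # = H(X) = log2(3) ≈ 1.585 bits
none = mutual_information(x, np.zeros(6))  # = 0 bits
```

During the compression phase, I(X;T) computed this way falls while I(T;Y) is preserved.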

#### Factoids

- Deep neural networks of 330,000 connections, trained to recognize handwritten digits in that certain 60,000-image corpus: the Modified NIST database (National Institute of Standards and Technology).
- Adult [human] brains → “several hundred trillion” connections among circa 86 billion neurons.

#### Not Amenable [to DNNs or ML at all]

- Classifiability
- Discrete problems
- Cryptographic problems

## Who

- Alex Alemi, staff, Google.
  …quoted for color, background & verisimilitude; a booster.
- William Bialek, Princeton University.
- Kyle Cranmer, physics, New York University.
  …quoted for color, background & verisimilitude; a skeptic.
- Geoffrey Hinton
  - Staff, Google
  - Faculty, University of Toronto
  …quoted for color, background & verisimilitude; is non-committal: “It’s extremely interesting.”
- Brenden Lake, assistant professor, psychology & data science, New York University.
  In which a *data scientist* is a statistician who performs statistics on a Macintosh computer in San Francisco; and Prof. Lake’s employer is the university system of the State of New York.
- Pankaj Mehta
- Ilya Nemenman, faculty, biophysics, Emory University.
- Fernando Pereira, staff, Google.
- David Schwab
- Andrew Saxe, staff, Harvard University.
  Expertise: artificial intelligence, The Theory of The Science of The Study of The Neuron; a.k.a. neuroscience.
- Ravid Shwartz-Ziv, graduate student, Hebrew University, Jerusalem, IL.
  Advisor: Naftali Tishby.
- Naftali Tishby, Hebrew University, Jerusalem, IL.
- Noga Zaslavsky, graduate student, Emory University.
  Advisor: Ilya Nemenman.

#### Pantheon

- Stuart Russell, éminence grise.
- Claude Shannon, theorist.

## Referenced

- (perhaps) Naftali Tishby; Some Talk; Some Conference, in Berlin; on *YouTube*.
- Naftali Tishby, Fernando C. Pereira, William Bialek; The Information Bottleneck Method; 1999 (2000-04-24); 18 pages; arXiv:physics/0004057, pdf.
  <quote>first described [the “information bottleneck”] in purely theoretical terms</quote>
- Ravid Shwartz-Ziv, Naftali Tishby; Opening the Black Box of Deep Neural Networks via Information; 2017-03-02 → 2017-04-29; 19 pages; arXiv:1703.00810.
  tl;dr → applications of the method are reported.
- Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy; Deep Variational Information Bottleneck; In *Proceedings of Some Conference with the Acronym ICLR* (ICLR); 2017; 19 pages; arXiv:1612.00410, pdf.
  tl;dr → approximation methods are described.
- Pankaj Mehta, David J. Schwab; An exact mapping between the Variational Renormalization Group and Deep Learning; 2014-10-14; 9 pages; arXiv:1410.3831.
  tl;dr → a <quote>surprising paper</quote>, per Natalie Wolchover.
- Naftali Tishby, Noga Zaslavsky; Deep Learning and the Information Bottleneck Principle; In *Proceedings of the IEEE Information Theory Workshop* (ITW); 2015-03-09; 9 pages; arXiv:1503.02406.
- Modified National Institute of Standards and Technology (MNIST), a database.
- Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum; Human-level concept learning through probabilistic program induction; In *Science* (Magazine); 2015.
  tl;dr → asserts without proof that the [human] brain may deconstruct handwritten letters into a series of previously-known hand strokes.

## Previously

In archaeological order, in *Quanta Magazine*…

- Clever Machines Learn How To Be Curious, 2017-09-19.
- Is Alphago Really Such A Big Deal, 2016-03-29.
- A Unified Theory Of Randomness, 2016-08-02.
- Artificial Intelligence Aligned With Human Values Q&A With Stuart Russell, 2015-04-21.
- Deep Learning Relies On Renormalization Physicists Find, 2014-12-04.

Previously filled.