New Theory Cracks Open the Black Box of Deep Learning; Natalie Wolchover; In Quanta Magazine, also
copied onto Wired.com; 2017-10-09; pdf.
Teaser: A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.
tl;dr → the “information bottleneck,” an explainer; as the metaphor.
and → <quote><snip/> that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts.</quote>
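The “squeezing” in the quote above has a compact formal statement in the original information bottleneck paper (Tishby, Pereira, Bialek). As a reading aid only: here X is the input, Y the label or concept to be predicted, T the compressed representation, and β a trade-off parameter; the symbol T follows the later deep-learning papers rather than the 1999 notation.

```latex
% Information bottleneck objective: compress X into T while keeping
% what T says about Y. I(.;.) is mutual information.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

A larger β favors keeping information relevant to Y; a smaller β favors compression.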
- Deep Neural Networks (DNN)
- “deep-learning” algorithms
- <buzzzz>the architecture of the brain</buzzzz>
- architectures of networks
- Information is about…
- semantics, information is about semantics.
- relevance → information is about relevance.
- “deep belief net”
- “critical point”
- “stochastic gradient descent”
- “certain” very large deep neural networks don’t seem to need a drawn-out compression phase in order to generalize well.
- use: early stopping in memorization
- Naftali Tishby et al. contra Andrew Saxe et al.; they disagree on approaches, classifications & capabilities of DNN algorithms; e.g., the applicability of early stopping.
- The two-phase learning model of “fitting & compression” is not similar to “the way” that children learn; attributed to Brenden Lake.
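Early stopping, mentioned twice in the list above, can be sketched in a few lines. This is a generic illustration, not the procedure of either research group: the function name `early_stopping_epoch`, the `patience` parameter, and the loss values are all made up for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights would be kept under early stopping.

    val_losses: validation loss after each epoch (hypothetical numbers).
    patience:   how many epochs without improvement to tolerate.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember where it happened.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_epoch


# Illustrative losses: improvement stalls after epoch 3.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.43, 0.47, 0.39]
print(early_stopping_epoch(losses, patience=3))
```

With a patience of 3 the loop halts at epoch 6 and keeps epoch 3, so the later dip at epoch 7 is never seen. Training that is cut short like this also never reaches a long compression phase.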
Phases of Deep Learning
- “fitting” or “memorization”
- Is shorter (than the longer phase). The network learns labels for training data.
- “compression” or “forgetting”
- Is longer (than the shorter phase).
The network observes new data, to generalize against it. The network
optimizes (“becomes good at”) generalization, as measured against the (new) test data.
- deep neural networks with 330,000 connections, trained to recognize handwritten digits in a certain 60,000-image corpus:
the Modified NIST database (National Institute of Standards and Technology).
- adult [human] brains → “several hundred trillion” connections among circa 86 billion neurons.
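The two phases are described in terms of how much information a layer keeps about the input and about the label. A toy calculation makes the idea concrete. This is a minimal sketch under loud assumptions: the helper names (`mutual_information`, `joint_table`), the four-valued uniform input, and the choice of label are all invented for illustration and come from neither the article nor the papers.

```python
import math


def mutual_information(joint):
    """Mutual information I(A;B) in bits for a joint table joint[a][b]."""
    row = [sum(r) for r in joint]        # marginal p(a)
    col = [sum(c) for c in zip(*joint)]  # marginal p(b)
    mi = 0.0
    for a, r in enumerate(joint):
        for b, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[a] * col[b]))
    return mi


def joint_table(f, g, n_inputs, n_f, n_g):
    """Joint distribution of (f(X), g(X)) for X uniform on range(n_inputs)."""
    table = [[0.0] * n_g for _ in range(n_f)]
    for x in range(n_inputs):
        table[f(x)][g(x)] += 1.0 / n_inputs
    return table


def identity(x):
    return x


def high_bit(x):
    # Toy "representation" T and toy label Y: keep the high bit, drop the low bit.
    return x // 2


# X takes 4 equally likely values (2 bits); Y and T both equal the high bit.
i_xx = mutual_information(joint_table(identity, identity, 4, 4, 4))
i_xt = mutual_information(joint_table(identity, high_bit, 4, 4, 2))
i_ty = mutual_information(joint_table(high_bit, high_bit, 4, 2, 2))
print(i_xx, i_xt, i_ty)
```

In this toy case the representation keeps 1 of the input's 2 bits, yet loses nothing about the label: compression without loss of relevance.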
Not Amenable [to DNNs or ML at all]
- Discrete problems
- Cryptographic problems
- Alex Alemi, Staff, Google.
…quoted for color, background & verisimilitude; a booster.
- William Bialek, Princeton University.
- Kyle Cranmer, physics, New York University.
…quoted for color, background & verisimilitude; a skeptic.
- Geoffrey Hinton,…quoted for color, background & verisimilitude; is non-committal, “It’s extremely interesting.”
- Staff, Google
- Faculty, University of Toronto
- Brenden Lake, assistant professor, psychology & data science, New York University.
In which a data scientist is a statistician who performs statistics on a Macintosh computer in San Francisco; and Prof. Lake’s employer is New York University, a private institution (not the university system of the State of New York).
- Pankaj Mehta
- Ilya Nemenman, faculty, biophysics, Emory University.
- Fernando Pereira, staff, Google.
- David Schwab
- Andrew Saxe, staff, Harvard University.
Expertise: Artificial Intelligence, The Theory of The Science of The Study of The Neuron; a.k.a. neuroscience.
- Ravid Shwartz-Ziv, graduate student, Hebrew University, Jerusalem, IL.
Advisor: Naftali Tishby
- Naftali Tishby, Hebrew University, Jerusalem, IL.
- Noga Zaslavsky, graduate student, Hebrew University, Jerusalem, IL.
Advisor: Naftali Tishby
- Stuart Russell, éminence grise.
- Claude Shannon, theorist.
- (perhaps) Naftali Tishby; Some Talk; Some Conference, in Berlin; On YouTube
- Naftali Tishby, Fernando C. Pereira, William Bialek; The Information Bottleneck Method; 1999 (2000-04-24); 18 pages; arXiv:physics/0004057, pdf.
<quote>first described [the “information bottleneck”] in purely theoretical terms</quote>
- Ravid Shwartz-Ziv, Naftali Tishby; Opening the Black Box of Deep Neural Networks via Information; 2017-03-02 → 2017-04-29; 19 pages, arXiv:1703.00810
tl;dr → applications of the method are reported.
- Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy; Deep Variational Information Bottleneck; In Proceedings of the International Conference on Learning Representations (ICLR); 2017; 19 pages; arXiv:1612.00410, pdf
tl;dr → approximation methods are described.
- Pankaj Mehta, David J. Schwab; An exact mapping between the Variational Renormalization Group and Deep Learning; 2014-10-14; 9 pages; arXiv:1410.3831.
tl;dr → <quote>surprising paper</quote>, per Natalie Wolchover.
- Naftali Tishby, Noga Zaslavsky; Deep Learning and the Information Bottleneck Principle; In Proceedings of the IEEE Information Theory Workshop (ITW); 2015-03-09; 9 pages; arXiv:1503.02406.
- Modified National Institute of Standards and Technology (MNIST), a database.
- Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum; Human-level concept learning through probabilistic program induction; In Science (Magazine); 2015.
tl;dr → asserts without proof that the [human] brain does deconstruct handwritten letters into a series of previously-known hand strokes.
In archaeological order, in Quanta Magazine…
- Clever Machines Learn How To Be Curious, 2017-09-19.
- A Unified Theory Of Randomness, 2016-08-02.
- Is AlphaGo Really Such A Big Deal, 2016-03-29.
- Artificial Intelligence Aligned With Human Values Q&A With Stuart Russell, 2015-04-21.
- Deep Learning Relies On Renormalization Physicists Find, 2014-12-04.