“Information Bottleneck” Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine

New Theory Cracks Open the Black Box of Deep Learning; Natalie Wolchover; In Quanta Magazine, also syndicated onto Wired.com; 2017-10-09; pdf.
Teaser: A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.

tl;dr → an explainer of the “information bottleneck,” used as the central metaphor.
and → <quote><snip/> that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts.</quote>
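The metaphor has a precise form in the Tishby–Pereira–Bialek paper cited below: compress the input X into a representation T that keeps only what is relevant to the label Y. A minimal statement of the objective, in the notation standard for that paper:

```latex
% Information Bottleneck objective: choose the stochastic encoder p(t|x)
% minimizing the compression cost I(X;T) while retaining the relevant
% information I(T;Y); the multiplier \beta trades off the two terms.
\min_{p(t \mid x)} \; \mathcal{L} \;=\; I(X;T) \;-\; \beta \, I(T;Y)
```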


  • Deep Neural Networks (DNN)
  • “deep-learning” algorithms
  • <buzzzz>the architecture of the brain</buzzzz>
  • architectures of networks
  • Information is about…
    • semantics, information is about semantics.
    • relevance → information is about relevance.
  • “deep belief net”
  • renormalization
  • “critical point”
  • “stochastic gradient descent”
  • “back-propagated”
  • Whereas
    • “certain” very large deep neural networks don’t seem to need a drawn-out compression phase in order to generalize well.
    • use: early stopping, which heads off memorization.
  • Naftali Tishby et al. contra Andrew Saxe et al. disagree on approaches, classifications & capabilities of DNN algorithms; e.g., the applicability of early stopping.
  • The two-phase learning model of “fitting & compression” is not similar to “the way” that children learn; attributed to Brenden Lake.
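Several of the quoted terms above fit in a few dozen lines. A minimal sketch in pure Python (toy 1-D data; the model, noise level, and all names are illustrative, not from the article): stochastic gradient descent updates on one example at a time, an error signal back-propagated through the sigmoid, and early stopping when held-out loss stops improving.

```python
import math
import random

random.seed(0)

def sample():
    """One toy example: label is sign(x), flipped 10% of the time (label noise)."""
    x = random.uniform(-1.0, 1.0)
    y = 1 if x > 0 else 0
    if random.random() < 0.1:
        y = 1 - y
    return x, y

train = [sample() for _ in range(150)]
test = [sample() for _ in range(50)]

w, b = 0.0, 0.0   # one-weight logistic model: p(y=1|x) = sigmoid(w*x + b)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(split):
    """Mean cross-entropy of the current model over a data split."""
    total = 0.0
    for x, y in split:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
    return total / len(split)

best_test, stale = float("inf"), 0
for epoch in range(100):
    random.shuffle(train)
    for x, y in train:            # stochastic gradient descent: one example at a time
        p = sigmoid(w * x + b)
        grad = p - y              # error signal "back-propagated" through the sigmoid
        w -= lr * grad * x
        b -= lr * grad
    t = mean_loss(test)
    if t < best_test - 1e-4:
        best_test, stale = t, 0
    else:
        stale += 1
        if stale >= 5:            # early stopping: held-out loss stopped improving
            break
```

The held-out `test` split stands in for the “new data” against which generalization is measured; early stopping here plays the role the Saxe camp assigns it, halting before the model memorizes its noisy training labels.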

Phases of Deep Learning

“fitting” or “memorization”
Is shorter (than the longer phase). The network learns labels for the training data.
“compression” or “forgetting”
Is longer (than the shorter phase).
The network sheds extraneous detail about its input and
optimizes (“becomes good at”) generalization, as measured against the (new) test data.
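Shwartz-Ziv & Tishby track these two phases on an “information plane” by estimating the mutual information between a layer’s discretized activations and the input or label. A hedged sketch of the generic plug-in estimator such an analysis rests on (the bin count and the function names are my assumptions, not the paper’s code):

```python
import math
from collections import Counter

def mutual_information(xs, ts):
    """Plug-in estimate of I(X;T) in bits from paired discrete samples."""
    n = len(xs)
    px, pt, pxt = Counter(xs), Counter(ts), Counter(zip(xs, ts))
    mi = 0.0
    for (x, t), c in pxt.items():
        # p(x,t) * log2( p(x,t) / (p(x) p(t)) ), with probabilities = counts / n
        mi += (c / n) * math.log2(c * n / (px[x] * pt[t]))
    return mi

def binned(activations, bins=30):
    """Discretize a layer's activations (assumed to lie in [-1, 1]) into
    equal-width bins, making the activation vector a hashable symbol."""
    return tuple(min(bins - 1, int((a + 1.0) / 2.0 * bins)) for a in activations)
```

During fitting, a layer T gains bits about the label (I(T;Y) rises); during compression, its bits about the input (I(X;T)) fall while I(T;Y) stays high.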


  • 330,000-connection deep neural networks trained to recognize handwritten digits in that certain 60,000-image corpus,
    the Modified NIST database (National Institute of Standards and Technology).
  • adult [human] brains → “several hundred trillion” connections among circa 86 billion neurons.
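The scale comparison is easy to make concrete. A back-of-envelope sketch (the layer widths are hypothetical, chosen only to land near the article’s 330,000-connection figure; MNIST inputs are 28×28 = 784 pixels):

```python
def connection_count(layer_sizes):
    """Weights in a fully connected feed-forward net; biases not counted."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical architecture: 784 MNIST pixels -> 400 -> 30 -> 10 digit classes.
dnn_connections = connection_count([784, 400, 30, 10])   # 325,900: order of 330,000
brain_connections = 300e12   # "several hundred trillion" synapses, per the article
ratio = brain_connections / dnn_connections              # roughly a billion-fold gap
```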

Not Amenable [to DNNs or ML at all]

  • Classifiability
  • Discrete problems
  • Cryptographic problems


  • Alex Alemi, Staff, Google.
    …quoted for color, background & verisimilitude; a booster.
  • William Bialek, Princeton University.
  • Kyle Cranmer, physics, New York University.
    …quoted for color, background & verisimilitude; a skeptic.
  • Geoffrey Hinton,…quoted for color, background & verisimilitude; is non-committal, “It’s extremely interesting.”
    • Staff, Google
    • Faculty, University of Toronto
  • Brenden Lake, assistant professor, psychology & data science, New York University.
    In which a data scientist is a statistician who performs statistics on a Macintosh computer in San Francisco; N.B. Prof. Lake’s employer is a private university, not the university system of the State of New York.
  • Pankaj Mehta
  • Ilya Nemenman, faculty, biophysics, Emory University.
  • Fernando Pereira, staff, Google.
  • David Schwab
  • Andrew Saxe, staff, Harvard University.
    Expertise: Artificial Intelligence, The Theory of The Science of The Study of The Neuron; a.k.a. neuroscience.
  • Ravid Shwartz-Ziv, graduate student, Hebrew University, Jerusalem, IL.
    Advisor: Naftali Tishby
  • Naftali Tishby, Hebrew University, Jerusalem, IL.
  • Noga Zaslavsky, graduate student, Emory University.
    Advisor: Ilya Nemenman.


  • Stuart Russell, éminence grise.
  • Claude Shannon, theorist.


  • (perhaps) Naftali Tishby; Some Talk; Some Conference, in Berlin; On YouTube
  • Naftali Tishby, Fernando C. Pereira, William Bialek; The Information Bottleneck Method; 1999 (2000-04-24); 18 pages; arXiv:physics/0004057, pdf.
    <quote>first described [the “information bottleneck”] in purely theoretical terms </quote>
  • Ravid Shwartz-Ziv, Naftali Tishby; Opening the Black Box of Deep Neural Networks via Information; 2017-03-02 → 2017-04-29; 19 pages, arXiv:1703.00810
    tl;dr → application of the method is reported.
  • Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy; Deep Variational Information Bottleneck; In Proceedings of the International Conference on Learning Representations (ICLR); 2017; 19 pages; arXiv:1612.00410, pdf
    tl;dr → approximation methods are described.
  • Pankaj Mehta, David J. Schwab; An exact mapping between the Variational Renormalization Group and Deep Learning; 2014-10-14; 9 pages; arXiv:1410.3831.
    tl;dr → <quote>surprising paper</quote>, per Natalie Wolchover.
  • Naftali Tishby, Noga Zaslavsky; Deep Learning and the Information Bottleneck Principle; In Proceedings of the IEEE Information Theory Workshop (ITW); 2015-03-09; 9 pages; arXiv:1503.02406.
  • Modified National Institute of Standards and Technology (MNIST), a database.
  • Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum; Human-level concept learning through probabilistic program induction; In Science (Magazine); 2015.
    tl;dr → suggests that the [human] brain may deconstruct the handwritten letters into a series of previously-known hand strokes.


In archaeological order, in Quanta Magazine
