“Information Bottleneck” Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine

New Theory Cracks Open the Black Box of Deep Learning; Natalie Wolchover; In Quanta Magazine, also copied onto Wired.com; 2017-10-09; pdf.
Teaser: A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.

tl;dr → an explainer of the “information bottleneck,” as a metaphor.
and → <quote><snip/> that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts.</quote>


  • Deep Neural Networks (DNN)
  • “deep-learning” algorithms
  • <buzzzz>the architecture of the brain</buzzzz>
  • architectures of networks
  • Information is about…
    • semantics, information is about semantics.
    • relevance → information is about relevance.
  • “deep belief net”
  • renormalization
  • “critical point”
  • “stochastic gradient descent”
  • “back-propagated”
  • Whereas
    • “certain” very large deep neural networks don’t seem to need a drawn-out compression phase in order to generalize well.
    • they instead use early stopping, which cuts training short during the memorization (fitting) phase.
  • Naftali Tishby et al. and Andrew Saxe et al. disagree on the approaches, classifications & capabilities of DNN algorithms; e.g., on the applicability of early stopping.
  • The two-phase learning model of “fitting & compression” is not similar to “the way” that children learn, a view attributed to Brenden Lake.
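Early stopping, as mentioned above, is commonly implemented as a "patience" rule on a held-out validation loss. A minimal sketch, assuming that common rule rather than any specific method from the article; the function name and the `patience` and `min_delta` parameters are hypothetical illustrations, not a particular library's API:

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Scan per-epoch validation losses and return (stop_epoch, best_epoch).

    Training stops once `patience` consecutive epochs fail to improve the
    best loss by more than `min_delta`; the weights worth keeping are
    those from `best_epoch`. (Hypothetical helper, for illustration only.)
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training simply ends at the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation curve: improves, then starts to overfit.
print(early_stopping([1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 0.5]))  # (5, 2)
```

Note that the late dip to 0.5 is never seen: the rule trades the chance of a later improvement for a shorter fitting phase.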

Phases of Deep Learning

“fitting” or “memorization”
The shorter phase. The network learns to label its training data.
“compression” or “forgetting”
The longer phase. The network sheds detail about its inputs and optimizes (“becomes good at”) generalization, as measured by its performance on new test data.
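Shwartz-Ziv & Tishby (arXiv:1703.00810) tracked these two phases on an "information plane": for each layer T, fitting shows up as I(T;Y) rising and compression as I(X;T) falling, with the mutual information estimated by binning the layer's activations. A minimal sketch of such a binned estimator, assuming equal-width bins and treating every training sample as a distinct input symbol; the function names, the 30-bin default, and the toy data are illustrative assumptions, not the paper's code:

```python
import numpy as np


def discrete_entropy(symbols):
    """Shannon entropy, in bits, of a 1-D array of discrete symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def bin_activations(activations, n_bins=30):
    """Discretize a (samples, units) activation matrix into equal-width bins
    and map each sample's row of bin indices to one integer symbol for T."""
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    digitized = np.digitize(activations, edges[1:-1])
    # Each distinct row of bin indices counts as one discrete state of T.
    _, symbols = np.unique(digitized, axis=0, return_inverse=True)
    return symbols.ravel()


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), for two aligned 1-D symbol arrays."""
    joint = np.stack([a, b], axis=1)
    _, joint_symbols = np.unique(joint, axis=0, return_inverse=True)
    return (discrete_entropy(a) + discrete_entropy(b)
            - discrete_entropy(joint_symbols.ravel()))


# Toy usage with made-up data: 1,000 samples, binary labels Y,
# and a 3-unit tanh "layer" T that partly encodes Y.
rng = np.random.default_rng(0)
x_ids = np.arange(1000)  # each sample treated as its own input symbol
y = rng.integers(0, 2, size=1000)
t = np.tanh(rng.normal(size=(1000, 3)) + y[:, None])
t_sym = bin_activations(t)
print("I(X;T) =", mutual_information(x_ids, t_sym))
print("I(T;Y) =", mutual_information(t_sym, y))
```

Because every input is unique and T is a deterministic function of X here, I(X;T) reduces to H(T); coarser bins therefore read as "more compression", one reason binned estimates are debated.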


  • 330,000-connection deep neural networks trained to recognize handwritten digits in a certain 60,000-image corpus:
    the Modified NIST (National Institute of Standards and Technology) database.
  • adult [human] brains → “several hundred trillion” connections among circa 86 billion neurons.

Not Amenable [to DNNs or ML at all]

  • Classifiability
  • Discrete problems
  • Cryptographic problems


  • Alex Alemi, Staff, Google.
    …quoted for color, background & verisimilitude; a booster.
  • William Bialek, Princeton University.
  • Kyle Cranmer, physics, New York University.
    …quoted for color, background & verisimilitude; a skeptic.
  • Geoffrey Hinton, …quoted for color, background & verisimilitude; noncommittal: “It’s extremely interesting.”
    • Staff, Google
    • Faculty, University of Toronto
  • Brenden Lake, assistant professor, psychology & “data science” (read: statistics), New York University.
    In which a data scientist is a statistician who performs statistics on a Macintosh computer in San Francisco; and Prof. Lake’s employer, a private university, is in New York City.
  • Pankaj Mehta
  • Ilya Nemenman, faculty, biophysics, Emory University.
  • Fernando Pereira, staff, Google.
  • David Schwab
  • Andrew Saxe, staff, Harvard University.
    Expertise: Artificial Intelligence, The Theory of The Science of The Study of The Neuron; a.k.a. neuroscience.
  • Ravid Shwartz-Ziv, graduate student, Hebrew University, Jerusalem, IL.
    Advisor: Naftali Tishby
  • Naftali Tishby, Hebrew University, Jerusalem, IL.
  • Noga Zaslavsky, graduate student, Hebrew University, Jerusalem, IL.
    Advisor: Naftali Tishby.


  • Stuart Russell, éminence grise.
  • Claude Shannon, theorist.


  • (perhaps) Naftali Tishby; Some Talk; Some Conference, in Berlin; On YouTube
  • Naftali Tishby, Fernando C. Pereira, William Bialek; The Information Bottleneck Method; 1999 (2000-04-24); 18 pages; arXiv:physics/0004057, pdf.
    <quote>first described [the “information bottleneck”] in purely theoretical terms</quote>
  • Ravid Shwartz-Ziv, Naftali Tishby; Opening the Black Box of Deep Neural Networks via Information; 2017-03-02 → 2017-04-29; 19 pages; arXiv:1703.00810
    tl;dr → applications of the method are reported.
  • Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy; Deep Variational Information Bottleneck; In Proceedings of Some Conference with the Acronym ICLR (ICLR); 2017; 19 pages; arXiv:1612.00410, pdf
    tl;dr → approximation methods are described.
  • Pankaj Mehta, David J. Schwab; An exact mapping between the Variational Renormalization Group and Deep Learning; 2014-10-14; 9 pages; arXiv:1410.3831.
    tl;dr → <quote>surprising paper</quote>, per Natalie Wolchover.
  • Naftali Tishby, Noga Zaslavsky; Deep Learning and the Information Bottleneck Principle; In Proceedings of the IEEE Information Theory Workshop (ITW); 2015-03-09; 9 pages; arXiv:1503.02406.
  • Modified National Institute of Standards and Technology (MNIST), a database.
  • Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum; Human-level concept learning through probabilistic program induction; In Science (Magazine); 2015.
    tl;dr → asserts without proof that the [human] brain does deconstruct handwritten letters into a series of previously known hand strokes.
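The bottleneck that Tishby, Pereira & Bialek first described in purely theoretical terms is usually written as a Lagrangian over a stochastic encoder p(t|x), trading compression of the input X against prediction of the label Y. The symbol names below (T for the compressed representation, β for the trade-off multiplier) follow common usage and are not taken from this post:

```latex
\min_{p(t \mid x)} \; \mathcal{L}\big[p(t \mid x)\big] \;=\; I(X;T) \;-\; \beta \, I(T;Y),
\qquad \text{subject to the Markov chain } Y \to X \to T .
```

Minimizing I(X;T) squeezes detail out of the representation; the β-weighted I(T;Y) term keeps what is relevant to the label, so a larger β retains more label information and a smaller β squeezes harder.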


In archaeological order, in Quanta Magazine

Previously filled.

Revisiting the convergence of Metcalfe’s Law, Shannon’s Law and McLuhan’s “The Medium is the Message” | Excapite

; Revisiting the convergence of Metcalfe’s Law, Shannon’s Law and McLuhan’s “The Medium is the Message”; In Excapite; 2014-02-23.


  • MobCon’s Law → as the rate of network adoption increases, the price of software decreases
  • <quote>Over the 3 generations the network effect translates into 100:60:30. i.e. 2nd Generation generates 60% of the first Generation Revenues. The 3rd Generation generates 30% of the Second Generation Revenues. This suggests the 4th Generation will be generating 15% of the 3rd Generation Revenues when it achieves 1 Billion Users.</quote>
  • <quote>The new market reality for software developers is there has never been more customers but prices have never been this low.</quote>
  • <quote>Media has been unbundled by the network. And by that I mean what once was sold as a wholesale product (Think Newspaper, Magazine or Music Album) is now sold as fragments (Think pages and songs). So too with software. Software is under going the same unbundling. What was sold as a wholesale bundle of function points (Think: COTS software) is now being unbundled and sold off as function points and limited functionality (Think: API’s and Apps). The reason being of course, when it comes to the long wave of the product cycle the spreadsheet and the word processor, 30 to 40 years on, is looking very much like end of cycle, and is under the types of market pressures one would expect of a mature market.</quote>
  • <quote>Google’s Search Engine is a function point endlessly combinatorial. Facebook is a collection of function points. To disrupt Google you need to build a better function point. To disrupt Facebook, as with Microsoft, you need merely to fragment the Platform</quote>
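The 100:60:30 figures in the second quote can be checked with simple division. Read as shares of the first generation they are 60% and 30%; the quote's wording instead treats 30 as a share of the second generation, where it works out to 50%. A minimal sketch, assuming the quote's projection means a fourth term of 15 on the same 100-based scale (an interpretation, not a figure stated outright):

```python
# Revenue index per generation; the fourth term is an assumed reading of the quote.
series = [100, 60, 30, 15]

# Each generation as a share of the one before it.
vs_previous = [round(b / a, 2) for a, b in zip(series, series[1:])]

# Each later generation as a share of the first.
vs_first = [round(x / series[0], 2) for x in series[1:]]

print(vs_previous)  # [0.6, 0.5, 0.5]
print(vs_first)     # [0.6, 0.3, 0.15]
```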




Via: backfill