  Authors:
  • Vladimir Feinberg, Google DeepMind
  • Xinyi Chen, Princeton University, Google DeepMind
  • Y. Jennifer Sun, Princeton University, Google DeepMind
  • Rohan Anil, Google DeepMind
  • Elad Hazan, Princeton University, Google DeepMind

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, December 2023, Article No.: 3316, Pages 75911–75924

Published: 30 May 2024

Sketchy: memory-efficient adaptive regularization with frequent directions


Adaptive regularization methods that exploit more than the diagonal entries exhibit state-of-the-art performance for many tasks, but can be prohibitive in terms of memory and running time. We find that the spectra of the Kronecker-factored gradient covariance matrix in deep learning (DL) training tasks are concentrated on a small leading eigenspace that changes throughout training, motivating a low-rank sketching approach. We describe a generic method for reducing the memory and compute requirements of maintaining a matrix preconditioner using the Frequent Directions (FD) sketch. While previous approaches have explored applying FD for second-order optimization, we present a novel analysis which allows efficient interpolation between resource requirements and the degradation in regret guarantees with rank k: in the online convex optimization (OCO) setting over dimension d, we match full-matrix d^2 memory regret using only dk memory, up to additive error in the bottom d - k eigenvalues of the gradient covariance. Further, we show extensions of our work to Shampoo, resulting in a method competitive in quality with Shampoo and Adam, yet requiring only sub-linear memory for tracking second moments.
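To make the dk-versus-d^2 trade-off concrete, the following is a minimal NumPy sketch of a Frequent Directions update applied to a stream of gradients. It is an illustration of the generic FD technique, not the authors' implementation: the function name `fd_update`, the variable names, and the random test gradients are all assumptions introduced here, and how the subtracted mass `rho` enters the preconditioner is left out.

```python
import numpy as np

def fd_update(B, g):
    """One Frequent Directions step (illustrative; names are hypothetical).

    B is a d x k sketch with B @ B.T approximating the running sum of
    g g^T; g is a new gradient of length d. Assumes d > k.
    Returns the updated sketch and the eigenvalue mass that was removed.
    """
    d, k = B.shape
    # Append g as a column, so that M @ M.T == B @ B.T + outer(g, g).
    M = np.concatenate([B, g[:, None]], axis=1)      # shape d x (k + 1)
    # Thin SVD of a d x (k + 1) matrix; the d x d covariance is never formed.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    lam = s ** 2                                     # eigenvalues, descending
    rho = lam[k - 1]                                 # k-th largest eigenvalue
    # Shrink the top-k eigenvalues by rho and drop the rest.
    shrunk = np.sqrt(np.maximum(lam[:k] - rho, 0.0))
    return U[:, :k] * shrunk, rho

rng = np.random.default_rng(0)
d, k, T = 20, 5, 100
B = np.zeros((d, k))        # sketch: d * k numbers
G = np.zeros((d, d))        # full covariance, kept here only for checking
rho_total = 0.0
for _ in range(T):
    g = rng.standard_normal(d)
    B, rho = fd_update(B, g)
    rho_total += rho
    G += np.outer(g, g)

# Eigenvalues of the gap between the full covariance and the sketch.
err = np.linalg.eigvalsh(G - B @ B.T)
```

In this sketch the gap `G - B @ B.T` stays positive semidefinite and is bounded above by `rho_total` times the identity, which is the sense in which a rank-k sketch stands in for the full d x d matrix with an additive error term.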




    Published in

      NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems

      December 2023

      80772 pages

      Editors:
      • A. Oh,
      • T. Naumann,
      • A. Globerson,
      • K. Saenko,
      • M. Hardt,
      • S. Levine

      Copyright © 2023 Neural Information Processing Systems Foundation, Inc.




          Curran Associates Inc.

          Red Hook, NY, United States



