SPACE: Structured Compression and Sharing of Representational Space for Continual Learning

Bibliographic Details
Main authors: Gobinda Saha, Isha Garg, Aayush Ankit, Kaushik Roy
Format: article
Language: EN
Published: IEEE 2021
Online access: https://doaj.org/article/46706ff8450743849542b1376c3fdb24
Description
Summary: Humans learn incrementally from sequential experiences throughout their lives, which has proven hard to emulate in artificial neural networks. Incrementally learning tasks causes neural networks to overwrite relevant information learned about older tasks, resulting in ‘Catastrophic Forgetting’. Efforts to overcome this phenomenon often utilize resources poorly, for instance by growing the network architecture or needing to save parametric importance scores, or they violate data privacy between tasks. To tackle this, we propose SPACE, an algorithm that enables a network to learn continually and efficiently by partitioning the learnt space into a Core space, which serves as the condensed knowledge base over previously learned tasks, and a Residual space, which is akin to a scratch space for learning the current task. After learning each task, the Residual is analyzed for redundancy, both within itself and with the learnt Core space. A minimal number of extra dimensions required to explain the current task are added to the Core space, and the remaining Residual is freed up for learning the next task. We evaluate our algorithm on P-MNIST, CIFAR and a sequence of 8 different datasets, and achieve comparable accuracy to the state-of-the-art methods while overcoming catastrophic forgetting. Additionally, our algorithm is well suited for practical use. The partitioning algorithm analyzes all layers in one shot, ensuring scalability to deeper networks. Moreover, the analysis of dimensions translates to filter-level sparsity, and the structured nature of the resulting architecture gives us up to 5x improvement in energy efficiency during task inference over the current state-of-the-art.
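
The Core/Residual partitioning described in the summary can be illustrated with a small sketch. This record does not state how the redundancy analysis is carried out, so everything below is an assumption made for illustration: redundancy is measured with a singular value decomposition of layer activations, "enough to explain the current task" means reaching a chosen fraction of activation energy, and the names extend_core, core, acts and threshold are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def extend_core(core, acts, threshold=0.95):
    """Hypothetical sketch: grow an orthonormal Core basis for one layer.

    core      -- (d, k) matrix with orthonormal columns (k may be 0)
    acts      -- (n, d) activations collected while learning the current task
    threshold -- assumed fraction of activation energy the Core must explain
    """
    total = np.sum(acts ** 2)
    # Redundancy with the Core: remove what the existing basis already explains.
    residual = acts - acts @ core @ core.T
    explained = 1.0 - np.sum(residual ** 2) / total
    if explained >= threshold:
        return core  # the current task needs no new dimensions
    # Redundancy within the Residual: rank its directions by energy.
    _, s, vt = np.linalg.svd(residual, full_matrices=False)
    cumulative = explained + np.cumsum(s ** 2) / total
    # Smallest number of extra directions that reaches the threshold.
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    return np.hstack([core, vt[:m].T])

# Usage: two synthetic "tasks" whose activations share one direction.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # orthonormal directions
task1 = rng.normal(size=(200, 2)) @ q[:, 0:2].T
task2 = rng.normal(size=(200, 2)) @ q[:, 1:3].T
core = np.zeros((8, 0))                        # empty Core before any task
core = extend_core(core, task1, threshold=0.999)
core = extend_core(core, task2, threshold=0.999)
print(core.shape)
```

In this toy setting the second task overlaps the first in one direction, so only the direction not already covered by the Core is added. Directions left outside the Core correspond to the part of the space that stays free for the next task.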