Making multilingual language models smaller
Johns Hopkins computer scientists have unveiled a new approach aimed at shrinking the size of multilingual language models
Multilingual language models (MLMs) can predict, generate, and extract text in many languages, supporting cross-lingual communication and translation. However, as these models take on more languages, per-language performance degrades due to "language interference," in which parameters optimized for one language hinder performance in others; a comparably sized model focused on a single language often performs better.
A team from Johns Hopkins University devised a technique called Language-Specific Matrix Synthesis to serve multiple languages with far fewer parameters. Instead of building a separate full neural network for each language, the method approximates each language-specific component with low-rank matrices, so each added language requires only a small number of new parameters.
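To make the idea concrete, here is a minimal PyTorch sketch (not the team's actual code) of one way per-language low-rank matrices could stand in for full language-specific networks. The class name, the rank value, and the LoRA-style additive structure are illustrative assumptions; only the core idea, synthesizing a language-specific matrix from two thin low-rank factors, comes from the article.

```python
import torch
import torch.nn as nn


class LanguageSpecificLowRank(nn.Module):
    """Hypothetical sketch: a shared weight matrix plus a cheap
    low-rank, per-language correction, in the spirit of the
    Language-Specific Matrix Synthesis idea described above."""

    def __init__(self, d_model: int, languages: list[str], rank: int = 8):
        super().__init__()
        # One full d x d transform shared by every language.
        self.shared = nn.Linear(d_model, d_model, bias=False)
        # Per language: two thin matrices (d x r and r x d) whose
        # product synthesizes a language-specific d x d update
        # without ever storing a full d x d matrix per language.
        self.down = nn.ModuleDict(
            {lang: nn.Linear(d_model, rank, bias=False) for lang in languages}
        )
        self.up = nn.ModuleDict(
            {lang: nn.Linear(rank, d_model, bias=False) for lang in languages}
        )

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # Shared transform plus the low-rank, language-specific term.
        return self.shared(x) + self.up[lang](self.down[lang](x))


# Rough parameter arithmetic: a separate full matrix per language costs
# d*d parameters each (e.g. 1024*1024 ~ 1.05M), while the low-rank pair
# costs 2*d*r (e.g. 2*1024*8 ~ 16K) -- roughly a 64x saving per language.
layer = LanguageSpecificLowRank(d_model=1024, languages=["en", "fr", "sw"])
out = layer(torch.randn(2, 16, 1024), lang="fr")
```

The design choice to keep one shared matrix and route each input through a small language-specific correction is what lets new languages be added cheaply: only the two thin factors grow with the language count.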
The researchers liken the approach to giving a classroom of children who speak different languages a limited palette of primary colors (red, yellow, blue): each child can still express themselves, but with far fewer supplies. In tests, the method improved multilingual performance while using fewer parameters, yielding smaller yet equally capable language models.
The advance could lead to efficient AI systems that understand many languages even on small devices, broadening the reach of multilingual applications beyond current limitations. The team next plans to apply the method to today's unwieldy MLMs, aiming for robust AI systems whose performance in other languages matches their performance in English.