Researchers at Meta AI have made a significant breakthrough in tackling the tokenization challenge associated with GPT (Generative Pre-trained Transformer) models.
In a recent preprint, Meta AI introduces Megabyte, a framework for building GPT-style systems that can model large-scale data such as images, novels, and videos without relying on tokenization, a lossy compression step.
Tokenization, which converts bytes to tokens for processing, enables AI systems to handle data as numerical sequences. For example, the sentence “my favorite color is red” would be transformed into the token string “3666, 4004, 3124, 318, 2266, 13” for processing by OpenAI’s ChatGPT.
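As a rough illustration, the conversion can be reproduced with OpenAI's open-source tiktoken library. This is only a sketch: it assumes the GPT-2-era "gpt2" encoding, and exact token IDs vary from tokenizer to tokenizer.

```python
# Illustrative sketch: token IDs depend on which encoding a model uses.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")     # BPE encoding used by GPT-2-era models
tokens = enc.encode("my favorite color is red.")
print(tokens)                           # a short list of integer token IDs
print(enc.decode(tokens))               # decodes back to the original text
```

Each integer indexes an entry in the tokenizer's vocabulary, which is why model limits are quoted in tokens rather than in characters or bytes.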
However, even with tokenization, current state-of-the-art systems face tight context limits: GPT-3.5 can process roughly 4,000 tokens, while GPT-4 tops out at around 32,000.
Megabyte, by contrast, introduces a multiscale prediction architecture, trained end to end, that can model sequences of more than 1 million bytes. Eliminating tokenization in this way yields a 3,025% increase in capacity over GPT-4's 32,000-token limit (counting one token as roughly one byte, (1,000,000 - 32,000) / 32,000 ≈ 30.25, about a 31-fold expansion). Megabyte can therefore process text documents of approximately 750,000 words, the length of Leo Tolstoy's War and Peace plus two additional average-length novels.
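At a high level, Megabyte splits a byte sequence into fixed-size patches, runs a large global model across patch representations, and lets a small local model predict the individual bytes inside each patch. The sketch below is a loose, minimal illustration of that patch/global/local structure, not the paper's implementation; all dimensions and layer counts are invented, and causal masking and positional embeddings are omitted for brevity.

```python
# Loose sketch of a Megabyte-style patch/global/local split (PyTorch).
# All sizes are illustrative; a real model adds causal masks, positional
# embeddings, and an offset so bytes never attend to their own targets.
import torch
import torch.nn as nn

class MegabyteSketch(nn.Module):
    def __init__(self, patch_size=8, d_local=64, vocab=256):
        super().__init__()
        self.patch_size = patch_size
        d_global = d_local * patch_size  # patch = concatenated byte embeddings
        self.byte_embed = nn.Embedding(vocab, d_local)
        # Global model: attends across patches, capturing long-range structure.
        self.global_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_global, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Local model: predicts the bytes inside each patch, conditioned on
        # the global model's output for that patch.
        self.local_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_local, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.to_logits = nn.Linear(d_local, vocab)

    def forward(self, byte_ids):              # (B, T), T divisible by patch_size
        B, T = byte_ids.shape
        P = self.patch_size
        x = self.byte_embed(byte_ids)                  # (B, T, d_local)
        patches = x.view(B, T // P, P * x.size(-1))    # (B, T/P, d_global)
        ctx = self.global_model(patches)               # contextualized patches
        ctx = ctx.view(B * T // P, P, -1)              # unfold to per-byte states
        local_in = ctx + x.view(B * T // P, P, -1)     # combine global + byte info
        out = self.local_model(local_in)
        return self.to_logits(out).view(B, T, -1)      # per-byte logits (256-way)

model = MegabyteSketch()
logits = model(torch.randint(0, 256, (1, 64)))  # 64 bytes -> (1, 64, 256)
```

Because self-attention cost grows quadratically with sequence length, attending over T/P patches globally and P bytes locally is far cheaper than attending over all T bytes at once, which is what lets this style of model scale to million-byte sequences.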
The Megabyte model also performs strongly on ImageNet and audio-modeling benchmarks, often outperforming existing byte-level transformer models such as DeepMind’s Perceiver AR. It matches Perceiver AR’s state-of-the-art performance while using only half the computational resources.
This research has profound implications: removing the tokenization bottleneck would allow AI models to better support non-English languages, helping democratize these technologies worldwide and enabling applications, from cryptocurrency trading bots to decentralized autonomous organizations, to be built in users’ native languages.
Moreover, the Megabyte approach could extend models like ChatGPT to handle multimedia files such as images, video, and audio with roughly the same time and energy consumption as text processing, opening up new possibilities for AI applications.