AI Language Model Powers Up on Ancient Windows 98, Pentium II System!

AI language model runs on a Windows 98 system with a Pentium II and 128MB of RAM, as open-source AI flagbearers demonstrate the Llama 2 LLM under extreme constraints

A Pentium II with 128MB of RAM achieved a remarkable 35.9 tokens per second running a 260K-parameter Llama-architecture model.

EXO Labs recently shared an intriguing blog post detailing their experience with running the Llama AI model on a vintage Windows 98 system, complemented by a short social media video. The clip highlights an old Elonex Pentium II @ 350 MHz powering up with Windows 98, after which EXO launches its specialized C-based inference engine derived from Andrej Karpathy’s Llama2.c. They command the LLM to concoct a tale about “Sleepy Joe,” and impressively, it does so quite swiftly.

This groundbreaking achievement is just the beginning for EXO Labs. Emerging from obscurity in September, EXO Labs announced its mission to make AI accessible to all. Founded by a group from Oxford University, EXO is driven by the conviction that AI controlled by a few large corporations is detrimental to culture, truth, and societal foundations. Their goal is to create open infrastructure to train cutting-edge models and enable anyone to operate them on virtually any device. This demonstration using Windows 98 is a prime example of what’s possible with minimal resources.

The video shared on Twitter is quite brief, but thankfully, EXO’s blog post about Running Llama on Windows 98 offers more details. It is part of their “12 days of EXO” series, so there’s more to look forward to.

Acquiring an old Windows 98 PC on eBay was straightforward for EXO, but configuring it was not without its challenges. Data transfer was particularly tricky, leading them to utilize “good old FTP” to manage files through the retro machine’s Ethernet connection.

Adapting modern code to run on Windows 98 was another significant hurdle. Fortunately, they discovered Andrej Karpathy’s llama2.c, a lean ~700-line C program that runs inference on Llama 2 architecture models. Using the vintage Borland C++ 5.02 IDE and compiler, and with a few adjustments, they managed to compile a Windows 98-compatible executable. The completed code is available on GitHub.

Alex Cheema from EXO credited Andrej Karpathy’s elegantly minimal code, which allowed a 260K-parameter LLM to run at 35.9 tok/sec on the old Windows 98 system. Karpathy, a founding member of OpenAI and formerly director of AI at Tesla, has contributed significantly to the field. A 260K-parameter model is tiny by modern standards, but it still performed well on the dated 350 MHz single-core PC. According to the EXO blog, scaling up to a 15M-parameter LLM slowed generation to just over 1 tok/sec, while Llama 3.2 1B crawled along at 0.0093 tok/sec.


BitNet: A Vision for the Future

The story goes beyond merely running an LLM on a Windows 98 system. EXO concludes its post by discussing its future aspirations for BitNet, a transformer architecture that uses ternary weights. With that design, a 7B-parameter model needs only 1.38GB of storage, a manageable load even for a 26-year-old Pentium II and trivial for more recent hardware. BitNet is CPU-first, avoiding the expense of GPUs, and is claimed to be 50% more efficient than full-precision models, reportedly able to run a 100B-parameter model on a single CPU at human reading speed (around 5 to 7 tok/sec).

Before wrapping up, it’s worth noting that EXO is still seeking collaborators. If you’re interested in preventing the monopolization of AI by large corporations and think you can help, consider reaching out. For more casual engagement, EXO hosts a Discord Retro channel where enthusiasts discuss running LLMs on vintage tech like old Macs, Game Boys, Raspberry Pis, and more.
