You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi

03-14-2023 • https://arstechnica.com, BENJ EDWARDS

Things are moving at lightning speed in AI Land. On Friday, a software developer named Georgi Gerganov created a tool called "llama.cpp" that can run Meta's new GPT-3-class AI large language model, LLaMA, locally on a Mac laptop. Soon thereafter, people worked out how to run LLaMA on Windows as well. Then someone showed it running on a Pixel 6 phone, and next came a Raspberry Pi (albeit running very slowly).

If this keeps up, we may be looking at a pocket-sized ChatGPT competitor before we know it.

But let's back up a minute, because we're not quite there yet. (At least not today—as in literally today, March 13, 2023.) But what will arrive next week, no one knows.

Since ChatGPT launched, some people have been frustrated by the AI model's built-in limits that prevent it from discussing topics that OpenAI has deemed sensitive. Thus began the dream—in some quarters—of an open source large language model (LLM) that anyone could run locally without censorship and without paying API fees to OpenAI.

Open source solutions do exist (such as GPT-J), but they require a lot of GPU RAM and storage space. Other open source alternatives could not boast GPT-3-level performance on readily available consumer-level hardware.

Enter LLaMA, an LLM available in parameter sizes ranging from 7B to 65B (that's "B" as in "billion parameters," which are floating point numbers stored in matrices that represent what the model "knows"). LLaMA made a heady claim: that its smaller-sized models could match OpenAI's GPT-3, the foundational model that powers ChatGPT, in the quality and speed of its output. There was just one problem—Meta released the LLaMA code open source, but it held back the "weights" (the trained "knowledge" stored in a neural network) for qualified researchers only.