The experimental ‘exo’ software allows you to cluster all your devices at home and split up an AI model, such as a large language model (LLM), across them to run your personal chatbot or other AI project.
Devices can include your Android phones and tablets, as well as computers running Windows, macOS, or Linux.
This allows your various devices to work together and appear to the AI model as one powerful GPU.
Exo has already demonstrated this by running Llama-3-70B at home on an iPhone 15 Pro Max, an iPad Pro M4, a Galaxy S24 Ultra, an M2 MacBook Pro, an M3 MacBook Pro, and two MSI Nvidia RTX 4090 graphics cards.
The exo software is compatible with Llama and other popular AI models. It also exposes a ChatGPT-compatible API, so pointing an application that already talks to ChatGPT at your own hardware is a one-line change. All you need is your compatible devices running Python 3.12.0 or higher to install and run the software.
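Because the API mimics ChatGPT's, existing OpenAI-style client code can simply be redirected at the cluster. Here is a minimal sketch; the endpoint URL and model name are assumptions for illustration (exo prints the actual address and port when it starts, and the model name depends on what your cluster is serving):

```python
import requests

# Hypothetical local endpoint; check exo's startup output for the real address/port.
API_URL = "http://localhost:52415/v1/chat/completions"

response = requests.post(
    API_URL,
    json={
        "model": "llama-3-70b",  # assumed name; use a model your cluster serves
        "messages": [{"role": "user", "content": "What is exo?"}],
        "temperature": 0.7,
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```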
Once installed and running, exo automatically discovers other devices on your network to include in the cluster. Rather than relying on a master-worker hierarchy, it connects devices peer-to-peer, so every device is an equal member of the cluster.
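Exo's own discovery code is more involved, but the general pattern behind LAN auto-discovery is straightforward: each peer periodically broadcasts its presence and listens for other peers' broadcasts. The sketch below illustrates that pattern only; the port number and message format are invented for the example, not taken from exo:

```python
import json
import socket
import time

PORT = 50000  # arbitrary port for this sketch, not exo's

def announce(device_id: str, memory_gb: int) -> None:
    """Broadcast this device's identity and capabilities to the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    message = json.dumps({"id": device_id, "memory_gb": memory_gb}).encode()
    sock.sendto(message, ("255.255.255.255", PORT))
    sock.close()

def listen(timeout: float = 5.0) -> list[dict]:
    """Collect peer announcements for `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    sock.settimeout(0.5)
    peers, deadline = [], time.time() + timeout
    while time.time() < deadline:
        try:
            data, addr = sock.recvfrom(1024)
            peers.append({**json.loads(data), "addr": addr[0]})
        except socket.timeout:
            continue  # keep listening until the deadline
    sock.close()
    return peers
```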
While exo supports various partitioning strategies to distribute the work across devices, it defaults to a ring memory-weighted scheme: devices are arranged in a ring, and each one runs a share of the workload proportional to how much memory it has.
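To illustrate how a memory-weighted split works, the sketch below assigns each device a contiguous slice of a model's layers in proportion to its share of the cluster's total memory. This is a simplification of the idea, not exo's implementation, and the device names and memory sizes are made up:

```python
def partition_layers(devices: dict[str, int], num_layers: int) -> dict[str, range]:
    """Assign each device a contiguous layer range proportional to its memory."""
    total_memory = sum(devices.values())
    partitions, start = {}, 0
    for i, (name, memory) in enumerate(devices.items()):
        if i == len(devices) - 1:
            end = num_layers  # last device absorbs any rounding remainder
        else:
            end = min(start + round(num_layers * memory / total_memory), num_layers)
        partitions[name] = range(start, end)
        start = end
    return partitions

# Hypothetical cluster with memory in GB; Llama-3-70B has 80 transformer layers.
cluster = {"macbook_m3": 36, "macbook_m2": 16, "rtx_4090_a": 24, "rtx_4090_b": 24}
for device, layers in partition_layers(cluster, 80).items():
    print(f"{device}: layers {layers.start}-{layers.stop - 1}")
```

In this example the 36 GB MacBook ends up with 29 of the 80 layers while the 16 GB machine gets 13, so no single device needs to hold the whole 70B-parameter model in memory.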