On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Content
Scoccia, Gian Luca; Malavolta, Ivano
2025-01-01
Abstract
Context. While on-device LLMs offer stronger privacy than their remotely-hosted counterparts and do not require Internet connectivity, their energy consumption on the client device remains insufficiently investigated. Goal. This study empirically evaluates the energy usage of client devices when fetching LLM-generated content on-device versus from a remote server. Our goal is to help software developers make informed decisions about the most energy-efficient method for fetching content in different scenarios, and thus optimize the client device's energy consumption. Method. We conduct a controlled experiment with seven LLMs of varying parameter sizes running on a MacBook Pro M2 and on a remote server. The experiment involves fetching content of different lengths from the LLMs deployed either on-device or remotely, while measuring the client device's energy usage and performance metrics such as execution time, CPU, GPU, and memory usage. Results. Fetching LLM-generated content from a remote server consumes 3.5 to 8.9 times less energy than the on-device method, with a large effect size. We observe a consistent, strong positive correlation between energy usage and execution time across all content lengths and fetch methods. For the on-device method, GPU and memory usage are also positively correlated with energy usage. Conclusions. We recommend offloading LLM content generation to a remote server rather than generating content on-device to optimize energy efficiency on the client side. LLM maintainers should optimize on-device LLMs in terms of execution time and computational resources.
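To illustrate the kind of analysis the abstract describes, the minimal Python sketch below computes an on-device vs. remote energy ratio and the energy–time correlation from per-fetch measurements. All numbers, variable names, and the use of a Pearson coefficient here are illustrative assumptions, not the paper's actual data or statistical procedure.

```python
# Hypothetical sketch of the study's style of analysis: compare mean
# per-fetch client energy across fetch methods, and check how strongly
# energy correlates with execution time. Data below is made up.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Illustrative per-fetch measurements: (energy in joules, time in seconds)
on_device = [(620.0, 41.0), (710.0, 48.0), (560.0, 37.0), (820.0, 55.0)]
remote    = [(90.0, 30.0), (110.0, 36.0), (80.0, 27.0), (120.0, 39.0)]

# How many times more energy the on-device method draws on average
ratio = mean(e for e, _ in on_device) / mean(e for e, _ in remote)

# Correlation between energy and execution time for the on-device method
r_on = pearson([e for e, _ in on_device], [t for _, t in on_device])

print(f"on-device / remote energy ratio: {ratio:.1f}x")
print(f"energy-time correlation (on-device): {r_on:.2f}")
```

In a real replication, the energy samples would come from a hardware or software power meter on the client device rather than hard-coded values, and a rank-based coefficient could be substituted if the measurements are not normally distributed.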


