On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Content

Scoccia, Gian Luca; Malavolta, Ivano
2025-01-01

Abstract

Context. While on-device LLMs offer higher privacy than their remotely hosted counterparts and do not require Internet connectivity, their energy consumption on the client device remains insufficiently investigated. Goal. This study empirically evaluates the energy usage of client devices when fetching LLM-generated content on-device versus from a remote server. Our goal is to help software developers make informed decisions about the most energy-efficient method for fetching content in different scenarios, so as to optimize the client device's energy consumption. Method. We conduct a controlled experiment with seven LLMs of varying parameter sizes, running on a MacBook Pro M2 and on a remote server. The experiment involves fetching content of different lengths from the LLMs deployed either on-device or remotely, while measuring the client device's energy usage and performance metrics such as execution time, CPU, GPU, and memory usage. Results. Fetching LLM-generated content from a remote server uses 3.5 to 8.9 times less energy than the on-device method, with a large effect size. We observe a consistent, strong positive correlation between energy usage and execution time across all content lengths and fetch methods. For the on-device method, GPU and memory usage are positively correlated with energy usage. Conclusions. To optimize energy efficiency on the client side, we recommend offloading the generation of LLM content to a remote server rather than generating it on-device. LLM maintainers should optimize on-device LLMs in terms of execution time and computational resources.
Files for this item:
No files are associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12571/35584
Citations
  • PMC: ND
  • Scopus: 1
  • Web of Science: 1