Abstract
It is now a common business practice to buy access to large language model (LLM) inference rather than self-host, because of significant upfront hardware infrastructure and energy costs. However, as a buyer, there is no mechanism to verify the authenticity of the advertised service, including the serving hardware platform, e.g. that it is actually being served using an NVIDIA H100. Furthermore, there are reports suggesting that model providers may deliver models that differ slightly from the advertised ones, often to make them run on less expensive hardware. That way, a client pays a premium for access to a capable model on more expensive hardware, yet ends up being served by a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce hardware and software platform inference (HSPI) -- a method for identifying the underlying GPU architecture and software stack of a (black-box) machine learning model solely based on its input-output behavior. Our method leverages the inherent differences of various GPU architectures and compilers to distinguish between different GPU types and software stacks. By analyzing the numerical patterns in the model's outputs, we propose a classification framework capable of accurately identifying the GPU used for model inference as well as the underlying software configuration. Our findings demonstrate the feasibility of inferring GPU type from black-box models. We evaluate HSPI against models served on different real hardware and find that in a white-box setting we can distinguish between different GPUs with between 83.9% and 100% accuracy. Even in a black-box setting we are able to achieve results that are up to three times higher than random-guess accuracy.
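To make the core idea concrete, here is a minimal sketch (not the paper's actual implementation): different hardware/software stacks introduce slightly different floating-point rounding in a model's logits, so a classifier trained on numerical features of the outputs can separate serving platforms. The `logit_features` function, the mantissa-bit histogram feature, the platform labels, and the synthetic data below are all illustrative assumptions.

```python
# Sketch of the HSPI idea: classify the serving platform from numerical
# patterns in model logits. Features and data here are stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def logit_features(logits: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Histogram the low-order mantissa bits of float32 logits.

    Rounding differences between platforms show up in these bits; this
    particular feature is a hypothetical stand-in for the paper's features.
    """
    bits = np.ascontiguousarray(logits, dtype=np.float32).view(np.uint32)
    low = bits & 0xFF  # lowest 8 mantissa bits
    hist, _ = np.histogram(low, bins=n_bins, range=(0, 256), density=True)
    return hist

# Stand-in data: in practice each row would be logits returned by the same
# model and prompt served on a different (labelled) hardware/software stack.
rng = np.random.default_rng(0)
X, y = [], []
for platform in range(3):  # e.g. three GPU types; labels are illustrative
    for _ in range(200):
        logits = rng.normal(size=1024).astype(np.float32)
        # Inject a platform-dependent rounding artifact for demonstration only.
        scale = 2.0 ** (10 + platform)
        logits = np.round(logits * scale) / scale
        X.append(logit_features(logits))
        y.append(platform)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"platform classification accuracy: {clf.score(X_te, y_te):.2f}")
```

In a real deployment the logits would come from repeated queries to the served model rather than from a random generator, and the white-box vs. black-box settings reported in the abstract differ in how much of the output (full logits vs. sampled tokens) the classifier can observe.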
Community
Turns out it is possible to tell what hardware is used to serve a model just from its behaviour and its logits.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality (2024)
- Model Equality Testing: Which Model Is This API Serving? (2024)
- LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators (2024)
- CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment (2024)
- Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching (2024)