NOW LET US – AI RAG SaaS Studio TP.HCM
zml-smi: a universal monitoring tool for GPUs, TPUs and NPUs

zml-smi is a universal diagnostic and monitoring tool for GPUs, TPUs and NPUs. It provides real-time insights into the performance and health of your hardware, and is best described as a cross between nvidia-smi and nvtop. It transparently supports every platform that ZML supports: NVIDIA, AMD, Google TPU and AWS Trainium devices. More platforms will follow as ZML continues to expand its hardware support.

Getting started

You can download zml-smi from the official mirror.

$ curl -LO 'https://mirror.zml.ai/zml-smi/zml-smi-v0.2.tar.zst'
$ tar -xf zml-smi-v0.2.tar.zst
$ ./zml-smi/zml-smi

Listing devices

Run zml-smi without any flags to list the devices it detects.

$ zml-smi

Monitoring devices

The --top flag provides real-time monitoring of device performance, including utilization, temperature, and memory usage.

$ zml-smi --top

Completely sandboxed

zml-smi requires no software on the target machine other than the device driver and glibc.

Metrics

Host

zml-smi displays host-level metrics such as CPU model and utilization, memory usage, and temperature.
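
The article does not say how zml-smi collects host metrics. As a rough illustration only, the sketch below shows one common way to derive memory usage on Linux, by parsing the text of /proc/meminfo. The function names are hypothetical and are not part of zml-smi.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Assumption: lines look like "MemTotal:  16000000 kB"; most values are in kB.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_usage(text: str) -> tuple[int, int, float]:
    """Return (used_kb, total_kb, used_percent) from meminfo text."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    used = total - available
    return used, total, 100.0 * used / total
```

On a Linux host, passing the contents of /proc/meminfo to memory_usage gives the figures a monitoring view would typically show. CPU utilization and temperature come from other sources and are not covered here.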

Processes

zml-smi also provides insights into the processes utilizing the devices, including their resource usage and command lines.
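
How zml-smi resolves process details is not described in the article. On Linux, a process's command line is usually read from /proc/<pid>/cmdline, where arguments are separated by NUL bytes. The sketch below assumes that approach; the helper names are hypothetical, and finding which PIDs are using a device is a separate, vendor-specific step not shown here.

```python
from pathlib import Path


def decode_cmdline(raw: bytes) -> list[str]:
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    trimmed = raw.rstrip(b"\x00")
    if not trimmed:
        return []  # kernel threads and zombies expose an empty cmdline
    return [arg.decode("utf-8", errors="replace") for arg in trimmed.split(b"\x00")]


def read_cmdline(pid: int) -> list[str]:
    """Read and decode the command line of a running process (Linux only)."""
    return decode_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
```

Decoding with errors="replace" keeps the listing usable even when an argument contains bytes that are not valid UTF-8.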

NVIDIA

Metrics come from NVML, the NVIDIA Management Library, which ships with the driver.
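
To give a feel for the kind of data NVML exposes, here is a minimal sketch using the third-party pynvml bindings. This is not how zml-smi itself is implemented. It assumes pynvml is installed and an NVIDIA driver is present; format_device and list_nvidia_devices are hypothetical helper names.

```python
def format_device(index, name, util_pct, mem_used, mem_total, temp_c):
    """Render one device as a single summary line (memory given in bytes)."""
    gib = 1024 ** 3
    return (f"[{index}] {name}: {util_pct}% util, "
            f"{mem_used / gib:.1f}/{mem_total / gib:.1f} GiB, {temp_c} C")


def list_nvidia_devices():
    """Query NVML for every visible GPU and return formatted summary lines."""
    import pynvml  # assumption: third-party bindings are installed

    pynvml.nvmlInit()
    try:
        lines = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            temp = pynvml.nvmlDeviceGetTemperature(
                handle, pynvml.NVML_TEMPERATURE_GPU)
            lines.append(format_device(i, name, util, mem.used, mem.total, temp))
        return lines
    finally:
        pynvml.nvmlShutdown()
```

The import sits inside list_nvidia_devices so that the formatting helper stays usable on machines without the bindings or the driver.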

AMD

Metrics are provided through the AMD SMI library. To support the latest AMD GPUs, zml-smi downloads the amdgpu.ids device list at build time and merges its entries into a single file that ships inside the sandbox. This allows support for models like the Ryzen AI Max+ 395 (Strix Halo) even before official ROCm releases. To make the library read that file, we created a shared object named zmlxrocm.so that intercepts fopen64 calls and redirects them to the sandboxed file.
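
The merge step for amdgpu.ids can be pictured with the sketch below. It is an illustration under assumptions, not the actual zml-smi build step: it assumes each entry has the form "device_id, revision_id, product_name", that lines starting with # are comments, and that a bare version line such as 1.0.0 carries no entry. The function names and the first-source-wins policy are hypothetical choices.

```python
def parse_ids(text: str) -> dict[tuple[str, str], str]:
    """Parse amdgpu.ids-style text into {(device_id, revision_id): name}."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) != 3:
            continue  # e.g. the version line, which has no commas
        device_id, revision_id, name = parts
        entries[(device_id.upper(), revision_id.upper())] = name
    return entries


def merge_ids(*texts: str) -> dict[tuple[str, str], str]:
    """Merge several ID lists; on duplicate keys the earliest source wins."""
    merged = {}
    for text in texts:
        for key, name in parse_ids(text).items():
            merged.setdefault(key, name)
    return merged


def render_ids(entries, version: str = "1.0.0") -> str:
    """Write the merged entries back out in the assumed file format."""
    lines = ["# device_id,\trevision_id,\tproduct_name", version]
    for (device_id, revision_id), name in sorted(entries.items()):
        lines.append(f"{device_id},\t{revision_id},\t{name}")
    return "\n".join(lines) + "\n"
```

Keying on the (device_id, revision_id) pair means a newer list can add models without disturbing names already present in the first source.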

TPU

Metrics are provided via the local gRPC endpoint exposed by the TPU runtime, including TensorCore Duty Cycle and HBM usage.

AWS Trainium

Metrics are provided through a private API found in libnrt.so, including Core Utilization and HBM usage.

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
