
GreenPT introduces a thought piece about a transparency-focused metric for AI sustainability: mWh per 100 tokens. GreenPT users get real-time estimates of their energy use, promoting more efficient, lower-impact AI use.

The system allows carbon impact to be calculated at the session or prompt level, and is designed to support Software Carbon Intensity (SCI) reporting. By aligning with ISO standards, GreenPT ensures clear, accountable, and actionable guidance for improving sustainability across intelligent systems.

Introduction

The AI sector is finally starting to measure what it uses. Recent work from Google on prompt-level energy, carbon, and water for Gemini, and Mistral’s first end-to-end disclosure of LLM environmental impacts, has moved the conversation beyond back-of-the-envelope estimates.

But much remains opaque: data centers are booming, the wattage of inference keeps rising, and yet no one can say, with confidence, what a real interaction actually costs, or how to compare runs across models and setups.

With GreenPT, we push the lid fully open, although this is a first step, not a final word.

Because we self-host our models and meters, we can expose the true operational costs: the good, the bad, and the awkward. The goal is not perfection. It’s radical transparency that others can replicate, critique, and improve. If the field is to scale responsibly, we need a metric that makes efficiency a product decision, not an afterthought. Today, we begin with one you can run, audit, and act on.

A new van, an unknown road

Emma had just picked up her brand-new electric van, sleek, silent, and full of promise. She wasn’t following a GPS. There was no map, no timeline, just a single intention: to drive until a view made her stop, not because she had to, but because it felt like the destination.

Her first steps were tentative on urban streets pulsing with movement and impatience. Traffic lights blinked red too often, and every jam forced her foot between brake and accelerator. She watched the consumption display (kWh per 100 km) tick upward with each inefficient maneuver. It didn’t lie. It told her that eco-driving wasn’t just about the van’s battery, it was about her. How she accelerated, how often she idled, how aware she was of momentum. In the quiet cabin, she realized she wasn’t just driving the van; she was measuring her choices.

Leaving the city’s buzz behind, she eased into suburban loops and finally the open road. With fewer stops and better speed control, the energy consumption dipped. A roadside fruit stand appeared as a bright burst of color and life. The elderly couple tending it offered more than strawberries; they shared old road tales and insights about local driving culture. Emma lingered, learning how regional conditions, road gradients, and even wind direction could alter efficiency.

She drove on, experimenting. Could she coast through the next downhill without regenerative braking? How would cruise control handle the winding backroads? She turned off climate control to see its effect, watching consumption drop slightly but feeling the discomfort rise. Driving became a dialogue. Every action had a consequence.

In the highlands, fog crept in. Emma flipped on the fog lights. Then the heater. Then the wipers. She was warm, safe, and consuming more power. The kWh/100 km gauge crept upward again. She paused at a multicultural food truck plaza, where she spoke with a delivery driver who explained how terrain and tire pressure, yes, even that, affected his daily battery projections. Curious now, Emma checked hers.

By now, she saw the gauge not as a performance stat, but as a story of interaction. Wind resistance on the coastal stretch. Headwinds versus tailwinds. Heavy snacks in the backseat? Even that added load had an effect. Every decision, hers or nature’s, wrote another line into the metric. The drive had become a canvas for learning, adaptation, and mindfulness.

Eventually, atop a windswept meadow bathed in golden light, she pulled over. Not because she needed to charge. Not because she was lost. Because something in the vast, open, and quiet spoke to her.

She had arrived. And this time, it wasn’t just a scenic stop. It was the reward for having driven consciously.

From roads to runtime: understanding AI’s energy journey

Emma’s journey was never just about reaching a destination. It was about how she got there and what it cost, not just in battery percentage, but in impact. Every turn of the wheel shaped her electric van’s kWh per 100 km metric, and every stop added insight. She began to see energy use not only as a number, but as a relationship between choices, context, and consequence.
We believe intelligent systems deserve the same transparency. That’s why we developed a new standard:

mWh per 100 tokens (sampled at 1-second intervals)

It’s not just a performance metric; it’s a map of the journey, tailored to AI.

We invite the community to use this metric, test it across their own workloads, and share results with us. Broader participation is essential to move beyond energy totals and toward workload-based, comparable, and transparent AI energy measurements.

Tokens = Distance

In Emma’s van, kilometers marked physical progress. In AI, tokens are our measure of computational distance, the amount of content your system processes. Just as longer drives demand more energy, longer prompts or outputs naturally increase usage. But how efficiently that “distance” is covered depends on the route: model architecture, prompt structure, and system settings.

Clarifying Token Scope

In GreenPT, all references to “tokens” refer specifically to output tokens generated by the model during inference. Input tokens are not included in the metric. This ensures that mWh per 100 tokens reflects the energy cost of producing new model output, making comparisons consistent across prompts, workloads, and models.

mWh = Energy

This is our energy use, the electrical work performed to generate AI outputs. Just as Emma watched her van’s battery level shift in real time, AI teams can now see the live energy signature of each prompt, expressed directly in milliwatt-hours.

CO₂ = Cost

Energy is one thing, but impact is another. That’s where our CO₂ layer comes in. Much like price per kWh varies by region and time, we calculate carbon intensity dynamically, factoring in:

  • Grid location (e.g., coal-heavy vs. renewables),
  • Time of day (peak load vs. surplus hours),
  • Embodied carbon (the upstream impact of the infrastructure you’re running on).

CO₂ becomes a cost layer, converting energy into environmental consequences. The cleaner the context, the lower the carbon per mWh, just as off-peak charging is cheaper and greener for EVs.
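The conversion described above can be sketched in a few lines. This is an illustrative example only: the function name, the embodied-carbon uplift parameter, and all intensity figures are invented placeholders, not GreenPT’s actual values.

```python
def co2e_grams(energy_mwh: float,
               grid_gco2_per_kwh: float,
               embodied_uplift: float = 0.0) -> float:
    """Convert estimated energy (mWh) into grams of CO2-equivalent.

    grid_gco2_per_kwh: operational carbon intensity of the local grid,
        which varies by region and time of day.
    embodied_uplift: fractional add-on for upstream (embodied) carbon,
        e.g. 0.15 adds 15% for hardware manufacturing amortization.
    """
    kwh = energy_mwh / 1_000_000  # 1 kWh = 1,000,000 mWh
    return kwh * grid_gco2_per_kwh * (1.0 + embodied_uplift)

# The same 50 mWh session costs far more carbon on a coal-heavy grid
# (~800 gCO2/kWh, invented figure) than on a renewables-heavy one (~50).
dirty = co2e_grams(50, 800, embodied_uplift=0.15)
clean = co2e_grams(50, 50, embodied_uplift=0.15)
```

The same energy figure thus maps to very different carbon costs depending on where and when it is drawn, which is the point of keeping energy and CO₂ as separate layers.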

1-Second Sampling = Live Feedback

Real-time sampling provides continuous visibility into how estimated energy and token generation accumulate during inference. This allows users to adjust prompts, manage load, or select lighter models while the system is running, responding to efficiency changes as they emerge.

Session Time = Journey Duration

A single prompt may be a turn, but a session is the entire trip. Total session time reflects the AI’s active engagement window. A session that meanders or idles consumes more, just as leaving your EV running while parked quietly drains its reserves.

Why it matters

This metric framework shifts AI usage from passive to participatory. It doesn’t just say how much was used, it shows why. It turns system interaction into a transparent feedback loop, where:

  • Distance (tokens) can be minimized by prompt design,
  • Energy (mWh) can be optimized through model selection,
  • Cost (CO₂) can be reduced by smarter timing and infrastructure choices,
  • Duration can be shortened through intelligent orchestration.

And just like Emma’s drive, the journey is no longer just about performance; it’s about awareness, accountability, and conscious navigation.

With visibility comes control. With control, AI becomes not just intelligent but responsible.

Understanding the metric: a new approach to AI energy transparency

The GreenPT AI usage metric expresses the estimated electrical energy required to generate a fixed computational distance of 100 tokens. The metric is time-independent; it represents energy per output unit, while the 1-second sampling interval is used only to observe how total estimated energy and total tokens accumulate during inference.

Energy is expressed in milliwatt-hours (mWh) and is estimated by applying a documented power value for the hardware: the assumed power (in watts) is multiplied by elapsed time (in seconds), these increments are summed, and the result is converted into mWh.

Because 1 watt-hour consists of 3600 watt-seconds and 1 watt-hour equals 1000 milliwatt-hours, the conversion from watt-seconds to mWh is:

mWh = watt-seconds ÷ 3.6.

Tokens function as the computational “distance” unit. Normalizing the metric to “per 100 tokens” allows direct comparison between different models, workloads, and hardware configurations. The efficiency metric is therefore computed as:

mWh per 100 tokens = (total mWh consumed ÷ total tokens generated) × 100.
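The two formulas above can be combined directly; a minimal sketch, assuming total energy has been accumulated as watt-seconds:

```python
def watt_seconds_to_mwh(watt_seconds: float) -> float:
    # 1 Wh = 3600 Ws and 1 Wh = 1000 mWh, so 1 mWh = 3.6 Ws.
    return watt_seconds / 3.6

def mwh_per_100_tokens(total_watt_seconds: float, total_tokens: int) -> float:
    """Energy efficiency normalized to a computational distance of 100 tokens."""
    if total_tokens == 0:
        raise ValueError("no output tokens generated yet")
    total_mwh = watt_seconds_to_mwh(total_watt_seconds)
    return (total_mwh / total_tokens) * 100

# Example: 300 W sustained for 12 s while generating 500 output tokens.
# 300 W x 12 s = 3600 Ws = 1000 mWh, so 200 mWh per 100 tokens.
print(mwh_per_100_tokens(300 * 12, 500))  # -> 200.0
```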

At any point during inference, the system updates the cumulative totals once per second. This yields a live but cumulative value:

current efficiency = (cumulative mWh ÷ cumulative tokens) × 100.

The value becomes increasingly stable as the session continues, since it reflects all energy and all tokens produced since the start of the interaction. The sampling interval does not appear in the formula itself; it is solely the observation frequency used to update cumulative totals.
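The cumulative behavior can be simulated in a short loop. The per-second token counts below are invented for illustration, and a constant 300 W draw is assumed; in a real deployment both would come from live telemetry.

```python
POWER_W = 300.0  # assumed constant power draw during inference

def live_efficiency(tokens_per_second: list) -> list:
    """Return the cumulative mWh-per-100-tokens reading after each 1 s sample."""
    readings = []
    cumulative_ws = 0.0
    cumulative_tokens = 0
    for tokens in tokens_per_second:
        cumulative_ws += POWER_W * 1.0        # power x 1 s sampling interval
        cumulative_tokens += tokens
        cumulative_mwh = cumulative_ws / 3.6  # Ws -> mWh
        readings.append((cumulative_mwh / cumulative_tokens) * 100)
    return readings

# Uneven generation early on; the cumulative value settles as tokens accrue.
print(live_efficiency([20, 55, 48, 52, 50]))
```

Note that the sampling interval appears only as the time step of the accumulation loop, never in the efficiency formula itself, which matches the definition above.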

This formulation aligns with practices in physics and electrical engineering, where energy is obtained by integrating power over time and efficiency is calculated by dividing cumulative energy by cumulative work. The approach maintains physical clarity, avoids conflating measurement cadence with metric definition, and ensures reproducibility.

The metric also supports environmental analysis by enabling estimated energy to be combined with time- and location-dependent carbon intensity factors. This produces CO₂-equivalent values appropriate for LCA, SCI reporting, and sustainability audits.

Because the metric is normalized by computational distance rather than time, it enables coherent comparisons across different hardware architectures and inference conditions. Over extended usage, the cumulative data provide insight into model behavior, token generation efficiency, system load variability, latency effects, and long-term operational characteristics.

Current focus: GPU-Level transparency

At present, the primary scope of measurement focuses on GPU inference workloads, as they represent the dominant source of real-time energy consumption in most transformer-based AI systems. While our benchmark calculations use a fixed power assumption of 300 W for reproducibility, this constant is derived from and periodically recalibrated using direct telemetry from GPU management units (e.g., NVIDIA NVML). This ensures that the fixed benchmark value reflects real hardware behavior while maintaining a stable, comparable metric across runs and models.
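A recalibration step of this kind could look roughly as follows. The helper name and the telemetry readings are invented for illustration; a real implementation would obtain the samples from the GPU management interface (NVML’s power query reports milliwatts) rather than from a hard-coded list.

```python
from statistics import mean

BENCHMARK_POWER_W = 300.0  # fixed assumption used in benchmark runs

def recalibrated_power_w(samples_mw: list) -> float:
    """Average telemetry samples (in milliwatts) into a watts figure that
    can replace BENCHMARK_POWER_W if the assumption has drifted."""
    return mean(samples_mw) / 1000.0

# Invented NVML-style readings, in milliwatts.
telemetry = [298_500, 301_200, 299_800, 300_400]
print(recalibrated_power_w(telemetry))  # close to the 300 W assumption
```

Keeping the benchmark constant fixed between recalibrations is what makes runs comparable: two models measured weeks apart are judged against the same power assumption.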

Future Expansion: Holistic System Accountability

In upcoming iterations, our measurement framework is expected to expand to provide full-stack transparency, including:

  • Additional Server Components:
    Integration of CPU usage, memory bandwidth, storage I/O, and interconnect power draw (e.g., PCIe, NVLink) to reflect total system energy behavior.
  • Datacenter-Level Influence:
    Accounting for Power Usage Effectiveness (PUE) and cooling overheads to better reflect the energy implications of infrastructure at scale.
  • Grid-Dependent Carbon Factors:
    Real-time data fusion with carbon-intensity maps from regional grid operators to capture temporal and geographic variation in emissions per kWh.
  • Embodied Carbon and LCA Metrics:
    Future modules will incorporate Life Cycle Assessment (LCA) principles to reflect the embedded environmental cost of hardware manufacturing, transportation, and retirement, providing a fuller picture of the system’s total environmental footprint.

To make the metric tangible, the following appendix introduces a complete, self-contained use case. It demonstrates how the prompts, validation steps, and reasoning structure come together to reveal energy behavior in practice. The appendix provides the full benchmark suite used throughout this document.

Authors

Download the full report here.