How much water and energy does normal AI use consume?

What is true?

The truth is not “AI is free” and not “one prompt drains a bottle of water.” The responsible answer is a range with assumptions.

~0.05–3 Wh: Reasonable range for many ordinary short text queries, inference only. Long answers, frontier models, tools, images, video, or agents can be higher.

Contested: The bottle-of-water claim is best treated as a high, older, modeled scenario for GPT-3-style use, not a settled current fact about normal ChatGPT prompts.

Training varies: Training is a fixed cost. Per prompt it is tiny for heavily used models, but meaningful for very large or lightly used systems.

Most defensible public statement

A normal text AI query probably uses electricity comparable to running a 10-watt LED bulb for seconds to minutes, not hours. Water use is real but highly location-dependent; commonly cited estimates range from less than a milliliter to tens of milliliters (a few tablespoons) per query. The bigger environmental question is not one teacher’s occasional prompt, but billions of prompts, larger media generation, datacenter buildout, local water stress, and whether vendors disclose enough to verify claims.

Per prompt and per token estimates

“Per prompt” is unstable because a prompt can produce a sentence or a 2,000-word lesson plan. Per-token accounting is cleaner, but closed providers rarely publish audited per-token footprints.

Metric	Plausible range for text AI	What drives it
Inference electricity per short query	~0.05–3 Wh	Model size, output length, batching, hardware, datacenter efficiency.
Inference electricity per 1,000 output tokens	~0.1–10 Wh	Small open models can be near the low end; frontier models and long outputs can be higher.
Water per query	Company-reported average: ~0.3 mL; older modeled GPT-3 scenario: ~10–50 mL	Direct cooling water, indirect electricity water, datacenter location, weather, cooling design, and accounting boundary. The high figure should be presented as contested, not as a settled fact.
Training amortization per query	Near zero to several Wh or more	Training energy divided by lifetime served tokens, multiplied by the tokens in one query. The denominator is unknown for most closed models.

These ranges synthesize public academic measurements, modeling papers, provider statements, and arithmetic. They are not a claim that every model or every query falls inside the range.
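To see why per-prompt figures swing so widely, the per-token row can be scaled by response length. This is a minimal sketch, not a measurement: the function name is hypothetical, prompt-processing (input token) energy is ignored, and the 2,700-token figure assumes a 2,000-word lesson plan at roughly 0.75 words per token.

```python
def energy_per_query_wh(output_tokens: int, wh_per_1k_output_tokens: float) -> float:
    """Scale a per-1,000-output-token intensity to one response (Wh)."""
    if output_tokens < 0 or wh_per_1k_output_tokens < 0:
        raise ValueError("inputs must be non-negative")
    return output_tokens / 1000 * wh_per_1k_output_tokens


# Range endpoints from the table above, in Wh per 1,000 output tokens.
LOW_INTENSITY, HIGH_INTENSITY = 0.1, 10.0

# Illustrative lengths (assumptions): a one-sentence reply and a long lesson plan.
for label, tokens in [("short reply", 30), ("2,000-word lesson plan", 2700)]:
    low = energy_per_query_wh(tokens, LOW_INTENSITY)
    high = energy_per_query_wh(tokens, HIGH_INTENSITY)
    print(f"{label}: {low:.3f} to {high:.2f} Wh")
```

At any fixed intensity, those two lengths differ by a factor of 90, which is the instability a single per-prompt number hides.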

Water formula

If a datacenter reports water usage effectiveness (WUE), the rough relationship is:

water per query (L) = electricity per query (kWh) × WUE (L/kWh)

Example: a 1 Wh query is 0.001 kWh. At 1 liter/kWh WUE, that is about 1 mL of direct water consumption. At 10 Wh and 3 L/kWh, it is 30 mL.
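The same arithmetic as a small function, mainly to make the unit conversions explicit. This is a sketch with a hypothetical function name; it covers direct onsite water only, because that is what WUE describes.

```python
def water_per_query_ml(energy_wh: float, wue_l_per_kwh: float) -> float:
    """Direct (onsite) water for one query, in milliliters.

    Converts Wh to kWh, multiplies by WUE in L/kWh, then converts
    liters to milliliters.
    """
    if energy_wh < 0 or wue_l_per_kwh < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = energy_wh / 1000
    liters = energy_kwh * wue_l_per_kwh
    return liters * 1000


# The two worked examples from the text.
print(f"{water_per_query_ml(1, 1):.1f} mL")   # 1 Wh at 1 L/kWh
print(f"{water_per_query_ml(10, 3):.1f} mL")  # 10 Wh at 3 L/kWh
```

Because Wh-to-kWh and L-to-mL both involve a factor of 1,000, they cancel: milliliters per query is numerically just Wh times L/kWh.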

Why estimates disagree

Some count only onsite cooling; others include electricity-generation water.
Some use older GPT-3-era assumptions; others claim newer average fleet efficiency.
Some compare short benchmark prompts; real classroom tasks can be much longer.
Closed providers usually do not publish model-specific energy, water, or served-token data.

Doesn’t the water come back?

Yes, in the global water-cycle sense. But environmental water accounting is mostly about local timing, quality, scarcity, and opportunity cost — not literal disappearance from Earth.

The clean distinction: withdrawal vs consumption

Water withdrawal means water taken from a river, aquifer, lake, or municipal system, whether or not it is later returned. Water consumption is the part of that withdrawal that is not returned to the same local source in a usable form on a useful timescale, for example water evaporated in cooling. Evaporated water is still part of Earth’s water cycle, but it may not rain back into the same watershed when the community, river, farm, or aquifer needs it.

At the datacenter

Some datacenters use evaporative cooling, where water absorbs heat and evaporates. That can reduce electricity use compared with some air-cooling approaches, but it consumes local water. Other facilities use more closed-loop or air-cooled systems, which may use less water onsite but more electricity.

At the power plant

Even if a datacenter itself uses little water, the electricity powering it may have a water footprint. Thermal power plants can withdraw or consume water for cooling; wind and solar generally have much lower operational water use. This is why some studies count both onsite water and offsite electricity-related water.

Why “it rains later” is incomplete

The issue is not that water is destroyed. It is that water can be moved from a local river/aquifer into the atmosphere, returned somewhere else, returned in a different season, or returned after local stress has already happened. In a water-rich region this may be minor; in a drought-prone watershed it can matter.

What water are we discussing?

Direct onsite water: cooling water consumed at the datacenter.
Indirect/offsite water: water consumed to generate the electricity used by the datacenter.
Withdrawal: all water taken from a source; much of it may be returned, often warmer or changed.
Consumption: water evaporated or otherwise not returned locally in the same usable window.
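The direct and indirect categories can be combined in one estimate. The sketch below follows the general shape used in AI water-footprint modeling: onsite water is IT energy times WUE, and offsite water is facility energy times a grid water-intensity factor. The function and field names are hypothetical, power usage effectiveness (PUE) is an added assumption not discussed above, and the example inputs are placeholders rather than data for any real facility.

```python
from dataclasses import dataclass


@dataclass
class WaterBreakdown:
    onsite_ml: float   # direct cooling water consumed at the datacenter
    offsite_ml: float  # water consumed generating the electricity

    @property
    def total_ml(self) -> float:
        return self.onsite_ml + self.offsite_ml


def water_breakdown(it_energy_wh: float, wue_l_per_kwh: float,
                    pue: float, grid_water_l_per_kwh: float) -> WaterBreakdown:
    """Split one query's water consumption into onsite and offsite parts.

    pue (assumption): facility energy divided by IT energy, so >= 1.
    grid_water_l_per_kwh (assumption): water consumed per kWh generated.
    """
    if pue < 1:
        raise ValueError("PUE cannot be below 1")
    it_kwh = it_energy_wh / 1000
    onsite_l = it_kwh * wue_l_per_kwh
    offsite_l = it_kwh * pue * grid_water_l_per_kwh
    return WaterBreakdown(onsite_ml=onsite_l * 1000, offsite_ml=offsite_l * 1000)


# Placeholder inputs, chosen only to show the mechanics.
example = water_breakdown(it_energy_wh=1.0, wue_l_per_kwh=0.5,
                          pue=1.2, grid_water_l_per_kwh=3.0)
print(f"onsite {example.onsite_ml:.2f} mL, offsite {example.offsite_ml:.2f} mL, "
      f"total {example.total_ml:.2f} mL")
```

With these placeholder inputs the offsite share is the larger one, which illustrates how two estimates of the same query can differ several-fold depending only on the accounting boundary.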

Best wording

AI water use is not about water vanishing from the planet. It is about local freshwater demand: how much water is consumed, where it is consumed, whether that place is water-stressed, what season it happens, and whether the datacenter or power source could have used less scarce water or cleaner energy.

Have there been actual academic studies?

Yes, but not the kind people usually imply. There are academic studies of AI water footprints and datacenter water use. Public prompt-level water numbers, however, are modeled estimates — not direct metering of ChatGPT/Claude/Gemini water use per prompt.

The evidence hierarchy

I have not found a public academic study that directly meters water use per LLM prompt inside an AI provider’s datacenter. The closest evidence is: measured or provider-reported aggregate datacenter water use; peer-reviewed datacenter footprint models; direct studies of LLM energy use; and AI-specific papers that convert estimated energy into estimated water using WUE and grid-water assumptions.

Source type	Examples	What is measured?	What is modeled?
AI-specific water paper	Li/Ren et al., Making AI Less “Thirsty”	Not prompt-level water. Uses public/assumed infrastructure data.	Training and inference water from estimated energy, WUE, datacenter location, cooling, and electricity-water factors.
Datacenter water papers	Mytton 2021; Ristic/Madani/Makuch 2015; Siddik/Shehabi/Marston 2021	Sometimes company-reported aggregate facility/fleet water; usually not independently metered by researchers.	Direct and indirect water footprint at facility, national, or sector scale.
LLM inference energy papers	Luccioni/Jernite/Strubell 2024; Samsi et al. 2023	Energy/power for specific models/tasks/hardware.	Water only if someone multiplies energy by WUE and grid-water intensity.
Energy-water nexus studies	Macknick et al.; Meldrum et al.; DOE/LBNL reports	Power-sector water withdrawal/consumption factors from reported and harmonized data.	Indirect water for datacenter electricity consumption.

Why direct prompt-level water is hard

A datacenter serves many workloads, not just one AI model.
Cooling water varies by weather, season, and time of day.
Prompts may run across multiple machines or facilities.
Providers rarely disclose model-specific traffic, energy, location, or WUE.
Indirect power-plant water depends on the grid mix at that time and place.

Best answer

There are real academic studies, but the famous per-prompt water numbers are not direct observations. They are educated models built from energy estimates plus datacenter and electricity-water assumptions. That does not make them useless; it means they should be presented as scenarios or ranges, not as measured facts.

Critique of the two headline estimates

The two numbers people argue about are built very differently: one is an academic scenario model; the other is a company-reported average with almost no public methodology. Neither is an audited prompt-level measurement.

Estimate 1: Li/Ren et al. “500 mL per 10–50 responses”

What it is based on: an academic model for GPT-3-style inference. The paper assumes a medium request, roughly ≤800 input words and 150–300 output words; starts from an outside estimate that GPT-3 uses about 0.4 kWh to generate 100 pages; then adds assumptions for prompt processing, non-GPU server energy, datacenter WUE, cooling, location, weather, and electricity-generation water intensity.

Strengths	Weaknesses
Transparent enough to inspect; distinguishes onsite cooling water from offsite electricity water; explicitly accounts for location/time variation; grounded in real datacenter water concepts like WUE and water consumption.	Not direct metering; depends on old/indirect GPT-3 energy estimates; assumes datacenter locations and cooling behavior not publicly confirmed; may not reflect current models, batching, hardware, routing, or efficiency improvements; “response” size is larger than many casual prompts.

Best use: a high-side scenario showing how AI water could matter under some infrastructure assumptions. Bad use: saying a modern average ChatGPT prompt has been measured to use a bottle of water.

Estimate 2: Altman/OpenAI “0.34 Wh and 0.000085 gallons per query”

What it is based on: a 2025 public statement by Sam Altman that “the average query” uses about 0.34 watt-hours and 0.000085 gallons of water. The post does not provide a methods appendix, system boundary, model mix, prompt/response length, datacenter locations, WUE assumptions, whether training is included, or whether the figure is independently audited.

Strengths	Weaknesses
Potentially closer to OpenAI’s real internal fleet because the company may see actual traffic, routing, hardware, and datacenter data unavailable to academics; likely reflects newer efficiency gains and a mix of model sizes.	Opaque; not peer-reviewed; not independently auditable from the public claim; may be an average across many short/simple queries and efficient models; unclear whether it includes indirect electricity water, training amortization, failed runs, hardware lifecycle, or only operational serving.

Best use: a plausible lower/current-company-average anchor. Bad use: treating it as a fully audited lifecycle footprint or using it to dismiss local datacenter water concerns.

What I would say if challenged

The academic estimate is more methodologically transparent but probably stale/high and not directly measured. The OpenAI estimate may be more current and operationally informed, but it is methodologically opaque and company-provided. The truth for a specific prompt depends on model, output length, datacenter, cooling, grid, and whether you count training or indirect water. So the defensible position is a range plus uncertainty — not either headline number as gospel.

Where the viral claims come from

The disagreement is mostly between two reference points: an academic water-footprint model and a later company-provided average.

“A bottle of water per 10–50 responses”

Usually traces to Li/Ren et al., Making AI Less “Thirsty”. The paper says GPT-3 could “drink” a 500 mL bottle of water for roughly 10–50 medium-length responses, depending on when and where it is deployed. That was a modeled GPT-3 scenario, not a direct measurement of every modern ChatGPT prompt.

500 mL ÷ 10 responses	50 mL/response
500 mL ÷ 50 responses	10 mL/response

Best reading: not “debunked” as in impossible, but often overstated, outdated, and misquoted. It should not be used as a settled current average unless those assumptions are stated.

“One prompt is 1/800,000 of daily water use”

This appears to come from Sam Altman’s 2025 statement that an average ChatGPT query uses about 0.000085 gallons of water and 0.34 Wh of electricity.

0.000085 gallons	0.322 mL
68 gallons/day ÷ 0.000085	~800,000 queries

The math works if you accept the company-provided average and a baseline of about 68 gallons per day, which is closer to a U.S. per-person figure than a whole-household one. It is not the same as an audited full lifecycle footprint.
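The two conversions behind this claim are easy to reproduce. A minimal sketch with hypothetical function names; the only inputs are the figures quoted above and the definition of the U.S. liquid gallon.

```python
ML_PER_US_GALLON = 3785.411784  # U.S. liquid gallon, exact by definition


def gallons_to_ml(gallons: float) -> float:
    """Convert U.S. liquid gallons to milliliters."""
    return gallons * ML_PER_US_GALLON


def queries_per_daily_use(daily_gallons: float, gallons_per_query: float) -> float:
    """How many queries add up to one day's water baseline."""
    if gallons_per_query <= 0:
        raise ValueError("gallons_per_query must be positive")
    return daily_gallons / gallons_per_query


QUERY_GALLONS = 0.000085  # company-provided average, not independently audited
DAILY_GALLONS = 68        # the daily baseline implied by the viral claim

print(f"{gallons_to_ml(QUERY_GALLONS):.3f} mL per query")
print(f"{queries_per_daily_use(DAILY_GALLONS, QUERY_GALLONS):,.0f} queries per day of use")
```

Changing either input moves the headline ratio proportionally, so the claim is only as strong as the per-query figure and the baseline chosen.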

The real controversy

The controversy is not whether AI has an environmental footprint. It does. The controversy is whether public per-prompt claims are being compared fairly: direct vs indirect water, withdrawal vs consumption, model-specific vs fleet-average estimates, training vs inference, and average global use vs local water-stressed datacenters.

Should a prompt include a slice of training?

Yes for lifecycle accounting; usually no for the marginal impact of one extra use of an already-trained model. The cleanest report shows both.

Marginal-use view

Question: “What happens if I ask one more question today?” Training is mostly sunk. The added impact is inference electricity, cooling, network/storage overhead, and a tiny share of hardware wear.

Lifecycle view

Question: “What is the average footprint of this AI service?” Then allocate training across total lifetime served tokens or queries. The hard part: companies rarely publish the denominator.

Training run example	Reported/estimated energy	If amortized over usage	Interpretation
GPT-3-scale estimate in Patterson et al.	~1,287 MWh	~1 Wh per 750-token interaction if spread over 1 trillion served tokens; ~0.1 Wh over 10 trillion.	Training share can become small at heavy usage.
BLOOM 176B, transparent accounting	~433 MWh training electricity	~0.32 Wh per 750-token interaction over 1 trillion served tokens; ~0.032 Wh over 10 trillion.	Open accounting helps; amortization still depends on usage.
Llama 3.1 405B back-of-envelope	tens of GWh from reported GPU-hours	Could be several Wh per interaction unless spread over tens or hundreds of trillions of tokens.	Very large models make the denominator matter a lot.
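The amortization column is a single division, sketched here so the sensitivity to the denominator is visible. The function name is hypothetical, and the served-token totals are the same illustrative assumptions used in the table, not disclosed figures.

```python
def amortized_training_wh(training_mwh: float, lifetime_tokens: float,
                          tokens_per_interaction: int = 750) -> float:
    """Training energy attributed to one interaction, in Wh.

    Spreads total training energy evenly over every token the model
    is assumed to serve in its lifetime.
    """
    if lifetime_tokens <= 0:
        raise ValueError("lifetime_tokens must be positive")
    training_wh = training_mwh * 1_000_000  # 1 MWh = 1,000,000 Wh
    return training_wh / lifetime_tokens * tokens_per_interaction


RUNS_MWH = {"GPT-3-scale estimate": 1287, "BLOOM 176B": 433}

for name, mwh in RUNS_MWH.items():
    for served in (1e12, 1e13):  # 1 trillion and 10 trillion tokens (assumed)
        share = amortized_training_wh(mwh, served)
        print(f"{name}, {served:.0e} tokens served: {share:.3f} Wh per interaction")
```

A tenfold change in the assumed lifetime usage changes the training share tenfold, which is why the missing denominator matters more than the training figure itself.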

Blue jeans, golf, LEDs, and other comparisons

Comparisons can help scale the issue, but many are apples-to-oranges.

LED bulb

A 10-watt LED bulb uses 1 Wh in 6 minutes. If a short text query uses ~0.34 Wh, that is about 2 minutes of that bulb; if it uses 3 Wh, about 18 minutes.
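The bulb comparison generalizes to any per-query energy figure. A small sketch with a hypothetical function name, using the endpoints of the range quoted earlier plus the company-provided 0.34 Wh figure.

```python
def led_equivalent_minutes(energy_wh: float, bulb_watts: float = 10.0) -> float:
    """Minutes a bulb drawing bulb_watts could run on energy_wh."""
    if bulb_watts <= 0:
        raise ValueError("bulb_watts must be positive")
    return energy_wh / bulb_watts * 60


for wh in (0.05, 0.34, 3.0):
    minutes = led_equivalent_minutes(wh)
    print(f"{wh} Wh is about {minutes:.1f} min ({minutes * 60:.0f} s) of a 10 W LED")
```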

Blue jeans

Public jeans water-footprint estimates often run from several thousand to around 10,000 liters. Depending on whether AI water is 0.3 mL or 10–25 mL/query, one pair of jeans can equal hundreds of thousands to tens of millions of prompts. Useful scale; not a moral free pass.

Golf irrigation

U.S. golf-course irrigation is often estimated at on the order of a million or more acre-feet per year, vastly larger than one AI prompt. But golf and datacenters stress different local watersheds at different times, so this should not dismiss datacenter siting concerns.

Google search

Older claims that ChatGPT used about 10× the energy of a Google search often assumed ~3 Wh per ChatGPT query against an older figure of ~0.3 Wh per search. Altman’s later public figure, 0.34 Wh/query, is much lower. Treat any single “X times a search” claim as year-, model-, and method-dependent.

How much prompting equals a normal outing?

This is rough comparison math, not moral accounting. Assumptions: a 20–40 mile car outing, a 25 mpg gasoline car, and Water Footprint Calculator’s cited estimate that refining gasoline uses about 1–2.5 gallons of water per gallon of gasoline. It excludes the theater/store building, snacks, and most purchases.

Comparison	Water estimate	Equivalent prompts at ~0.322 mL/query	Equivalent prompts at contested 10 mL/query	Equivalent prompts at contested 50 mL/query
20-mile outing, gasoline refining only	~3–7.6 L	~9,400–23,500 prompts	~300–760 prompts	~60–150 prompts
40-mile outing, gasoline refining only	~6.1–15.1 L	~18,800–47,000 prompts	~600–1,500 prompts	~120–300 prompts
20-mile outing plus an amortized slice of manufacturing the car	~10–19 L	~31,000–58,000 prompts	~1,000–1,900 prompts	~200–370 prompts

At an aggressive human rate of two prompts per minute, 9,400 prompts is about 78 hours of nonstop prompting. At the contested 10 mL/query estimate, the same 20-mile gasoline-refining comparison shrinks to roughly 300–760 prompts, or about 2.5 to 6 hours. That spread shows why the chosen AI water number dominates the answer.
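The gasoline-refining rows can be rebuilt from the stated assumptions. This is a rough sketch with hypothetical function names; it covers refining water only and leaves out the car-manufacturing slice in the third row, whose inputs are not given here.

```python
L_PER_US_GALLON = 3.785411784  # U.S. liquid gallon in liters


def refining_water_l(miles: float, mpg: float = 25.0,
                     water_gal_per_gas_gal: tuple = (1.0, 2.5)) -> tuple:
    """Low and high refining-water estimates for a drive, in liters."""
    gasoline_gal = miles / mpg
    low, high = water_gal_per_gas_gal
    return (gasoline_gal * low * L_PER_US_GALLON,
            gasoline_gal * high * L_PER_US_GALLON)


def prompts_equivalent(water_l: float, ml_per_query: float) -> float:
    """Number of prompts whose water use equals water_l."""
    return water_l * 1000 / ml_per_query


def hours_of_prompting(prompts: float, prompts_per_minute: float = 2.0) -> float:
    """Nonstop prompting time at a fixed rate, in hours."""
    return prompts / prompts_per_minute / 60


low_l, high_l = refining_water_l(20)
for ml in (0.322, 10.0, 50.0):  # the three per-query water estimates in the table
    lo_p = prompts_equivalent(low_l, ml)
    hi_p = prompts_equivalent(high_l, ml)
    print(f"{ml} mL/query: {lo_p:,.0f} to {hi_p:,.0f} prompts, "
          f"{hours_of_prompting(lo_p):.1f} to {hours_of_prompting(hi_p):.1f} h")
```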

If you actually buy something

The purchased item usually swamps the transportation comparison. Water Footprint Calculator lists a cotton T-shirt at about 2,720 L and cotton jeans at about 10,850 L. At Altman’s 0.322 mL/query figure, one T-shirt is roughly 8.4 million prompts; one pair of jeans is roughly 33.7 million prompts.
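The same division shows how the answer shifts across the three per-query water estimates used in this piece. A sketch with hypothetical names; the item footprints are the cited Water Footprint Calculator values, and the per-query figures are the company-provided average and the two ends of the contested modeled range.

```python
ITEM_WATER_L = {"cotton T-shirt": 2720, "cotton jeans": 10850}

WATER_PER_QUERY_ML = {
    "company-provided average": 0.322,
    "contested modeled low": 10.0,
    "contested modeled high": 50.0,
}


def prompts_equivalent(item_water_l: float, ml_per_query: float) -> float:
    """Number of prompts whose water use equals one item's footprint."""
    if ml_per_query <= 0:
        raise ValueError("ml_per_query must be positive")
    return item_water_l * 1000 / ml_per_query


for item, liters in ITEM_WATER_L.items():
    for label, ml in WATER_PER_QUERY_ML.items():
        count = prompts_equivalent(liters, ml)
        print(f"{item}, {label} ({ml} mL/query): {count:,.0f} prompts")
```

Even at the highest contested estimate, one T-shirt still corresponds to tens of thousands of prompts, so the item dominates under every assumption shown.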

Best reading

For a no-shopping errand, the water footprint of the gasoline supply chain can equal anywhere from under a hundred to tens of thousands of prompts, depending on which AI water estimate you use. For a shopping trip where you buy cotton clothing or manufactured goods, the item’s hidden water footprint can dwarf both the drive and the AI prompts.

How to talk about this with teachers

The goal is responsible use, not environmental theater and not denial.

Classroom-safe framing

AI tools are not environmentally free. They run in datacenters that use electricity and sometimes water for cooling. The impact of one classroom text prompt is usually small, but widespread use adds up. Use AI where it clearly improves learning, accessibility, feedback, planning, or teacher workload; use simpler tools when they are enough.

Good habits

Use the right-sized tool: search, spreadsheet, calculator, spellcheck, or local template when that is enough.
Batch questions and write clearer prompts to avoid repeated generations.
Reuse useful rubrics, emails, examples, and lesson structures.
Be more cautious with image, video, and agentic workflows, which can use much more compute than text.

Questions for vendors

Do you publish energy, emissions, and water data?
Do you report water usage effectiveness or datacenter water consumption?
Can schools choose efficient models for routine tasks?
Are renewable-energy and water claims audited or only company-wide averages?

Selected sources

Best available public sources still leave gaps, especially for closed frontier models. These are the sources most worth reading before arguing about a single number.

Pengfei Li, Jianyi Yang, Mohammad A. Islam, Shaolei Ren, “Making AI Less ‘Thirsty’: Uncovering and Addressing the Secret Water Footprint of AI Models”, arXiv, 2023.
Paul Reig, World Resources Institute, “What’s the Difference Between Water Use and Water Consumption?”, 2013.
David Mytton, “Data centre water consumption”, npj Clean Water, 2021.
Bora Ristic, Kaveh Madani, Zen Makuch, “The Water Footprint of Data Centers”, Sustainability, 2015.
Md Abu Bakar Siddik, Arman Shehabi, Landon Marston, “The environmental footprint of data centers in the United States”, Environmental Research Letters, 2021.
Eric Masanet et al., “Recalibrating global data center energy-use estimates”, Science, 2020.
Lawrence Berkeley National Laboratory, “United States Data Center Energy Usage Report”, 2016.
Water Footprint Calculator, “The Hidden Water in Everyday Products”. Includes cited water footprints for cotton clothing, cars, and gasoline refining.
Sam Altman, “The Gentle Singularity”, 2025. Source of the public 0.34 Wh and 0.000085 gallon average-query figures.
Alexandra Sasha Luccioni, Yacine Jernite, Emma Strubell, “Power Hungry Processing: Watts Driving the Cost of AI Deployment?”, ACM FAccT, 2024.
Siddharth Samsi et al., “From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference”, arXiv, 2023.
David Patterson et al., “Carbon Emissions and Large Neural Network Training”, arXiv, 2021.
Alexandra Sasha Luccioni et al., “Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model”, JMLR, 2023.
Alex de Vries, “The growing energy footprint of artificial intelligence”, Joule, 2023.
Meta, “The Llama 3 Herd of Models”, arXiv, 2024.
International Energy Agency, “Electricity 2024”, 2024, and related IEA data center electricity analysis.