
Build Carbon‑Aware Software: Forecast Signals, Smart Scheduling, and Real Footprint Cuts

In Guides, Technology
March 15, 2026

Most teams now track cloud bills and performance, yet few track when their code draws power. That timing matters. The carbon intensity of electricity shifts hour by hour and region by region. If your job can happen later or elsewhere, you can lower emissions and costs without hurting users. This article shows how to design and ship carbon‑aware software: where to get the signals, how to plan work, how to measure results, and how to avoid gotchas.

Why carbon‑aware software now

The grid is changing fast. Renewable output is volatile but forecastable. Markets publish real‑time and day‑ahead signals that let apps pick better moments to run non‑urgent tasks. Cloud providers surface regional clean energy data. Tooling exists to hook signals into your schedulers. What used to be a climate side project is now an engineering practice with clear APIs, SLOs, and dashboards.

Carbon‑aware design also aligns with business: many deferrable workloads use off‑peak energy that is cheaper, hardware runs cooler, and teams can claim verifiable reductions that matter to procurement and reporting.

Understand the signals you’ll use

Average vs. marginal carbon intensity

Two values get tossed around:

  • Average carbon intensity: grams of CO2e per kWh for the overall grid mix at a time and place.
  • Marginal carbon intensity: how much additional emissions a small change in demand causes. This often better reflects the effect of shifting your workload.

If your goal is to optimize when to run, marginal signals are usually more meaningful. If you’re doing high‑level accounting, average may be acceptable. Some providers publish both; know which you are using.

Temporal and locational variation

Signals vary by time and location. A windy night in Region A can be cleaner than a sunny afternoon in Region B, or vice versa. Carbon‑aware apps exploit both axes:

  • Time‑shifting: Delay a non‑urgent job until the forecast is greener within the next N hours.
  • Location‑shifting: Place a workload in one of several acceptable regions based on current or forecasted carbon intensity.

Where to get data

You can fetch grid signals from public and commercial sources:

  • ElectricityMaps publishes live and forecasted carbon intensity by zone with an API and clear licensing.
  • WattTime provides marginal emissions signals and forecasting, with an API used by many device makers.
  • UK National Grid ESO Carbon Intensity offers a public API for Great Britain with 30‑minute forecasts.
  • Some clouds surface regional carbon info (e.g., carbon‑free energy share) to guide deployment choices.

Cache these signals. Don’t call the API every minute from every worker. Use a simple pull‑through cache with a short TTL and a backing store like Redis.
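The caching advice above can be sketched as a small pull‑through cache. This in‑memory version is illustrative only; a production deployment would back it with Redis so all workers share one copy, and fetch_fn here is a hypothetical callable that hits the provider API.

```python
import time

class SignalCache:
    """Pull-through cache for grid-intensity responses with a short TTL.

    In-memory for illustration; a production version would use Redis so
    every worker shares one cached copy instead of hitting the API.
    """

    def __init__(self, fetch_fn, ttl_seconds=300):
        self.fetch_fn = fetch_fn       # hypothetical provider-API call
        self.ttl = ttl_seconds
        self._store = {}               # region -> (expires_at, payload)

    def get(self, region):
        entry = self._store.get(region)
        if entry and entry[0] > time.monotonic():
            return entry[1]            # cache hit: no upstream call
        payload = self.fetch_fn(region)  # cache miss: one upstream call
        self._store[region] = (time.monotonic() + self.ttl, payload)
        return payload
```

With a five-minute TTL, a fleet of workers polling every minute collapses to at most one provider call per region per TTL window.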

Workload patterns that benefit most

Deferrable compute

  • Batch jobs: report building, log processing, ETL stages, and archiving.
  • ML training and fine‑tuning: iterative experiments, hyperparameter sweeps, and model distillation.
  • Maintenance tasks: index rebuilds, database vacuuming, container image scanning, and backups.
  • Media pipelines: transcodes, thumbnailing, and generative assets with no hard delivery deadline.

Flexible in time but not location

Some jobs must run where the data lives (compliance, latency). Use time windows: allow the job to complete within 6–18 hours and select the cleanest hour inside the window. Offer a max wait override for urgent runs.

Flexible in location but not time

Interactive services cannot wait long, but can choose cleaner regions among a short list. Use location routing with guardrails: only consider regions that meet latency and data residency constraints, then pick the cleaner option.

Designing a carbon‑aware scheduler

Core ideas

  • Eligibility: each job declares whether it is deferrable and any max wait.
  • Budget: set a carbon budget per period (week/month) to trigger more aggressive shifting when you’re overshooting.
  • Forecast window: look ahead N intervals (e.g., 24 × 30‑minute slots) and choose the slot with the best expected intensity that meets SLAs.
  • Fallbacks: if signals are stale or missing, run after a grace timeout to avoid buildup.
  • Fairness: prevent starvation by giving every waiting job a rising priority boost over time.

Simple pseudocode

Imagine you have scheduled tasks with attributes: earliest_start, latest_finish, region_candidates, min_cpu, est_kwh. You also have a signal service that returns a time series of marginal intensity per region. In Python‑flavored pseudocode (half_hour_slots, meets_capacity, and meets_latency stand in for your own infrastructure):

def choose_best_slot(job, signals, now):
    # Greedy: score every feasible (slot, region) pair by expected CO2e.
    best = None
    for slot in half_hour_slots(job.earliest_start, job.latest_finish):
        for region in job.region_candidates:
            if not (meets_capacity(region, slot, job.min_cpu) and meets_latency(region)):
                continue
            mci = signals.marginal_intensity(region, slot)  # gCO2e/kWh
            score = mci * job.est_kwh  # expected grams CO2e
            if best is None or score < best["score"]:
                best = {"slot": slot, "region": region, "score": score}
    if best is None:
        # Fallback: run at the earliest allowed time in the primary region.
        return {"slot": job.earliest_start, "region": job.region_candidates[0]}
    return best

Keep it simple. You don’t need an optimizer at first—greedy selection over a small window is fine. Add batching later if you want to align many jobs to the same clean slot.

APIs and data formats

Most providers return JSON with timestamps and intensity values. Normalize into your own schema:

{
  "source": "watttime",
  "region": "CAISO_NORTH",
  "interval_mins": 30,
  "updated_at": "2026-03-15T10:00:00Z",
  "series": [
    {"start": "2026-03-15T11:00:00Z", "marginal_gco2_per_kwh": 280},
    {"start": "2026-03-15T11:30:00Z", "marginal_gco2_per_kwh": 240}
  ]
}

Wrap that behind a small signal service for your org. It owns provider keys, validates timestamps, and outputs a uniform time series. Downstream, the scheduler stays provider‑agnostic.
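As one hedged sketch of that normalization step, assume a provider returns a list of points shaped like {"point_time": ..., "value": ...} (field names invented for illustration; real provider payloads differ). The adapter sorts the points and maps them into the uniform schema above:

```python
from datetime import datetime, timezone

def normalize_provider_payload(raw, source, region, interval_mins=30):
    """Map a hypothetical provider payload into the org-wide schema.

    `raw` is assumed to be a list like:
        [{"point_time": "2026-03-15T11:00:00Z", "value": 280}, ...]
    Adjust the field mapping per real source.
    """
    return {
        "source": source,
        "region": region,
        "interval_mins": interval_mins,
        "updated_at": datetime.now(timezone.utc).isoformat(),
        "series": [
            {"start": point["point_time"],
             "marginal_gco2_per_kwh": float(point["value"])}
            for point in sorted(raw, key=lambda p: p["point_time"])
        ],
    }
```

Each provider gets its own small adapter like this; everything downstream of the signal service sees one shape.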

User experience and guardrails

Controls that build trust

  • Explain the delay: “We’ll run this by 4:00 PM when the grid is cleaner. Need it now? Run anyway.”
  • Defaults with choice: opt users into sensible windows, but provide a clear “Run now” button that logs a reason.
  • Visible wins: show avoided emissions for the last week or month. Keep numbers conservative.

Never trade away reliability

Carbon‑aware design should never break SLAs. Use caps on how long a single user can be delayed, circuit breakers for stale signals, and priority lanes for critical traffic.
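A minimal sketch of the stale-signal circuit breaker, with an assumed two‑hour staleness cutoff (the threshold is a tunable, not a recommendation):

```python
from datetime import datetime, timezone

STALE_AFTER_SECONDS = 2 * 3600  # assumption: distrust signals older than 2 h

def should_defer(signal_updated_at, now=None):
    """Return False (run immediately) when the signal is stale or missing.

    Carbon-aware deferral is an optimization: if the data can't be
    trusted, fall back to normal scheduling so reliability never suffers.
    """
    if signal_updated_at is None:
        return False
    now = now or datetime.now(timezone.utc)
    age_seconds = (now - signal_updated_at).total_seconds()
    return age_seconds <= STALE_AFTER_SECONDS
```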

Accessibility and fairness

Do not push “Run later” only to users in regions with lower incomes or worse infrastructure. Keep rules transparent and apply them consistently. For global products, offer language‑appropriate explanations and local time windows.

Measuring impact with real numbers

From watts to emissions

Estimate energy use per task and multiply by intensity. A basic model works:

  • CPU/GPU power: sample average power draw (watts) from platform telemetry, times runtime (hours) for kWh.
  • Memory and disk: include representative power or use server TDP allocation percentages if detailed telemetry is hard.
  • Network: for big transfers, add egress energy with a coarse factor (be conservative).

Emissions estimate: kWh × marginal_gCO2_per_kWh. Show confidence intervals when possible.
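The arithmetic is simple enough to encode directly; the numbers in the example comment are illustrative, not measurements:

```python
def task_emissions_g(avg_watts, runtime_hours, marginal_gco2_per_kwh):
    """Estimate grams of CO2e for one task: watts -> kWh -> grams."""
    kwh = avg_watts * runtime_hours / 1000.0
    return kwh * marginal_gco2_per_kwh

# Example: a 2-hour job on a node drawing ~300 W, on a 250 gCO2e/kWh grid:
#   300 W x 2 h = 0.6 kWh; 0.6 kWh x 250 gCO2e/kWh = 150 g CO2e.
```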

Dashboards and budgets

  • Track jobs shifted, avg delay, kgCO2e avoided, and SLA adherence.
  • Alert when carbon budget is at risk and ratchet the scheduler to longer windows or stricter thresholds.
  • Compare to a counterfactual: what if you ran every job at the earliest allowed time in the default region?

Cloud deployment strategies

Pick better regions (within policy)

When data residency allows, prefer regions with higher carbon‑free energy shares for background work. Keep a short list of allowed regions per workload and let the router select among them based on current signals and latency checks. Respect compliance constraints first.

Queues and burst windows

Use a queue per region and pop aggressively during green windows. For very green hours, allow concurrency bursts to drain backlog. Cap node autoscaling to avoid cost spikes, and set a cooldown to prevent thrash.
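One way to sketch the burst logic, using today's forecast percentiles as the green/dirty cutoffs (the percentile choices and worker counts are illustrative knobs, not recommendations):

```python
def burst_concurrency(current_gco2, day_percentiles, base=4, max_workers=16):
    """Scale queue workers up in green windows, down in dirty ones.

    `day_percentiles` is (p25, p75) of today's forecast intensity in
    gCO2e/kWh. Capping at max_workers limits cost spikes; a separate
    cooldown (not shown) prevents thrash between levels.
    """
    p25, p75 = day_percentiles
    if current_gco2 <= p25:      # very green hour: drain the backlog
        return max_workers
    if current_gco2 >= p75:      # dirty hour: run only the minimum
        return max(1, base // 2)
    return base                  # typical hour: steady-state concurrency
```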

Mix with spot capacity

Spot or preemptible instances often align with off‑peak hours, further reducing cost. Combine carbon windows with spot bidding for batch pipelines. Have a checkpointing strategy to survive preemptions.

Edge and device scenarios

Desktops and laptops

Background sync, photo analysis, and software updates can run during cleaner local grid windows or when plugged in. Show a gentle nudge: “We’ll finish indexing tonight when the grid is cleaner and your laptop is charging.”

Mobile apps

  • Run heavy ML offline tasks during charging + Wi‑Fi + “after 9 PM” to align with cleaner night grids in many regions.
  • Avoid user‑visible slowdowns. Prioritize battery and data caps over carbon timing if they conflict.

On‑prem servers

If you run private clusters, fetch local ISO/RTO signals. Schedule backups and compactions when your tariff is off‑peak and cleaner. Even a small office can time NAS scrubs overnight to cut noise and emissions at once.

Privacy, security, and governance

Data you send and store

Do not leak user location. Your signal service should accept a coarse region code (e.g., market zone) and return intensities. Keep API keys in a secrets manager. Rate‑limit and cache responses to control cost.

Auditability

Log which signal input influenced each scheduling decision. Keep these logs for at least one reporting cycle to back up your avoided emissions claims. Use immutable storage or append‑only tables for traceability.

Greenwashing traps to avoid

  • Do not claim credits for reductions you cannot show. Stick to marginal impacts within your control.
  • Do not use location shifts that violate residency or privacy rules.
  • Be clear that carbon‑aware scheduling complements performance tuning and efficient code; it does not replace them.

Pitfalls and practical fixes

Starvation and backlog spikes

If the grid stays dirty for days during heat waves, you can accumulate a large backlog. Fixes:

  • Escalating priority for waiting jobs.
  • Daily cap on total deferral time per user/tenant.
  • Allow a “good enough” threshold: run if intensity is within X% of the week’s median rather than waiting for the absolute best slot.
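The “good enough” threshold from the last bullet is a one‑liner over the week’s history; the 20% slack here is an assumption to tune:

```python
import statistics

def good_enough(current_gco2, recent_week_gco2, slack_pct=20):
    """Run now if intensity is within slack_pct of the week's median.

    Prevents jobs from waiting indefinitely for the absolute best slot
    during prolonged dirty-grid periods (e.g., heat waves).
    """
    median = statistics.median(recent_week_gco2)
    return current_gco2 <= median * (1 + slack_pct / 100.0)
```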

Inaccurate energy models

You won’t have perfect kWh per job. Start with coarse estimates, then refine. Sample real power on a subset of nodes, calibrate per workload, and publish error bars. The goal is directionally correct decisions, not lab‑grade precision.

Latency regressions

Location shifting can hurt tail latency. Keep a small, fixed set of regions and include synthetic checks in your router. Re‑route if p95 exceeds a threshold for more than a few minutes.

Concrete use cases you can try this week

Build pipelines

Mark nightly builds, final asset bundling, and container image signing as deferrable. Give them an 8‑hour window. You’ll often shift from late afternoon to post‑midnight when wind picks up in many markets. Keep on‑demand builds instant for developer flow.

Data lifecycle tasks

Run compression, tiering, and compaction processes during the greenest hour of the next 12. For object stores with lifecycle rules, pre‑stage moves instead of hammering during business hours.

Media and ML

Offer users a “green queue” for non‑urgent renders or fine‑tunes. Show estimated finish time and let them jump to “run now” with one click. Many will accept the delay when they understand the benefit.

Integrating with existing tools

Schedulers and workflow engines

Add a carbon plugin to engines like Airflow, Argo, or Temporal. The plugin reads from your signal service, tags tasks with scores, and chooses run times within SLA windows. Keep the decision logic idempotent so retries are predictable.
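No real Airflow, Argo, or Temporal API is shown here; as an engine‑agnostic sketch, a hypothetical decorator can carry the same wait-for-a-green-slot logic. The signal_service.is_green interface is an assumption, and in a real engine this would live in a plugin, sensor, or interceptor instead:

```python
import functools
import time

def carbon_aware(signal_service, region, max_wait_s=4 * 3600, poll_s=600):
    """Hypothetical decorator: delay a task until a green-enough slot.

    Polls signal_service.is_green(region) (an assumed interface) until it
    reports green or max_wait_s elapses, then runs the task either way so
    the SLA window is never exceeded.
    """
    def wrap(task_fn):
        @functools.wraps(task_fn)
        def runner(*args, **kwargs):
            deadline = time.monotonic() + max_wait_s
            while time.monotonic() < deadline:
                if signal_service.is_green(region):
                    break
                time.sleep(poll_s)  # still inside the SLA window: keep waiting
            # Green slot found or deadline reached: run regardless (fallback).
            return task_fn(*args, **kwargs)
        return runner
    return wrap
```

Because the decision and fallback are in one place, retries behave predictably: the same inputs produce the same wait-or-run choice.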

Feature flags

Gate your first rollout behind flags. Enable carbon‑aware for 10% of tenants, compare outcomes, and tune thresholds. Gradually grow coverage as you prove there are no regressions.

Observability

Add carbon_intensity_now and avoided_emissions as time‑series in your dashboards. Annotate with major signal events (e.g., “wind ramp” or “grid constraint”). On incident reviews, note whether deferrals influenced recovery times.

Cost and procurement synergy

Shifting to cleaner windows can ride along with off‑peak pricing, spot compute, and lower cooling loads. Work with finance to tag carbon‑aware runs and compare their unit cost (per run, per GB processed) against standard runs. Share a joint savings report that covers both money and emissions so the value is clear across teams.

From pilot to policy

Start small

Pick a single background job, add a 12‑hour window, and route by marginal intensity among two regions. Measure for two weeks. If SLA is clean and emissions drop measurably, expand to more jobs.

Make it a norm

Document a short engineering standard: which signal to use, default windows, logs to keep, and how to report avoided emissions. Add a code template or library to make the “green window” call a one‑liner. Once friction drops, adoption spreads.

Report responsibly

Share monthly totals with a note on uncertainty and methods. If you can, have a third party review calculations once a year. Keep claims modest and transparent.

What “good” looks like after six months

  • 20–50% of your batch workload runs in greener windows.
  • No SLA breaches traced to the scheduler.
  • Dashboards show stable or falling kgCO2e per task.
  • Engineers call the API or library by habit when adding new background jobs.
  • Procurement references your numbers in vendor talks and cloud region choices.

Summary:

  • Carbon‑aware apps use time and location flexibility to cut real emissions.
  • Prefer marginal over average intensity for scheduling decisions.
  • Build a small signal service, cache results, and keep schedulers provider‑agnostic.
  • Start with deferrable workloads: batch, ML training, maintenance, and media pipelines.
  • Use windows, budgets, and fallbacks so reliability never suffers.
  • Measure energy, multiply by intensity, and compare to a clear counterfactual.
  • Scale with plugins for workflow engines, feature flags, and observability metrics.
  • Report results conservatively, avoid greenwashing, and make it an engineering norm.


Andy Ewing, originally from coastal Maine, is a tech writer fascinated by AI, digital ethics, and emerging science. He blends curiosity and clarity to make complex ideas accessible.