Pricing — Sparsity AI

Starter

$0/mo

For individuals and small projects exploring sparsity.

Up to 1M parameters per model
Standard pruning and quantization
Basic Green Report (PDF)
REST API access
Community support
Python SDK

Pay-per-savings

High-volume users can pay based on actual compute saved rather than a flat monthly fee. Costs scale with the value you receive — nothing more.

$0.01 per FLOP saved

$0.001 per API call

$0.10 per model optimized
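
To see how the three rates might combine, here is a minimal sketch of a bill estimate. It assumes the charges are simply added together, which this page does not state. The function name `estimate_bill` and its parameters are hypothetical and are not part of the Sparsity AI SDK.

```python
from decimal import Decimal

# Rates copied from the price list above (USD).
RATE_PER_FLOP_SAVED = Decimal("0.01")
RATE_PER_API_CALL = Decimal("0.001")
RATE_PER_MODEL = Decimal("0.10")


def estimate_bill(flops_saved: int, api_calls: int, models_optimized: int) -> Decimal:
    """Estimate a pay-per-savings bill in USD.

    ASSUMPTION: the three charges are summed. The page lists the rates
    but does not say how they are combined on an invoice.
    """
    if min(flops_saved, api_calls, models_optimized) < 0:
        raise ValueError("usage counts must be non-negative")
    return (
        RATE_PER_FLOP_SAVED * flops_saved
        + RATE_PER_API_CALL * api_calls
        + RATE_PER_MODEL * models_optimized
    )


if __name__ == "__main__":
    # Illustrative usage figures, not real customer data.
    print(estimate_bill(flops_saved=1000, api_calls=200, models_optimized=3))
```

Under that assumption, 1,000 FLOPs saved, 200 API calls, and 3 optimized models come to $10.50. Decimal is used instead of float so that sub-cent rates do not pick up rounding noise.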

Included

In every plan

Instant sparsity optimization

Submit a model and receive results in minutes. No queue, and no waiting for a data scientist to review it.

Accuracy validation

Every output model is validated against your baseline. We report the exact accuracy delta — not an estimate.

Green Report

Every optimization produces a report with FLOPs saved, energy reduction, and CO2 equivalent. Signed and auditable.

REST API

Fully documented REST API to integrate the optimizer into your existing training and deployment pipelines.

Standard serialization formats

Outputs in PyTorch (.pt), TensorFlow SavedModel, and ONNX. Compatible with your existing serving infrastructure.

Open benchmarking

We publish methodology and benchmarks publicly. No black box — you can reproduce our results.

FAQ

Common questions

Will sparsity hurt my model's accuracy?

In our benchmarks, structured sparsity with calibration maintains accuracy within 0.1% of baseline on most architectures. We validate every output before delivery and report the exact delta. If the result falls outside your threshold, we flag it — you only pay for results that meet your spec.
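
The threshold check described above can be sketched as follows. This is an illustration only, not the validation code Sparsity AI runs: the helper names are hypothetical, and it assumes the threshold is expressed in percentage points of top-1 accuracy, which is one possible reading of "within 0.1%".

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_delta_pp(baseline_acc: float, optimized_acc: float) -> float:
    """Optimized minus baseline accuracy, in percentage points (negative = worse)."""
    return (optimized_acc - baseline_acc) * 100.0


def meets_spec(baseline_acc: float, optimized_acc: float, max_drop_pp: float = 0.1) -> bool:
    """True if the accuracy drop stays within the allowed threshold.

    ASSUMPTION: the threshold is a maximum drop in percentage points.
    """
    drop_pp = -accuracy_delta_pp(baseline_acc, optimized_acc)
    return drop_pp <= max_drop_pp


if __name__ == "__main__":
    labels = [1, 0, 1, 1]
    print(accuracy([1, 0, 0, 1], labels))          # 3 of 4 predictions match
    print(meets_spec(0.90, 0.85))                  # a 5-point drop fails the default spec
```

A result for which `meets_spec` returns False corresponds to the flagged case described in the answer above.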

What model types do you support?

We support transformer-based LLMs, CNNs, and recurrent architectures in PyTorch and TensorFlow. We handle weights up to the parameter limits of your plan. Edge cases — custom layers, non-standard attention — are covered under Enterprise consulting engagements.

How is the Green Report calculated?

We measure FLOPs before and after optimization using hardware-accurate profiling, then convert to energy using published GPU TDP figures and regional grid intensity data from Electricity Maps. The report is signed and includes the full methodology so it can withstand third-party audit.
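
As a rough illustration of that conversion chain (FLOPs to GPU time, GPU time to energy, energy to CO2 equivalent), here is a minimal sketch. It is not the published methodology: it assumes the GPU draws its full TDP at a constant sustained throughput, and the function name, parameters, and example numbers are hypothetical.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


@dataclass(frozen=True)
class GreenEstimate:
    flops_saved: float
    energy_saved_kwh: float
    co2e_saved_g: float


def estimate_savings(
    flops_before: float,
    flops_after: float,
    sustained_flops_per_s: float,  # ASSUMED effective throughput of the GPU
    gpu_tdp_watts: float,          # published TDP, used as an upper-bound power draw
    grid_gco2e_per_kwh: float,     # regional grid intensity in gCO2e per kWh
) -> GreenEstimate:
    """Convert a FLOP reduction into estimated energy and CO2e savings."""
    if sustained_flops_per_s <= 0:
        raise ValueError("sustained_flops_per_s must be positive")
    flops_saved = flops_before - flops_after
    gpu_seconds_saved = flops_saved / sustained_flops_per_s
    energy_kwh = gpu_seconds_saved * gpu_tdp_watts / JOULES_PER_KWH
    return GreenEstimate(flops_saved, energy_kwh, energy_kwh * grid_gco2e_per_kwh)


if __name__ == "__main__":
    # Illustrative inputs only; not measured figures.
    print(estimate_savings(2e15, 1e15, 1e14, 360.0, 400.0))
```

Because TDP is a ceiling rather than a measured draw, an estimate built this way tends to overstate energy use on both sides of the comparison; the signed report's own methodology is the authority on how this is handled.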

Can I run this on-prem?

On-prem deployment is available on the Enterprise plan. We ship a containerized version of the optimizer engine that runs in your private infrastructure. Your model weights never leave your environment. Contact sales to discuss deployment options.

What happens to my model data?

Model weights are encrypted in transit and at rest. We process them solely to produce the optimized output — they are deleted within 24 hours of your job completing. We do not train on customer models or retain weights beyond that window.

Simple, transparent pricing