Sellers on Carousell, Xianyu, and eBay list RTX 3090s as gaming cards and used Mac Studios as consumer electronics. They are actually selling AI inference nodes. The information gap is real: secondhand sellers price by consumer electronics comps, not by AI infrastructure comps. The structural gap is also real: institutional buyers cannot purchase Mac Studios on Carousell — they buy new, in volume, from Apple. Individual buyers access a market that institutions structurally cannot.
Bandwidth is the only metric that matters for autoregressive LLM inference. Every token generated requires loading all model weights through memory once. GB/s beats TFLOPS. Secondhand sellers price by TFLOPS (gaming benchmarks). We buy by GB/s (inference throughput). That is the entire thesis.
Second-hand consumer compute hardware is systematically mispriced relative to its value as AI inference nodes. The mispricing arises from two structural conditions: (1) sellers use consumer electronics pricing frameworks (gaming GPU benchmarks, consumer laptop resale comps) rather than infrastructure pricing frameworks (GB/s of memory bandwidth, GB of VRAM, token throughput), and (2) the buyers who would correctly price this hardware — hyperscalers, AI companies, data centers — are structurally excluded from the secondhand consumer market. They buy new, in bulk, via enterprise contracts. A Mac Studio M2 Ultra 192GB cannot be procured on Carousell at any price by an institution. An individual can buy it there for $1,800–2,200.
Before treating this as an investment, six hidden assumptions must be surfaced and stress-tested.
| Assumption | The Risk | Stress Test |
|---|---|---|
| Local inference demand grows | Commodity API prices keep falling (DeepSeek V3 is already $0.89/M blended). If cost reaches $0.05/M, local hardware generates less value than its electricity costs. | Mitigation: buy below the consumer electronics resale floor. The hardware's value floor is not inference — it is gaming card / workstation resale. Inference is upside. |
| Sellers remain uninformed | Carousell and eBay pricing algorithms surface AI demand signals; prices correct toward infrastructure comps within 12–18 months. | Partially happening already. RTX 4090 is fully priced by practitioners. The window is not permanent — act on highest-conviction SKUs now. |
| Hardware holds residual value | Next-generation models require 48GB+ minimum VRAM. 24GB cards become worthless for inference overnight. Resale as gaming GPU still possible but at lower price. | Real risk. Weight 48GB+ options more heavily. The 24GB play (RTX 3090) works only if you personally use the inference — not as a pure asset hold. |
| Power costs manageable | Electricity rates spike. At $0.40/kWh, RTX 3090 electricity cost nearly triples versus the $0.15/kWh baseline, and the inference-value breakeven shifts dramatically. | Model electricity cost at your actual rate — Singapore/HK rates vary significantly. At $0.30/kWh: still viable for heavy users. At $0.40/kWh: marginal for light users. |
| Exit is liquid | You need to sell 10 RTX 3090s and crater your own price. Secondhand GPU market is thin. Large positions are illiquid by definition. | Hard position limit: max 3 units of any single SKU. This is not a scale trade. It is a personal infrastructure trade. |
| Apple Silicon advantage persists | NVIDIA releases a 96GB consumer card at $1,500. Apple Silicon bandwidth per dollar advantage collapses. | Possible but not imminent. NVIDIA’s 2026 roadmap does not show a consumer 96GB card at that price. Even if announced, secondhand Apple Silicon price decay is slow — resale holds better than NVIDIA for 24 months. |
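The power-cost stress test above is one line of arithmetic. A minimal sketch, assuming worst-case 24-hour operation at full TDP (single-user inference is bursty, so real cost is lower):

```python
# Daily electricity cost: (TDP in kW) x hours x rate.
# Assumes continuous full-TDP draw, which overstates cost
# for bursty single-user inference.
def electricity_cost_per_day(tdp_watts: float, rate_per_kwh: float,
                             hours: float = 24.0) -> float:
    return tdp_watts / 1000 * hours * rate_per_kwh

for rate in (0.15, 0.30, 0.40):
    # RTX 3090 at 350W TDP
    print(f"${rate:.2f}/kWh -> ${electricity_cost_per_day(350, rate):.2f}/day")
```

At $0.15/kWh this reproduces the $1.26/day figure used throughout; at $0.40/kWh it rises to $3.36/day.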
Understanding where consensus sits determines where edge exists. Consensus-priced assets have no alpha. Below-consensus assets where our model disagrees have edge.
| Group | Awareness | Sentiment | Edge Available? |
|---|---|---|---|
| Mass market (Carousell/eBay sellers) | Unaware | Prices as gaming GPU or consumer electronics | Yes — maximum edge |
| Institutional (hyperscalers, AI labs) | Fully aware | Structurally excluded from secondhand market | Yes — structural exclusion edge |
| r/LocalLLaMA practitioners | Aware | RTX 3090: +10 (neutral/mixed). Mac mini M4 Pro: +30 (bullish) | Partial — see below |
| RTX 4090 buyers | Fully aware | Consensus priced in by practitioners | No edge |
RTX 3090 — Sentiment: Neutral/Mixed (+10). Threads: “RTX 3090 in 2026” and “Talk me out of buying RTX 3090 just for local AI.” The practitioner consensus is ambivalent: 24GB is tight for 70B models, the card is aging, and the RTX 4090 exists. This ambivalence is the edge — the market has not bid up RTX 3090 prices to reflect its actual inference throughput. Our thesis: CONTRARIAN. Neutral sentiment on a card with 936 GB/s bandwidth at $600–900 is the signal.
Mac mini M4 Pro 64GB — Sentiment: Bullish (+30). Thread “Mac Mini looks compelling now... Cheaper than a 5090 and near double the VRAM” had 911 upvotes. This is forming consensus. When r/LocalLLaMA is bullish at +30, the edge is shrinking. Our thesis: ALIGNED but edge is narrowing. The M4 Pro is correctly valued. Wait for M5 release to drop M4 prices, or focus on M2 Ultra used instead.
Autoregressive LLM inference is memory bandwidth-bound, not compute-bound. Every token generated requires loading all model weights through memory exactly once. A 70B parameter model at Q4 quantization occupies approximately 40GB. Generating one token requires moving 40GB of data through the memory system. At 936 GB/s (RTX 3090), that is 40GB ÷ 936 GB/s = 42.7ms latency per token, or approximately 23 tokens/second, ignoring other bottlenecks. TFLOPS determine how fast you compute the attention mechanism — which adds roughly 10–20% to total latency on large batches but is negligible at batch size 1 (single-user inference). Therefore: GB/s is what you buy. TFLOPS are marketing.
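The bandwidth arithmetic generalizes to a two-line sketch — model footprint and memory bandwidth are the only inputs. Real throughput lands below this ceiling because the calculation ignores compute, KV-cache reads, and kernel overhead:

```python
# Bandwidth-bound decode ceiling: each generated token streams the
# full weight set through memory once, so tok/s <= bandwidth / size.
def token_ceiling(model_gb: float, bandwidth_gbs: float) -> float:
    return bandwidth_gbs / model_gb

def latency_ms_per_token(model_gb: float, bandwidth_gbs: float) -> float:
    return model_gb / bandwidth_gbs * 1000

# 70B at Q4 (~40GB) through an RTX 3090's 936 GB/s
print(f"{token_ceiling(40, 936):.1f} tok/s ceiling")    # ~23.4
print(f"{latency_ms_per_token(40, 936):.1f} ms/token")  # ~42.7
```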
No-lose price = (inference value generated × P(thesis holds))
+ (resale value as consumer electronics × P(thesis breaks))
- electricity cost over holding period
- time cost (acquisition + management + exit)
The formula reveals the key insight: when the consumer electronics resale floor is high enough, you do not need the thesis to hold to avoid losing money. The inference value is pure upside on top of an already-safe resale position.
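The formula maps directly to a function. Every input is an assumption the buyer supplies; the example numbers below are hypothetical, not derived from any listing:

```python
def no_lose_price(inference_value: float, p_thesis: float,
                  resale_value: float, electricity_cost: float,
                  time_cost: float) -> float:
    """Maximum purchase price at which the expected outcome is
    non-negative: inference value weighted by P(thesis holds),
    resale floor weighted by P(thesis breaks), minus carry costs."""
    p_breaks = 1 - p_thesis
    return (inference_value * p_thesis
            + resale_value * p_breaks
            - electricity_cost
            - time_cost)

# Hypothetical inputs: $966 inference value over the hold, 55% thesis
# odds, $400 gaming resale floor, $113 electricity, $50 time cost.
print(f"${no_lose_price(966, 0.55, 400, 113, 50):.0f}")
```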
| Parameter | Value | Notes |
|---|---|---|
| VRAM / Bandwidth | 24GB GDDR6X / 936 GB/s | Runs 8B–32B models comfortably; a Q4 70B (~40GB) needs two cards or aggressive ~2-bit quantization |
| Token throughput | ~23 tok/s bandwidth ceiling for a 40GB model; 60–100 tok/s (8B) | Single user, batch 1 |
| Used price | $600–900 | eBay / Xianyu; varies by condition |
| Power draw | 350W TDP | $1.26/day at $0.15/kWh, 24h |
| Heavy dev savings (10M tok/day vs Sonnet) | $118.74/day net | $120 API saved − $1.26 electricity |
| Realistic dev savings (1M tok/day) | $10.74/day net | $12 API saved − $1.26 electricity |
| Payback at heavy use | 5 days | At $600 entry, 10M tok/day |
| Payback at realistic use | 56 days | At $600 entry, 1M tok/day |
| Consumer resale floor | $400+ | As gaming GPU; provides downside protection |
| No-lose price | $800 or below | Below this: inference is pure upside on gaming resale value |
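The payback rows follow from a one-line calculation; the API savings and electricity figures are the table's assumptions, not measurements:

```python
def payback_days(entry_price: float, api_saved_per_day: float,
                 electricity_per_day: float) -> float:
    """Days until net daily savings recoup the purchase price."""
    net = api_saved_per_day - electricity_per_day
    if net <= 0:
        return float("inf")  # never pays back at this usage level
    return entry_price / net

print(payback_days(600, 120.0, 1.26))  # heavy use: ~5 days
print(payback_days(600, 12.0, 1.26))   # realistic use: ~56 days
```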
The A40 is a 48GB GDDR6 workstation card at 696 GB/s. It runs 70B models comfortably and 405B at aggressive quantization. It is cheap not because it underperforms, but because its passive cooler requires a server chassis or DIY open-air cooling. Workstation builders avoid it. The play: add $50 in fans to an open-air rig and unlock 48GB at 696 GB/s for $1,800–2,200. The A6000 Ampere (identical VRAM, better bandwidth) costs $2,200–3,500. The A40 is the same capacity at a slight bandwidth penalty and $400–1,700 cheaper. If you can handle the passive cooler friction, this is exceptional value per GB.
This is the highest-conviction position in the entire universe. The M2 Ultra Mac Studio with 192GB unified memory launched at $6,000–7,000+. Used, it trades at $1,800–2,200 because buyers treat it as a consumer Mac desktop. It is not. It is 192GB of unified memory at 800 GB/s — roughly half the memory bandwidth of an A100 40GB ($8,000–14,000 used) at a fraction of the cost, with enough capacity to run Llama 3.1 405B at ~3-bit quantization, DeepSeek 67B comfortably, and Llama 3.3 70B at FP16. The structural edge: Apple does not sell 192GB Mac Studios at used prices in volume. An AI lab cannot procure 100 of these from Carousell. You can procure one.
The V100 PCIe 32GB offers 897 GB/s of HBM2 bandwidth at $600–1,200 used. HBM2 is the memory architecture used in data center GPUs; its bandwidth-per-dollar ratio at these prices is exceptional. The caveats are real: PCIe 3.0 limitation, aging Volta architecture with driver quirks, and model scale may outpace 32GB faster than 48GB+ options. This is a speculative position for buyers comfortable with the operational overhead of legacy data center hardware in consumer settings.
The RTX 3090 Ti offers 1,008 GB/s — equal to the RTX 4090 — at $700–1,100. The Ti suffix causes buyers to anchor on gaming benchmarks (“overkill”) rather than inference throughput (“bandwidth leader”). The RTX 4090 at the same bandwidth costs $1,600–2,000 and draws 50W more power. At $700–900, the 3090 Ti is systematically underpriced for inference relative to its bandwidth spec. It often appears in listings beside regular 3090s with only a 10–15% price premium despite 7.7% higher bandwidth. This is a pricing anomaly driven by naming convention confusion.
| Asset | Conviction | Current Price | Model Price (as inference node) | Buy Threshold | Position Size |
|---|---|---|---|---|---|
| RTX 3090 | CONVICTION | $600–900 | $900–1,200 | Below $800 | 2–5% of investable capital |
| RTX 3090 Ti | CONVICTION | $700–1,100 | $1,000–1,400 | Below $900 | 2–5% of investable capital |
| A40 48GB | SPECULATIVE | $1,800–3,500 | $2,500–4,000 | Below $2,200 | 0.5–2% of investable capital |
| M2 Ultra 192GB | HIGH CONVICTION | $1,800–2,200 | $3,000–4,000 | Below $2,000 | 5–10% of investable capital |
| V100 32GB PCIe | SPECULATIVE | $600–1,200 | $800–1,500 | Below $800 | 0.5–2% of investable capital |
| Mac mini M4 Pro 64GB | MONITOR | $1,200–1,800 | $1,500–2,000 | Wait for M5 release to drop M4 prices | $0 for now |
| RTX 4090 | NO EDGE | $1,600–2,000 | $1,600–2,000 | Market-priced by practitioners | $0 |
Note: M2 Ultra scores lower on GB/s per dollar but uniquely enables a 192GB unified address space for models that cannot be quantized onto smaller VRAM without significant quality loss. A pure GB/s-per-dollar ranking does not capture the "model fits or does not fit" binary — and the M2 Ultra is the only consumer option that runs 405B-class models at all.
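The GB/s-per-dollar comparison can be made concrete. A sketch using the midpoint of each asset's current price range — the midpoints are this sketch's assumption, not quoted transactions, and the ranking deliberately omits the capacity binary:

```python
# (bandwidth GB/s, (price range low, high)) per asset,
# bandwidth and price figures reused from earlier sections
ASSETS = {
    "RTX 3090":       (936,  (600, 900)),
    "RTX 3090 Ti":    (1008, (700, 1100)),
    "A40 48GB":       (696,  (1800, 3500)),
    "M2 Ultra 192GB": (800,  (1800, 2200)),
    "V100 32GB PCIe": (897,  (600, 1200)),
    "RTX 4090":       (1008, (1600, 2000)),
}

def gbs_per_dollar(bandwidth: float, price_range: tuple) -> float:
    return bandwidth / (sum(price_range) / 2)

for name, (bw, prices) in sorted(
        ASSETS.items(), key=lambda kv: -gbs_per_dollar(*kv[1])):
    print(f"{name:16s} {gbs_per_dollar(bw, prices):.2f} GB/s per dollar")
```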
| Scenario | RTX 3090 Outcome | M2 Ultra Outcome | Probability Est. |
|---|---|---|---|
| Base: thesis holds, moderate use | +60–100% value as inference node; resale at $500–700 | +80–140% value vs purchase price; resale holds $1,600+ | 55% |
| API collapse: inference value drops 10x | Inference useless; resale as gaming GPU $400–600 | Inference useless; resale as consumer Mac $1,400–1,800 | 20% |
| Model scale: 48GB becomes minimum | 24GB worthless for inference; gaming resale $350–500 | 192GB remains viable; resale $1,500+ | 15% |
| Bull: local inference demand spikes | RTX 3090 revalued at $1,100–1,400; strong exit | M2 Ultra revalued at $3,500+; illiquid but high value | 10% |
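The scenario table can be probability-weighted into a rough expected exit value. The entry price ($700) and the midpoint exit values below are this sketch's assumptions; it deliberately excludes inference value generated during the hold, which is the upside the thesis actually targets:

```python
# (probability, exit-value midpoint) per scenario, RTX 3090
SCENARIOS = [
    (0.55, 600),   # base: resale $500-700
    (0.20, 500),   # API collapse: gaming resale $400-600
    (0.15, 425),   # model scale: gaming resale $350-500
    (0.10, 1250),  # bull: revalued $1,100-1,400
]

expected_exit = sum(p * v for p, v in SCENARIOS)
print(f"expected exit value: ${expected_exit:.0f} vs $700 entry")
```

An expected exit of roughly $619 against a $700 entry means the position only works if the inference value consumed during the hold covers the gap — consistent with the "use it yourself" condition in the assumptions table.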
These are the signals to track on a monthly basis to confirm or invalidate the thesis.
| Signal | What to Track | Source | Frequency |
|---|---|---|---|
| r/LocalLLaMA RTX 3090 sentiment | Is neutral/mixed sentiment holding, or is it shifting to bullish? If bullish: edge is shrinking. | Reddit r/LocalLLaMA | Monthly |
| eBay / Carousell price drift | RTX 3090 median price. If approaching $1,000+: buyers have repriced. Exit window is closing. | eBay sold listings; Carousell SG/HK | Monthly |
| Frontier model minimum VRAM | Does the leading open-weight model (Llama 4, etc.) require 48GB+ for practical use? If yes: 24GB cards are obsolete. | Hugging Face model cards; r/LocalLLaMA | On release |
| Commodity API blended price | DeepSeek V3 blended price per million tokens. If below $0.10/M: electricity parity pressure begins. | Artificial Analysis; provider pricing pages | Monthly |
| M2 Ultra used price | Has secondhand price risen above $2,500? That closes the arbitrage window. | eBay sold; Swappa; Xianyu | Monthly |
| NVIDIA 48GB+ consumer card announcement | Any announcement of a sub-$1,500 consumer card with 48GB+ VRAM invalidates the 24GB edge. | NVIDIA investor days; AnandTech; Tom’s Hardware | Quarterly |
| Sensor | Signal | Last Reading | Quality |
|---|---|---|---|
| r/LocalLLaMA RTX 3090 thread sentiment | Practitioner demand signal | +10 (neutral/mixed). Threads: “RTX 3090 in 2026,” “Talk me out of buying RTX 3090 just for local AI” | Active |
| r/LocalLLaMA Mac mini M4 Pro sentiment | Practitioner demand signal | +30 (bullish). Thread “Mac Mini looks compelling now” had 911 upvotes | Active |
| eBay sold listings (RTX 3090) | Real transaction price discovery | $600–900 range; median ~$720 | Periodic |
Priority sensor to build: a hardware price alert monitor for specific SKUs on Carousell SG/HK and Xianyu. Target implementation: a lightweight Python scraper that runs daily, extracts listings for RTX 3090, RTX 3090 Ti, A40 48GB, and M2 Ultra 192GB, and writes to a time-series store. Alert on: median price crossing the buy threshold upward (edge is closing) or downward (buying opportunity).
# Sensor spec: secondhand_compute_price_monitor
SKUs = [
"RTX 3090", # buy_threshold: $800
"RTX 3090 Ti", # buy_threshold: $900
"A40 48GB", # buy_threshold: $2,200
"Mac Studio M2 Ultra 192GB", # buy_threshold: $2,000
"V100 32GB", # buy_threshold: $800
]
Sources = ["carousell_sg", "carousell_hk", "xianyu", "ebay_sold"]
Alerts = {
"below_threshold": "BUY SIGNAL — price below no-lose threshold",
"above_exit": "EXIT SIGNAL — price above model price, edge closing",
"volume_spike": "SUPPLY SIGNAL — unusual listing volume (impending oversupply)"
}
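A minimal check implementing the alert logic might look like the following sketch. The listing prices are placeholder inputs the scraper would supply, the thresholds mirror the spec's comments, and the volume-spike alert is omitted since it needs listing counts rather than prices:

```python
from statistics import median
from typing import Optional

# sku: (buy_threshold, model_price_upper) per the spec above
THRESHOLDS = {
    "RTX 3090":    (800, 1200),
    "RTX 3090 Ti": (900, 1400),
}

def check_alerts(sku: str, listing_prices: list) -> Optional[str]:
    buy, exit_ = THRESHOLDS[sku]
    m = median(listing_prices)
    if m < buy:
        return f"BUY SIGNAL - {sku} median ${m:.0f} below ${buy}"
    if m > exit_:
        return f"EXIT SIGNAL - {sku} median ${m:.0f} above ${exit_}"
    return None

print(check_alerts("RTX 3090", [650, 720, 780]))  # median $720 -> BUY
```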
The mispricing is real and currently actionable on two SKUs. RTX 3090 and M2 Ultra 192GB are priced by their respective markets as consumer electronics; they function as AI inference infrastructure. The structural exclusion of institutional buyers from the secondhand consumer market is durable and not arbitrageable by parties who would close the gap.
The conditions matter. This is not “buy any used GPU.” It is: (1) buy RTX 3090 or 3090 Ti below $800–$900 if you have genuine heavy inference consumption today; (2) buy M2 Ultra 192GB below $2,000 if you need to run 70B+ models at quality and have use for it; (3) do not buy RTX 4090 — the edge is fully priced; (4) the A40 is speculative but high-upside for buyers who can manage the passive cooler constraint.
Exit strategy is as important as entry. The asset is illiquid. The consumer electronics resale floor provides downside protection — use it. Monitor the three thesis invalidation signals monthly: API price collapse, model VRAM floor rising past 24GB, and Carousell price drift past buy threshold. When the edge closes, exit at consumer electronics price — you have lost nothing. When the thesis holds, you have run inference infrastructure for the cost of a gaming card.