- ai-implementation
- ai-maintenance
- ai-cost
- production-ai
- software-ownership
Full analysis
The question your team is asking when evaluating AI integration is usually one of these: Is this AI good enough to do the task? How much will it cost to build? Will our users find it useful?
These are reasonable questions. They are also the second, third, and fourth questions. The first question — the one almost nobody asks before committing — is this: Who maintains this feature in month 18, and do we have the budget and expertise to give them what they need?
If you don't have a clear answer, you don't have a plan to add AI. You have a plan to take on a new category of technical debt.
Why AI Features Have a Different Cost Structure
Traditional software has a property that makes long-term maintenance relatively predictable: determinism. Given the same inputs, the same code produces the same outputs. The cost of operating traditional software after delivery is mostly infrastructure plus labor for new features and bug fixes. These costs are well understood and have decades of estimation heuristics behind them.
AI-powered features break this predictability in four specific ways.
Prompt Maintenance
Language model behavior is sensitive to the exact phrasing of prompts. This sensitivity is not static — it changes as models are updated. A prompt that produces consistent, high-quality output today may produce noticeably different output after the next model update, even on the same model name. The model provider considers this normal. Your users consider it your problem. This means AI features require active monitoring and periodic prompt revision in response to model changes. Traditional code does not have this property.
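One way to make prompt monitoring concrete is a small regression suite that replays a fixed set of inputs and checks each output for required phrases. The sketch below is illustrative only: `GOLDEN_CASES`, `PROMPT_TEMPLATE`, and `fake_model` are hypothetical names, and `fake_model` stands in for whatever provider call a real feature would make.

```python
from typing import Callable

# Hypothetical golden cases: an input plus phrases the output must contain.
GOLDEN_CASES = [
    {"input": "Refund request for order 1001", "must_contain": ["refund"]},
    {"input": "Where is my package?", "must_contain": ["tracking"]},
]

# Hypothetical prompt under test.
PROMPT_TEMPLATE = "Classify the customer message and reply briefly.\nMessage: {message}"


def run_prompt_regression(call_model: Callable[[str], str]) -> list[dict]:
    """Return the golden cases whose output is missing a required phrase."""
    failures = []
    for case in GOLDEN_CASES:
        output = call_model(PROMPT_TEMPLATE.format(message=case["input"]))
        missing = [
            phrase
            for phrase in case["must_contain"]
            if phrase.lower() not in output.lower()
        ]
        if missing:
            failures.append(
                {"input": case["input"], "missing": missing, "output": output}
            )
    return failures


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: no provider SDK is used here)."""
    if "Refund" in prompt:
        return "We have started your refund."
    return "Thanks for reaching out."


if __name__ == "__main__":
    for failure in run_prompt_regression(fake_model):
        print(f"{failure['input']!r} is missing {failure['missing']}")
```

Running a suite like this on a schedule, and again whenever the provider announces an update, turns "the output changed" from a user complaint into a failing check someone owns.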
Model Deprecation Cycles
AI model providers deprecate models on defined cycles. When your feature depends on a deprecated model, you face a migration on the vendor's schedule: test the new model, revise prompts, validate output quality, redeploy. This is not a hypothetical future cost — it is a scheduled one, predictable from the moment you commit to an AI-dependent feature.
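Because the migration is scheduled, it can be tracked like any other deadline. The following is a minimal sketch, assuming the team keeps its own inventory of models in use and their announced retirement dates; the model names, dates, and the 90-day lead time are placeholders, not real vendor data.

```python
from datetime import date

# Hypothetical inventory: names and retirement dates are placeholders.
MODEL_RETIREMENTS = {
    "example-model-v1": date(2025, 6, 30),
    "example-model-v2": date(2026, 3, 31),
}

# Assumed lead time needed to test, revise prompts, validate, and redeploy.
MIGRATION_LEAD_DAYS = 90


def models_needing_migration(
    today: date, lead_days: int = MIGRATION_LEAD_DAYS
) -> list[tuple[str, int]]:
    """Return (model, days_left) pairs inside the lead window, most urgent first."""
    due = []
    for model, retires_on in MODEL_RETIREMENTS.items():
        days_left = (retires_on - today).days
        if days_left <= lead_days:
            due.append((model, days_left))
    return sorted(due, key=lambda item: item[1])


if __name__ == "__main__":
    for model, days_left in models_needing_migration(date.today()):
        print(f"{model}: {days_left} days until retirement")
```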
Behavior Monitoring
Traditional software either works or it doesn't. When it doesn't, it usually fails visibly. AI-powered features fail differently: outputs that are subtly wrong, inconsistently wrong, or wrong in ways users don't immediately notice and report. You cannot rely on user-reported errors to maintain quality. You need proactive output monitoring — someone or something that evaluates a sample of AI outputs regularly against a defined standard.
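Proactive monitoring can start as simply as scoring a random sample of recent outputs against a defined standard. The sketch below assumes outputs are already logged as strings and that the standard can be expressed as a function; `sample_pass_rate`, `needs_review`, and the 0.95 threshold are illustrative choices, and in practice the check might be a human reviewer or an automated evaluator.

```python
import random
from typing import Callable, Optional, Sequence


def sample_pass_rate(
    outputs: Sequence[str],
    meets_standard: Callable[[str], bool],
    sample_size: int,
    seed: Optional[int] = None,
) -> float:
    """Score a random sample of outputs and return the fraction that pass."""
    if not outputs:
        raise ValueError("no outputs to sample")
    rng = random.Random(seed)
    # Sample without replacement; cap at the number of outputs available.
    sample = rng.sample(list(outputs), min(sample_size, len(outputs)))
    passed = sum(1 for text in sample if meets_standard(text))
    return passed / len(sample)


def needs_review(pass_rate: float, threshold: float = 0.95) -> bool:
    """Flag the feature for attention when quality drops below the threshold."""
    return pass_rate < threshold


if __name__ == "__main__":
    logged = ["Your order ships Monday."] * 8 + [""] * 2
    # Assumed standard for illustration: the output must be non-empty.
    rate = sample_pass_rate(logged, lambda text: bool(text.strip()), sample_size=10)
    print(f"pass rate {rate:.2f}, needs review: {needs_review(rate)}")
```

The useful part is not the code but the commitment it implies: someone has to define `meets_standard`, choose the threshold, and act when the check fails.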
Cost Volatility from Usage Scaling
AI API costs are typically per-token: every input and output is metered. A 10x increase in AI interactions produces something close to a 10x increase in AI infrastructure costs. This creates budget volatility that traditional software budgets don't prepare teams for. Most traditional software costs don't scale this linearly with usage.
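The near-linear scaling is easy to see with a back-of-the-envelope projection. The sketch below assumes per-token pricing quoted per million tokens; the interaction counts, token counts, and prices are placeholder numbers for illustration, not any vendor's actual rates.

```python
def monthly_api_cost(
    interactions: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate monthly spend from average token counts per interaction."""
    cost_per_interaction = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    return interactions * cost_per_interaction


if __name__ == "__main__":
    # Placeholder assumptions: 1,500 input and 500 output tokens per interaction,
    # priced at $3 and $15 per million tokens respectively.
    for interactions in (50_000, 500_000):
        cost = monthly_api_cost(interactions, 1_500, 500, 3.0, 15.0)
        print(f"{interactions:>7} interactions/month: ${cost:,.2f}")
```

Under these assumed numbers, each interaction costs about $0.012, so 50,000 interactions a month is roughly $600 and 500,000 is roughly $6,000. The bill follows usage directly, with no fixed-capacity plateau of the kind a provisioned server offers.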
When This Doesn't Apply
This argument is not a case against using AI. It applies less strongly to three kinds of use: single-use or batch workflows where a human reviews every output before it is acted on (if the AI is producing drafts for human editing, the editor catches prompt drift); internal tools where behavior variance is acceptable (an AI-assisted internal search that sometimes surfaces less-relevant results is an annoyance, not the kind of risk a customer-facing feature with quality problems carries); and time-limited experimental uses with defined evaluation criteria and a built-in decision point.
The Better Question to Ask First
Before evaluating whether AI is capable enough to do the task, answer this with specificity: After this feature ships, who checks that it's working correctly next month? And the month after? And when the model provider releases an update that changes its behavior?
If the honest answer is "we'll deal with that later," the feature cost in your proposal is missing its largest line item. If the honest answer is "nobody, because we're assuming it will just work," the feature will probably work fine for the first few months, the period when your team is still paying attention and the model hasn't been updated yet. Month 18 is a different conversation.
The question is not whether to use AI. The question is whether you are prepared to maintain an AI-powered feature at a production standard for the full period you intend to use it. Most decisions to add AI answer the first question while quietly ignoring the second.
Name the maintainer before you name the model. If you can't name the maintainer, you have more planning to do.
What is the "month 18 question" for AI features?
Before committing to build an AI-powered feature, ask: who maintains this feature in month 18, and does the organization have the budget and expertise to support them? This question exists because AI features have ongoing maintenance requirements — prompt revisions when model behavior changes, forced migrations when models are deprecated, proactive quality monitoring — that traditional software does not. Most teams evaluate AI features on capability and build cost, treating maintenance as an afterthought. The month 18 question surfaces the commitment that is actually being made.
Why do AI features require ongoing maintenance when traditional software does not?
Traditional software is deterministic: the same inputs always produce the same outputs. AI-powered features are not. Language model behavior is sensitive to prompt phrasing and changes as models are updated, so prompts that work today may produce different output after a model update, even when the model name stays the same. Models are also deprecated on defined cycles, which forces migrations on the vendor's schedule. And AI failures are often subtle rather than visible, requiring proactive quality monitoring that traditional software does not need.
Are there AI use cases where the maintenance burden is lower?
Yes. The maintenance burden is lower for: single-use or batch workflows where a human reviews every output before it is acted on (prompt drift is caught by the editor, not a monitoring system); internal tools where behavior variance is acceptable (an AI search that occasionally surfaces less-relevant results is annoying, not a production risk); and time-limited experimental uses with defined evaluation criteria and a built-in decision point to continue or stop. The maintenance argument applies most strongly to persistent, customer-facing AI features running in production without human review of individual outputs.