🏪 Feature Stores: Do You Actually Need One?
Because sometimes a fancy solution is just a glorified CSV.

Gradient Descent Weekly — Issue #22
“We should use a feature store!”
— someone who just read an Uber blog post.
Hold up.
Before you spin up Feast, Tecton, or Vertex AI Feature Store, or write your own mini Frankenstein store on S3 + Redis + Duct Tape™, ask the only question that matters:
Do you really need a feature store, or are you just chasing architecture diagrams?
In this issue, we’ll:
Explain what a feature store is (minus the marketing)
Break down real use cases
Expose hidden costs
Help you decide when it’s overkill
🧠 What Is a Feature Store (Really)?
A feature store is a system for:
Creating features (centralized, versioned transformation logic)
Storing them (offline in data warehouses, online in key-value stores)
Serving them (real-time inference or batch training)
Sharing them (across teams and models)
Tracking their lineage and freshness
🚨 It is not just a database. It’s the “data platform for ML features.”
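To make those five jobs concrete, here is a toy in-memory sketch. It is not how Feast or Tecton work internally; `ToyFeatureStore` and all of its method names are made up for illustration, and a real store would back the two containers with a warehouse and a key-value database.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]


@dataclass
class ToyFeatureStore:
    """Hypothetical, illustrative only. Everything lives in process memory."""

    # Creating: feature name -> transformation function
    transforms: Dict[str, Callable[[Row], float]] = field(default_factory=dict)
    # Storing (offline): append-only history of (entity_id, timestamp, features)
    offline: List[Tuple[str, float, Row]] = field(default_factory=list)
    # Storing (online): latest (timestamp, features) per entity
    online: Dict[str, Tuple[float, Row]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[Row], float]) -> None:
        self.transforms[name] = fn

    def ingest(self, entity_id: str, raw: Row, ts: Optional[float] = None) -> None:
        ts = time.time() if ts is None else ts
        features = {name: fn(raw) for name, fn in self.transforms.items()}
        self.offline.append((entity_id, ts, features))
        # Toy simplification: the most recent ingest always wins.
        self.online[entity_id] = (ts, features)

    def get_online(self, entity_id: str) -> Row:
        """Serving: latest features for one entity (real-time inference)."""
        return self.online[entity_id][1]

    def get_training_rows(self) -> List[Row]:
        """Serving: full history (batch training)."""
        return [features for _, _, features in self.offline]

    def staleness_seconds(self, entity_id: str, now: Optional[float] = None) -> float:
        """Tracking freshness: age of the online value."""
        now = time.time() if now is None else now
        return now - self.online[entity_id][0]


store = ToyFeatureStore()
store.register("avg_order_value", lambda r: r["total_spend"] / max(r["order_count"], 1))
store.ingest("user_1", {"total_spend": 120.0, "order_count": 4}, ts=1000.0)
store.ingest("user_1", {"total_spend": 200.0, "order_count": 5}, ts=2000.0)
print(store.get_online("user_1"), len(store.get_training_rows()))
```

If this is all you need, a dict and a list may honestly be enough. The real products earn their keep on the parts this sketch skips: durability, scale, access control, and a shared catalog.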
🧩 Why Feature Stores Exist
ML teams kept reinventing the same wheels:
Duplicated feature logic across training and serving
Different pipelines producing “same” features with slight inconsistencies
No governance or catalog of what features exist
Huge delays between data scientists creating features and engineers productionizing them
So feature stores said:
Let’s centralize this mess.
And it works — but only when the mess is big enough.
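Here is roughly what "slight inconsistencies" look like in practice. This is a made-up example, not taken from any real pipeline: two hand-written versions of the "same" spend feature that agree on normal inputs and quietly disagree on missing values. The function names and the `find_skew` helper are assumptions for illustration.

```python
import math
from typing import Callable, List, Optional


def spend_feature_training(total_spend: Optional[float]) -> float:
    # Batch pipeline (hypothetical): missing spend is treated as 0.
    spend = 0.0 if total_spend is None else total_spend
    return math.log1p(spend)


def spend_feature_serving(total_spend: Optional[float]) -> float:
    # Serving path (hypothetical), rewritten by hand months later:
    # missing spend becomes a -1 sentinel instead.
    if total_spend is None:
        return -1.0
    return math.log(total_spend + 1.0)


def find_skew(
    samples: List[Optional[float]],
    train_fn: Callable[[Optional[float]], float],
    serve_fn: Callable[[Optional[float]], float],
    tol: float = 1e-9,
) -> List[Optional[float]]:
    """Return the inputs on which the two implementations disagree."""
    return [
        x for x in samples
        if not math.isclose(train_fn(x), serve_fn(x), abs_tol=tol)
    ]


samples = [0.0, 10.0, 250.0, None]
mismatches = find_skew(samples, spend_feature_training, spend_feature_serving)
print(mismatches)
```

A parity check like this is cheap to run in CI, and it catches the most common kind of train/serve skew without any new infrastructure.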
✅ Signs You Probably Need a Feature Store
| Signal | Description |
| 🧑‍🤝‍🧑 You have multiple ML teams | Prevent reinventing features across org |
| ⚙️ You deploy real-time ML models | Need low-latency online feature serving |
| 🔁 You retrain models frequently | Want consistent train/serve feature logic |
| 📚 You have 100s of features in prod | Need search, reuse, and documentation |
| 🔄 Features get recomputed constantly | Freshness tracking matters |
| 🧵 Compliance matters (HIPAA, GDPR, etc.) | Lineage + audit trail is essential |
💡 In these cases, a feature store prevents chaos and speeds up iteration.
❌ Signs You Should Not Use One (Yet)
| Signal | Description |
| 👤 You’re a solo or small team | The overhead outweighs benefits |
| 🧪 You have only a few models | No ROI on centralizing infra |
| 🗃️ You train monthly on static data | Batch pipeline is good enough |
| 🧼 Your features are simple/flat | SQL or Pandas covers it fine |
| 📦 You don’t need real-time serving | No need for online store at all |
🔍 In these cases, you're better off with:
A well-structured features.py module
Versioned datasets in S3 or Delta Lake
A README with data lineage
You’ll be faster, leaner, and more in control.
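A well-structured features.py could look something like the sketch below. The feature names, the `feature` decorator, and `FEATURE_SET_VERSION` are all assumptions for illustration; the point is only that training and serving import the same functions.

```python
"""features.py: one place for feature logic, imported by training and serving."""
from typing import Callable, Dict, List

# Registry of feature name -> function. Doubles as a tiny catalog.
FEATURES: Dict[str, Callable[[dict], float]] = {}

# Bump by hand whenever any feature's logic changes (hypothetical convention).
FEATURE_SET_VERSION = "v1"


def feature(fn: Callable[[dict], float]) -> Callable[[dict], float]:
    """Register a feature function under its own name."""
    FEATURES[fn.__name__] = fn
    return fn


@feature
def avg_order_value(row: dict) -> float:
    # Guard against division by zero for customers with no orders.
    return row["total_spend"] / max(row["order_count"], 1)


@feature
def is_new_customer(row: dict) -> float:
    return 1.0 if row["days_since_signup"] < 30 else 0.0


def compute_row(row: dict) -> Dict[str, float]:
    """Serving path: featurize a single record."""
    return {name: fn(row) for name, fn in FEATURES.items()}


def compute_batch(rows: List[dict]) -> List[Dict[str, float]]:
    """Training path: featurize many records with the exact same functions."""
    return [compute_row(r) for r in rows]


example = {"total_spend": 90.0, "order_count": 3, "days_since_signup": 10}
print(compute_row(example))
```

Because `compute_batch` just calls `compute_row`, the two paths cannot drift apart, which is most of what train/serve parity means at small scale.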
💸 The Hidden Costs of Feature Stores
| Category | Cost |
| 🧠 Learning curve | Tools like Feast and Tecton aren’t plug & play |
| ⏳ DevOps overhead | Need infra for offline + online stores |
| 📈 Complexity | One more moving piece in your MLOps pipeline |
| 🧾 Vendor lock-in | Some solutions don’t port well between clouds |
| 🧪 Test burden | You now test feature pipelines, not just models |
Feature stores are infra. They need care and feeding.
🍱 What Feature Stores Actually Manage
| Component | Examples |
| Offline Store | BigQuery, Snowflake, S3, Delta Lake |
| Online Store | Redis, DynamoDB, Cassandra |
| Transformations | Python, Spark, SQL, dbt |
| Metadata/Registry | Catalog of features, freshness, owners |
| Feature Serving API | Real-time REST/gRPC calls to serve features |
You don’t need all of these. Start small if needed.
⚖️ DIY vs SaaS Feature Stores
| Option | Pros | Cons |
| 🛠 DIY (Feast + Redis) | Cheap, customizable | Needs setup and ongoing ops; built-in UI is basic |
| ☁️ SaaS (Tecton, Hopsworks) | Scalable, integrated | Costly, learning curve |
| 🧾 Hybrid (dbt + S3 + notebook) | Minimal setup | No real-time capabilities |
Start with what fits your stack and scale — not what sounds coolest.
✅ Decision Checklist: Do You Need a Feature Store?
Do you share features across 3+ models or teams?
Do you need consistent train/serve parity?
Do you serve predictions in real-time?
Do you track feature freshness or staleness?
Do you want to reuse features without rewriting logic?
If you checked 3 or more — explore lightweight solutions like Feast or Feathr.
If you checked 1–2 — stay with your custom Python/SQL pipelines for now.
If you checked 0 — congrats, you avoided accidental complexity.
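If you would rather paste the checklist into a team doc or a script, the scoring rule above fits in a few lines. The key names below are invented shorthand for the five questions; the thresholds are the ones from this issue.

```python
from typing import Dict

# Shorthand keys for the five checklist questions (names are illustrative).
CHECKLIST = (
    "share_features_across_3plus_models_or_teams",
    "need_train_serve_parity",
    "serve_predictions_in_real_time",
    "track_feature_freshness",
    "reuse_features_without_rewriting",
)


def recommend(answers: Dict[str, bool]) -> str:
    """Map checklist answers to the recommendation from this issue."""
    unknown = set(answers) - set(CHECKLIST)
    if unknown:
        raise KeyError(f"unknown checklist items: {sorted(unknown)}")
    # Unanswered questions count as "no".
    score = sum(bool(answers.get(item, False)) for item in CHECKLIST)
    if score >= 3:
        return "explore a lightweight feature store such as Feast or Feathr"
    if score >= 1:
        return "stay with custom Python/SQL pipelines for now"
    return "no feature store needed"


print(recommend({"need_train_serve_parity": True, "track_feature_freshness": True}))
```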
🧠 Final Thoughts: Build For Your Size
Feature stores solve real pain.
But introducing them before the pain exists? That’s just self-inflicted complexity.
Don’t optimize for Uber’s scale if you’re not Uber.
Instead:
Start with modular, reusable feature scripts
Use notebooks + MLflow to record which data and feature versions each run used
Add versioning manually
Scale up when growth makes it too painful not to
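"Add versioning manually" can be as simple as fingerprinting the training file and writing a small manifest next to it. This is one possible sketch using only the standard library; the manifest layout, the file names, and both function names are assumptions, not a standard.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def dataset_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(dataset: Path, feature_version: str, out: Path) -> dict:
    """Record which data and which feature code version a training run used."""
    manifest = {
        "dataset": dataset.name,
        "sha256": dataset_fingerprint(dataset),
        "feature_version": feature_version,  # e.g. a git tag or a manual "v1"
    }
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_bytes(b"user_id,total_spend\n1,120.0\n")
    manifest = write_manifest(data, feature_version="v1", out=Path(tmp) / "manifest.json")
    print(manifest["dataset"], manifest["sha256"][:12])
```

Commit the manifest alongside the model artifact (or log it to your experiment tracker) and you can answer "what data trained this model?" without any feature store at all.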
You don’t win awards for the fanciest infra.
You win when models ship, stay live, and drive impact.
🔮 Up Next on Gradient Descent Weekly:
- RAG vs Fine-Tuning: Which One Is Right for You?






