In 2023, we saw the fastest enterprise adoption ever of a new technology — Large Language Models (LLMs). The trickle-down of new technology from research and early adopters to enterprise scale typically takes years of validation and maturation. In the case of LLMs, this has played out at light speed. LLMs showed such incredible promise that virtually no enterprise wanted to be left behind in adopting them. So 2023 saw board mandates and increased AI budgets, which resulted in a large majority of teams trialing the technology and a handful making it to production.
During this time, LLM Operations, or LLMOps (analogous to MLOps and DevOps), has matured rapidly as enterprises get ready to scale their deployments. Enterprise deployments range from prompt-engineered approaches to Retrieval Augmented Generation (RAG) to fine-tuning or, sometimes, even training their own models. See Four Approaches to LLMOps.
The LLMOps “MOOD” Stack (Models, Observability, Orchestration, Data)
The early 2000s saw a rise in the standardization of websites and web applications with the emergence of the LAMP stack. LAMP referred to the four software technologies developers used to build websites and web applications — Linux (the operating system), Apache (the web server), MySQL (the database), and PHP (the programming language).
The LLMOps stack has similarly converged to a four-layered “MOOD” stack — Models, Observability, Orchestration, and Data — that LLM-powered apps are being built on.
- Data: The prompts and responses with their underlying embeddings, test sets, labeled data, and business and technical metadata are all generated by a labeling process and increasingly anchored in a vector database. Embeddings form the foundation of an LLMOps implementation, and these databases make querying embedding vectors easy. Entrants like Pinecone, DataStax, and Chroma are among the most popular.
- Models: LLM apps can use one or more models across proprietary and open source offerings. The industry has largely coalesced around five foundation model providers — OpenAI, Meta (Llama), Anthropic, Cohere, and Mistral. These models are either fine-tuned, augmented with RAG, or directly prompt engineered across all the big cloud LLM platforms, using a gamut of tooling for experimentation, tuning, and serving.
- Orchestration: Production LLM apps typically have to integrate their data, model, and business infrastructure workflows together through orchestration. LangChain and LlamaIndex are two of the most popular orchestration solutions amongst app developers. Enterprises can also have highly custom needs that warrant a homegrown solution.
- Observability: Finally, the Observability layer sits on top to provide governance, interpretability, and operational performance and risk visibility of the LLMs to multiple stakeholders across the enterprise. AI Observability solutions like Fiddler provide evaluation, production monitoring, analytics, and security support for comprehensive operational visibility. Enterprises are often using multiple LLMs simultaneously based on cost and criticality, mixing models across use cases or even within the same use case. As multi-model deployment becomes common, observability offers a single pane of glass for business owners and model developers.
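As a deliberately simplified illustration of how the Data, Orchestration, and Observability layers fit together, here is a minimal Python sketch. The `embed` function, `VectorStore` class, and log format are all hypothetical stand-ins, not any vendor's API — a production stack would use a real embedding model, a vector database such as Pinecone or Chroma, and an orchestration framework such as LangChain:

```python
import math
import time

# Toy embedding: a real system would call an embedding model API.
# This hash-style stand-in just produces a deterministic unit vector.
def embed(text: str, dim: int = 8) -> list:
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    # Vectors are pre-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """Data layer: minimal in-memory stand-in for a vector database."""
    def __init__(self):
        self.items = []  # list of (text, embedding) pairs

    def add(self, text: str):
        self.items.append((text, embed(text)))

    def query(self, question: str, k: int = 2):
        q = embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def answer(store: VectorStore, question: str, log: list) -> str:
    """Orchestration layer: retrieve context, assemble a prompt,
    and record basic observability data (latency, prompt size)."""
    start = time.perf_counter()
    context = store.query(question)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    # A real app would send `prompt` to an LLM here; we only log the call.
    log.append({
        "question": question,
        "latency_s": time.perf_counter() - start,
        "prompt_chars": len(prompt),
    })
    return prompt

store = VectorStore()
for doc in [
    "The MOOD stack layers are models, observability, orchestration, and data.",
    "Vector databases make querying embedding vectors easy.",
]:
    store.add(doc)

log = []  # Observability layer: one record per LLM call
prompt = answer(store, "What makes embedding queries easy?", log)
```

The point of the sketch is the shape of the stack, not the implementation: the store, the retrieval-and-prompt step, and the call log each map to a MOOD layer, and each would be swapped for a production-grade component in a real deployment.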
The LAMP stack powered an era of internet growth by bringing efficiency, flexibility, and support through its standardization. As more LLMs get deployed into production, the MOOD stack can bring similar efficiencies to LLM development.
Contact us to learn how the MOOD stack can help you operationalize LLMs.