Most teams approach AI data enrichment as an either/or decision. They pick HubSpot Breeze or an external LLM, push it across their entire contact base, and wonder why the results are patchy. The answer is not a better tool. It's a better split.
HubSpot Breeze and external LLMs are built for fundamentally different signals. Breeze has something no third-party vendor can replicate: access to everything inside your CRM. External LLMs can do something Breeze cannot do reliably: search the open web. Run them in parallel, with clear ownership for each, and you get coverage neither can deliver alone.
This framework comes from Gareth Jones, Managing Partner at Oxygen Strategic Partners, who presented on intent signals at a recent HubSpot User Group session.
What Breeze Actually Does Well: First-Party Data Enrichment
Synthesising Your Conversation History
Breeze's real advantage is access. No external vendor, however large its database, can read your meeting transcripts, call recordings, email threads, deal notes, or CRM activity history. Breeze can. Launched at INBOUND 2024, Breeze Intelligence draws on a database of over 200 million buyer and company profiles for standard enrichment. The more interesting capability is what it can infer from your own first-party conversation data: buying role, preferred communication style, primary interest topic, hesitation signals, and other qualitative contact-level context that no third-party source could reconstruct.
This makes Breeze genuinely strong for teams running active sales cycles where the CRM contains real conversation history. If a contact has been through three discovery calls, two demos, and a handful of email exchanges, Breeze can synthesise that into structured properties. That synthesis is not something you can approximate by pointing an external LLM at a LinkedIn profile.
As Gareth put it: "Anything that lives in HubSpot, I would say just use Breeze."
Technical Limitations Worth Knowing
Breeze currently processes transcript text, not audio. It is not fully multimodal. That means it can detect tone through word choice, such as hedging language, enthusiasm, or reluctance, but it cannot pick up on how something was said (pace, pauses, vocal emphasis), which matters for signals that depend heavily on delivery rather than wording.
WhatsApp conversations are also not yet a supported data source, though that is expected to change. For teams in APAC and the GCC who rely heavily on WhatsApp for client communication, this is a real gap in the current implementation. If your business depends on messaging platforms for client relationships, it is worth factoring this into how you structure your CRM data capture in the meantime.
Smart Properties: The Underused Capability Most Teams Miss
Beyond the standard enrichment fields, Breeze supports AI-generated custom properties, called smart properties, that can pull from any activity on a HubSpot object. Most teams use Breeze for contact and company summaries and stop there. Smart properties let you go further: close and loss reasons pulled from deal activity, qualitative consultant performance signals from delivery call transcripts, preferred channel analysis, and hesitation patterns across a buying cycle.
Each smart property costs 10 HubSpot credits to populate, roughly one cent on Pro, where 5,000 credits are included; Enterprise includes 10,000. Selective deployment across your highest-value segments is a rounding error in most CRM budgets.
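As a rough illustration of that arithmetic, the sketch below works out how many smart property populations each included allowance would cover. The function and constant names are hypothetical, and it assumes every population costs a flat 10 credits, as quoted above; it says nothing about what extra credits cost.

```python
# Figures quoted in this article; treat them as assumptions, not pricing guidance.
CREDITS_PER_SMART_PROPERTY = 10
INCLUDED_CREDITS = {"Pro": 5_000, "Enterprise": 10_000}


def populations_covered(included_credits: int,
                        cost_per_population: int = CREDITS_PER_SMART_PROPERTY) -> int:
    """Return how many smart property populations an allowance covers."""
    if cost_per_population <= 0:
        raise ValueError("cost_per_population must be positive")
    return included_credits // cost_per_population


for tier, credits in INCLUDED_CREDITS.items():
    print(f"{tier}: {populations_covered(credits)} populations")
```

On those figures, the included allowance covers 500 populations on Pro and 1,000 on Enterprise, which is a useful ceiling when deciding which segments get smart properties first.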
Why Adoption Is Low
The practical constraint is workflow: smart properties currently have to be created through the AI prompt interface rather than the standard property builder, which means your CRM admin needs to know they exist and where to find them. They are not prominently surfaced, and that is probably why adoption is low even among experienced HubSpot users.
The use cases extend beyond sales. If you are running post-implementation reviews or delivery calls with clients, Breeze can surface qualitative performance signals from those transcripts at the consultant or account level. For professional services firms or agencies managing complex multi-stakeholder engagements, that kind of structured qualitative data is otherwise entirely invisible in the CRM.
Where External LLMs Outperform Breeze
Signals That Require the Open Web
Breeze cannot search the web reliably. Its native web research capability exists, but one HubSpot User Group (HUG) attendee described it as "a bit finicky," and Gareth confirmed that characterisation. That is why Oxygen routes all web-search enrichment tasks through Gemini externally. For horizontal signals (firmographic data that applies across industries) and vertical signals (sector-specific intelligence), external LLMs are the better tool.
The signals that fall into this category include hiring intent derived from job postings, M&A activity, regulatory certifications, export capacity, technology stack changes, and content freshness on a company's digital presence. None of this lives in your CRM. All of it requires open-web access and, often, synthesis across multiple sources. An LLM with web search enabled, whether Gemini, GPT-4o with browsing, or Claude with appropriate tooling, can pull and structure this faster and more cost-effectively than a human researcher, and at the scale needed for meaningful segmentation.
Choosing the Right Model
API token pricing across Claude, GPT, and Gemini is roughly equivalent on current-generation models. LLM selection for external enrichment should be driven by output quality for the specific signal type you are targeting, not by cost. Test with a sample of 20 to 50 records before committing to a prompt design or a specific model for a given signal type.
Gareth's current approach: "Right now we're 100% on horizontal and vertical signals using LLMs, and 100% HubSpot Breeze on the proprietary data." That clean split is the right starting point. Refinement comes from testing, not from assumptions about which model is generically better.
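One way to make that split explicit is a small ownership map that every enrichment job consults before it runs. This is a minimal sketch, not Oxygen's implementation: the signal names are hypothetical labels based on the examples in this article, and the `Owner` enum and `owner_for` function are invented for illustration.

```python
from enum import Enum


class Owner(Enum):
    BREEZE = "breeze"
    EXTERNAL_LLM = "external_llm"


# Hypothetical signal labels drawn from the examples in this article.
FIRST_PARTY_SIGNALS = {
    "buying_role", "communication_style", "hesitation_signals", "close_loss_reason",
}
OPEN_WEB_SIGNALS = {
    "hiring_intent", "m_and_a_activity", "regulatory_certifications",
    "export_capacity", "tech_stack_changes", "content_freshness",
}


def owner_for(signal: str) -> Owner:
    """Return which tool owns a signal; fail loudly if nobody does."""
    if signal in FIRST_PARTY_SIGNALS:
        return Owner.BREEZE
    if signal in OPEN_WEB_SIGNALS:
        return Owner.EXTERNAL_LLM
    raise KeyError(f"No owner assigned for signal {signal!r}")


print(owner_for("hiring_intent").value)
print(owner_for("buying_role").value)
```

Raising on an unknown signal is deliberate: a new signal should be assigned an owner before anyone enriches it, rather than defaulting silently to one tool.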
The APAC and GCC Data Quality Reality
Match Rates by Region
If your contacts are primarily based in the US or Western Europe, the Breeze free enrichment tier, which covers approximately 13 standard properties, will return useful data at a 95% or higher match rate. For APAC-based contacts, expect around 50%. That is a significant improvement on the 15% match rate Gareth observed roughly a year ago, and it reflects how quickly HubSpot is expanding its underlying data coverage. It still means about half your APAC contacts will come back empty on standard enrichment fields.
What This Means in Practice
The free Breeze enrichment pass is still worth running across your full APAC and GCC contact base. At zero marginal cost, even a 50% match rate delivers value. But it is not a complete solution, and you need a parallel process for the contacts it cannot cover. That is where external LLMs and specialist data vendors earn their place.
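A minimal sketch of that parallel process, assuming you have exported contacts after the Breeze pass as a list of dictionaries: it separates contacts that came back with at least one enrichment field from those that came back empty, so the second group can be queued for an external pass. The field names and helper functions are hypothetical; substitute the properties you actually enrich.

```python
from typing import Iterable

# Hypothetical field names standing in for the standard enrichment properties.
ENRICHMENT_FIELDS = ("jobtitle", "company", "industry")


def needs_fallback(contact: dict, fields: Iterable[str] = ENRICHMENT_FIELDS) -> bool:
    """True when every enrichment field is missing or blank."""
    return all(not str(contact.get(f) or "").strip() for f in fields)


def split_by_match(contacts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (matched, unmatched) contacts after the first enrichment pass."""
    matched = [c for c in contacts if not needs_fallback(c)]
    unmatched = [c for c in contacts if needs_fallback(c)]
    return matched, unmatched


def match_rate(contacts: list[dict]) -> float:
    """Share of contacts with at least one enrichment field populated."""
    if not contacts:
        return 0.0
    matched, _ = split_by_match(contacts)
    return len(matched) / len(contacts)


sample = [
    {"email": "a@example.com", "jobtitle": "CFO", "company": "Acme"},
    {"email": "b@example.com", "jobtitle": "", "company": None},
    {"email": "c@example.com", "industry": "Logistics"},
    {"email": "d@example.com"},
]
matched, unmatched = split_by_match(sample)
print(match_rate(sample), [c["email"] for c in unmatched])
```

Measuring the match rate per region on your own export is also a quick way to check whether the figures quoted here hold for your contact base.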
For European contacts, the gap between Breeze and external alternatives is much smaller. Breeze likely covers most of what you need without supplementary tooling.
This regional variation also affects how you think about CRM strategy more broadly. A global HubSpot deployment that performs well in London or Chicago will behave differently when the contact base shifts to Hong Kong, Shenzhen, or Dubai, with knock-on effects for enrichment match rates, data structure, communication channels, and localisation requirements. Our guide to thinking locally within a global CRM strategy covers that broader challenge in detail.
How to Structure the Decision in Practice
The simplest way to operationalise this is two separate enrichment passes with clear ownership for each.
Pass one: Breeze
Run the free enrichment across all contacts and companies, then deploy smart properties on the segments where first-party conversation data is richest. This costs nothing on the standard enrichment side and roughly a cent per smart property populated.
Pass two: External LLMs
Use these for signals that require web research, structured as batch API calls against your CRM export or directly via HubSpot's workflow integrations. Before choosing a model, test against a representative sample of 30 to 50 records that reflect your typical customer profile. Run the same prompt across Gemini and GPT-4o and evaluate the outputs on accuracy, parsability, and relevance to the specific signal you are trying to extract.
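The model comparison can be scored with a small harness like the one sketched below. It assumes you have already collected each model's raw responses for the sample and hand-checked the correct answer for each record; no API calls are made here. The output schema (`hiring_intent`, `evidence`) and the function names are hypothetical, and relevance still needs a human read of the outputs.

```python
import json

# Hypothetical output schema the prompt asks each model to return as JSON.
REQUIRED_KEYS = {"hiring_intent", "evidence"}


def parse_output(raw: str):
    """Return the parsed dict if raw is a JSON object with the required keys, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def score_model(outputs: dict[str, str], labels: dict[str, bool]) -> dict[str, float]:
    """outputs: record id -> raw response; labels: record id -> hand-checked answer."""
    parsed = {rid: parse_output(raw) for rid, raw in outputs.items()}
    parsable = [rid for rid, data in parsed.items() if data is not None]
    correct = [
        rid for rid in parsable
        if rid in labels and parsed[rid]["hiring_intent"] == labels[rid]
    ]
    total = len(outputs)
    return {
        "parsability": len(parsable) / total if total else 0.0,
        "accuracy": len(correct) / total if total else 0.0,
    }


outputs = {
    "a": '{"hiring_intent": true, "evidence": "3 open sales roles"}',
    "b": "Sorry, I could not find anything.",
    "c": '{"hiring_intent": false, "evidence": ""}',
    "d": '{"hiring_intent": true}',
}
labels = {"a": True, "b": False, "c": True, "d": True}
print(score_model(outputs, labels))
```

Unparsable responses count against accuracy here, on the assumption that an answer you cannot write back to a CRM property is as unhelpful as a wrong one. Run the same scoring for each model and compare the results side by side.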
What Not to Optimise For
Cost should not drive model selection. At current API pricing, the difference between models is negligible for most enrichment volumes. Optimise for output quality.
Constellation Research's analysis of Breeze Intelligence frames the core value proposition as reducing the complexity and cost of managing disconnected enrichment point solutions. That holds, but it only fully applies to the first-party data use case. For most enterprise teams operating in APAC or GCC markets, Breeze and external LLMs are complements. The teams that treat them as a structured two-pass system will get more signal coverage, better data quality, and a cleaner implementation than either approach alone could deliver.