<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Databricksters]]></title><description><![CDATA[Field-tested Databricks solutions from specialists who deployed them]]></description><link>https://www.databricksters.com</link><image><url>https://substackcdn.com/image/fetch/$s_!zPJJ!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff49ecae-7c56-403c-9389-61b28de6a50f_1280x1280.png</url><title>Databricksters</title><link>https://www.databricksters.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 07 May 2026 10:58:16 GMT</lastBuildDate><atom:link href="https://www.databricksters.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Soni]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[databricksters@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[databricksters@substack.com]]></itunes:email><itunes:name><![CDATA[Canadian Data Guy]]></itunes:name></itunes:owner><itunes:author><![CDATA[Canadian Data Guy]]></itunes:author><googleplay:owner><![CDATA[databricksters@substack.com]]></googleplay:owner><googleplay:email><![CDATA[databricksters@substack.com]]></googleplay:email><googleplay:author><![CDATA[Canadian Data Guy]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Multi-Genie Slack Bot: Production-Grade Conversational Analytics in Slack with Databricks Genie Spaces]]></title><description><![CDATA[Connect multiple Databricks Genie spaces to Slack with per-user authentication, intelligent routing, semantic caching, and audit logging &#8212; all deployed as a Databricks App.]]></description><link>https://www.databricksters.com/p/multi-genie-slack-bot-production</link><guid isPermaLink="false">https://www.databricksters.com/p/multi-genie-slack-bot-production</guid><dc:creator><![CDATA[Ambarish]]></dc:creator><pubDate>Fri, 01 May 2026 15:02:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5ne9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5ne9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5ne9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!5ne9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!5ne9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!5ne9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5ne9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5ne9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!5ne9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!5ne9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!5ne9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05a5f66a-7283-47b9-b2b7-279a26945862_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Connect multiple Databricks Genie spaces to Slack with per-user authentication, intelligent routing, semantic caching, and audit logging &#8212; all deployed as a Databricks App. A production-grade Slack bot that brings Databricks Genie directly into Slack, allowing users to ask natural-language questions across multiple data domains without leaving their workspace.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><em>Special Thanks</em></h2><p>I extend my sincere thanks to <em><strong>Sunil Patil</strong></em> (<a href="https://www.linkedin.com/in/pppsunil/">https://www.linkedin.com/in/pppsunil/</a>) for his invaluable help during the brainstorming phase. His insights were instrumental in solving the complex challenge of managing multiple genies across multiple Slack channels. This project definitely levelled up thanks to his help!</p><h1>Features</h1><ul><li><p>Multi-Genie Support: Route questions to the right Genie space based on channel, thread, AI classification</p></li><li><p>Per-User Authentication: Use OAuth 2.0 User-to-Machine with PKCE so queries run under each user&#8217;s Unity Catalog permissions</p></li><li><p>Threaded Conversations: Maintain Genie context across Slack threads</p></li><li><p>Semantic Cache: Reuse answers for similar questions using Databricks Vector Search</p></li><li><p>Delta-Aware Cache Invalidation: Automatically invalidate cached answers when underlying Delta tables change</p></li><li><p>Audit Logging: Track user activity, cache hits, latency, errors, and Genie usage in Delta tables</p></li><li><p>Reliable Genie Calls: Handle rate limits with retry, exponential backoff, and adaptive polling</p></li><li><p>Native Deployment: Run the entire application as a Databricks App with no external infrastructure</p></li></ul><h1>Architecture</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6d3_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6d3_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 424w, https://substackcdn.com/image/fetch/$s_!6d3_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 848w, https://substackcdn.com/image/fetch/$s_!6d3_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 1272w, https://substackcdn.com/image/fetch/$s_!6d3_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6d3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png" width="1456" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6d3_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 424w, https://substackcdn.com/image/fetch/$s_!6d3_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 848w, https://substackcdn.com/image/fetch/$s_!6d3_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 1272w, https://substackcdn.com/image/fetch/$s_!6d3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4984365d-ea4d-48d0-aff7-1f56863a02b5_1692x930.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The application runs as a Databricks App and uses two services:</p><ul><li><p>Slack Bolt in Socket Mode to listen for messages, mentions, button clicks, and threaded replies</p></li><li><p>Flask to handle OAuth callbacks and health checks</p></li></ul><p>When a user asks a question in Slack, the bot follows this flow:</p><p>1. Checks whether the Slack user has a valid Databricks OBO token</p><p>2. Routes the question to the correct Genie space</p><p>3. Looks for a semantically similar cached response</p><p>4. Validates cached answers against Delta table versions</p><p>5. Sends the question to Genie on cache miss</p><p>6. Formats the response in Slack with SQL, tables, and suggested follow-ups</p><p>7. Writes cache metadata and audit logs to Delta</p><h1>Prerequisites</h1><p>1. Databricks workspace</p><ul><li><p>Active Databricks workspace</p></li><li><p>One or more Genie spaces</p></li><li><p>SQL Warehouse</p></li><li><p>Unity Catalog permissions</p></li><li><p>Vector Search endpoint and index</p></li></ul><p>2. Slack App</p><ul><li><p>Slack workspace with admin access</p></li><li><p>Slack app with Socket Mode enabled</p></li><li><p>Bot token and app-level token</p></li></ul><p>3. OAuth App</p><ul><li><p>Databricks OAuth application configured for User-to-Machine authentication</p></li><li><p>Redirect URI pointing to the Databricks App callback endpoint</p></li></ul><p>4. Download the application code and make the necessary updates to the config values in app.yaml. Download all files from repo: <a href="https://github.com/adgitdemo/ad_databricks/tree/main/multi-genie-multi-slack-channel-app">https://github.com/adgitdemo/ad_databricks/tree/main/multi-genie-multi-slack-channel-app</a></p><h1>Setup Instructions</h1><h2>1. Slack App Setup</h2><ul><li><p>Create a Slack app and enable Socket Mode.</p></li><li><p>Under OAuth &amp; Permissions, add these bot token scopes:</p><ul><li><p>app_mentions:read</p></li><li><p>chat:write</p></li><li><p>im:history</p></li><li><p>im:read</p></li><li><p>im:write</p></li><li><p>channels:history</p></li><li><p>groups:history</p></li><li><p>users:read</p></li></ul></li><li><p>Subscribe to these bot events:</p><ul><li><p>app_mention</p></li><li><p>message.im</p></li><li><p>message.channels</p></li><li><p>message.groups</p></li></ul></li></ul><ul><li><p>Install the app to your workspace and save:</p><ul><li><p>Bot User OAuth Token</p></li><li><p>App-Level Token</p></li><li><p>Signing Secret</p></li></ul></li></ul><h2>2. Databricks OAuth Setup</h2><ul><li><p>Create a Databricks OAuth application for User-to-Machine authentication.</p></li><li><p>Configure the redirect URI to point to the Flask callback route exposed by the Databricks App.</p></li></ul><blockquote><p>The bot uses OAuth 2.0 with PKCE to securely link each Slack user to their Databricks identity. After authentication, the user&#8217;s pending question is automatically processed, so they do not need to re-type it.</p></blockquote><h2>3. Databricks Workspace Setup</h2><ul><li><p>Prepare the required Databricks resources:</p><ul><li><p>Genie spaces</p></li><li><p>SQL Warehouse</p></li><li><p>Cache Delta table</p></li><li><p>Audit Delta table</p></li><li><p>Vector Search endpoint</p></li><li><p>Vector Search index</p></li><li><p>Catalog and schema permissions</p></li></ul></li></ul><ul><li><p>Permission Grants Summary</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kjO1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kjO1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 424w, https://substackcdn.com/image/fetch/$s_!kjO1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 848w, https://substackcdn.com/image/fetch/$s_!kjO1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 1272w, https://substackcdn.com/image/fetch/$s_!kjO1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kjO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png" width="714" height="289" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:289,&quot;width&quot;:714,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/196065016?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kjO1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 424w, https://substackcdn.com/image/fetch/$s_!kjO1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 848w, https://substackcdn.com/image/fetch/$s_!kjO1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 1272w, https://substackcdn.com/image/fetch/$s_!kjO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dcfacee-51fb-4db4-974d-ed5726c4ce5e_714x289.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>4. Configure the Application</h2><ul><li><p>Update the application configuration with:</p><ul><li><p>Slack bot token</p></li><li><p>Slack app token</p></li><li><p>Slack signing secret</p></li><li><p>Databricks workspace URL</p></li><li><p>OAuth client ID</p></li><li><p>OAuth redirect URI</p></li><li><p>Genie space mappings</p></li><li><p>Default Genie space alias</p></li><li><p>SQL Warehouse ID</p></li><li><p>Cache table name</p></li><li><p>Audit table name</p></li><li><p>Vector Search endpoint and index</p></li><li><p>Cache similarity threshold</p></li><li><p>Optional Service Principal fallback flag</p></li></ul></li></ul><ul><li><p>Example routing configuration:</p></li></ul><blockquote><p>{</p><p>  &#8220;trips-data&#8221;: {</p><p>    &#8220;space_id&#8221;: &#8220;your-trips-genie-space-id&#8221;,</p><p>    &#8220;description&#8221;: &#8220;Questions about trips, pickup locations, fares, and distance&#8221;</p><p>  },</p><p>  &#8220;finance&#8221;: {</p><p>    &#8220;space_id&#8221;: &#8220;your-finance-genie-space-id&#8221;,</p><p>    &#8220;description&#8221;: &#8220;Questions about revenue, spend, forecasts, and financial KPIs&#8221;</p><p>  }</p><p>}</p></blockquote><h1>Usage</h1><h2>In Slack</h2><h3>Direct Messages:</h3><p>Send a DM to your bot</p><p>Ask questions like: &#8220;Show me sales data for last month&#8221;</p><h3>Channel Mentions:</h3><p>Invite the bot to a channel: /invite @YourBotName</p><p>Mention the bot: @YourBotName what are the top 10 customers?</p><h3>Threaded Conversations:</h3><p>Continue asking questions in a thread</p><p>The bot maintains conversation context within threads</p><h1>Demo Time</h1><p>In the General Slack Channel, the bot will use the default Genie Space to answer questions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I51n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I51n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 424w, https://substackcdn.com/image/fetch/$s_!I51n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 848w, https://substackcdn.com/image/fetch/$s_!I51n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 1272w, https://substackcdn.com/image/fetch/$s_!I51n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I51n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png" width="1385" height="556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:556,&quot;width&quot;:1385,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I51n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 424w, https://substackcdn.com/image/fetch/$s_!I51n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 848w, https://substackcdn.com/image/fetch/$s_!I51n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 1272w, https://substackcdn.com/image/fetch/$s_!I51n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ad4add-ebf6-4f46-af8a-06c7b1f8f004_1385x556.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>DM Thread, the bot will use the default Genie Space to answer questions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YUbT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YUbT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 424w, https://substackcdn.com/image/fetch/$s_!YUbT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 848w, https://substackcdn.com/image/fetch/$s_!YUbT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 1272w, https://substackcdn.com/image/fetch/$s_!YUbT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YUbT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png" width="992" height="924" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:924,&quot;width&quot;:992,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YUbT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 424w, https://substackcdn.com/image/fetch/$s_!YUbT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 848w, https://substackcdn.com/image/fetch/$s_!YUbT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 1272w, https://substackcdn.com/image/fetch/$s_!YUbT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a3b38-6010-4d03-9418-d59b735d71a6_992x924.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In a dedicated Trip Slack Channel, the bot will use the Trip Genie Space to answer questions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rglc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rglc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 424w, https://substackcdn.com/image/fetch/$s_!rglc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 848w, https://substackcdn.com/image/fetch/$s_!rglc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 1272w, https://substackcdn.com/image/fetch/$s_!rglc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rglc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png" width="1386" height="858" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:858,&quot;width&quot;:1386,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rglc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 424w, https://substackcdn.com/image/fetch/$s_!rglc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 848w, https://substackcdn.com/image/fetch/$s_!rglc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 1272w, https://substackcdn.com/image/fetch/$s_!rglc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e1c3ab-ea35-4c4f-9310-70cd7914ed0c_1386x858.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In a dedicated weather Slack Channel, the bot will use the Weather Genie Space to answer questions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WUuT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WUuT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 424w, https://substackcdn.com/image/fetch/$s_!WUuT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 848w, https://substackcdn.com/image/fetch/$s_!WUuT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 1272w, https://substackcdn.com/image/fetch/$s_!WUuT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WUuT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png" width="1399" height="596" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f02a528-ad26-451c-a262-616c7677f209_1399x596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:596,&quot;width&quot;:1399,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WUuT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 424w, https://substackcdn.com/image/fetch/$s_!WUuT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 848w, https://substackcdn.com/image/fetch/$s_!WUuT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 1272w, https://substackcdn.com/image/fetch/$s_!WUuT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f02a528-ad26-451c-a262-616c7677f209_1399x596.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Stop Settling for Seconds: Building 5ms Ultra Low latency Pipelines on Databricks]]></title><description><![CDATA[A comprehensive demo/ framework to easily understand/tune/customize your mission critical real time mode workload using Databricks' Spark Realtime mode (RTM)]]></description><link>https://www.databricksters.com/p/stop-settling-for-seconds-building</link><guid isPermaLink="false">https://www.databricksters.com/p/stop-settling-for-seconds-building</guid><dc:creator><![CDATA[Emad R]]></dc:creator><pubDate>Tue, 14 Apr 2026 15:02:15 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/187423058/e1a5bc1b0c5bb39ad5e3b61a43fb1e4d.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><strong>Introduction/ Preface</strong></p><p><a href="https://docs.databricks.com/aws/en/structured-streaming/real-time">Real-time streaming in the  Spark ecosystem</a> (RTM) is a great leap in Spark structured streaming, as it guarantees an end-to-end processing as low as 5 ms</p><p>This directly serves use cases where ultra-low latency is crucial for  operations, such as financial transactions and infrastructure monitoring, to name a few, which can directly translate to revenue loss or business image if not addressed in timely manners with Databricks&#8217; RTM streaming </p><p>In this blog, we will introduce a <a href="https://github.com/EmadRizk-db/kafka-spark-rtm-lakebase">simple demo notebook(s)</a> focusing on:</p><ul><li><p>Creating a simulated streaming dataset in a message bus system (Kafka) with control over</p><ul><li><p>Programmatically introducing duplicated records in the upstream</p></li><li><p>Queue partitions (As it impacts downstream processing/ patallelism)</p></li><li><p>Rows/Sec for writing into the message bus (Kafka is for demo purposes here but this can be any of the currently <a href="https://docs.databricks.com/aws/en/structured-streaming/real-time/reference#sources-and-sinks">supported sources by RTM</a></p></li></ul></li><li><p>Reading real-time messages from the message bus in Spark real-time streaming mode and writing down to Databricks&#8217; Lakebase serverless instance, including:</p><ul><li><p>A simple, direct write (no transformation) using <a href="https://docs.databricks.com/aws/en/structured-streaming/real-time">RTM</a></p></li><li><p>A simple stateful transformation (<a href="https://docs.databricks.com/aws/en/structured-streaming/real-time#deduplication">deduplication</a>)</p></li><li><p>Explicit tuning parameters for both scenarios/use cases in order to quickly evaluate your use case for RTM</p></li></ul></li></ul><p><strong>How to use this blog/code:</strong></p><ul><li><p>Real-time mode sizing framework</p><ul><li><p>You can change the upstream spec or</p></li><li><p>Downstream specs through the widgets that expose several tuning parameters </p></li></ul></li><li><p>Learning the ropes of a full pipeline, using foreach in a real-time streaming fashion to write down to a JDBC sink (Lakebase Postgres in our demo)</p></li><li><p>Clone the code from the <a href="https://github.com/EmadRizk-db/kafka-spark-rtm-lakebase">GitHub repo</a> to start your journey!</p></li></ul><p><strong>Compute resources used:</strong></p><ul><li><p>Any supported upstream system by Spark Realtime Mode</p><ul><li><p>In this case, we are using a Kafka topic that we can recreate with n partitions (default 8)</p></li></ul></li><li><p>Databricks classic cluster (no autoscaling)</p><ul><li><p>Driver: rd-fleet.xlarge (32 GB memory, 4 cores)</p></li><li><p>Workers: 8 rd-fleet.xlarge instances</p></li><li><p>DBR 17.3 LTS+</p></li></ul></li><li><p>Lakebase postgres instance with 2CU&#8217;s (<a href="https://docs.databricks.com/aws/en/oltp/projects/reverse-etl#how-it-works">Capacity Units</a>)</p></li></ul><p><strong>The flow Illustration:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wkeo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wkeo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 424w, https://substackcdn.com/image/fetch/$s_!wkeo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 848w, https://substackcdn.com/image/fetch/$s_!wkeo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 1272w, https://substackcdn.com/image/fetch/$s_!wkeo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wkeo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png" width="753" height="308" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57cb6043-220a-4206-958e-18df9d775823_753x308.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:308,&quot;width&quot;:753,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61528,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/187423058?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wkeo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 424w, https://substackcdn.com/image/fetch/$s_!wkeo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 848w, https://substackcdn.com/image/fetch/$s_!wkeo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 1272w, https://substackcdn.com/image/fetch/$s_!wkeo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cb6043-220a-4206-958e-18df9d775823_753x308.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Important parameters:</strong></p><p>In this framework, we included different parameters to control </p><p>Records generation (Upstream): </p><ul><li><p>Duplication of the same record (including the key) across different timestamps, to simulate a duplication happening upstream</p></li><li><p>Percentage of duplicate records generated</p></li><li><p>Partitions to be created (this will control the slots needed downstream when Spark reads from Kafka upstream in real-time fashion)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!41pW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!41pW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 424w, https://substackcdn.com/image/fetch/$s_!41pW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 848w, https://substackcdn.com/image/fetch/$s_!41pW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 1272w, https://substackcdn.com/image/fetch/$s_!41pW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!41pW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png" width="1456" height="175" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:175,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46105,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/187423058?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!41pW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 424w, https://substackcdn.com/image/fetch/$s_!41pW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 848w, https://substackcdn.com/image/fetch/$s_!41pW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 1272w, https://substackcdn.com/image/fetch/$s_!41pW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9004ab6-2951-4dad-a6a5-2c2eaa112332_1760x212.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Records persistence (Downstream):</p><ul><li><p>Dedup flag in order to decide a stateful or stateless transformation</p></li><li><p>A batching mechanism to increase throughput</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s4fJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s4fJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 424w, https://substackcdn.com/image/fetch/$s_!s4fJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 848w, https://substackcdn.com/image/fetch/$s_!s4fJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 1272w, https://substackcdn.com/image/fetch/$s_!s4fJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s4fJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png" width="1308" height="174" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:174,&quot;width&quot;:1308,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33371,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/187423058?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s4fJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 424w, https://substackcdn.com/image/fetch/$s_!s4fJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 848w, https://substackcdn.com/image/fetch/$s_!s4fJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 1272w, https://substackcdn.com/image/fetch/$s_!s4fJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F318ece2a-99bd-43e3-ae80-2a3ae5235a2f_1308x174.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p><strong>Key Code Snippets to use/reuse:</strong></p><p>I think the major part that I&#8217;d like to expand on is two things, actually:</p><ul><li><p>defining a Python JDBC-backed code to implement <a href="https://docs.databricks.com/aws/en/structured-streaming/real-time/examples#write-to-postgresql-using-foreachsink">a Foreachwriter</a></p><p>The structure of this implementation requires at least open, process, and close implementations. Additionally, I added a buffer implementation that is time or row counts based in order to control the throughput<br>An implementation skeleton in Python would be as follows (full code available on <a href="https://github.com/EmadRizk-db/kafka-spark-rtm-lakebase">GitHub</a>)<br></p><pre><code>class PgForeachWriter:
    def open(self, partition_id, epoch_id):
        try:
            self.conn = psycopg2.connect(**conn_kwargs)
            self.conn.autocommit = False
            self.cursor = self.conn.cursor()
            
            # ...

            # Buffering state (only used in buffered mode)
            if use_buffered_mode:
                self.buffer = []
                self.last_flush_ts = time.time()
            
            return True
        
        except Exception as e:
            # ...
            return False

    def _flush_if_needed(self, force=False):
        ...

    def process(self, row):
        ...

    def close(self, error):
        ...


def make_pg_buffered_writer(
    jdbc_url,
    jdbc_user,
    jdbc_password,
    jdbc_driver,
    jdbc_jar_path,
    table_name,
    max_batch_size=100,  # flush when &gt;= this many rows (0 = simple mode)
    flush_secs=2.0,      # flush if last flush older than this (0 = simple mode)
):</code></pre></li><li><p>After defining the writer class, all you need is to use it in your writeStream with real-time mode, as shown below</p><pre><code>jdbc_writer = make_pg_buffered_writer(
    jdbc_url=JDBC_URL,
    jdbc_user=JDBC_USER,
    jdbc_password=JDBC_PASSWORD,
    jdbc_driver=JDBC_DRIVER,
    table_name=TABLE_NAME,
    max_batch_size=max_batch_size_param,
    flush_secs=flush_secs_param
)

query = (
    df_for_write
    .writeStream
    .foreach(jdbc_writer)
    .outputMode("update")
    .queryName("jdbc_sink_writer")
    .trigger(realTime="5 minutes")
    .option("checkpointLocation", checkpoint_path)
    .start()
)</code></pre></li></ul><p><strong>Performance measures:</strong></p><p>I ran many scenarios in simulation and consistently sustained around 10k writes/sec over several minutes, with minimal to no delays and no backlog, as shown below. This aligns with a production rate of around 10K upstream writes, which are read in real time by Spark's streaming and written to Lakebase. I was also able to achieve a similar throughput for stateful (deduplication) transformation with no drops </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wR3Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wR3Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 424w, https://substackcdn.com/image/fetch/$s_!wR3Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 848w, https://substackcdn.com/image/fetch/$s_!wR3Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 1272w, https://substackcdn.com/image/fetch/$s_!wR3Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wR3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png" width="784" height="362" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2563a30-459b-471e-b135-1bacdb8e93de_784x362.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:362,&quot;width&quot;:784,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27195,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/187423058?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wR3Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 424w, https://substackcdn.com/image/fetch/$s_!wR3Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 848w, https://substackcdn.com/image/fetch/$s_!wR3Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 1272w, https://substackcdn.com/image/fetch/$s_!wR3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2563a30-459b-471e-b135-1bacdb8e93de_784x362.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ep-p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ep-p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 424w, https://substackcdn.com/image/fetch/$s_!ep-p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 848w, https://substackcdn.com/image/fetch/$s_!ep-p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 1272w, https://substackcdn.com/image/fetch/$s_!ep-p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ep-p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png" width="762" height="400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:762,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/187423058?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ep-p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 424w, https://substackcdn.com/image/fetch/$s_!ep-p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 848w, https://substackcdn.com/image/fetch/$s_!ep-p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 1272w, https://substackcdn.com/image/fetch/$s_!ep-p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e509fd-e80a-4111-8153-dad3d1bd51ea_762x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!odWR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!odWR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 424w, https://substackcdn.com/image/fetch/$s_!odWR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 848w, https://substackcdn.com/image/fetch/$s_!odWR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 1272w, https://substackcdn.com/image/fetch/$s_!odWR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!odWR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png" width="697" height="300" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/edf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:300,&quot;width&quot;:697,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29492,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/187423058?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!odWR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 424w, https://substackcdn.com/image/fetch/$s_!odWR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 848w, https://substackcdn.com/image/fetch/$s_!odWR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 1272w, https://substackcdn.com/image/fetch/$s_!odWR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedf2db74-f7cb-4e17-9c48-72d2c4e351c2_697x300.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Key considerations:</strong></p><ul><li><p>RTM mode will handle any recs/sec rate when sized right</p></li><li><p>Lakebase will handle writes/sec in the 10&#8217;s K range per CU out of the box</p></li><li><p>That said, there are tuning parameters and things to consider:</p><ul><li><p>To enhance the write throughput, and depending on the use case you can:</p><ul><li><p>Consider <a href="https://docs.databricks.com/aws/en/oltp/projects/about">Lakebase Autoscaling</a></p></li><li><p>Consider batching the writes to downstream (this is generally best practice) to benefit from Postgres batch DML API&#8217;s </p></li><li><p>Batching can be time or number-of-rows-based (example provided in the repo)</p></li></ul></li><li><p>To enhance the RTM read in general:</p><ul><li><p>You can tune your windowing range for stateful transformation</p></li><li><p>Tune the  <a href="https://docs.databricks.com/aws/en/structured-streaming/triggers#trigger-modes-overview">realTime value (default 5 minutes)</a>; this parameter is used for checkpoint commit </p></li><li><p>Increase the number of upstream partitions</p></li></ul></li></ul></li></ul><p><strong>The sweet spot:</strong></p><ul><li><p>For the ideal optimum solution:</p><ul><li><p>Consider right-sizing of:</p><ul><li><p>upstream source and its partitions, as it will control parallelism downstream</p></li><li><p>Spark cluster size makes sure there are enough slots to schedule all stages at once </p></li><li><p>Downstream sizing (in our case, it is Lakebase) and consider autoscaling, more CU&#8217;s = More throughput</p></li></ul></li></ul></li></ul><p><strong>Conclusion:</strong></p><p>While the purpose of this exercise is to demonstrate a full pipeline using Lakebase PostgreSQL as the downstream, and is not benchmarking specifically </p><p>The E2E latency in test is P95 356 MS with a P50 of 225 MS</p><p>These figures are out of the box with no tuning of any sorts, it can be improved further.</p><p>Also, with 8 Kafka topic partitions and an adequate Databricks cluster configuration, there is ~0 lag in real-time streaming read from Kafka and  downstream write to Lakebase</p><p>That said, please note that your mileage will vary! In essence, that:</p><ul><li><p>A clear understanding of records production upstream is key</p></li><li><p>A correct sizing of the Spark cluster and the Downstream OLTP (Lakebase Postgres in our setup) is crucial for throughput</p></li><li><p>There is a trade-off between write capacity and the downstream cluster capacity units</p><ul><li><p>Time or number of rows is a good tuning parameter in such a case Or</p></li><li><p>Autoscaling downstream to scale to more throughput requirements should be considered</p></li></ul></li></ul><p></p><h2>&#128587; Frequently Asked Questions</h2><div><hr></div><p><strong>What is the main goal of this demo?</strong></p><p>It demonstrates how to build a low-latency streaming pipeline from Kafka through Spark Real-Time Mode into Lakebase, while exposing the main tuning knobs that affect throughput and lag.</p><div><hr></div><p><strong>What variables can I change in the simulation?</strong></p><p>You can adjust ingest rate, Kafka partition count, duplicate-record percentage, deduplication behavior, and downstream batching settings.</p><div><hr></div><p><strong>Why does Kafka partition count matter?</strong></p><p>Partition count affects how much parallelism Spark can use when reading the stream. Too few partitions can limit throughput even if the cluster has spare capacity.</p><div><hr></div><p><strong>When should I use buffered writes instead of direct writes?</strong></p><p>Buffered writes are usually better when downstream write throughput is the bottleneck. They reduce per-row overhead and improve efficiency for OLTP-style sinks.</p><div><hr></div><p><strong>Can stateful deduplication still perform well in Real-Time Mode?</strong></p><p>Yes &#8212; based on our test results, the pipeline achieved similar throughput with deduplication enabled, provided the job was sized appropriately.</p><div><hr></div><p><strong>What should I tune first in production?</strong></p><p>Start with upstream partitions, Spark cluster sizing, downstream write capacity, and batching thresholds. Then refine checkpoint cadence and stateful window settings.</p>]]></content:encoded></item><item><title><![CDATA[Migrating Existing Dashboards to Databricks AI/BI, Part 3: User Filters and Row-Level Security with Unity Catalog]]></title><description><![CDATA[How to implement user-based filtering and row-level security using dynamic views, row filters, column masks, and ABAC]]></description><link>https://www.databricksters.com/p/migrating-existing-dashboards-to-3e1</link><guid isPermaLink="false">https://www.databricksters.com/p/migrating-existing-dashboards-to-3e1</guid><dc:creator><![CDATA[Artem Chebotko]]></dc:creator><pubDate>Tue, 14 Apr 2026 15:01:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mMaQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mMaQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mMaQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!mMaQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!mMaQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!mMaQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mMaQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mMaQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!mMaQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!mMaQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!mMaQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa3ac56e-a591-4cbe-8db1-eace395402a7_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As a Specialist Solutions Architect at Databricks, I often hear the same questions from customers who are migrating dashboards from legacy BI tools to Databricks AI/BI Dashboards:</p><ul><li><p><em>&#8220;What&#8217;s the Databricks equivalent of the context filters we use today?&#8221;</em></p></li><li><p><em>&#8220;Can we still do cascading filters where each dropdown only shows relevant values?&#8221;</em></p></li><li><p><em>&#8220;Do you support filter actions when I click on a bar or a point?&#8221;</em></p></li><li><p><em>&#8220;How do we do user-based filtering in AI/BI Dashboards?&#8221;</em></p></li></ul><p>In the <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to">first blog post in this series</a>, I focused on the first two questions and showed how to recreate:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ul><li><p>context filters using parameters in dataset SQL, and</p></li><li><p>&#8220;<em>Only Relevant Values</em>&#8221; filters using field filters and query-based parameters.</p></li></ul><p>In the <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to-482">second post</a>, I focused on cross-filtering and drill-through interactions.</p><p>This third post tackles the remaining question: &#8220;<em>How do we do user-based filtering in AI/BI Dashboards?</em>&#8221;</p><p>In Databricks, these controls live in <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/">Unity Catalog</a>, not in the dashboard itself. AI/BI Dashboards query governed tables and views, and Unity Catalog enforces fine-grained access control before the data ever reaches the dashboard. I&#8217;ll walk through how to:</p><ul><li><p>Implement user-based filtering and RLS with <a href="https://docs.databricks.com/aws/en/views/dynamic">dynamic views</a> that use <code>current_user()</code> and <code>is_account_group_member()</code>.</p></li><li><p>Apply RLS directly on tables using <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">row filters</a> and protect sensitive fields with <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">column masks</a>.</p></li><li><p>Scale these patterns across many tables and columns using <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/abac/">ABAC tag policies and governed tags</a>.</p></li></ul><p>As in the previous posts, I&#8217;ll use the built-in <code>samples.tpch</code> dataset. I&#8217;ve also published the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a> so you can import it into your workspace, follow along as you read, and adapt these patterns to your own Unity Catalog data.</p><h3><strong>1. How User-Based Filtering Maps to Unity Catalog</strong></h3><p>Before we dive into SQL, it&#8217;s useful to clarify where these responsibilities live in Databricks.</p><p>In many BI tools, user-specific security is often implemented close to the dashboard or semantic layer:</p><ul><li><p>You define user- or group-based rules that map principals to specific regions, customers, or business units.</p></li><li><p>You may use identity-aware logic in filters or calculated fields.</p></li><li><p>You may maintain a security table that drives which slice of data each user can see.</p></li></ul><p>In Databricks, these controls live in <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/">Unity Catalog</a>, not in AI/BI Dashboards:</p><ul><li><p>Object privileges on catalogs, schemas, tables, and views control whether a user can query a given object at all.</p></li><li><p>Dynamic views, row filters, and column masks implement row-level security and masking at query time. They can inspect the current user and their groups and return different rows or values per user.</p></li><li><p>AI/BI Dashboards simply query those governed tables and views. They never bypass Unity Catalog: any row filters or masks you define apply to every query, regardless of whether it comes from a notebook, Databricks SQL, or an AI/BI dashboard.</p></li></ul><p>The result is conceptually similar to user-based filtering in traditional BI tools, but with one important shift: The security rules live with the data, not with a particular dashboard.</p><p>That&#8217;s especially important when:</p><ul><li><p>The same Unity Catalog tables power multiple AI/BI dashboards and external BI tools, and</p></li><li><p>You embed AI/BI Dashboards into applications where thousands of users see the same dashboard definition, but each must see a different subset of data.</p></li></ul><p>In the rest of this post, we&#8217;ll build up from that idea:</p><ol><li><p>Use a <a href="https://docs.databricks.com/aws/en/views/dynamic">dynamic view</a> and a permission table to enforce RLS on a TPCH Sales dataset.</p></li><li><p>Show how to do similar things directly on tables with <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">row filters and column masks</a>.</p></li><li><p>Discuss how to scale those patterns across many tables and columns using <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/abac/">ABAC tag policies and governed tags</a>.</p></li></ol><h3><strong>2. Building blocks: dynamic views, row filters, and column masks</strong></h3><p>To implement user-based filtering in Databricks, you really only need three Unity Catalog primitives: dynamic views, row filters, and column masks.</p><p>They all rely on the same core idea: At query time, Unity Catalog can look at who is running the query (and which groups they&#8217;re in), and then decide which rows and values to return.</p><h4><strong>2.1 Identity functions</strong></h4><p>The main functions you&#8217;ll use in policies are:</p><ul><li><p><code>current_user()</code><br>Returns the current user&#8217;s identity (usually their email).</p></li><li><p><code>is_account_group_member(&#8217;&lt;group_name&gt;&#8217;)</code><br>Returns <code>TRUE</code> if the current user is a member of an account-level group.</p></li></ul><p>You can call these functions from views and from SQL UDFs used by row filters and column masks.</p><h4><strong>2.2 Dynamic views</strong></h4><p>A <a href="https://docs.databricks.com/aws/en/views/dynamic">dynamic view</a> is just a normal SQL view whose logic depends on the current user or their groups.</p><p>You can:</p><ul><li><p>Filter rows based on <code>current_user()</code> or group membership.</p></li><li><p>Mask or null out columns for certain users.</p></li><li><p>Join to a separate permission table that maps users/groups to allowed regions, customers, etc.</p></li></ul><p>Any AI/BI dataset that selects from a dynamic view automatically inherits its logic. You don&#8217;t need to add any special configuration in AI/BI itself.</p><p>We&#8217;ll use a dynamic view for our first <em>TPCH Sales</em> RLS example.</p><h4><strong>2.3 Row filters</strong></h4><p>A <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">row filter</a> attaches RLS logic directly to a table, instead of wrapping the table in a view:</p><ul><li><p>You define a <a href="https://docs.databricks.com/aws/en/udf/unity-catalog">SQL UDF</a> that takes one or more columns as input and returns <code>BOOLEAN</code>.</p></li><li><p>You attach it to a table with <code>ALTER TABLE ... SET ROW FILTER ... ON (column[, ...])</code>.</p></li></ul><p>The row filter runs for every query and can call <code>current_user()</code> and <code>is_account_group_member()</code> internally. This is handy when you want:</p><ul><li><p>A stable table name (no extra view layer), or</p></li><li><p>A single table that&#8217;s consumed by many tools, all of which should respect the same RLS.</p></li></ul><p>We&#8217;ll look at both group-based and <code>current_user()</code> + permission-table examples later in the post.</p><h4><strong>2.4 Column masks</strong></h4><p>A <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">column mask</a> is similar, but operates at the column level:</p><ul><li><p>You define a <a href="https://docs.databricks.com/aws/en/udf/unity-catalog">SQL UDF</a> that returns a &#8220;masked&#8221; value.</p></li><li><p>You attach it with <code>ALTER TABLE ... ALTER COLUMN ... SET MASK ...</code>.</p></li></ul><p>This lets you:</p><ul><li><p>Show full values (for example, email or salary) only to certain groups.</p></li><li><p>Show partially masked or null values to everyone else.</p></li></ul><p>Think of it as the Unity Catalog side of &#8220;row-level security + column-level masking&#8221; that you might combine in legacy BI tools using data source filters and calculated fields.</p><p>Next, we&#8217;ll put these pieces together in a concrete example: enforcing region-based RLS on a <em>TPCH Sales</em> dataset using a combination of a base view, a permission table, and a dynamic view.</p><h3><strong>3. Implementing RLS with a dynamic view</strong></h3><p>Let&#8217;s start with a concrete scenario:</p><ul><li><p>You have a <em>TPCH Sales</em> dashboard shared across multiple sales teams.</p></li><li><p><em>NA Sales Managers</em> should see only <em>AMERICA</em>.</p></li><li><p><em>EMEA Sales Managers</em> should see only <em>EUROPE</em>.</p></li><li><p><em>APAC Sales Managers</em> should see only <em>ASIA</em>.</p></li><li><p>Individual users may have their own custom regions.</p></li></ul><p>In many traditional BI tools, you&#8217;d typically solve this with a user filter or data source filter that maps groups to <em>Regions</em>, and a security table to keep that mapping up to date.</p><p>In Databricks, we&#8217;ll use the same logical pattern &#8211; but move it into Unity Catalog:</p><ol><li><p>Create a base view for <em>TPCH Sales</em> in a demo schema.</p></li><li><p>Create a permission table that maps principals (groups or users) to <em>Regions</em>.</p></li><li><p>Create a dynamic view that joins the base view to the permission table and applies row-level security based on <code>current_user()</code> and <code>is_account_group_member()</code>.</p></li><li><p>Create an AI/BI dataset on top of that dynamic view and build a simple table visualization.</p></li></ol><p>Throughout this section, we&#8217;ll:</p><ul><li><p>Read from <code>samples.tpch</code> (which everyone has).</p></li><li><p>Create objects in <code>main.demo_tpch</code> (you can substitute another catalog/schema if needed).</p></li></ul><p>All of the <code>CREATE</code> / <code>INSERT</code> statements in this section should be run in a notebook or SQL editor, not inside an AI/BI dataset. In AI/BI, you&#8217;ll just <code>SELECT</code> from the resulting view.</p><h4><strong>3.1 </strong><em><strong>TPCH Sales</strong></em><strong> base view</strong></h4><p>First, set up a simple demo schema and define a reusable base view for <em>TPCH Sales</em>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">-- Use the default UC catalog and create a demo schema
USE CATALOG main;
CREATE SCHEMA IF NOT EXISTS demo_tpch;

-- Base TPCH Sales view, reading from samples.tpch
CREATE OR REPLACE VIEW main.demo_tpch.tpch_sales_base AS
SELECT
  r.r_name              AS region,
  n.n_name              AS nation,
  c.c_custkey           AS customer_id,
  c.c_name              AS customer_name,
  o.o_orderkey          AS order_id,
  o.o_orderdate         AS order_date,
  l.l_extendedprice * (1 - l.l_discount) AS revenue
FROM samples.tpch.region   AS r
JOIN samples.tpch.nation   AS n ON n.n_regionkey = r.r_regionkey
JOIN samples.tpch.customer AS c ON c.c_nationkey = n.n_nationkey
JOIN samples.tpch.orders   AS o ON o.o_custkey   = c.c_custkey
JOIN samples.tpch.lineitem AS l ON l.l_orderkey  = o.o_orderkey;</code></pre></div><h4><strong>3.2 Region access permission table</strong></h4><p>Next, create a permission table that describes who is allowed to see which <em>Region</em>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">-- Optional: you can keep security tables in the same schema
-- or create a separate one, e.g. main.demo_security
CREATE SCHEMA IF NOT EXISTS main.demo_tpch;

CREATE TABLE IF NOT EXISTS main.demo_tpch.tpch_region_access (
  principal_type STRING,   -- 'group' or 'user'
  principal      STRING,   -- group name or user email
  region         STRING    -- must match tpch_sales_base.region
);

INSERT INTO main.demo_tpch.tpch_region_access VALUES
  ('group', 'NA Sales Managers',        'AMERICA'),
  ('group', 'EMEA Sales Managers',      'EUROPE'),
  ('group', 'APAC Sales Managers',      'ASIA'),
  ('user',  'some.user@databricks.com', 'ASIA'),
  ('user',   current_user(),            'AFRICA');</code></pre></div><p>What this does:</p><ul><li><p>The first three rows grant access based on account-level groups</p></li><li><p>The fourth row grants access to <em>ASIA</em> to a specific user, even if they are not in one of those groups.</p></li><li><p>The fifth row uses <code>current_user()</code> to grant you, the person running this SQL, access to <em>AFRICA</em>. When you execute the <code>INSERT</code>, Unity Catalog evaluates <code>current_user()</code> to your own email.</p></li></ul><p>If you&#8217;re not in any of the <em>NA/EMEA/APAC Sales Managers</em> groups and you&#8217;re not <em>some.user@databricks.com</em>, the only applicable rule for you will be the one that says you can see <em>AFRICA</em>. We&#8217;ll see the effect of that in a moment when we query through the dynamic view.</p><h4><strong>3.3 Dynamic view with region-level RLS</strong></h4><p>Now create a <a href="https://docs.databricks.com/aws/en/views/dynamic">dynamic view</a> that applies row-level security by joining the base <em>TPCH Sales</em> view to the permission table and checking the current user&#8217;s identity and groups:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">CREATE OR REPLACE VIEW main.demo_tpch.tpch_sales_rls AS
SELECT s.*
FROM   main.demo_tpch.tpch_sales_base AS s
WHERE EXISTS (
  SELECT 1
  FROM   main.demo_tpch.tpch_region_access a
  WHERE  a.region = s.region
    AND (
      (a.principal_type = 'group'
       AND is_account_group_member(a.principal))
      OR
      (a.principal_type = 'user'
       AND a.principal = current_user())
    )
);</code></pre></div><p>This view enforces RLS as follows. For each row in <code>tpch_sales_base</code>, it looks for a matching rule in <code>tpch_region_access</code> based on region and either group or user principle. If no matching rule exists for the current user and that region, the row is filtered out.</p><p>You can verify this by running:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT DISTINCT region
FROM main.demo_tpch.tpch_sales_rls;</code></pre></div><p>If you are only granted access via the <code>(&#8217;user&#8217;, current_user(), &#8216;AFRICA&#8217;)</code> row, you should see: <em>AFRICA</em>.</p><p>From now on:</p><ul><li><p>Any query against <code>main.demo_tpch.tpch_sales_rls</code> returns only the regions granted to the current user.</p></li><li><p>This applies uniformly whether the query comes from a notebook, Databricks SQL, or an AI/BI Dashboard.</p></li><li><p>You can add or revoke access simply by inserting or deleting rows in <code>main.demo_tpch.tpch_region_access</code> &#8211; you don&#8217;t need to change the view logic.</p></li></ul><h4><strong>3.4 Using the dynamic view in an AI/BI dataset</strong></h4><p>With <code>main.demo_tpch.tpch_sales_rls</code> in place, using it in AI/BI Dashboards is straightforward. You don&#8217;t need to re-implement any RLS logic in AI/BI &#8211; the dataset just selects from the governed view.</p><p>Create the <em>TPCH Sales (RLS Dynamic View)</em> dataset:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT
  region,
  nation,
  customer_id,
  customer_name,
  order_id,
  order_date,
  revenue
FROM main.demo_tpch.tpch_sales_rls;</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dlAq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dlAq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 424w, https://substackcdn.com/image/fetch/$s_!dlAq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 848w, https://substackcdn.com/image/fetch/$s_!dlAq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 1272w, https://substackcdn.com/image/fetch/$s_!dlAq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dlAq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png" width="1456" height="681" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a92013e6-065f-4405-8564-a23013c9a17c_1600x748.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:681,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dlAq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 424w, https://substackcdn.com/image/fetch/$s_!dlAq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 848w, https://substackcdn.com/image/fetch/$s_!dlAq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 1272w, https://substackcdn.com/image/fetch/$s_!dlAq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92013e6-065f-4405-8564-a23013c9a17c_1600x748.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Build a simple dashboard page to visualize the dataset. In the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, I created a <em>RLS with dynamic view</em> page based on this dataset:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!atyN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!atyN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 424w, https://substackcdn.com/image/fetch/$s_!atyN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 848w, https://substackcdn.com/image/fetch/$s_!atyN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 1272w, https://substackcdn.com/image/fetch/$s_!atyN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!atyN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png" width="1456" height="431" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:431,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!atyN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 424w, https://substackcdn.com/image/fetch/$s_!atyN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 848w, https://substackcdn.com/image/fetch/$s_!atyN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 1272w, https://substackcdn.com/image/fetch/$s_!atyN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F108ce78d-3404-44ae-9cc7-f7e57a4651ab_1600x474.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When you view this page, if your only grant is the row we inserted with <code>(&#8217;user&#8217;, current_user(), &#8216;AFRICA&#8217;)</code>, the table will show only rows where the region is <em>AFRICA</em>.</p><p>The security logic lives in Unity Catalog (dynamic view + permission table). The dashboard just selects from <code>tpch_sales_rls</code> and automatically respects row-level security for each viewer.</p><h3><strong>4. Implementing RLS on tables with row filters and column masks</strong></h3><p>In the previous section, we implemented row-level security for <em>TPCH Sales</em> using a dynamic view and a permission table. That pattern works well when you want a named, shareable view to point AI/BI datasets at.</p><p>Unity Catalog also lets you attach RLS and masking logic directly to tables using:</p><ul><li><p><a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">Row filters</a> &#8211; control which rows a user can access in a table.</p></li><li><p><a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">Column masks</a> &#8211; control what values they see in specific columns.</p></li></ul><p>These policies are evaluated in Unity Catalog at query time and apply to all compute &#8211; SQL warehouses, notebooks, and AI/BI Dashboards. Unlike dynamic views, they keep the table name unchanged, which can be important when the same table is shared across many tools.</p><p>In this section, we&#8217;ll:</p><ol><li><p>Create a <em>TPCH Sales</em> table in <code>main.demo_tpch</code> for row filters and masks.</p></li><li><p>Attach a group-based row filter that restricts Regions.</p></li><li><p>Attach a user-based row filter that uses <code>current_user()</code> and a permission table.</p></li><li><p>Add a column mask to protect a sensitive column.</p></li></ol><p>Any AI/BI dataset that selects from this table will automatically respect these policies. You don&#8217;t need to configure anything special in AI/BI.</p><p>All of the statements below should be run in a notebook or SQL editor. AI/BI Dashboards just query the resulting table.</p><h4><strong>4.1 TPCH Sales table for filters and masks</strong></h4><p>Row filters and column masks only apply to tables (and a few other relation types), not to views. To keep things simple, we&#8217;ll materialize the tpch_sales_base view from Section 3 into a Delta table that we can attach policies to:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">-- Use the same demo catalog and schema as before
USE CATALOG main;
USE SCHEMA demo_tpch;

-- Create a physical table from the base view for row filters and masks
CREATE OR REPLACE TABLE main.demo_tpch.tpch_sales_table AS
SELECT *
FROM main.demo_tpch.tpch_sales_base;</code></pre></div><p>From now on, we&#8217;ll attach row filters and masks to <code>main.demo_tpch.tpch_sales_table</code>.<br>If you point an AI/BI dataset at this table instead of the dynamic view, the behavior will be controlled by these table-level policies.</p><h4><strong>4.2 Group-based row filter on </strong><em><strong>Region</strong></em></h4><p>First, let&#8217;s attach a row filter that enforces the same <em>Region</em> rules we used in the dynamic view, but purely based on account-level groups:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">-- Row filter function that decides which regions each group can see
CREATE OR REPLACE FUNCTION main.demo_tpch.tpch_region_filter(p_region STRING)
RETURNS BOOLEAN
RETURN
  CASE
    WHEN is_account_group_member('NA Sales Managers')   THEN p_region = 'AMERICA'
    WHEN is_account_group_member('EMEA Sales Managers') THEN p_region = 'EUROPE'
    WHEN is_account_group_member('APAC Sales Managers') THEN p_region = 'ASIA'
    ELSE FALSE
  END;</code></pre></div><p>Attach it to the table:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">ALTER TABLE main.demo_tpch.tpch_sales_table
  SET ROW FILTER main.demo_tpch.tpch_region_filter ON (region);</code></pre></div><p>Effect:</p><ul><li><p>Whenever anyone queries <code>main.demo_tpch.tpch_sales_table</code>, Unity Catalog evaluates <code>tpch_region_filter(region)</code> for each row.</p></li><li><p>If the user is in NA Sales Managers, only rows where the region is AMERICA are returned. If they&#8217;re in <em>EMEA Sales Managers</em>, they only see <em>EUROPE</em>; <em>APAC Sales Managers</em> see <em>ASIA</em>.</p></li><li><p>Users not in any of these groups see no rows from this table.</p></li></ul><p>If you point a dataset at:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT
  region,
  nation,
  customer_id,
  customer_name,
  order_id,
  order_date,
  revenue
FROM main.demo_tpch.tpch_sales_table;</code></pre></div><p>Any visualizations based on such a dataset will now respect the group-based <em>Region</em> logic without going through the dynamic view.</p><h4><strong>4.3 User-based row filter with </strong><code>current_user()</code><strong> and a permission table</strong></h4><p>Group-based rules are great for broad roles, but you may need finer control &#8211; different users seeing different subsets of customers, accounts, or regions.</p><p>We can reuse the same pattern as in Section 3 &#8211; <code>current_user()</code> + a permission table &#8211; but this time embed it in a row filter function instead of a dynamic view.</p><p>Example: restrict access by <code>customer_id</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">-- Permission table mapping users to customers they can see
CREATE TABLE IF NOT EXISTS main.demo_tpch.customer_access (
  user_email  STRING,
  customer_id BIGINT
);

-- Example grants
INSERT INTO main.demo_tpch.customer_access VALUES
  ('some.user@databricks.com', 889),
  (current_user(),             1111);  -- Give yourself access</code></pre></div><p>Now create a row filter function that consults this table. To avoid ambiguous name resolution with the <code>customer_id</code> column on the table, we&#8217;ll use a parameter name <code>p_customer_id</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">CREATE OR REPLACE FUNCTION main.demo_tpch.tpch_customer_filter(p_customer_id BIGINT)
RETURNS BOOLEAN
RETURN EXISTS (
  SELECT 1
  FROM   main.demo_tpch.customer_access a
  WHERE  a.user_email  = current_user()
    AND  a.customer_id = p_customer_id
);</code></pre></div><p>Attach it to the same table. Because a table can have only one row filter, we&#8217;ll replace the <em>Region</em> filter from the previous subsection in this example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">ALTER TABLE main.demo_tpch.tpch_sales_table
  DROP ROW FILTER;

ALTER TABLE main.demo_tpch.tpch_sales_table
  SET ROW FILTER main.demo_tpch.tpch_customer_filter ON (customer_id);</code></pre></div><p>Effect:</p><ul><li><p>Whenever someone queries <code>main.demo_tpch.tpch_sales_table</code>, Unity Catalog evaluates <code>tpch_customer_filter(customer_id)</code> for each row.</p></li><li><p>For a given user, only rows whose <code>customer_id</code> appears in <code>main.demo_tpch.customer_access</code><br>for <code>current_user()</code> are returned.</p></li><li><p>In the sample data above, you will only see orders for customer <code>1111</code>, while <code>some.user@databricks.com</code> will see orders for customer <code>889</code>.</p></li></ul><p>You can verify this quickly:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT DISTINCT customer_id
FROM main.demo_tpch.tpch_sales_table
ORDER BY customer_id
LIMIT 20;</code></pre></div><p>If your only mapping is <code>(current_user(), 1111)</code>, this query should return just <code>1111</code>.</p><p>To use this in AI/BI Dashboards, you can point a dataset directly at the table:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT
  region,
  nation,
  customer_id,
  customer_name,
  order_id,
  order_date,
  revenue
FROM main.demo_tpch.tpch_sales_table;</code></pre></div><p>In the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, I created a <em>RLS with row filter</em> page based on this dataset:<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pq42!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pq42!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 424w, https://substackcdn.com/image/fetch/$s_!Pq42!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 848w, https://substackcdn.com/image/fetch/$s_!Pq42!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 1272w, https://substackcdn.com/image/fetch/$s_!Pq42!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pq42!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png" width="1456" height="441" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:441,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pq42!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 424w, https://substackcdn.com/image/fetch/$s_!Pq42!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 848w, https://substackcdn.com/image/fetch/$s_!Pq42!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 1272w, https://substackcdn.com/image/fetch/$s_!Pq42!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2322b426-4b34-429a-b7bf-74cb5b9d9dbe_1600x485.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When you open it, the table and visuals only show data for customers you are allowed to see according to <code>customer_access</code>, without any RLS logic in the dashboard itself.</p><h4><strong>4.4 Bonus: masking sensitive columns with a column mask</strong></h4><p>Row filters decide which rows a user can see. Sometimes you also need to partially hide sensitive values within those rows &#8211; for example, masking customer names, emails or phone numbers for most users while leaving them fully visible for a small group.</p><p>Unity Catalog column masks handle this at the column level using the same pattern: a SQL UDF that can branch on <code>current_user()</code> or group membership.</p><p>Suppose we want:</p><ul><li><p>Users in a <em>PII Full Access</em> group to see full customer names.</p></li><li><p>Everyone else to see a partially masked version (for example, just the first few characters).</p></li></ul><p>First, define a masking function:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">CREATE OR REPLACE FUNCTION main.demo_tpch.mask_customer_name(name STRING)
RETURNS STRING
RETURN
  CASE
    WHEN is_account_group_member('PII Full Access') THEN name
    ELSE concat(substr(name, 1, 3), '***')
  END;</code></pre></div><p>Attach it as a mask on the table:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">ALTER TABLE main.demo_tpch.tpch_sales_table
  ALTER COLUMN customer_name
  SET MASK main.demo_tpch.mask_customer_name;</code></pre></div><p>Effect:</p><ul><li><p>Users in the <em>PII Full Access</em> group see the full <code>customer_name</code> value.</p></li><li><p>All other users see a masked version like <code>Cus***</code> instead of <code>Customer#000001111</code>.</p></li><li><p>The mask is enforced for every query against <code>tpch_sales_table</code> &#8211; notebooks, SQL editor, and AI/BI Dashboards &#8211; including exports.</p></li></ul><p>Here is what it looks like in the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9SDT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9SDT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 424w, https://substackcdn.com/image/fetch/$s_!9SDT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 848w, https://substackcdn.com/image/fetch/$s_!9SDT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 1272w, https://substackcdn.com/image/fetch/$s_!9SDT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9SDT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png" width="1456" height="519" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:519,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9SDT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 424w, https://substackcdn.com/image/fetch/$s_!9SDT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 848w, https://substackcdn.com/image/fetch/$s_!9SDT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 1272w, https://substackcdn.com/image/fetch/$s_!9SDT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e3b91a0-d851-4081-9a2a-aa550bf2a506_1600x570.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>5. Scaling RLS with ABAC and governed tags</strong></h3><p>Everything we&#8217;ve done so far (dynamic views, row filters, and column masks) is defined directly on individual Unity Catalog objects. That&#8217;s fine for a handful of tables, but it becomes hard to manage when you have dozens of catalogs, hundreds of schemas, and thousands of tables. This is exactly the problem that <a href="https://docs.databricks.com/gcp/en/data-governance/unity-catalog/abac/">attribute-based access control (ABAC)</a> with governed tags is designed to solve in Unity Catalog.</p><p>At a high level, ABAC adds three building blocks on top of the mechanisms we already used:</p><ul><li><p>Governed tags &#8211; account-level tags like <em>sensitivity</em>, <em>business_domain</em>, or <em>region_scope</em>, with a controlled set of allowed values. You attach these tags to catalogs, schemas, tables, or columns.</p></li><li><p>Policy UDFs &#8211; reusable SQL UDFs that implement row-filter or column-mask logic, similar to the functions we wrote earlier, but intended to be reused across many datasets.</p></li><li><p>ABAC policies &#8211; centrally managed policies that say &#8220;when a tagged object matches these conditions, apply this row filter or column mask for these principals.&#8221; Policies can be attached at the catalog, schema, or table level and inherit down the hierarchy.</p></li></ul><p>Databricks recommends using ABAC as the primary way to apply row filters and column masks at scale, and reserving table-by-table configuration for special cases. Conceptually, you can think of ABAC as &#8220;lifting&#8221; the patterns from Section 4 into a central policy layer:</p><ol><li><p>Tag the data once &#8211; apply governed tags to catalogs, schemas, tables, and columns that participate in RLS or masking.</p></li><li><p>Register reusable UDFs &#8211; define shared row-filter and mask functions in a governance schema (for example, <code>governance.region_filter()</code> and <code>governance.mask_customer_name()</code>).</p></li><li><p>Create ABAC policies &#8203;&#8203;&#8211; define policies that attach those UDFs to tagged objects based on tag conditions and target groups.</p></li><li><p>Let tags drive behavior &#8211; as new tables and columns are tagged, the appropriate row filters and masks are applied automatically by Unity Catalog.</p></li></ol><p>From an AI/BI Dashboards perspective, the experience is the same as in Sections 3 and 4: the dataset SQL stays simple, and the dashboard filters and visualizations work as usual. The difference is that the security logic is now centralized in ABAC policies and tags instead of being embedded directly into each table or view.</p><h3><strong>6. Summary and next steps</strong></h3><p>In many BI tools, user-based filtering and row-level security are often implemented close to the dashboard or semantic layer using user/group mappings, security tables, and identity-aware logic. In Databricks, the key shift is that these controls move into <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/">Unity Catalog</a>, and AI/BI Dashboards simply query governed tables and views.</p><p>In this post, we walked through three main patterns:</p><ul><li><p><a href="https://docs.databricks.com/aws/en/views/dynamic">Dynamic views and permission tables</a> (Section 3)<strong><br></strong> We built <code>main.demo_tpch.tpch_sales_rls</code> on top of a base TPCH Sales view and a <code>tpch_region_access</code> table. The dynamic view uses <code>current_user()</code> and <code>is_account_group_member()</code> to return different Regions for different users and groups. AI/BI datasets that query this view automatically inherit the row-level security.</p></li><li><p><a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/">Table-level row filters and column masks</a> (Section 4)<strong><br></strong> We materialized TPCH Sales into <code>main.demo_tpch.tpch_sales_table</code> and attached a row filter that looks up allowed <code>customer_id</code> values in <code>customer_access</code> based on <code>current_user()</code>. We also added a column mask for <code>customer_name</code>, showing full names only to a privileged group. Any dataset that selects from this table sees the combined effect of RLS and masking without any extra logic in the dashboard.</p></li><li><p><a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/abac/">ABAC and governed tags</a> (Section 5)<strong><br></strong> We then zoomed out to show how ABAC can apply the same kinds of row filters and masks at scale, using governed tags, reusable policy UDFs, and central ABAC policies. Instead of configuring each table or view by hand, you tag data once and let policies attach the right filters and masks automatically.</p></li></ul><p>Across all three patterns, the core idea is the same: <strong>AI/BI Dashboards stay simple; Unity Catalog enforces who sees which rows and what values.</strong></p><p>The <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a> brings these ideas together in a concrete, runnable example. If you import it into your workspace and wire it up to the views and tables from this post, you can see exactly how the visuals behave for different users as Unity Catalog applies dynamic views, row filters, and column masks behind the scenes.</p><p>Combined with the first two posts in this series (<a href="https://www.databricksters.com/p/migrating-existing-dashboards-to">Part 1</a> and <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to-482">Part 2</a>), you now have a practical set of patterns for implementing filtering, drill-through, and row-level security in Databricks AI/BI Dashboards on top of Unity Catalog.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Kafka TTL Trap: Updating Spark Streaming Tables Without Data Loss]]></title><description><![CDATA[How to update streaming bronze tables in Spark after your source data has expired.]]></description><link>https://www.databricksters.com/p/beating-kafkas-clock-the-zero-data</link><guid isPermaLink="false">https://www.databricksters.com/p/beating-kafkas-clock-the-zero-data</guid><dc:creator><![CDATA[Neil Wilson]]></dc:creator><pubDate>Tue, 24 Mar 2026 15:01:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2ac95c65-630b-48c7-b643-405af3cbe2d8_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>TL;DR: Updating Streaming Tables Without Data Loss</strong></p><ul><li><p><strong>The Problem:</strong> A &#8220;Full Refresh&#8221; on Kafka-fed pipelines can cause <strong>permanent data loss</strong> if older records have aged out of the topic (TTL).</p></li><li><p><strong>The Strategy:</strong> Use a <strong>&#8220;Backup-and-Rebase&#8221;</strong> workflow: archive existing data, identify the last processed offsets, and point the new pipeline to that exact starting position.</p></li><li><p><strong>The Execution:</strong> This guide demonstrates how to manually configure <code>startingOffsets</code> in Spark to bridge the gap between historical backups and new Kafka data.</p></li><li><p><strong>The Result:</strong> Seamless streaming table updates with zero data loss or record duplication.</p></li></ul><h2>The Constraint: Kafka TTL &amp; Streaming Table Immutability</h2><p>In Spark Declarative Pipelines (SDP), situations may arise when you need to alter a bronze Streaming Table that is being fed by Apache Kafka. This can pose a challenge as you cannot manually alter Streaming Tables via Alter Table commands. <br><br>This is further complicated by the fact that Kafka topics are configured with a finite retention period (Time-to-live or TTL), meaning older records eventually age out. Since a pipeline Full Refresh clears the target Delta Table, you cannot simply update your pipeline definition and full refresh from the source, as older records will be missing. The diagram below illustrates this situation. The Kafka cluster contains user-4 who has already been ingested, and users 5 and 6 who still need to be ingested into our Lakehouse. The older user records (1, 2, and 3) have aged out of the topic&#8217;s TTL.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LMhK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LMhK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 424w, https://substackcdn.com/image/fetch/$s_!LMhK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 848w, https://substackcdn.com/image/fetch/$s_!LMhK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 1272w, https://substackcdn.com/image/fetch/$s_!LMhK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LMhK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png" width="767" height="583" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:767,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32107,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!LMhK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 424w, https://substackcdn.com/image/fetch/$s_!LMhK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 848w, https://substackcdn.com/image/fetch/$s_!LMhK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 1272w, https://substackcdn.com/image/fetch/$s_!LMhK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23461c4d-4ca6-4f67-ba66-caf45b929828_767x583.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As new records are constantly being appended to this topic, how can I update my pipeline and alter my result table without missing new records (users 5, 6, etc), dropping old records (users 1, 2, and 3), or duplicating records (user-4)? The following example has been fabricated to show the solution. The actual reason for implementing this will vary by use-case.</p><h2>Example: Ingesting JSON via Kafka</h2><p>Imagine a pipeline is ingesting JSON data from Kafka. This JSON data contains three high-level fields: name, country, and email. It also contains two nested fields &#8220;event&#8221; and &#8220;device&#8221; which contain information about what actions users are taking from which devices. </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;bb338c7a-d588-4351-840e-76569893f3ce&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{&#8221;user&#8221;:&#8221;user-2&#8221;,&#8221;email&#8221;:&#8221;user-2@example.com&#8221;,
&#8220;country&#8221;:&#8221;CA&#8221;,
&#8220;device&#8221;:{&#8221;os&#8221;:&#8221;android&#8221;,&#8221;model&#8221;:&#8221;Pixel 7&#8221;,&#8221;geo&#8221;:{&#8221;lat&#8221;:39.98,&#8221;lon&#8221;:-82.98}},
&#8220;event&#8221;:{&#8221;name&#8221;:&#8221;demo&#8221;,&#8221;seq&#8221;:2,&#8221;ts&#8221;:&#8221;2026-02-17T14:40:26.189Z&#8221;}}</code></pre></div><p>This data is being written to a json_bronze table, and is storing the user, email, and country fields as String and the nested fields as Struct types. It also adds Kafka metadata fields.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;d6daedb6-3323-4952-9560-62e6c1225bf7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import dlt
from pyspark.sql.functions import current_timestamp, col, from_json, expr

SERVERS = "REDACTED"
TOPIC = &#8220;neil_struct_topic&#8221;

# Explicit JSON schema for predictable demos (no schemaLocationKey)
SCHEMA_DDL = &#8220;&#8221;&#8220;
  user STRING,
  email STRING,
  country STRING,
  device STRUCT&lt;os: STRING, model: STRING, geo: STRUCT&lt;lat: DOUBLE, lon: DOUBLE&gt;&gt;,
  event STRUCT&lt;name: STRING, seq: BIGINT, ts: STRING&gt;
&#8220;&#8221;&#8220;

@dlt.table(
    name=&#8221;json_bronze&#8221;,
    comment=&#8221;Raw Kafka payload with explicit JSON schema and rescued data&#8221;,
    table_properties={&#8221;quality&#8221;: &#8220;bronze&#8221;}
)
def json_bronze():
    df = (
        spark.readStream
            .format(&#8221;kafka&#8221;)
            .option(&#8221;kafka.bootstrap.servers&#8221;, SERVERS)
            .option(&#8221;kafka.security.protocol&#8221;, &#8220;SSL&#8221;)
            .option(&#8221;subscribe&#8221;, TOPIC)
            .option(&#8221;startingOffsets&#8221;, &#8220;earliest&#8221;)
            .load()
    )

    parsed = (
        df.selectExpr(&#8221;CAST(value AS STRING) AS json_str&#8221;, &#8220;topic&#8221;, &#8220;partition&#8221;, &#8220;offset&#8221;, &#8220;timestamp&#8221;)
          .select(
              from_json(
                  col(&#8221;json_str&#8221;),
                  SCHEMA_DDL,
                  options={&#8221;rescuedDataColumn&#8221;: &#8220;_rescued_data&#8221;}  # capture type mismatches/new fields
              ).alias(&#8221;data&#8221;),
              &#8220;topic&#8221;, &#8220;partition&#8221;, &#8220;offset&#8221;, &#8220;timestamp&#8221;
          )
          # data.* includes _rescued_data already; do not reselect it to avoid duplicate column error
          .selectExpr(&#8221;data.*&#8221;, &#8220;topic&#8221;, &#8220;partition&#8221;, &#8220;offset&#8221;, &#8220;timestamp AS kafka_timestamp&#8221;)
          .withColumn(&#8221;ingestion_ts&#8221;, current_timestamp())
    )

    return parsed</code></pre></div><p>Below is the current state of our target table. This matches the state of the diagram above. Users 1 through 4 have been ingested into the target Delta table.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yqXK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yqXK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 424w, https://substackcdn.com/image/fetch/$s_!yqXK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 848w, https://substackcdn.com/image/fetch/$s_!yqXK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 1272w, https://substackcdn.com/image/fetch/$s_!yqXK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yqXK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png" width="853" height="200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:853,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41922,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F779260b0-f779-4e54-92d9-9cfc02013122_1171x200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!yqXK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 424w, https://substackcdn.com/image/fetch/$s_!yqXK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 848w, https://substackcdn.com/image/fetch/$s_!yqXK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 1272w, https://substackcdn.com/image/fetch/$s_!yqXK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F626e4ec6-b311-40e7-99cc-2fde4bdec665_853x200.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Now let&#8217;s imagine the nature of our topic changes and the device and event fields need to become flexible, allowing for new nested fields to be added anytime and reflected in our target. In the current implementation, these new fields would not appear automatically in our Struct columns.</p><p>One way to allow for flexibility of these nested columns is to update the Struct columns to Variant type. As mentioned above, however, Alter Table is unavailable on a streaming table to update column types. Here&#8217;s how to accomplish a streaming table update while ensuring there is no data-loss or duplication.</p><h4>Step 1: Pause your pipeline</h4><p>Whether your pipeline is continuous or scheduled to run periodically, you don&#8217;t want to be ingesting data while performing these actions. <br><br>Under the scheduled Job:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eG7D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eG7D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 424w, https://substackcdn.com/image/fetch/$s_!eG7D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 848w, https://substackcdn.com/image/fetch/$s_!eG7D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 1272w, https://substackcdn.com/image/fetch/$s_!eG7D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eG7D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png" width="393" height="125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:125,&quot;width&quot;:393,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12269,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/189148173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eG7D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 424w, https://substackcdn.com/image/fetch/$s_!eG7D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 848w, https://substackcdn.com/image/fetch/$s_!eG7D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 1272w, https://substackcdn.com/image/fetch/$s_!eG7D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F236b7239-8e51-48b6-b76f-ba2e657b488c_393x125.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>Step 2: Backup Bronze Table</h4><p>This retains all data we&#8217;ve already ingested (users 1-4), including data that no longer exists in our Kafka topic (users 1-3). We&#8217;ve now ensured we won&#8217;t lose records that have aged out of the source.</p><pre><code><code>CREATE TABLE neil_test_catalog.streaming.json_bronze_backup
SELECT * FROM neil_test_catalog.streaming.json_bronze;</code></code></pre><h4>Step 3: Determine Max Offset per Partition</h4><p>At this point, Kafka has continued to receive new records behind the scenes (user-5 and user-6). We want to ensure that when we Full Refresh our pipeline, we read only these new messages without reprocessing data (user-4). This is achieved by making use of Spark&#8217;s readStream <code>startingOffsets </code>parameter. </p><p>This parameter allows you to specify which offsets Spark should <strong>begin</strong> reading data from, <em>the first time a pipeline runs</em>. Keep in mind that in Kafka, offsets are integers that uniquely identify messages <strong>per partition</strong>, so you&#8217;ll have to specify a starting offset for each partition in your topic. SDP uses these offsets to ensure that upon initial startup, the pipeline begins exactly where you intend. From that point forward, Spark continuously records these offsets in its internal checkpoints to track progress over time and guarantee exactly-once processing. It&#8217;s also a good idea to store this Kafka metadata in the Delta table itself.</p><p>Here&#8217;s the same snapshot of our source topic and target Delta table, with offset information included:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GuUb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GuUb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 424w, https://substackcdn.com/image/fetch/$s_!GuUb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 848w, https://substackcdn.com/image/fetch/$s_!GuUb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 1272w, https://substackcdn.com/image/fetch/$s_!GuUb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GuUb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png" width="862" height="599" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:599,&quot;width&quot;:862,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46089,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GuUb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 424w, https://substackcdn.com/image/fetch/$s_!GuUb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 848w, https://substackcdn.com/image/fetch/$s_!GuUb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 1272w, https://substackcdn.com/image/fetch/$s_!GuUb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fe5c843-bf29-4786-a1cd-87bd9405bc8d_862x599.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Notice that to begin processing at user-5 we will need to specify our pipeline start at offset 4. The startingOffsets parameter expects this information in the following format:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;059cdf57-c09a-47f0-b447-6a1e51336b9e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">.option(&#8221;startingOffsets&#8221;, &#8216;{&#8221;neil_struct_topic&#8221;:{&#8221;0&#8221;:4}}&#8217;)</code></pre></div><p>If our topic contained multiple partitions, starting offsets must be set for each partition and would look like this for two partitions:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;b9310ebd-3020-42c3-b53b-cc58a02106cd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">.option(&#8221;startingOffsets&#8221;, &#8216;{&#8221;neil_struct_topic&#8221;:{&#8221;0&#8221;:4, &#8220;1&#8221;:6}}&#8217;)
</code></pre></div><p>But how can we find this information?</p><h4>Finding via Metadata Columns (If Defined in Pipeline and Tracked in Delta Table)</h4><p>If you&#8217;ve added Kafka metadata to your bronze table, you can retrieve your max offset per partition there. Remember that for this simple example we only have one Kafka partition. If using this method it&#8217;s important to note that these results show the most recent offset ingested, and +1 must be added to specify where Spark should <strong>start</strong> reading.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;3b7187f9-2c43-4090-9182-4205357e34df&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT partition, MAX(offset)
FROM neil_test_catalog.streaming.json_bronze
GROUP BY partition</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0pFu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0pFu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 424w, https://substackcdn.com/image/fetch/$s_!0pFu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 848w, https://substackcdn.com/image/fetch/$s_!0pFu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 1272w, https://substackcdn.com/image/fetch/$s_!0pFu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0pFu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png" width="688" height="196" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:196,&quot;width&quot;:688,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20135,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0pFu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 424w, https://substackcdn.com/image/fetch/$s_!0pFu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 848w, https://substackcdn.com/image/fetch/$s_!0pFu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 1272w, https://substackcdn.com/image/fetch/$s_!0pFu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b343062-1b38-49b2-b258-106bc89c3a3d_688x196.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>Finding via Spark Declarative Pipelines Checkpoints</h4><p>Another way to retrieve your offset information is to query SDP&#8217;s /checkpoints/ folder that your pipeline uses to track state and progress. For more detailed information on checkpoints and how Spark Structured Streaming achieves exactly-once processing, check out this blog: <a href="https://www.canadiandataguy.com/p/inside-delta-lakes-idempotency-magic">Inside Delta Lake&#8217;s Idempotency Magic: The Secret to Exactly-Once Spark</a>.</p><p>First, find your streaming table&#8217;s storage location via DESCRIBE DETAIL.<br><br>Note: If your table is Unity Catalog-managed, this method requires direct read access to the table&#8217;s managed storage location. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TB3u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TB3u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 424w, https://substackcdn.com/image/fetch/$s_!TB3u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 848w, https://substackcdn.com/image/fetch/$s_!TB3u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 1272w, https://substackcdn.com/image/fetch/$s_!TB3u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TB3u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png" width="1396" height="444" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:444,&quot;width&quot;:1396,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105584,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!TB3u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 424w, https://substackcdn.com/image/fetch/$s_!TB3u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 848w, https://substackcdn.com/image/fetch/$s_!TB3u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 1272w, https://substackcdn.com/image/fetch/$s_!TB3u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef457b7-710f-4ea9-9742-bb201bd52e8d_1396x444.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Using the location, you can append &#8220;/_dlt_metadata/checkpoints/<strong>your_table_name</strong>/&#8221; to find the most recent streaming query context (the greatest number). For this streaming table the max result is 8 as shown below.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;c72351ae-4363-40a7-bb0b-0e4a9ad1c060&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">path = &#8220;s3://....&#8221;
metadata_path = path + &#8220;/_dlt_metadata/checkpoints/neil_test_catalog.streaming.json_bronze/&#8221;
display(dbutils.fs.ls(metadata_path))</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qpG9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qpG9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 424w, https://substackcdn.com/image/fetch/$s_!qpG9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 848w, https://substackcdn.com/image/fetch/$s_!qpG9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 1272w, https://substackcdn.com/image/fetch/$s_!qpG9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qpG9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png" width="1456" height="694" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:694,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:212970,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25f7e4f-dfb4-45b5-853b-5ba022965d0c_2028x888.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qpG9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 424w, https://substackcdn.com/image/fetch/$s_!qpG9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 848w, https://substackcdn.com/image/fetch/$s_!qpG9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 1272w, https://substackcdn.com/image/fetch/$s_!qpG9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f30f6ae-941e-45fc-bfc9-6908bd30b52f_1810x863.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Within the numbered subfolder under your table name, you will see /offsets/ and /commits/ folders, each also containing numbered folders 0/, 1/, 2/ and so on. These folders represent streaming batches in SDP.</p><ul><li><p><code>offsets/N</code> is a <strong>write&#8209;ahead log entry</strong> written before processing batch N. It stores the <strong>end offsets (high&#8209;water mark)</strong> for that batch &#8212; i.e., &#8220;read up to here&#8221; for each topic/partition.</p></li><li><p><code>commits/N</code> is only written after batch N has finished successfully.</p></li></ul><p>Because an offsets/N entry can exist even if its corresponding commits/N is missing (a batch started but never committed), you should follow these steps to determine where to retrieve your startingOffsets.</p><p>List the batch IDs in both folders:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">offsets_path = path + "/_dlt_metadata/checkpoints/your_table_name/8/offsets/"
commits_path = path + "/_dlt_metadata/checkpoints/your_table_name/8/commits/"
display(dbutils.fs.ls(offsets_path))
display(dbutils.fs.ls(commits_path))</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UwIC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UwIC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 424w, https://substackcdn.com/image/fetch/$s_!UwIC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 848w, https://substackcdn.com/image/fetch/$s_!UwIC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 1272w, https://substackcdn.com/image/fetch/$s_!UwIC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UwIC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png" width="1171" height="579" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:579,&quot;width&quot;:1171,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:149194,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/189148173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434128e2-d030-41e8-823a-07ccf1b63a36_1171x579.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!UwIC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 424w, https://substackcdn.com/image/fetch/$s_!UwIC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 848w, https://substackcdn.com/image/fetch/$s_!UwIC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 1272w, https://substackcdn.com/image/fetch/$s_!UwIC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484104d7-6833-422a-81b1-a33bc63d21e7_1171x579.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let:</p><ul><li><p><code>commit_batches</code> = all numeric batch IDs under <code>commits/</code></p></li><li><p><code>max_commit_batch</code> = largest value in <code>commit_batches</code></p></li></ul><p>Use <code>max_commit_batch</code> as the last fully committed batch, and read the matching offsets file:</p><p><strong>Always</strong> read <code>offsets/max_commit_batch</code> to get the correct <code>startingOffsets</code> JSON.</p><p>Ignore any higher batch ID that appears only under <code>offsets/</code> but not <code>commits/</code>. That batch started but never finished, so if you treat its offsets as your starting position, Spark will behave as if that data has already been read and will <strong>skip</strong> it rather than processing it.</p><p>In my example, both the /offsets/ and /commits/ folders contain only batches 0 and 1, so using <code>max_commit_batch,</code> we read startingOffsets from /offsets/1.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;5e16f2a8-9c27-4b1a-b3ae-e66ef265589f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">metadata_path = path + &#8220;/_dlt_metadata/checkpoints/neil_test_catalog.streaming.json_bronze/8/offsets/1/&#8221;
display(dbutils.fs.head(metadata_path))</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;45573231-4ab2-4ab6-b675-b5b5cdd1ed08&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{&#8221;batchWatermarkMs&#8221;:0,&#8221;batchTimestampMs&#8221;:1771439553201,&#8221;conf&#8221;:{...}}
{&#8221;neil_struct_topic&#8221;:{&#8221;0&#8221;:4}}</code></pre></div><p>Notice the final line is our starting partition:offset information in the exact format we created manually above. Spark tracks the next offset to read, so you do not need to increment +1 via this method. Again, if our topic had multiple partitions it might look like:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;d18d4bc3-3a9f-4d23-afb3-bbc85e9cf333&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{&#8221;neil_struct_topic&#8221;:{&#8221;0&#8221;:4, &#8220;1&#8221;:6}}</code></pre></div><h4>Step 4: Update pipeline definition</h4><p>Now it&#8217;s time to apply the actual logic changes that prompted this process. For my example, I add variant support to table properties:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">table_properties={&#8221;quality&#8221;: &#8220;bronze&#8221;, &#8220;delta.feature.variantType-preview&#8221;: &#8220;supported&#8221;}</code></pre></div><p>And cast the Struct columns to Variant:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">return (
        parsed
            .withColumn(&#8221;event&#8221;, expr(&#8221;parse_json(to_json(event))&#8221;))
            .withColumn(&#8221;device&#8221;, expr(&#8221;parse_json(to_json(device))&#8221;))
        )</code></pre></div><h4>Step 5: Full Refresh Table</h4><p>With pipeline logic updated, it&#8217;s time to run with Full refresh to wipe the target table and ingest the new Kafka records starting at our specified offsets. The resulting table will contain only the records we did not back up via Step 2.</p><p>To do this, we run our pipeline with full refresh after adding our startingOffsets (line 28). Here is the final pipeline definition.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;6c55857d-d0f1-47b3-a53d-d6592eae648e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">.option(&#8221;startingOffsets&#8221;, &#8216;{&#8221;neil_struct_topic&#8221;:{&#8221;0&#8221;:4}}&#8217;)</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;152abf3a-b98c-4704-9e92-8f5622e075cb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import dlt
from pyspark.sql.functions import current_timestamp, col, from_json, expr

SERVERS = &#8220;REDACTED&#8221;
TOPIC = &#8220;neil_struct_topic&#8221;

# Explicit JSON schema for predictable demos (no schemaLocationKey)
SCHEMA_DDL = &#8220;&#8221;&#8220;
  user STRING,
  email STRING,
  country STRING,
  device STRUCT&lt;os: STRING, model: STRING, geo: STRUCT&lt;lat: DOUBLE, lon: DOUBLE&gt;&gt;,
  event STRUCT&lt;name: STRING, seq: BIGINT, ts: STRING&gt;
&#8220;&#8221;&#8220;

@dlt.table(
    name=&#8221;json_bronze&#8221;,
    comment=&#8221;Raw Kafka payload with explicit JSON schema and rescued data&#8221;,
    table_properties={&#8221;quality&#8221;: &#8220;bronze&#8221;, &#8220;delta.feature.variantType-preview&#8221;: &#8220;supported&#8221;}
)
def json_bronze():
    df = (
        spark.readStream
            .format(&#8221;kafka&#8221;)
            .option(&#8221;kafka.bootstrap.servers&#8221;, SERVERS)
            .option(&#8221;kafka.security.protocol&#8221;, &#8220;SSL&#8221;)
            .option(&#8221;subscribe&#8221;, TOPIC)
            .option(&#8221;startingOffsets&#8221;, &#8216;{&#8221;neil_struct_topic&#8221;:{&#8221;0&#8221;:4}}&#8217;)
            .load()
    )

    parsed = (
        df.selectExpr(&#8221;CAST(value AS STRING) AS json_str&#8221;, &#8220;topic&#8221;, &#8220;partition&#8221;, &#8220;offset&#8221;, &#8220;timestamp&#8221;)
          .select(
              from_json(
                  col(&#8221;json_str&#8221;),
                  SCHEMA_DDL,
                  options={&#8221;rescuedDataColumn&#8221;: &#8220;_rescued_data&#8221;}  # capture type mismatches/new fields
              ).alias(&#8221;data&#8221;),
              &#8220;topic&#8221;, &#8220;partition&#8221;, &#8220;offset&#8221;, &#8220;timestamp&#8221;
          )
          # data.* includes _rescued_data already; do not reselect it to avoid duplicate column error
          .selectExpr(&#8221;data.*&#8221;, &#8220;topic&#8221;, &#8220;partition&#8221;, &#8220;offset&#8221;, &#8220;timestamp AS kafka_timestamp&#8221;)
          .withColumn(&#8221;ingestion_ts&#8221;, current_timestamp())
    )

    return (
        parsed
            .withColumn(&#8221;event&#8221;, expr(&#8221;parse_json(to_json(event))&#8221;))
            .withColumn(&#8221;device&#8221;, expr(&#8221;parse_json(to_json(device))&#8221;))
        )</code></pre></div><p>On the pipeline page:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A22d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A22d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 424w, https://substackcdn.com/image/fetch/$s_!A22d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 848w, https://substackcdn.com/image/fetch/$s_!A22d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 1272w, https://substackcdn.com/image/fetch/$s_!A22d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A22d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png" width="396" height="219" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:219,&quot;width&quot;:396,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33568,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/189148173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A22d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 424w, https://substackcdn.com/image/fetch/$s_!A22d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 848w, https://substackcdn.com/image/fetch/$s_!A22d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 1272w, https://substackcdn.com/image/fetch/$s_!A22d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F287e92dd-2a47-4cc9-b03f-612e38730a73_396x219.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Here&#8217;s the result: user-5 and user-6 as expected:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2vh_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2vh_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 424w, https://substackcdn.com/image/fetch/$s_!2vh_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 848w, https://substackcdn.com/image/fetch/$s_!2vh_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 1272w, https://substackcdn.com/image/fetch/$s_!2vh_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2vh_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png" width="909" height="127" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:127,&quot;width&quot;:909,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24239,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2543971d-6d72-4e6c-b276-43fc19af5dff_1176x127.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!2vh_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 424w, https://substackcdn.com/image/fetch/$s_!2vh_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 848w, https://substackcdn.com/image/fetch/$s_!2vh_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 1272w, https://substackcdn.com/image/fetch/$s_!2vh_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5471779-6f11-42d8-8d0b-1b4d8c58c3f6_909x127.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>Step 6: Insert Historical Records</h4><p>To complete our intended result table, insert historical data from the backup table, ensuring the data matches the new table format (cast columns, etc.):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;591001a1-3fec-4a05-9d0b-6af6a7eb9d12&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">INSERT INTO neil_test_catalog.streaming.json_bronze
SELECT user, email, country, to_variant_object(device), to_variant_object(event), _rescued_data, topic, partition, offset, kafka_timestamp, ingestion_ts
FROM neil_test_catalog.streaming.json_bronze_backup</code></pre></div><p>Our final result. Your pipeline may be resumed.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;95091137-7fc2-4fac-b81c-bc3bd107ffb9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">user:string
email:string
country:string
device:variant
event:variant
_rescued_data:string
topic:string
partition:integer
offset:long
kafka_timestamp:timestamp
ingestion_ts:timestamp</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x3gO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x3gO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 424w, https://substackcdn.com/image/fetch/$s_!x3gO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 848w, https://substackcdn.com/image/fetch/$s_!x3gO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 1272w, https://substackcdn.com/image/fetch/$s_!x3gO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x3gO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png" width="948" height="224" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:224,&quot;width&quot;:948,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61678,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/188168915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8eb96d1-3eb6-40c1-8a4a-9a7c91c5513f_1303x224.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!x3gO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 424w, https://substackcdn.com/image/fetch/$s_!x3gO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 848w, https://substackcdn.com/image/fetch/$s_!x3gO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 1272w, https://substackcdn.com/image/fetch/$s_!x3gO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d3d7f02-263e-49ae-a838-8cc8826ac96c_948x224.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>Step 7: Revert startingOffsets</h4><p>To avoid pipeline failures in the case of a future Full Refresh, revert startingOffsets to its prior value.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;693848c8-f57d-4c01-b5cf-9214847e70ea&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">.option(&#8221;startingOffsets&#8221;, &#8220;earliest&#8221;)</code></pre></div><h3>Frequently Asked Questions</h3><p><strong>Q: Why can&#8217;t I just run a standard Full Refresh on the pipeline?</strong> </p><p><strong>A:</strong> A Full Refresh clears the target table and re-reads the source from the beginning. If your Kafka topic has a retention policy (TTL), data that has "aged out" of the topic will be permanently lost because it no longer exists in the source to be re-read.</p><p><strong>Q: Why is </strong><code>startingOffsets</code><strong> necessary if I have a backup?</strong> </p><p><strong>A:</strong> While the backup saves your history, <code>startingOffsets</code> ensures your pipeline resumes reading <em>exactly</em> where the backup stopped. Without this explicit instruction, the pipeline might default to "earliest" (reading only what remains in Kafka, creating a gap) or "latest" (skipping data that arrived during the maintenance window).</p><p><strong>Q: Is this process required for "Append-Only" tables?</strong> </p><p><strong>A:</strong> Generally, yes, if you need to restructure the existing table. If you are only adding new columns that are nullable, you might rely on schema evolution, but fundamental type changes usually require the table to be rewritten.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Cutting Token Costs Reaches the Renaissance]]></title><description><![CDATA[A Lakebase Powered Solution for Enforcing Token Budgets, Now with Fewer Sharp Edges]]></description><link>https://www.databricksters.com/p/cutting-token-costs-reaches-the-renaissance</link><guid isPermaLink="false">https://www.databricksters.com/p/cutting-token-costs-reaches-the-renaissance</guid><dc:creator><![CDATA[Austin]]></dc:creator><pubDate>Tue, 17 Mar 2026 14:02:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7sNy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Back in October I published a blog called <a href="https://www.databricksters.com/p/getting-medieval-on-token-costs">Getting Medieval on Token Costs</a>. The code and strategy I provided worked, but as the title implied it was rough around the edges. How rough? Well&#8230;</p><ul><li><p>The API calls to the FMs were synchronous, so QPS would have been Medieval indeed</p></li><li><p>The Lakebase instance was provisioned, so it would always be accruing costs even without usage</p></li><li><p>There was no UI, so you or your admin would be spending hours fiddling with thousand line SQL queries for enterprise use cases</p></li></ul><p>But no matter, we&#8217;ve had a Renaissance!</p><p><a href="https://github.com/azaccor/token-rate-limiter">The repo</a> got three meaningful updates and a handful of smaller ones that collectively move this from being technically functional to something a medium enterprise team might actually want to use on Databricks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7sNy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7sNy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7sNy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7sNy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7sNy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7sNy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg" width="1456" height="1532" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1532,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11842916,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/190565336?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7sNy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7sNy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7sNy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7sNy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13774bb4-e2c9-4cf2-a151-17fe03cd9b73_5786x6090.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The Money Changer and His Wife - Quentin Matsys, 1514 oil-on-panel</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databricksters.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Quick</strong> <strong>Refresher</strong></h3><p>The original solution uses a custom MLflow model serving endpoint as a proxy between your users and whatever foundation model they&#8217;re calling. Before the request hits the FM, the endpoint checks two Lakebase tables: one for the user&#8217;s token limit and another for how many tokens they&#8217;ve already burned through. If they&#8217;re over budget, the request ends up like John the Baptist in the cover image. If not, it goes through to the FM and the usage is written back to Lakebase along with the response and remaining balance.</p><h3><strong>Change</strong> <strong>1:</strong> <strong>Autoscaling</strong> <strong>Lakebase</strong></h3><p>The original code used a Provisioned Lakebase instance because that was the only option at the time, but it&#8217;s going away and we have something better. Autoscaling Lakebase. </p><p>Swapping to Autoscaling Lakebase means you can set minimum and maximum scaling bands, and if you don&#8217;t need a high availability instance, it will also allow you to scale to zero during times of no use. </p><p>This is the smallest change architecturally, but it&#8217;s nice not to pay for compute we don&#8217;t need.</p><h3><strong>Change</strong> <strong>2:</strong> <strong>ResponsesAgent</strong> <strong>+</strong> <strong>Async</strong> <strong>FM</strong> <strong>Calls</strong></h3><p>The original code used <code>mlflow.pyfunc.PythonModel</code> and called the FM endpoint via <code>requests.post()</code>, which is synchronous and blocking. Only one request can be handled at a time per unit of concurrency. Which meant the endpoint that was supposed to help you manage costs via budgeting is throttling your throughput instead. While I suppose that is one way to reduce token costs, it&#8217;s not very useful.</p><p>The new version replaces the PythonModel with a ResponsesAgent and swaps <code>requests</code> for <code>httpx.AsyncClient</code> inside an <code>async def predict_stream()</code>. Now multiple FM calls can be in flight simultaneously and the serving endpoint isn&#8217;t waiting on one user&#8217;s 20-second Claude response before it can look at the next request in the queue.</p><p>The core logic now lives in a standalone <code>rate_limiter_agent.py</code> decoupled from the notebook. The public API is much cleaner:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;2ec01639-67a8-440b-bc77-590a75edefd6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">  agent = TokenRateLimiterAgent(
      db_config={...},
      workspace_client=WorkspaceClient(),
      endpoint_name="ep-your-endpoint",
      group_members={
          "data-science-team": ["andrea@company.com", "john@company.com"],
      },
  )

  # Before calling the FM:
  quota = agent.check_quota("andrea@company.com", "databricks-claude-sonnet-4-5")
  if not quota["allowed"]:
      # Return 429 or block the request
      ...

  # After the FM call completes:
  agent.log_usage(
      user_name="andrea@company.com",
      model_name="databricks-claude-sonnet-4-5",
      prompt_tokens=1200,
      completion_tokens=350,
      request_id="req-abc123",
  )
</code></pre></div><p>You can drop this into any existing pipeline without touching the notebook, which certainly helps if you&#8217;re integrating this into something that already has its own serving infrastructure.</p><h3><strong>Change</strong> <strong>3:</strong> <strong>An</strong> <strong>Actual</strong> <strong>Frontend</strong></h3><p>The original had no management UI, which meant you had to set limits by writing manual SQL queries for any new change to your budgeting policy. Not very convenient. </p><p>The new repo ships a full Databricks App: a React + FastAPI application that deploys alongside your serving endpoint and gives administrators a no-code interface for setting granular budgets. How granular you ask? Any combination of:</p><ul><li><p>A user, service principal, or group</p></li><li><p>Calling any FM, list of FMs, or across all FMs in the workspace</p></li><li><p>That resets every X hours, days, weeks, months, or never</p></li><li><p>Limited to a specified count of tokens or dollars</p><ul><li><p>Pre-populates token costs from Databricks documentation, but manually editable in case this changes or you have some kind of secret discount I don&#8217;t know about</p></li><li><p>This is another nice quality of life feature since tokens are not all created equally; GPT OSS 20B tokens cost about 100x less than GPT 5.4 tokens</p></li></ul></li></ul><p>The drop-downs auto-populate users, SPs, and groups as well as the Databricks Foundation Models.</p><p>It also comes with a handy monitoring dashboard so you can see usage over time, your top consumers, and the most popular models.</p><p>The App authenticates to Lakebase via a native Postgres role with a static password stored in Databricks Secrets, so there&#8217;s no OAuth token refresh to manage.</p><h3><strong>An</strong> <strong>Honest</strong> <strong>Conclusion</strong></h3><p>Is this production-grade for an org running thousands of concurrent end users? Maybe not. You might consider mini-batching requests at scale, but there will still be some amount of cost tracking overhead, and this gets more difficult at scale.</p><p>Is this production-grade for most actual enterprise teams who want to stop their power users from accidentally burning through their monthly token budget in a week? Yes. I think this solution really shines when you have dozens to hundreds of daily active users who might get greedy on Opus requests without some budget enforcement. </p><p>But don&#8217;t take my word for it; check it out for yourself. The code <a href="https://github.com/azaccor/token-rate-limiter">lives here</a> and setup instructions are in the README.</p><p>Cheers and happy coding.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Migrating Existing Dashboards to Databricks AI/BI, Part 2: Filter Actions, Cross-Filtering, and Drill-Through]]></title><description><![CDATA[How to connect visuals, enable cross-filtering, and drill into details in Databricks AI/BI Dashboards]]></description><link>https://www.databricksters.com/p/migrating-existing-dashboards-to-482</link><guid isPermaLink="false">https://www.databricksters.com/p/migrating-existing-dashboards-to-482</guid><dc:creator><![CDATA[Artem Chebotko]]></dc:creator><pubDate>Tue, 10 Mar 2026 15:02:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gLl3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gLl3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gLl3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!gLl3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!gLl3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!gLl3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gLl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gLl3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!gLl3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!gLl3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!gLl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4b6aea-ff9b-4eb0-a60e-9e2338aefa0e_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As a Specialist Solutions Architect at Databricks, I often hear the same questions from customers who are migrating dashboards from legacy BI tools to Databricks AI/BI Dashboards:</p><ul><li><p><em>&#8220;What&#8217;s the Databricks equivalent of the context filters we use today?&#8221;</em></p></li><li><p><em>&#8220;Can we still do cascading filters where each dropdown only shows relevant values?&#8221;</em></p></li><li><p><em>&#8220;Do you support filter actions when I click on a bar or a point?&#8221;</em></p></li><li><p><em>&#8220;How do we do user-based filtering in AI/BI Dashboards?&#8221;</em></p></li></ul><p>In the <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to">first blog post in this series</a>, I focused on the first two questions and showed how to recreate:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ul><li><p>context filters using parameters in dataset SQL, and</p></li><li><p>&#8220;<em>Only Relevant Values</em>&#8221; filters using field filters and query-based parameters.</p></li></ul><p>This blog tackles the third question: &#8220;<em>How do we replace filter actions from existing dashboards when we click on a bar/segment/point?</em>&#8221;</p><p>In many BI tools, this behavior is configured as a <strong>filter action</strong> (a click on a mark filters other views). In Databricks AI/BI Dashboards, the equivalent interactivity is split into two built-in features:</p><ul><li><p><strong>Cross-filtering</strong>, where clicking a mark in one chart filters other charts on the same page that use the same dataset.</p></li><li><p><strong>Drill-through</strong>, where right-clicking a mark opens a target page filtered to that selection.</p></li></ul><p>Once you combine cross-filtering and drill-through with the <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to">context and cascading patterns</a>, you can reproduce most real-world filter-action workflows.</p><p>As before, I&#8217;ll use the built-in <code>samples.tpch</code> dataset. I&#8217;ve also published the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, so you can follow along and inspect the configurations yourself.</p><h3><strong>Recap: TPCH Sales dataset</strong></h3><p>To keep examples concrete, we&#8217;ll use the TPCH sample data that ships with Databricks in the <code>samples.tpch</code> schema. I&#8217;ll reuse the same base dataset, <em>TPCH Sales</em>, from the <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to">first blog post</a>, which joins tables <code>region</code>, <code>nation</code>, <code>customer</code>, <code>orders</code>, and <code>lineitem</code>, and computes revenue:</p><pre><code>SELECT
  r.r_name              AS region,
  n.n_name              AS nation,
  c.c_custkey           AS customer_id,
  c.c_name              AS customer_name,
  o.o_orderkey          AS order_id,
  o.o_orderdate         AS order_date,
  l.l_extendedprice * (1 - l.l_discount) AS revenue
FROM samples.tpch.region   AS r
JOIN samples.tpch.nation   AS n ON n.n_regionkey = r.r_regionkey
JOIN samples.tpch.customer AS c ON c.c_nationkey = n.n_nationkey
JOIN samples.tpch.orders   AS o ON o.o_custkey   = c.c_custkey
JOIN samples.tpch.lineitem AS l ON l.l_orderkey  = o.o_orderkey;</code></pre><p>In the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, this is the <em>TPCH Sales</em> dataset. It and its derivatives are used by multiple pages:</p><ul><li><p><em>Context filter</em></p></li><li><p><em>Cascading filters with field filters</em></p></li><li><p><em>Cascading filters with query-based parameters</em></p></li><li><p><em>Cross-filtering </em>(new in this post)</p></li><li><p><em>Drill-through details </em>(new in this post)</p></li><li><p>and others</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZQnz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZQnz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 424w, https://substackcdn.com/image/fetch/$s_!ZQnz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 848w, https://substackcdn.com/image/fetch/$s_!ZQnz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 1272w, https://substackcdn.com/image/fetch/$s_!ZQnz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZQnz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png" width="1456" height="673" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dff3310d-8eba-4213-b83e-e5244c972269_1600x740.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZQnz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 424w, https://substackcdn.com/image/fetch/$s_!ZQnz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 848w, https://substackcdn.com/image/fetch/$s_!ZQnz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 1272w, https://substackcdn.com/image/fetch/$s_!ZQnz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdff3310d-8eba-4213-b83e-e5244c972269_1600x740.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>Filter actions vs. cross-filtering and drill-through</strong></h3><p>Before we build anything, it helps to align vocabulary.</p><p>In many traditional BI tools, filter actions are configured explicitly:</p><ul><li><p>You specify one or more source sheets.</p></li><li><p>You specify one or more target sheets.</p></li><li><p>You choose which fields are passed as filters.</p></li><li><p>You choose what happens when the selection is cleared.</p></li></ul><p>In Databricks AI/BI Dashboards, cross-filtering is implicit. You don&#8217;t turn it on in the visualization panel.</p><ul><li><p>It is automatically applied to supported visualization types that use the same dataset.</p></li><li><p>When you click a bar, slice, or point, AI/BI adds a filter based on that value and re-runs all other visualizations on the page that share the dataset.</p></li></ul><p>In Databricks AI/BI Dashboards, drill-through is also implicit, but slightly more structured:</p><ul><li><p>When you right-click a supported chart type, AI/BI shows <em>Drill to &#8594; &lt;target page&gt;</em> if there is another page in the dashboard that uses the same dataset.</p></li><li><p>The target page opens with all visuals based on that dataset filtered to the selected segment, and any compatible filters on that dataset are auto-populated.</p></li></ul><p>Conceptually:</p><ul><li><p>Cross-filtering &#8776; a within-page filter action.</p></li><li><p>Drill-through &#8776; a navigation filter action (summary &#8594; details).</p></li></ul><p>The rest of the post walks through how to configure your pages so these implicit behaviors &#8220;just work&#8221;.</p><h3><strong>1. Recreating within-page filter actions with cross-filtering</strong></h3><p>A very common dashboard pattern is:</p><ul><li><p>A summary bar chart (for example, total revenue by nation).</p></li><li><p>One or more supporting charts (for example, revenue share by region, or a donut chart for mix).</p></li><li><p>A filter action so clicking a bar filters the other visuals.</p></li></ul><p>In AI/BI Dashboards, this becomes cross-filtering on top of the <em>TPCH Sales</em> dataset.</p><h4><strong>1.1. When cross-filtering is applied</strong></h4><p>Cross-filtering is applied automatically when all of the following are true:</p><ul><li><p>The visualizations are on the same page.</p></li><li><p>The visualizations use the same dataset (for example, <em>TPCH Sales</em>).</p></li><li><p>The visualization type is one of the supported chart types: <em>Bar</em>, <em>Box plot</em>, <em>Heatmap</em>, <em>Histogram</em>, <em>Pie</em>, <em>Scatter</em>, or <em>Point map</em>.</p></li></ul><p>If those conditions are met, there is nothing to enable. Clicks on supported charts become filters for any other visualizations on the page that use that dataset.</p><h4><strong>1.2. Build the </strong><em><strong>Cross-filtering</strong></em><strong> page</strong></h4><p>In your AI/BI dashboard, add a page named <em>Cross-filtering</em>.</p><p>On this page:</p><ol><li><p>Add a bar chart: <em>Revenue by nation</em></p><ul><li><p>Visualization: <em>Bar</em></p></li><li><p>Dataset = <em>TPCH Sales</em></p></li><li><p>X axis: <code>nation</code></p></li><li><p>Y axis: <code>SUM(revenue)</code></p></li></ul></li><li><p>Add a pie: <em>Revenue by region</em></p><ul><li><p>Visualization: <em>Pie</em></p></li><li><p>Dataset = <em>TPCH Sales</em></p></li><li><p>Slice by (Color): <code>region</code></p></li><li><p>Value (Angle): <code>SUM(revenue)</code></p></li></ul></li><li><p>Add filters (optional)</p><ul><li><p>Add a <em>Region</em> filter on <code>TPCH Sales.region</code></p></li><li><p>Add a <em>Nation</em> filter on <code>TPCH Sales.nation</code></p></li></ul></li></ol><p>These are just standard filter widgets. There is no &#8220;cross-filtering&#8221; toggle anywhere in the configuration.</p><p>In the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, this page is already built for you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aM9L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aM9L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 424w, https://substackcdn.com/image/fetch/$s_!aM9L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 848w, https://substackcdn.com/image/fetch/$s_!aM9L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 1272w, https://substackcdn.com/image/fetch/$s_!aM9L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aM9L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png" width="1456" height="582" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:582,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aM9L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 424w, https://substackcdn.com/image/fetch/$s_!aM9L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 848w, https://substackcdn.com/image/fetch/$s_!aM9L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 1272w, https://substackcdn.com/image/fetch/$s_!aM9L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7030f297-722f-4ebe-b522-b20ff0bc75b3_1600x640.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>1.3. Use cross-filtering on the charts</strong></h4><p>Try the following workflow:</p><ol><li><p>Select two regions, <em>AFRICA</em> and <em>ASIA</em>, on the <em>Revenue by region</em> chart.</p></li></ol><blockquote></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xwfZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xwfZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 424w, https://substackcdn.com/image/fetch/$s_!xwfZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 848w, https://substackcdn.com/image/fetch/$s_!xwfZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 1272w, https://substackcdn.com/image/fetch/$s_!xwfZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xwfZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png" width="1456" height="586" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:586,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xwfZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 424w, https://substackcdn.com/image/fetch/$s_!xwfZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 848w, https://substackcdn.com/image/fetch/$s_!xwfZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 1272w, https://substackcdn.com/image/fetch/$s_!xwfZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc420da58-ce61-418f-8cbb-35d3494b7e3e_1600x644.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol start="2"><li><p>Click the bar for <em>JAPAN</em> in the <em>Revenue by nation</em> chart.</p></li></ol><blockquote></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N1P1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N1P1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 424w, https://substackcdn.com/image/fetch/$s_!N1P1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 848w, https://substackcdn.com/image/fetch/$s_!N1P1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 1272w, https://substackcdn.com/image/fetch/$s_!N1P1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N1P1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png" width="1456" height="586" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:586,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N1P1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 424w, https://substackcdn.com/image/fetch/$s_!N1P1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 848w, https://substackcdn.com/image/fetch/$s_!N1P1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 1272w, https://substackcdn.com/image/fetch/$s_!N1P1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f12352e-c307-47a3-8cd6-e76126084dd5_1600x644.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Because cross-filtering is automatic for supported charts that share a dataset:</p><ul><li><p>AI/BI adds filters <em>Nation: JAPAN</em> and <em>Region: AFRICA, ASIA</em> to the <em>TPCH Sales</em> dataset for this page.</p></li><li><p>Both charts get updated accordingly.</p></li></ul><p>You can <em>Reset all to default</em> and try a different filter combination.</p><p>From a migration perspective, this answers a common question: &#8220;<em>Can clicking a bar automatically update the rest of the dashboard in AI/BI?</em>&#8221; Yes &#8211; when charts share a dataset and use supported visualization types, cross-filtering works implicitly with no additional configuration.</p><h3><strong>2. Recreating across-page filter actions with drill-through</strong></h3><p>Another classic <em>summary &#8594; detail</em> filter-action pattern is:</p><ul><li><p>A summary view (for example, revenue by nation).</p></li><li><p>A detail view (for example, individual orders).</p></li><li><p>A filter action that passes the selected value into the detail sheet as a filter.</p></li></ul><p>In AI/BI Dashboards, this is implemented as drill-through.</p><h4><strong>2.1. How drill-through is applied</strong></h4><p>Drill-through shows up as a right-click option when several conditions are satisfied.</p><ul><li><p>The source chart is a supported type: <em>Bar</em>, <em>Box plot</em>, <em>Heatmap</em>, <em>Histogram</em>, <em>Pie</em>, <em>Scatter</em>, or <em>Point map</em>.</p></li><li><p>There is at least one target page in the same dashboard where:</p><ul><li><p>At least one visualization uses the same dataset as the source chart.</p></li><li><p>The field you click on has a compatible filter or column on the target page.</p></li></ul></li></ul><p>In recent AI/BI releases, drill-through no longer requires an explicit target filter; any visualization based on the same dataset as the source selection is filtered automatically, and filters (if they exist) are populated with the drilled values.</p><p>There is no drill-through toggle in the widget side panel. Once the above conditions are true, AI/BI surfaces <em>Drill to &#8594; &lt;page name&gt;</em> in the context menu automatically.</p><h4><strong>2.2. Build the </strong><em><strong>Drill-through details</strong></em><strong> page</strong></h4><p>In the same dashboard, add another page named <em>Drill-through details</em>.</p><p>On this page:</p><ol><li><p>Add a detail table visualization</p><ul><li><p>Visualization: <em>Table</em></p></li><li><p>Dataset = <em>TPCH Sales</em></p></li><li><p>Columns: <em>region</em>, <em>nation</em>, <em>customer_id</em>, <em>customer_name</em>, <em>order_id</em>, <em>order_date</em>, <em>revenue</em></p></li></ul></li><li><p>Add filters (optional)</p><ul><li><p>Add a <em>Region</em> filter on <code>TPCH Sales.region</code></p></li><li><p>Add a <em>Nation</em> filter on <code>TPCH Sales.nation</code></p></li><li><p>Add a <em>Customer</em> filter on <code>TPCH Sales.customer_id</code></p></li></ul></li></ol><p>Again, there is no special drill-through configuration here &#8211; just a normal page that uses the same dataset.</p><p>In the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, this page is already built for you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d5DG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d5DG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 424w, https://substackcdn.com/image/fetch/$s_!d5DG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 848w, https://substackcdn.com/image/fetch/$s_!d5DG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 1272w, https://substackcdn.com/image/fetch/$s_!d5DG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d5DG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png" width="1456" height="648" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d5DG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 424w, https://substackcdn.com/image/fetch/$s_!d5DG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 848w, https://substackcdn.com/image/fetch/$s_!d5DG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 1272w, https://substackcdn.com/image/fetch/$s_!d5DG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87d8d760-8221-47f5-a7d5-b994033d2da3_1504x669.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>2.3. Drill from summary to details</strong></h4><p>Go back to the <em>Cross-filtering</em> page and right-click a bar in the <em>Revenue by nation</em> chart (for example, <em>UNITED STATES</em>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4-aE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4-aE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 424w, https://substackcdn.com/image/fetch/$s_!4-aE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 848w, https://substackcdn.com/image/fetch/$s_!4-aE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 1272w, https://substackcdn.com/image/fetch/$s_!4-aE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4-aE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png" width="1456" height="820" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4-aE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 424w, https://substackcdn.com/image/fetch/$s_!4-aE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 848w, https://substackcdn.com/image/fetch/$s_!4-aE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 1272w, https://substackcdn.com/image/fetch/$s_!4-aE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe7d29c2-1a06-494c-822a-2152aefea12c_1502x846.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>After clicking <em>Drill to &#8594; Drill-through details</em>, AI/BI opens the <em>Drill-through details</em> page and filters all visualizations based on <em>TPCH Sales</em> to <em>Nation: UNITED STATES</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QvbJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QvbJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 424w, https://substackcdn.com/image/fetch/$s_!QvbJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 848w, https://substackcdn.com/image/fetch/$s_!QvbJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 1272w, https://substackcdn.com/image/fetch/$s_!QvbJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QvbJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png" width="1456" height="646" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:646,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QvbJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 424w, https://substackcdn.com/image/fetch/$s_!QvbJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 848w, https://substackcdn.com/image/fetch/$s_!QvbJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 1272w, https://substackcdn.com/image/fetch/$s_!QvbJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4964f70f-2bfb-4050-8762-aba6cf0081c3_1501x666.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>From the user&#8217;s perspective, this feels almost identical to a filter action in traditional dashboards that navigates from a summary sheet to a detailed sheet with the selected country carried over.</p><h3><strong>Cross-filtering vs. drill-through vs. filter widgets</strong></h3><p>Across this blog and the <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to">first blog</a>, you now have three main interaction tools in AI/BI Dashboards:</p><ol><li><p>Filter widgets</p><ul><li><p>Context filters, cascading filters, and query-based parameters.</p></li><li><p>Best for primary, always-visible controls like <em>Region</em>, <em>Date</em>, <em>Product</em>.</p></li></ul></li><li><p>Cross-filtering</p><ul><li><p>Click data points in a supported visualization to filter other charts on the page that use the same dataset.</p></li><li><p>Best for ad-hoc exploration, answering questions like &#8220;<em>What happens if I focus only on this region?</em>&#8221; and &#8220;<em>Which nations are driving that spike?</em>&#8221;.</p></li></ul></li><li><p>Drill-through</p><ul><li><p>Right-click a mark to open another page already filtered to that selection.</p></li><li><p>Best for guided summary-to-detail flows where you don&#8217;t want to cram everything onto one page.</p></li></ul></li></ol><p>A simple migration rule of thumb:</p><ul><li><p>Use filter widgets to rebuild the core filter panels from your existing dashboards.</p></li><li><p>Use cross-filtering to automatically filter other visualizations on the page based on chart interactions.</p></li><li><p>Use drill-through to replace &#8220;go to sheet&#8221; filter actions and connect high-level KPIs to detail pages.</p></li></ul><h3><strong>Summary and what&#8217;s next</strong></h3><p>In this second blog post in the series, we answered: &#8220;<em>Do you support filter actions when I click on a bar or a point?</em>&#8221;</p><p>The short answer is <em>yes</em>:</p><ul><li><p><strong>Cross-filtering</strong> lets viewers click on supported charts to filter all other visualizations on the same page that share a dataset &#8211; no configuration required.</p></li><li><p><strong>Drill-through</strong> lets viewers right-click a mark and open another page where visuals on the same dataset are already filtered to the selected values, and any matching filters are pre-populated.</p></li></ul><p>Combined with the patterns from the <a href="https://www.databricksters.com/p/migrating-existing-dashboards-to">first blog post</a> &#8211; context filters and cascading &#8220;<em>Only Relevant Values</em>&#8221; filters &#8211; you now have a robust toolkit for recreating the interactive filtering experience your users expect from traditional dashboards inside Databricks AI/BI Dashboards.</p><p>The <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a> now includes: a context filter page, a field-based cascading page, a query-based cascading page, a cross-filtering page, and a drill-through details page. You can import it into your workspace and adapt the patterns to your own datasets.</p><p>In the next post in this series, I&#8217;ll tackle the remaining migration question: &#8220;<em>How do we do user-based filtering and row-level security in AI/BI Dashboards?</em>&#8221;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Ingest Postgres into your LakeHouse with LakeFlow]]></title><description><![CDATA[Get the configuration json to customize your pipelines.]]></description><link>https://www.databricksters.com/p/ingest-postgres-into-your-lakehouse</link><guid isPermaLink="false">https://www.databricksters.com/p/ingest-postgres-into-your-lakehouse</guid><dc:creator><![CDATA[Nishant Deshpande]]></dc:creator><pubDate>Tue, 03 Mar 2026 06:27:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zPJJ!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff49ecae-7c56-403c-9389-61b28de6a50f_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Databricks Lakeflow Connect can sync multiple postgres databases to the Lakehouse. This post shows how to set up a single pipeline with multiple source databases and multiple target UC schemas, and specify compute size to minimize costs. </p><p>The first section explains how replication from Postgres works, and how to configure it. The second part show how to configure Lakeflow pipelines in your Databricks workspace to ingest from Postgres.</p><div><hr></div><h2><strong>How PostgreSQL logical replication works</strong></h2><p>PostgreSQL writes every change (insert, update, delete) to the <strong>Write-Ahead Log (WAL)</strong> before applying it. Logical replication decodes that WAL stream into a human-readable format that consumers can subscribe to.</p><p>Three objects are involved:</p><h3><strong>1. WAL level</strong></h3><p>Logical replication requires <code>wal_level = logical</code>. On AWS RDS, set <code>rds.logical_replication = 1</code> in your parameter group (requires a reboot).</p><h3><strong>2. Publication</strong></h3><p>A <strong>publication</strong> is a named filter over which tables to expose for replication. It lives inside a single database.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;3059aeb3-7c20-4360-a422-78c8d94fbc27&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">-- Replicate specific tables
CREATE PUBLICATION mfg_paloalto_cdc_pub FOR TABLE iot1.devices, iot1.alerts;

-- Or replicate all tables in the database
CREATE PUBLICATION mfg_paloalto_cdc_pub FOR ALL TABLES;</code></pre></div><p>One publication per database is typical. If you have multiple PostgreSQL databases on the same server, each needs its own publication.</p><h3><strong>3. Replication slot</strong></h3><p>A <strong>replication slot</strong> is a cursor into the WAL. PostgreSQL retains WAL segments until the slot consumer confirms it has processed them (<code>confirmed_flush_lsn</code>). This guarantees the consumer never misses a change, even if it disconnects temporarily.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;31068f44-7df5-482e-8177-9a6ed914b9f7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT pg_create_logical_replication_slot(&#8217;databricks_mfg_paloalto_slot&#8217;, &#8216;pgoutput&#8217;);</code></pre></div><p>Key facts about slots:</p><ul><li><p><strong>Slots are server-wide</strong> &#8212; the name must be unique across the entire PostgreSQL instance, not just within a database.</p></li><li><p><strong>One slot per consumer</strong> &#8212; a slot can only be consumed by one connection at a time. If a second pipeline tries to use the same slot, it will be rejected.</p></li><li><p><strong>Unacknowledged WAL accumulates</strong> &#8212; if a pipeline is stopped and not consuming, the slot holds back WAL. Monitor <code>pg_replication_slots.lag</code> to avoid disk pressure.</p></li></ul><div><hr></div><h2><strong>Replication user</strong></h2><p>Create a dedicated user with replication privileges and read access to the tables being replicated:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6677e7f2-5b27-4245-afe9-59ccbdc01902&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">CREATE USER databricks_replication WITH PASSWORD &#8216;...&#8217; REPLICATION;

-- On RDS, also grant the rds_replication role
GRANT rds_replication TO databricks_replication;

GRANT CONNECT ON DATABASE mydb TO databricks_replication;
GRANT USAGE ON SCHEMA iot1 TO databricks_replication;
GRANT SELECT ON ALL TABLES IN SCHEMA iot1 TO databricks_replication;</code></pre></div><div><hr></div><h2><strong>Replica identity</strong></h2><p>For CDC updates and deletes to include the old row values (needed to identify which row changed), each table needs a replica identity:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;427b12c5-544b-407d-b649-8e29f2ce3526&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">-- Default: uses the primary key (recommended when a PK exists)
ALTER TABLE iot1.devices REPLICA IDENTITY DEFAULT;

-- Full: includes all columns (required when there is no PK)
ALTER TABLE iot1.devices REPLICA IDENTITY FULL;</code></pre></div><div><hr></div><h2><strong>One slot per consumer, not per catalog</strong></h2><p>The Databricks docs suggest creating one replication slot per source catalog. A more precise rule is: <strong>one replication slot per consumer (ingestion pipeline)</strong>.</p><p>If you want to replicate the same database to two different workspaces, or run multiple pipelines in a test environment, each needs its own slot. Using the same slot across multiple pipelines will cause one of them to lose data.</p><div><hr></div><h2><strong>Multiple databases on the same server</strong></h2><p>PostgreSQL publications are per-database, but replication slots are server-wide. If you have two databases (<code>mfg_paloalto</code> and <code>mfg_austin</code>) on the same server, create two replication slots with different names.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bb44048e-92a8-4b52-bde9-64c9b0165bfc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Server
&#9500;&#9472;&#9472; Database: mfg_paloalto
&#9474;   &#9492;&#9472;&#9472; Publication: mfg_paloalto_cdc_pub        (per-database)
&#9500;&#9472;&#9472; Database: mfg_austin
&#9474;   &#9492;&#9472;&#9472; Publication: mfg_austin_cdc_pub           (per-database)
&#9474;
&#9500;&#9472;&#9472; Slot: databricks_mfg_paloalto_slot             (server-wide, for mfg_paloalto DB)
&#9492;&#9472;&#9472; Slot: databricks_mfg_austin_slot               (server-wide, for mfg_austin DB)</code></pre></div><div><hr></div><h2><strong>Databricks side</strong></h2><p>Three entities need to be created in order. Below shows the json sent using the Databricks CLI.</p><h3><strong>1. Unity Catalog connection</strong></h3><p>The connection is at the <strong>server level</strong> &#8212; no database name. One connection can serve all databases on the same PostgreSQL instance.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;852b7347-c5b2-4757-81e1-e8fc30ed4cf8&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">// connection.json
{
  &#8220;name&#8221;: &#8220;my-postgres-connection&#8221;,
  &#8220;connection_type&#8221;: &#8220;POSTGRESQL&#8221;,
  &#8220;options&#8221;: {
    &#8220;host&#8221;: &#8220;myinstance.abc123.us-east-1.rds.amazonaws.com&#8221;,
    &#8220;port&#8221;: &#8220;5432&#8221;,
    &#8220;user&#8221;: &#8220;databricks_replication&#8221;,
    &#8220;password&#8221;: &#8220;&lt;password&gt;&#8221;
  }
}</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;b8da2a4c-83a7-43a9-a627-bcfa18d6e1ed&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks connections create --json connection.json</code></pre></div><p>A single connection can be used to ingest from multiple databases, as long as the user has permissions.</p><h3><strong>2. Gateway pipeline</strong></h3><p>The gateway pipeline runs <strong>continuously</strong> on a classic cluster. It connects to PostgreSQL, reads the WAL via the replication slot, and buffers raw CDC events into a storage schema in Unity Catalog.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;ebf3cf64-8e7c-4aac-aa98-9a6a12cdbbe5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">// gateway.json
{
  &#8220;name&#8221;: &#8220;my-postgres-gateway&#8221;,
  &#8220;catalog&#8221;: &#8220;my_catalog&#8221;,
  &#8220;schema&#8221;: &#8220;cdc_gateway_storage&#8221;,
  &#8220;channel&#8221;: &#8220;CURRENT&#8221;,
  &#8220;continuous&#8221;: true,
  &#8220;gateway_definition&#8221;: {
    &#8220;connection_name&#8221;: &#8220;my-postgres-connection&#8221;,
    &#8220;gateway_storage_catalog&#8221;: &#8220;my_catalog&#8221;,
    &#8220;gateway_storage_schema&#8221;: &#8220;cdc_gateway_storage&#8221;
  }
  &#8220;clusters&#8221;: [
    {
      &#8220;label&#8221;: &#8220;default&#8221;,
      &#8220;driver_node_type_id&#8221;: &#8220;r5.xlarge&#8221;,
      &#8220;node_type_id&#8221;: &#8220;r5.xlarge&#8221;,
      &#8220;autoscale&#8221;: {
        &#8220;min_workers&#8221;: 1,
        &#8220;max_workers&#8221;: 5
      }        
    }
  ]
}</code></pre></div><p>Setting min_workers = max_workers = 0 will give you a driver-only cluster, if your CDC stream is low volume.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;03cd3ba8-b050-4e82-bee4-85568611f253&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks pipelines create --json gateway.json</code></pre></div><p>Note the <code>pipeline_id</code> returned &#8212; it is needed for the ingestion pipeline.</p><h3><strong>3. Ingestion pipeline</strong></h3><p>The ingestion pipeline reads buffered events from the gateway and writes them as Delta tables. It runs on serverless and is <strong>triggered</strong> (not continuous).</p><p><code>source_configurations</code> maps each source database to its replication slot and publication. (Unfortunately the postgres source database is referred to as <code>catalog.source_catalog</code> which is a little confusing.) <code>objects</code> controls which schemas or tables to replicate and where they land.</p><p>The json below creates a single pipeline from two source schemas in different databases (<code>mfg_paloalto and</code> <code>mfg_austin</code>). Notice how the slot names correspond to the right database.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;8ceef6b3-a122-4fcb-a88d-3e2c758037ba&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">// ingestion.json
{
  &#8220;name&#8221;: &#8220;my-postgres-ingestion&#8221;,
  &#8220;catalog&#8221;: &#8220;my_catalog&#8221;,
  &#8220;schema&#8221;: &#8220;mfg_ingest_metadata&#8221;,
  &#8220;ingestion_definition&#8221;: {
    &#8220;ingestion_gateway_id&#8221;: &#8220;&lt;gateway-pipeline-id&gt;&#8221;,
    &#8220;source_type&#8221;: &#8220;POSTGRESQL&#8221;,
    &#8220;connection_name&#8221;: null,
    &#8220;objects&#8221;: [
      {
        &#8220;schema&#8221;: {
          &#8220;source_catalog&#8221;: &#8220;mfg_paloalto&#8221;,
          &#8220;source_schema&#8221;: &#8220;iot1&#8221;,
          &#8220;destination_catalog&#8221;: &#8220;my_catalog&#8221;,
          &#8220;destination_schema&#8221;: &#8220;mfg_paloalto_iot1&#8221;
        }
      },
      {
        &#8220;schema&#8221;: {
          &#8220;source_catalog&#8221;: &#8220;mfg_austin&#8221;,
          &#8220;source_schema&#8221;: &#8220;iot1&#8221;,
          &#8220;destination_catalog&#8221;: &#8220;my_catalog&#8221;,
          &#8220;destination_schema&#8221;: &#8220;mfg_austin_iot1&#8221;
        }
      }
    ],
    &#8220;source_configurations&#8221;: [
      {
        &#8220;catalog&#8221;: {
          &#8220;source_catalog&#8221;: &#8220;mfg_paloalto&#8221;,
          &#8220;postgres&#8221;: {
            &#8220;slot_config&#8221;: {
              &#8220;slot_name&#8221;: &#8220;databricks_mfg_paloalto_slot&#8221;,
              &#8220;publication_name&#8221;: &#8220;mfg_paloalto_cdc_pub&#8221;
            }
          }
        }
      },
      {
        &#8220;catalog&#8221;: {
          &#8220;source_catalog&#8221;: &#8220;mfg_austin&#8221;,
          &#8220;postgres&#8221;: {
            &#8220;slot_config&#8221;: {
              &#8220;slot_name&#8221;: &#8220;databricks_mfg_austin_slot&#8221;,
              &#8220;publication_name&#8221;: &#8220;mfg_austin_cdc_pub&#8221;
            }
          }
        }
      }
    ]
  }
}</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;90560eec-f3e3-4a6c-93bf-a097991181c0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks pipelines create --json ingestion.json</code></pre></div><p>The top level catalog.schema (<code>my_catalog.mfg_ingest_metadata</code> above) hold event logs and checkpoints for all the pipelines.</p><p>The above create command also returns a pipeline id <code>ingestion-pipeline-id</code>) which can be used to trigger the pipeline.</p><h3><strong>Triggering a run</strong></h3><p>After creation, trigger an initial snapshot + CDC run:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;b0dc0c33-a706-4c1a-9241-1dcd302e01cb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks pipelines start-update &lt;ingestion-pipeline-id&gt;</code></pre></div><p>The gateway pipeline should run continually. The ingestion pipeline can be run at whatever interval you want, including continually.</p><div><hr></div><h2><strong>Summary checklist</strong></h2><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;ed027d86-4255-4919-b6db-7ae3653d4ab0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Summary checklist

Step                    Scope                        Notes
&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;
wal_level = logical     Server                       Requires reboot on RDS
Replication user        Server                       One user can serve all databases
REPLICA IDENTITY        Per table                    Set before creating the publication
Publication             Per database                 One per database is typical
Replication slot        Per database, server-wide    One slot per ingestion pipeline
                        name
Network access          Server                       Allow Databricks cluster egress IP on port 5432</code></pre></div>]]></content:encoded></item><item><title><![CDATA[Ingest to Your Lakehouse Without Kafka or Kinesis | Zerobus]]></title><description><![CDATA[How Databricks Zerobus Replaces Your Message Bus With a Single Endpoint]]></description><link>https://www.databricksters.com/p/zero-infra-no-brokers-how-zerobus</link><guid isPermaLink="false">https://www.databricksters.com/p/zero-infra-no-brokers-how-zerobus</guid><dc:creator><![CDATA[Yashodhan]]></dc:creator><pubDate>Wed, 25 Feb 2026 14:04:16 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/188988485/40da827937828b2f9178293f37637727.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3><strong>Complex Ingestion Architecture</strong></h3><p>Today&#8217;s data teams face a common challenge: streaming data from applications to their Lakehouse requires maintaining complex infrastructure. The typical setup involves managing a message bus like Kafka, configuring connectors, monitoring pipelines, and dealing with significant operational overhead and costs&#8212;all just to move data from point A to point B.</p><h3><strong>Managed Bus Don&#8217;t Solve Everything</strong></h3><p>While <strong>Amazon Managed Streaming for Apache Kafka(Amazon MSK) </strong>removes the burden of managing servers, it doesn&#8217;t eliminate your responsibility for the message bus itself. You&#8217;re still on the hook for capacity planning, topic and partition design, producer and consumer tuning, monitoring and alerting, and upgrade timing. Managed services make these tasks less manual, but upgrades remain risky and the operational complexity persists.</p><p>Cost management is another pain point. <strong>Amazon Managed Streaming for Apache Kafka(Amazon MSK)</strong> bills can balloon quickly due to over-provisioned brokers, excess partitions, high replication factors, and long retention periods. AWS manages the infrastructure, but not your spending discipline&#8212;that&#8217;s still your problem.</p><h3><strong>What You Actually Need</strong></h3><p>A fully abstracted streaming service that eliminates cluster management entirely, letting you focus on building data products instead of babysitting message bus infrastructure.</p><h3><strong>The Challenge</strong></h3><p>A rapidly growing organization was processing massive device data volumes from Go applications. After essential first-level processing, they needed to stream data to their data lake for near real-time analytics. To avoid the complexity of managing Kafka or similar message bus infrastructure, they took a shortcut: direct writes to their data warehouse with append-only inserts.</p><p>Initially simple, this approach quickly hit walls. As volumes grew, they vertically scaled, then horizontally distributed producers across multiple warehouse instances. Small but relentless queries created network bottlenecks. Excessive delta commits from numerous producers killed throughput. They hit soft limits on connections and write operations. To keep data flowing, they over-provisioned compute&#8212;watching costs balloon without proportional gains.</p><h3><strong>The Solution</strong></h3><p>Zerobus Ingest provided the purpose-built ingestion layer they needed. Their Go applications integrated the SDK with minimal code changes&#8212;same append-only pattern, properly architected. The Write-Ahead Logging (WAL) based system handled buffering and batching, while automatic recovery managed network issues that previously caused data loss.</p><p>Data now lands directly in Delta tables, eliminating the warehouse intermediary. The result: lower latency, higher throughput, dramatically reduced costs, and one less system to manage. They got the simplicity of direct writes with the scalability of proper streaming&#8212;at a fraction of Kafka&#8217;s cost.</p><div class="pullquote"><h2><em><strong>Zerobus Ingest</strong></em></h2></div><h3><strong>Overview</strong></h3><p>Zerobus Ingest is a fully managed, zero-configuration service that enables record-by-record data ingestion directly into Delta tables. No more intermediate message buses. No more complex configurations. Just point your application at an endpoint and start sending data. The Zerobus Ingest API buffers transmitted data before adding it to a Delta table. This buffering creates an efficient and durable ingestion mechanism that supports a high volume of clients with variable throughput.</p><p><em>Before:</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eimt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eimt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 424w, https://substackcdn.com/image/fetch/$s_!eimt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 848w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1272w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eimt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png" width="1452" height="552" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0668090b-f758-4033-855b-49012749eceb_1452x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:1452,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:190418,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/182907350?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!eimt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 424w, https://substackcdn.com/image/fetch/$s_!eimt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 848w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1272w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>After:</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!riic!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!riic!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 424w, https://substackcdn.com/image/fetch/$s_!riic!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 848w, https://substackcdn.com/image/fetch/$s_!riic!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1272w, https://substackcdn.com/image/fetch/$s_!riic!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!riic!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png" width="1450" height="554" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:554,&quot;width&quot;:1450,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166939,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/182907350?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!riic!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 424w, https://substackcdn.com/image/fetch/$s_!riic!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 848w, https://substackcdn.com/image/fetch/$s_!riic!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1272w, https://substackcdn.com/image/fetch/$s_!riic!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>Features</strong></h3><p>Zerobus Ingest leverages a Write Ahead Log (WAL) architecture that enables it to store and acknowledge accepted records quickly, delivering low write latency for your applications. The system is backed by persistent disk storage where both the Write Ahead Log (WAL) and checkpoints are maintained, enabling several powerful capabilities:</p><p><strong>Automatic Recovery</strong> - Network issues are handled transparently by the SDK. It automatically reconnects on transient failures and resends unacknowledged records without requiring any application-level error handling code.</p><p><strong>Efficient Resource Management</strong> - Once data syncs successfully to Delta tables, Zerobus Ingest automatically cleans up Write Ahead Log (WAL) logs and metadata, freeing disk space for new data without manual intervention.</p><p><strong>Schema Management</strong> - Automatic validation against your Delta table schema catches data quality issues at ingestion time, preventing malformed data from entering your Lakehouse.</p><div class="pullquote"><h2><strong>Usage</strong></h2><h6><em><strong>Implement Zerobus</strong></em></h6></div><h3><strong>SDKs</strong></h3><p>Users will interact with Zerobus Ingest through a dedicated SDK for their language of choice. The documentation and samples are out for <a href="https://github.com/databricks/zerobus-sdk-py">Python SDK</a>, <a href="https://github.com/databricks/zerobus-sdk-rs">Rust SDK</a> and <a href="https://github.com/databricks/zerobus-sdk-java">Java SDK </a>. Both the <a href="https://github.com/databricks/zerobus-sdk-go">Go</a> and <a href="https://www.npmjs.com/package/@databricks/zerobus-ingest-sdk">TypeScript</a> SDKs for Zerobus Ingest are now publicly available. GRPC is the main communication mechanism for Zerobus Ingest.</p><p>Databricks documentation contains a <a href="https://docs.databricks.com/aws/en/ingestion/zerobus-ingest">well documented guide</a> with sample clients in multiple languages. It guides you right from installing the SDK in your preferred language to creating a Protobuf definition and a sample usage.</p><h3><strong>Supported Formats</strong></h3><ul><li><p><strong>Protocols</strong>: gRPC (primary), HTTP REST, Kafka wire format (coming soon)</p></li><li><p><strong>Data Formats</strong>: Protocol Buffers, JSON</p></li></ul><h3><strong>TIPs</strong></h3><ul><li><p>Visit the table history on UC to get a sense of how frequently the table is updated</p></li><li><p>Handle the two exceptions gracefully <em><strong>NonRetriableException, ZerobusException</strong></em>.</p></li><li><p>Even though Zerobus Ingest periodically issues data file compactions, so you don&#8217;t need to worry about the small files</p></li><li><p>Don&#8217;t forget to create a table with appropriate data types before you run the client</p></li></ul><div class="pullquote"><h2><strong>Zerobus Ingest Deep Dive</strong></h2><h6><em><strong>While the experience is simple, the engineering is sophisticated!</strong></em></h6></div><h3><strong>Components</strong></h3><ol><li><p><strong>Zerobus Ingest Server - </strong>Think of them as scalable stateful pod on K8s attached with an SSD disk(high IOPS). Its responsibilities include:</p><ul><li><p>Schema validation of the message to the table.</p></li><li><p>Materializing the data in a timely manner to the target table.</p></li><li><p>Sending an acknowledgement to the client that the data is durable.</p></li></ul></li><li><p><strong>Smart Networking and orchestration - </strong>API proxy which distributes the streams to Zerobus Ingest servers per the target delta table and scales pods as the utilization nears the roof</p></li><li><p><strong>Delta kernel -</strong> Record batch writer kicks off every 1-5 seconds, uses Delta kernel(uses Arrow) and writes the record batch to the delta table. Kicks in the PO compaction to avoid small files. <a href="https://github.com/delta-io/delta-kernel-rs">Rust APIs</a> hides all the complex details of the Delta protocol specification. Binding available for <a href="https://github.com/delta-io/delta-rs">python</a>.</p></li><li><p><strong>Write-Ahead Log(WAL) -</strong> Records are immediately persisted to durable storage(think SSD disks with high IOPS) provided by the cloud platform your databricks is running on and is acknowledged in under 50ms. This guarantees durability even if something fails</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uH8W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uH8W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 424w, https://substackcdn.com/image/fetch/$s_!uH8W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 848w, https://substackcdn.com/image/fetch/$s_!uH8W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!uH8W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uH8W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png" width="1456" height="658" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:658,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:743931,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/188988485?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uH8W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 424w, https://substackcdn.com/image/fetch/$s_!uH8W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 848w, https://substackcdn.com/image/fetch/$s_!uH8W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!uH8W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe158800b-6d97-4644-b4c8-0b80360f08d0_2886x1304.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Zerobus Ingest fits direct lakehouse writes with durable acknowledgments (no bus-style retention/multi-consumer). Zerobus Ingest it not a replacement of message bus in all scenarios. If you need message bus durability/retention or multiple subscribers, Event Hubs/Kafka is likely a safer choice.</p><h3><strong>Availability</strong></h3><p>Databricks only support single availability zone (single AZ) durability. This means Zerobus Ingest service may experience downtime. This might change soon.</p><h3><strong>Kafka still wins when</strong></h3><p>Despite the cost advantages of Zerobus Ingest Ingest, Kafka remains a better choice in following scenarios:</p><p><strong>Exactly-once semantics requirements</strong> - For financial transactions, order processing, or other workflows where duplicate processing could cause serious issues, Kafka&#8217;s exactly-once delivery guarantees are critical. While Zerobus Ingest roadmap includes this feature, organizations that need it today must still rely on Kafka.</p><p><strong>Ultra-low latency fan-out</strong> - If your use case requires multiple consumers reading the same stream with different processing logic, Kafka&#8217;s pub-sub model excels. Zerobus Ingest currently lacks the subscriber/consumer model that makes Kafka so powerful for fan-out patterns where one stream feeds multiple downstream applications.</p><h3><strong>Other Limitations</strong></h3><p>As of writing,</p><ul><li><p>Zerobus Ingest provides <strong>at-least-once delivery semantics</strong>, meaning each message will be delivered one or more times. It does not yet support <strong>exactly-once</strong> semantics. However, the duplicates can be handled using other Databricks and delta features.</p></li><li><p>Zerobus Ingest currently supports <strong>writing only to managed</strong> Delta tables</p></li><li><p><strong>Schema evolution</strong> on target tables is <strong>not yet supported</strong> in Zerobus Ingest, so the table schema must match the incoming message structure.</p></li><li><p>Each individual message is limited to a <strong>maximum size of 10 MB</strong> when processed through Zerobus Ingest.</p></li></ul><h3><strong>What&#8217;s Next</strong></h3><p>Databricks is actively enhancing Zerobus Ingest with several key features in development. The roadmap includes <strong>exactly-once delivery semantics</strong> for stronger consistency guarantees, <strong>MQTT</strong> protocol support to broaden IoT and device connectivity options, comprehensive <strong>CDC pipeline capabilities</strong> that will handle updates and deletes in addition to inserts and <strong>subscriber/consumer</strong> model to enable more flexible data consumption patterns.</p><p>Enjoy streaming in a cost efficient and simplified manner!</p><div class="pullquote"><h2><strong>Conclusion</strong></h2></div><p>Zerobus Ingest offers a compelling alternative to message bus in a lot of scenarios. While Kafka remains essential for complex streaming architectures, Zerobus Ingest closes the gap for straightforward ingestion use cases&#8212;delivering the reliability you need at a fraction of the cost and complexity.</p><p>The cost savings extend beyond infrastructure. Kafka expertise commands premium salaries, and maintaining distributed message bus systems requires dedicated engineering time that could be spent on higher-value work. Zerobus Ingest&#8217;s simplicity means junior engineers can manage what previously required highly skilled distributed systems expertise. When you factor in reduced operational overhead, lower training costs, and faster time-to-production, the economics become even more compelling. Sometimes the best architecture isn&#8217;t the most sophisticated&#8212;it&#8217;s the one that solves your problem.</p><h2>FAQ</h2><h5><strong>What is Zerobus Ingest?</strong></h5><p>A fully managed ingestion service for sending application data directly into Delta tables without operating a separate message bus.</p><h5><strong>When is Zerobus Ingest a good fit?</strong></h5><p>It works best for high-volume, append-only ingestion where the primary goal is to land data in the Lakehouse quickly and simply.</p><h5><strong>Does Zerobus Ingest replace a message bus for every use case?</strong></h5><p>No. If you need long retention, multiple downstream subscribers, or mature exactly-once semantics today, a message bus is still the better fit.</p><h5><strong>What delivery semantics does Zerobus Ingest provide today?</strong></h5><p>It currently provides <strong>at-least-once</strong> delivery, so downstream designs should account for possible duplicates.</p><h5><strong>What tables and formats are supported?</strong></h5><p>The draft states that Zerobus Ingest currently writes only to <strong>managed Delta tables</strong> and supports <strong>Protocol Buffers</strong> and <strong>JSON</strong>.</p><h5><strong>What are the main implementation requirements?</strong></h5><p>You need to create the target table in advance, align the incoming schema with the table schema, and handle SDK exceptions appropri</p><p></p><h2>Reference</h2><p><a href="https://www.databricks.com/blog/announcing-general-availability-zerobus-ingest-part-lakeflow-connect">Stream 10+ GB/sec to your lakehouse in under 5 seconds with zero infrastructure overhead</a></p><p><a href="https://community.databricks.com/t5/technical-blog/deep-dive-on-zerobus-ingest-now-ga/ba-p/148385">Deep dive on Zerobus Ingest, now GA</a></p><h2></h2>]]></content:encoded></item><item><title><![CDATA[Observability for Any Agent, Anywhere: Production-Ready Tracing with MLflow & OpenTelemetry on Databricks]]></title><description><![CDATA[MLflow OpenTelemetry traces in Unity Catalog create a continuous improvement flywheel for AI agents through analytics, evals, and monitoring.]]></description><link>https://www.databricksters.com/p/observability-for-any-agent-anywhere</link><guid isPermaLink="false">https://www.databricksters.com/p/observability-for-any-agent-anywhere</guid><dc:creator><![CDATA[Anoop Sunke]]></dc:creator><pubDate>Fri, 20 Feb 2026 16:03:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d936aaad-6c40-4be8-8c7a-0e4ae848188d_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Executive Summary</h2><ul><li><p><strong>The Problem:</strong> AI agents generate massive volumes of trace data, but traditional observability tools make that data expensive to retain, difficult to govern, and hard to use in evaluation and analytics workflows.</p></li><li><p><strong>The Solution:</strong> MLflow now supports writing OpenTelemetry (OTEL) traces directly to Unity Catalog tables via a fully managed, serverless ingestion path.</p></li><li><p><strong>The Benefit: </strong>By landing traces directly in the Lakehouse, teams get governed, analytics-ready observability data with long-term retention, unified evaluation and monitoring workflows, and no OTEL infrastructure to operate.</p></li><li><p><strong>The Outcome: </strong>Production traces become immediately usable for analysis and evaluation, enabling faster iteration loops between real-world usage, model evaluation, and continuous improvement.</p></li></ul><h2>Why AI Tracing Breaks Traditional Observability</h2><p>As AI applications move into production, traces become one of the clearest ways to understand how agents actually behave by capturing prompts, tool calls, responses, latency, and execution paths. Without strong tracing, it&#8217;s hard to understand why agents behave the way they do, making debugging, evaluation, and governance much more difficult.</p><p>The challenge isn&#8217;t that observability platforms can&#8217;t ingest this data. It&#8217;s that AI traces quickly become valuable beyond debugging. Teams want to retain them longer, analyze them with SQL, join them with business and model data, and reuse them for evaluation and monitoring. When traces live only inside observability systems, that flexibility is limited, governance becomes fragmented, and moving data into analytics workflows often requires extra pipelines and duplication, especially when sensitive prompt data is involved.</p><h2>MLflow and OTEL Trace Ingestion</h2><p>Databricks now <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/trace-unity-catalog">supports</a> writing MLflow traces directly to Unity Catalog using the OpenTelemetry (OTEL) format. In practice, this means traces can be ingested in real time and stored in Delta tables, where they benefit from the same scalability, governance, and tooling as the rest of your data.</p><p>This changes how teams can use trace data:</p><ul><li><p><strong>Real-time ingestion with practical retention:</strong> Traces can be written as they&#8217;re generated at high throughput (GBs/sec) and retained long-term without the cost pressure typically associated with observability platforms.</p></li><li><p><strong>Analyze and govern using the Lakehouse:</strong> Once traces are tables, you can treat them like any other dataset: query them with SQL, build dashboards, run ETL pipelines, use tools like <a href="https://docs.databricks.com/aws/en/genie/">Genie</a>, and apply governance controls such as PII masking.</p></li><li><p><strong>Use the full MLflow evaluation stack:</strong> Persisting traces in Unity Catalog removes typical experiment constraints (such as <a href="https://docs.databricks.com/aws/en/resources/limits">trace caps</a>), making it easier to run large offline evaluations, monitor production systems, and continuously improve quality as workloads grow.</p></li></ul><h3>The Engineering trade-off: SaaS vs. Lakehouse</h3><p>So why not rely entirely on a SaaS observability tool?</p><ol><li><p><strong>Retention economics: </strong>Agents generate massive text payloads. Storing this data in Delta Lake on object storage is often significantly more cost-effective than SaaS-based retention models.</p></li><li><p><strong>The PII deadlock: </strong>Sending raw prompts to third-party platforms can create InfoSec friction. Keeping traces inside Unity Catalog helps maintain data sovereignty and simplifies governance.</p></li><li><p><strong>Analytics, not just telemetry:</strong> SaaS tools are strong for operational metrics like latency, but the Lakehouse gives you something different: an analytics and AI engine. You can join traces with business data &#8212; revenue, conversions, customer outcomes &#8212; to understand real impact, not just system health. Furthermore, the Lakehouse enables you to apply AI directly to your traces, allowing for advanced use cases like classifying user interactions as &#8216;good&#8217; or &#8216;bad,&#8217; and building evaluation frameworks to continuously improve system quality.</p></li></ol><h2>Architecture: Serverless OpenTelemetry Ingestion</h2><p>MLflow tracing can use the OpenTelemetry (OTEL) standard, which separates instrumentation from storage. In a typical OTEL deployment, teams are responsible for running collector fleets, scaling agents, handling backpressure, and managing reliability.</p><p>Databricks removes that operational layer by providing a managed OpenTelemetry endpoint, transparently powered by <a href="https://docs.databricks.com/aws/en/ingestion/zerobus-overview">Zerobus</a>. Zerobus is a serverless ingestion engine that enables applications to stream data directly into Delta tables using a gRPC API. Applications can easily export spans, logs, and metrics from <strong>any OTEL-compatible client</strong> directly to Unity Catalog tables, where the data is stored in Delta format.  Zerobus acts as the telemetry pipeline, handling ingestion and durability so teams don&#8217;t have to operate their own collectors.</p><p>From there, traces become first-class data in the Lakehouse, powering MLflow evaluations and monitoring, ad-hoc SQL analysis, dashboards, and downstream analytics. This creates a continuous improvement <strong>flywheel</strong> where production behavior feeds evaluation and analysis, which in turn drives faster iteration and better agent performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AIlN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AIlN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 424w, https://substackcdn.com/image/fetch/$s_!AIlN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 848w, https://substackcdn.com/image/fetch/$s_!AIlN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 1272w, https://substackcdn.com/image/fetch/$s_!AIlN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AIlN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png" width="1456" height="777" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:777,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3766120,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/188328490?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AIlN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 424w, https://substackcdn.com/image/fetch/$s_!AIlN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 848w, https://substackcdn.com/image/fetch/$s_!AIlN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 1272w, https://substackcdn.com/image/fetch/$s_!AIlN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feba10508-cf89-4511-8266-a232bac5f7e3_1920x1025.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>Tutorial: Wiring Traces into the Lakehouse</strong></h2><h3>Sample agent: Support manager assistant</h3><p>For this blog, we&#8217;ll create a simple support manager assistant that we can use to demonstrate tracing end-to-end. The agent can be deployed outside of Databricks, as we&#8217;ve done here, highlighting that trace ingestion is decoupled from where the agent runs.</p><p>We built a LangGraph agent powered by a <a href="https://docs.databricks.com/aws/en/machine-learning/foundation-model-apis/supported-models#-anthropic-claude-sonnet-4">Databricks-hosted Claude Sonnet 4 model</a> for reasoning and response generation. The agent calls a Genie Space as a tool, which you can deploy <a href="https://www.databricks.com/resources/demos/tutorials/aibi-customer-support-review-dashboards-and-genie?itm_data=demo_center&amp;itm_source=www&amp;itm_category=resources&amp;itm_page=tutorials&amp;itm_location=Data%20Warehouse%20and%20BI&amp;itm_component=card&amp;itm_offer=aibi-customer-support-review-dashboards-and-genie">here</a>.</p><p>When a user asks a data-driven question, the agent invokes Genie through the MCP tool API. Genie translates the request into SQL, executes it against the support dataset, and returns the result. The agent then summarizes the findings and provides actionable takeaways for a support manager.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ye7I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ye7I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 424w, https://substackcdn.com/image/fetch/$s_!ye7I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 848w, https://substackcdn.com/image/fetch/$s_!ye7I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 1272w, https://substackcdn.com/image/fetch/$s_!ye7I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ye7I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png" width="667" height="111" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:111,&quot;width&quot;:667,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14975,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/188328490?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ye7I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 424w, https://substackcdn.com/image/fetch/$s_!ye7I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 848w, https://substackcdn.com/image/fetch/$s_!ye7I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 1272w, https://substackcdn.com/image/fetch/$s_!ye7I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99fd4f6b-3002-49d2-bd64-dba9754731fc_667x111.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3>Setting up MLflow tracing with UC</h3><p>Before instrumenting the agent, we first configure MLflow to store traces in Unity Catalog. This involves creating the underlying OpenTelemetry tables and linking them to an MLflow experiment so traces can be searched, analyzed, and annotated from the UI. Start by identifying (or creating) a SQL warehouse and an MLflow experiment, then use the MLflow Python library to create the Unity Catalog tables and link the schema to the experiment. For full steps, follow the docs <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/trace-unity-catalog">here</a>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;db906a00-42d5-4fd1-addd-154efbb0f3dd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">import os
import mlflow
from mlflow.entities import UCSchemaLocation
from mlflow.tracing.enablement import set_experiment_trace_location

mlflow.set_tracking_uri("databricks")

os.environ["MLFLOW_TRACING_SQL_WAREHOUSE_ID"] = "&lt;warehouse-id&gt;"

experiment_name = "&lt;experiment-name&gt;"
catalog_name = "&lt;catalog&gt;"
schema_name = "&lt;schema&gt;"

experiment_id = mlflow.create_experiment(name=experiment_name)

set_experiment_trace_location(
    location=UCSchemaLocation(
        catalog_name=catalog_name,
        schema_name=schema_name,
    ),
    experiment_id=experiment_id,
)</code></pre></div><p>This setup creates Unity Catalog tables for spans, logs, and metrics. Once traces begin flowing, the MLflow service also creates Databricks views that transform the underlying OpenTelemetry data into an MLflow-friendly format for easier querying and analysis. These include:</p><ul><li><p><strong>mlflow_experiment_trace_otel_spans</strong>: detailed execution steps for each request</p></li><li><p><strong>mlflow_experiment_trace_otel_logs</strong>: structured events such as metadata, tags, and assessments</p></li><li><p><strong>mlflow_experiment_trace_otel_metrics</strong>: numerical telemetry captured during execution</p></li><li><p><strong>mlflow_experiment_trace_metadata</strong>: MLflow tags, metadata, and assessments grouped by trace ID</p></li><li><p><strong>mlflow_experiment_trace_unified</strong>: a consolidated view that assembles all trace data into a single record per trace. For better performance at scale, consider converting it to a materialized view with incremental refresh.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0mPM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0mPM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 424w, https://substackcdn.com/image/fetch/$s_!0mPM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 848w, https://substackcdn.com/image/fetch/$s_!0mPM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 1272w, https://substackcdn.com/image/fetch/$s_!0mPM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0mPM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png" width="790" height="276" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bd12fad-1472-4d95-86d6-af315e542030_790x276.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:276,&quot;width&quot;:790,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0mPM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 424w, https://substackcdn.com/image/fetch/$s_!0mPM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 848w, https://substackcdn.com/image/fetch/$s_!0mPM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 1272w, https://substackcdn.com/image/fetch/$s_!0mPM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd12fad-1472-4d95-86d6-af315e542030_790x276.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>After configuring the trace destination, agent instrumentation remains the same. You can do automatic and/or manual tracing as described <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/app-instrumentation/">here</a>. In our example, we rely on <code>mlflow.langchain.autolog()</code> to capture the detailed LangGraph execution (model calls and tool calls). We also wrap the entrypoint with <code>@mlflow.trace</code> to establish a request-level root span, allowing each invocation to be observed as a single end-to-end execution.</p><h3>Inspecting a sample trace</h3><p>Now that the agent is instrumented and traces are flowing into Unity Catalog, let&#8217;s look at a real execution.</p><p>For this example, we asked the Support Manager Assistant:</p><blockquote><p>&#8220;Which support engineer should I put up for promotion?&#8221;</p></blockquote><p>The agent evaluated the request, called the Genie space multiple times to gather supporting data, and returned a recommendation based on performance metrics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AcEM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AcEM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 424w, https://substackcdn.com/image/fetch/$s_!AcEM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 848w, https://substackcdn.com/image/fetch/$s_!AcEM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 1272w, https://substackcdn.com/image/fetch/$s_!AcEM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AcEM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png" width="1210" height="560" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:560,&quot;width&quot;:1210,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AcEM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 424w, https://substackcdn.com/image/fetch/$s_!AcEM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 848w, https://substackcdn.com/image/fetch/$s_!AcEM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 1272w, https://substackcdn.com/image/fetch/$s_!AcEM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a476c30-e51f-4298-977e-ad5a85543aa4_1210x560.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While the response looks straightforward, the trace reveals the underlying execution path that produced it. In the MLflow experiment, we can see each of the tool calls as well as the reasoning logic of our claude sonnet model. We can see that it called the genie space tool three times before putting together a final answer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6G7s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6G7s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 424w, https://substackcdn.com/image/fetch/$s_!6G7s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 848w, https://substackcdn.com/image/fetch/$s_!6G7s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 1272w, https://substackcdn.com/image/fetch/$s_!6G7s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6G7s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png" width="488" height="623" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:623,&quot;width&quot;:488,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6G7s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 424w, https://substackcdn.com/image/fetch/$s_!6G7s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 848w, https://substackcdn.com/image/fetch/$s_!6G7s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 1272w, https://substackcdn.com/image/fetch/$s_!6G7s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d552252-5c34-46b2-a430-cc0f8c7b7504_488x623.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We can click through each of the individual steps to study the inputs and outputs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ejSG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ejSG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 424w, https://substackcdn.com/image/fetch/$s_!ejSG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 848w, https://substackcdn.com/image/fetch/$s_!ejSG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 1272w, https://substackcdn.com/image/fetch/$s_!ejSG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ejSG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png" width="1056" height="581" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:581,&quot;width&quot;:1056,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ejSG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 424w, https://substackcdn.com/image/fetch/$s_!ejSG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 848w, https://substackcdn.com/image/fetch/$s_!ejSG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 1272w, https://substackcdn.com/image/fetch/$s_!ejSG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a107d80-9a43-49d6-8042-c0bc5fc89184_1056x581.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Because traces are stored as Delta tables, they can be queried like any other dataset. We can start with the <code>mlflow_experiment_trace_unified</code> view, where we will find a record that includes the request, response, trace metadata, and an array of the spans.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mhqL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mhqL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 424w, https://substackcdn.com/image/fetch/$s_!mhqL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 848w, https://substackcdn.com/image/fetch/$s_!mhqL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 1272w, https://substackcdn.com/image/fetch/$s_!mhqL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mhqL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png" width="779" height="438" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:438,&quot;width&quot;:779,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/188328490?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mhqL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 424w, https://substackcdn.com/image/fetch/$s_!mhqL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 848w, https://substackcdn.com/image/fetch/$s_!mhqL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 1272w, https://substackcdn.com/image/fetch/$s_!mhqL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151eb393-06db-488f-9f9f-f8552c1dc125_779x438.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Beyond Debugging: Analytics on Trace Data</h2><p>Now that traces are stored in Unity Catalog, they become immediately available for both batch and streaming analytics.</p><h3>Governance in Unity Catalog</h3><p>Prompts and responses, however, often contain sensitive information, so treating trace data as governed data is critical. By storing it in Unity Catalog, traces inherit fine-grained access controls, from catalog and schema permissions to column masking and row-level filtering,  enabling secure, production-ready analytics without limiting flexibility.</p><p>Once access is established, teams can securely run ad-hoc analytics by querying the underlying tables and views with SQL, as we did above. We can also build ETL pipelines, in addition to dashboards and genie spaces, for actionable business insights.</p><h3>Dashboards</h3><p>One of the most powerful aspects of having traces in Unity Catalog is that we aren&#8217;t locked into a vendor&#8217;s rigid, pre-canned views. Because the traces are in Delta tables, we can build custom dashboards that reflect our specific business logic, not just generic system health.</p><p>Using AI/BI Dashboards, we built an<strong> <a href="https://github.com/brunohub/mlflow-traces-observability/tree/main">AI Operations Center</a> </strong>that sits directly on top of our trace tables. This dashboard provides a unified view of our application performance, costs, and reliability. Instead of learning a proprietary query language, we just wrote standard SQL (with the help of <a href="https://www.databricks.com/blog/introducing-databricks-assistant-data-science-agent">AI</a>) to extract exactly what we needed.</p><p>Here are some key capabilities this unlocked:</p><p><strong>Custom Cost &amp; Token Analysis</strong> <br>Generic &#8220;cost&#8221; metrics are rarely accurate because every team negotiates different rates or uses fine-tuned models with unique pricing. Since we control the SQL, we embedded our specific pricing logic directly into the query. Our dashboard tracks token usage by model type (e.g., GPT-4o vs. Claude 4 Sonnet) and applies our contract-specific rates to calculate a precise <strong>Estimated Cost per Trace</strong>. This lets us spot expensive outliers immediately&#8212;like a single complex query that costs $0.50 due to a retrieval loop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!czXE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!czXE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 424w, https://substackcdn.com/image/fetch/$s_!czXE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 848w, https://substackcdn.com/image/fetch/$s_!czXE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 1272w, https://substackcdn.com/image/fetch/$s_!czXE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!czXE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png" width="1041" height="708" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:708,&quot;width&quot;:1041,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!czXE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 424w, https://substackcdn.com/image/fetch/$s_!czXE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 848w, https://substackcdn.com/image/fetch/$s_!czXE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 1272w, https://substackcdn.com/image/fetch/$s_!czXE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004f1220-a2ad-48cd-97a9-c8e627211d33_1041x708.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Component-Level Performance</strong></p><p>High-level latency metrics often hide the real culprit. Is the bottleneck the LLM or is it the Genie space retrieval? We built a <strong>&#8220;Tool Performance&#8221;</strong> widget that breaks down latency (P50, P99) and error rates for every individual tool in our agent (e.g., retrieve_docs vs. generate_response). This allows us to pinpoint exactly which step in the chain is degrading the user experience.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lJfx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lJfx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 424w, https://substackcdn.com/image/fetch/$s_!lJfx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 848w, https://substackcdn.com/image/fetch/$s_!lJfx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 1272w, https://substackcdn.com/image/fetch/$s_!lJfx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lJfx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png" width="1310" height="718" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1310,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lJfx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 424w, https://substackcdn.com/image/fetch/$s_!lJfx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 848w, https://substackcdn.com/image/fetch/$s_!lJfx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 1272w, https://substackcdn.com/image/fetch/$s_!lJfx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89f860e1-86b0-4285-8c96-cfc77d290f24_1310x718.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Genie spaces</h3><p>Both business and technical stakeholders often want to explore agent behavior without writing SQL. By exposing trace tables through Genie, teams can enable natural-language analysis over their telemetry data, allowing users to ask questions about performance, tool usage, latency, and model behavior directly. In our example, this could include questions such as:</p><ul><li><p>What types of requests require escalation?</p></li><li><p>Are tool retries increasing?</p></li><li><p>Which queries trigger the most complex execution paths?</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xlNf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xlNf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 424w, https://substackcdn.com/image/fetch/$s_!xlNf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 848w, https://substackcdn.com/image/fetch/$s_!xlNf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 1272w, https://substackcdn.com/image/fetch/$s_!xlNf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xlNf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png" width="920" height="437" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:437,&quot;width&quot;:920,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xlNf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 424w, https://substackcdn.com/image/fetch/$s_!xlNf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 848w, https://substackcdn.com/image/fetch/$s_!xlNf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 1272w, https://substackcdn.com/image/fetch/$s_!xlNf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d855164-9878-4f7a-90c5-3540ab887ef9_920x437.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>ETL pipelines</h3><p>Because traces are stored as Delta tables, they can feed downstream ETL pipelines just like any other dataset. By enabling <a href="https://docs.databricks.com/aws/en/delta/delta-change-data-feed">Change Data Feed (CDF)</a>, teams can process trace data incrementally, either in batch or streaming, without repeatedly scanning entire tables.</p><p>This makes it possible to operationalize observability. For example, a pipeline could monitor trace patterns and trigger alerts when latency exceeds defined thresholds, tool failures spike, or token usage deviates from expected baselines. These signals can then feed dashboards, notification systems, or automated remediation workflows.</p><p>Importantly, this complements real-time protections such as <a href="https://docs.databricks.com/aws/en/ai-gateway/overview-serving-endpoints#ai-guardrails">AI Guardrails</a>. While guardrails enforce policy at request time, ETL pipelines create a feedback loop, helping teams analyze trends, refine policies, and continuously improve agent performance.</p><p></p><h2>Closing the Loop: From Production Traces to Evaluation</h2><p>Once traces are available, they can power the full MLflow 3 <a href="https://docs.databricks.com/aws/en/mlflow3/genai/eval-monitor/">evaluation stack</a>, enabling teams to measure, improve, and maintain the quality of their AI applications across the entire lifecycle. Evaluation and monitoring build directly on tracing, allowing the same telemetry captured during development, testing, and production to be scored using LLM judges and custom metrics.</p><h3>Evaluate during development using AI Judges</h3><p>MLflow allows us to run evaluations against an evaluation dataset, applying built-in or custom judges to score response quality. One effective approach is to bootstrap this dataset from real traces. Because these prompts originate from actual user interactions, they better represent the scenarios your agent must handle compared to synthetic test cases.</p><p>Below, we create an evaluation dataset from recently captured traces. MLflow uses a SQL warehouse to search and materialize dataset records, so be sure to configure the warehouse ID in your environment.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;bc030680-8dd0-46fd-8220-29d726db3488&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import os
import mlflow
import mlflow.genai.datasets
import time

# Required for dataset operations
os.environ["MLFLOW_TRACING_SQL_WAREHOUSE_ID"] = MLFLOW_TRACING_SQL_WAREHOUSE_ID

DATASET_NAME = f"{CATALOG_NAME}.{SCHEMA_NAME}.support_management_chatbot_traces"

# Create (or load) the dataset
try:
    eval_dataset = mlflow.genai.datasets.create_dataset(name=DATASET_NAME)
except Exception:
    eval_dataset = mlflow.genai.get_dataset(name=DATASET_NAME)

# Pull recent traces (example - from yesterday)
yesterday = int((time.time() - 60 * 60 * 24) * 1000)

traces_df = mlflow.search_traces(
    filter_string=f"attributes.timestamp_ms &gt; {yesterday}",
    order_by=["attributes.timestamp_ms DESC"],
)

# Merge traces into the dataset
eval_dataset = eval_dataset.merge_records(traces_df[["inputs"]])</code></pre></div><p>With the dataset in place, we can define the judges that will score our application. MLflow provides a set of built-in judges, and also allows us to define custom guidelines tailored to our agent&#8217;s expected behavior.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;542a1e40-6a26-40e3-a001-a638c4f625fc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">from mlflow.genai.scorers import RelevanceToQuery, Safety, Guidelines

# Define judges
agent_judges = [
    RelevanceToQuery(),
    Guidelines(
        name="analytical_correctness",
        guidelines="The response must correctly interpret the data and avoid unsupported conclusions.",
    ),
    Guidelines(
        name="actionable_support_insights",
        guidelines="The response must provide at least one concrete, data-backed recommendation.",
    ),
    Guidelines(
        name="performance_management",
        guidelines="The response should not recommend admonishing or firing employees.",
    ),
    Safety(),
]

# Run evaluation
eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=predict_fn,
    scorers=agent_judges,
)

eval_results</code></pre></div><p>And we can now see the results in the MLflow experiment.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JMne!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JMne!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 424w, https://substackcdn.com/image/fetch/$s_!JMne!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 848w, https://substackcdn.com/image/fetch/$s_!JMne!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 1272w, https://substackcdn.com/image/fetch/$s_!JMne!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JMne!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png" width="1332" height="319" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:319,&quot;width&quot;:1332,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69873,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/188328490?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JMne!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 424w, https://substackcdn.com/image/fetch/$s_!JMne!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 848w, https://substackcdn.com/image/fetch/$s_!JMne!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 1272w, https://substackcdn.com/image/fetch/$s_!JMne!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855155-6d0d-4355-91aa-bddada2a1bdc_1332x319.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3>Production monitoring</h3><p>Development evaluations help us validate behavior before release, but production monitoring shows us how the application performs with real users. MLflow can automatically evaluate live traces using the same judges, helping us quickly detect regressions, drift, and emerging failure patterns. This turns evaluation from a one-time task into an ongoing practice as the application evolves.</p><p></p><h2>Frequently Asked Questions (FAQ)</h2><ul><li><p><strong>Can I use this for agents running outside of Databricks?</strong></p><p>Yes, the agent can be running anywhere. In fact the support assistant agent example that was used for this blog is deployed locally.</p></li><li><p><strong>What are the throughput and storage limits of this solution?</strong></p><p>The ingestion throughput <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/trace-unity-catalog#-limitations">limit is 200 QPS</a> today. There is no limit on storage. Previous limits on traces per experiment are no longer applicable. If you need higher throughput limits, please reach out to your Databricks account team.</p></li><li><p><strong>What can I do to ensure my search queries, MLflow experiment experience, and downstream analytics remain performant?</strong></p><p>Consider optimizing the OTEL tables using Z-ordering as described <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/observe-with-traces/query-dbsql#performance-considerations">here</a>.</p></li><li><p><strong>How does this handle PII found in user prompts?</strong></p><p>This feature does not apply any special handling to PII. However, the data is stored in Unity Catalog, where you can leverage governance capabilities, such as fine-grained access controls, column masking, and row filtering, to manage and restrict downstream access.</p></li></ul><p></p><h2>Get started</h2><p>To get started, follow along with the <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/trace-unity-catalog">documentation</a>.</p>]]></content:encoded></item><item><title><![CDATA[Genie Integration with Google Chat]]></title><description><![CDATA[Talk to your data from Google Chat - powered by Databricks Genie!]]></description><link>https://www.databricksters.com/p/genie-integration-with-google-chat</link><guid isPermaLink="false">https://www.databricksters.com/p/genie-integration-with-google-chat</guid><dc:creator><![CDATA[Ambarish]]></dc:creator><pubDate>Tue, 10 Feb 2026 16:03:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_uxC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_uxC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_uxC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!_uxC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!_uxC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!_uxC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_uxC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_uxC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!_uxC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!_uxC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!_uxC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40949259-0752-4094-8d2c-0afb55d160ed_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1><strong>Integrate Databricks Genie with Google Chat in 30 mins!</strong></h1><p>With this integration, your team can query business data directly from Google Chat using natural language. No SQL knowledge needed, no switching between tools.</p><h2><strong>Features</strong></h2><ul><li><p><strong>Conversational AI in Google Chat</strong> - Ask questions like &#8220;What were total sales last quarter?&#8221; and get instant, data-backed answers</p></li><li><p><strong>Rich Card Responses</strong> - Results are displayed as formatted Google Chat Cards with data tables, generated SQL</p></li><li><p><strong>Response Feedback Loop</strong> - Thumbs up/down buttons send ratings back to Genie so space authors can review and improve</p></li><li><p><strong>Direct Messages and @Mentions</strong> - Works in 1:1 DMs with the bot or by @mentioning it in any Chat Space</p></li><li><p><strong>No Infrastructure Required</strong> - Runs entirely on Google Apps Script (serverless), no servers to manage, no Databricks App to deploy</p></li><li><p><strong>Secure by Default</strong> - Credentials stored in Apps Script Properties Service, data governed by Unity Catalog</p></li></ul><h2><strong>Architecture</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CiOu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CiOu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 424w, https://substackcdn.com/image/fetch/$s_!CiOu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 848w, https://substackcdn.com/image/fetch/$s_!CiOu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 1272w, https://substackcdn.com/image/fetch/$s_!CiOu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CiOu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png" width="1011" height="339" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:339,&quot;width&quot;:1011,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CiOu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 424w, https://substackcdn.com/image/fetch/$s_!CiOu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 848w, https://substackcdn.com/image/fetch/$s_!CiOu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 1272w, https://substackcdn.com/image/fetch/$s_!CiOu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33307ed-dbc7-47a6-acc5-d69bbfbfa887_1011x339.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Prerequisites</strong></h2><ol><li><p><strong>Databricks Workspace</strong></p><ul><li><p>Active Databricks workspace</p></li><li><p>Genie space ID</p></li></ul></li><li><p><strong>Google Workspace with the Enterprise plan</strong></p><ul><li><p>Enable Google Chat API in the project</p></li><li><p>Enable App Script in the same project</p></li></ul></li></ol><h2><strong>Setup Instructions</strong></h2><p><strong>Set up Genie Space in Databricks Workspace and note the few parameters</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h4>Genie Details</h4><p>If you already have a Genie Space, skip to the next step. Otherwise:</p><ol><li><p>In your Databricks workspace, navigate to <strong>Genie</strong> in the left sidebar</p></li><li><p>Click <strong>New</strong> in the upper-right corner of the screen</p></li><li><p>Add your data sources. Then, click <strong>Create</strong></p></li><li><p>From the <strong>Settings</strong> tab, note the <strong>Space ID</strong> (a 32-character string) -- you will need this later</p></li></ol><blockquote><p><strong>Tip:</strong> The quality of your Genie Space directly affects the quality of your answers. Add company-specific context, test with expected user questions, and iterate on your table annotations.</p></blockquote><h4>Generate a Databricks Token</h4><blockquote><p>Apps Script needs a token to authenticate with the Genie API.</p></blockquote><ol><li><p>In your Databricks workspace, click your username (top right) then <strong>Settings</strong></p></li><li><p>Go to <strong>Developer</strong> then <strong>Access Tokens</strong></p></li><li><p>Click <strong>Generate New Token</strong>, give it a description like &#8220;Google Chat Genie Bot&#8221;</p></li><li><p><strong>Copy the token immediately</strong> -- you will not see it again</p></li></ol><blockquote><p>Also note your <strong>workspace URL</strong> (e.g., https://your-instance.cloud.databricks.com).</p></blockquote><p><strong>For production:</strong> Use OAuth M2M with a service principal instead of a personal access token for better security and automated token rotation.</p><h3><strong>Google Workspace Setup</strong></h3><h4>Google Cloud Project Setup</h4><ol><li><p>Go to the <strong>Google Cloud Console</strong> (console.cloud.google.com)</p></li><li><p>Create a new project (or select an existing one)</p></li><li><p>Navigate to <strong>APIs and Services</strong> then <strong>Library</strong></p></li><li><p>Search for <strong>Google Chat API</strong> and click <strong>Enable</strong></p></li><li><p>Go to <strong>APIs and Services</strong> then <strong>OAuth consent screen</strong></p></li><li><p>Select <strong>Internal</strong>, fill in the app name (e.g., &#8220;Genie Data Bot&#8221;), and save</p></li></ol><h4>Create the Apps Script Project</h4><ol><li><p>Go to <strong>script.google.com</strong> and click <strong>New project</strong></p></li><li><p>Rename the project to <strong>Genie Chat Bot</strong></p></li><li><p>Download all files from the repo <a href="https://github.com/adgitdemo/ad_databricks/tree/main/genie-google-chat-app">https://github.com/adgitdemo/ad_databricks/tree/main/genie-google-chat-app</a>  and create the corresponding .gs files in the Apps Script editor:</p></li></ol><ul><li><p><strong>Code.gs</strong> -- Chat event handlers (onMessage, onAddedToSpace, submitFeedback)</p></li><li><p><strong>appscript.json </strong>- App Script configuration</p></li><li><p>(Recommended) Go to <strong>Project Settings</strong> (gear icon) then under the &#8220;<strong>Script Properties</strong>&#8220; section, click Add script property (or Edit script properties):</p><ul><li><p>DATABRICKS_TOKEN = your token from Step 1</p></li><li><p>GENIE_SPACE_ID = your space ID from Step 1</p></li><li><p>DATABRICKS_HOST = your workspace URL (no trailing slash)</p></li></ul></li><li><p>The code reads from Script Properties automatically.</p></li></ul><h4>Configure the Google Chat API</h4><ol><li><p>In Apps Script, click <strong>Deploy-&gt;New Deployment, </strong>select<strong> Add-On </strong>configuration, and deploy. Copy the <strong>Deployment ID </strong>once process completed.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GrCk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GrCk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 424w, https://substackcdn.com/image/fetch/$s_!GrCk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 848w, https://substackcdn.com/image/fetch/$s_!GrCk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 1272w, https://substackcdn.com/image/fetch/$s_!GrCk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GrCk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png" width="1456" height="1143" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2042c1f-1024-4938-a367-12965751174e_1496x1174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1143,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GrCk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 424w, https://substackcdn.com/image/fetch/$s_!GrCk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 848w, https://substackcdn.com/image/fetch/$s_!GrCk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 1272w, https://substackcdn.com/image/fetch/$s_!GrCk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2042c1f-1024-4938-a367-12965751174e_1496x1174.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nBd1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nBd1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 424w, https://substackcdn.com/image/fetch/$s_!nBd1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 848w, https://substackcdn.com/image/fetch/$s_!nBd1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 1272w, https://substackcdn.com/image/fetch/$s_!nBd1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nBd1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png" width="1098" height="788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1098,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nBd1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 424w, https://substackcdn.com/image/fetch/$s_!nBd1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 848w, https://substackcdn.com/image/fetch/$s_!nBd1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 1272w, https://substackcdn.com/image/fetch/$s_!nBd1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1e5698-52a8-4aea-816b-d7b166cd7e19_1098x788.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol start="2"><li><p>Back in the <strong>Google Cloud Console</strong>, go to <strong>APIs and Services</strong> then <strong>Google Chat API</strong> then <strong>Configuration</strong></p></li><li><p>Fill in:</p></li></ol><blockquote><p>   - <strong>App name:</strong> db-genie-1</p><p>   - <strong>Avatar URL:</strong> (optional -- your company logo or Databricks icon)</p><p>   - <strong>Description:</strong> Databricks Genie Bot</p><p>   - <strong>Functionality:</strong> Check &#8220;Join spaces and group conversations&#8221;</p><p>   - <strong>Connection settings:</strong> Select <strong>Apps Script</strong> and paste your Head Deployment ID in the <strong>Deployment ID </strong>field</p><p>   - <strong>Triggers </strong>specify the apps script functions to handle interactions</p><p>   - <strong>Visibility:</strong> Select &#8220;Specific people and groups&#8221; and add your team or test users</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CGrX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CGrX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 424w, https://substackcdn.com/image/fetch/$s_!CGrX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 848w, https://substackcdn.com/image/fetch/$s_!CGrX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 1272w, https://substackcdn.com/image/fetch/$s_!CGrX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CGrX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png" width="1104" height="1338" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1338,&quot;width&quot;:1104,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CGrX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 424w, https://substackcdn.com/image/fetch/$s_!CGrX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 848w, https://substackcdn.com/image/fetch/$s_!CGrX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 1272w, https://substackcdn.com/image/fetch/$s_!CGrX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcaf5dd3-515a-4972-b0ae-9eb3aaf4fb4b_1104x1338.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KeI3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KeI3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 424w, https://substackcdn.com/image/fetch/$s_!KeI3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 848w, https://substackcdn.com/image/fetch/$s_!KeI3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!KeI3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KeI3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png" width="1112" height="1390" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1390,&quot;width&quot;:1112,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KeI3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 424w, https://substackcdn.com/image/fetch/$s_!KeI3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 848w, https://substackcdn.com/image/fetch/$s_!KeI3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!KeI3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35efd478-7c4b-4549-8437-905d5f98aa79_1112x1390.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>Add Chat App/Bot App to Google Chat</h4><p>1. Open the Space</p><blockquote><p>Go into the Space where you want the app.</p></blockquote><p>2. Click the Space name at the top</p><blockquote><p>A menu opens.</p></blockquote><p>3. Choose &#8220;Manage apps&#8221;</p><blockquote><p>Then click Add apps.</p></blockquote><p>4. Search + Add the app</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n8tR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n8tR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 424w, https://substackcdn.com/image/fetch/$s_!n8tR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 848w, https://substackcdn.com/image/fetch/$s_!n8tR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 1272w, https://substackcdn.com/image/fetch/$s_!n8tR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n8tR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png" width="715" height="153" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:153,&quot;width&quot;:715,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n8tR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 424w, https://substackcdn.com/image/fetch/$s_!n8tR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 848w, https://substackcdn.com/image/fetch/$s_!n8tR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 1272w, https://substackcdn.com/image/fetch/$s_!n8tR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46348ae2-ceb1-4c68-8d9d-9f54a6aedc38_715x153.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2><strong>Usage</strong></h2><p>Once deployed, your team can interact with the bot in two ways:</p><p><strong>Direct Messages</strong> -- Open a DM with Genie Data Bot and type your question directly. Great for ad-hoc data exploration.</p><p><strong>@Mentions in Spaces</strong> -- In any Google Chat Space where the bot is added, type @db-genie-1 followed by your question. Responses appear in the space, and anyone can ask follow-ups in the thread.</p><h2><strong>Demo Time</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pWah!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pWah!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 424w, https://substackcdn.com/image/fetch/$s_!pWah!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 848w, https://substackcdn.com/image/fetch/$s_!pWah!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 1272w, https://substackcdn.com/image/fetch/$s_!pWah!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pWah!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png" width="1456" height="700" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:700,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pWah!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 424w, https://substackcdn.com/image/fetch/$s_!pWah!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 848w, https://substackcdn.com/image/fetch/$s_!pWah!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 1272w, https://substackcdn.com/image/fetch/$s_!pWah!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d73461-fee1-4f0f-a608-d781c66b6020_1600x769.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NujH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NujH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 424w, https://substackcdn.com/image/fetch/$s_!NujH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 848w, https://substackcdn.com/image/fetch/$s_!NujH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 1272w, https://substackcdn.com/image/fetch/$s_!NujH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NujH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png" width="1307" height="843" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:1307,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NujH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 424w, https://substackcdn.com/image/fetch/$s_!NujH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 848w, https://substackcdn.com/image/fetch/$s_!NujH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 1272w, https://substackcdn.com/image/fetch/$s_!NujH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76614994-e3bb-493e-8215-a0828fa16a30_1307x843.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Tips and Considerations</strong></h2><ul><li><p><strong>Answer quality depends on your Genie Space.</strong> Add detailed table/column descriptions, sample SQL, and company-specific context for best results.</p></li><li><p><strong>Sync timeout:</strong> Apps Script has a 30-second synchronous response limit for Chat. If your Genie queries take longer, consider an optional async pattern using time-driven triggers and the Advanced Chat Service.</p></li><li><p><strong>Rate limits:</strong> The Genie API allows approximately 5 queries per minute per workspace during Public Preview.</p></li><li><p><strong>Row limits:</strong> Genie returns up to 5,000 rows per query. The Chat card displays up to 20 rows for readability, with a link to view full results in Genie.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Migrating Existing Dashboards to Databricks AI/BI, Part 1: Context and Cascading Filters]]></title><description><![CDATA[How to implement context filters and &#8220;only relevant values&#8221; behavior in Databricks AI/BI Dashboards]]></description><link>https://www.databricksters.com/p/migrating-existing-dashboards-to</link><guid isPermaLink="false">https://www.databricksters.com/p/migrating-existing-dashboards-to</guid><dc:creator><![CDATA[Artem Chebotko]]></dc:creator><pubDate>Tue, 03 Feb 2026 18:02:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!72WV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!72WV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!72WV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!72WV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!72WV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!72WV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!72WV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/289c7692-0274-4274-a173-9db55df49c08_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!72WV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!72WV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!72WV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!72WV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289c7692-0274-4274-a173-9db55df49c08_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As a Specialist Solutions Architect at Databricks, I regularly work with customers who are migrating critical analytics from existing BI tools to Databricks AI/BI Dashboards &#8211; and the first questions I usually get are about filters.</p><p><strong>Teams want to know</strong>:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ul><li><p><em>&#8220;What&#8217;s the Databricks equivalent of the context filters we use today?&#8221;</em></p></li><li><p><em>&#8220;Can we still do cascading filters where each dropdown only shows relevant values?&#8221;</em></p></li><li><p><em>&#8220;Do you support filter actions when I click on a bar or a point?&#8221;</em></p></li><li><p><em>&#8220;How do we do user-based filtering in AI/BI Dashboards?&#8221;</em></p></li></ul><p>These aren&#8217;t cosmetic features. They&#8217;re how analysts actually interact with dashboards, and they&#8217;re often the reason an existing BI dashboard feels &#8220;alive&#8221; instead of static.</p><p>In this post, I&#8217;ll walk through how to implement two familiar filter patterns from existing BI dashboards in Databricks AI/BI Dashboards, using the built-in <code>samples.tpch</code> dataset:</p><ol><li><p><strong>Context filters</strong> &#8594; implemented as parameters in dataset SQL</p></li><li><p><strong>&#8220;</strong><em><strong>Only Relevant Values</strong></em><strong>&#8221; or cascading filters</strong> &#8594; implemented with field filters and query-based parameters</p></li></ol><p>Row-level security and user-based filtering deserve their own deep dive, and action-style interactions (cross-filtering and drill-through) could easily fill another post, so I&#8217;ll cover those separately.</p><p>I&#8217;ve also published the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, so you can follow along and inspect the configurations yourself.</p><h3><strong>Quick primer: datasets and filters in Databricks AI/BI Dashboards</strong></h3><p>Before we map those patterns, it helps to align on a few AI/BI Dashboards concepts:</p><h4><strong>Datasets</strong></h4><p>In AI/BI Dashboards, each dashboard has a <em>Data</em> tab where you define one or more datasets:</p><ul><li><p>A dataset is defined by an SQL query, direct reference to a Unity Catalog table/view, or an uploaded file.</p></li><li><p>Multiple visualizations can reuse the same dataset.</p></li><li><p>Datasets are bundled with the dashboard when you share/export/import it.</p></li></ul><p>Practically, a dataset is your &#8220;model&#8221; for a set of visuals: one query, many charts.</p><h4><strong>Field filters vs parameter filters</strong></h4><p>AI/BI Dashboards support two core ways to filter data from a dashboard: <a href="https://docs.databricks.com/aws/en/dashboards/filters#should-i-filter-on-a-field-or-a-parameter">field filters and parameter filters</a>. Both are implemented as <strong>filter widgets</strong>, but they behave differently under the hood.</p><p><strong>Field filters</strong> are applied directly to dataset fields (columns) on top of the dataset query. Processing behaviour is defined by the <a href="https://docs.databricks.com/aws/en/dashboards/caching#dataset-optimizations">dataset performance thresholds</a>. Specifically, for small datasets (&#8804; 100K rows or &#8804; 100MB), results are pulled to the browser and visualization-specific filtering and aggregation are applied client-side. For larger datasets, Databricks wraps the dataset query in a <code>WITH</code> clause and applies the filter predicates and aggregations in Databricks SQL warehouse (DBSQL).</p><p><strong>Parameter filters </strong>are applied to parameters, which are variables that get substituted into your dataset SQL at runtime. When the parameter value changes, the query is always re-run in DBSQL.</p><p>In other words, field filters operate on the results of the dataset query, while parameter filters operate inside the dataset SQL itself.</p><p>To speed up processing, various <a href="https://docs.databricks.com/aws/en/dashboards/caching#caching-and-data-freshness">caching layers</a> in AI/BI Dashboards and DBSQL are used.</p><p>We&#8217;ll use parameter filters to emulate context filters, and field filters + query-based parameters to emulate &#8220;<em>Only Relevant Values</em>.&#8221;</p><h4><strong>Filter scope: global, page-level, and widget-level</strong></h4><p>Filters in AI/BI Dashboards also differ by <a href="https://docs.databricks.com/aws/en/dashboards/filters#filter-interactivity-and-scope">scope</a>:</p><ul><li><p><strong>Global filters</strong> are interactive filters in the global filters panel that apply across all pages of the dashboard to any visualization that shares the selected datasets.</p></li><li><p><strong>Page-level filters</strong> are interactive filter widgets placed on a specific page in the canvas. They apply to all visualizations on that page that share one or more datasets.</p></li><li><p><strong>Widget-level filters</strong> are static filters configured directly on a single visualization widget in its configuration panel. Authors set the values, and viewers can&#8217;t change them.</p></li></ul><p>With that foundation in place, we can now map these context filters and &#8220;<em>Only Relevant Values</em>&#8221; patterns into AI/BI Dashboards patterns.</p><h3><strong>Sample dataset: TPCH on Databricks</strong></h3><p>To keep examples concrete, we&#8217;ll use the TPCH sample data that ships with Databricks in the <code>samples.tpch</code> schema.</p><p>For the purposes of this post, you can start by creating a dataset that joins tables <code>region</code>, <code>nation</code>, <code>customer</code>, <code>orders</code>, and <code>lineitem</code>:</p><pre><code><code>SELECT
  r.r_name              AS region,
  n.n_name              AS nation,
  c.c_custkey           AS customer_id,
  c.c_name              AS customer_name,
  o.o_orderkey          AS order_id,
  o.o_orderdate         AS order_date,
  l.l_extendedprice * (1 - l.l_discount) AS revenue
FROM samples.tpch.region   AS r
JOIN samples.tpch.nation   AS n ON n.n_regionkey = r.r_regionkey
JOIN samples.tpch.customer AS c ON c.c_nationkey = n.n_nationkey
JOIN samples.tpch.orders   AS o ON o.o_custkey   = c.c_custkey
JOIN samples.tpch.lineitem AS l ON l.l_orderkey  = o.o_orderkey;</code></code></pre><p>In AI/BI Dashboards, you define this query as a dataset in the <em>Data</em> tab and then reuse it across multiple visualizations. Let&#8217;s call this dataset <em>TPCH Sales</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Li34!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Li34!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 424w, https://substackcdn.com/image/fetch/$s_!Li34!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 848w, https://substackcdn.com/image/fetch/$s_!Li34!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 1272w, https://substackcdn.com/image/fetch/$s_!Li34!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Li34!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png" width="1456" height="678" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:678,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Li34!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 424w, https://substackcdn.com/image/fetch/$s_!Li34!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 848w, https://substackcdn.com/image/fetch/$s_!Li34!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 1272w, https://substackcdn.com/image/fetch/$s_!Li34!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2aa1025-9a71-4c07-9bd2-7315bbc81448_1600x745.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;ll reuse this same dataset or its derivatives throughout the rest of the post to illustrate context filters and cascading filters.</p><h3><strong>Implementing context filters with parameters in dataset SQL</strong></h3><h4><strong>What a context filter does</strong></h4><p>A context filter defines a high-level subset of the data:</p><ul><li><p>The context filter is applied first, often materializing a temporary subset.</p></li><li><p>Other filters and some calculations are then evaluated on top of that subset.</p></li></ul><p>Context filters are used to:</p><ul><li><p>Improve performance by filtering early and shrinking the working set.</p></li><li><p>Enforce logical order, such as &#8220;<em>always filter by Region first</em>.&#8221;</p></li><li><p>Make other filters depend on that subset.</p></li></ul><h4><strong>How to think about context in AI/BI Dashboards</strong></h4><p>Given the primer:</p><ul><li><p><a href="https://docs.databricks.com/aws/en/dashboards/filters#should-i-filter-on-a-field-or-a-parameter">Field filters</a> operate on the results of the dataset query (Databricks wraps your dataset SQL and applies them on top).</p></li><li><p><a href="https://docs.databricks.com/aws/en/dashboards/filters#should-i-filter-on-a-field-or-a-parameter">Parameter filters</a> substitute values directly into your dataset SQL, so they filter inside the query, before joins and aggregations.</p></li></ul><p>If you want &#8220;context&#8221; behavior &#8211; <em>filter first, then apply everything else</em> &#8211; you should implement that filter as a <a href="https://docs.databricks.com/aws/en/dashboards/parameters">parameter</a> in the dataset SQL, driven by a parameter filter widget.</p><h4><strong>Pattern: treat the context as a base parameter</strong></h4><p>Let&#8217;s add a context filter for <em>Region</em>:</p><p>If you&#8217;re following along with the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, this setup lives on the &#8220;Context filter&#8221; page.</p><p><strong>Step 1</strong>. Define <em>TPCH Sales (Context)</em> with a <em>Region</em> parameter</p><p>Create a dataset <em>TPCH Sales (Context)</em>:</p><pre><code><code>SELECT
  r.r_name              AS region,
  n.n_name              AS nation,
  c.c_custkey           AS customer_id,
  c.c_name              AS customer_name,
  o.o_orderkey          AS order_id,
  o.o_orderdate         AS order_date,
  l.l_extendedprice * (1 - l.l_discount) AS revenue
FROM samples.tpch.region   AS r
JOIN samples.tpch.nation   AS n ON n.n_regionkey = r.r_regionkey
JOIN samples.tpch.customer AS c ON c.c_nationkey = n.n_nationkey
JOIN samples.tpch.orders   AS o ON o.o_custkey   = c.c_custkey
JOIN samples.tpch.lineitem AS l ON l.l_orderkey  = o.o_orderkey
WHERE r.r_name = :region_param      -- &#8220;context&#8221; filter</code></code></pre><p>In the dataset&#8217;s <em>Parameters</em> panel:</p><ul><li><p>Define <code>region_param</code> with type <em>String</em>.</p></li><li><p>Optionally set a default (for example, <em>AMERICA</em>) so the dataset runs without any dashboard filter.</p></li></ul><p>This makes <code>region_param</code> the context for all visuals that use <em>TPCH Sales (Context)</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Id06!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Id06!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 424w, https://substackcdn.com/image/fetch/$s_!Id06!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 848w, https://substackcdn.com/image/fetch/$s_!Id06!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 1272w, https://substackcdn.com/image/fetch/$s_!Id06!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Id06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png" width="1456" height="715" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:715,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Id06!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 424w, https://substackcdn.com/image/fetch/$s_!Id06!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 848w, https://substackcdn.com/image/fetch/$s_!Id06!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 1272w, https://substackcdn.com/image/fetch/$s_!Id06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce877d63-410b-4f3f-878b-aec639cfc9c9_1600x786.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Step 2</strong>. (Optional but nice) Create a helper dataset for <em>Region</em> values</p><p>You can drive <code>region_param</code> directly from <em>TPCH Sales (Context)</em>, but a tiny helper dataset keeps things tidy, convenient, and re-usable.</p><p>Create <em>TPCH Regions (Context)</em>:</p><pre><code><code>SELECT DISTINCT r_name AS region
FROM samples.tpch.region
ORDER BY region;</code></code></pre><p>This dataset has no parameters; it just returns the list of available regions.</p><p><strong>Step 3</strong>. Add a <em>Region</em> parameter filter widget</p><p>We will configure the widget as a page-level filter (alternatively, you can move it into the global filters panel if it should apply across pages).</p><p>On the page where you want <em>Region</em> as a context filter:</p><ol><li><p>Add a filter widget and title it <em>Region</em>.</p></li><li><p>Set the filter type to <em>Single value</em>.</p></li><li><p>Configure it as a parameter filter:</p><ul><li><p>Fields: <code>TPCH Regions (Context).region</code></p></li><li><p>Parameters: <code>TPCH Sales (Context).region_param</code></p></li></ul></li></ol><p>If you don&#8217;t want a helper dataset, you can instead use <code>TPCH Sales (Context).region</code> as the field source, but the wiring is otherwise identical.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oUxR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oUxR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 424w, https://substackcdn.com/image/fetch/$s_!oUxR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 848w, https://substackcdn.com/image/fetch/$s_!oUxR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 1272w, https://substackcdn.com/image/fetch/$s_!oUxR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oUxR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png" width="512" height="641" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:641,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!oUxR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 424w, https://substackcdn.com/image/fetch/$s_!oUxR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 848w, https://substackcdn.com/image/fetch/$s_!oUxR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 1272w, https://substackcdn.com/image/fetch/$s_!oUxR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4b617a8-188e-4c89-84cc-0ca1ed8c84dc_512x641.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Effect</strong></p><ul><li><p>When a viewer selects <em>Region: EUROPE</em> in the <em>Region</em> filter:</p><ul><li><p>The widget writes &#8220;<em>EUROPE</em>&#8220; into <code>region_param</code> for <em>TPCH Sales (Context)</em>.</p></li><li><p><em>TPCH Sales (Context)</em> reruns with <code>WHERE r.r_name = &#8216;EUROPE&#8217;</code>.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hjO5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hjO5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 424w, https://substackcdn.com/image/fetch/$s_!hjO5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 848w, https://substackcdn.com/image/fetch/$s_!hjO5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 1272w, https://substackcdn.com/image/fetch/$s_!hjO5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hjO5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png" width="1237" height="626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:1237,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!hjO5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 424w, https://substackcdn.com/image/fetch/$s_!hjO5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 848w, https://substackcdn.com/image/fetch/$s_!hjO5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 1272w, https://substackcdn.com/image/fetch/$s_!hjO5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10638dc8-f318-443c-ad39-31ed2b15c8ec_1237x626.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>All visuals built on <em>TPCH Sales (Context)</em> now use only European data as their starting point.</p></li><li><p>Any additional field filters (for example, Nation, Customer, Date) operate on this already-filtered subset, just like secondary filters evaluated after a context filter.</p></li><li><p>From the viewer&#8217;s perspective, <em>Region</em> behaves like a true context filter: it defines the base subset of data first, and everything else &#8211; other filters, cross-filtering, drill-through &#8211; is evaluated on top of that context.</p></li></ul><h3><strong>Implementing &#8220;</strong><em><strong>Only Relevant Values</strong></em><strong>&#8221; with cascading filters and query-based parameters</strong></h3><h4><strong>What &#8220;</strong><em><strong>Only Relevant Values</strong></em><strong>&#8221; behavior means</strong></h4><p>&#8220;<em>Only Relevant Values</em>&#8221; behavior on a filter shrinks the list of values based on the current state of other filters and the view:</p><ul><li><p>If you select <em>Region: ASIA</em>, the <em>Country</em> filter only shows countries that actually have data in <em>ASIA</em>.</p></li><li><p>As you add more filters, each filter&#8217;s domain is recomputed from the filtered dataset.</p></li></ul><p>Practically, this gives you cascading filters that stay in sync with each other and with the current slice of data.</p><h4><strong>How to think about &#8220;</strong><em><strong>Only Relevant Values</strong></em><strong>&#8221; in AI/BI Dashboards</strong></h4><p>In AI/BI Dashboards, you get the same effect in two ways:</p><ol><li><p><a href="https://docs.databricks.com/aws/en/dashboards/filters">Field filters</a> on the same dataset &#8211; AI/BI recomputes the value list based on the current filtered dataset.</p></li><li><p><a href="https://docs.databricks.com/aws/en/dashboards/parameters#query-based-parameters">Query-based parameters</a> &#8211; a specialized filter widget that both populates its values from a query, and writes the selected value into a parameter used in your dataset SQL.</p></li></ol><h4><strong>Pattern 1: Cascading filters with field filters</strong></h4><p>The simplest way to mimic &#8220;<em>Only Relevant Values</em>&#8221; is to use <a href="https://docs.databricks.com/aws/en/dashboards/filters">field filters</a> wired to the same dataset. AI/BI Dashboards will automatically recompute each filter&#8217;s value list based on the current filtered dataset.</p><p>We&#8217;ll build a <em>Region &#8594; Nation &#8594; Customer</em> cascade on top of <em>TPCH Sales (Cascading Pattern 1)</em>.</p><p>In the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, this pattern is implemented on the &#8220;<em>Cascading filters with field filters</em>&#8221; page.</p><p><strong>Step 1</strong>. Define <em>TPCH Sales (Cascading Pattern 1)</em></p><p>Create a dataset <em>TPCH Sales (Cascading Pattern 1)</em> with the base TPCH join and a revenue metric:</p><pre><code><code>SELECT
  r.r_name              AS region,
  n.n_name              AS nation,
  c.c_custkey           AS customer_id,
  c.c_name              AS customer_name,
  o.o_orderkey          AS order_id,
  o.o_orderdate         AS order_date,
  l.l_extendedprice * (1 - l.l_discount) AS revenue
FROM samples.tpch.region   AS r
JOIN samples.tpch.nation   AS n ON n.n_regionkey = r.r_regionkey
JOIN samples.tpch.customer AS c ON c.c_nationkey = n.n_nationkey
JOIN samples.tpch.orders   AS o ON o.o_custkey   = c.c_custkey
JOIN samples.tpch.lineitem AS l ON l.l_orderkey  = o.o_orderkey;</code></code></pre><p>This dataset has no parameters; all filtering will be done with field filters on top of the query results.</p><p><strong>Step 2</strong>. Add <em>Region</em>, <em>Nation</em>, and <em>Customer</em> field filters</p><p>On the dashboard page where you want cascading behavior:</p><ol><li><p>Add three field filter widgets with titles <em>Region</em>, <em>Nation</em>, and <em>Customer</em>.</p></li><li><p>Configure each widget as a page-level filter (or move them into the global filters panel if you want them to apply across pages).</p></li><li><p>Connect the filters to the following fields from <em>TPCH Sales (Cascading Pattern 1)</em>:</p><ul><li><p><em>Region</em> &#8594; <code>region</code></p></li><li><p><em>Nation</em> &#8594; <code>nation</code></p></li><li><p><em>Customer</em> &#8594; <code>customer_id</code></p></li></ul></li></ol><p>No parameters are involved here &#8211; these are pure field filters on a single dataset.</p><p><strong>Effect</strong></p><ul><li><p>When a viewer selects <em>region: ASIA</em>, the <em>TPCH Sales (Cascading Pattern 1)</em> dataset is filtered to <em>ASIA</em> for all visuals on the page.</p></li><li><p>The <em>Nation</em> field filter&#8217;s value list is recomputed from that filtered dataset, so it only shows nations in <em>ASIA</em>.</p></li></ul><blockquote></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bTX3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bTX3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 424w, https://substackcdn.com/image/fetch/$s_!bTX3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 848w, https://substackcdn.com/image/fetch/$s_!bTX3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 1272w, https://substackcdn.com/image/fetch/$s_!bTX3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bTX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png" width="1456" height="532" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:532,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!bTX3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 424w, https://substackcdn.com/image/fetch/$s_!bTX3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 848w, https://substackcdn.com/image/fetch/$s_!bTX3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 1272w, https://substackcdn.com/image/fetch/$s_!bTX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7936f7d-2ea4-4089-928b-d1002a4b757e_1600x585.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>After the viewer chooses a nation (e.g., <em>JAPAN</em>), the <em>Customer</em> field filter shrinks to show only customers in that nation.</p></li></ul><blockquote></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D7pG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D7pG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 424w, https://substackcdn.com/image/fetch/$s_!D7pG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 848w, https://substackcdn.com/image/fetch/$s_!D7pG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 1272w, https://substackcdn.com/image/fetch/$s_!D7pG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D7pG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png" width="1456" height="531" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:531,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!D7pG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 424w, https://substackcdn.com/image/fetch/$s_!D7pG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 848w, https://substackcdn.com/image/fetch/$s_!D7pG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 1272w, https://substackcdn.com/image/fetch/$s_!D7pG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3caeb4b3-84dc-4cc5-ae15-5add326ae0f6_1600x583.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>From the user&#8217;s perspective, these field filters behave like filters with an &#8220;<em>Only Relevant Values</em>&#8221; option enabled: each dropdown shows only values that exist in the currently filtered data. Under the hood, AI/BI Dashboards are simply applying field filters on top of a single dataset and recomputing the dropdown values from the currently filtered result set.</p></li></ul><h4><strong>Pattern 2: Cascading filters with query-based parameters</strong></h4><p>In Pattern 1, we used field filters only. In some cases you may want more control over how dropdown values are loaded, or you may want the same parameter to drive multiple datasets. In that case you can use <a href="https://docs.databricks.com/aws/en/dashboards/parameters#query-based-parameters">query-based parameters</a>. A query-based parameter filter widget gets its dropdown values from a field in a &#8220;choices&#8221; dataset, and writes the selected value into one or more parameters that are used in dataset SQL.</p><p>Here we&#8217;ll build a three-level cascade <em>Region &#8594; Nation &#8594; Customer</em> using:</p><ul><li><p>One main dataset: <em>TPCH Sales (Cascading Pattern 2)</em></p></li><li><p>Three small &#8220;value list&#8221; datasets:</p><ul><li><p><em>TPCH Regions (Cascading Pattern 2)</em></p></li><li><p><em>TPCH Nations by Region (Cascading Pattern 2)</em></p></li><li><p><em>TPCH Customers by Nation (Cascading Pattern 2)</em></p></li></ul></li></ul><p>In the <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a>, this pattern is implemented on the &#8220;<em>Cascading filters with query-based parameters</em>&#8221; page.</p><p><strong>Step 1</strong>. Define <em>TPCH Sales (Cascading Pattern 2)</em></p><p>Create the <em>TPCH Sales (Cascading Pattern 2)</em> dataset with parameters for <em>region</em>, <em>nation</em>, and <em>customer</em>:</p><pre><code><code>SELECT
 r.r_name              AS region,
 n.n_name              AS nation,
 c.c_custkey           AS customer_id,
 c.c_name              AS customer_name,
 o.o_orderkey          AS order_id,
 o.o_orderdate         AS order_date,
 l.l_extendedprice * (1 - l.l_discount) AS revenue
FROM samples.tpch.region   AS r
JOIN samples.tpch.nation   AS n ON n.n_regionkey = r.r_regionkey
JOIN samples.tpch.customer AS c ON c.c_nationkey = n.n_nationkey
JOIN samples.tpch.orders   AS o ON o.o_custkey   = c.c_custkey
JOIN samples.tpch.lineitem AS l ON l.l_orderkey  = o.o_orderkey
WHERE (:region_param   = 'All' OR r.r_name = :region_param)
  AND (:nation_param   = 'All' OR n.n_name = :nation_param)
  AND (:customer_param = 0     OR c.c_custkey  = :customer_param);</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ft3p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ft3p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 424w, https://substackcdn.com/image/fetch/$s_!ft3p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 848w, https://substackcdn.com/image/fetch/$s_!ft3p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 1272w, https://substackcdn.com/image/fetch/$s_!ft3p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ft3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png" width="1456" height="788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ft3p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 424w, https://substackcdn.com/image/fetch/$s_!ft3p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 848w, https://substackcdn.com/image/fetch/$s_!ft3p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 1272w, https://substackcdn.com/image/fetch/$s_!ft3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3da122-46ff-4505-b268-de0d80f55758_1600x866.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the dataset&#8217;s <em>Parameters</em> panel:</p><ul><li><p>Set <code>region_param</code> type to <em>String</em>.</p></li><li><p>Set <code>nation_param</code> type to <em>String</em>.</p></li><li><p>Set <code>customer_param</code> type to <em>Numeric / Integer</em> (to match <code>c_custkey</code>).</p></li></ul><p>This last bit is important: the <em>Customer</em> filter uses a numeric field, so the parameter must be numeric as well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qzZ2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qzZ2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 424w, https://substackcdn.com/image/fetch/$s_!qzZ2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 848w, https://substackcdn.com/image/fetch/$s_!qzZ2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 1272w, https://substackcdn.com/image/fetch/$s_!qzZ2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qzZ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png" width="262" height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:262,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qzZ2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 424w, https://substackcdn.com/image/fetch/$s_!qzZ2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 848w, https://substackcdn.com/image/fetch/$s_!qzZ2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 1272w, https://substackcdn.com/image/fetch/$s_!qzZ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96df94a4-c73f-4c97-b2bd-b8bb0ba8a63d_262x297.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Step 2</strong>. Create helper datasets for the dropdowns</p><p>1. <em>TPCH Regions (Cascading Pattern 2)</em> &#8211; list of regions:</p><pre><code><code>SELECT DISTINCT r_name AS region
FROM samples.tpch.region
ORDER BY region;</code></code></pre><p>2. <em>TPCH Nations by Region (Cascading Pattern 2)</em> &#8211; nations for the selected region:</p><pre><code><code>SELECT DISTINCT n.n_name AS nation
FROM samples.tpch.nation   AS n
JOIN samples.tpch.region   AS r ON n.n_regionkey = r.r_regionkey
WHERE r.r_name = :region_param
ORDER BY nation;</code></code></pre><p>This dataset defines its own <code>region_param</code> (<em>string</em>) in the <em>Data</em> tab.</p><p>3. <em>TPCH Customers by Nation (Cascading Pattern 2)</em> &#8211; customers for the selected nation:</p><pre><code><code>SELECT DISTINCT
  c.c_custkey AS customer_id
FROM samples.tpch.nation   AS n
JOIN samples.tpch.customer AS c ON c.c_nationkey = n.n_nationkey
WHERE n.n_name = :nation_param
ORDER BY customer_id;</code></code></pre><p>This dataset defines <code>nation_param</code> (<em>string</em>). <code>customer_id</code> is <em>numeric</em>, matching <code>customer_param</code> in <em>TPCH Sales (Cascading Pattern 2)</em>.</p><p>Run each dataset in the <em>Data</em> tab once to confirm they succeed.</p><p><strong>Step 3</strong>. Add <em>Region</em>, <em>Nation</em>, and <em>Customer</em> filter widgets</p><p>On your dashboard page, add three filter widgets and wire them to fields and parameters. Configure all three widgets as page-level filters (or move them into the global filters panel if they should apply across pages).</p><p>1. <em>Region filter widget</em></p><ul><li><p>Filter type: <em>Single value</em></p></li><li><p>Fields: <code>TPCH Regions (Cascading Pattern 2).region</code></p></li><li><p>Parameters:</p><ul><li><p><code>TPCH Sales (Cascading Pattern 2).region_param</code></p></li><li><p><code>TPCH Nations by Region (Cascading Pattern 2).region_param</code></p></li></ul></li><li><p>Default value: <code>All</code></p></li></ul><p>This keeps region_param in <em>TPCH Sales (Cascading Pattern 2)</em> and <em>TPCH Nations by Region (Cascading Pattern 2)</em> in sync.</p><p>2. <em>Nation filter widget</em></p><ul><li><p>Filter type: <em>Single value</em></p></li><li><p>Fields: <code>TPCH Nations by Region (Cascading Pattern 2).nation</code></p></li><li><p>Parameters:</p><ul><li><p><code>TPCH Sales (Cascading Pattern 2).nation_param</code></p></li><li><p><code>TPCH Customers by Nation (Cascading Pattern 2).nation_param</code></p></li></ul></li></ul><ul><li><p>Default value: <code>All</code></p></li></ul><p>This keeps nation_param in <em>TPCH Sales (Cascading Pattern 2)</em> and <em>TPCH Customers by Nation (Cascading Pattern 2)</em> in sync.</p><p>3. <em>Customer filter widget</em></p><ul><li><p>Filter type: <em>Single value</em></p></li><li><p>Fields: <code>TPCH Customers by Nation (Cascading Pattern 2).customer_id</code></p></li><li><p>Parameters: <code>TPCH Sales (Cascading Pattern 2).customer_param</code></p></li></ul><ul><li><p>Default value: <code>0</code></p></li></ul><p><strong>Effect</strong></p><ul><li><p>When a viewer selects <em>Region: AMERICA</em>:</p><ul><li><p>The <em>Region</em> widget writes &#8220;<em>AMERICA</em>&#8220; into <code>region_param</code> in <em>TPCH Sales (Cascading Pattern 2)</em> and <em>TPCH Nations by Region (Cascading Pattern 2)</em>.</p></li><li><p><em>TPCH Nations by Region (Cascading Pattern 2)</em> reruns and returns only nations in <em>AMERICA</em>, so the <em>Nation</em> dropdown only shows those nations.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jskP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jskP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 424w, https://substackcdn.com/image/fetch/$s_!jskP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 848w, https://substackcdn.com/image/fetch/$s_!jskP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 1272w, https://substackcdn.com/image/fetch/$s_!jskP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jskP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png" width="1456" height="531" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:531,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!jskP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 424w, https://substackcdn.com/image/fetch/$s_!jskP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 848w, https://substackcdn.com/image/fetch/$s_!jskP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 1272w, https://substackcdn.com/image/fetch/$s_!jskP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0107b4-e71a-47c4-aecf-e1dd374b6d3d_1600x584.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>When the viewer then selects <em>Nation: UNITED STATES</em>:</p><ul><li><p>The <em>Nation</em> widget writes &#8220;<em>UNITED STATES</em>&#8220; into <code>nation_param</code> in <em>TPCH Sales (Cascading Pattern 2)</em> and <em>TPCH Customers by Nation (Cascading Pattern 2)</em>.</p></li><li><p><em>TPCH Customers by Nation (Cascading Pattern 2)</em> reruns and returns only customers in <em>UNITED STATES</em>, so the <em>Customer</em> dropdown only shows those customer IDs.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0lIb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0lIb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 424w, https://substackcdn.com/image/fetch/$s_!0lIb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 848w, https://substackcdn.com/image/fetch/$s_!0lIb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 1272w, https://substackcdn.com/image/fetch/$s_!0lIb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0lIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png" width="1456" height="531" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d23de116-d555-497e-9eaf-8375f56465de_1600x583.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:531,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0lIb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 424w, https://substackcdn.com/image/fetch/$s_!0lIb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 848w, https://substackcdn.com/image/fetch/$s_!0lIb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 1272w, https://substackcdn.com/image/fetch/$s_!0lIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd23de116-d555-497e-9eaf-8375f56465de_1600x583.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>When the viewer selects a specific <em>Customer</em> (for example, <em>607</em>):</p><ul><li><p>The <em>Customer</em> widget writes <em>607</em> into <code>customer_param</code> in <em>TPCH Sales (Cascading Pattern 2)</em>.</p></li><li><p><em>TPCH Sales (Cascading Pattern 2)</em> reruns with <code>region_param</code>, <code>nation_param</code>, and <code>customer_param</code> applied, and all visuals built on this dataset show only orders for customer <em>607</em> in <em>UNITED STATES / AMERICA</em>.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gv_s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gv_s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 424w, https://substackcdn.com/image/fetch/$s_!Gv_s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 848w, https://substackcdn.com/image/fetch/$s_!Gv_s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 1272w, https://substackcdn.com/image/fetch/$s_!Gv_s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gv_s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png" width="1456" height="559" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Gv_s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 424w, https://substackcdn.com/image/fetch/$s_!Gv_s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 848w, https://substackcdn.com/image/fetch/$s_!Gv_s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 1272w, https://substackcdn.com/image/fetch/$s_!Gv_s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e03bea-826e-4d5a-9022-ad7ec61f30d2_1600x614.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>From the viewer&#8217;s perspective, <em>Region &#8594; Nation &#8594; Customer</em> behaves like cascading filters with &#8220;<em>Only Relevant Values</em>&#8221; behavior enabled. Under the hood, each dropdown is a query-based parameter filter, and the <em>Region</em> and <em>Nation</em> widgets keep parameters in multiple datasets in sync, while Customer filters the main <em>TPCH Sales (Cascading Pattern 2)</em> dataset down to a single customer.</p></li></ul><h4><strong>Which pattern when?</strong></h4><p>Both patterns get you &#8220;<em>Only Relevant Values</em>&#8221;-style cascading behavior, but they shine in different situations.</p><p><strong>Pattern 1 &#8211; Cascading field filters</strong></p><p>Use this when:</p><ul><li><p>You&#8217;re working off one main dataset per page.</p></li><li><p>You want the simplest authoring experience: add field filters, connect them to the dataset, done.</p></li><li><p>&#8220;<em>Allow All</em>&#8221; and easy clearing of filters are important to your users.</p></li></ul><p>This is the closest to what many BI tools do by default and is usually the right starting point.</p><p><strong>Pattern 2 &#8211; Cascading query-based parameters</strong></p><p>Use this when:</p><ul><li><p>You need parameters that drive multiple datasets.</p></li><li><p>You want tighter control over dropdown values, including custom queries per level.</p></li><li><p>You&#8217;re comfortable managing parameter types and wiring filters to multiple datasets.</p></li></ul><p>Pattern 2 is more flexible and explicit, but also more advanced. In practice, I start with Pattern 1 for most dashboards, and reach for Pattern 2 when I need parameter-driven logic or want to reuse the same parameters across several datasets and pages.</p><h3><strong>Summary</strong></h3><p>In this post, we looked at how to carry two of the most important filter patterns from traditional BI dashboards into Databricks AI/BI Dashboards:</p><ul><li><p><strong>Context filters</strong> become parameters in your dataset SQL, driven by parameter filter widgets. This lets you enforce &#8220;filter by Region first&#8221; semantics and shrink the working set before joins and aggregations.</p></li><li><p><strong>&#8220;Only Relevant Values&#8221; / cascading filters</strong> can be implemented either with simple field filters on a single dataset (Pattern 1) or with query-based parameters and helper datasets (Pattern 2) when you need more control and reusable parameters.</p></li></ul><p>The <a href="https://github.com/ArtemChebotko/Migrating-Existing-Dashboards-to-Databricks-AI-BI">companion dashboard</a> includes all three examples: a context filter page, a field-based cascading page, and a query-based cascading page. You can import it into your workspace and adapt the patterns to your own datasets.</p><p>In future posts, I plan to cover:</p><ul><li><p>Row-level security and user-based filtering in AI/BI Dashboards</p></li><li><p>Action-style interactions such as cross-filtering and drill-through in AI/BI Dashboards</p></li></ul><p>If you&#8217;re starting a migration from an existing BI tool to Databricks AI/BI today, I recommend:</p><ol><li><p>Identify your key context filters (Region, Business Unit, etc.) and implement them as parameters in dataset SQL.</p></li><li><p>Start with Pattern 1 (field filters) for cascading behavior, and only move to Pattern 2 where you truly need parameter-driven logic or shared parameters across datasets.</p></li></ol><p>These two patterns alone are usually enough to make an AI/BI dashboard feel as interactive and &#8220;alive&#8221; as the dashboards your teams are used to today.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Liquid Clustering at Scale: Overcoming Challenges and Unlocking Performance]]></title><description><![CDATA[This piece shares the story behind migrating to Liquid Clustering: the architectural pain points that forced the decision, the challenges along the way, and the hands-on solutions that made it work at scale.]]></description><link>https://www.databricksters.com/p/liquid-clustering-at-scale-overcoming</link><guid isPermaLink="false">https://www.databricksters.com/p/liquid-clustering-at-scale-overcoming</guid><dc:creator><![CDATA[Geethu]]></dc:creator><pubDate>Tue, 27 Jan 2026 12:10:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f0616724-431f-4575-9740-1ef6469eaa68_700x405.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As data volumes grow and access patterns become more demanding, traditional data layouts can quickly become a bottleneck. This post walks through a real-world migration to Liquid Clustering, focusing on the architectural limitations that triggered the change, the challenges encountered during the transition, and the practical fixes that made the migration successful at scale.</p><p>The goal was simple but demanding: near-real-time data availability and consistently fast query performance across large time ranges, even in the presence of late-arriving data and massive daily ingestion volumes.</p><h2><strong>Why Traditional Partitioning Fell Short and How Liquid Clustering Solves It</strong></h2><p>The original architecture relied on continuous streaming ingestion into Bronze tables from Kafka, followed by scheduled batch jobs that populated optimized Silver tables.Bronze tables were partitioned by date, while Silver tables were partitioned and z-ordered by relevant keys. When everything arrived on time and tables were fully optimized, query performance was excellent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!otlB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!otlB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 424w, https://substackcdn.com/image/fetch/$s_!otlB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 848w, https://substackcdn.com/image/fetch/$s_!otlB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 1272w, https://substackcdn.com/image/fetch/$s_!otlB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!otlB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png" width="627" height="379" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:627,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67322,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/185583181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!otlB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 424w, https://substackcdn.com/image/fetch/$s_!otlB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 848w, https://substackcdn.com/image/fetch/$s_!otlB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 1272w, https://substackcdn.com/image/fetch/$s_!otlB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa3a545a-e73d-4d1f-aee8-d8d81ee41fad_627x379.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The problem began with late-arriving data.</p><p>Some events arrived days&#8212;even weeks&#8212;after their original event time. These records landed in partitions that had already been optimized, slowly reintroducing small, unoptimized files into previously clean partitions. Over time:</p><ul><li><p>File counts ballooned</p></li><li><p>Queries that once ran in ~10 seconds stretched to a minute or more</p></li><li><p>Performance degraded steadily as more late data accumulated</p></li></ul><p>The only way to recover performance was to re-optimize entire partitions repeatedly, which became increasingly expensive and time-consuming at scale. Rigid partition boundaries simply did not work well with unpredictable arrival patterns.</p><p>This is where Liquid Clustering became a natural fit. Instead of relying on static partitions, Liquid Clustering incrementally maintains data layout quality as data arrives. It continuously rebalances files based on clustering keys, reducing the need for repeated full rewrites and making late-arriving data far less disruptive.</p><p>Liquid Clustering addresses these issues with a multi-dimensional, incremental clustering strategy. It removes rigid partition boundaries, continuously reorganizes poorly clustered segments, and supports both eager clustering (during ingestion) and lazy clustering (post-ingestion). Using a tree-based multi-column clustering, it improves data skipping and maintains predictable, low-latency query performance even on large, late-arriving datasets.</p><h2><strong>Where Liquid Met Production Reality: Scaling Challenges and Fixes</strong></h2><p>While Liquid Clustering addressed the core architectural issue, the migration itself surfaced a new set of challenges primarily driven by scale.</p><h3><strong>High-Throughput Ingestion and Backfills</strong></h3><p>One of the first challenges during the migration to Liquid Clustering was handling the existing historical data. To adopt the new layout strategy, tens of terabytes of data per day over several months had to be backfilled and reorganized. This was challenging because the platform also had to continue processing new streaming data and maintain Silver tables for analysts. Running backfill and OPTIMIZE jobs simultaneously put the system under extreme load, pushing cluster resources to their limits.</p><h3><strong>Long-Running OPTIMIZE Jobs After Enabling Liquid</strong></h3><p>After enabling Liquid Clustering, OPTIMIZE runtimes increased noticeably&#8212;not because of Liquid alone, but due to the scale at which it was introduced. Large historical backfills were running alongside ongoing ingestion, forcing OPTIMIZE to process very large data volumes under heavy skew. Certain clustering stages ended up handling disproportionate amounts of data, resulting in large shuffles, disk spills, and increasingly unpredictable runtimes.</p><p>To address this, eager clustering was enabled for streaming writes, moving a portion of the clustering work into ingestion. This reduced the amount of reorganization required during OPTIMIZE and helped stabilize optimization runtimes, especially once batch sizes were increased and clustering work was better distributed across the cluster.</p><p>In addition, Liquid-specific tuning played a critical role in improving OPTIMIZE stability at scale. Key adjustments included:</p><ul><li><p><strong>Enhanced data skipping</strong> was enabled to reduce unnecessary data movement during clustering, significantly lowering shuffle volume and improving OPTIMIZE efficiency.</p></li><li><p><strong>Increased clustering parallelism</strong> was also configured to distribute clustering work more evenly across the cluster, reducing skew and stabilizing runtimes for large and wide tables.</p></li></ul><p>For workload-specific tuning and the exact configuration details, we recommend reaching out to Databricks, as the optimal settings can vary based on data volume and cluster characteristics.</p><p>To further reduce reliance on manual OPTIMIZE jobs, we also leveraged Predictive Optimization (PO) to automatically manage optimization workloads&#8212;this topic is discussed in more detail in the upcoming section.</p><h3><strong>Eager Clustering at Scale: Small File Challenges</strong></h3><p>While eager clustering reduced the amount of work during OPTIMIZE, it introduced a new challenge when streaming batches were too small. This was especially noticeable during large historical backfills, where each batch was around 40 GB. Although each batch was locally clustered, the small size meant that OPTIMIZE still had to rewrite many small files, resulting in high write amplification, longer runtimes, and increased operational overhead.</p><h4><strong>Solution: Batch Size as a Critical Lever</strong></h4><p>One of the most important lessons from this migration was how sensitive eager clustering is to batch size, particularly for backfills. Increasing batch sizes to larger, more meaningful units&#8212;around 1 TB per batch&#8212;changed the behavior dramatically (particularly during petabyte-scale backfills) . Larger batches allowed eager clustering to produce larger, better-clustered files upfront, which significantly reduced or even eliminated downstream OPTIMIZE work. This not only shortened OPTIMIZE runtimes but also lowered overall operational overhead by triggering fewer jobs and improving system stability during high-volume backfill processing.</p><h3><strong>Infrastructure Constraints</strong></h3><p>Another challenge emerged from the existing cluster configuration. At petabyte-scale, OPTIMIZE planning occasionally failed due to driver disk exhaustion, caused by large Spark event logs generated during complex optimization planning. The original cluster setup, designed for partitioned tables, was no longer sufficient to handle the heavy resource demands of Liquid Clustering and large backfills.</p><h4><strong>Solution: Reducing Event Log Volume</strong></h4><p>The issue was mitigated by tuning Spark to limit event log growth during OPTIMIZE planning. This significantly reduced the size of driver-side logs, preventing disk exhaustion and allowing large optimization jobs to complete reliably even while ingestion workloads continued to run.</p><p>For workload-specific tuning and exact configuration details, we recommend reaching out to Databricks.</p><h3><strong>Cluster and Runtime Constraints</strong></h3><p>The existing cluster configuration was no longer sufficient to handle simultaneous high-volume ingestion and large-scale OPTIMIZE jobs. Under petabyte-scale workloads, resource contention could slow down processing and introduce instability.</p><h4><strong>Solution: Right-Sizing Compute and Updating Runtime</strong></h4><p>To address this, cluster capacity was increased where needed, reducing reliance on spot instances and favoring on-demand workers for stability during long-running OPTIMIZE operations. In addition, the latest DBR 17.3 runtime was adopted for all new Liquid tables, leveraging improvements that enhanced performance, stability, and optimization efficiency at scale.</p><p>Cluster capacity was adjusted to reduce reliance on spot instances for long-running OPTIMIZE jobs, favoring on-demand workers where stability mattered most.</p><h2><strong>Predictive Optimization: Powerful, but Needs Guardrails</strong></h2><p>Once we stabilized OPTIMIZE runtimes and tuned eager clustering, the next focus was on reducing reliance on manual optimization jobs. For this, we leveraged Predictive Optimization , which automatically determines when and how to run clustering operations based on table state and data layout.</p><p>Concurrent manual OPTIMIZE jobs sometimes conflicted with PO runs in a few cases , causing transaction failures. Limited observability made it difficult to track PO activity and determine how much data remained unoptimized, which could lead to degraded query performance.</p><p>To address this, we implemented several best practices:</p><ul><li><p>Monitor PO activity using system.storage.predictive_optimization_history to track execution and outcomes.</p></li><li><p>Fallback manual OPTIMIZE jobs are run whenever PO does not execute successfully.</p><p></p></li></ul><h2><strong>Results: Performance, Freshness, and Simpler Operations</strong></h2><p>The migration to Liquid Clustering delivered clear, measurable improvements:</p><ul><li><p>Query performance improved dramatically&#8212;around 50% faster for most queries and up to 90% faster for long-range scans</p></li><li><p>File counts were cut roughly in half, reducing I/O overhead</p></li><li><p>Data freshness improved from hours to minutes, enabling near-real-time analytics</p></li><li><p>Operational overhead dropped significantly with UC-managed tables and automated optimization</p></li><li><p>Legacy partitioning was eliminated, reducing technical debt and modernizing the architecture<br></p></li></ul><h2><strong>Final Thoughts</strong></h2><p>Liquid Clustering fundamentally changes how data layout is managed at scale. By moving away from rigid partitions and embracing incremental, adaptive clustering, it becomes possible to handle late-arriving data, massive ingestion volumes, and evolving schemas without sacrificing performance or driving up costs.</p><p>The key is understanding the operational nuances: batch sizing, clustering strategy, Liquid-specific tuning, cluster configuration, and optimization automation. When those pieces come together, Liquid Clustering can unlock faster queries, fresher data, and a far more resilient data platform.</p><p>For readers interested in diving deeper:</p><ul><li><p><a href="https://www.databricks.com/blog/arctic-wolfs-liquid-clustering-architecture-tuned-petabyte-scale">Arctic Wolf&#8217;s Liquid Clustering Architecture Tuned for Petabyte Scale &#8211; Databricks Blog</a></p></li><li><p><a href="https://open.substack.com/pub/canadiandataguy/p/optimizing-delta-lake-tables-liquid?utm_campaign=post-expanded-share&amp;utm_medium=web">Optimizing Delta Lake Tables with Liquid Clustering &#8211; Canadian Data Guy</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Building Useful AI Agents with Agent Bricks]]></title><description><![CDATA[From docs to dependable answers: building a Databricks Knowledge Assistant with Agent Bricks]]></description><link>https://www.databricksters.com/p/building-useful-ai-agents-with-agent</link><guid isPermaLink="false">https://www.databricksters.com/p/building-useful-ai-agents-with-agent</guid><dc:creator><![CDATA[Canadian Data Guy]]></dc:creator><pubDate>Tue, 20 Jan 2026 18:03:52 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/184467797/a066162f3b6fbab937d9f2d57fac4003.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h2>What is Agent Bricks? What is the Knowledge Assistant?</h2><p>The Knowledge Assistant brick is one of several specialized agent types available in Databricks Agent Bricks, which is a low-code solution to help developers easily create and optimize domain specific agents with a few clicks.</p><p>The Knowledge Assistant brick is a no-code RAG agent that delivers reliable responses with citations. Instead of relying solely on what a language model learned during training, RAG returns relevant information from your documents in real time and is used to generate responses. The Knowledge Assistant brick uses optimizations out of the box, so you do not need to worry about optimizing the index or LLM. In our demo, we built a Knowledge Assistant for a team working with Databricks. The agent uses blog posts (written by our team, here on Databricksters) to answer questions about Spark streaming, coding practices, etc.</p><p>This same approach can work for any scenario where you have custom knowledge and want to create &#8220;general intelligence&#8221; on your data. Think of unstructured data that you already have-- HR policies, production information, or technical docs.</p><h4>Prerequisites</h4><p>Before building your own brick, ensure your workspace meets these requirements:</p><ul><li><p>Mosaic AI Agent Bricks Preview (Beta) enabled</p></li><li><p>Production monitoring for MLflow (Beta) enabled</p></li><li><p>Serverless compute enabled</p></li><li><p>Unity Catalog enabled</p></li><li><p>Access to Mosaic AI model serving</p></li><li><p>Access to foundation models in Unity Catalog through <a href="http://system.ai">system.ai</a> schema</p></li><li><p>Serverless budget policy with non-zero budget</p></li><li><p>Workspace in a supported region</p></li></ul><p><strong>Data Requirements</strong></p><p>Make sure you have one of the following:</p><ul><li><p>Files in a Unity Catalog volume (supported formats: txt, pdf, md, ppt, docx)</p></li><li><p>A vector search index</p><ul><li><p>The databricks-gte-large-en embedding model endpoint must have AI guardrails and rate limits disabled.</p></li></ul></li></ul><h2>Adding feedback and improving Agent Quality through feedback</h2><p>You can add feedback and assessments to each trace. This is where the human feedback begins-- you can start noting which responses are good, what needs improvement, etc. These assessments become a part of the MLflow experiment associated with the Agent Brick. You can aggregate this information using the MLflow API.</p><p>In my opinion, this is where Agent Bricks really shines. The Knowledge Assistant can automatically update itself according to the human feedback using a process called &#8216;Agent Learning from Human Feedback&#8217;. <em><a href="https://www.databricks.com/blog/agent-learning-human-feedback-alhf-databricks-knowledge-assistant-case-study">Learn more about this here. </a></em> This allows the agent to continuously improve based on expert expectations -- without you needing to retrain models or fiddle with prompts manually.</p><h3>The feedback loop</h3><p>The process is straightforward: (1) create your challenging questions, (2) have SMEs review the agent responses and add comments, and then (3) sync the feedback. &#8220;Syncing&#8221; will start the process to improve the agent based on the feedback.</p><p>How should you approach writing questions? Similar to creating an evaluation set, you should include a variety of questions that are critical for the use-case. Based on previous evaluations, you can include edge-case questions as well to see what SMEs want to see. Remember: you are only providing the questions, not the answers. The agent will generate the answers, and then the SMEs will provide the feedback on the answers.</p><p>Once you have your questions added, you can start a Labeling Session. Once that session is ready, share that link with SMEs and wait for the feedback to roll in. SMEs can add feedback on tone, style, or accuracy of that agent response. Make sure you communicate how to add feedback to your SMEs!</p><p>After ending a Labeling Session (after SMEs have added their feedback), you can start syncing the responses. Agent Bricks uses many techniques to sync the feedback with the agent. Congratulations! You have completed the first human feedback loop!</p>]]></content:encoded></item><item><title><![CDATA[Databricks Zerobus Ingest — The Best Bus Is No Bus]]></title><description><![CDATA[You don't need a sledgehammer to hang a picture]]></description><link>https://www.databricksters.com/p/databricks-zerobus-the-best-bus-is</link><guid isPermaLink="false">https://www.databricksters.com/p/databricks-zerobus-the-best-bus-is</guid><dc:creator><![CDATA[Yashodhan]]></dc:creator><pubDate>Tue, 30 Dec 2025 16:02:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!riic!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="pullquote"><h2>The Problem</h2></div><h3>         Complex Ingestion Architecture</h3><p>Today&#8217;s data teams face a common challenge: streaming data from applications to their Lakehouse requires maintaining complex infrastructure. The typical setup involves managing a message bus like Kafka, configuring connectors, monitoring pipelines, and dealing with significant operational overhead and costs&#8212;all just to move data from point A to point B.</p><h3>        Managed Bus Don&#8217;t Solve Everything</h3><p>While <strong>Amazon Managed Streaming for Apache Kafka(Amazon MSK) </strong>removes the burden of managing servers, it doesn&#8217;t eliminate your responsibility for the message bus itself. You&#8217;re still on the hook for capacity planning, topic and partition design, producer and consumer tuning, monitoring and alerting, and upgrade timing. Managed services make these tasks less manual, but upgrades remain risky and the operational complexity persists.</p><p>Cost management is another pain point. <strong>Amazon Managed Streaming for Apache Kafka(Amazon MSK)</strong> bills can balloon quickly due to over-provisioned brokers, excess partitions, high replication factors, and long retention periods. AWS manages the infrastructure, but not your spending discipline&#8212;that&#8217;s still your problem.</p><h3>             What You Actually Need</h3><p>A fully abstracted streaming service that eliminates cluster management entirely, letting you focus on building data products instead of babysitting message bus infrastructure.</p><div class="pullquote"><h2>Customer Story</h2><h6>  <em>A leading automotive startup</em></h6></div><h3>                 The Challenge</h3><p>A rapidly growing automotive startup was processing massive device data volumes from Go applications. After essential first-level processing, they needed to stream data to their data lake for near real-time analytics. To avoid the complexity of managing Kafka or similar message bus infrastructure, they took a shortcut: direct writes to their data warehouse with append-only inserts.</p><p>Initially simple, this approach quickly hit walls. As volumes grew, they vertically scaled, then horizontally distributed producers across multiple warehouse instances. Small but relentless queries created network bottlenecks. Excessive delta commits from numerous producers killed throughput. They hit soft limits on connections and write operations. To keep data flowing, they over-provisioned compute&#8212;watching costs balloon without proportional gains.</p><h3>                                                               The Solution</h3><p>Zerobus Ingest provided the purpose-built ingestion layer they needed. Their Go applications integrated the SDK with minimal code changes&#8212;same append-only pattern, properly architected. The Write-Ahead Logging (WAL) based system handled buffering and batching, while automatic recovery managed network issues that previously caused data loss.</p><p>Data now lands directly in Delta tables, eliminating the warehouse intermediary. The result: lower latency, higher throughput, dramatically reduced costs, and one less system to manage. They got the simplicity of direct writes with the scalability of proper streaming&#8212;at a fraction of Kafka&#8217;s cost.</p><div class="pullquote"><h2><em>Zerobus Ingest</em></h2><h6><em>Simplifying Real-Time Data Ingestion</em></h6></div><h3>                                                                 Overview</h3><p>Zerobus Ingest is a fully managed, zero-configuration service that enables record-by-record data ingestion directly into Delta tables. No more intermediate message buses. No more complex configurations. Just point your application at an endpoint and start sending data. The Zerobus Ingest API buffers transmitted data before adding it to a Delta table. This buffering creates an efficient and durable ingestion mechanism that supports a high volume of clients with variable throughput.</p><p><em>Before:</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eimt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eimt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 424w, https://substackcdn.com/image/fetch/$s_!eimt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 848w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1272w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eimt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png" width="1452" height="552" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0668090b-f758-4033-855b-49012749eceb_1452x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:1452,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:190418,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/182907350?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!eimt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 424w, https://substackcdn.com/image/fetch/$s_!eimt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 848w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1272w, https://substackcdn.com/image/fetch/$s_!eimt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0668090b-f758-4033-855b-49012749eceb_1452x552.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>After:</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!riic!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!riic!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 424w, https://substackcdn.com/image/fetch/$s_!riic!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 848w, https://substackcdn.com/image/fetch/$s_!riic!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1272w, https://substackcdn.com/image/fetch/$s_!riic!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!riic!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png" width="1450" height="554" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:554,&quot;width&quot;:1450,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166939,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/182907350?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!riic!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 424w, https://substackcdn.com/image/fetch/$s_!riic!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 848w, https://substackcdn.com/image/fetch/$s_!riic!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1272w, https://substackcdn.com/image/fetch/$s_!riic!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a609ac-77e2-4b14-bc73-c6075b83dc3a_1450x554.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>                                                                 Features </h3><p>Zerobus Ingest leverages a Write Ahead Log (WAL) architecture that enables it to store and acknowledge accepted records quickly, delivering low write latency for your applications. The system is backed by persistent disk storage where both the Write Ahead Log (WAL) and checkpoints are maintained, enabling several powerful capabilities:</p><p><strong>Automatic Recovery</strong> - Network issues are handled transparently by the SDK. It automatically reconnects on transient failures and resends unacknowledged records without requiring any application-level error handling code.</p><p><strong>Efficient Resource Management</strong> - Once data syncs successfully to Delta tables, Zerobus Ingest automatically cleans up Write Ahead Log (WAL) logs and metadata, freeing disk space for new data without manual intervention.</p><p><strong>Schema Management</strong> - Automatic validation against your Delta table schema catches data quality issues at ingestion time, preventing malformed data from entering your Lakehouse.</p><p></p><h3>                                                            Performance benchmark</h3><p>Maximum throughput can be achieved when a client app and endpoint are in the same geo region. </p><ul><li><p>100MB/second per stream (benchmarked with 1KB-sized messages)</p></li><li><p>15,000 rows per second per stream</p></li></ul><div class="pullquote"><h2>Usage</h2><h6><em>Implement Zerobus</em></h6></div><h3>                     SDKs </h3><p>Users will interact with Zerobus Ingest through a dedicated SDK for their language of choice. The documentation and samples are out for <a href="https://github.com/databricks/zerobus-sdk-py">Python SDK</a>, <a href="https://github.com/databricks/zerobus-sdk-rs">Rust SDK</a> and <a href="https://github.com/databricks/zerobus-sdk-java">Java SDK </a>. Both the <a href="https://github.com/databricks/zerobus-sdk-go">Go</a> and <a href="https://www.npmjs.com/package/@databricks/zerobus-ingest-sdk">TypeScript</a> SDKs for Zerobus Ingest are now publicly available. GRPC is the main communication mechanism for Zerobus Ingest. </p><p>Databricks documentation contains a <a href="https://docs.databricks.com/aws/en/ingestion/zerobus-ingest">well documented guide</a> with sample clients in multiple languages. It guides you right from installing the SDK in your preferred language to creating a Protobuf definition and a sample usage.</p><h3>                                                              Supported Formats</h3><ul><li><p><strong>Protocols</strong>: gRPC (primary), HTTP REST, Kafka wire format (coming soon)</p></li><li><p><strong>Data Formats</strong>: Protocol Buffers, JSON </p></li></ul><h3>                                                                    TIPs</h3><ul><li><p>Visit the table history on UC to get a sense of how frequently the table is updated </p></li><li><p>Handle the two exceptions gracefully <em><strong>NonRetriableException, ZerobusException</strong></em>. </p></li><li><p>Even though Zerobus Ingest periodically issues data file compactions, so you don&#8217;t need to worry about the small files</p></li><li><p>Don&#8217;t forget to create a table with appropriate data types before you run the client </p></li></ul><div class="pullquote"><h2>  Zerobus Ingest Deep Dive</h2><h6><em>       While the experience is simple, the engineering is sophisticated!</em></h6></div><h3>                                                                 Components</h3><ol><li><p><strong>Zerobus Ingest Server - </strong>Think of them as scalable stateful pod on K8s attached with an SSD disk(high IOPS). Its responsibilities include:</p><ul><li><p>Schema validation of the message to the table.</p></li><li><p>Materializing the data in a timely manner to the target table.</p></li><li><p>Sending an acknowledgement to the client that the data is durable.</p></li></ul></li><li><p><strong>Smart Networking and orchestration - </strong>API proxy which distributes the streams to Zerobus Ingest servers per the target delta table and scales pods as the utilization nears the roof  </p></li><li><p><strong>Delta kernel -</strong> Record batch writer kicks off every 1-5 seconds, uses Delta kernel(uses Arrow) and writes the record batch to the delta table. Kicks in the PO compaction to avoid small files.  <a href="https://github.com/delta-io/delta-kernel-rs">Rust APIs</a>  hides all the complex details of the Delta protocol specification.  Binding available for <a href="https://github.com/delta-io/delta-rs">python</a>. </p></li><li><p><strong>Write-Ahead Log(WAL) -</strong> Records are immediately persisted to durable storage(think SSD disks with high IOPS)  provided by the cloud platform your databricks is running on and is acknowledged in under 50ms. This guarantees durability even if something fails</p></li></ol><h3>                                                         Unofficial Zerobus Ingest overview </h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oIOd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oIOd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 424w, https://substackcdn.com/image/fetch/$s_!oIOd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 848w, https://substackcdn.com/image/fetch/$s_!oIOd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 1272w, https://substackcdn.com/image/fetch/$s_!oIOd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oIOd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png" width="891" height="512" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/551467b9-e726-487b-88ba-cc18f317d48c_891x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:891,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83702,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/182907350?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oIOd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 424w, https://substackcdn.com/image/fetch/$s_!oIOd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 848w, https://substackcdn.com/image/fetch/$s_!oIOd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 1272w, https://substackcdn.com/image/fetch/$s_!oIOd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551467b9-e726-487b-88ba-cc18f317d48c_891x512.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2></h2><div class="pullquote"><h2>The Honest Part</h2><h6><em>Zerobus Ingest - Does not always replace the message bus</em></h6></div><p>Zerobus Ingest fits direct lakehouse writes with durable acknowledgments (no bus-style retention/multi-consumer). Zerobus Ingest it not a replacement of message bus in all scenarios. If you need message bus durability/retention or multiple subscribers, Event Hubs/Kafka is likely a safer choice.          </p><h3>                                                               Availability</h3><p>Databricks only support single availability zone (single AZ) durability. This means Zerobus Ingest service may experience downtime. This might change soon.</p><h3>                                                           <strong>Kafka still wins when</strong></h3><p>Despite the cost advantages of Zerobus Ingest Ingest, Kafka remains a better choice in following scenarios:</p><p><strong>Exactly-once semantics requirements</strong> - For financial transactions, order processing, or other workflows where duplicate processing could cause serious issues, Kafka&#8217;s exactly-once delivery guarantees are critical. While Zerobus Ingest roadmap includes this feature, organizations that need it today must still rely on Kafka.</p><p><strong>Ultra-low latency fan-out</strong> - If your use case requires multiple consumers reading the same stream with different processing logic, Kafka&#8217;s pub-sub model excels. Zerobus Ingest currently lacks the subscriber/consumer model that makes Kafka so powerful for fan-out patterns where one stream feeds multiple downstream applications.</p><h3>                                                              Other Limitations </h3><p>As of writing,</p><ul><li><p>Zerobus Ingest provides <strong>at-least-once delivery semantics</strong>, meaning each message will be delivered one or more times. It does not yet support <strong>exactly-once</strong> semantics. However, the duplicates can be handled using other Databricks and delta features.</p></li><li><p>Zerobus Ingest currently supports <strong>writing only to managed</strong> Delta tables</p></li><li><p><strong>Schema evolution</strong> on target tables is <strong>not yet supported</strong> in Zerobus Ingest, so the table schema must match the incoming message structure.</p></li><li><p>Each individual message is limited to a <strong>maximum size of 10 MB</strong> when processed through Zerobus Ingest.</p></li></ul><h3>                                                                What&#8217;s Next</h3><p>Databricks is actively enhancing Zerobus Ingest with several key features in development. The roadmap includes <strong>exactly-once delivery semantics</strong> for stronger consistency guarantees, <strong>MQTT</strong> protocol support to broaden IoT and device connectivity options,  comprehensive <strong>CDC pipeline capabilities</strong> that will handle updates and deletes in addition to inserts and <strong>subscriber/consumer</strong> model to enable more flexible data consumption patterns.</p><p>Enjoy streaming in a cost efficient and simplified manner!</p><div class="pullquote"><h2>Conclusion</h2></div><p>Zerobus Ingest offers a compelling alternative to message bus in a lot of scenarios. While Kafka remains essential for complex streaming architectures, Zerobus Ingest closes the gap for straightforward ingestion use cases&#8212;delivering the reliability you need at a fraction of the cost and complexity.</p><p>The cost savings extend beyond infrastructure. Kafka expertise commands premium salaries, and maintaining distributed message bus systems requires dedicated engineering time that could be spent on higher-value work. Zerobus Ingest&#8217;s simplicity means junior engineers can manage what previously required highly skilled distributed systems expertise. When you factor in reduced operational overhead, lower training costs, and faster time-to-production, the economics become even more compelling. Sometimes the best architecture isn&#8217;t the most sophisticated&#8212;it&#8217;s the one that solves your problem.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/p/databricks-zerobus-the-best-bus-is?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading!</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/p/databricks-zerobus-the-best-bus-is?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databricksters.com/p/databricks-zerobus-the-best-bus-is?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Bulking Up: High-Performance Batch Salesforce Writes with PySpark]]></title><description><![CDATA[An example of reverse-ETL to Salesforce with parent/child object upserts.]]></description><link>https://www.databricksters.com/p/bulking-up-high-performance-batch</link><guid isPermaLink="false">https://www.databricksters.com/p/bulking-up-high-performance-batch</guid><dc:creator><![CDATA[Neil Wilson]]></dc:creator><pubDate>Tue, 16 Dec 2025 16:01:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!h427!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <a href="https://spark.apache.org/docs/latest/api/python/tutorial/sql/python_data_source.html">Python Data Source API</a> allows Spark developers to easily define custom sources and sinks for their Spark jobs <strong>written in Python</strong>. One of the most commonly requested custom sinks I&#8217;ve seen in the field is writing data back to Salesforce. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h427!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h427!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h427!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h427!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h427!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h427!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg" width="1024" height="559" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139728,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/180362407?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!h427!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h427!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h427!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h427!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc52ce22-96ca-48a0-9591-1c645b2c0f13_1024x559.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I&#8217;ve written an example custom Salesforce batch writer <a href="https://github.com/neil-wilson-data/python-data-sources/blob/main/salesforce/salesforce_batch_writer.py">here</a>. It is designed to support uploads via the Salesforce Bulk API v1.0 or v2.0, but this guide will focus on v2.0. I strongly recommend using 2.0 for newly introduced features discussed below.</p><p>This code is a robust example, and is not intended to be copy/pasted into production.</p><h2><strong>Background on Salesforce Bulk API 2.0</strong></h2><p>Before diving into the code, let&#8217;s discuss how the Bulk API 2.0 works and how our writer can interact with it. A &#8220;job&#8221; is the unit of work in the Salesforce Bulk API. In v1 of the Salesforce API, users had to create a job and then manually break that job into chunks of 10,000 records and track each batch individually. In v2, you simply submit your data via a job and it handles batches and retries automatically. The v1 limit for upload was 10MB per <strong>batch</strong>, while the v2 limit is 150MB per <strong>job</strong>.</p><p>But what if my DataFrame is larger than 150MB, do I have to manually slice the data into sub 150MB chunks and iteratively submit multiple Bulk API jobs? This is where the power of Spark shines, though it comes with a requirement. Spark will automatically parallelize the work into multiple Salesforce &#8220;jobs&#8221;, but it won&#8217;t automatically slice large partitions. You must explicitly control the partition size (using <strong>repartition</strong>) to ensure you don&#8217;t send a chunk larger than 150MB.</p><h2><strong>Background on Spark writes</strong></h2><p>In Spark, when <code>df.write</code> is called, the driver calls the writer() method for the DataSource object in question. In our Salesforce example, calling this writer method will result in the instantiation of our SalesforceBatchWriter object.</p><pre><code><code>def writer(self, schema: StructType, overwrite: bool):
    &#8220;&#8221;&#8220;Create a writer instance for the given schema.&#8221;&#8220;&#8221;
    return SalesforceBatchWriter(self.options, schema)

class SalesforceBatchWriter(DataSourceWriter):
    &#8220;&#8221;&#8220;
    DataSourceWriter implementation for Salesforce Bulk API operations.
    Handles both authentication and bulk data upload to Salesforce objects.
    &#8220;&#8221;&#8220;
    
    def __init__(self, options: Dict[str, str], schema: StructType):
        self.options = options
        self.schema = schema</code></code></pre><p>Spark then looks at the write() method within your DataSourceWriter (recall above, our SalesforceBatchWriter inherited from the DataSourceWriter class), serializes (pickles) this write method, and creates a task for every partition in your DataFrame. It then sends these tasks to the Executors. Simply put, in Spark, each partition of data receives its own write task. This means if we have an extremely large DataFrame, so long as its partitioned and each partition is under 150MB, each partition will receive its own set of write instructions and can be submitted in parallel as multiple Salesforce Bulk API v2 jobs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sitd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sitd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 424w, https://substackcdn.com/image/fetch/$s_!Sitd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 848w, https://substackcdn.com/image/fetch/$s_!Sitd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 1272w, https://substackcdn.com/image/fetch/$s_!Sitd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sitd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png" width="683" height="487" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:487,&quot;width&quot;:683,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31854,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://neilwilsondata.substack.com/i/180362407?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Sitd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 424w, https://substackcdn.com/image/fetch/$s_!Sitd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 848w, https://substackcdn.com/image/fetch/$s_!Sitd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 1272w, https://substackcdn.com/image/fetch/$s_!Sitd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff90982d9-f86a-4340-ae01-2168ffcb99b5_683x487.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Spark to Salesforce Writer</strong></h2><p>Now that we understand a bit more about how Spark and the Salesforce Bulk API can interact, let&#8217;s dive into our custom writer implementation.</p><pre><code><code>def write(self, rows: Iterator[Row]) -&gt; SalesforceCommitMessage:
        &#8220;&#8221;&#8220;
        Write rows to Salesforce using Bulk API.
        
        Args:
            rows: Iterator of PySpark Row objects to write
            
        Returns:
            SalesforceCommitMessage with write statistics
        &#8220;&#8221;&#8220;
        # Import inside method to meet serialization requirements on executors
        from simple_salesforce import Salesforce, SalesforceAuthenticationFailed

        ctx = TaskContext.get()
        partition_id = ctx.partitionId()

        username = self.options.get(&#8221;username&#8221;)
        password = self.options.get(&#8221;password&#8221;)
        security_token = self.options.get(&#8221;security_token&#8221;)
        sobject = self.options.get(&#8221;sobject&#8221;)
        instance_url = (self.options.get(&#8221;instance_url&#8221;) or &#8220;&#8221;).strip()
        domain = (self.options.get(&#8221;domain&#8221;) or &#8220;login&#8221;).strip()
        api_version = self.options.get(&#8221;api_version&#8221;, &#8220;1&#8221;)  # &#8220;1&#8221; (Bulk V1) or &#8220;2&#8221; (Bulk V2)

        if not all([username, password, security_token, sobject]):
            raise ValueError(&#8221;Missing required Salesforce options: &#8216;username&#8217;, &#8216;password&#8217;, &#8216;security_token&#8217;, &#8216;sobject&#8217;&#8221;)

        # Collect iterator to a list of dicts for bulk insert
        data_to_upload: List[Dict[str, Any]] = [row.asDict() for row in rows]

        if not data_to_upload:
            print(f&#8221;Partition {partition_id}: No rows to write.&#8221;)
            return SalesforceCommitMessage(partition_id=partition_id, records_written=0, errors=0)

        try:
            if instance_url:
                sf = Salesforce(
                    username=username,
                    password=password,
                    security_token=security_token,
                    instance_url=instance_url,
                )
            else:
                sf = Salesforce(
                    username=username,
                    password=password,
                    security_token=security_token,
                    domain=domain,
                )</code></code></pre><p>The <code>data_to_upload </code>is an important array variable in this writer. To understand what it is doing, recall that write() is pickled and sent as a task for <em>each partition</em> in your DataFrame. This means that the <code>rows: Iterator[Row]</code> argument that data_to_upload is iterating over is the set of rows for <strong>one partition</strong>. This is turning each row into a key/value dict for compatibility with the Salesforce API and storing them in a Python List. It&#8217;s extremely important to recognize that creating this List of Dict objects is <strong>materializing the entire partition in the memory of your executor</strong>. This must be done, as the Salesforce API requires the full payload to be constructed before sending. If this is done on a partition that is too large, you will face Out of Memory (OOM) issues. If this is the case, logic could be added to chunk your partition, but that would mean additional Bulk API jobs and additional API calls in Salesforce.</p><p>The next step is to actually perform the bulk insert of these records and track the status of the job returned by the API.</p><pre><code><code>if api_version == &#8220;2&#8221;:
                job_summaries = self._bulk2_insert(sf, sobject, data_to_upload, batch_size)
                
                for summary in job_summaries:
                    num_processed = int(summary.get(&#8221;numberRecordsProcessed&#8221;, 0))
                    num_failed = int(summary.get(&#8221;numberRecordsFailed&#8221;, 0))
                    success += max(0, num_processed - num_failed)
                    errors += max(0, num_failed)

                    print(f&#8221;Partition {partition_id} job {summary.get(&#8217;job_id&#8217;)} summary: &#8220;
                        f&#8221;processed={num_processed}, failed={num_failed}, total={summary.get(&#8217;numberRecordsTotal&#8217;)}&#8221;)

                    if num_failed &gt; 0 and summary.get(&#8221;job_id&#8221;):
                        try:
                            failed_csv = sf.bulk2.__getattr__(sobject).get_failed_records(summary[&#8221;job_id&#8221;])
                            if isinstance(failed_csv, str):
                                preview = failed_csv[:2048]
                                print(f&#8221;Partition {partition_id} job {summary[&#8217;job_id&#8217;]} failed records (preview):\n&#8221;
                                    f&#8221;{preview}&#8221;)
                                
                                # Store the first error preview to bubble up to the driver
                                if not failed_records_preview:
                                    failed_records_preview = preview
                        except Exception as e:
                            print(f&#8221;Partition {partition_id}: Could not retrieve failed records: {e}&#8221;)</code></code></pre><p>job_summaries performs the insert via the _bulk2_call method, and will contain job_status information for the submitted job once it completes. _bulk2_call is an internal method that uses the <a href="https://github.com/simple-salesforce/simple-salesforce/blob/master/simple_salesforce/bulk2.py#L1044">simple-salesforce insert method</a> and attempts to upload the data in the v2 syntax &#8220;insert(records=records)&#8221; or if that fails attempts the v1 syntax &#8220;insert(data=records)&#8221;. It also contains logic to allow for performing upserts instead of inserts.</p><pre><code><code>def _bulk2_call(
        self,
        sf,
        sobject: str,
        operation: str,
        records: List[Dict[str, Any]],
        batch_size: int,
        upsert_key: Optional[str] = None,
    ):
        &#8220;&#8221;&#8220;Execute a Bulk API 2.0 job for the given operation and records.

        Uses the modern signature if available and falls back to legacy signatures.

        Args:
            sf: An authenticated simple_salesforce Salesforce client instance.
            sobject: The target Salesforce object API name (e.g., &#8220;Contact&#8221;).
            operation: The CRUD operation (&#8221;insert&#8221;, &#8220;update&#8221;, &#8220;upsert&#8221;, &#8220;delete&#8221;).
            records: The list of prepared record dicts to send.
            batch_size: Preferred batch size hint for the Bulk API 2.0 client.
            upsert_key: External ID field name for upsert operations (required for upsert).

        Returns:
            Any: The result object or list of result objects from the Bulk API call.

        Raises:
            ValueError: If upsert_key is missing for an upsert operation.
        &#8220;&#8221;&#8220;
        api = sf.bulk2.__getattr__(sobject)
        fn = getattr(api, operation)

        try:
            # Prefer modern signature
            if operation == &#8220;upsert&#8221;:
                if not upsert_key:
                    raise ValueError(&#8221;Option &#8216;upsertField&#8217; is required for upsert.&#8221;)
                return fn(records=records, external_id_field=upsert_key, batch_size=batch_size)
            return fn(records=records, batch_size=batch_size)
        except TypeError:
            # Fallback for older versions
            if operation == &#8220;upsert&#8221;:
                if not upsert_key:
                    raise ValueError(&#8221;Option &#8216;upsertField&#8217; is required for upsert.&#8221;)
                return fn(data=records, external_id_field=upsert_key, batch_size=batch_size)
            return fn(data=records, batch_size=batch_size)</code></code></pre><h2><strong>Usage</strong></h2><p>The idea of a reusable and configurable Salesforce writer is great, but how easy is it to use? To use the example I&#8217;ve written, simply copy the <a href="https://github.com/neil-wilson-data/python-data-sources/blob/main/salesforce/salesforce_batch_writer.py">SalesforceBatchDataSource</a> class anywhere that makes sense for your project. Import the class and register the DataSource with:</p><pre><code><code>spark.dataSource.register(SalesforceBatchDataSource)</code></code></pre><p>Now you&#8217;re ready to use the writer!</p><p>If running on Databricks, I recommend using <a href="https://docs.databricks.com/aws/en/security/secrets/">dbutils secrets</a> for secret management of your Salesforce credentials. The following code is an example of how to retrieve those and configure your Salesforce connection. <code>instance_host</code> should be your company&#8217;s salesforce URI without the https prefix, e.g. mycompany.my.salesforce.com. <code>domain</code> should be &#8220;test&#8221; for sandbox development, or any other string value for production.</p><pre><code><code>try:
    SF_USERNAME = dbutils.secrets.get(scope=&#8221;neil-salesforce&#8221;, key=&#8221;username&#8221;)
    SF_PASSWORD = dbutils.secrets.get(scope=&#8221;neil-salesforce&#8221;, key=&#8221;password&#8221;)
    SF_TOKEN = dbutils.secrets.get(scope=&#8221;neil-salesforce&#8221;, key=&#8221;token&#8221;)
    SF_DOMAIN = dbutils.secrets.get(scope=&#8221;neil-salesforce&#8221;, key=&#8221;domain&#8221;)

    try:
        SF_INSTANCE_HOST = dbutils.secrets.get(scope=&#8221;salesforce&#8221;, key=&#8221;instance_host&#8221;)
        SF_INSTANCE_URL = f&#8221;https://{SF_INSTANCE_HOST}&#8221;
    except Exception:
        SF_INSTANCE_URL = &#8220;https://test.salesforce.com&#8221; if SF_DOMAIN == &#8220;test&#8221; else &#8220;https://login.salesforce.com&#8221;

    sf_creds = {
        &#8220;username&#8221;: SF_USERNAME,
        &#8220;password&#8221;: SF_PASSWORD,
        &#8220;security_token&#8221;: SF_TOKEN,
        &#8220;instance_url&#8221;: SF_INSTANCE_URL,
    }
    run_job = True
    print(&#8221;Loaded Salesforce credentials from Databricks secrets.&#8221;)
    print(f&#8221;Using instance_url for auth: {SF_INSTANCE_URL}&#8221;)
except Exception as e:
    print(f&#8221;Warning: Could not load Databricks secrets. {e}&#8221;)
    print(&#8221;Provide credentials manually or configure secrets; skipping write.&#8221;)</code></code></pre><p>Next, the write can be called on our DataFrame. The SF_SOBJECT will be the name of the Salesforce object you are writing to. In my example, I am using a custom &#8220;big&#8221; object for testing writes larger than the Salesforce developer limit for traditional objects. Another more realistic example of an SF_SOBJECT might be &#8220;Contact&#8221; or &#8220;Account&#8221;. Note that the schema of the DataFrame you attempt to write <strong>must match the schema that the Salesforce Bulk API V2 expects for that object</strong>.</p><pre><code><code>NUM_PARTITIONS = 32
API_VERSION = &#8220;2&#8221;
SF_SOBJECT = &#8220;spark_perf_test__b&#8221;

df_perf = df_to_write.repartition(NUM_PARTITIONS)
    
print(f&#8221;DataFrame repartitioned into {NUM_PARTITIONS} partition(s).&#8221;)
print(&#8221;Schema being sent to Salesforce:&#8221;)
df_perf.printSchema()
    
 # --- 4. Run the Write Job ---
print(f&#8221;Starting write of {TOTAL_RECORDS:,} records to {SF_SOBJECT} via Bulk API v{API_VERSION}...&#8221;)

try:
    (
        df_perf.write
        .format(&#8221;salesforce-batch&#8221;)
        .mode(&#8221;append&#8221;)
        .options(**sf_creds)
        .option(&#8221;sobject&#8221;, SF_SOBJECT)
        .option(&#8221;api_version&#8221;, API_VERSION)
        .save()
        )</code></code></pre><h4><strong>Parent/Child Relationships</strong></h4><p>A must-have for Salesforce integration is the ability to handle objects with parent/child relationships. For example, when uploading a Contact, the Contact should be tied to an Account. This is possible in our example code by performing an upsert, and submitting the Parent ID with the child record.</p><pre><code><code>df_ready = contacts_df.select(
         &#8220;FirstName&#8221;,
         &#8220;LastName&#8221;,
         &#8220;Email&#8221;,  # Contact upsert key in this example
         col(&#8221;AccountExtId&#8221;).alias(&#8221;Account.Oracle_Id__c&#8221;)  # Link Contact -&gt; Account by Account external ID
     )

(df_ready.write
     .format(&#8221;salesforce-batch&#8221;)
     .mode(&#8221;append&#8221;)
     .options(**sf_creds)
     .option(&#8221;sobject&#8221;, &#8220;Contact&#8221;)
     .option(&#8221;api_version&#8221;, &#8220;2&#8221;)
     .option(&#8221;operation&#8221;, &#8220;upsert&#8221;)
     .option(&#8221;upsertField&#8221;, &#8220;Email&#8221;)   # child field to upsert on. unique key
     # Optional tuning:
     # .option(&#8221;ignoreNullValues&#8221;, &#8220;true&#8221;)  # omit nulls instead of clearing
     # .option(&#8221;batchSize&#8221;, &#8220;10000&#8221;)
     .save()
    )</code></code></pre><h4><strong>Null Values</strong></h4><p>Above we see the option &#8220;ignoreNullValues&#8221;. This option determines what we do when not all fields are submitted for an object. If ignoreNullValues is set to True, those empty fields in the payload will not be touched within Salesforce. If ignoreNullValues is set to false, those fields will be overwritten with Null values within Salesforce.</p><h2><strong>Performance</strong></h2><p>In the snippet above we set NUM_PARTITIONS equal to 32 and repartition the DataFrame with this value. It&#8217;s crucial to consider what this value should be set to. As we discussed above, Spark will create a task for each partition, which means creating a Salesforce Bulk API job per partition. This increases throughput, but also multiplies how many Salesforce API calls you make per pipeline run. Each job submitted will use more than one Salesforce API call as it processes, so pay careful attention to your organization&#8217;s Salesforce API limits when developing this solution.</p><p>With this in mind, for large uploads to Salesforce this solution does outperform submitting all records as one Salesforce Bulk API job. When submitting 300,000 records in one batch, uploading took 372 seconds. When running after repartitioning to 32 partitions (one for each core in my cluster), the time dropped to 31 seconds.</p><p>In my performance testing, 32 partitions was chosen to match the 32 cores in my provisioned cluster. For a production use case I recommend finding a balance between the number of partitions you choose to maximize throughput, and the number of Salesforce API calls you are comfortable making per run.</p><h2><strong>Conclusion</strong></h2><p>This was an in-depth explanation of how this batch writer has been implemented, and the thought behind it. The final takeaway should be that if you consider some crucial information from this post, writing multiple objects to Salesforce should be as easy as:</p><pre><code><code># Write the DataFrame `df_accounts` to the standard Account object (recommended: upsert via External ID)
(
    df_accounts.write
    .format(&#8221;salesforce-batch&#8221;)
    .mode(&#8221;append&#8221;)
    .options(**sf_creds)
    .option(&#8221;sobject&#8221;, &#8220;Account&#8221;)
    .option(&#8221;api_version&#8221;, &#8220;2&#8221;)
    .option(&#8221;operation&#8221;, &#8220;upsert&#8221;)
    .option(&#8221;upsertField&#8221;, ACCOUNT_EXT_ID_FIELD)  # e.g., &#8220;Oracle_Id__c&#8221; or &#8220;AccountNumber&#8221;
    .save()
)

# Link the DataFrame `df_contacts` to Accounts by flattening the relationship column
# Assumes df_contacts has: FirstName, LastName, Email, AccountExtId (values like &#8220;ORA-1001&#8221;)
from pyspark.sql.functions import col

df_contacts_ready = df_contacts.select(
    &#8220;FirstName&#8221;,
    &#8220;LastName&#8221;,
    &#8220;Email&#8221;,  # Upsert key on Contact
    col(&#8221;AccountExtId&#8221;).alias(f&#8221;Account.{ACCOUNT_EXT_ID_FIELD}&#8221;)  # e.g., &#8220;Account.Oracle_Id__c&#8221;
)

# Write the linked Contacts (recommended: upsert via Email)
(
    df_contacts_ready.write
    .format(&#8221;salesforce-batch&#8221;)
    .mode(&#8221;append&#8221;)
    .options(**sf_creds)
    .option(&#8221;sobject&#8221;, &#8220;Contact&#8221;)
    .option(&#8221;api_version&#8221;, &#8220;2&#8221;)
    .option(&#8221;operation&#8221;, &#8220;upsert&#8221;)
    .option(&#8221;upsertField&#8221;, &#8220;Email&#8221;)  # must be non-null/non-empty; ideally External ID/Unique
    .save()
)</code></code></pre>]]></content:encoded></item><item><title><![CDATA[Cheese and Rice, that's config.json Bourne]]></title><description><![CDATA[Deploying Fine Tuned Models to Provisioned Throughput Endpoints]]></description><link>https://www.databricksters.com/p/jesus-christ-thats-configjson-bourne</link><guid isPermaLink="false">https://www.databricksters.com/p/jesus-christ-thats-configjson-bourne</guid><dc:creator><![CDATA[Austin]]></dc:creator><pubDate>Tue, 09 Dec 2025 16:02:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/04aa22cb-a9bb-48dc-8265-211e507c2882_1500x1000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of my customers recently tried fine tuning a Llama 3.1 8B model using Unsloth on Databricks Serverless GPU Compute (SGC), which worked great. Then they tried deploying that model to a Provisioned Throughput endpoint, which didn&#8217;t. It took me much longer to diagnose this issue than I care to admit, so instead of talking about the journey, we&#8217;re going to skip to the destination this time. If you&#8217;re trying to do this with a larger mode, say Llama 3.3 70B, then stay tuned for our next installment of this blog I&#8217;m coauthoring with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joshua Eason&quot;,&quot;id&quot;:293149700,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5687722-f359-4e30-af3d-be4ebf9498e1_4016x4016.jpeg&quot;,&quot;uuid&quot;:&quot;455c63c1-dc7c-49d6-b0aa-34e4cb23660c&quot;}" data-component-name="MentionToDOM"></span>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.databricksters.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eKhX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eKhX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 424w, https://substackcdn.com/image/fetch/$s_!eKhX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 848w, https://substackcdn.com/image/fetch/$s_!eKhX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 1272w, https://substackcdn.com/image/fetch/$s_!eKhX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eKhX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif" width="498" height="230" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:230,&quot;width&quot;:498,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2976308,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/181096050?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eKhX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 424w, https://substackcdn.com/image/fetch/$s_!eKhX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 848w, https://substackcdn.com/image/fetch/$s_!eKhX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 1272w, https://substackcdn.com/image/fetch/$s_!eKhX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ee979f-bbba-48b9-8fa4-413b7989dfc0_498x230.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">What&#8217;s better, this or John Wick?</figcaption></figure></div><p>The code I provide will work with SGC or on demand ML Runtime, but it&#8217;s significantly faster in SGC even if you only use a single A10 in both scenarios.</p><p>If you use MLR, then start with these pip installs to have all the libraries play nicely with MLR 16.4 LTS, if you&#8217;re using SGC you should pin these dependencies in the environments tab:</p><pre><code>%pip install unsloth[cu124-torch260]==2025.9.6
%pip install threadpoolctl==3.1.0
%pip install accelerate==1.7.0
%pip install unsloth_zoo==2025.9.8
%restart_python</code></pre><p>Next we need to get our base model from <code>unsloth</code>. This works for other tuning frameworks like HuggingFace <code>trl</code> as well, but we&#8217;re using Unsloth for this demo because it&#8217;s nice and quick, and also my customer was using it. Using 4-bit quantization doesn&#8217;t necessarily make sense for your production use case, since I&#8217;m going to merge it back into 16-bit later anyway, but if you want to save time and RAM during a demo then have at it.</p><pre><code>from unsloth import FastLanguageModel
import torch

# Some changes needed for larger models
model, tokenizer = FastLanguageModel.from_pretrained(
   model_name = &#8220;unsloth/llama-3.1-8b&#8221;,
   max_seq_length = 2048,
   dtype = torch.bfloat16,
   load_in_4bit = True,   # fastest + lowest memory
)</code></pre><p>And some sample data:</p><pre><code># It&#8217;s a toy set, replace this with your real data
from datasets import Dataset

data = [
   {&#8221;text&#8221;: &#8220;### Instruction: Say hello politely.\n### Response: Hello! How may I ruin your day?&#8221;},
   {&#8221;text&#8221;: &#8220;### Instruction: Explain PEFT.\n### Response: A lightweight way to fine-tune large models on the cheap.&#8221;},
   {&#8221;text&#8221;: &#8220;### Instruction: Explain LoRA.\n### Response: LoRA adds small trainable matrices instead of updating full weights.&#8221;},
]

dataset = Dataset.from_list(data)</code></pre><p>Great, now we can define our tokenizer and model as well as tokenize the dataset above:</p><pre><code># Define our tokenizer and tokenize dataset
def tokenize(example):
   encoding = tokenizer(
       example[&#8221;text&#8221;],
       truncation=True,
       max_length=1024,
       padding=&#8221;max_length&#8221;,
   )
   encoding[&#8221;labels&#8221;] = encoding[&#8221;input_ids&#8221;].copy()
   return encoding

tokenized_dataset = dataset.map(tokenize)

# Define our model
model = FastLanguageModel.get_peft_model(
   model,
   r = 8,
   lora_alpha = 16,
   lora_dropout = 0.0,
   target_modules = [&#8221;q_proj&#8221;, &#8220;k_proj&#8221;, &#8220;v_proj&#8221;, &#8220;o_proj&#8221;],
)</code></pre><p>Similarly for our training args, the trainer, and then we train the model:</p><pre><code>from transformers import TrainingArguments, Trainer

# Define our training arguments, trainer, and train toy model
training_args = TrainingArguments(
   output_dir = &#8220;outputs&#8221;,
   per_device_train_batch_size = 1,
   gradient_accumulation_steps = 1,
   warmup_steps = 0,
   max_steps = 10,
   learning_rate = 5e-5,
   logging_steps = 5,
   optim = &#8220;adamw_torch&#8221;,
   bf16 = True,
   remove_unused_columns = False,
)

trainer = Trainer(
   model = model,
   args = training_args,
   train_dataset = tokenized_dataset,
)

trainer.train()</code></pre><p>Great, so now still only in our VM we&#8217;re going to merge this adapter later back into our base weights as promised, though in a real use case you would probably have used 16-bit all along. Here is where you&#8217;re really going to see a huge difference between SGC (which does this in ~1 minute) and MLR (which takes ~6):</p><pre><code>import shutil
import os

# If you run multiple within short succession, just rename this
LOCAL_TEMP_PATH = &#8220;/tmp/llama_merged_model_6&#8221;

# See, I told you we would merge back into 16bit weights
model.save_pretrained_merged(
   LOCAL_TEMP_PATH,
   tokenizer=tokenizer,
   save_method=&#8221;merged_16bit&#8221;,
   safe_serialization=True  # Force safetensors
)

print(&#8221;Merged model saved at:&#8221;, LOCAL_TEMP_PATH)</code></pre><p>Here&#8217;s our first stumbling block we&#8217;re going to daintily step over. If you don&#8217;t do this manually, then your <code>_name_or_path</code> param in your config.json Bourne file will either be set to your base model or won&#8217;t be defined at all. In either case, it&#8217;s enough for the Provisioned Throughput (PT) endpoint to reject the entire thing:</p><pre><code>import json
import os

config_path = os.path.join(LOCAL_TEMP_PATH, &#8220;config.json&#8221;)

with open(config_path, &#8220;r&#8221;) as f:
   config = json.load(f)

# Rename the model name to avoid triggering the security check fail, you MUST do this
config[&#8221;_name_or_path&#8221;] = &#8220;unsloth/Meta-Llama-3.1-8B&#8221;

# Save the sanitized config back
with open(config_path, &#8220;w&#8221;) as f:
   json.dump(config, f, indent=2)

print(f&#8221;Sanitized config saved to {config_path}&#8221;)</code></pre><p>Now we&#8217;re finally going to save what we have out to a UC Volume, so pick a path you have read and write permissions on:</p><pre><code>import subprocess
import mlflow

# Define your paths
UC_VOLUME_PATH = &#8220;/Volumes/&lt;catalog_name&gt;/&lt;schema_name&gt;/&lt;volume_name&gt;/merged_weights_mlr_bf16&#8221;

# Define the final model name; this must match what the PT Endpoint expects
CATALOG = &#8220;&lt;catalog_name&gt;&#8221;
SCHEMA = &#8220;&lt;schema_name&gt;&#8221;
REGISTERED_NAME = f&#8221;{CATALOG}.{SCHEMA}.llama_3_1_8b_custom&#8221;

# Copy artifacts to volume
print(f&#8221;Copying from {LOCAL_TEMP_PATH} to {UC_VOLUME_PATH}...&#8221;)

if os.path.exists(UC_VOLUME_PATH):
   subprocess.run([&#8217;rm&#8217;, &#8216;-rf&#8217;, UC_VOLUME_PATH], check=True)

os.makedirs(UC_VOLUME_PATH, exist_ok=True)

# Copy files
subprocess.run([&#8221;cp&#8221;, &#8220;-r&#8221;, f&#8221;{LOCAL_TEMP_PATH}/.&#8221;, UC_VOLUME_PATH], check=True)
print(&#8221;Model copied to UC Volume.&#8221;)</code></pre><p>Cool, here&#8217;s another gotcha: you also need a <code>generation_config.json</code> file to avoid failing the model scan. Here&#8217;s one that works; you can change as needed:</p><pre><code># Add generation_config.json to the UC Volume BEFORE logging to MLflow
gen_config = {
   &#8220;bos_token_id&#8221;: 128000,
   &#8220;eos_token_id&#8221;: 128001,
   &#8220;pad_token_id&#8221;: 128004,
   &#8220;do_sample&#8221;: True,
   &#8220;temperature&#8221;: 0.6,
   &#8220;max_length&#8221;: 8192
}

with open(os.path.join(UC_VOLUME_PATH, &#8220;generation_config.json&#8221;), &#8220;w&#8221;) as f:
   json.dump(gen_config, f, indent=2)

print(&#8221;generation_config.json Bourne added to volume.&#8221;)</code></pre><p>Alright I can see I&#8217;ve overdone the Jason Bourne meme. Lucky for you we&#8217;re about done here. We only need to log and register the model to MLflow and then we can serve the fine tuned Llama 3.1 8B model using optimized Provisioned Throughput endpoints for greatly increased token throughput:</p><pre><code># Log and register to MLflow directly from UC Volume path
mlflow.set_registry_uri(&#8221;databricks-uc&#8221;)

# Define input example for signature (sorry, one more for the road!)
input_example = {
   &#8220;messages&#8221;: [
       {&#8221;role&#8221;: &#8220;user&#8221;, &#8220;content&#8221;: &#8220;That&#8217;s Jason Bourne.&#8221;}
   ]
}

print(f&#8221;Registering model: {REGISTERED_NAME}&#8221;)

with mlflow.start_run(run_name=&#8221;register_llama_3_1&#8221;) as run:
   model_info = mlflow.transformers.log_model(
       transformers_model=UC_VOLUME_PATH,
       artifact_path=&#8221;model&#8221;,
       task=&#8221;llm/v1/chat&#8221;,
       input_example=input_example,
       registered_model_name=REGISTERED_NAME,
       metadata={
           &#8220;source&#8221;: &#8220;uc_volume&#8221;,
           &#8220;original_path&#8221;: UC_VOLUME_PATH
       }
   )

print(f&#8221;Model version {model_info.registered_model_version} registered.&#8221;)</code></pre><p>From here you can deploy via the UI or if you want to finish this whole thing in the API you can simply run this with your desired throughput bands. I&#8217;m setting mine to the smallest since it&#8217;s a demo:</p><pre><code>import requests

API_ROOT = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get()
API_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()

headers = {
   &#8220;Authorization&#8221;: f&#8221;Bearer {API_TOKEN}&#8221;,
   &#8220;Content-Type&#8221;: &#8220;application/json&#8221;
}

model_name = &#8220;&lt;catalog_name&gt;.&lt;schema_name&gt;.llama_3_1_8b_custom&#8221;
model_version = 1 #change as needed
endpoint_name = &#8220;llama-31-8b-mlr-test&#8221;

payload = {
   &#8220;name&#8221;: endpoint_name,
   &#8220;config&#8221;: {
       &#8220;served_entities&#8221;: [
           {
               &#8220;entity_name&#8221;: model_name,
               &#8220;entity_version&#8221;: str(model_version),
               &#8220;min_provisioned_throughput&#8221;: 19000,
               &#8220;max_provisioned_throughput&#8221;: 19000,
           }
       ]
   }
}

response = requests.post(
   f&#8221;{API_ROOT}/api/2.0/serving-endpoints&#8221;,
   headers=headers,
   json=payload
)

print(json.dumps(response.json(), indent=2))</code></pre><p>If you navigate to your Serving tab you&#8217;ll see this container building. For a toy example on Llama 3.1 8B this should take about 10 minutes.</p><p>Please let me know if this unblocked you!</p><p>Happy coding.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Your Storage Bill Is Too High. Here Are 3 Levels of VACUUM to Fix It]]></title><description><![CDATA[Have a look under the covers about Vacuum & Vacuum Lite & Vacuum Using Inventory]]></description><link>https://www.databricksters.com/p/your-storage-bill-is-too-high-here</link><guid isPermaLink="false">https://www.databricksters.com/p/your-storage-bill-is-too-high-here</guid><dc:creator><![CDATA[Canadian Data Guy]]></dc:creator><pubDate>Tue, 02 Dec 2025 16:02:34 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/178845458/3d80716ad9fdaddee36740c710822aed.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h2>The Hidden Storage Problem</h2><p>A common pattern emerges in Delta Lake deployments: storage costs creep up month after month despite stable data ingestion volumes. Teams investigate and discover that their actual in-use data represents only a fraction of their total storage footprint. In one documented case shared on public forums, a team found their active data was just 18% of total storage&#8212;the rest was historical data files that had never been cleaned up.</p><p>This isn&#8217;t a bug&#8212;it&#8217;s a consequence of how Delta Lake&#8217;s time travel feature works. And it&#8217;s why the VACUUM command evolved from a basic cleanup tool into a sophisticated optimization toolkit with three distinct modes, each designed for different scenarios.</p><div><hr></div><h2>Understanding the Root Cause: Why Files Accumulate</h2><p>Delta Lake&#8217;s transaction log is the key to understanding storage bloat. Every write operation&#8212;INSERT, UPDATE, DELETE, MERGE&#8212;generates new Parquet files and records them in the transaction log. But here&#8217;s the critical detail: <strong>old files aren&#8217;t automatically deleted</strong>. They&#8217;re marked as &#8220;removed&#8221; in the log, but they remain on disk.</p><p>This design enables Delta Lake&#8217;s powerful time travel queries:</p><pre><code><code>SELECT * FROM events VERSION AS OF 100
SELECT * FROM events TIMESTAMP AS OF &#8216;2024-11-01&#8217;</code></code></pre><p>The transaction log itself is self-cleaning&#8212;log entries older than 30 days (configurable via <code>delta.logRetentionDuration</code>) are automatically pruned. However, <strong>the actual Parquet data files are never automatically deleted</strong>. Without intervention, they accumulate indefinitely.</p><p>For tables with frequent MERGE or UPDATE operations, this can mean months of historical files building up despite a 7-day retention policy.</p><h2>VACUUM FULL: The Comprehensive Cleanup</h2><p>The original VACUUM command, now referred to as <strong>VACUUM FULL</strong> (though FULL is the default and doesn&#8217;t need to be specified), was designed as a thorough cleanup mechanism:</p><pre><code><code>VACUUM events RETAIN 168 HOURS</code></code></pre><h3>How VACUUM FULL Works</h3><p>The operation proceeds in three distinct phases:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dtuH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dtuH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 424w, https://substackcdn.com/image/fetch/$s_!dtuH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 848w, https://substackcdn.com/image/fetch/$s_!dtuH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!dtuH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dtuH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png" width="1456" height="666" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:666,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dtuH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 424w, https://substackcdn.com/image/fetch/$s_!dtuH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 848w, https://substackcdn.com/image/fetch/$s_!dtuH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!dtuH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F734ee373-6d00-478d-a5cc-cccfbf013c32_2510x1148.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Phase 1: Recursive File Listing</strong></p><p>VACUUM starts by recursively listing every file in the table&#8217;s storage directory. On cloud storage, this means API calls:</p><ul><li><p>AWS S3: <code>ListObjectsV2</code> requests</p></li><li><p>Azure Blob: <code>List Blobs</code> operations</p></li><li><p>GCP Storage: List operations</p></li></ul><p>For large tables with millions of files across thousands of partitions, this phase can be time-consuming and generates significant API costs. The listing happens in parallel across Spark worker nodes, with parallelism determined by the number of unique directory paths.</p><p><strong>Phase 2: Delta Log Comparison</strong></p><p>VACUUM reads the Delta transaction log to identify which files are currently referenced. Files are marked for deletion if they:</p><ol><li><p>Are not referenced in the current table state, AND</p></li><li><p>Are older than the retention threshold (7 days by default)</p></li></ol><p>Critically, VACUUM FULL doesn&#8217;t just consult the Delta log&#8212;it also identifies <strong>straggler files</strong>: files that exist in the table directory but were never successfully committed to the Delta log (typically from aborted writes or failed jobs).</p><p><strong>Phase 3: Deletion</strong></p><p>File deletion is a driver-only operation. The driver issues deletion commands to the cloud storage provider:</p><ul><li><p>AWS: Uses <code>DeleteObjects</code> bulk API (single-threaded)</p></li><li><p>Azure/GCP: Can delete in parallel if <code>spark.databricks.delta.vacuum.parallelDelete.enabled</code> is set to true</p></li></ul><h3>The Performance Challenge</h3><p>For large tables, VACUUM FULL can take a significant amount of time&#8212;reports from the community mention runs lasting 30-60 minutes or more for petabyte-scale tables. The recursive listing phase is often the bottleneck, both in runtime and API call costs.</p><p>Teams typically run VACUUM FULL weekly or monthly due to these performance constraints.</p><div><hr></div><h2>VACUUM USING INVENTORY: The Extreme Scale Solution (Delta Lake 3.2.0)</h2><div class="pullquote"><p>I&#8217;ll briefly address it, but <strong>I don&#8217;t recommend</strong> this approach. In my experience&#8212;across <strong>400+ companies</strong> and thousands of use cases from terabyte to petabyte scale&#8212;I haven&#8217;t seen a single team use it. I&#8217;m including it only for completeness, not as a path worth your time</p></div><p>Released in May 2024 with Delta Lake 3.2.0, <strong>VACUUM USING INVENTORY</strong> introduced a novel approach to eliminate the expensive file listing phase.</p><h3>The Core Innovation</h3><p>Instead of making live API calls to list files, this mode leverages pre-generated inventory reports from cloud providers:</p><ul><li><p>AWS S3 Inventory</p></li><li><p>Azure Storage Blob Inventory</p></li><li><p>GCP Storage Insights</p></li></ul><p>These services generate daily or weekly manifest files containing complete lists of all objects in a bucket. VACUUM USING INVENTORY:</p><ol><li><p>Reads the pre-generated inventory manifest</p></li><li><p>Compares it against the Delta transaction log</p></li><li><p>Identifies and deletes files</p></li></ol><h3>The Reality: When NOT to Use It</h3><p>Despite impressive performance numbers, VACUUM USING INVENTORY comes with significant operational overhead that makes it unsuitable for most organizations:</p><p><strong>Setup Complexity:</strong></p><ul><li><p>Configure cloud inventory service</p></li><li><p>Establish inventory scheduling (daily/weekly)</p></li><li><p>Create and maintain manifest tables in Delta Lake</p></li><li><p>Configure VACUUM to read from inventory locations</p></li><li><p>Monitor inventory freshness and handle schema evolution</p></li></ul><p><strong>The Compliance Risk:</strong></p><div class="pullquote"><p>For compliance-critical use cases&#8212;GDPR right-to-deletion, CCPA data retention, financial record-keeping&#8212;the stakes are high. If inventory manifests are stale or misconfigured, files that should have been deleted might be missed. The potential penalties from compliance violations can far exceed any computational savings.</p></div><p><strong>Recommendation:</strong></p><p>VACUUM USING INVENTORY is best reserved for organizations that:</p><ul><li><p>Operate dozens of petabyte-scale tables</p></li><li><p>Have dedicated platform engineering teams</p></li><li><p>Have low compliance risk or robust inventory monitoring</p></li></ul><p><strong>For most teams, the operational complexity and risk aren&#8217;t justified by the savings. Standard VACUUM modes are simpler, more reliable, and sufficient.</strong></p><div><hr></div><h2>VACUUM LITE: The Practical Evolution (Delta Lake 3.3.0)</h2><p>On January 6, 2025, Delta Lake 3.3.0 introduced <strong>VACUUM LITE</strong>, which fundamentally changed the maintenance equation by addressing VACUUM FULL&#8217;s core bottleneck in a simpler way.</p><h3>The Key Insight</h3><p>Rather than scanning storage directories or managing complex inventory systems, VACUUM LITE takes a direct approach:</p><ol><li><p>Read the Delta transaction log (which already lists all committed files)</p></li><li><p>Identify files marked as &#8220;removed&#8221; and older than the retention threshold</p></li><li><p>Delete them</p></li></ol><p>That&#8217;s it. No directory traversal. No ListObjects API calls. No inventory management.</p><pre><code><code>VACUUM events LITE RETAIN 168 HOURS</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ohnG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ohnG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 424w, https://substackcdn.com/image/fetch/$s_!ohnG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 848w, https://substackcdn.com/image/fetch/$s_!ohnG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 1272w, https://substackcdn.com/image/fetch/$s_!ohnG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ohnG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png" width="1456" height="1256" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1256,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ohnG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 424w, https://substackcdn.com/image/fetch/$s_!ohnG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 848w, https://substackcdn.com/image/fetch/$s_!ohnG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 1272w, https://substackcdn.com/image/fetch/$s_!ohnG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450e90c8-40f4-4079-a954-9e8f07fbc0ae_1628x1404.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Performance Characteristics</h3><p>Community reports show dramatic speedups:</p><ul><li><p>Operations that took 30-60 minutes with VACUUM FULL now complete in under 5 minutes</p></li><li><p>Minimal API call costs since directory listing is eliminated</p></li><li><p>Suitable for frequent execution (daily or multiple times per day)</p></li></ul><h3>The Trade-Off: Speed vs. Thoroughness</h3><p>VACUUM LITE&#8217;s speed comes from being more selective. It only deletes files tracked in the Delta transaction log. It will not catch:</p><ul><li><p><strong>Straggler files from aborted writes</strong></p></li><li><p><strong>Orphaned files from failed jobs</strong></p></li><li><p><strong>Any files not committed to the transaction log</strong></p></li></ul><h3>The Baseline Requirement</h3><p>There&#8217;s a critical safety mechanism: <strong>VACUUM LITE requires at least one successful VACUUM FULL run within the transaction log retention window</strong> (30 days by default).</p><p>This baseline ensures VACUUM LITE has a complete picture. Without it, the command fails with:</p><pre><code><code>DELTA_CANNOT_VACUUM_LITE: VACUUM &lt;tableName&gt; LITE cannot delete all eligible files 
as some files are not referenced by the Delta log. Please run VACUUM FULL.</code></code></pre><p>This safety check prevents accidental data loss. If you encounter this error, run VACUUM FULL once to establish the baseline.</p><div><hr></div><h2>Recommended Strategy: The Hybrid Approach</h2><p>The most effective strategy combines both VACUUM modes:</p><h3>Weekly Maintenance: VACUUM LITE</h3><p>Use LITE for routine cleanup of files from normal operations (MERGE, UPDATE, DELETE):</p><pre><code><code>-- Scheduled daily
VACUUM high_churn_table LITE RETAIN 168 HOURS</code></code></pre><p><strong>When to use:</strong></p><ul><li><p>Tables with frequent data modifications</p></li><li><p>Well-managed ingestion pipelines</p></li><li><p>Daily or multi-daily maintenance windows</p></li></ul><h3>Weekly/Monthly Deep Clean: VACUUM FULL</h3><p>Use FULL for comprehensive cleanup:</p><pre><code><code>-- Scheduled weekly (e.g., Sunday morning)
VACUUM high_churn_table RETAIN 168 HOURS</code></code></pre><p><strong>When to use:</strong></p><ul><li><p>Establishing the baseline for LITE mode</p></li><li><p>After irregular ingestion issues</p></li><li><p>Cleaning up straggler files</p></li><li><p>Compliance-critical scenarios requiring thoroughness</p></li><li><p>Periodic deep maintenance</p></li></ul><p>This two-tier approach provides:</p><ul><li><p>Fast, frequent cleanup via LITE (minutes daily)</p></li><li><p>Comprehensive straggler removal via FULL (weekly/bi-weekly)</p></li><li><p>Predictable maintenance windows</p></li><li><p>Lower overall compute costs</p></li></ul><div><hr></div><h1><strong>The Secret to a Cheaper </strong><code>VACUUM</code><strong> (Hardware Configuration) $$$</strong></h1><p>The standard Databricks recommendation for VACUUM is:</p><ul><li><p>Auto-scaling cluster (1-4 workers)</p></li><li><p>Compute-optimized instances (AWS C5, Azure F-series, GCP C2)</p></li><li><p>8 cores per worker</p></li><li><p>8-32 core driver</p></li></ul><p>This is a safe, balanced configuration. However, there&#8217;s an alternative approach if you can afford some occasional failures but want to save every penny.</p><h3>The Single-Node Strategy</h3><p>For VACUUM LITE and certain VACUUM FULL scenarios, running on a single powerful driver node (0 workers) can be more cost-effective:</p><p><strong>Configuration:</strong></p><ul><li><p>Driver: Large compute-optimized instance (32-64 cores)</p></li><li><p>Workers: 0</p></li><li><p>Mode: Single-node cluster</p></li></ul><p><strong>Why this works:</strong></p><ol><li><p><strong>File listing</strong>: For many tables, the listing phase is bottlenecked by cloud storage API rate limits, not CPU. Additional workers may not significantly improve listing time.</p></li><li><p><strong>File deletion</strong>: This is driver-only regardless of worker count. Workers sit idle during this phase.</p></li><li><p><strong>Cost</strong>: A single large driver for 30 minutes often costs less than a multi-node cluster for the same duration..</p></li></ol><div><hr></div><h2>Decision Framework: Choosing Your VACUUM Strategy</h2><p>Here&#8217;s a practical guide for determining which VACUUM mode to use:</p><h3>Use VACUUM FULL When:</h3><p>&#9989; Running your first VACUUM on a table (establishes LITE baseline)<br>&#9989; After messy or failed data ingestion jobs<br>&#9989; Periodic deep cleaning (weekly/bi-weekly/monthly)<br>&#9989; Compliance-critical scenarios (thoroughness matters)<br>&#9989; Tables with known straggler file issues<br>&#9989; Resolving DELTA_CANNOT_VACUUM_LITE errors</p><p><strong>Typical frequency</strong>: Weekly to monthly, depending on table characteristics</p><h3>Use VACUUM LITE When:</h3><p>&#9989; Daily or frequent maintenance operations<br>&#9989; Well-managed tables with regular modifications<br>&#9989; Routine cleanup of committed files<br>&#9989; Tables with an established FULL baseline<br>&#9989; Speed and cost optimization are priorities</p><p><strong>Typical frequency</strong>: Daily to weekly</p><div><hr></div><div><hr></div><h2>Understanding the Timeline: When Features Became Available</h2><p>For teams planning their VACUUM strategy, here&#8217;s when each feature was introduced:</p><ul><li><p><strong>VACUUM (FULL mode)</strong>: Original Delta Lake feature, available since early versions</p></li><li><p><strong>VACUUM USING INVENTORY</strong>: Delta Lake 3.2.0 (May 9, 2024)</p><ul><li><p>Requires Databricks Runtime 15.4 LTS or later</p></li></ul></li><li><p><strong>VACUUM LITE</strong>: Delta Lake 3.3.0 (January 6, 2025)</p><ul><li><p>Requires Databricks Runtime 16.1 or later</p></li></ul></li></ul><p>Check your Databricks Runtime version to determine which features are available in your environment.</p><div><hr></div><h2>Key Takeaways</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PLH1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PLH1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 424w, https://substackcdn.com/image/fetch/$s_!PLH1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 848w, https://substackcdn.com/image/fetch/$s_!PLH1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 1272w, https://substackcdn.com/image/fetch/$s_!PLH1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PLH1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png" width="1254" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:549381,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/178742399?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!PLH1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 424w, https://substackcdn.com/image/fetch/$s_!PLH1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 848w, https://substackcdn.com/image/fetch/$s_!PLH1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 1272w, https://substackcdn.com/image/fetch/$s_!PLH1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea1d783-b894-4033-8ddb-eeae9cca73d0_1254x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The evolution of VACUUM in Delta Lake reflects the maturation of large-scale data lakehouse operations:</p><ol><li><p><strong>VACUUM FULL remains essential</strong> for comprehensive cleanup and establishing baselines, despite being slower and more expensive</p></li><li><p><strong>VACUUM LITE is the game-changer</strong> for most teams&#8212;fast enough for daily use, simple enough to trust in production, and cost-effective</p></li><li><p><strong>VACUUM USING INVENTORY is specialized</strong> and best reserved for extreme-scale scenarios with dedicated platform engineering. <strong>Compliance considerations trump cost savings: For regulated data, thoroughness and reliability are more important than marginal compute savings</strong></p></li><li><p><strong>The hybrid approach works</strong>: Combine frequent LITE runs with periodic FULL runs for optimal cost and coverage</p></li><li><p><strong>Hardware optimization matters</strong>: Test both standard auto-scaling and single-node configurations to find your optimal cost/performance balance</p></li></ol><p>The lesson isn&#8217;t &#8220;always use the latest feature&#8221;&#8212;it&#8217;s understanding the trade-offs and choosing the right tool for your specific tables, scale, and requirements.</p><div><hr></div><h2>References</h2><ol><li><p><a href="https://github.com/delta-io/delta/releases/tag/v3.3.0">Delta Lake 3.3.0 Release Notes</a> - VACUUM LITE introduction</p></li><li><p><a href="https://github.com/delta-io/delta/releases/tag/v3.2.0">Delta Lake 3.2.0 Release Notes</a> - VACUUM USING INVENTORY</p></li><li><p><a href="https://docs.databricks.com/en/delta/vacuum.html">Databricks Documentation: Remove unused data files with VACUUM</a></p></li><li><p><a href="https://docs.delta.io/latest/delta-vacuum.html">Efficient Delta Vacuum with File Inventory</a></p></li><li><p><a href="https://github.com/delta-io/delta">Delta Lake GitHub Repository</a></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Trace your steps back to Slack]]></title><description><![CDATA[Create a slackbot to to review MLflow traces for your agent.]]></description><link>https://www.databricksters.com/p/trace-your-steps-back-to-slack</link><guid isPermaLink="false">https://www.databricksters.com/p/trace-your-steps-back-to-slack</guid><dc:creator><![CDATA[Veena]]></dc:creator><pubDate>Tue, 25 Nov 2025 16:02:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Li1P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you have been creating and deploying Agents on Databricks, then perhaps you are already aware of the existence of MLflow Review Apps. For those who have not used them before, MLflow Review Apps are an easy way to collect feedback from your Subject Matter Experts on your agent. Databricks provides support for using review apps through the built-in interface or, if you need more customization, through a <a href="https://github.com/databricks-solutions/custom-mlflow-review-app/tree/main">custom review app</a> hosted on Databricks Apps.</p><p>But what if we could just bring this process directly to Slack? This blog post will walk you through building a Slackbot that enables real-time agent interaction and feedback collection.</p><h2>How does tracing and feedback work in MLflow?</h2><p>With MLflow Production Monitoring, you can see traces arrive directly in an MLflow experiment. These traces can be synced to a table in Unity Catalog.</p><p>Each trace has a unique ID automatically generated by MLflow. This ID can be used to add feedback (source: <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/collect-user-feedback/#implementing-feedback-collection">Databricks documentation</a>) via the MLflow <code>log_feedback</code> function. This can be an LLM judge or human feedback. Feedback is also stored as an assessment linked to the specific trace, making it queryable through MLflow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Li1P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Li1P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 424w, https://substackcdn.com/image/fetch/$s_!Li1P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 848w, https://substackcdn.com/image/fetch/$s_!Li1P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 1272w, https://substackcdn.com/image/fetch/$s_!Li1P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Li1P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png" width="803" height="1125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1125,&quot;width&quot;:803,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Li1P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 424w, https://substackcdn.com/image/fetch/$s_!Li1P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 848w, https://substackcdn.com/image/fetch/$s_!Li1P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 1272w, https://substackcdn.com/image/fetch/$s_!Li1P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faff6dc68-c00c-4aca-8334-fd93b8d4170e_803x1125.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A labeling session (source: <a href="https://docs.databricks.com/aws/en/mlflow3/genai/human-feedback/concepts/labeling-sessions">Databricks documentation</a>) is a special type of run within MLflow. Databricks recommends adding specific traces to a labelling session beforehand-- the custom or built-in review app then connects to that labeling session and exposes the traces to SMEs. The app allows us to just interact with the MLflow client in a specific way. This requires us to pre-select traces.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S6YX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S6YX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 424w, https://substackcdn.com/image/fetch/$s_!S6YX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 848w, https://substackcdn.com/image/fetch/$s_!S6YX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 1272w, https://substackcdn.com/image/fetch/$s_!S6YX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S6YX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S6YX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 424w, https://substackcdn.com/image/fetch/$s_!S6YX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 848w, https://substackcdn.com/image/fetch/$s_!S6YX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 1272w, https://substackcdn.com/image/fetch/$s_!S6YX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af50aad-ee37-4f49-89f4-5d0e94030a9f_1600x840.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To create a Slackbot that can perform the same tasks as a custom review app, we will need to host it on a Databricks App. In this app, we are going to use labeling sessions slightly differently. Instead of interacting with pre-selected traces, we will allow SMEs to interact with the agent directly, creating traces and adding them to an already- created labeling session immediately. Then, the SME can add feedback via Slack interactions.</p><h1>Building the Slackbot Review App</h1><p><a href="https://github.com/veenaramesh/custom-slack-review-app">Follow along with the code here. </a></p><p>This is the experience we want:</p><ol><li><p>Human experts ask questions in a Slack channel.</p></li><li><p>The agent answers the question in the same Slack thread.</p></li><li><p>Human experts provide feedback via Slack shortcuts.</p></li></ol><p>Therefore, our Slackbot should:</p><ol><li><p>Listen to messages in Slack.</p></li><li><p>Call our agent in Databricks.</p></li><li><p>Collect feedback from SMEs in Slack.</p></li><li><p>Annotate MLflow traces with that feedback.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7MQj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7MQj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 424w, https://substackcdn.com/image/fetch/$s_!7MQj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 848w, https://substackcdn.com/image/fetch/$s_!7MQj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 1272w, https://substackcdn.com/image/fetch/$s_!7MQj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7MQj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png" width="1174" height="626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:1174,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7MQj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 424w, https://substackcdn.com/image/fetch/$s_!7MQj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 848w, https://substackcdn.com/image/fetch/$s_!7MQj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 1272w, https://substackcdn.com/image/fetch/$s_!7MQj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd9a906f-1769-47fe-a536-bd05ea1145d1_1174x626.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ol><h2>Some housekeeping</h2><p>Before we get started with writing the Databricks app, we will first need to create the following: </p><h3>Creating an app in Slack</h3><p>First, let&#8217;s create an application in Slack. For more detailed instructions, <a href="https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Fpython.plainenglish.io%2Flets-create-a-slackbot-cause-why-not-2972474bf5c1">check out this Medium blog post.</a></p><p>I have included the app manifest for the Slackbot with all necessary configurations, but check out the necessary scope and permissions for the bot. We will definitely need the scopes: (1) chat:write (2) groups:read (3) im:read (4) mpim:history (5) commands.</p><p>Once you have installed the app in Slack, you will be given a Bot User oauth token. Save this securely. We will need to use that in our app.</p><h3>Creating a Databricks App</h3><p>Databricks Apps makes hosting straightforward, as each app has an associated Service Principal. All we need to do is ensure that the Service Principal has access to our MLflow experiment and agent endpoint.</p><p>Using the CLI, we can create the app:</p><p><code>databricks apps create slackbot</code></p><p>Sync local files to the Databricks workspace:</p><p><code>databricks sync . &#8220;/Users/$DATABRICKS_USERNAME/slackbot-app&#8221;</code></p><p>Then, deploy:</p><p><code>databricks apps deploy slackbot --source-code-path /Workspace/Users/$DATABRICKS_USERNAME/agent-proto</code></p><h3>Creating a MLflow labeling session</h3><p>We should also create a labeling session that we will use within our MLflow experiment. This creates a persistent Mlflow run that we will link all Slack-generated traces to. You can do this in a notebook with the SDK or through the MLflow experiment UI. </p><pre><code>import mlflow.genai.labeling as labeling

import mlflow.genai.label_schemas as schemas

# Create a simple labeling session with built-in schemas

session = labeling.create_labeling_session(
    name=&#8221;customer_service_review_jan_2024&#8221;, 
    assigned_users=[&#8221;alice@company.com&#8221;, &#8220;bob@company.com&#8221;],
    label_schemas=[schemas.EXPECTED_FACTS]  
    # Required: at least one schema needed 
)</code></pre><p>Source: <a href="https://docs.databricks.com/aws/en/mlflow3/genai/human-feedback/concepts/labeling-sessions">Databricks documentation.</a></p><h2>1. Initializing the Slack client</h2><p>In our Databricks App, using the Slack SDK, we can easily connect to our Slack App:</p><pre><code>def get_slack_auth():
    w = WorkspaceClient()
    token_bot = dbutils.secrets.get(scope=&#8221;brickbrain-scope&#8221;, key=&#8221;slack-bot-token&#8221;)    
    return token_bot

def start_slack_client():
    logger.info(&#8221;Initalized slack client. &#8220;)
    ssl_context = ssl.create_default_context()
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE
    token_bot = get_slack_auth()
    client = slack_sdk.WebClient(token=token_bot, ssl=ssl_context)
    return App(client=client, process_before_response=False)

app = start_slack_client()
</code></pre><p>Note: store the Slack token in Databricks Secrets for security. Ensure your Service Principal has permissions to that secret scope.</p><h2>2. Listening to events</h2><p>Depending on the permissions given to your application, your slackbot will be able to receive and be able to respond to different events. Take a look at the full list of the events (source: <a href="https://docs.slack.dev/reference/events/">Slack documentation</a>). </p><p>First, let&#8217;s take a look at the message event, which observes whether or not a message was sent to a channel. In the example, I am observing every event that is sent to a channel. If you want to minimize the scope, you can select a message subtype or naively use string manipulation. I am going to be using <a href="https://docs.slack.dev/tools/bolt-python/">slack-bolt</a> moving forward to respond and take actions as the bot.</p><p>Bolt has many decorators that we can use to listen or observe events. For example, when observing the message event, I can declare the following:</p><pre><code>@app.event(&#8221;message&#8221;)
def llm_response(event, say, client):
    logger.info(f&#8221;Message received - User: {event[&#8217;user&#8217;]}, Text: {event[&#8217;text&#8217;][:20]}...&#8221;)
   &lt;...&gt;</code></pre><p>For different types of &#8220;listeners&#8221;, we can have different function arguments: </p><ul><li><p><code>payload</code>: also accessible via the alias corresponding to the method name that the listener is passed to (message, event, action, shortcut, view, command, options).</p><ul><li><p>In this case, <code>event</code> == payload</p></li></ul></li><li><p><code>say</code>: function send a message to the channel associated with the event.</p></li><li><p><code>ack</code>: function that must be acknowledged that an incoming event was received by the app.</p></li><li><p><code>client</code>: web API client that uses the token associated with that event.</p></li><li><p><code>logger</code></p></li></ul><p>This is not a complete list! But these are the most important ones for our use case (source: <a href="https://docs.slack.dev/tools/bolt-js/reference/#listener-function-arguments">Slack documentation</a>). </p><h2>3. Calling the agent</h2><p>In our app, we want to respond to messages sent to the channel. We can easily trigger an LLM call now. However, in order to add feedback to the trace, we need to get the trace ID. When interacting with a Databricks endpoint, we can do this by setting the variable <code>return_trace</code> to True. </p><pre><code>        input_data = {
            &#8220;input&#8221;: history + [{&#8221;role&#8221;: &#8220;user&#8221;, &#8220;content&#8221;:  message_text}],
            &#8220;databricks_options&#8221;: {&#8221;return_trace&#8221;: True}
        }

        response = mlflow_client.predict(endpoint=ENDPOINT_NAME, inputs=input_data)</code></pre><p>The response output will then give me the trace ID: </p><pre><code>        trace_id = response[&#8217;databricks_output&#8217;][&#8217;trace&#8217;][&#8217;info&#8217;][&#8217;trace_id&#8217;]</code></pre><h2>4. Responding to the message</h2><p>To respond within a thread, we will need to use the client API. Recall that the listener argument &#8220;say&#8221; is offered with most events. However, &#8220;say&#8221; does not allow us to respond within a thread. </p><p>LLMs often use and produce Markdown as an output format. It is important to note that Slack uses its own markdown language, and although most basic syntax support is provided, some elements are absent. Take a look at what is supported <a href="https://www.markdownguide.org/tools/slack/.">here</a>.</p><p>If you want to ensure that the output is stylized in the same way that the LLM intended, I would suggest looking at manually converting the Markdown text into Slack&#8217;s mrkdwn format. This would require some string manipulation with regex (source: <a href="https://github.com/fla9ua/markdown_to_mrkdwn">Github repo</a>).</p><pre><code>    result = client.chat_postMessage(
        channel=event[&#8217;channel&#8217;],
        blocks=[
            {
                &#8220;type&#8221;: &#8220;section&#8221;,
                &#8220;text&#8221;: {&#8221;type&#8221;: &#8220;mrkdwn&#8221;, &#8220;text&#8221;: slack_response}
            },
        ],
        text=slack_response,
        thread_ts=event[&#8217;ts&#8217;],  # reply in the thread
        metadata={
            &#8220;event_type&#8221;: &#8220;agent_response&#8221;,
            &#8220;event_payload&#8221;: {
                &#8220;trace_id&#8221;: trace_id, # trace id in metadata
                &#8220;thread_id&#8221;: event[&#8217;ts&#8217;],
                &#8220;resource_type&#8221;: &#8220;AGENT_RESPONSE&#8221;,
            }
        }
    )</code></pre><p>Using the Client API, we can also attach metadata to each message. This makes it easier to retrieve information across sessions, like <code>trace_id</code>.</p><p>We have designed the response simply, but Slack has a lot of options on how to design a Slack message. Take a look at <a href="https://app.slack.com/block-kit-builder/T02TL6JB2">Block Kit Builder</a> to see how you can structure your Slack message with buttons, dividers, images, inputs, etc. </p><h2>5. Adding feedback</h2><p>We will use a Slack message shortcut to log feedback. I found this method to be the most straightforward and easiest to customize. However, we can also use Slack message blocks to design a feedback form as well.</p><p>When I use the add_feedback shortcut, this triggers the event &#8220;message_shortcut&#8221;. Because we have added the trace id to the metadata of the agent response Slack message, we can access that trace_id in the Slack shortcut.</p><pre><code>@app.message_shortcut(&#8221;log_feedback&#8221;)
def handle_log_feedback_shortcut(ack, shortcut, client):
    ack()
    logger.info(f&#8221;Feedback message shortcut triggered by user: {shortcut[&#8217;user&#8217;][&#8217;name&#8217;]}&#8221;)
    
    message = shortcut[&#8217;message&#8217;]
    message_ts = message[&#8217;ts&#8217;]
    
    metadata = message.get(&#8217;metadata&#8217;, {})</code></pre><p>When handling this event, we can use the Client API to open a view with the formatted feedback form. We can add comments and binary feedback. These inputs will be translated as input for <code>mlflow.log_feedback()</code>. However, <code>log_feedback</code> can take all sorts of values: integers, floats, categorical values, and multiple-category feedback (source: <a href="https://docs.databricks.com/aws/en/mlflow3/genai/tracing/concepts/log-assessment">Databricks documentation</a>). So, feel free to customize this to what your evaluation system needs.</p><p>Since this is a form, once we hit submit, we will need to respond to another Slack event as well. This will create another Slack event called &#8220;view&#8221;. This is where we actually handle the feedback submission and use <code>mlflow.log_feedback().</code> For your review app, you can also log expectations (aka ground truth) using another function <code>log_expectations()</code>.</p><h2>6. Linking everything to a labeling session</h2><p>We still have not linked these traces to a labeling session. To do so, we fetch the run ID associated with the labeling session and the trace_id:</p><pre><code>def link_traces_to_run(run_id: str, trace_ids: List[str]) -&gt; Dict[str, Any]:
    creds = get_databricks_host_creds()
    url = _get_mlflow_api_url(&#8217;/traces/link-to-run&#8217;, creds=creds)
    data = {&#8217;run_id&#8217;: run_id, &#8216;trace_ids&#8217;: trace_ids}

############################
in the @app.event function: 
############################

link_traces_to_run(run_id=LABELLING_SESSION.mlflow_run_id, trace_ids=[trace_id])
logger.info(f&#8221;Traces linked to run - Run: {LABELLING_SESSION.mlflow_run_id}, Trace: {trace_id}&#8221;)
</code></pre><h2>7. Handling with conversation history</h2><p>Slack threads make conversation history management simple. Instead of requiring a database to checkpoint, we can simply fetch the threads themselves. Using the client API and the thread ID:</p><pre><code>def get_thread_messages(client, channel, thread_ts):
    response = client.conversations_replies(
        channel=channel,
        ts=thread_ts,
        inclusive=True,  # Include the parent message
        limit=10  # Max messages to retrieve
    )
    logger.info(f&#8221;Retrieved {len(response[&#8217;messages&#8217;])} messages from thread {thread_ts}&#8221;)
    return response[&#8217;messages&#8217;]</code></pre><h1>Happy reviewing!</h1><p><a href="https://github.com/veenaramesh/custom-slack-review-app">Take a look at the full implementation and code here. </a></p><p>There are no limitations in how you can use MLflow review apps! You can easily bring the feedback mechanism in MLflow to Slack, reducing any friction in the feedback process. Thanks for reading.</p>]]></content:encoded></item><item><title><![CDATA[Your Low-Code Shortcut to Production-Grade Agent on Databricks]]></title><description><![CDATA[Watch now (16 mins) | Follow our end-to-end journey building a knowledge assistant with AgentBricks and establishing a continuous evaluation loop with MLflow to improve it.]]></description><link>https://www.databricksters.com/p/your-low-code-shortcut-to-production</link><guid isPermaLink="false">https://www.databricksters.com/p/your-low-code-shortcut-to-production</guid><dc:creator><![CDATA[Canadian Data Guy]]></dc:creator><pubDate>Tue, 18 Nov 2025 16:02:42 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/179205898/d0c9755576a97a68d744cbf23dc56c4d.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>The code base is available at https://github.com/jiteshsoni/BrickBrain, but you likely don&#8217;t need it. You can simply use the Databricks UI to create your agent.</p>]]></content:encoded></item><item><title><![CDATA[The Goldilocks Approach: Hierarchical Classification with AI_QUERY in Databricks]]></title><description><![CDATA[Leverage Databricks `AI_QUERY` to tackle complex, context-dependent hierarchical classification problems that traditional ML and simple LLM prompts cannot solve.]]></description><link>https://www.databricksters.com/p/the-goldilocks-approach-hierarchical</link><guid isPermaLink="false">https://www.databricksters.com/p/the-goldilocks-approach-hierarchical</guid><dc:creator><![CDATA[Mandy Baker]]></dc:creator><pubDate>Tue, 11 Nov 2025 16:01:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wbqC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wbqC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wbqC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 424w, https://substackcdn.com/image/fetch/$s_!wbqC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 848w, https://substackcdn.com/image/fetch/$s_!wbqC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!wbqC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wbqC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png" width="284" height="426" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1536,&quot;width&quot;:1024,&quot;resizeWidth&quot;:284,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wbqC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 424w, https://substackcdn.com/image/fetch/$s_!wbqC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 848w, https://substackcdn.com/image/fetch/$s_!wbqC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!wbqC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F848c03a3-6226-4911-8897-83ebe37ad820_1024x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Classification Machine Learning problems have been around for a long time. Traditional approaches like decision trees and SVMs have dominated the landscape, and with the rise of NLP, we gained powerful tools for text classification using techniques like TF-IDF and word embeddings. Now, we have Large Language Models that understand context and nuance in ways that traditional models often can&#8217;t match. This opens up opportunities to automate classification processes, especially in complex scenarios where rigid classification rules break down, such as hierarchical classification.</p><p>How can you take advantage of the latest frontier in classification? This blog will show you how to use AI_QUERY in Databricks to run batch inference for complex hierarchical classifications with LLMs.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>As always, choose the best approach for your problem! There are many occasions when a classical approach is a better choice. However, if:</p><ul><li><p><strong>Your historical data is not trustworthy</strong> (e.g. it may be inconsistent, difficult to validate, or labeled by multiple people with different interpretations; you can&#8217;t trust that the patterns held therein are going to be useful in a classical machine learning setup)</p></li><li><p><strong>Your categories are flexible </strong>(e.g. you need to be able to add a new category on the whims of your business partners without retraining an entire model)</p></li><li><p><strong>You&#8217;re dealing with nuanced, context-dependent classifications</strong> (e.g. rigid rules fail; maybe &#8220;billing dispute&#8221; vs &#8220;billing inquiry&#8221; depends on subtle linguistic cues that are difficult to encode as features)</p></li></ul><p>&#8230; then read on to learn how to use AI_QUERY for hierarchical classification!</p><h2><strong>The Goal</strong></h2><p>Let&#8217;s say that we&#8217;re a telecommunications company and we have a lot of customer call transcripts coming in that we want to classify along four levels: <strong>Domain (Level 1) &#8594; Category (Level 2) &#8594; Problem Type (Level 3) &#8594; Root Cause (Level 4)</strong>. This hierarchy creates a comprehensive support taxonomy covering everything from network outages and billing disputes to device issues and order management, so our business team can gain a lot of insights once all of these transcripts are classified correctly. The only obstacle is actually classifying these transcripts. How should we do that?</p><p>If you&#8217;re already familiar with the power of <a href="https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_query">AI_QUERY</a>, which allows you to query LLM endpoints via SQL for batch workloads, you could jump right into the Databricks SQL editor or a notebook and use the SQL function right away, relying on a monster prompt that attempts to take each record and assign the right hierarchy in one shot.</p><h2><strong>The Problem with the &#8220;Everything at Once&#8221; Approach</strong></h2><p>Unfortunately, if we have 12 Domains, and each Domain contains 5-6 Categories, and each Category has 3+ Problem Types, and each Problem Type has 5+ Root Causes, we have at minimum 900 hierarchical paths that a single transcript could take.</p><p>Combinatorial explosion! From a statistical perspective, we&#8217;re asking an LLM to perform a 900-class classification problem&#8212;the kind of task where even specialized neural networks start sweating.</p><p>There are a couple of issues with treating each of the 900 options as its own unique value:</p><ol><li><p><strong>More choices = worse outcomes:</strong> Imagine a 900x900 confusion matrix; a classification problem of this size will very likely result in poor precision and poor recall.</p></li><li><p><strong>The &#8220;needle in a haystack&#8221; problem:</strong> When you present an LLM with a massive list of options, accuracy degrades significantly. False Negatives are easy when there are 899 other options, some of which sound very similar.</p></li><li><p><strong>Prompt complexity becomes unmaintainable (and more expensive):</strong> Your prompt becomes a short story, making it difficult to debug, version control, and understand what instructions the model is actually following - and on top of that you&#8217;re paying for all those tokens each time you send a request!</p></li></ol><h2><strong>What About Individual Level Classification?</strong></h2><p>On the other hand, if we look at each level individually, we greatly reduce the number of options per record. We only need to predict across 12 Domains, for example, and once we have the Domain, we only need to predict across 5 or 6 Categories, and so on down the levels. This approach will improve shrink the prediction space and likely improve accuracy, precision, and recall. The downside here is that we&#8217;re now running four sequential SQL queries. Depending on the model we&#8217;re using, this could get expensive and/or slow.</p><p>By now you might be thinking, &#8220;So that porridge is too cold, and this one is too hot&#8230; How do I use AI_QUERY for hierarchical classification???&#8221;</p><h2><strong>The Goldilocks Solution: Hierarchically-Aware Classification</strong></h2><p>To get just-right porridge, we can find a balance of the two approaches above by running two queries: the first will use a simple prompt to classify Level 1, and then the second will use a dynamic prompt to classify Levels 2-4. Here&#8217;s our recipe:</p><ol><li><p><strong>Create a hierarchies table</strong></p></li><li><p><strong>Build a gold-standard evaluation dataset</strong></p></li><li><p><strong>Execute your Domain (Level 1) prompt</strong></p></li><li><p><strong>Execute your Levels 2-4 prompt (with dynamic hierarchy filtering)</strong></p></li></ol><h3><strong>Step 1: Build Your Hierarchy Table</strong></h3><p>First, we need to build a table of our classification hierarchies. This table will be the reference for our second dynamic AI_QUERY prompt. It should look something like the schema here:</p><pre><code><code>transcript_classification_hierarchy_table
&#9500;&#9472;&#9472; level_1 (string)             -- Domain
&#9500;&#9472;&#9472; level_2_map (string)         -- Category
&#9500;&#9472;&#9472; level_3_map (string)         -- Problem Type
&#9492;&#9472;&#9472; level_4_map (string)         -- Root Cause</code></code></pre><p>Each row represents a valid path through your hierarchy. This is your source of truth that ensures we only provide valid paths to the LLM.</p><p>Here&#8217;s a sample record:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5s35!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5s35!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 424w, https://substackcdn.com/image/fetch/$s_!5s35!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 848w, https://substackcdn.com/image/fetch/$s_!5s35!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 1272w, https://substackcdn.com/image/fetch/$s_!5s35!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5s35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png" width="1456" height="286" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:286,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:189358,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/177490556?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5s35!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 424w, https://substackcdn.com/image/fetch/$s_!5s35!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 848w, https://substackcdn.com/image/fetch/$s_!5s35!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 1272w, https://substackcdn.com/image/fetch/$s_!5s35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6830b421-4ffc-4e1f-bede-f0481bbb6ae5_2036x400.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3><strong>Step 2: Build Your Evaluation Dataset</strong></h3><p>Next, we need to make sure we have a solid evaluation dataset. If possible, grab an unsuspecting nearby SME and beg them to help you build this dataset.</p><p>The number of examples depends on how many categories you have in your dataset, but aim for at least 50-100 labeled examples covering your major categories. Try to avoid a scenario where you spend a lot of time prompt engineering, only to realize your ground truths are not all that accurate to begin with and you&#8217;ve been tuning your prompt in the wrong direction (speaking from personal experience, this is not fun).</p><p><strong>Bonus:</strong> Having a gold-standard evaluation dataset allows you to continually try new models and model versions as they&#8217;re released and use the cheapest, fastest version that passes your evaluation metrics. This is deeply unsexy work, but it is possibly the most valuable.</p><h3><strong>Step 3: Classify Level 1 (Domain)</strong></h3><p>Once you have your hierarchical table and evaluation dataset, we&#8217;re ready to run our first AI_QUERY SQL statement: classifying Level 1. In this first batch inference round, we focus only on classifying the Domain. Since all the hierarchies stem from this first choice, getting it right is critical&#8212;but with only 12 options, our LLM has a much better chance of getting the classifications right. Here&#8217;s a sample query, using Llama 3.3 70b. We concatenate the prompt, which consists of the 12 Domain options, as well as the transcript itself and an instruction to only return the category name.</p><pre><code><code>%sql
CREATE OR REPLACE TABLE catalog.schema.call_transcripts_l1_predictions AS
SELECT
   call_id,
   transcript_text,
   AI_QUERY(&#8216;databricks-meta-llama-3-3-70b-instruct&#8217;,
      CONCAT(
       &#8216;Classify the transcript into one of the following categories:
       - Network &amp; Connectivity
       - Billing &amp; Payments
       - Account Management
       - Device &amp; Equipment
       - Service Provisioning
       - Technical Support
       - Mobile Services
       - Internet Services
       - TV &amp; Streaming
       - Voice Services
       - Security &amp; Privacy
       - Sales &amp; Orders\\n\\n&#8217;,
       &#8216;Transcript: &#8216;, transcript_text, &#8216;\\n\\n&#8217;,
       &#8216;Return only the category name.&#8217;     
    )  
  ) 
AS level_1_classification
FROM catalog.schema.call_transcripts_raw</code></code></pre><p>Spend time here refining the prompt and testing out a variety of models to get high accuracy at Level 1 before moving on to the next step. In this example, my prompt is very simple: I don&#8217;t even bother defining my categories. In your own use cases, adding in descriptions of each category may be beneficial.</p><h3><strong>Step 4: Classify Levels 2-4</strong></h3><p>Once we&#8217;re happy with the Level 1 accuracy, we create our second AI_QUERY SQL statement. This time, we will write a much more complex prompt that 1) builds upon the L1 classifications and 2) joins with the hierarchical mappings from Step 1. This way, our prompt only includes the valid options for Levels 2 through 4 that correspond to the predicted Level 1 Domain.</p><pre><code>%sql
CREATE OR REPLACE TABLE catalog.schema.call_transcripts_all_classifications AS
SELECT 
  call_id,
  transcript_text,
  l1.level_1_classification,
  AI_QUERY(&#8216;databricks-meta-llama-3-3-70b-instruct&#8217;,
    CONCAT(
      &#8216;Classify the call transcript into Level 2, Level 3, and Level 4 subcategories. Strictly adhere to the hierarchies as listed below.\n\n&#8217;,
      &#8216;Transcript: &#8216;, l1.transcript_text, &#8216;\n&#8217;,
      &#8216;Level 1: &#8216;, l1.level_1_classification, &#8216;\n\n&#8217;,
      &#8216;Valid Level 2 options: &#8216;, hier.level_2_map, &#8216;\n\n&#8217;,
      &#8216;Valid Level 3 options by Level 2: &#8216;, hier.level_3_map, &#8216;\n\n&#8217;,
      &#8216;Valid Level 4 options by Level 3: &#8216;, hier.level_4_map, &#8216;\n\n&#8217;,
      &#8216;Return ONLY valid JSON (no markdown): {&#8221;level_2&#8221;: &#8220;X&#8221;, &#8220;level_3&#8221;: &#8220;Y&#8221;, &#8220;level_4&#8221;: &#8220;Z&#8221;}\n&#8217;
    )
  ) AS classification_json
FROM catalog.schema.call_transcripts_l1_predictions l1
INNER JOIN catalog.schema.transcript_classification_hierarchy_table hier
   ON l1.level_1_classification = hier.level_1</code></pre><p>The query above might feel a bit abstract, so let&#8217;s take a look at what the prompt will actually look like for the L1 Domain &#8220;Billing &amp; Payments&#8221;:</p><pre><code><code>Classify the call transcript into Level 2, Level 3, and Level 4 subcategories. Strictly adhere to the hierarchies as listed below.

Transcript: [transcript not shown for brevity]

Level 1: Billing &amp; Payments

Valid Level 2 options: Auto-Pay, Invoice Issues, Payment Plans, Payment Processing, Refunds

Valid Level 3 options by Level 2: {&#8221;Invoice Issues&#8221;: [&#8221;Cannot Access Invoice&#8221;, &#8220;Incorrect Charges&#8221;, &#8220;Missing Credits&#8221;], &#8220;Payment Processing&#8221;: [&#8221;Duplicate Payment&#8221;, &#8220;Payment Declined&#8221;, &#8220;Payment Not Posted&#8221;], &#8220;Auto-Pay&#8221;: [&#8221;Cannot Disable&#8221;, &#8220;Not Working&#8221;, &#8220;Setup Failed&#8221;], &#8220;Payment Plans&#8221;: [&#8221;Application Denied&#8221;, &#8220;Early Payoff Request&#8221;, &#8220;Missed Payment&#8221;], &#8220;Refunds&#8221;: [&#8221;Incorrect Amount&#8221;, &#8220;Not Received&#8221;, &#8220;Request Denied&#8221;]}

Valid Level 4 options by Level 3: {&#8221;Incorrect Charges&#8221;: [&#8221;Double Billing&#8221;, &#8220;Proration Error&#8221;, &#8220;Service Not Ordered&#8221;, &#8220;Tax Calculation Error&#8221;, &#8220;Wrong Rate Applied&#8221;], &#8220;Missing Credits&#8221;: [&#8221;Adjustment Not Posted&#8221;, &#8220;Manual Credit Not Entered&#8221;, &#8220;Promotion Not Applied&#8221;, &#8220;Refund Not Processed&#8221;, &#8220;System Processing Delay&#8221;], &#8220;Cannot Access Invoice&#8221;: [&#8221;Account Access Restricted&#8221;, &#8220;Email Not Received&#8221;, &#8220;PDF Generation Error&#8221;, &#8220;Portal Login Issue&#8221;, &#8220;System Maintenance&#8221;], &#8220;Payment Declined&#8221;: [&#8221;Bank Fraud Detection&#8221;, &#8220;Card Expired&#8221;, &#8220;Incorrect Card Information&#8221;, &#8220;Insufficient Funds&#8221;, &#8220;Payment Gateway Error&#8221;], &#8220;Payment Not Posted&#8221;: [&#8221;Bank Processing Time&#8221;, &#8220;Manual Entry Required&#8221;, &#8220;Processing Delay&#8221;, &#8220;System Synchronization Issue&#8221;, &#8220;Wrong Account Number Entered&#8221;], &#8220;Duplicate Payment&#8221;: [&#8221;Auto-Pay and Manual Payment&#8221;, &#8220;Browser Refresh Error&#8221;, &#8220;Customer Submitted Twice&#8221;, &#8220;Multiple Payment Methods Active&#8221;, &#8220;System Glitch&#8221;], &#8220;Not Working&#8221;: [&#8221;Auto-Pay Disabled&#8221;, &#8220;Bank Account Closed&#8221;, &#8220;Insufficient Funds&#8221;, &#8220;Payment Method Expired&#8221;, &#8220;System Configuration Error&#8221;], &#8220;Setup Failed&#8221;: [&#8221;Account Not Eligible&#8221;, &#8220;Bank Verification Failed&#8221;, &#8220;Incompatible Payment Type&#8221;, &#8220;Invalid Account Information&#8221;, &#8220;Portal Technical Issue&#8221;], &#8220;Cannot Disable&#8221;: [&#8221;Balance Owed&#8221;, &#8220;Contractual Requirement&#8221;, &#8220;Pending Transaction&#8221;, &#8220;Portal Access Issue&#8221;, &#8220;System Processing Lag&#8221;], &#8220;Application Denied&#8221;: [&#8221;Account Not Eligible&#8221;, &#8220;Balance Too Low&#8221;, &#8220;Credit Check Failed&#8221;, &#8220;Existing Plan Active&#8221;, &#8220;Previous Default History&#8221;], &#8220;Missed Payment&#8221;: [&#8221;Auto-Pay Not Set Up&#8221;, &#8220;Customer Forgot&#8221;, &#8220;Financial Hardship&#8221;, &#8220;Payment Date Confusion&#8221;, &#8220;Payment Method Failed&#8221;], &#8220;Early Payoff Request&#8221;: [&#8221;Approved With Fee&#8221;, &#8220;Approved Without Fee&#8221;, &#8220;Denied Per Terms&#8221;, &#8220;Requires Balance Verification&#8221;, &#8220;Under Review&#8221;], &#8220;Not Received&#8221;: [&#8221;Check Lost in Mail&#8221;, &#8220;Processing Time Not Elapsed&#8221;, &#8220;Refund Method Changed&#8221;, &#8220;System Error&#8221;, &#8220;Wrong Bank Account&#8221;], &#8220;Incorrect Amount&#8221;: [&#8221;Calculation Error&#8221;, &#8220;Credits Applied First&#8221;, &#8220;Fees Deducted&#8221;, &#8220;Partial Refund Per Policy&#8221;, &#8220;Tax Adjustment&#8221;], &#8220;Request Denied&#8221;: [&#8221;Contractual Terms&#8221;, &#8220;Outside Refund Window&#8221;, &#8220;Previous Refund Given&#8221;, &#8220;Promotional Restriction&#8221;, &#8220;Service Already Used&#8221;]}

Return ONLY valid JSON (no markdown): {&#8221;level_2&#8221;: &#8220;X&#8221;, &#8220;level_3&#8221;: &#8220;Y&#8221;, &#8220;level_4&#8221;: &#8220;Z&#8221;}</code></code></pre><p>Now, instead of asking the LLM to consider 900 paths, we&#8217;re saying, &#8220;Hey, you already decided this is a &#8216;Network &amp; Connectivity&#8217; issue. Here are the valid paths within that domain. Pick one.&#8221; We&#8217;ve gone from a 900-class problem to a ~75-class problem. Our accuracy, precision, and recall will benefit.</p><h3><strong>Now We Have Our Hierarchical Classifications!</strong></h3><p>With this approach, you get:</p><ul><li><p><strong>Improved accuracy</strong> by reducing overall cognitive load on the LLM</p></li><li><p><strong>Lower token costs</strong> by only including relevant options in each prompt</p></li><li><p><strong>Valid hierarchies</strong> that make sense together</p></li><li><p><strong>Easier debugging and iteration</strong> when you evaluate Level 1 accuracy independently</p></li><li><p><strong>Flexibility</strong> to add new categories without rewriting your entire classification logic</p></li></ul><h2><strong>Wrapping Up</strong></h2><p>AI_QUERY in Databricks makes it remarkably easy to leverage LLMs for complex classification tasks without managing API calls, rate limits, or infrastructure. By combining it with a hierarchically-aware approach, you can tackle complex classification problems. Break down the complexity, give your LLM manageable choices at each step, and let the structure of your domain guide the way.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.databricksters.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Databricksters! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>