<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Databricksters: Brickette]]></title><description><![CDATA[Because these are smaller nuggets of wisdom]]></description><link>https://www.databricksters.com/s/brickette</link><image><url>https://substackcdn.com/image/fetch/$s_!zPJJ!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff49ecae-7c56-403c-9389-61b28de6a50f_1280x1280.png</url><title>Databricksters: Brickette</title><link>https://www.databricksters.com/s/brickette</link></image><generator>Substack</generator><lastBuildDate>Tue, 30 Jun 2026 18:23:24 GMT</lastBuildDate><atom:link href="https://www.databricksters.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Soni]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[databricksters@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[databricksters@substack.com]]></itunes:email><itunes:name><![CDATA[Canadian Data Guy]]></itunes:name></itunes:owner><itunes:author><![CDATA[Canadian Data Guy]]></itunes:author><googleplay:owner><![CDATA[databricksters@substack.com]]></googleplay:owner><googleplay:email><![CDATA[databricksters@substack.com]]></googleplay:email><googleplay:author><![CDATA[Canadian Data Guy]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why Your information_schema Queries Are Killing the Metastore]]></title><description><![CDATA[And the 3 Functions to Use Instead]]></description><link>https://www.databricksters.com/p/why-your-information_schema-queries</link><guid 
isPermaLink="false">https://www.databricksters.com/p/why-your-information_schema-queries</guid><dc:creator><![CDATA[Canadian Data Guy]]></dc:creator><pubDate>Tue, 30 Jun 2026 15:01:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!isGN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Sometimes engineers want to check if a table exists before writing or merging data. To do this, they often loop through a list of tables and query <code>information_schema.tables</code>.</p><div><hr></div><h2>&#10060; The Anti-Pattern: Full Catalog Scans in a Loop</h2><p>Here is the code that can cause pipelines to hit metastore rate limits or fail entirely:</p><pre><code><code>from pyspark.sql.functions import col

# This forces a full scan of the entire metastore catalog on every execution
spark.read.table("catalog.information_schema.tables") \
    .filter((col("table_schema") == my_schema) &amp; (col("table_name") == my_table))
</code></code></pre><h3><span>The Technical Bottlene</span>ck</h3><p>The <code>information_schema</code> is built for wide-scope catalog introspection&#8212;such as <span>auditing, broad governance mapping, or checking metadata across an entire environment at once.</span></p><p><span>When you chain a </span><code>.filter()</code><span> in Spark </span><strong><span>after</span></strong><span> loading </span><code>information_schema.tables</code><span>, there is </span><strong><span>zero predicate pushdown</span></strong><span> to the underlying metastore database. The metast</span>ore is forced <span>to enumerate and return </span><em><span>every single object</span></em><span> inside that catalog to the Spark engine, leaving Spark to do the actual row filtering client-side in memory.</span></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!isGN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!isGN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 424w, https://substackcdn.com/image/fetch/$s_!isGN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 848w, https://substackcdn.com/image/fetch/$s_!isGN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 1272w, 
https://substackcdn.com/image/fetch/$s_!isGN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!isGN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg" width="1456" height="863" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:863,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5561,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/svg+xml&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.databricksters.com/i/203766763?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!isGN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 424w, https://substackcdn.com/image/fetch/$s_!isGN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 848w, 
https://substackcdn.com/image/fetch/$s_!isGN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 1272w, https://substackcdn.com/image/fetch/$s_!isGN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555d1579-2cf7-46e9-8ad7-2e34682e62ef_1080x640.svg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><span>The Solution: Use True Point-Lookup APIs</span></h2><p><span>If you only need to verify the existence of a single table, you 
must bypass catalog-wide views and target a single metadata record directly.</span></p><h3><span>Option 1: Programmatic Check via </span><code>tableExists()</code><span> (Recommended)</span></h3><p><span>For pure automation pipelines, the native Spark catalog API handles this without compiling a heavy relational query plan.</span> It looks up a single metadata record rather than enumerating the whole catalog.</p><pre><code><code># Target a single metadata record directly
exists = spark.catalog.tableExists("catalog.schema.table")
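
# A minimal sketch of running the same check over several tables.
# `find_missing_tables` and the table names below are hypothetical;
# `table_exists` is assumed to be any callable that takes a table name
# and returns a bool, such as spark.catalog.tableExists.
def find_missing_tables(table_exists, table_names):
    # One call per name, keeping only the names that do not resolve
    return [name for name in table_names if not table_exists(name)]

# Usage:
# missing = find_missing_tables(
#     spark.catalog.tableExists,
#     ["catalog.schema.orders", "catalog.schema.customers"],
# )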
</code></code></pre><h3><span>Option 2: SQL Scoped Check via </span><code>SHOW TABLES</code></h3><p><span>If you are working strictly within raw SQL strings, limit your query scope with an explicit schema and a </span><code>LIKE</code> pattern on the table name rather than querying the entire catalog. If no rows come back, no matching table exists in that schema.</p><pre><code><code>-- Fast, schema-scoped point lookup
SHOW TABLES IN catalog.schema LIKE 'table_name';
</code></code></pre><h3>Option 3: Structural Validation via <code>DESCRIBE TABLE</code></h3><p>Often, you don&#8217;t just need to know if a table exists; you also need to ensure its current column layout matches what your engine expects. Run <code>DESCRIBE TABLE</code> inside a <code>try</code>/<code>except</code> block: a missing table raises <code>AnalysisException</code>, and a successful call returns the column names and types to compare.</p><pre><code><code>from pyspark.sql.utils import AnalysisException

# Confirms the table resolves and keeps its column layout for inspection
try:
    described = spark.sql("DESCRIBE TABLE catalog.schema.table")
    exists = True
except AnalysisException:
    # Catch AnalysisException specifically. A bare 'except' turns
    # permission errors and transient timeouts into false "table missing" signals.
    exists = False
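
# A minimal sketch of the structural comparison, not a definitive implementation.
# `find_schema_mismatches` and `expected` are hypothetical names; `described_rows`
# is assumed to be the collected DESCRIBE TABLE output, where each row exposes
# "col_name" and "data_type".
def find_schema_mismatches(described_rows, expected):
    actual = {}
    for row in described_rows:
        name = row["col_name"]
        # Skip blank rows and "# ..." section headers in the DESCRIBE output
        if name and not name.startswith("#"):
            actual[name] = row["data_type"]
    # Map each mismatched column to (expected type, actual type or None if absent)
    return {
        name: (wanted, actual.get(name))
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }

# Usage:
# rows = spark.sql("DESCRIBE TABLE catalog.schema.table").collect()
# mismatches = find_schema_mismatches(rows, {"id": "int", "data": "string"})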
</code></code></pre><h2>When Do You Actually Need <code>information_schema</code>?</h2><p>If your objective is broad discovery, such as mapping end-to-end data lineage across the entire workspace, using <code>information_schema</code> is completely appropriate. However, you should follow this single operational rule: <strong>Read it exactly once at the top of the job, cache it, and filter from memory.</strong></p><pre><code><code>from pyspark.sql.functions import col

# Execute exactly one metastore scan total; cache() is lazy, so the scan
# runs on the first action and later actions reuse the cached rows
all_tables = spark.read.table("catalog.information_schema.tables").cache()

# Filter from the cached memory layer as many times as needed
bronze_tables = all_tables.filter(col("table_schema") == "bronze")
silver_tables = all_tables.filter(col("table_schema") == "silver")
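
# A minimal sketch of "filter from memory" in plain Python, assuming the table
# list fits comfortably on the driver. `build_table_index` is a hypothetical
# helper; `rows` is assumed to be collected rows exposing "table_schema" and
# "table_name".
def build_table_index(rows):
    # A set of (schema, table) pairs gives constant-time existence checks
    return {(row["table_schema"], row["table_name"]) for row in rows}

# Usage, built from a single read of information_schema.tables:
# rows = (
#     spark.read.table("catalog.information_schema.tables")
#     .select("table_schema", "table_name")
#     .collect()
# )
# index = build_table_index(rows)
# has_orders = ("bronze", "orders") in index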
</code></code></pre><div><hr></div><h2>The Ultimate Fix: Cut the Check Entirely</h2><p>Pre-flight existence checks usually exist because a pipeline doesn&#8217;t know whether to create a new table or append to an existing one. If you are using Delta Lake, Delta already solves this for you.</p><p>Using <code>.mode("append")</code> will automatically create the table on the first write, and seamlessly append to it on every run after:</p><pre><code><code>df.write \
  .format("delta") \
  .mode("append") \
  .saveAsTable("catalog.schema.target_table")
</code></code></pre><p>SQL pipelines get the exact same guarantee from a single DDL line at setup:</p><pre><code><code>CREATE TABLE IF NOT EXISTS catalog.schema.target_table (
    id INT,
    data STRING,
    event_time TIMESTAMP
) USING DELTA;

INSERT INTO catalog.schema.target_table SELECT * FROM staging_view;
</code></code></pre><blockquote><p><strong>The Bottom Line:</strong> When you see an existence check before a write in a code review, ask what it&#8217;s actually guarding against. Usually, there&#8217;s no good answer&#8212;and removing it will save your metastore from a world of hurt.</p></blockquote>]]></content:encoded></item></channel></rss>