Feb 25, 2025

As organizations accumulate vast amounts of data, managing storage costs becomes a critical challenge.


Canadian Data Guy

Sep 12, 2025

@Yogesh: Is a partitioned table a must, or do you expect this to work with a Liquid Clustered table as well?

https://www.databricksters.com/i/157838141/step-partition-data-by-date

Super helpful

How do you avoid archiving objects under the _delta_log prefix with an S3 lifecycle policy? They sit under the same prefix as the Parquet files.

Do we really need to add predicates to OPTIMIZE? Files that were optimized previously would not be optimized again (at least AFAIK).

Thank you

Very helpful 👌

Interesting.

Very nice article

Deepthi Venkitaramanan

Feb 28, 2025

As a product manager who has to run analytics over historical data to draw conclusions, I found two sections eye-opening: partitioning by date to speed up queries, and tuning lifecycle and retention policies to minimize cost. Thanks for sharing this!

Insightful read

Archiving strategies are often overlooked, but they play a crucial role in balancing cost and performance. Loved how you highlighted the synergy between Delta Lake and cloud lifecycle policies. Looking forward to more of your expert takes!
