14 Comments
Canadian Data Guy's avatar

@Yogesh: Is the partitioned table negotiable? Do you suspect this will work with a Liquid Clustered table as well?

https://www.databricksters.com/i/157838141/step-partition-data-by-date

Geethu's avatar

super helpful

Owlcat432's avatar

How do you avoid archiving things under the _delta_log prefix in an S3 lifecycle policy? They're under the same prefix as the Parquet files.

Yuriy's avatar

Do we really need to apply OPTIMIZE predicates? Files that were optimized previously would not be optimized again (at least afaik).

thank you

Mukesh's avatar

Very helpful 👌

Rahul's avatar

Very nice article

Deepthi Venkitaramanan's avatar

As a product manager who has to run analytics tasks over historical data to arrive at conclusions, I found two sections eye-opening: the one on partitioning by date to speed up queries, and the one on lifecycle policies and retention and how to tune them to minimize cost. Thanks for sharing this!

Ranjitha R's avatar

Archiving strategies are often overlooked but play such a crucial role in balancing cost and performance. Loved how you highlighted the synergy between Delta Lake and cloud lifecycle policies. Looking forward to more of your expert takes!

Sumitha's avatar

Very well explained

Radz's avatar

Very insightful article, and helpful with real-time examples.