5 Comments
Canadian Data Guy:

I think one of the best ways to do this is to use instance roles. That way there is no need to pass any credentials at all. Another thing you can look at is Python custom data sources. Databricks launched around 50 connectors to external systems, and I'd suggest looking at that code to see how other people have handled credentials. Here is one of my other blogs that may help; it uses the same Python custom data sources: https://www.canadiandataguy.com/p/stop-waiting-for-connectors-stream?r=5ehbt&utm_campaign=post&utm_medium=web
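
A minimal sketch of that pattern, assuming Spark 4.0+ (or a recent Databricks Runtime) with the Python Data Source API; the source name, option key, and env var are illustrative:

```python
import os
from pyspark.sql.datasource import DataSource, DataSourceReader

class MyApiDataSource(DataSource):
    @classmethod
    def name(cls):
        return "myapi"

    def schema(self):
        return "id INT, payload STRING"

    def reader(self, schema):
        return MyApiReader(self.options)

class MyApiReader(DataSourceReader):
    def __init__(self, options):
        # Take the credential from a read option, falling back to the
        # environment. With an instance role attached to the cluster, an
        # AWS client created here (e.g. boto3) would pick up credentials
        # automatically, so nothing needs to be passed at all.
        self.token = options.get("token") or os.environ.get("MYAPI_TOKEN")

    def read(self, partition):
        # A real connector would call the external API with self.token here.
        yield (1, "example")

# Register and use:
# spark.dataSource.register(MyApiDataSource)
# df = spark.read.format("myapi").option("token", "...").load()
```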

Also, could you clarify how you found my original blog? Was it ChatGPT, a Google search, or social media?

Saugat Mukherjee:

Thanks for this. I have been looking for a blog on this beyond the bare-skeleton docs 😊. This is very well written.

Did you ever manage to read Spark env variables inside your Python UDF, especially when using a UC persisted Python batch UDF? Interestingly, even though the Python UDF runs on the executors, even setting spark.executorEnv doesn't work.
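
Roughly the pattern I mean, with illustrative names. In open-source Spark, spark.executorEnv.* is normally forwarded to the executors' Python workers, but as described above it does not surface inside a UC persisted batch UDF:

```python
import os
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# spark.executorEnv.* must be set when the cluster/session is created.
spark = (
    SparkSession.builder
    .config("spark.executorEnv.UC_CRED_NAME", "my-dev-credential")
    .getOrCreate()
)

@udf(returnType=StringType())
def which_cred():
    # Runs in the executor's Python worker; with a session-scoped UDF
    # like this one, the variable is normally visible here.
    return os.environ.get("UC_CRED_NAME", "<not set>")

spark.range(1).select(which_cred().alias("cred")).show()
```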

Canadian Data Guy:

Are you trying to store credentials in Spark env variables and then make API calls?

Saugat Mukherjee:

Exactly! Except that the credential in this case is the Unity Catalog service credential name, which differs across environments. So I was hoping I could keep the same persisted function definition and just pass a different credential name through the Spark env.
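
Roughly what I am hoping for, sketched with illustrative catalog, schema, and env var names; per the above, the variable is reportedly not visible inside the UDF's sandbox:

```python
# One persisted function definition shared across environments; only the
# UC service credential name would differ, supplied via spark.executorEnv.
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.resolve_cred_name()
RETURNS STRING
LANGUAGE PYTHON
AS $$
import os
# Hoped-for behavior: read the per-environment credential name here.
# In practice this returns the fallback, as discussed in this thread.
return os.environ.get("UC_CRED_NAME", "not visible")
$$
""")
```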