Data Science Landing Zone
🚧 This capability reference page is a draft.
Data science teams in your organization that want to run workloads such as AI or ML models may not have dedicated software engineering or cloud infrastructure engineering skills among their members. By providing a data science landing zone that allows only a small subset of relevant cloud services via Service and Location Restrictions, you can give these teams a safe and productive environment for running interactive computing environments like Jupyter notebooks against large data lakes and cloud-based data warehouses. Going even further, cloud foundation teams can design a data science landing zone that also grants access to dedicated cloud infrastructure, such as GPUs for training models or the capacity to scale compute rapidly.
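As a rough illustration of the idea behind a service and location restriction, the following Python sketch models a landing zone as an allowlist and checks requests against it. The names `LandingZonePolicy`, `is_allowed` and `DATA_SCIENCE_POLICY`, as well as the service and region values, are hypothetical and not tied to any cloud provider's policy engine. In practice such restrictions would typically be enforced with the provider's native policy mechanism rather than application code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LandingZonePolicy:
    """Allowlist of cloud services and locations (hypothetical model)."""

    allowed_services: frozenset = field(default_factory=frozenset)
    allowed_locations: frozenset = field(default_factory=frozenset)

    def is_allowed(self, service: str, location: str) -> bool:
        # A request passes only if both the service and the location are allowlisted.
        return (
            service in self.allowed_services
            and location in self.allowed_locations
        )


# Example values are assumptions for illustration only.
DATA_SCIENCE_POLICY = LandingZonePolicy(
    allowed_services=frozenset(
        {"notebooks", "data-warehouse", "object-storage", "gpu-compute"}
    ),
    allowed_locations=frozenset({"europe-west1", "europe-west3"}),
)

if __name__ == "__main__":
    # An allowlisted service in an allowlisted region passes the check.
    print(DATA_SCIENCE_POLICY.is_allowed("notebooks", "europe-west1"))
    # A service outside the data science subset is rejected.
    print(DATA_SCIENCE_POLICY.is_allowed("kubernetes", "europe-west1"))
```

Keeping the policy as a small, explicit allowlist mirrors the intent described above: anything not named is unavailable, so teams without infrastructure expertise cannot accidentally reach services outside the intended scope.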
🌤️ Based on our experience, data science landing zones are most useful for developing and testing models. Production models are often run by dedicated teams with significant software and operations experience, together with other workloads, e.g. as part of an application living in a Cloud-native Landing Zone.
Here are some examples of simple landing zone designs for data science workloads.
GCP Example
a central BigQuery data warehouse
different data science teams each receive their own GCP project and read-only access to the data warehouse, either as part of the landing zone or as an optional Managed Data Lake access service
analysts can run their own queries directly from the Google Cloud Console, through Looker dashboards or with third-party solutions
data science teams are charged transparently for all the BigQuery jobs they run, enabling Chargeback through consumption cost allocation via a Monthly cloud tenant billing report
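The consumption-based chargeback in the last point can be sketched as a simple aggregation per project. This is a minimal illustration, assuming that job records with a project ID and a bytes-billed figure have already been exported from the platform. The function name `monthly_chargeback`, the sample project IDs and the rate `ASSUMED_PRICE_PER_TIB` are hypothetical; actual prices depend on region and pricing model and should be taken from the provider's billing data.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical on-demand rate per TiB billed; not an actual list price.
ASSUMED_PRICE_PER_TIB = Decimal("6.25")
BYTES_PER_TIB = Decimal(2) ** 40


def monthly_chargeback(jobs):
    """Estimate query cost per project from (project_id, bytes_billed) records."""
    bytes_per_project = defaultdict(int)
    for project_id, bytes_billed in jobs:
        bytes_per_project[project_id] += bytes_billed
    # Convert the summed bytes to TiB, apply the assumed rate, round to cents.
    return {
        project: (
            Decimal(total) / BYTES_PER_TIB * ASSUMED_PRICE_PER_TIB
        ).quantize(Decimal("0.01"))
        for project, total in bytes_per_project.items()
    }


# Made-up job records for two hypothetical data science projects.
SAMPLE_JOBS = [
    ("ds-team-a", 2 * 2**40),
    ("ds-team-b", 2**38),
    ("ds-team-a", 2**40),
]

if __name__ == "__main__":
    for project, cost in sorted(monthly_chargeback(SAMPLE_JOBS).items()):
        print(f"{project}: {cost}")
```

Because each team works in its own project, grouping by project ID is enough to attribute costs to a team without any additional tagging.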
Related Tools
Currently no tool implementations documented. Contributions welcome!