Google Cloud has arrived at its Data Cloud Summit event with several announcements. Among them is the preview of a new data lake storage engine, BigLake, released as part of Google Cloud's plans to remove data-related boundaries, break down the barriers between data lakes and data warehouses, and make it easier to analyze the data they hold.
BigLake is designed to provide a unified interface to any storage layer, whether data lake or data warehouse, regardless of format. It takes Google's experience running and managing its BigQuery data warehouse and extends it to data lakes on Google Cloud Storage. The aim is to combine the best of data lakes and data warehouses in a single service that abstracts away the underlying formats and storage systems.
The data can reside in BigQuery, or in Amazon S3 or Azure Data Lake Storage Gen2. Through BigLake, developers gain access to a uniform storage engine and the ability to query the underlying data storage services through a single system, without the need to move or duplicate data.
Using policy tags, BigLake allows administrators to configure their security policies at the table, row, and column levels. This covers data stored in Google Cloud as well as in the two supported third-party systems. Google's multicloud analytics service, BigQuery Omni, is responsible for enforcing these security controls, which ensure that only the appropriate data flows to tools such as Spark, Presto or TensorFlow. The service also integrates with Google Dataplex to offer additional data management features.
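To make the three levels of control concrete, here is a minimal Python sketch of the idea. It is not BigLake's API: the `TablePolicy` class, the `read_table` function, the role names and the sample data are all hypothetical, invented only to illustrate how table, row and column rules can combine before data reaches a downstream tool.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class TablePolicy:
    """Hypothetical policy object; not part of any Google Cloud library."""
    # Table level: roles allowed to read the table at all.
    readers: set[str]
    # Column level: column name -> roles allowed to see it.
    # Columns without a tag are visible to every reader.
    column_tags: dict[str, set[str]] = field(default_factory=dict)
    # Row level: role -> predicate selecting the rows that role may see.
    # Roles without a filter see every row.
    row_filters: dict[str, Callable[[Row], bool]] = field(default_factory=dict)


def read_table(rows: list[Row], policy: TablePolicy, role: str) -> list[Row]:
    """Return only the rows and columns that `role` is allowed to see."""
    if role not in policy.readers:
        raise PermissionError(f"role {role!r} may not read this table")

    row_allowed = policy.row_filters.get(role, lambda row: True)
    visible = []
    for row in rows:
        if not row_allowed(row):
            continue
        # Drop any tagged column whose tag does not include this role.
        visible.append({
            col: value
            for col, value in row.items()
            if role in policy.column_tags.get(col, {role})
        })
    return visible


# Illustrative data and policy (assumed, not taken from the announcement).
ORDERS = [
    {"order_id": 1, "region": "EU", "email": "a@example.com", "total": 40},
    {"order_id": 2, "region": "US", "email": "b@example.com", "total": 75},
]

POLICY = TablePolicy(
    readers={"admin", "eu_analyst"},
    column_tags={"email": {"admin"}},
    row_filters={"eu_analyst": lambda row: row["region"] == "EU"},
)

if __name__ == "__main__":
    print(read_table(ORDERS, POLICY, "eu_analyst"))
```

In this sketch an `eu_analyst` sees only the EU order and never the `email` column, an `admin` sees everything, and any other role is rejected outright. In a real deployment these rules would be declared in the platform rather than written in application code.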
BigLake will also offer fine-grained access controls, and its API will span Google Cloud as well as open file formats such as Apache Parquet, open source processing engines such as Apache Spark or Beam, and table formats such as Delta or Iceberg.
In addition to BigLake, Google has also confirmed that its globally distributed SQL database, Spanner, will soon have a new feature: change streams. With it, users will be able to track changes to a database in real time, whether they are inserts, updates or deletes.
Google Cloud has also announced that its tool for managing the lifecycle of data science projects, Vertex AI Workbench, has finished its testing phase and is now generally available. Connected Sheets for Looker is now available as well, as is the ability to access Looker data models from its Data Studio BI tool.
Beyond the new Google Cloud features, Google has also announced the creation of the Data Cloud Alliance, which the company joins along with companies such as Confluent, Databricks, Dataiku, Deloitte, Elastic, Fivetran, MongoDB, Neo4j, Redis and Starburst.
Alliance members will provide infrastructure, APIs and integration support to ensure data accessibility and portability across multiple platforms, products and environments. They will also collaborate on new common industry data models, processes and platform integrations to improve data portability.