# gcs Table Function
Provides a table-like interface to `SELECT` and `INSERT` data from Google Cloud Storage. Requires the `Storage Object User` IAM role.
This is an alias of the s3 table function.
If you have multiple replicas in your cluster, you can use the s3Cluster function (which works with GCS) instead to parallelize inserts.
## Syntax
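The syntax block itself appears to have been lost in extraction; a sketch of the positional form, reconstructed from the arguments listed below (mirroring the `s3` table function this aliases):

```sql
gcs(url [, NOSIGN | hmac_key, hmac_secret] [, format] [, structure] [, compression_method])
```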
## Arguments
| Argument | Description |
|---|---|
| `url` | Bucket path to file. Supports the following wildcards in read-only mode: `*`, `**`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` are numbers and `'abc'`, `'def'` are strings. |
| `NOSIGN` | If this keyword is provided in place of credentials, all requests will not be signed. |
| `hmac_key` and `hmac_secret` | Keys that specify credentials to use with the given endpoint. Optional. |
| `format` | The format of the file. |
| `structure` | Structure of the table. Format: `'column1_name column1_type, column2_name column2_type, ...'`. |
| `compression_method` | Optional. Supported values: `none`, `gzip` or `gz`, `brotli` or `br`, `xz` or `LZMA`, `zstd` or `zst`. By default, the compression method is autodetected from the file extension. |
GCS: the GCS path must use the `https://storage.googleapis.com` endpoint (as in the examples below), not `https://storage.cloud.google.com`, because the endpoint for the Google XML API differs from the JSON API.
The parameters `url`, `format`, `structure`, and `compression_method` work in the same way as above, and some extra parameters are supported:
| Parameter | Description |
|---|---|
| `access_key_id` | `hmac_key`, optional. |
| `secret_access_key` | `hmac_secret`, optional. |
| `filename` | Appended to the URL if specified. |
| `use_environment_credentials` | Enabled by default. Allows passing extra parameters via the environment variables `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI`, `AWS_CONTAINER_CREDENTIALS_FULL_URI`, `AWS_CONTAINER_AUTHORIZATION_TOKEN`, and `AWS_EC2_METADATA_DISABLED`. |
| `no_sign_request` | Disabled by default. |
| `expiration_window_seconds` | Default value is 120. |
## Returned value

A table with the specified structure for reading or writing data in the specified file.

## Examples
Selecting the first two rows from the GCS file https://storage.googleapis.com/my-test-bucket-768/data.csv:
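The example query appears to be missing; a sketch assuming three `UInt32` columns (the column names and types are illustrative):

```sql
SELECT *
FROM gcs(
    'https://storage.googleapis.com/my-test-bucket-768/data.csv',
    'CSV',
    'column1 UInt32, column2 UInt32, column3 UInt32'
)
LIMIT 2;
```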
The same, but from a file with the `gzip` compression method:
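A sketch of the same query reading a gzipped variant of the file, with the compression method passed explicitly (same illustrative columns):

```sql
SELECT *
FROM gcs(
    'https://storage.googleapis.com/my-test-bucket-768/data.csv.gz',
    'CSV',
    'column1 UInt32, column2 UInt32, column3 UInt32',
    'gzip'
)
LIMIT 2;
```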
## Usage
Suppose that we have several files with the following URIs on GCS:

- https://storage.googleapis.com/my-test-bucket-768/some_prefix/some_file_1.csv
- https://storage.googleapis.com/my-test-bucket-768/some_prefix/some_file_2.csv
- https://storage.googleapis.com/my-test-bucket-768/some_prefix/some_file_3.csv
- https://storage.googleapis.com/my-test-bucket-768/some_prefix/some_file_4.csv
- https://storage.googleapis.com/my-test-bucket-768/another_prefix/some_file_1.csv
- https://storage.googleapis.com/my-test-bucket-768/another_prefix/some_file_2.csv
- https://storage.googleapis.com/my-test-bucket-768/another_prefix/some_file_3.csv
- https://storage.googleapis.com/my-test-bucket-768/another_prefix/some_file_4.csv
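A query over these files appears to be missing; a sketch that counts rows in the files numbered 1 to 3 under both prefixes, combining the `{abc,def}` and `{N..M}` wildcards (the column layout `name String, value UInt32` is assumed):

```sql
SELECT count(*)
FROM gcs(
    'https://storage.googleapis.com/my-test-bucket-768/{some,another}_prefix/some_file_{1..3}.csv',
    'CSV',
    'name String, value UInt32'
);
```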
Count the total number of rows in files named file-000.csv, file-001.csv, … , file-999.csv:
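A sketch of such a count (the `big_prefix` folder and column layout are assumptions):

```sql
SELECT count(*)
FROM gcs(
    'https://storage.googleapis.com/my-test-bucket-768/big_prefix/file-{000..999}.csv',
    'CSV',
    'name String, value UInt32'
);
```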
Insert data into the file test-data.csv.gz:
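A sketch of the insert, assuming an illustrative `name String, value UInt32` layout:

```sql
INSERT INTO FUNCTION gcs(
    'https://storage.googleapis.com/my-test-bucket-768/test-data.csv.gz',
    'CSV',
    'name String, value UInt32',
    'gzip'
)
VALUES ('test-data', 1);
```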
Insert data into the file test-data.csv.gz from an existing table:
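A sketch of the same insert fed from an existing table (the table name `existing_table` and its columns are assumptions):

```sql
INSERT INTO FUNCTION gcs(
    'https://storage.googleapis.com/my-test-bucket-768/test-data.csv.gz',
    'CSV',
    'name String, value UInt32',
    'gzip'
)
SELECT name, value FROM existing_table;
```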
The glob `**` can be used for recursive directory traversal; this fetches all files from the my-test-bucket-768 directory recursively:
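A sketch of the recursive traversal (the column layout is an assumption):

```sql
SELECT *
FROM gcs(
    'https://storage.googleapis.com/my-test-bucket-768/**',
    'CSV',
    'name String, value UInt32'
);
```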
Fetch data from all test-data.csv.gz files in any folder inside the my-test-bucket directory recursively:
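A sketch combining `**` with a fixed filename (the column layout is an assumption):

```sql
SELECT *
FROM gcs(
    'https://storage.googleapis.com/my-test-bucket/**/test-data.csv.gz',
    'CSV',
    'name String, value UInt32',
    'gzip'
);
```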
## Partitioned Write
If you specify a `PARTITION BY` expression when inserting data into a GCS table, a separate file is created for each partition value. Splitting the data into separate files helps improve the efficiency of read operations.
### Examples
- Using partition ID in a key creates separate files:
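A sketch of such an insert, using the `{_partition_id}` substitution in the file key (the bucket name and column layout are illustrative):

```sql
INSERT INTO TABLE FUNCTION
    gcs('https://storage.googleapis.com/my_bucket/file_{_partition_id}.csv', 'CSV', 'a String, b UInt32, c UInt32')
    PARTITION BY a
VALUES ('x', 2, 3), ('x', 4, 5), ('y', 11, 12), ('y', 13, 14), ('z', 21, 22), ('z', 23, 24);
```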
As a result, the data is written into three files: file_x.csv, file_y.csv, and file_z.csv.
- Using partition ID in a bucket name creates files in different buckets:
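A sketch placing `{_partition_id}` in the bucket name instead (the bucket prefix and column layout are illustrative):

```sql
INSERT INTO TABLE FUNCTION
    gcs('https://storage.googleapis.com/my_bucket_{_partition_id}/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32')
    PARTITION BY a
VALUES (1, 2, 3), (1, 4, 5), (10, 11, 12), (10, 13, 14), (20, 21, 22), (20, 23, 24);
```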
As a result, the data is written into files in three different buckets: my_bucket_1/file.csv, my_bucket_10/file.csv, and my_bucket_20/file.csv.