Pre-step
Install the Cloud SDK for GCP, which contains the gcloud, gsutil and bq command-line tools, following the official instructions here.
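After installing, you can quickly confirm that all three tools are on your `PATH` before continuing. This is a minimal sketch that assumes a POSIX-like shell such as bash. `check_tools` is a hypothetical helper name and is not part of the SDK.

```bash
# check_tools (hypothetical helper): report any command that is not on PATH.
# With no arguments it checks the three Cloud SDK tools.
check_tools() {
  local missing=0
  local tool
  # default to the SDK tools when no names are given
  [ "$#" -eq 0 ] && set -- gcloud gsutil bq
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # non-zero exit status if at least one tool is missing
  return "$missing"
}
```

Running `check_tools` with no arguments checks `gcloud`, `gsutil` and `bq`. It prints each missing tool to stderr and returns a non-zero status if any of them is absent.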
- Download this file
- Upload it to your bucket
- For GCS use:
```bash
# you may need to re-authenticate first
gcloud auth login --update-adc
# check which project is currently active
gcloud config list
# if it is not the project you want, switch to it
gcloud config set project {PROJECT_ID}
# create a bucket
gsutil mb gs://BUCKET_NAME
# upload a local file to the bucket
# usage: gsutil cp origin destination
gsutil cp segments.csv gs://BUCKET_NAME
```
- For GCP use:
```bash
# copy an object from another bucket into yours
# usage: gsutil cp origin destination
gsutil cp gs://de-bootcamp-2021/all_data.csv gs://BUCKET_NAME
```
- For GCP, create a dataset:
```bash
bq --location=US mk -d \
  --description "This is my test dataset." \
  mydataset
```
- For GCP, load the CSV into a BigQuery table:
```bash
# let BigQuery infer the schema from the file
bq load \
  --autodetect \
  --source_format=CSV \
  mydataset.mytable \
  gs://BUCKET_NAME/all_data.csv
# if the inferred schema is wrong, declare it explicitly
# (--skip_leading_rows=1 skips the CSV header row)
bq load \
  --skip_leading_rows=1 \
  --source_format=CSV \
  mydataset.mytable \
  gs://BUCKET_NAME/all_data.csv \
  producto:STRING,presentacion:STRING,marca:STRING,categoria:STRING,catalogo:STRING,precio:STRING,fechaRegistro:STRING,cadenaComercial:STRING,giro:STRING,nombreComercial:STRING,direccion:STRING,estado:STRING,municipio:STRING,latitud:STRING,longitud:STRING
# if the load fails with a schema mismatch error, delete the existing table and load again
bq rm -t mydataset.mytable
```
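Typing the explicit schema by hand is error-prone. Because every column in that schema is declared `STRING` and named exactly as in the CSV header, the same string can be generated from the header row. This is a minimal sketch that assumes a simple header with no quoted or embedded commas. `build_string_schema` is a hypothetical helper name.

```bash
# build_string_schema (hypothetical helper): read CSV text on stdin and print
# a bq inline schema that declares every header column as STRING.
build_string_schema() {
  # keep only the header line, drop a Windows \r if present,
  # put one column name per line, append :STRING, then rejoin with commas
  head -n 1 | tr -d '\r' | tr ',' '\n' | sed 's/$/:STRING/' | paste -sd, -
}

# usage (assumes the header fits in the first 4 KB of the object):
# SCHEMA="$(gsutil cat -r 0-4095 gs://BUCKET_NAME/all_data.csv | build_string_schema)"
# bq load --skip_leading_rows=1 --source_format=CSV \
#   mydataset.mytable gs://BUCKET_NAME/all_data.csv "$SCHEMA"
```

If some columns should not be `STRING` (for example `precio`, `latitud` or `longitud`), edit the generated string before passing it to `bq load`.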
If the data was ingested without errors, go to the BigQuery UI and try some SQL queries on your table.
- Delete files in the buckets
- For GCP use
```bash
# remove a specific object
gsutil rm gs://BUCKET_NAME/all_data.csv
# or recursively delete every object; with a bucket URL, -r also deletes the bucket itself
gsutil rm -r gs://BUCKET_NAME
```
- Delete buckets
- For GCP use
```bash
# the bucket must already be empty (not needed if you ran gsutil rm -r on it above)
gsutil rb gs://BUCKET_NAME
```
- Delete data warehouse tables
- For GCP use
```bash
# remove the table
bq rm -t mydataset.mytable
# remove the dataset (it must be empty)
bq rm -d mydataset
# or remove the dataset together with all of its tables
bq rm -r -d mydataset
```
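The cleanup steps can be combined into a single function to run once you are done. This is a sketch that takes the same `BUCKET_NAME` and `mydataset` values as arguments. `teardown`, `run` and `DRY_RUN` are hypothetical names. It uses `bq rm -r -f -d`, where `-f` skips the confirmation prompt, and `gsutil rm -r` on the bucket URL, which deletes the objects and then the bucket.

```bash
# run (hypothetical helper): print the command when DRY_RUN=1, otherwise execute it
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# teardown BUCKET DATASET (hypothetical helper): delete the dataset with all
# of its tables, then the bucket with all of its objects
teardown() {
  local bucket="$1" dataset="$2"
  # -r also removes the tables, -f skips the confirmation prompt
  run bq rm -r -f -d "$dataset"
  # with a bucket URL, -r deletes every object and then the bucket itself
  run gsutil rm -r "gs://$bucket"
}
```

Run `DRY_RUN=1 teardown BUCKET_NAME mydataset` first to print the commands without executing them, then run it again without `DRY_RUN` to actually delete the resources.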