firestore-bigquery-export: typed arrays for schema views

[REQUIRED] Step 2: Extension name

This feature request is for the extension firestore-bigquery-export, and in particular for the schema-views script described in GENERATE_SCHEMA_VIEWS.md.

What feature would you like to see?

Support for arrays of different types. For example, some of the arrays in our project have maps inside them. A natural way to handle this would be to create a value field for each of the keys in the map. Even now, it's a bit strange that the array type doesn't specify whether its elements are strings, numbers, booleans, etc.
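To make the request concrete, here is a minimal sketch of what a typed view over an array of maps could look like in BigQuery. It is not output of the extension: the `raw` CTE, its sample JSON, and the `item` and `item_index` aliases are invented for illustration, and the field names are borrowed from the maintainer's cartItems example in the comments. `JSON_EXTRACT_SCALAR` always returns a STRING, so each map key is cast to its intended type with `SAFE_CAST`.

```sql
-- Hypothetical inline row standing in for the raw changelog table.
WITH raw AS (
  SELECT
    'orders/1' AS document_name,
    '{"name":"Ada","cartItems":[{"productName":"pen","quantity":2,"isGift":true},{"productName":"ink","quantity":1,"isGift":false}]}' AS data
)
SELECT
  document_name,
  item_index,
  -- One typed column per key of the maps stored in the array.
  JSON_EXTRACT_SCALAR(item, '$.productName') AS productName,
  SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.quantity') AS INT64) AS quantity,
  SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.isGift') AS BOOL) AS isGift
FROM
  raw
  -- LEFT JOIN keeps documents whose array is missing or empty.
  LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(raw.data, '$.cartItems')) AS item WITH OFFSET AS item_index
```

`SAFE_CAST` returns NULL instead of failing when a value does not match the declared type, which seems a reasonable default for schemaless Firestore data.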

How would you use it?

To import Firestore documents that contain arrays of nested maps into BigQuery and query them there.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 16
  • Comments: 15 (12 by maintainers)

Top GitHub Comments

1 reaction
dackers86 commented, Dec 15, 2022

An initial PR has been added. For discussion, initial results produce a changelog view similar to the following:

SELECT
  *
FROM
  (
    SELECT
      document_name,
      document_id,
      timestamp,
      operation,
      JSON_EXTRACT_SCALAR(data, '$.name') AS name,
      JSON_EXTRACT_SCALAR(data, '$.date') AS date,
      JSON_EXTRACT_SCALAR(data, '$.total') AS total,
      JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
      JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
      JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
    FROM
      `dev-extensions-testing.da_testing4.da_testing4_raw_changelog` da_testing4_raw_changelog
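      -- One output row per element of the cartItems array; LEFT JOIN keeps
      -- documents whose array is missing or empty.
      LEFT JOIN UNNEST(
        JSON_EXTRACT_ARRAY(da_testing4_raw_changelog.data, '$.cartItems')
      ) cartItems WITH OFFSET _cartItems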
  )

And for the latest view:

-- Given a user-defined schema over a raw JSON changelog, returns the
-- schema elements of the latest set of live documents in the collection.
--   timestamp: The Firestore timestamp at which the event took place.
--   operation: One of INSERT, UPDATE, DELETE, IMPORT.
--   event_id: The event that wrote this row.
--   <schema-fields>: This can be one, many, or no typed-columns
--                    corresponding to fields defined in the schema.
SELECT
  document_name,
  document_id,
  timestamp,
  operation,
  name,
  date,
  total,
  productName,
  quantity,
  isGift
FROM
  (
    SELECT
      document_name,
      document_id,
      FIRST_VALUE(timestamp) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS timestamp,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS operation,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) = "DELETE" AS is_deleted,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.name')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS name,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.date')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS date,
      FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.total')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS total,
      JSON_EXTRACT_SCALAR(cartItems, '$.productName') AS productName,
      JSON_EXTRACT_SCALAR(cartItems, '$.quantity') AS quantity,
      JSON_EXTRACT_SCALAR(cartItems, '$.isGift') AS isGift
    FROM
      `dev-extensions-testing.da_testing4.da_testing4_raw_latest` da_testing4_raw_latest
      -- One output row per element of the cartItems array; LEFT JOIN keeps
      -- documents whose array is missing or empty.
      LEFT JOIN UNNEST(
        JSON_EXTRACT_ARRAY(da_testing4_raw_latest.data, '$.cartItems')
      ) cartItems WITH OFFSET _cartItems
  )
WHERE
  NOT is_deleted
GROUP BY
  document_name,
  document_id,
  timestamp,
  operation,
  name,
  date,
  total,
  productName,
  quantity,
  isGift

Sample results show that each document is expanded into multiple rows, one per array item.

Questions

  1. Would developers find this easier if the array items were separated into different columns, as opposed to adding multiple rows?
  2. The latest view may have some performance issues; the latest BQ updates include a much more performant script. For readability and ease of upgrade, should this be included in this update or in a separate PR?
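For the first question, the following sketch shows two ways the array could stay on a single row per document instead of being expanded into rows. It is an illustration only, not what the PR generates: the `raw` CTE, the sample JSON, and the output column names are assumptions. The first option addresses items by position in the JSON path, which only works when a maximum array length is known; the second keeps all items in one typed ARRAY<STRUCT> column.

```sql
-- Hypothetical inline row standing in for the raw changelog table.
WITH raw AS (
  SELECT
    'orders/1' AS document_name,
    '{"cartItems":[{"productName":"pen","quantity":2},{"productName":"ink","quantity":1}]}' AS data
)
SELECT
  document_name,
  -- Option 1: positional columns; positions past the end of the array yield NULL.
  JSON_EXTRACT_SCALAR(data, '$.cartItems[0].productName') AS cartItems_0_productName,
  JSON_EXTRACT_SCALAR(data, '$.cartItems[1].productName') AS cartItems_1_productName,
  -- Option 2: a single typed ARRAY<STRUCT> column, preserving item order.
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(item, '$.productName') AS productName,
      SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.quantity') AS INT64) AS quantity
    FROM
      UNNEST(JSON_EXTRACT_ARRAY(data, '$.cartItems')) AS item WITH OFFSET AS item_index
    ORDER BY
      item_index
  ) AS cartItems
FROM
  raw
```

With the second option, consumers who want one row per item can still apply UNNEST to the `cartItems` column themselves.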

1 reaction
gregfenton commented, Jan 3, 2021

Thank you @nwparker! I have turned to a combination of unnest() and json_extract_array() to extract the values from my data. I suspect I'll simply add a hand-coded query to BQ rather than expect @firebaseextensions/fs-bq-schema-views to generate one for me.

I got some good direction from a question I posted on Stack Overflow.
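A hand-coded query along those lines might look like the sketch below. It is not the query from the Stack Overflow answer or from the extension: the inline `raw_changelog` rows, the `latest` CTE, and the aliases are assumptions made so the example is self-contained. It first keeps the most recent change per document with `ROW_NUMBER()`, drops deleted documents, and only then unnests the array.

```sql
-- Hypothetical inline rows standing in for the raw changelog table.
WITH raw_changelog AS (
  SELECT
    'orders/1' AS document_name,
    TIMESTAMP '2021-01-01 00:00:00+00' AS timestamp,
    'INSERT' AS operation,
    '{"cartItems":[{"productName":"pen","quantity":1}]}' AS data
  UNION ALL
  SELECT
    'orders/1',
    TIMESTAMP '2021-01-02 00:00:00+00',
    'UPDATE',
    '{"cartItems":[{"productName":"pen","quantity":2},{"productName":"ink","quantity":1}]}'
  UNION ALL
  SELECT 'orders/2', TIMESTAMP '2021-01-03 00:00:00+00', 'DELETE', NULL
),
latest AS (
  -- Keep only the newest change per document, then drop deleted documents.
  SELECT
    * EXCEPT (row_num)
  FROM
    (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY document_name ORDER BY timestamp DESC) AS row_num
      FROM
        raw_changelog
    )
  WHERE
    row_num = 1
    AND operation != 'DELETE'
)
SELECT
  document_name,
  item_index,
  JSON_EXTRACT_SCALAR(item, '$.productName') AS productName,
  SAFE_CAST(JSON_EXTRACT_SCALAR(item, '$.quantity') AS INT64) AS quantity
FROM
  latest
  LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(latest.data, '$.cartItems')) AS item WITH OFFSET AS item_index
```

Deduplicating before unnesting means the array is expanded only for the surviving row of each document, rather than for every historical change.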
