Provide table ingestion status through API

Original GitHub issue: #6524

Summary

Operating Pinot on behalf of data engineers is a challenge when these users only have access to the Controller endpoints. SREs are not necessarily aware of the details of the table configurations created by those users. If something goes wrong, data engineers have no way to get the error details other than asking SREs to extract the logs.

Observed use cases

Realtime table ingestion

Any Kafka connectivity issue caused by a bad server, port, or credential is not reported back in the table status.
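To illustrate the kind of feedback that is missing here, the following is a minimal sketch of a reachability probe that turns a connection failure into a status value instead of a log line. It is not Pinot code: `probe_broker` and the `state`/`error` keys are hypothetical names chosen for this example, and a plain TCP connect only catches a bad server or port, not bad credentials.

```python
import socket


def probe_broker(host: str, port: int, timeout_s: float = 2.0) -> dict:
    """Try a plain TCP connect and report the outcome as a status dict.

    Hypothetical helper for illustration; it does not speak the Kafka
    protocol, so it cannot detect invalid credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return {"state": "HEALTHY", "error": None}
    except OSError as exc:
        # OSError covers refused connections, timeouts, and DNS failures.
        return {"state": "UNHEALTHY", "error": f"{host}:{port}: {exc}"}
```

A status endpoint could surface the `error` string directly, so a data engineer would see the bad host or port without needing access to server logs.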

BatchConfig

Recently, in 0.7.0-SNAPSHOT, the new batchIngestionConfig allows minion tasks to ingest data. That ingestion may fail for various reasons tied to user-provided configuration (invalid S3 or GCS credentials, etc.). Sadly, when nothing is being ingested, the only way for a data engineer to learn the root cause is to ask an SRE to investigate the Pinot logs.

Suggestion

Include a table ingestion state as part of the table status.
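As a rough sketch of what this suggestion could look like, the snippet below models a table status that carries an ingestion state and an optional error message. Every name in it is an assumption made for illustration (`TableStatus`, `IngestionStatus`, the field names, the `HEALTHY`/`UNHEALTHY` values, and the table name `events_REALTIME`); the issue itself does not define a response format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class IngestionStatus:
    state: str                           # assumed values: "HEALTHY" / "UNHEALTHY"
    error_message: Optional[str] = None  # root cause shown to the data engineer


@dataclass
class TableStatus:
    table_name: str
    ingestion_status: IngestionStatus


def to_json(status: TableStatus) -> str:
    """Serialize the status, including the nested ingestion state."""
    return json.dumps(asdict(status), indent=2)


# Hypothetical example: a realtime table whose Kafka broker is unreachable.
example = TableStatus(
    table_name="events_REALTIME",
    ingestion_status=IngestionStatus("UNHEALTHY", "Kafka broker unreachable"),
)
print(to_json(example))
```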

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

1 reaction
icefury71 commented, Apr 21, 2021

Added a high-level design document for this issue. At the moment, I’ve only considered Pinot realtime tables. Once the high-level approach is agreed upon, I will extend this to Minion-based ingestion of offline tables as well.

You can find the design here: https://docs.google.com/document/d/12w6rEJBRKACKomSdL871GCjTtzLxY1N7kYE308RT8JY/edit

1 reaction
kishoreg commented, Feb 24, 2021

I feel there are two parts to this:

  1. Need for an API to know the ingestionStatus of a table
  2. How do we implement that API and what should we use for that

I think having 1 is important and it’s user-friendly. It allows us to internally call multiple endpoints, queries, etc. to present a complete overview of what’s going on with the table. I am happy to have an overall status API for a table, with ingestionStatus as part of the response.

Freshness, partial responses, and actually testing the connection information are part of the implementation. We can start with what makes sense and enhance as needed, but having a standard endpoint is a good start.
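The idea of one standard endpoint that internally runs several checks can be sketched as follows. This is only an illustration under assumptions: `ingestion_status`, the check names, and the `ingestionState`/`errors` keys are hypothetical and not taken from Pinot or from the design document. Each check returns an error message, or `None` when it passes, so new checks (freshness, connection tests) can be added later without changing the response shape.

```python
from typing import Callable, Dict, Optional

# A check returns an error message, or None when everything is fine.
Check = Callable[[], Optional[str]]


def ingestion_status(checks: Dict[str, Check]) -> dict:
    """Run every check and fold the results into one overall status."""
    errors: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            message = check()
        except Exception as exc:
            # A check that crashes is reported rather than hidden.
            message = f"check failed to run: {exc}"
        if message is not None:
            errors[name] = message
    state = "UNHEALTHY" if errors else "HEALTHY"
    return {"ingestionState": state, "errors": errors}


# Hypothetical usage with two stand-in checks.
result = ingestion_status({
    "streamConnection": lambda: "bad port for Kafka broker",
    "segmentFreshness": lambda: None,
})
print(result)
```

Starting with a single check and growing the dictionary over time matches the "start with what makes sense and enhance as needed" approach described above.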
