k8s Container Attributes not Included when Exporting to Stackdriver in GKE

Describe your environment.

  • Google Cloud Platform
  • GKE cluster
  • Docker container
  • Python 3.7.4

Python dependencies (trimmed to just relevant):

google-api-core[grpc]==1.14.2 ; platform_python_implementation != 'PyPy'
google-api-python-client==1.7.11
google-auth-httplib2==0.0.3
google-auth==1.6.3
google-cloud-core==1.0.3
google-cloud-firestore==1.4.0
google-cloud-logging==1.12.1
google-cloud-monitoring==0.33.0
google-cloud-pubsub==1.0.0
google-cloud-storage==1.19.1
google-cloud-trace==0.22.1
google-resumable-media==0.4.1
googleapis-common-protos[grpc]==1.6.0
grpc-google-iam-v1==0.12.3
grpcio==1.23.0
opencensus-context==0.1.1
opencensus-ext-stackdriver==0.7.2
opencensus==0.7.3

Steps to reproduce.

Using the following code:

import logging

from opencensus.stats import stats
from opencensus.stats import aggregation
from opencensus.stats import measure
from opencensus.stats import view
from opencensus.tags import tag_key as tag_key_module
from opencensus.tags.tag_map import TagMap
from opencensus.tags.tag_value import TagValue

from opencensus.ext.stackdriver import stats_exporter

from . import config

log = logging.getLogger(__name__)

entity_count = measure.MeasureInt('entity_count', 'Count of entities', 'entities')

x_key = tag_key_module.TagKey('x')
y_key = tag_key_module.TagKey('y')

view_key = (x_key, y_key)

count_view = view.View(
    'entity_count',
    'Count of entities collected',
    view_key,
    entity_count,
    aggregation.CountAggregation(),
)

exporter = stats_exporter.new_stats_exporter(stats_exporter.Options(project_id='my_project_id'))

view_manager = stats.stats.view_manager
view_manager.register_exporter(exporter)
view_manager.register_view(count_view)

recorder = stats.stats.stats_recorder

def record_entity(x, y):
    log.debug('telemetry: %s/%s', x, y)
    tag_map = TagMap()
    tag_map.insert(x_key, TagValue(x))
    tag_map.insert(y_key, TagValue(y))
    mmap = recorder.new_measurement_map()
    mmap.measure_int_put(entity_count, 1)
    try:
        mmap.record(tag_map)
    except Exception as e:
        log.exception('error recording metric: %s', e)

… entering a Python shell locally and calling record_entity('a', 'b') works as expected, and the data appears in Stackdriver’s UI. Running the same code in a container in GKE raises an exception (see details below).

What is the expected behavior?

I expect this to work the same in GKE as it does locally.

What is the actual behavior?

Running the same code in a container in GKE gives the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/opencensus/metrics/transport.py", line 59, in func
    return self.func(*aa, **kw)
  File "/usr/local/lib/python3.7/site-packages/opencensus/metrics/transport.py", line 113, in export_all
    export(itertools.chain(*all_gets))
  File "/usr/local/lib/python3.7/site-packages/opencensus/ext/stackdriver/stats_exporter/__init__.py", line 162, in export_metrics
    self.client.project_path(self.options.project_id), ts_batch)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/monitoring_v3/gapic/metric_service_client.py", line 1024, in create_time_series
    request, retry=retry, timeout=timeout, metadata=metadata
  File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 273, in retry_wrapped_func
    on_error=on_error,
  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 182, in retry_target
    return target()
  File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 One or more TimeSeries could not be written: The set of resource labels is incomplete. Missing labels: (container_name namespace_name).: timeSeries[0-199]

Additional context.

This might be a regression of #647.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

4 reactions
rphillipsz commented, Oct 7, 2019

I’m having the same issue, but in opencensus-node. It works fine locally but fails with the same error when deployed to GKE. Update: the workaround by @ymaki worked for me as well.

3 reactions
ymaki commented, Oct 7, 2019

I also ran into this issue. It would be nice if the library could resolve these parameters automatically.

I’m not sure how to fix it properly, but I’m here to share a workaround.

If we add two environment variables, NAMESPACE and CONTAINER_NAME, we can avoid the missing-labels error. We can add the environment variables like this:

(snip)
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CONTAINER_NAME
              value: "INSERT_CONTAINER_NAME_HERE"
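
Since the workaround depends on both variables actually reaching the container, it may help to check for them at startup, before the exporter is created. The following is only a minimal sketch: the variable names come from the workaround above, while the helper `missing_k8s_env` and the `REQUIRED_ENV_VARS` tuple are hypothetical names introduced for illustration and are not part of opencensus.

```python
import os

# Variable names taken from the workaround above.
# The tuple and helper below are hypothetical, not an opencensus API.
REQUIRED_ENV_VARS = ("NAMESPACE", "CONTAINER_NAME")


def missing_k8s_env(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_k8s_env()
    if missing:
        # Report the gap early instead of waiting for the export to fail.
        print("missing environment variables: " + ", ".join(missing))
    else:
        print("NAMESPACE and CONTAINER_NAME are set")
```

Calling `missing_k8s_env()` before `stats_exporter.new_stats_exporter(...)` and logging the result would surface a misconfigured pod spec at startup rather than in the background export thread.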