StatisticsGen does not recognize missing fields

For some reason, StatisticsGen does not recognize the missing values in my file. I expected StatisticsGen to report the missing fields in the last record (index 9). When I generate the statistics directly with generate_statistics_from_csv, it works perfectly fine.

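from tfx.components import CsvExampleGen, StatisticsGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.proto import example_gen_pb2

# _data_root is the directory that contains the train/ and eval/ subfolders.
context = InteractiveContext()

# Loading the Files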
input_config = example_gen_pb2.Input(splits=[
    example_gen_pb2.Input.Split(name='train', pattern='train/*'),
    example_gen_pb2.Input.Split(name='eval', pattern='eval/*')
])

example_gen = CsvExampleGen(
    input_base=_data_root,
    input_config=input_config)

context.run(example_gen, enable_cache=False)

# Generating Stats
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen, enable_cache=False)

# Showing the Stats
context.show(statistics_gen.outputs['statistics'])

File being used:

index,inputSurname,label
0,BALLADARAS NATE RAE,0
1,LABRANCHE TRACIE SURIANO,0
2,VENTURES LLC             TIERNAN RE,1
3,CHOU                     ABC,1
4,JENSEN DARREN RANEE,0
5,VANDERMOLEN DEBORA PATRICIA,0
6,ZAMBRANO YANGFANG SESE,0
7,IMAGE LLC                DENTAL,1
8,OFFICE                   S BRUCE             LAW,1
9,,
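Before blaming StatisticsGen, it can help to confirm which cells are actually empty. The sketch below is a minimal, standard-library-only check (the names SAMPLE_CSV and count_missing are hypothetical, not part of TFX or TFDV) that counts empty cells per column. For the sample above it should find one missing inputSurname and one missing label, both in the last record, which is what the statistics are expected to show.

```python
import csv
import io
from collections import Counter

# The sample file from the question, inlined so the sketch is self-contained.
SAMPLE_CSV = """index,inputSurname,label
0,BALLADARAS NATE RAE,0
1,LABRANCHE TRACIE SURIANO,0
2,VENTURES LLC             TIERNAN RE,1
3,CHOU                     ABC,1
4,JENSEN DARREN RANEE,0
5,VANDERMOLEN DEBORA PATRICIA,0
6,ZAMBRANO YANGFANG SESE,0
7,IMAGE LLC                DENTAL,1
8,OFFICE                   S BRUCE             LAW,1
9,,
"""


def count_missing(csv_text):
    """Return a Counter mapping column name -> number of empty cells."""
    missing = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # DictReader yields "" for an empty cell and None for a short row.
            if value is None or value == "":
                missing[column] += 1
    return missing


print(count_missing(SAMPLE_CSV))
```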

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

1 reaction
aondra17 commented, Apr 21, 2022

I had the same problem, and this worked as a solution (I had only int values and missing values):

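import csv

import tensorflow as tf
from tqdm import tqdm


def int_feature(value):
    # An empty CSV cell becomes an empty Feature (no value list set), so it is
    # reported as missing instead of being filled with a default value.
    if not value:
        return tf.train.Feature()
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))


def bytes_feature(value):
    # String columns such as inputSurname need a BytesList; int() would fail on them.
    if not value:
        return tf.train.Feature()
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode("utf-8")]))


original_data_file = "./data/data.csv"
tfrecords_filename = "./tfR/consumer-complaints.tfrecords"

# Both the CSV file and the TFRecord writer are closed automatically.
with open(original_data_file, newline="") as csv_file, \
        tf.io.TFRecordWriter(tfrecords_filename) as tf_record_writer:
    reader = csv.DictReader(csv_file, delimiter=",", quotechar='"')
    for row in tqdm(reader):
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    "index": int_feature(row["index"]),
                    "inputSurname": bytes_feature(row["inputSurname"]),
                    "label": int_feature(row["label"]),
                }
            )
        )
        tf_record_writer.write(example.SerializeToString())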
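The core of this workaround is a per-cell rule: an empty cell produces no value at all, and a non-empty cell is parsed with its column's type. The sketch below expresses that rule without TensorFlow so it can be checked on its own; COLUMN_TYPES and parse_row are hypothetical names for illustration, not part of TFX, and None here merely stands in for the empty tf.train.Feature(). Keeping each column's type explicit also means a string column such as inputSurname is never passed to int().

```python
import csv
import io

# Hypothetical column -> parser mapping; adjust it to your own schema.
COLUMN_TYPES = {"index": int, "inputSurname": str, "label": int}


def parse_row(row, column_types=COLUMN_TYPES):
    """Convert one csv.DictReader row into typed values.

    Empty cells map to None ("missing") instead of a default such as 0 or "".
    """
    parsed = {}
    for column, parse in column_types.items():
        raw = row.get(column)
        parsed[column] = None if raw in (None, "") else parse(raw)
    return parsed


sample = "index,inputSurname,label\n8,OFFICE S BRUCE LAW,1\n9,,\n"
for parsed in map(parse_row, csv.DictReader(io.StringIO(sample))):
    print(parsed)
```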

0 reactions
google-ml-butler[bot] commented, May 21, 2021

Are you satisfied with the resolution of your issue? Yes No
