Stepping with discrete numerical data


I have this data: nyears_uniqueness.csv.txt

When I use pyGAM to smooth it, I get some strange results:

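import matplotlib.pyplot as plt
import numpy as np
from pygam import LinearGAM

# df is the DataFrame loaded from nyears_uniqueness.csv.txt
gam = LinearGAM(n_splines=4)
gam.fit(df[['nyears']], df.uniqueness)
# /home/naught101/miniconda3/envs/science/lib/python3.6/site-packages/pygam/pygam.py:1172: UserWarning: detected catergorical data for feature 0
#  self._validate_data_dep_params(X)

# Predict on an evenly spaced grid spanning the observed range
x_pred = np.linspace(min(df.nyears), max(df.nyears), num=100)
y_pred = gam.predict(x_pred)

# Overlay the fitted curve on a scatter plot of the raw data
df.plot.scatter(x='nyears', y='uniqueness')
ax = plt.gca()
ax.plot(x_pred, y_pred)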

[Figure figure_1-1: scatter plot of uniqueness against nyears with the pyGAM prediction overlaid]

When I try plotting the same data in R, I get a much smoother curve. I’m not sure if that’s just a matter of R’s defaults being more appropriate for the data, or if this is a bug in pyGAM. If it is the former, do you have any suggestions for how to improve the pyGAM plot?

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
dswah commented, Sep 9, 2018

solved! by default we assume all data is numerical. thanks @naught101

0 reactions
naught101 commented, Feb 27, 2018

I don’t really use R much at the moment, and haven’t used R’s GAM other than as a comparison with pyGAM, to see if I was seeing expected behaviour.

I would think that the best action would be just to assume any numerical data is numerical, unless dtype='categorical' is supplied. Is there any reason to assume categorical for such data, regardless of how discrete the values are?
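To make the suggestion concrete, here is a minimal sketch of the two policies being compared. It is not pyGAM's code: `guess_dtype`, `resolve_dtype`, and the `max_levels` threshold are hypothetical names invented for illustration, and the sample values are made up rather than taken from the attached CSV. The first function shows how an automatic guess could flag discrete numeric values as categorical; the second shows the proposed rule, where data stays numerical unless the caller asks for categorical.

```python
def guess_dtype(values, max_levels=100):
    """Hypothetical 'auto' heuristic (an assumption, not pyGAM's actual logic).

    Integer-valued data with few distinct levels is guessed to be categorical.
    """
    all_integral = all(float(v).is_integer() for v in values)
    if all_integral and len(set(values)) <= max_levels:
        return "categorical"
    return "numerical"


def resolve_dtype(values, dtype=None):
    """Proposed rule: numerical unless the caller explicitly says otherwise."""
    if dtype is None:
        return "numerical"
    if dtype not in ("numerical", "categorical"):
        raise ValueError(f"unknown dtype: {dtype!r}")
    return dtype


# Made-up discrete year counts, standing in for the 'nyears' column.
nyears = [1, 2, 2, 3, 5, 5, 8, 10]

print(guess_dtype(nyears))                         # guess based on the values
print(resolve_dtype(nyears))                       # proposed default
print(resolve_dtype(nyears, dtype="categorical"))  # explicit opt-in
```

Under the second rule, how discrete the values are never changes how they are treated; only an explicit argument does.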
