Stepping with discrete numerical data
I have this data: nyears_uniqueness.csv.txt
When I use pyGAM to smooth it, I get some strange results:
import numpy as np
import matplotlib.pyplot as plt
from pygam import LinearGAM

# df is the attached data loaded into a pandas DataFrame
gam = LinearGAM(n_splines=4)
gam.fit(df[['nyears']], df.uniqueness)
# Warning emitted by fit():
# /home/naught101/miniconda3/envs/science/lib/python3.6/site-packages/pygam/pygam.py:1172: UserWarning: detected catergorical data for feature 0
# self._validate_data_dep_params(X)
x_pred = np.linspace(min(df.nyears), max(df.nyears), num=100)
y_pred = gam.predict(x_pred)
df.plot.scatter(x='nyears', y='uniqueness')
ax = plt.gca()
ax.plot(x_pred, y_pred)
When I plot the same data in R, I get a much smoother curve. I’m not sure whether that’s just a matter of R’s defaults being more appropriate for this data, or whether this is a bug in pyGAM. If it is the former, do you have any suggestions for how to improve the pyGAM plot?
Issue Analytics
- State:
- Created 6 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Solved! By default we assume all data is numerical. Thanks, @naught101.
I don’t really use R much at the moment, and haven’t used R’s GAM other than as a comparison with pyGAM, to see if I was seeing expected behaviour.
I would think that the best action would be just to assume any numerical data is numerical, unless dtype='categorical' is supplied. Is there any reason to assume categorical for such data, regardless of how discrete the values are?
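To illustrate why treating a discrete numeric feature as categorical produces the stepping described in this issue, here is a minimal NumPy-only sketch. It is not pyGAM's implementation: the data are synthetic stand-ins for the attached CSV, the helper names predict_categorical and predict_numerical are hypothetical, the nearest-level lookup is an assumption about how a categorical fit might be evaluated between levels, and a cubic polynomial is swapped in for the spline smoother.

```python
import numpy as np

# Synthetic stand-in for the attached data (hypothetical values, not the real CSV).
rng = np.random.default_rng(0)
nyears = rng.integers(1, 11, size=200).astype(float)  # discrete values in 1..10
uniqueness = np.log(nyears) + rng.normal(scale=0.1, size=nyears.size)

x_pred = np.linspace(nyears.min(), nyears.max(), num=100)


def predict_categorical(x_train, y_train, x_new):
    """Categorical treatment: one fitted value (the mean) per observed level.

    Assumption: new points take the value of the nearest observed level.
    """
    levels = np.unique(x_train)
    level_means = np.array([y_train[x_train == lv].mean() for lv in levels])
    nearest = np.abs(x_new[:, None] - levels[None, :]).argmin(axis=1)
    return level_means[nearest]


def predict_numerical(x_train, y_train, x_new, degree=3):
    """Numerical treatment: a smooth function of x (cubic polynomial here,
    standing in for a spline smoother)."""
    coefs = np.polyfit(x_train, y_train, deg=degree)
    return np.polyval(coefs, x_new)


y_step = predict_categorical(nyears, uniqueness, x_pred)
y_smooth = predict_numerical(nyears, uniqueness, x_pred)

# The categorical prediction can only take as many distinct values as there
# are levels, so it plots as a staircase; the numerical one varies continuously.
print("distinct categorical predictions:", len(np.unique(y_step)))
print("distinct numerical predictions:", len(np.unique(y_smooth)))
```

Plotting y_step and y_smooth against x_pred shows a staircase for the first and a smooth curve for the second, which is the difference between the pyGAM and R plots described above.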