Tables aren't redefined for re-runs of UDF apply
Description of the bug
As part of iterative development in a Jupyter environment, apply may be re-run several times. The developer might need to update candidates or create a new labeling function, for example.
When this happens, the corresponding Postgres table is cleared but not dropped. This means that the definition of the table cannot change to accommodate the updated parameters for apply.
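The difference between clearing and dropping a table can be sketched with a small, self-contained example. This is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than Postgres, and the table name `label_demo` and its one-column-per-LF layout are hypothetical, not Fonduer's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()


def create_table(cursor, num_lfs):
    """Create the hypothetical label_demo table with one column per LF."""
    lf_cols = ", ".join(f"lf_{i} INTEGER" for i in range(num_lfs))
    cursor.execute(f"CREATE TABLE label_demo (candidate_id INTEGER, {lf_cols})")


def column_count(cursor):
    """Return the number of columns currently defined on label_demo."""
    return len(cursor.execute("PRAGMA table_info(label_demo)").fetchall())


# First run: 6 labeling functions, one row of results.
create_table(cur, 6)
cur.execute("INSERT INTO label_demo VALUES (1, 0, 1, 0, 1, 0, 1)")

# Clearing removes the rows but leaves the definition untouched.
cur.execute("DELETE FROM label_demo")
rows_after_clear = cur.execute("SELECT COUNT(*) FROM label_demo").fetchone()[0]
columns_after_clear = column_count(cur)

# Dropping lets the table be redefined for 7 labeling functions.
cur.execute("DROP TABLE label_demo")
create_table(cur, 7)
columns_after_drop = column_count(cur)

print(rows_after_clear, columns_after_clear, columns_after_drop)
```

In this sketch the cleared table is empty but still has the column layout from the first run; only the drop-and-recreate step picks up the extra labeling function.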
To Reproduce
Steps to reproduce the behavior:
- Run the max_storage_temp_tutorial notebook in fonduer-tutorials, up to and including the Labeling Functions section.
- Add a new LF; it doesn't need to do anything in particular (it could return ABSTAIN every time). Add this to the stg_temp_lfs list.
- Re-run the remainder of the cells in the section.
Upon calling LFAnalysis, the following exception is thrown:
ValueError: Number of LFs (7) and number of LF matrix columns (6) are different
Expected behavior
Underlying tables for a re-run of a UDF apply method should not only be cleared, but dropped.
Error Logs/Screenshots
Full stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-62-e005feee6300> in <module>
5 sorted_lfs = sorted(lfs, key=lambda lf: lf.name)
6
----> 7 LFAnalysis(L=L_train[0], lfs=sorted_lfs).lf_summary(Y=L_gold_train[0].reshape(-1))
~/.venv/lib/python3.7/site-packages/snorkel/labeling/analysis.py in __init__(self, L, lfs)
44 if len(lfs) != self._L_sparse.shape[1]:
45 raise ValueError(
---> 46 f"Number of LFs ({len(lfs)}) and number of "
47 f"LF matrix columns ({self._L_sparse.shape[1]}) are different"
48 )
ValueError: Number of LFs (7) and number of LF matrix columns (6) are different
Environment (please complete the following information)
- OS: Ubuntu 18.04
- PostgreSQL Version: 12.1
- Poppler Utils Version: 0.71.0-5
- Fonduer Version: 0.8.3
Additional context
https://github.com/HazyResearch/fonduer/issues/263#issuecomment-527588765 advises restarting Python, but this does not appear to solve the problem.
Issue Analytics
- State:
- Created 3 years ago
- Comments:5

@robbieculkin Sorry for the late response. We will fix this asap.
Hi @robbieculkin,
Thanks for your question.
I think this is a problem related to the Snorkel package, since it assumes every labeling function applies to at least one sample in the dataset, which means a labeling function cannot always return ABSTAIN.
In Fonduer, we save the labeling function outputs in a sparse format: the labeling function name is stored as a key based on your definition, but if a function always returns ABSTAIN, Fonduer won't save any results for it. We then send the labeling function names and outputs to Snorkel to calculate the weak labels, which causes your error if some labeling function always returns ABSTAIN.
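The mechanism described above can be illustrated with a toy model. This is a minimal sketch, not Fonduer's or Snorkel's code: the helper names `apply_sparse` and `to_dense`, the three example labeling functions, and the convention `ABSTAIN = -1` are all assumptions made for illustration.

```python
ABSTAIN = -1  # assumed abstain value for this sketch


def lf_positive(x):
    return 1 if x > 0 else ABSTAIN


def lf_even(x):
    return 0 if x % 2 == 0 else ABSTAIN


def lf_always_abstain(x):
    return ABSTAIN


def apply_sparse(lfs, candidates):
    """Keep only non-abstain outputs, keyed by LF name, for each candidate."""
    rows = []
    for candidate in candidates:
        row = {}
        for lf in lfs:
            label = lf(candidate)
            if label != ABSTAIN:
                row[lf.__name__] = label
        rows.append(row)
    return rows


def to_dense(rows):
    """Rebuild a dense matrix using only the keys that were actually stored."""
    keys = sorted({key for row in rows for key in row})
    matrix = [[row.get(key, ABSTAIN) for key in keys] for row in rows]
    return keys, matrix


lfs = [lf_positive, lf_even, lf_always_abstain]
rows = apply_sparse(lfs, [1, 2, 3, -4])
keys, matrix = to_dense(rows)

# The always-abstaining LF never produced a stored key, so the matrix
# has fewer columns than there are labeling functions.
print(len(lfs), len(keys))
```

Here three labeling functions are defined, but the rebuilt matrix has only two columns, which is the same kind of mismatch that the ValueError in the report complains about.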
FYI: Fonduer updates labeling function outputs incrementally, which means it only updates existing results or adds new results (it won't clear existing results by default). If you want to clear all existing results, you can call the clear() function.
Thanks, Sen