Tables aren't redefined for re-runs of UDF apply

Description of the bug

As part of iterative development in a Jupyter environment, apply may be re-run several times. The developer might need to update candidates or create a new labeling function, for example. When this happens, the corresponding Postgres table is cleared but not dropped. This means that the definition of the table cannot change to accommodate the updated parameters for apply.
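The difference between clearing and dropping a table can be seen in a small, self-contained sketch. It uses Python's built-in sqlite3 in place of Postgres, and the table and column names (label, lf_1, lf_2, lf_3) are hypothetical. This is not Fonduer's actual schema, only an illustration of why a cleared table keeps its old definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()


def column_names(table):
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    return [row[1] for row in cur.execute(f"PRAGMA table_info({table})")]


# Hypothetical table with one column per labeling function (illustration only).
cur.execute("CREATE TABLE label (candidate_id INTEGER, lf_1 INTEGER, lf_2 INTEGER)")
cur.execute("INSERT INTO label VALUES (1, 0, 1)")

# Clearing removes the rows but leaves the table definition untouched.
cur.execute("DELETE FROM label")
cols_after_clear = column_names("label")

# Dropping removes the definition, so the table can be recreated with a new column.
cur.execute("DROP TABLE label")
cur.execute(
    "CREATE TABLE label (candidate_id INTEGER, lf_1 INTEGER, lf_2 INTEGER, lf_3 INTEGER)"
)
cols_after_drop = column_names("label")

print(cols_after_clear)
print(cols_after_drop)
```

Only the drop-and-recreate path ends up with the extra lf_3 column; after the delete, the table still has its original columns.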

To Reproduce

Steps to reproduce the behavior:

  1. Run the max_storage_temp_tutorial notebook in fonduer-tutorials, up to and including the Labeling Functions section.
  2. Add a new LF. It doesn’t need to do anything in particular (it could return ABSTAIN every time). Add it to the stg_temp_lfs list.
  3. Re-run the remainder of cells in the section.
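For step 2, a do-nothing labeling function might look like the sketch below. The function name LF_always_abstain is hypothetical, ABSTAIN = -1 is assumed to match the tutorial's label convention, and the empty list is only a stand-in for the notebook's existing stg_temp_lfs.

```python
ABSTAIN = -1  # assumed value for "no vote"; check the notebook's own constant


def LF_always_abstain(c):
    """Hypothetical labeling function that never votes on candidate c."""
    return ABSTAIN


# Stand-in for the tutorial's existing list of labeling functions.
stg_temp_lfs = []
stg_temp_lfs.append(LF_always_abstain)
```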

Upon calling LFAnalysis, the following exception is thrown:

ValueError: Number of LFs (7) and number of LF matrix columns (6) are different

Expected behavior

Underlying tables for a re-run of a UDF apply method should not only be cleared, but dropped.

Error Logs/Screenshots

Full stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-e005feee6300> in <module>
      5 sorted_lfs = sorted(lfs, key=lambda lf: lf.name)
      6 
----> 7 LFAnalysis(L=L_train[0], lfs=sorted_lfs).lf_summary(Y=L_gold_train[0].reshape(-1))

~/.venv/lib/python3.7/site-packages/snorkel/labeling/analysis.py in __init__(self, L, lfs)
     44             if len(lfs) != self._L_sparse.shape[1]:
     45                 raise ValueError(
---> 46                     f"Number of LFs ({len(lfs)}) and number of "
     47                     f"LF matrix columns ({self._L_sparse.shape[1]}) are different"
     48                 )

ValueError: Number of LFs (7) and number of LF matrix columns (6) are different

Environment (please complete the following information)

  • OS: Ubuntu 18.04
  • PostgreSQL Version: 12.1
  • Poppler Utils Version: 0.71.0-5
  • Fonduer Version: 0.8.3

Additional context

https://github.com/HazyResearch/fonduer/issues/263#issuecomment-527588765 advises restarting Python, but this does not appear to solve the problem.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 5

Top GitHub Comments

senwu commented, Mar 30, 2021

@robbieculkin Sorry for the late response. We will fix this asap.

senwu commented, Mar 14, 2021

Hi @robbieculkin,

Thanks for your question.

I think this problem is related to the Snorkel package: it assumes that every labeling function applies to at least one sample in the dataset, which means a labeling function cannot always return ABSTAIN.

In Fonduer, we save the labeling function outputs in a sparse format: each output is stored under the labeling function's name as its key, so if a labeling function always returns ABSTAIN, Fonduer won’t save any results for it. We then send the labeling function names and outputs to Snorkel to calculate the weak labels, which causes your error if some labeling function always returns ABSTAIN.
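The mechanism described above can be mimicked with a toy model in plain Python. This is a sketch under assumptions, not Fonduer's or Snorkel's code: the labeling functions, the integer "candidates", and the dictionary-based storage are all made up for illustration.

```python
ABSTAIN = -1  # assumed value for "no vote"


def lf_a(c):
    return 1 if c % 2 == 0 else ABSTAIN


def lf_b(c):
    return 0 if c > 1 else ABSTAIN


def lf_silent(c):
    # Never votes, like the LF added in the reproduction steps.
    return ABSTAIN


lfs = [lf_a, lf_b, lf_silent]
candidates = [0, 1, 2, 3]  # toy candidates

# Sparse storage: keep only non-ABSTAIN outputs, keyed by LF name.
stored = [
    {lf.__name__: lf(c) for lf in lfs if lf(c) != ABSTAIN}
    for c in candidates
]

# The matrix columns come from the keys that were actually stored.
keys = sorted({name for row in stored for name in row})

# Dense matrix rebuilt from the sparse rows; missing entries become ABSTAIN.
L = [[row.get(k, ABSTAIN) for k in keys] for row in stored]

n_lfs, n_cols = len(lfs), len(keys)
print(n_lfs, n_cols)
```

Because lf_silent never produces a stored value, its name never becomes a key, and the rebuilt matrix has one column fewer than the list of labeling functions. That is the same kind of mismatch the ValueError reports.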

FYI: Fonduer updates labeling function outputs incrementally, which means it only updates existing results or adds new ones (it won’t clear existing results by default). If you want to clear all existing results, you can call the clear() function.

Thanks, Sen
