False positive ambiguous columns error when creating features
Tech Debt Title
Summary
An "ambiguous column" error is raised for columns that are not actually ambiguous.
Feature related:
Age: new-tech-debt-introduced
Present since: 2020-03-06
Estimated cost: investigation_needed
Type: coding
Description 📋
It seems that using a SQLExpressionTransform when creating features can lead to false positive errors about ambiguous columns.
An example:
source=Source(
    readers=[
        TableReader(
            id="availability",
            database="datalake_ebdb_raw",
            table="horariosemanalimovel_aud",
        )
        .with_(self.column_sum)
        .with_(
            pivot,
            group_by_columns=["imovel_id", "rev"],
            pivot_column="diaDaSemana",
            agg_column="column_sum",
            aggregation=functions.sum,
            mock_value=0,
            mock_type="int",
            with_forward_fill=True,
        ),
        TableReader(
            id="ure",
            database="datalake_ebdb_clean",
            table="user_revision_entity",
        ),
    ],
    query=(
        """
        with coalesced_availability as (
            select
                av.imovel_id as id,
                av.rev,
                coalesce(`1`, 0) as monday,
                coalesce(`2`, 0) as tuesday,
                coalesce(`3`, 0) as wednesday,
                coalesce(`4`, 0) as thursday,
                coalesce(`5`, 0) as friday,
                coalesce(`6`, 0) as saturday,
                coalesce(`7`, 0) as sunday
            from availability av
        ), houses as (
            select
                ha.id_house as ha_id,
                ha.rev as ha_rev,
                av.rev as av_rev,
                av.monday,
                av.tuesday,
                av.wednesday,
                av.thursday,
                av.friday,
                av.saturday,
                av.sunday
            from datalake_ebdb_clean.house_aud ha
            full outer join coalesced_availability av
                on av.id = ha.id_house
                and av.rev <= ha.rev
        )
        select distinct
            ha_id as id,
            coalesce(av_rev, ha_rev) as ts_revision,
            monday as available_slots_monday,
            tuesday as available_slots_tuesday,
            wednesday as available_slots_wednesday,
            thursday as available_slots_thursday,
            friday as available_slots_friday,
            saturday as available_slots_saturday,
            sunday as available_slots_sunday,
            (monday + tuesday + wednesday + thursday + friday + saturday + sunday) as total_available_slots_weekly
        from houses
        """
    ),
),
feature_set=FeatureSet(
    name="house_availability",
    entity="house",
    description=(
        """
        Holds availability information related to house
        feature such as "available_slots_monday" or
        "total_available_slots_weekly"
        """
    ),
    keys=[
        KeyFeature(
            name="id",
            description="The House's Main ID",
        )
    ],
    timestamp=TimestampFeature(from_column="ts_revision", from_ms=True),
    features=[
        Feature(
            name="available_slots_monday",
            description="Number indicating available hours for visit on monday",
            transformation=SQLExpressionTransform(
                expression="coalesce(available_slots_monday, 9)"
            ),
        ),
        ...
It seems that the part:
Feature(
    name="available_slots_monday",
    description="Number indicating available hours for visit on monday",
    transformation=SQLExpressionTransform(
        expression="coalesce(available_slots_monday, 9)"
    ),
),
Causes the error:
org.apache.spark.sql.AnalysisException: Reference 'available_slots_monday' is ambiguous, could be: available_slots_monday, available_slots_monday.;
However, if I change the query to select simply monday instead of monday as available_slots_monday, and then do:
Feature(
    name="available_slots_monday",
    description="Number indicating available hours for visit on monday",
    transformation=SQLExpressionTransform(
        expression="coalesce(monday, 9)"
    ),
),
it works!
Impact 💣
Some false positive errors that can be hard to debug.
Critical in: UNKNOWN
Solution Hints :squirrel:
Not sure
Observations 🤔
Files related or evidences (like: prints)
Complete error:
I’m working on this fix, which changes just one line in SQLExpressionTransform. This transformation was using a select statement on the dataframe, so if the feature name is the same as a column already present in the df, the reference becomes ambiguous. I’m changing the return to use a withColumn statement, so Spark automatically overwrites the column if they have the same name. This is the same behaviour as in the other transformations and features in Butterfree.
Thanks! I’m closing this issue.
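To make the difference described in the fix comment concrete, here is a minimal pure-Python sketch of the two behaviours. It is a toy model only, not Butterfree or Spark code: the helpers `select_star_plus`, `with_column` and `resolve` are hypothetical names invented for illustration. It assumes, as the comment suggests, that the select-based version appended the computed feature next to the existing columns, while withColumn replaces a column of the same name.

```python
from typing import List, Tuple

# A column is modelled as (name, where its values come from).
Column = Tuple[str, str]


def select_star_plus(columns: List[Column], name: str, expression: str) -> List[Column]:
    """Toy model of selecting every existing column plus `expression AS name`.

    The new column is always appended, so a name can end up duplicated.
    """
    return columns + [(name, expression)]


def with_column(columns: List[Column], name: str, expression: str) -> List[Column]:
    """Toy model of withColumn: a same-named column is replaced, not duplicated."""
    if any(existing == name for existing, _ in columns):
        return [(n, expression if n == name else src) for n, src in columns]
    return columns + [(name, expression)]


def resolve(columns: List[Column], name: str) -> Column:
    """Look a column up by name and fail if the name matches more than once."""
    matches = [col for col in columns if col[0] == name]
    if len(matches) > 1:
        raise ValueError(f"Reference '{name}' is ambiguous")
    if not matches:
        raise KeyError(name)
    return matches[0]


# The query output already has a column called available_slots_monday.
source = [("id", "query"), ("available_slots_monday", "query")]
expression = "coalesce(available_slots_monday, 9)"

# Appending yields two columns with the same name; replacing keeps one.
appended = select_star_plus(source, "available_slots_monday", expression)
replaced = with_column(source, "available_slots_monday", expression)
```

In this model, `resolve(appended, "available_slots_monday")` raises because the name matches two columns, whereas `resolve(replaced, "available_slots_monday")` returns the single overwritten column. Renaming the query column to monday, as in the workaround above, avoids the clash for the same reason: the feature name no longer collides with an existing column.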