Keyword "Apache Spark" shows up on Warehouse as two keywords "Apache", "Spark"
Here’s how I set my project’s keywords (ref):
keywords=['Apache Spark'],
So my intention was 1 keyword, “Apache Spark”. However, here on Warehouse this shows up as two keywords, “Apache” and “Spark”.
Is this intentional?
I know that setup() also accepts a single string for keywords, instead of a list of strings, as follows:
keywords='Apache Spark',
In this case I would expect the keywords to be interpreted as whitespace-delimited – that is, two keywords, “Apache” and “Spark”, as they currently are on Warehouse.
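To make that concrete, here is what whitespace-delimited splitting does to the string form. The regex matches the line quoted from Warehouse’s filters.py later in this thread; the snippet itself is just an illustration:

import re

re.split(r'\s+', 'Apache Spark')  # -> ['Apache', 'Spark']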
I think that if you use

keywords="Apache Spark,"

you might get the behavior you want, but yea, closing this as WONTFIX. Thanks!

This seems to be the doing of the format_tags filter:
https://github.com/pypa/warehouse/blob/master/warehouse/filters.py#L101
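As an aside on the trailing-comma suggestion above: it would only help if format_tags splits on commas whenever one is present. That branch is not shown in the snippet quoted next, so treat this as a guess about the filter’s behavior:

import re

# Hypothetical comma branch: split on commas, drop the empty trailing entry.
[t for t in re.split(r'\s*,\s*', 'Apache Spark,') if t]  # -> ['Apache Spark']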
Specifically, the split happens at L108:

split_tags = re.split(r'\s+', tags)
I think the correct behavior is this (sketched below):

- If tags is a list, then call format_tags recursively (or iteratively) to clean each individual one.
- If tags is a string, then execute the existing flow.

Legacy PyPI doesn’t seem to be doing any formatting - it just dumps the keywords as a string in the UI.
@dstufft or @rjwebb can confirm.
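For what it’s worth, a minimal sketch of that proposal. Only the whitespace-split line comes from the quoted filters.py; the isinstance branch, reading “clean” as whitespace-stripping, and the dropping of empty entries are assumptions for illustration:

import re

def format_tags(tags):
    # Proposed behavior: a list is already split into individual
    # keywords, so clean each entry rather than re-splitting it
    # ("clean" is interpreted here as stripping whitespace).
    if isinstance(tags, list):
        return [t.strip() for t in tags if t and t.strip()]
    # Existing flow: split a plain string on whitespace, as in
    # warehouse/filters.py (L108 in the linked revision).
    split_tags = re.split(r'\s+', tags)
    return [t for t in split_tags if t]

format_tags(['Apache Spark'])  # -> ['Apache Spark']
format_tags('Apache Spark')    # -> ['Apache', 'Spark']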