
[Feature Request] Publish delta-spark to Conda


Feature request

Overview

Publish the Python delta-spark package to a public Conda channel so that Conda users can use Delta Lake. As of now, the package is only available on PyPI.
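
For context, this is roughly what the change would enable on the user side. The conda-forge channel and the delta-spark entry below are assumptions of this request, not something that exists yet; today the equivalent is pip install delta-spark.

environment.yml (hypothetical sketch)

name: delta-lake-env
channels:
  - conda-forge            # assumption: the package would be published to conda-forge
dependencies:
  - python=3.9
  - pyspark=3.3.1
  - delta-spark=2.1.1      # hypothetical entry; this is the package being requested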

Motivation

Many users, particularly in data science, leverage Conda for package management.

Further details

Some packages are already available on Conda Forge, including delta-sharing.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 3
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

1 reaction
nkarpov commented, Dec 15, 2022

I’ve submitted a PR https://github.com/conda-forge/staged-recipes/pull/21556 for this. It’s passing all the tests so just waiting for a review now.

@shubhamp051991 I was able to resolve that issue (and a similar one with another missing file) by adding the required files as additional sources in the conda meta.yaml (refer to the PR for more details).
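
A minimal sketch of what "additional sources" can look like in a conda meta.yaml, for anyone reproducing this locally. This is not the exact recipe from the PR: the second source, its file name, and the URLs are illustrative, the sha256 values are placeholders, and {{ version }} assumes a {% set version = "2.1.1" %} line like in the recipe further down this thread.

source:
  # primary source: the sdist published on PyPI
  - url: https://pypi.io/packages/source/d/delta-spark/delta-spark-{{ version }}.tar.gz
    sha256: <sdist sha256>
  # a file that setup.py expects but that is missing from the sdist,
  # fetched from the GitHub repo at the matching release tag
  - url: https://raw.githubusercontent.com/delta-io/delta/v{{ version }}/LICENSE.txt
    sha256: <file sha256>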

Once the PR is reviewed and the recipe is live for the most recent release, we can add something in this repo to generate the conda meta.yaml for future releases.

0 reactions
KevinAppelBofa commented, Nov 15, 2022

@MrPowers @scottsand-db is there any update on this? I am adding Delta Lake for the first time for our group, starting with Spark 3.3.1, and it would be great if we could get it from conda instead of having to do a pip pull. I just built it myself so I can get it set up in our conda environment; it would also be great if a conda version of Delta were posted by the time Spark 3.4 is out.

Based on the last release, the version is 2.1.1. I do a git clone, find the commit tied to that tag, and use it in the meta.yaml; after that you just run the conda-build command, i.e. conda-build delta.

Hopefully this helps with getting the package built; I'm not sure how packages get uploaded to conda-forge, though.

meta.yaml

{% set name = "delta" %}
{% set version = "2.1.1" %}

package:
  name: "{{ name|lower }}"
  version: "{{ version }}"

source:
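  # local clone of delta-io/delta, checked out at the commit the v2.1.1 tag points to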
  git_url: "/scratch/fromgit-branch-2.1.1/delta"
  git_rev: d8c4fc17c25d6b5e0e9b3ebe1ff4cba39ecb39c5

build:
  number: 0
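  # noarch: python builds a single, platform-independent package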
  noarch: python
  script: "{{ PYTHON }} setup.py install"

requirements:
  host:
    - python=3.9
    - pyspark=3.3.1
    - importlib_metadata
  run:
    - python=3.9
    - pyspark=3.3.1
    - importlib_metadata

test:
  imports:
    - delta

about:
  home: https://github.com/delta-io/delta
  summary: An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs