ParallelRunStep for R scripts

Is your feature request related to a problem? Please describe.

Currently, we can deploy Python ML models with distributed batch scoring using the ParallelRunStep, and then call the deployed pipeline from Data Factory. Some of the data scientists I work with build their models in R, but ParallelRunStep can only run Python scripts, not R scripts. Because of this, we're looking at other deployment alternatives, but it would be great if we could use Python to build the pipeline while keeping the scoring code in R.

Describe the solution you'd like

Currently, we build ML pipelines in ML Studio using the Python SDK and the ParallelRunStep. We still want to use the Python SDK to build our pipelines, but it would be great if there were a way to run an R script from the ParallelRunStep instead of only Python scripts.

Describe alternatives you've considered

  • Azure Batch
  • Azure Databricks + SparkR
  • Azure Databricks + sparklyr

Additional context

For example, it would be great if the entry_script in ParallelRunConfig could be an R script. The R batch scoring script would then follow the same format as a Python scoring script (init function, run function, etc.). It would also be great if we could specify the version of R:

# Proposed usage (not currently supported): an R version on the environment and an R entry script
env.r_version = '3.4.3'

parallel_run_config = ParallelRunConfig(
    environment=env,
    entry_script="batch_scoring.R",
    source_directory=".",
    output_action="append_row",
    mini_batch_size="20",
    error_threshold=1,
    compute_target=compute_target,
    process_count_per_node=2,
    node_count=1
)
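
For reference, the Python scoring-script format mentioned above can be sketched as follows. This is a minimal illustration, not the service's implementation: ParallelRunStep calls init() once per worker process and run(mini_batch) once per mini-batch, and with output_action="append_row" each returned item becomes a row in the output. The stand-in model and the file names are hypothetical, chosen only so the sketch runs without Azure; an R entry script would presumably mirror the same two functions.

```python
# Minimal sketch of a ParallelRunStep-style Python entry script.
# The "model" below is a hypothetical stand-in so the example runs locally.

model = None


def init():
    """Called once per worker process, before any mini-batch is processed."""
    global model
    # Assumption: a real script would load a trained model from disk here.
    model = lambda path: len(path)


def run(mini_batch):
    """Called once per mini-batch; returns one result per processed item.

    For a file-based input, mini_batch is a list of file paths.
    """
    results = []
    for item in mini_batch:
        results.append(f"{item}: {model(item)}")
    return results


if __name__ == "__main__":
    # Local smoke test, mimicking how the step would drive the script.
    init()
    print(run(["a.csv", "bb.csv"]))
```

Running the script directly exercises init() and run() the way the step would, which is a convenient way to test scoring logic before submitting a pipeline.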

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

1 reaction
v-strudm-msft commented, Mar 16, 2021

It’s been a few weeks since a solution was provided so we’ll close this issue for now. Should you have additional questions, please reopen this issue.

1 reaction
xiangyan99 commented, Feb 5, 2021

Thanks for the feedback, we’ll investigate asap.

