Questions about sample code?

Hello, I am new to Python and machine learning but need to use this library for a project. I read the website and the sample code, but I am still confused about how to retrieve the features that have been selected by each of the Relief algorithms.

Apologies if the site covers this, but I didn’t see any information on it. I have a couple of questions:

  1. How do we get back the features selected by each algorithm?
  2. The sample code below for the ReliefF algorithm prints a number at the end of the run. Is this number relevant to feature selection?

import pandas as pd
import numpy as np
from sklearn.pipeline import make_pipeline
from skrebate import ReliefF
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-rebate/raw/master/data/'
                           'GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz',
                           sep='\t', compression='gzip')

features, labels = genetic_data.drop('class', axis=1).values, genetic_data['class'].values

clf = make_pipeline(ReliefF(n_features_to_select=2, n_neighbors=100),
                    RandomForestClassifier(n_estimators=100))

print(np.mean(cross_val_score(clf, features, labels)))
>>> 0.795

Thanks for any help. I’ve been trying to figure out this code using the internet for a couple of weeks now but have not really gotten anywhere.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
ryanurbs commented, Jan 23, 2019

It was supposed to have been set up to handle strings as well, but I’ll have to take a closer look, and I’m not sure when I will be able to get to that. In the meantime, I’d suggest encoding your variables as integers to avoid the error.

Thanks,
Ryan


Quoted message from Megan, sent Wednesday, January 23, 2019 3:40:31 PM, Re: [EpistasisLab/scikit-rebate] Questions about sample code? (#57):

@ryanurbs I had another question about allowable datatypes. Are strings not supported by this library? I noticed that most of the sample data in this repo contains numbers for each feature and no strings. I am currently trying to use data that has strings, and I receive the following error:

TypeError: unsupported operand type(s) for /: 'str' and 'int'

My data looks something like this:

feature1  feature2  feature3  feature4
red       on        large     open
blue      off       small     open

My current code:

import pandas as pd
from sklearn.model_selection import train_test_split
from skrebate import ReliefF

feature_pairs = pd.DataFrame(feature_value_pairs)

# Separate the features from the label(s) (bug name(s))
features, labels = feature_pairs.drop('class', axis=1).values, feature_pairs['class'].values

# Make sure to compute the feature importance scores from only your training set
X_train, X_test, y_train, y_test = train_test_split(features, labels)

fs = ReliefF()
fs.fit(X_train, y_train)  # This is where the TypeError occurs
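
One way to follow the maintainer's suggestion of encoding the variables as integers is sketched below with pandas categorical codes. This is only an illustration under stated assumptions: the rows mirror the sample data shown above, while the extra rows, the 'closed' value, the 'class' labels, and the names `raw`, `encoded`, and `mappings` are all hypothetical.

```python
import pandas as pd

# Hypothetical rows modelled on the sample data above; the 'class' labels
# and the additional rows are invented for illustration only.
raw = pd.DataFrame({
    "feature1": ["red", "blue", "red", "blue"],
    "feature2": ["on", "off", "off", "on"],
    "feature3": ["large", "small", "small", "large"],
    "feature4": ["open", "open", "closed", "open"],
    "class": [1, 0, 0, 1],
})

feature_columns = [c for c in raw.columns if c != "class"]
encoded = raw.copy()
mappings = {}  # per-column lookup from integer code back to the original string

for column in feature_columns:
    categories = raw[column].astype("category")
    # cat.codes assigns 0..k-1 following the sorted order of the category values
    encoded[column] = categories.cat.codes
    mappings[column] = dict(enumerate(categories.cat.categories))

# Purely numeric arrays that can be handed to fit() in place of the string data
features = encoded[feature_columns].to_numpy()
labels = encoded["class"].to_numpy()

print(features)
print(mappings)
```

The integer codes carry no real ordering, so they are only meaningful if the feature is treated as discrete. Keeping `mappings` makes it possible to translate the codes back to the original strings afterwards.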

0 reactions
txsing commented, Mar 12, 2021

I ran into the same problem. It is hard to find clear instructions on how to get the ReliefF object out of the pipeline object and find out which features were finally selected. I kept getting the error "AttributeError: 'ReliefF' object has no attribute 'feature_importances_'" when calling print(clf['relieff'].feature_importances_).

It would be great if the developers could provide a simpler version of the example code that shows the intermediate steps without using a pipeline.
