Reproducing NELL-995 MAP Results
Thanks very much for releasing the code accompanying the paper. It definitely makes reproducing the experiments a lot easier. I've been playing with the codebase and have some questions about reproducing the NELL-995 experiments.
The codebase contains neither the configuration file for the NELL-995 experiments nor the evaluation scripts for computing MAP. (Perhaps they were omitted from the release?) I used the hyperparameters reported in "Experimental Details", section 2.3, and appendix section 8.1 of the paper, which yields the following configuration file:
data_input_dir="datasets/data_preprocessed/nell-995/"
vocab_dir="datasets/data_preprocessed/nell-995/vocab"
total_iterations=1000
path_length=3
hidden_size=400
embedding_size=200
batch_size=64
beta=0.05
Lambda=0.02
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell-995/"
load_model=1
model_load_dir="saved_models/nell-995/model.ckpt"
I ran train & test as specified in the README and evaluated the decoding results using the MAP computation script released with the DeepPath paper. (I assumed the experimental setup is exactly the same as DeepPath's, since you compare head-to-head with them.)
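For concreteness, here is a minimal sketch of the per-relation MAP metric as I understand it (standard average precision over ranked tail entities). This is my own illustration, not code from either repository; the function names and the (ranked_tails, true_tails) query representation are assumptions:

def average_precision(ranked_tails, true_tails):
    # AP for one (head, relation) query: precision at each rank where a
    # true tail appears, normalized by the number of true tails.
    hits, precisions = 0, []
    for rank, tail in enumerate(ranked_tails, start=1):
        if tail in true_tails:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(true_tails) if true_tails else 0.0

def mean_average_precision(queries):
    # queries: list of (ranked_tails, true_tails) pairs, one per test query.
    return sum(average_precision(r, set(t)) for r, t in queries) / len(queries)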
However, the MAP results I obtained this way are significantly lower than the reported results.
MINERVA concept_athleteplaysinleague MAP: 0.810746658312 (380 queries evaluated)
MINERVA concept_athleteplaysforteam MAP: 0.649309434089 (386 queries evaluated)
MINERVA concept_organizationheadquarteredincity MAP: 0.944878371403 (246 queries evaluated)
MINERVA concept_athleteplayssport MAP: 0.919186046512 (602 queries evaluated)
MINERVA concept_personborninlocation MAP: 0.775690686628 (192 queries evaluated)
MINERVA concept_teamplayssport MAP: 0.762183612184 (111 queries evaluated)
MINERVA concept_athletehomestadium MAP: 0.519108225108 (200 queries evaluated)
MINERVA concept_worksfor MAP: 0.663530575465 (420 queries evaluated)
I tried a few variations of the embedding dimensions and also tried freezing the entity embeddings, yet none of the trials produced numbers close to the results tabulated in the MINERVA paper.
Would you please clarify the experimental setup for computing MAP? I want to make sure I set the hyperparameters to the correct values. Also, the DeepPath paper used a relation-dependent underlying graph per relation during inference. Did you also vary the graph per relation, or did you use a single base graph for all relations, as you did for the other datasets?
Many thanks.
Top GitHub Comments
Hi Victoria, Thanks for trying out our code. Could you kindly point me to the evaluation script you used? Unlike DeepPath, we train a single model for all the relations and hence use a single graph. However, to keep the evaluation correct, we remove the edges corresponding to the query triple. For example, for the query triple John_Doe --works_for--> Google, when MINERVA starts to walk from the node John_Doe, it is not allowed to take the edge works_for to reach Google. (ref: https://github.com/shehzaadzd/MINERVA/blob/master/code/data/grapher.py#L56)
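To make the masking concrete, here is a minimal sketch of the idea; this is an illustration under an assumed adjacency-list graph representation, not the actual code in grapher.py:

def valid_actions(graph, query_head, query_relation, answer, current_node):
    # graph: dict mapping a node to its list of outgoing (relation, neighbor) edges.
    actions = []
    for relation, neighbor in graph[current_node]:
        # Skip exactly the edge that encodes the query triple, so the agent
        # cannot reach Google from John_Doe via works_for in one trivial hop.
        if (current_node, relation, neighbor) == (query_head, query_relation, answer):
            continue
        actions.append((relation, neighbor))
    return actions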
Hi shehzaadzd, I have run the experiment on the nell-995 dataset with the config file above; this is my result:
config file:
I also tried setting the embedding size and hidden size to 50; the result is below.
config file:
Finally, I set the number of LSTM layers to 3 according to your paper; the result:
However, none of the results are similar to those in the paper, even though I believe I set the hyperparameters exactly according to the paper and its appendix. Is my config file the optimal setting for your experiment? Could you help me reproduce the results? Thanks a lot!