Failed to establish a new connection in Microsoft NNI
Explanation of the problem
The problem involves using NNI (Neural Network Intelligence) in a High-Performance Computing (HPC) environment at a school. The code works on a personal computer, but submitting tasks on the manager node of the HPC environment raises the following error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=17513): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b352d9c6f28>: Failed to establish a new connection: [Errno 111] Connection refused',))
The user suspects that the URL is the cause and considers setting “nniManagerIP” to resolve the problem, but is unsure which host to specify.
Note that “[Errno 111] Connection refused” usually means that the request reached the target host but no process was listening on the requested port. Here that is port 17513 on localhost, so the NNI manager was most likely not running, or not yet ready, on the node where the request was made. Resolving the problem involves checking network connectivity, the host and port in use, and how NNI is set up in the HPC environment.
Problem solution for Failed to establish a new connection in Microsoft NNI
To solve the “Failed to establish a new connection” error in Microsoft NNI, you can try the following steps:
- Check the endpoint: The error is a ConnectionError for “/api/v1/nni/check-status” at localhost:17513. Confirm that the NNI manager is actually running and listening on that port on the node where the request is made.
- Use the correct host: The request targets “localhost”, which may not be where the NNI manager runs in an HPC setup. If this is the case, set nniManagerIP to an address of the manager node that the other nodes can reach.
- Verify connectivity: Ensure that no firewall is blocking the ports used by NNI. Pinging the manager host checks basic reachability, but it does not show whether the port itself accepts connections.
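Here’s an example of how to query the NNI manager’s status endpoint on a specific host and port:

    import requests

    # Placeholder values: use your manager node's address and the port NNI listens on.
    nniManagerIP = "10.0.0.1"
    nniManagerPort = 17513  # port taken from the error message above

    url = f"http://{nniManagerIP}:{nniManagerPort}/api/v1/nni/check-status"
    try:
        # A timeout keeps the call from hanging on an unreachable host.
        response = requests.get(url, timeout=5)
        if response.status_code == 200:
            print("NNI manager is up and running")
        else:
            print("NNI manager responded with status", response.status_code)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as e:
        print("Error:", e)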
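The connectivity step above can also be checked at the TCP level, without going through HTTP. The following is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical, and the host and port are simply the ones from the error message, so adjust them for your setup.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable or unreachable hosts.
        return False


if __name__ == "__main__":
    # Host and port are taken from the error message; adjust for your setup.
    host, port = "localhost", 17513
    state = "open" if is_port_open(host, port) else "closed or unreachable"
    print(f"{host}:{port} is {state}")
```

If the port is reported as closed on the node where the NNI manager is supposed to run, the manager process itself is the first thing to check, before any URL or firewall settings.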
By following these steps, you should be able to resolve the “Failed to establish a new connection” error and successfully use NNI in your HPC setup.
Other popular problems with Microsoft NNI
Problem: Incorrect Configuration of URL
One of the most common issues with Microsoft NNI is incorrect configuration of the URL. The URL is used to connect to the NNI Manager to submit tasks and manage the experiment. When using NNI in a high performance computing environment with multiple compute nodes, it is important to specify the correct URL to ensure that the connection to the manager node is established successfully.
When attempting to use NNI in a high performance computing environment, the user may encounter an error such as “requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=17513): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b352d9c6f28>: Failed to establish a new connection: [Errno 111] Connection refused',))”. This error is raised when the connection to the manager node cannot be established due to incorrect configuration of the URL.
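To resolve this issue, the user should specify the correct address of the NNI Manager, which can be done by using the nniManagerIP parameter. For example, the code block below demonstrates how to specify the address and port in a Python script:

    import nni

    # Placeholder address and port for the manager node.
    nni_manager_ip = '192.168.0.100'
    nni_manager_port = 8080

    # Assumption: nni.init is taken as given here and may not exist in your
    # NNI release; check the documentation for your version before relying on it.
    nni.init(nni_manager_ip, nni_manager_port)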
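When it is unclear which address to use for nniManagerIP, one starting point is to ask the operating system which local address it would use for outbound traffic. The sketch below uses only the standard library; the function name `guess_outbound_ip` and the probe address are assumptions made for illustration. On multi-homed HPC nodes the result may not be the interface that compute nodes can reach, so treat it as a candidate to verify.

```python
import socket


def guess_outbound_ip(probe_host: str = "10.255.255.255") -> str:
    """Guess the local IP address the OS would use for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose the local interface it would route through.
        sock.connect((probe_host, 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route: fall back to the loopback address.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print("Hostname:", socket.gethostname())
    print("Candidate address for nniManagerIP:", guess_outbound_ip())
```

A result of 127.0.0.1 means no routable interface was found, and that value would not help other nodes reach the manager.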
Problem: Incompatible Version of NNI
Another common issue with Microsoft NNI is an incompatible version of the NNI package. It is important to ensure that the correct version of NNI is installed and being used, as different versions of NNI may have different requirements and compatibility with other packages.
When attempting to use Microsoft NNI, the user may encounter an error such as “ImportError: No module named 'nni'”. This error means that Python cannot find the nni package in the active environment, for example because it was never installed or was installed for a different interpreter or virtual environment.
To resolve this issue, the user should ensure that the correct version of NNI is installed and being used. This can be done by using a package manager such as pip to install NNI and verifying that the script runs with the same interpreter. For example, the code block below demonstrates how to install and import the latest version of NNI from a notebook:

    # In a Jupyter notebook cell; from a shell, run "pip install nni" without the "!".
    !pip install nni

    import nni
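To see which interpreter a script is using and whether NNI is visible to it, a short check along the following lines can help. It is a sketch built on the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The interpreter path shows which environment the script runs in.
    print("Interpreter:", sys.executable)
    version = installed_version("nni")
    if version is None:
        print("nni is not installed for this interpreter")
    else:
        print("nni version:", version)
```

Running this both on the personal computer and on the HPC manager node makes version or environment differences between the two easy to spot.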
Problem: Permission Denied Error with NNI Manager
Another common issue with Microsoft NNI is a permission denied error when attempting to connect to the NNI Manager. This error is typically raised when the user does not have the necessary permissions to connect to the NNI Manager and submit tasks.
When attempting to use Microsoft NNI, the user may encounter an error such as “requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://localhost:8080/api/v1/nni/check-status”. Unlike a refused connection, this error means that a server did answer the request but rejected it.
To resolve this issue, the user should ensure that they have the necessary permissions to connect to the NNI Manager and submit tasks. This can typically be done by contacting the administrator of the NNI Manager and requesting the necessary permissions. Additionally, the user may need to update their authentication credentials or reconfigure the NNI Manager to allow connections.
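It helps to distinguish a server that answers with an error status, such as 403, from a server that cannot be reached at all, because the two call for different fixes. The sketch below does this with the standard library only; the function name `probe` and its return labels are hypothetical. It also ignores proxy environment variables, on the assumption that a cluster-wide HTTP proxy could otherwise intercept requests to localhost.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout: float = 5.0) -> Tuple[str, Optional[int]]:
    """Classify a GET request outcome as ok, http_error, or unreachable."""
    # An empty ProxyHandler ignores http_proxy/https_proxy environment
    # variables, so the request goes directly to the given host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return "ok", response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        return "http_error", err.code
    except (urllib.error.URLError, OSError):
        # No HTTP response at all: refused, timed out, or bad host name.
        return "unreachable", None


if __name__ == "__main__":
    # URL taken from the error message above; adjust host and port as needed.
    print(probe("http://localhost:8080/api/v1/nni/check-status"))
```

An `http_error` result with code 403 points to permissions or proxy configuration, while `unreachable` points back to the host, port, or firewall checks.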
A brief introduction to Microsoft NNI
Microsoft NNI (Neural Network Intelligence) is a toolkit that automates the hyperparameter tuning process in deep learning. It was created to make it easier for researchers and practitioners to tune the hyperparameters of their models in an efficient and scalable manner. NNI is designed to work with popular deep learning frameworks such as PyTorch, TensorFlow, and Caffe, and provides support for various machine learning tasks, including image classification, natural language processing, and reinforcement learning.
NNI provides a set of abstractions to represent different machine learning algorithms and the search spaces of their hyperparameters. Users define the search space and the experiment settings in configuration files, which NNI uses to perform the tuning. NNI ships several tuning algorithms, including Bayesian optimization methods such as TPE as well as random search and evolutionary search, and provides various visualizations to help users understand the tuning process and the results. Additionally, NNI provides a simple API and a command-line tool to interact with the tuning process, allowing users to easily start and stop tuning, view the status of experiments, and retrieve the results of the tuning.
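To make the idea of a search space concrete, the sketch below defines one in the `_type`/`_value` layout that NNI search space files use, then draws random configurations from it. The sampler is a plain-Python illustration of random search, not NNI's own tuner code, and the parameter names and ranges are made up for the example.

```python
import math
import random
from typing import Any, Dict

# Example search space; parameter names and ranges are illustrative only.
SEARCH_SPACE: Dict[str, Dict[str, Any]] = {
    "batch_size": {"_type": "choice", "_value": [32, 64, 128]},
    "dropout": {"_type": "uniform", "_value": [0.1, 0.9]},
    "learning_rate": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
}


def sample(space: Dict[str, Dict[str, Any]], rng: random.Random) -> Dict[str, Any]:
    """Draw one random configuration from the search space."""
    params: Dict[str, Any] = {}
    for name, spec in space.items():
        kind, value = spec["_type"], spec["_value"]
        if kind == "choice":
            params[name] = rng.choice(value)
        elif kind == "uniform":
            low, high = value
            params[name] = rng.uniform(low, high)
        elif kind == "loguniform":
            # Sample uniformly in log space, then map back.
            low, high = value
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            raise ValueError(f"unsupported type: {kind}")
    return params


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample(SEARCH_SPACE, rng))
```

A tuner plays the role of `sample` here: it proposes a configuration, a trial trains and evaluates a model with it, and the reported metric guides the next proposal.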
Most popular use cases for Microsoft NNI
- Microsoft NNI (Neural Network Intelligence) is an open-source toolkit that supports automatic machine learning (AutoML) and hyperparameter tuning.
- Microsoft NNI can be used for a variety of tasks, including:
  - Optimizing deep learning models for a specific dataset and problem
  - Selecting the best performing neural network architecture for a given dataset
  - Automatically tuning hyperparameters for a given model and dataset to achieve better performance.
- As an example, Microsoft NNI can be used to optimize a convolutional neural network (CNN) for image classification tasks. The following code block shows how to implement a basic CNN in NNI:
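    import nni
    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D

    # MNIST: 28x28 grayscale images, 10 digit classes.
    num_classes = 10
    input_shape = (28, 28, 1)

    def load_data():
        (x_train, y_train), (x_test, y_test) = mnist.load_data()
        # Add a channel axis and scale pixel values to [0, 1].
        x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
        x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
        y_train = keras.utils.to_categorical(y_train, num_classes)
        y_test = keras.utils.to_categorical(y_test, num_classes)
        return x_train, y_train, x_test, y_test

    def build_model():
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Flatten())
        model.add(Dense(128, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(num_classes, activation='softmax'))
        model.compile(loss=keras.losses.categorical_crossentropy,
                      optimizer=keras.optimizers.Adadelta(),
                      metrics=['accuracy'])
        return model

    def run(params):
        x_train, y_train, x_test, y_test = load_data()
        model = build_model()
        model.fit(x_train, y_train,
                  batch_size=params['batch_size'],
                  epochs=params['epochs'],
                  verbose=0,
                  validation_data=(x_test, y_test))
        # evaluate returns [loss, accuracy] for this compile configuration.
        return model.evaluate(x_test, y_test, verbose=0)

    if __name__ == '__main__':
        # Defaults allow a standalone run; NNI overrides them during an experiment.
        params = {'batch_size': 128, 'epochs': 1}
        params.update(nni.get_next_parameter())
        loss, accuracy = run(params)
        # Report the loss, matching a tuner configured with optimize_mode 'minimize'.
        nni.report_final_result(loss)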