
I am trying to deploy a model trained using AutoML directly from the Portal but the deployment to Container Instance fails. From the logs I can read the errors reported below.

I am not sure what the problem is. It only appeared recently, and, oddly enough, deploying a model that was trained about 10 days ago works without any problem.
Did anything change in the meantime? What am I missing?

Exception in worker process
Traceback (most recent call last):
File "/var/azureml-server/routes_common.py", line 37, in <module>
from azureml.api.exceptions.ClientSideException import ClientSideException
ModuleNotFoundError: No module named 'azureml.api'

Traceback (most recent call last):
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/azureml-envs/azureml_058f846b7dd22d1daecb37981c0969bb/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 978, in _gcd_import
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
File "/var/azureml-server/entry.py", line 1, in <module>
import create_app
File "/var/azureml-server/create_app.py", line 4, in <module>
from routes_common import main
File "/var/azureml-server/routes_common.py", line 39, in <module>
from azure.ml.api.exceptions.ClientSideException import ClientSideException
ModuleNotFoundError: No module named 'azure.ml'
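
Both tracebacks end in a ModuleNotFoundError, so one quick diagnostic is to check which of the two module paths can be located in a given Python environment. The sketch below uses only the standard library; `module_available` is a hypothetical helper name, and it assumes you can run Python inside the same environment the container uses.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located in this environment.

    find_spec imports any parent packages, but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `azureml`) is itself missing.
        return False


# The two module paths the scoring server tried in the tracebacks above.
for name in ("azureml.api", "azure.ml"):
    status = "found" if module_available(name) else "missing"
    print(f"{name}: {status}")
```

If both come back as missing, the package that provides them is not installed in that environment, which would be consistent with the errors above.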

Sure,

I am using AutoML, so I do not manage the environment myself.
If I look at the outputs folder of the model I want to deploy, I see the following conda environment and environment dependencies files.
I have pasted the content of the conda env file here for reading convenience. For the environment dependencies, please refer to the attachment:

# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2
- azureml-train-automl-runtime==1.33.0
- inference-schema
- azureml-interpret==1.33.0
- azureml-defaults==1.33.0
- numpy>=1.16.0,<1.19.0
- pandas==0.25.1
- scikit-learn==0.22.1
- py-xgboost<=0.90
- fbprophet==0.5
- holidays==0.9.11
- psutil>=5.2.2,<6.0.0
channels:
- anaconda
- conda-forge

    122867-conda-env-v-1-0-0-yml.txt 123011-env-dependencies-json.txt

    @Leo, Sabato I think this is because of a recent change in the SDK, which is described in the release notes.

    This change is causing an issue:

    azureml-defaults
    We are removing the dependency azureml-model-management-sdk==1.0.1b6.post1 from azureml-defaults.

    An issue has also been reported on a different GitHub repo.

    The workaround for now is to update your conda environment YAML file to include the package azureml-model-management-sdk under pip, after azureml-defaults.

    I hope this helps.

    I also tried to roll back to the previous version of azureml-defaults (same environment that works for previous models) but it fails as well.

    Maybe only changing the YAML file through Azure Storage Explorer is not enough?

    Final attempt with the modified yml below also gave the same problem.

    It seems that modifying the YAML file in the outputs folder of the model I want to deploy has no effect, as if the dependencies were read from somewhere else.
    Any ideas?

    azureml-train-automl-runtime==1.33.0
    inference-schema
    azureml-interpret==1.33.0
    azureml-defaults==1.33.0
    azure-ml-api-sdk
    azureml-model-management-sdk

    @romungi-MSFT This scenario (modify the yml file using Storage Manager, upload it to the cloud and trigger the deploy from the Portal) also fails.

    However, I got a different error this time:

    Starting gunicorn 20.1.0
    Listening at: http://127.0.0.1:31311 (11)
    Using worker: sync
    worker timeout is set to 300
    Booting worker with pid: 38
    SPARK_HOME not set. Skipping PySpark Initialization.
    Generating new fontManager, this may take some time...

    Hello

    I also ran into this issue 3 days ago when I tried to deploy a newer variation of a PyTorch-based recommender model. I tried both recommended solutions: explicitly installing either azure-ml-api-sdk or azureml-model-management-sdk.

    Both led to this new error:

    message": "Error in entry script, AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/azureml-envs/azureml_a9071edb452f2dedb0ab60b9e2450ad3/lib/python

    I read somewhere that pandas 1.3 causes this error and that 1.2 would solve it. I also tried pandas 1.2 and 0.25, but the "Can't get attribute 'new_block'" error remains.

    I had to reschedule a planned demo as I never thought I would get this problem for a very similar model that is running in production.

    Any help is much appreciated.

    Thanks

    Update

    Pandas 1.2 does fix this issue. Originally I had only referenced it in the YML file used during deployment. Once I also changed it in my own environment and re-ran all the scripts, including the train and validate scripts, the deployment worked. I hope this saves some time for people out there who struggled like me.
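
The update above suggests the model has to be loaded with the same pandas release line it was trained with. A minimal sketch of a guard for that is shown below. `major_minor` and `versions_compatible` are hypothetical helpers, and treating "same major.minor" as the compatibility rule is an assumption for illustration, not something pandas guarantees.

```python
def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.2.5'.

    Expects at least two dot-separated numeric components.
    """
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_compatible(training_version: str, serving_version: str) -> bool:
    """Treat two versions as compatible only if major and minor both match."""
    return major_minor(training_version) == major_minor(serving_version)


# Illustrative version strings, not taken from the logs in this thread.
print(versions_compatible("1.2.0", "1.2.5"))
print(versions_compatible("1.3.2", "1.2.5"))
```

In practice one could store `pandas.__version__` next to the model at training time and run a check like this when the entry script starts, so that a mismatch is reported clearly instead of surfacing as an unpickling error.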

    Thank you
