Since at least last Monday (22/3), I've been getting the error below from a background task running in AKS on Linux. When I run the same task from Visual Studio (on Windows), the error does not occur.

The error occurs anywhere between 1 and 10 times per minute, across 30-50 pods running the task. If I reduce the scaling, it still happens.

The full error:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)  ---> System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Net.Sockets.Socket'.    
at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)   
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)    
at Calidos.Artemis.Services.RelinkerADTService.Relink(String hisPatientUid) in /src/Models/Calidos.Artemis.Services/MessageProcesses/RelinkerADTService.cs:line 84    
at Calidos.Artemis.QueueProcessor.RelinkADTAction.Run(RelinkMsg msg, CloudQueue theQueue) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/Actions/RelinkADTAction.cs:line 27    
at Calidos.Artemis.QueueProcessor.QueueWorker.ProcessMessage_Relinker(CloudQueue theQueue, Boolean msgOk, String[] msg, Object msgType) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/QueueWorker.cs:line 272 ClientConnectionId:de553492-0d5d-4937-957c-2a76dbb3e8ee Routing Destination:d31d9ada42e9.tr2384.westeurope1-a.worker.database.windows.net,11047   
at Calidos.Artemis.Services.RelinkerADTService.Relink(String hisPatientUid) in /src/Models/Calidos.Artemis.Services/MessageProcesses/RelinkerADTService.cs:line 84    
at Calidos.Artemis.QueueProcessor.RelinkADTAction.Run(RelinkMsg msg, CloudQueue theQueue) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/Actions/RelinkADTAction.cs:line 27    
at Calidos.Artemis.QueueProcessor.QueueWorker.ProcessMessage_Relinker(CloudQueue theQueue, Boolean msgOk, String[] msg, Object msgType) in /src/MessageProcessor/Calidos.Artemis.QueueProcessor/QueueWorker.cs:line 272
System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Net.Sockets.Socket'.    
at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)    
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)   
at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Int32 optionValue)    
at System.Net.Sockets.Socket.set_NoDelay(Boolean value)    
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel) 

Occasionally, I also see error 40:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server) 

Following https://github.com/dotnet/SqlClient/issues/449, I was asked to contact Azure support. They forwarded me here.

Thanks in advance,
Tom Wuyts

Hello @Tom wuyts,
Are you still seeing this issue? If so, please let us know.
Can you SSH into the node on which the pod is running and check the latest logs in /var/log/syslog (or syslog.1) for errors like "eth0: Lost carrier"?
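
A minimal sketch of that check, assuming a POSIX shell on the node and the syslog paths named above. The function name `find_lost_carrier` is hypothetical and only for illustration:

```shell
#!/bin/sh
# Print every "Lost carrier" line found in the given log files,
# prefixed with the file name it came from.
# Files that are missing or unreadable (e.g. no rotated syslog.1 yet)
# are skipped silently.
find_lost_carrier() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        # grep exits non-zero when nothing matches; that is not an error here.
        grep -H "Lost carrier" "$log" || true
    done
    return 0
}

# Typical use on the node (may need sudo, depending on log permissions):
#   find_lost_carrier /var/log/syslog /var/log/syslog.1
```

No output means no matching line was found in the files that could be read.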

If you see those errors, you might be hitting the issue described at https://github.com/Azure/aks-engine/issues/4341.
It should be fixed as of today; please let us know if you are still seeing the timeout errors.

Hi @shiva patpi, I'm still experiencing this error and it's starting to impact our customers. Do you have any other suggestions on where to look for a solution?

I noticed that there are a lot of connections going to the SQL database (about 500-600 per minute). Could this be related?