Problem Definition
A customer has a Citrix Presentation Server 4.0 with Hotfix Rollup Pack 6 located in France.
The SQL data store is located in the USA.
At first, it was reported that IMA would not start on this server .These errors were observed in the event viewer.
error EV ID 3609 "Failed to load plugin ImaFileSs.dll IMA_result_failure"
– And –
“The server running MetaFrame Presentation Server failed to connect to the
Data Store.EV ID 3614 Error – IMA_RESULT_DBCONNECT_FAILURE The database is down or
there is a network failure.”
Troubleshooting
On further discussion, it was observer that, if IMA was left for long enough then it would usually eventually start. The time varied from a few minutes to five hours.
At first, network latency was suspected. A ping from the French server to the US SQL server hosting the data store showed an average of 109ms latency.
The customer was then advised to use a replicated data store, that is, build a replica SQL server in France to host the data store.
The customer then did some more testing. On a US Citrix Presentation Server 4 with latency of just 2ms, IMA started in 50 seconds. However on another Citrix Presentation Server in Germany, with similar latency to the French server (100ms), IMA always started in 15 minutes. It appeared that while latency was a factor in the slow IMA startup times, it did not explain the difference in startup times between the French and German servers.
Whenever IMA starts the following registry key shows the IMA plugin currently loaded .The plugins are loaded one by one.
HKEY_LOCAL_MACHINESOFTWARECitrixIMARUNTIME
Value Type: Reg_SZ
Value Name: CurrentlyLoadingPlugin
It was observer that during the delay, the value remained at MfPnSs.dll. This took over two hours to load.
To determine why this plugin took so long to load, a CDF trace was done on all IMA modules.
This was then filtered to show the time taken to load each plugin.
Loaded Plugin: AieSs.dll. Time required = 5406 ms
Loaded Plugin: Imaadminss.dll. Time required = 5000 ms
Loaded Plugin: ImaAppSs.dll. Time required = 5312 ms
Loaded Plugin: ImaDistSs.dll. Time required = 140 ms
Loaded Plugin: ImaDomain.dll. Time required = 0 ms
Loaded Plugin: ImaExternStorageSs.dll. Time required = 1859 ms
Loaded Plugin: ImaFileSs.dll. Time required = 672 ms
Loaded Plugin: Imagrpss.dll. Time required = 156 ms
Loaded Plugin: ImaNDS.dll. Time required = 156 ms
Loaded Plugin: ImaPolicySs.dll. Time required = 5328 ms
Loaded Plugin: ImaPsSs.dll. Time required = 4671 ms
Loaded Plugin: ImaRass.dll. Time required = 250 ms
Loaded Plugin: Imarelss.dll. Time required = 0 ms
Loaded Plugin: ImaRpcSs.dll. Time required = 0 ms
Loaded Plugin: ImaRuntimeSS.dll. Time required = 16562 ms
Loaded Plugin: ImaSrvSs.dll. Time required = 7125 ms
Loaded Plugin: ImaUserSs.dll. Time required = 1734 ms
Loaded Plugin: ImsSs.dll. Time required = 9547 ms
Loaded Plugin: LmsSs.dll. Time required = 6203 ms
Loaded Plugin: MfAppSs.dll. Time required = 34953 ms
Loaded Plugin: MfBrowserSs.dll. Time required = 1282 ms
Loaded Plugin: MfContentSs.dll. Time required = 1219 ms
Loaded Plugin: MfPnSs.dll. Time required = 10661062 ms
Loaded Plugin: MfPrintSs.dll. Time required = 30579 ms
Loaded Plugin: MfRpcSs.dll. Time required = 47 ms
Loaded Plugin: MfSrvSs.dll. Time required = 952406 ms
Loaded Plugin: RMAlertsSS.dll. Time required = 8375 ms
Loaded Plugin: RMAnalysisSS.dll. Time required = 937 ms
Loaded Plugin: RMAppSS.dll. Time required = 2016 ms
Loaded Plugin: RMBillingSS.dll. Time required = 171 ms
Loaded Plugin: RMLogFileSS.dll. Time required = 719 ms
Loaded Plugin: RMMonitorSS.dll. Time required = 21781 ms
Loaded Plugin: RMReportSS.dll. Time required = 2735 ms
Loaded Plugin: RMServerSS.dll. Time required = 1797 ms
Loaded Plugin: RMSummaryDBSS.dll. Time required = 8844 ms
Loaded Plugin: SmaSS.dll. Time required = 4563 ms
Loading Plugin: AieSs.dll
Loading Plugin: Imaadminss.dll
Loading Plugin: ImaAppSs.dll
Loading Plugin: ImaDistSs.dll
Loading Plugin: ImaDomain.dll
Loading Plugin: ImaExternStorageSs.dll
Loading Plugin: ImaFileSs.dll
Loading Plugin: Imagrpss.dll
Loading Plugin: ImaNDS.dll
Loading Plugin: ImaPolicySs.dll
Loading Plugin: ImaPsSs.dll
Loading Plugin: ImaRass.dll
Loading Plugin: Imarelss.dll
Loading Plugin: ImaRpcSs.dll
Loading Plugin: ImaRuntimeSS.dll
Loading Plugin: ImaSrvSs.dll
Loading Plugin: ImaUserSs.dll
Loading Plugin: ImsSs.dll
Loading Plugin: LmsSs.dll
Loading Plugin: MfAppSs.dll
Loading Plugin: MfBrowserSs.dll
Loading Plugin: MfContentSs.dll
Loading Plugin: MfPnSs.dll
Loading Plugin: MfPrintSs.dll
Loading Plugin: MfRpcSs.dll
Loading Plugin: MfSrvSs.dll
Loading Plugin: RMAlertsSS.dll
Loading Plugin: RMAnalysisSS.dll
Loading Plugin: RMAppSS.dll
Loading Plugin: RMBillingSS.dll
Loading Plugin: RMLogFileSS.dll
Loading Plugin: RMMonitorSS.dll
Loading Plugin: RMReportSS.dll
Loading Plugin: RMServerSS.dll
Loading Plugin: RMSummaryDBSS.dll
Loading Plugin: SmaSS.dll
As shown above, the plugin that took the longest was MfPnSs.dll.
10661062 ms equals 177 minutes. The next longest was mfsrvss.dll which took about 15 minutes.
The trace was again filtered to show the events that take place while MfPnSs.dll. loads.
Note: In the following trace, the actual server names have been changed to server1, server2 and server3 for privacy reasons.
zm CDF_INFO 1 Sending ELECT_MASTER to [SERVER1]
message CDF_INFO 5 set message 2d0001h urgentMessage flag
transprt CDF_INFO 1 Creating new outgoing connection for host server1.
transprt CDF_INFO 1 [server1]: Queueing buffer 08D9D4F0 to outgoing list (connection state: 1).
transprt CDF_ERROR 1 Attempting connection to host server1:2512.
transprt CDF_INFO 1 _IsSocketAddressAllowed: remote connections are permitted, succeeding
zm CDF_INFO 1 Sending ELECT_MASTER to [SERVER3]
message CDF_INFO 5 set message 2d0001h urgentMessage flag
zm CDF_INFO 1 Sending ELECT_MASTER to [SERVER2]
message CDF_INFO 5 set message 2d0001h urgentMessage flag
transprt CDF_INFO 1 [server2]: Queueing buffer 08D9C8C0 to outgoing list (connection state: 1).
zm CDF_INFO 1 ELECT_MASTER_ACK [Edison, SERVER3]: received.
zm CDF_ERROR 1 ElectOrQueryZoneMaster[Edison]: Sending no more packets: ACK has been received.
zm CDF_INFO 1 ElectOrQueryMaster[Edison]: Waiting for NOTIFY or DECLARE.
transprt CDF_ERROR 1 Server socket connect() to host server2 failed with WinSock err: 10060
transprt CDF_INFO 1 [server2]: Aborting buffer list, reason: 80000008h
transprt CDF_INFO 1 [server2]: Aborting buffer at 8D9D578h
message CDF_INFO 1 Message_AbortPacket: message not expecting reply.
transprt CDF_INFO 1 [server2]: Aborting buffer at 8D9C8C0h
message CDF_INFO 1 Message_AbortPacket: message not expecting reply.
distrib CDF_INFO 1 ScanJobQueue: Woke up
zm CDF_INFO 1 ElectOrQueryMaster[Edison]: Timed o
ut waiting for DECLARE. Waiting again.
zm CDF_INFO 1 ElectOrQueryMaster[Edison]: Waiting for NOTIFY or DECLARE.
ds2 CDF_ENTRY 5 KeepAliveRoutine – entry 3
zm CDF_INFO 1 ElectOrQueryMaster[Edison]: Timed out waiting for DECLARE (again).
zm CDF_INFO 1 ElectOrQueryMaster[Edison]: Running ELECTION.
dadodbcaccess CDF_INFO 7 ExecuteWithRetry() returns success(0) after 0 retries
zm CDF_INFO 1 Sending ELECT_MASTER to [SERVER1]
message CDF_INFO 5 set message 2d0001h urgentMessage flag
transprt CDF_INFO 1 [server1]: Queueing buffer 08D9C8C0 to outgoing list (connection state: 1).
zm CDF_INFO 1 Sending ELECT_MASTER to [SERVER3]
The above lines were repeated over and over again for three hours.
The interesting error was the following.
zm CDF_INFO 1 ElectOrQueryMaster[Edison]: Waiting for NOTIFY or DECLARE.
transprt CDF_ERROR 1 Server socket connect() to host server2 failed with WinSock err: 10060
The winsock errors indicated a networking issue connecting to server2.
The customer was queried to find out more about this server. Server2 was found to be a US data collector. Whenever IMA starts, it must announce itself to the data collector in its zone. The French server was incorrectly entered in a US zone. Evidently there was high latency and an unreliable connection to this server from the French server because it took almost 177 minutes to reach it.
The server was then moved to a European zone physically closer to the French server. It was confirmed that there were no longer any latency issues.
Resolution
After moving the server to a new zone, the restart time of the IMA Service was reduced from 177 minutes to 12 minutes.