Projects

Ticket #9 (new defect report)

Opened 4 years ago

Last modified 14 months ago

DSS crash on Solaris with more than 200 clients - numCPU detection problem

Reported by: prune@… Owned by:
Priority: critical Version: Solaris
Keywords: numCPUs solaris sparc intel sysconf Cc:

Description

When DSS 5.5.5 up to 6.0.3 is busy (have more than around 200 connexions), it crashes. This is due to the way DSS try to find the number of CPUs available on the host for each request :

#if(__solaris__)
{
    UInt32 numCPUs = 0;
    char linebuff[512] = "";
    StrPtrLen line(linebuff, sizeof(linebuff));
    StrPtrLen word;

    FILE *p = ::popen("uname -X","r");
    while((::fgets(linebuff, sizeof(linebuff -1), p)) > 0)
    {
        StringParser lineParser(&line);
        lineParser.ConsumeWhitespace(); //skip over leading whitespace

        if (lineParser.GetDataRemaining() == 0) // must be an empty line
            continue;

        lineParser.ConsumeUntilWhitespace(&word);

        if ( word.Equal("NumCPU")) // found a tag as first word in line
        {
            lineParser.GetThru(NULL,'=');
            lineParser.ConsumeWhitespace();  //skip over leading whitespace
            lineParser.ConsumeUntilWhitespace(&word); //read the number of cpus
            if (word.Len > 0)
                ::sscanf(word.Ptr, "%"_U32BITARG_"", &numCPUs);

            break;
        }
    }
    if (numCPUs == 0)
        numCPUs = 1;
        
    ::pclose(p);
    
        return numCPUs;
}
#endif

This is not efficient, use a lot of file descriptor and, in the end , crash DSS if the connexion rate is too high.

The solution is to use sysconf instead of the hard way. Thanks to Chris Burns, setting the numCPUs variable as a STATIC variable enables us to do the check only one time at the start. This will reduce the load on a busy server too.

Final code after patch :

#if(__solaris__)
{
    static UInt32 numCPUs = sysconf(_SC_NPROCESSORS_ONLN);  // Only ever called once

    if (numCPUs <= 0)
        numCPUs = 1;

    return numCPUs;
}
#endif

And this is the patch :

diff -ur DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp
--- DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp     Tue May  6 01:28:59 2008
+++ DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp   Wed May 28 10:27:16 2008
@@ -406,39 +414,12 @@

 #if(__solaris__)
 {
-    UInt32 numCPUs = 0;
-    char linebuff[512] = "";
-    StrPtrLen line(linebuff, sizeof(linebuff));
-    StrPtrLen word;
-
-    FILE *p = ::popen("uname -X","r");
-    while((::fgets(linebuff, sizeof(linebuff -1), p)) > 0)
-    {
-        StringParser lineParser(&line);
-        lineParser.ConsumeWhitespace(); //skip over leading whitespace
-
-        if (lineParser.GetDataRemaining() == 0) // must be an empty line
-            continue;
-
-        lineParser.ConsumeUntilWhitespace(&word);
-
-        if ( word.Equal("NumCPU")) // found a tag as first word in line
-        {
-            lineParser.GetThru(NULL,'=');
-            lineParser.ConsumeWhitespace();  //skip over leading whitespace
-            lineParser.ConsumeUntilWhitespace(&word); //read the number of cpus
-            if (word.Len > 0)
-                ::sscanf(word.Ptr, "%"_U32BITARG_"", &numCPUs);
-
-            break;
-        }
-    }
-    if (numCPUs == 0)
-        numCPUs = 1;
-        
-    ::pclose(p);
+    static UInt32 numCPUs = sysconf(_SC_NPROCESSORS_ONLN);  // Only ever called once
     
-       return numCPUs;
+    if (numCPUs <= 0)
+        numCPUs = 1;
+
+    return numCPUs;
 }
 #endif

This patch may breake the Solaris compatibility with versions prior to Solaris 9 (which are almost not supported anyway). I haven't tested this.
On some plateform you may need to also include unistd.h
Use the patch :

diff -ur DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp
--- DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp     Tue May  6 01:28:59 2008
+++ DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp   Wed May 28 10:27:16 2008
@@ -69,10 +69,14 @@
     #include <sys/sysctl.h>
 #endif

-#if (__solaris__ || __linux__ || __linuxppc__)
+#if (__linux__ || __linuxppc__)
     #include "StringParser.h"
 #endif

+#if (__solaris__)
+       #include <unistd.h>
+#endif
+
 #if __sgi__
        #include <sys/systeminfo.h>
 #endif

This have been tested on DSS 5.5.4, 5.5.5 and 6.0.3 on Solaris 10 Sparc, x86 and amd64.
You can find the full Solaris 10 patch, including changes for enabling compilation on Solaris sparc and intel/amd, on my blog, at  http://www.lecentre.net/blog/archives/288

Change History

Changed 4 years ago by stefanparvu14@…

hey,

Isn't this already covered in ticket #7 ? We should have one big patch set with all solaris modifications.

Changed 4 years ago by prune@…

This is covered in the patch of #7 (or on my blog) BUT this is a specific problem which needs a specifig bug report. Maybe DSS devs will make the change quickly this way...

Note: See TracTickets for help on using tickets.