Ticket #9 (new defect report)
DSS crash on Solaris with more than 200 clients - numCPU detection problem
| Reported by: | prune@… | Owned by: | |
|---|---|---|---|
| Priority: | critical | Version: | Solaris |
| Keywords: | numCPUs solaris sparc intel sysconf | Cc: |
Description
When DSS 5.5.5 up to 6.0.3 is busy (have more than around 200 connexions), it crashes. This is due to the way DSS try to find the number of CPUs available on the host for each request :
#if(__solaris__)
{
UInt32 numCPUs = 0;
char linebuff[512] = "";
StrPtrLen line(linebuff, sizeof(linebuff));
StrPtrLen word;
FILE *p = ::popen("uname -X","r");
while((::fgets(linebuff, sizeof(linebuff -1), p)) > 0)
{
StringParser lineParser(&line);
lineParser.ConsumeWhitespace(); //skip over leading whitespace
if (lineParser.GetDataRemaining() == 0) // must be an empty line
continue;
lineParser.ConsumeUntilWhitespace(&word);
if ( word.Equal("NumCPU")) // found a tag as first word in line
{
lineParser.GetThru(NULL,'=');
lineParser.ConsumeWhitespace(); //skip over leading whitespace
lineParser.ConsumeUntilWhitespace(&word); //read the number of cpus
if (word.Len > 0)
::sscanf(word.Ptr, "%"_U32BITARG_"", &numCPUs);
break;
}
}
if (numCPUs == 0)
numCPUs = 1;
::pclose(p);
return numCPUs;
}
#endif
This is not efficient, use a lot of file descriptor and, in the end , crash DSS if the connexion rate is too high.
The solution is to use sysconf instead of the hard way. Thanks to Chris Burns, setting the numCPUs variable as a STATIC variable enables us to do the check only one time at the start. This will reduce the load on a busy server too.
Final code after patch :
#if(__solaris__)
{
static UInt32 numCPUs = sysconf(_SC_NPROCESSORS_ONLN); // Only ever called once
if (numCPUs <= 0)
numCPUs = 1;
return numCPUs;
}
#endif
And this is the patch :
diff -ur DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp
--- DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp Tue May 6 01:28:59 2008
+++ DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp Wed May 28 10:27:16 2008
@@ -406,39 +414,12 @@
#if(__solaris__)
{
- UInt32 numCPUs = 0;
- char linebuff[512] = "";
- StrPtrLen line(linebuff, sizeof(linebuff));
- StrPtrLen word;
-
- FILE *p = ::popen("uname -X","r");
- while((::fgets(linebuff, sizeof(linebuff -1), p)) > 0)
- {
- StringParser lineParser(&line);
- lineParser.ConsumeWhitespace(); //skip over leading whitespace
-
- if (lineParser.GetDataRemaining() == 0) // must be an empty line
- continue;
-
- lineParser.ConsumeUntilWhitespace(&word);
-
- if ( word.Equal("NumCPU")) // found a tag as first word in line
- {
- lineParser.GetThru(NULL,'=');
- lineParser.ConsumeWhitespace(); //skip over leading whitespace
- lineParser.ConsumeUntilWhitespace(&word); //read the number of cpus
- if (word.Len > 0)
- ::sscanf(word.Ptr, "%"_U32BITARG_"", &numCPUs);
-
- break;
- }
- }
- if (numCPUs == 0)
- numCPUs = 1;
-
- ::pclose(p);
+ static UInt32 numCPUs = sysconf(_SC_NPROCESSORS_ONLN); // Only ever called once
- return numCPUs;
+ if (numCPUs <= 0)
+ numCPUs = 1;
+
+ return numCPUs;
}
#endif
This patch may breake the Solaris compatibility with versions prior to Solaris 9 (which are almost not supported anyway). I haven't tested this.
On some plateform you may need to also include unistd.h
Use the patch :
diff -ur DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp
--- DarwinStreamingSrvr6.0.3-Source-Linux/CommonUtilitiesLib/OS.cpp Tue May 6 01:28:59 2008
+++ DarwinStreamingSrvr6.0.3-Source-Solaris/CommonUtilitiesLib/OS.cpp Wed May 28 10:27:16 2008
@@ -69,10 +69,14 @@
#include <sys/sysctl.h>
#endif
-#if (__solaris__ || __linux__ || __linuxppc__)
+#if (__linux__ || __linuxppc__)
#include "StringParser.h"
#endif
+#if (__solaris__)
+ #include <unistd.h>
+#endif
+
#if __sgi__
#include <sys/systeminfo.h>
#endif
This have been tested on DSS 5.5.4, 5.5.5 and 6.0.3 on Solaris 10 Sparc, x86 and amd64.
You can find the full Solaris 10 patch, including changes for enabling compilation on Solaris sparc and intel/amd, on my blog, at http://www.lecentre.net/blog/archives/288

