Siebel crash diaries - A Patchset story
So, we have some development going on in Open UI. All you folks who have worked on Open UI will know this: there is one thing that is consistent across the board, and that is “bugs”.
In our constant fight against those bugs we decided to go “bleeding edge”: apply every patch for our Siebel version and keep doing that every month. This is against our “traditional” way of doing things, but that did not matter since there were bigger issues to solve.
As part of this story, we recently installed Siebel Patchset 6 on 220.127.116.11.
Over the next two days we saw an increase in the number of application crashes. “Siebel crash” increasingly became everyone's favorite phrase and was reported all over the place, but the crashes were not serious enough to warrant immediate action.
But of course, we are not kind to crashes. We do what every good developer does:
- Go to the server
- Generate a CSV file from the FDR file, and collect crash.txt and the Siebel log files
- Find out which view and which action crash the application
See a more detailed post on how to debug Siebel application crashes.
Crashes are not hard to debug, but they are not exactly straightforward either. If more than a few actions can cause a crash, you are guaranteed an exciting life for the next 1-2 hours.
The FDR file showed more than 2 distinct errors on the crashing thread.
We take the process ID from there and start going through the log files.
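Counting the distinct errors per thread in the FDR CSV can be scripted instead of eyeballed. The sketch below is a minimal illustration under an explicit assumption: the column names `ThreadID` and `ErrorText` are hypothetical placeholders, and the real CSV generated from an FDR file will likely use different headers, so adjust the two constants before using it.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL column names: the CSV generated from your FDR file may use
# different headers, so change these two constants to match your export.
THREAD_COL = "ThreadID"
ERROR_COL = "ErrorText"


def distinct_errors_by_thread(csv_text: str) -> dict[str, set[str]]:
    """Group the distinct, non-empty error strings in an FDR CSV by thread ID."""
    errors: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        error = (row.get(ERROR_COL) or "").strip()
        if error:  # rows without error text are ignored
            errors[row[THREAD_COL]].add(error)
    return dict(errors)


if __name__ == "__main__":
    # Placeholder data, not taken from a real FDR file.
    sample = "ThreadID,ErrorText\n0x155c,ERROR-A\n0x155c,ERROR-B\n0x155c,ERROR-A\n0x1a20,\n"
    for thread, errs in distinct_errors_by_thread(sample).items():
        print(thread, len(errs), sorted(errs))
```

Sets are used so that the same error logged repeatedly on one thread counts only once, which is what "distinct errors" means here.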
Do take note while selecting the log file. Siebel Object Managers are multi-threaded: many user sessions run as threads inside one process, and a crash in one thread can bring down the entire process. Don’t get confused if you don’t see the cause of the crash in one of the log files; you are just looking at the log of a session that is unrelated to the crash. The best way to pick the right log file is by the process ID suffixed to the FDR file name.
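Note that crash.txt reports the thread and process IDs in hex (`Thread: 0x0000155c, Process 0x000027cc`). A minimal sketch of pulling those IDs out and narrowing a list of file names to one process follows. It assumes, hypothetically, that the relevant files end in `_<decimal pid>` before the extension; check the actual naming on your own server and adapt `files_for_process` accordingly.

```python
import re
from pathlib import Path

# Matches the crash.txt header, e.g.
# "Exception 0xc0000005 at 0x00933849 Thread: 0x0000155c, Process 0x000027cc"
HEADER_RE = re.compile(r"Thread:\s*0x([0-9a-fA-F]+),\s*Process\s+0x([0-9a-fA-F]+)")


def crash_ids(crash_text: str) -> tuple[int, int]:
    """Return (thread_id, process_id) from a crash.txt header, as decimal ints."""
    match = HEADER_RE.search(crash_text)
    if match is None:
        raise ValueError("no Thread/Process header found")
    return int(match.group(1), 16), int(match.group(2), 16)


def files_for_process(names: list[str], pid: int) -> list[str]:
    """Keep names whose stem ends with '_<pid>' (HYPOTHETICAL naming convention)."""
    suffix = f"_{pid}"
    return [n for n in names if Path(n).stem.endswith(suffix)]


if __name__ == "__main__":
    header = "Exception 0xc0000005 at 0x00933849 Thread: 0x0000155c, Process 0x000027cc"
    tid, pid = crash_ids(header)
    print(tid, pid)
    # Hypothetical file names used only to illustrate the filter.
    print(files_for_process(["objmgr_0001_10188.log", "objmgr_0002_20480.log"], pid))
```

Matching on the full `_<pid>` suffix rather than a plain substring avoids picking up a file for process 110188 when you are looking for 10188.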
Log file analysis did not turn up anything conclusive, but a pattern emerged: most of the log files and FDR files pointed to one custom view developed a while back.
The crash file was not very useful in this case.
Exception 0xc0000005 at 0x00933849 Thread: 0x0000155c, Process 0x000027cc
- CONTEXT -
EIP: 0x771c9f45, EFL: 0x00010216, FS: 0x00000053, GS: 0x0000002b
CS: 0x00000023, DS: 0x0000002b, SS: 0x0000002b, ES: 0x0000002b
EAX: 0xffffffff, EBX: 0x008fa240, ECX: 0x0098fff4, EDX: 0x166b68d4
ESI: 0x0098fff0, EDI: 0x0098fff4, EBP: 0x0018ffec, ESP: 0x0018ffdc
- CALL STACK -
sslcshar +0x53849 = CCFAtomics::Decrement() +0x9
sslcshar +0x19108 = SSstring::operator=() +0x28
sslcsrvr +0x5f66 = CrashDiagMgr::GetLogFileByCompName() +0xa6
ssmsgbrd +0x3746 = LessComparison::operator()() +0x1fd6
ssmsgbrd +0x747c = LessComparison::operator()() +0x5d0c
ssmsgbrd +0x78d3 = LessComparison::operator()() +0x6163
sscfom +0x18cac = CSSService::InvokeMethod() +0x24c
sssabsvm +0x3438 = CompMain() +0x378
siebprocmw +0x1d37
siebprocmw +0x4bc9 = SmiBeginTrace() +0x16d9
kernel32 +0x1336a = BaseThreadInitThunk() +0x12
ntdll +0x39f72 = RtlInitializeExceptionChain() +0x63
ntdll +0x39f45 = RtlInitializeExceptionChain() +0x36
- STACK DUMP -
0018ffdc: 86 4a 40 00 00 e0 fd 7e ff ff ff ff 28 74 25 77 .J@....~....(t%w
0018ffec: 00 00 00 00 00 00 00 00 86 4a 40 00 00 e0 fd 7e .........J@....~
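Exception 0xc0000005 is a Windows access violation, and the call stack is the part of crash.txt worth comparing across several crashes. A minimal sketch that turns the CALL STACK section into structured frames is shown below. It assumes each frame sits on its own line between the `- CALL STACK -` and `- STACK DUMP -` markers, in the `module +0xOFFSET [= symbol +0xNN]` shape seen here; the `Frame` type and function names are illustrative, not part of any Siebel tooling.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    module: str            # module name, e.g. "sslcshar"
    offset: int            # offset into the module
    symbol: Optional[str]  # resolved function name, or None if not resolved


# "sslcshar +0x53849 = CCFAtomics::Decrement() +0x9"; the "= symbol +0xNN"
# part is optional (e.g. "siebprocmw +0x1d37" has no symbol).
FRAME_RE = re.compile(r"^(\w+) \+0x([0-9a-fA-F]+)(?: = (.+?) \+0x[0-9a-fA-F]+)?$")


def parse_call_stack(crash_text: str) -> list[Frame]:
    """Extract frames between the '- CALL STACK -' and '- STACK DUMP -' markers."""
    if "- CALL STACK -" not in crash_text:
        return []
    section = crash_text.split("- CALL STACK -", 1)[1].split("- STACK DUMP -", 1)[0]
    frames = []
    for line in section.strip().splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            module, offset, symbol = match.groups()
            frames.append(Frame(module, int(offset, 16), symbol))
    return frames
```

Comparing, say, the first few `symbol` values from each crash file makes it quick to tell whether different crashes share the same top of stack.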
The custom view now came into sharp focus. We checked the changes that had been going on in the server:
- Changes to that view, screen, associated applets and BCs
- SRF updates in other parts of the application
- LOV updates, Symbolic String updates
- Runtime event updates
- Docking rule changes
There is no trusting any development team, but as it happens we have the best team possible on the job. Even so, this consistent crash in just one Siebel view was turning out to be a hard nut to crack.
- SRF, repository, and runtime updates were reverted to previously working versions, to no avail
- Restoring LOVs from backups seemed to solve the problem for a while, but not for good. We also had a lot of MLOV conversions, which multiplied the permutations we had to try
After spending more than 6 hours without any results, we came back to our favorite topic: Oracle Siebel Patchset 6, installed two days earlier, had to go. After an excruciating wait for the patch to uninstall, the problem resolved itself.
Stumped? Yes, that’s the word.
The next cycle of “create SR”, “wait”, and “wait some more” started for the newly identified issue, while development continued on Patchset 5.
After a week, a kind soul at Oracle noticed that we were not lying and were indeed on Patchset 6, and that Patchset 6 caused random crashes in the application. The workaround provided was to set a (hitherto unheard-of) system preference, “Defer Object Initialization”, to TRUE.
We are now back on Patchset 6 with the system preference set, and the wait begins for more excitement from the next patchset.
No matter how many times we see problems with patches, I continue to be surprised that we trust a patch more than our own code. A simple uninstall would have saved those 6 hours of debugging. But then, the log files and FDRs were cryptic about the errors like never before.
But why on earth would Patchset 6 dislike just one view out of the hundreds we have? That’s a good question for Sherlock Holmes, not for mortals.
Refer to this Oracle Support document for more details.