A slight non-running issue with 1.75.4

Posted: Fri Nov 04, 2011 11:21 am
by Spooky
Hi,

I've finally decided to have a bash at compiling 1.75.4 (trunk) on my FreeBSD workstation in preparation for the MNSR. After a few additional tweaks to the makefile, as detailed here - https://bb.oolite.space/viewtopic.ph ... 75#p146976 - it compiles without issue. However, once I start the application I'm presented with the splash screen, which disappears, and then nothing... the application just exits quietly without an error.

I've run the application through gdb and get the following -

(gdb) run
Starting program: /home/spooky/oolite-build/trunk/oolite.app/oolite.dbg
[New LWP 101048]
[New Thread 806e041c0 (LWP 101048)]
[New Thread 806fb9380 (LWP 100462)]
[New Thread 806e0ae40 (LWP 100938)]
[New Thread 806fbfac0 (LWP 100944)]
[New Thread 806fbf900 (LWP 101013)]
[New Thread 806fbee80 (LWP 101014)]
[New Thread 806fbeb00 (LWP 101016)]
[New Thread 806fbe080 (LWP 101020)]
[New Thread 806fbd0c0 (LWP 101041)]
[New Thread 806fbe940 (LWP 101052)]
[New Thread 806fbcf00 (LWP 101053)]
[New Thread 806fbecc0 (LWP 101055)]
[New Thread 806fbcd40 (LWP 101056)]
[New Thread 806fbcb80 (LWP 101058)]
[Thread 806fb9380 (LWP 100462) exited]

Program exited with code 01.

I've run it through multiple times and on different machines and it always seems to terminate with the 2nd spawned thread exiting. I still have the 1.75.3 code and that builds and executes without issue.

With no obvious error and no core file to examine, I set a breakpoint at main and single-stepped through the code until I saw the thread in question being spawned.

(gdb) s
OOLogOutputHandlerInit () at src/Core/OOLogOutputHandler.m:127
127 sInited = YES;
(gdb) s
129 if (sLogger != nil)
(gdb) s
131 sWriteToStderr = [[NSUserDefaults standardUserDefaults] boolForKey:@"logging-echo-to-stderr"];
(gdb) s
150 NSRecursiveLock *lock = GSLogLock();
(gdb) s
0x0000000802735760 in pthread_rwlock_trywrlock () from /lib/libthr.so.3
(gdb) info threads
3 Thread 806fb9380 (LWP 100883) 0x000000080273b3cc in __error ()
from /lib/libthr.so.3
* 2 Thread 806e041c0 (LWP 100754) 0x0000000802735760 in pthread_rwlock_trywrlock () from /lib/libthr.so.3
(gdb) thread 3
[Switching to thread 3 (Thread 806fb9380 (LWP 100883))]#0 0x000000080273b3cc in __error () from /lib/libthr.so.3
(gdb) s
Single stepping until exit from function __error,
which has no line number information.
0x0000000802735762 in pthread_rwlock_trywrlock () from /lib/libthr.so.3
(gdb) s
Single stepping until exit from function pthread_rwlock_trywrlock,
which has no line number information.
0x0000000802734f80 in raise () from /lib/libthr.so.3
(gdb) s
Single stepping until exit from function raise,
which has no line number information.
0x0000000802739a80 in pthread_setcancelstate () from /lib/libthr.so.3
(gdb) s
Single stepping until exit from function pthread_setcancelstate,
which has no line number information.
0x0000000802734f9c in raise () from /lib/libthr.so.3
(gdb) s
Single stepping until exit from function raise,
which has no line number information.
0x0000000802735aaa in pthread_rwlock_trywrlock () from /lib/libthr.so.3
(gdb) s
Single stepping until exit from function pthread_rwlock_trywrlock,
which has no line number information.
[New Thread 806e0ae40 (LWP 100408)]
[New Thread 806e04e00 (LWP 101013)]
[New Thread 806e04c40 (LWP 101014)]
[New Thread 806fbfc80 (LWP 101016)]
[New Thread 806fbf900 (LWP 101020)]
[New Thread 806fbee80 (LWP 101041)]
[New Thread 806fbdec0 (LWP 101044)]
[New Thread 806fbf740 (LWP 101052)]
[New Thread 806fbdd00 (LWP 101053)]
[New Thread 806fbfac0 (LWP 101055)]
[New Thread 806fbdb40 (LWP 101056)]
[New Thread 806fbd980 (LWP 101058)]
[Thread 806fb9380 (LWP 100883) exited]

Program exited with code 01.

To my simplistic brain it looks like it's something to do with NSRecursiveLock *lock = GSLogLock(); I've compared OOLogOutputHandler.m and .h from 1.75.3 and 1.75.4 and they're identical. I then decided to find all the files in Core that reference NSRecursiveLock... and they're all identical too.

Does anybody have any other suggestions?

Re: A slight non-running issue with 1.75.4

Posted: Fri Nov 04, 2011 12:33 pm
by JazHaz
Have you tried launching with -nosplash as a parameter?

Re: A slight non-running issue with 1.75.4

Posted: Fri Nov 04, 2011 2:32 pm
by Spooky
JazHaz wrote:
Have you tried launching with -nosplash as a parameter?
Thanks for your reply, I have now. It opens a blank context where the splash screen usually goes and then does exactly the same as before.

Re: A slight non-running issue with 1.75.4

Posted: Sat Nov 05, 2011 4:45 pm
by JazHaz
Have you got OpenGL on that machine?

Re: A slight non-running issue with 1.75.4

Posted: Sun Nov 06, 2011 6:38 pm
by Spooky
JazHaz wrote:
Have you got OpenGL on that machine?
Once again I appreciate your assistance, but as I said in my initial post, it runs 1.75.3 just fine. The machine has 2 Quadro cards in it and is running the latest (285.09.05) FreeBSD 64-bit ports tree drivers.

GLX Information for pyro:0.0:
  direct rendering: Yes
  GLX extensions:
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
    GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
    GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
    GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
    GLX_ARB_create_context_robustness, GLX_ARB_multisample,
    GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB,
    GLX_ARB_get_proc_address

  server glx vendor string: NVIDIA Corporation
  server glx version string: 1.4
  server glx extensions:
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
    GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
    GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
    GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
    GLX_ARB_create_context_robustness, GLX_ARB_multisample,
    GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB

  client glx vendor string: NVIDIA Corporation
  client glx version string: 1.4

Re: A slight non-running issue with 1.75.4

Posted: Wed Jan 04, 2012 12:40 pm
by Spooky
It looks like I've managed to finally track down the problem.
WARNING your program is becoming multi-threaded, but you are using an ObjectiveC runtime library which does not have a thread-safe implementation of the +initialize method. This means that any classes not already used may be incorrectly initialised, potentially causing strange behaviors and crashes.
To put this into context, the runtime bug has been knoown for several years and only rarely causes problems ... the easy workaround being to ensure that any classes used by a new thread have already been used in the main thread before the new thread starts.
If you are worried, please build/run GNUstep with a runtime which supports the +initialize method. The GNUstep stable runtime (libobjc) and experimental runtime (libobjc2), available from the GNUstep website and subversion repository, should both work.
To disable this warning (eg. for an application which does not suffer any problems caused by this runtime bug), please set the GSSilenceInitializeWarning user default to YES.
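For anyone wondering what the "use it in the main thread first" workaround from that message amounts to, here is a minimal sketch in plain C with pthreads rather than Objective-C. It is only an analogy: `get_squares`, `worker` and `run_with_warm_up` are made-up names, and the unlocked lazy initialiser stands in for a class whose +initialize is not thread-safe.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

static int squares[4];
static int squares_ready = 0;
static int init_calls = 0;   /* how many times the initialiser body ran */

/* Lazy initialiser with no locking (hypothetical stand-in for an unsafe
 * +initialize): only safe if its first call happens before threads exist. */
static const int *get_squares(void)
{
    if (!squares_ready) {
        init_calls++;
        for (int i = 0; i < 4; i++)
            squares[i] = i * i;
        squares_ready = 1;
    }
    return squares;
}

static void *worker(void *arg)
{
    int *out = arg;
    *out = get_squares()[3];   /* read-only use after the warm-up */
    return NULL;
}

/* Returns how many times the initialiser ran, or -1 for a bad argument. */
int run_with_warm_up(int nworkers)
{
    pthread_t threads[MAX_WORKERS];
    int results[MAX_WORKERS];

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    /* The workaround: touch the shared state on the calling thread
     * before any worker thread is created. */
    (void)get_squares();

    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &results[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        assert(results[i] == 9);
    }
    return init_calls;
}
```

Calling `run_with_warm_up(8)` from a `main` of your own (built with something like `cc -pthread`) should report that the initialiser ran exactly once, because every worker finds the state already set up.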
Long and short, Oolite's multithreading changed after 1.75.3 and the FreeBSD ports build of GNUstep is at best outdated or at worst broken. I'll try to contact the maintainer; however, this means an update to the official ports build of Oolite is extremely unlikely.

:(

Re: A slight non-running issue with 1.75.4

Posted: Wed Jan 04, 2012 3:13 pm
by JensAyton
FreeBSD currently uses GCC 4.2.1 from 2007, because of a policy decision not to use GPLv3 software. The Objective-C runtime library is provided with the compiler, which is why you have an old runtime and this warning message.

My understanding is that FreeBSD will soon be switching to Clang as the default compiler. With this change, the new GNUstep runtime (libobjc2) will presumably be the default for GNUstep under FreeBSD, and the +initialize problem will go away. I don’t know what the time frame for this is. It is possible to build with Clang and libobjc2 now, but I don’t know the details and don’t know if a port can be built that way.

All that said, it’s not immediately obvious that the crash you described above is actually due to the +initialize issue, which would most likely cause deadlocks or double initializations.
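
To illustrate the guarantee a thread-safe +initialize is meant to provide (run once, no matter how many threads arrive at the same time), here is a rough C sketch using `pthread_once`. It is an analogy only, not how either Objective-C runtime is implemented, and `init_config`, `use_config` and `count_init_runs` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

static pthread_once_t config_once = PTHREAD_ONCE_INIT;
static int config_value = 0;
static int config_init_runs = 0;   /* how many times init_config ran */

/* One-time setup; pthread_once ensures it runs at most once. */
static void init_config(void)
{
    config_init_runs++;
    config_value = 42;
}

static void *use_config(void *arg)
{
    int *out = arg;
    /* Every thread asks for initialisation; only the first call runs it,
     * and the others wait until it has finished. */
    int rc = pthread_once(&config_once, init_config);
    assert(rc == 0);
    *out = config_value;
    return NULL;
}

/* Returns how many times init_config ran, or -1 for a bad argument. */
int count_init_runs(int nthreads)
{
    pthread_t threads[MAX_THREADS];
    int seen[MAX_THREADS];

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, use_config, &seen[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        assert(seen[i] == 42);
    }
    return config_init_runs;
}
```

Without the once-guard, two threads could both enter the setup (double initialisation) or one could read the state half-built, which is the kind of failure the warning is describing.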

Re: A slight non-running issue with 1.75.4

Posted: Wed Jan 04, 2012 3:24 pm
by Micha
That same error message is harmlessly (AFAICT) displayed on various Linux boxes as well. That's not to say it's harmless in your case though!

Re: A slight non-running issue with 1.75.4

Posted: Wed Jan 04, 2012 4:51 pm
by Spooky
Ahruman wrote:
FreeBSD currently uses GCC 4.2.1 from 2007, because of a policy decision not to use GPLv3 software. The Objective-C runtime library is provided with the compiler, which is why you have an old runtime and this warning message.
Indeed; however, I had to "adjust the port" to build the development release of the GNUstep port just to get those warnings to fire correctly. Hence my statement "[the] FreeBSD ports build of GNUstep is at best outdated or at worst broken". If the problem were caused solely by the bundled libobjc being obsolete, my changing of the GNUstep version would make no odds.
Ahruman wrote:
My understanding is that FreeBSD will soon be switching to Clang as the default compiler. With this change, the new GNUstep runtime (libobjc2) will presumably be the default for GNUstep under FreeBSD, and the +initialize problem will go away. I don’t know what the time frame for this is. It is possible to build with Clang and libobjc2 now, but I don’t know the details and don’t know if a port can be built that way.
I have a 9.0-PRERELEASE machine which is self-hosting on Clang and, I assume, has libobjc2 (I will check later). Currently, however, the build on that machine fails when linking the bundled libjs_static.a because a hidden symbol is not defined. I'll try and get that sorted before starting a conversation about Clang and GNUstep ;)

The key points of my posts are that 1.75.3 compiles and runs fine with GCC 4.2.1 and the standard Ports version of gnustep-base 1.19.3. Something was changed in 1.75.4 which adversely affects multi-threading on the older version of GNUstep. 1.75.3, 1.75.4 and 1.76 all compile and run fine with GCC 4.2.1 when you have the development port of gnustep-base 1.22.0 installed.

If you could give me some pointers on what may have changed to cause this incompatibility, that would be grand; however, I'd rather FreeBSD's GNUstep port was more current and more verbose with warnings.

Thanks,

Re: A slight non-running issue with 1.75.4

Posted: Wed Jan 04, 2012 5:35 pm
by JensAyton
Spooky wrote:
If you could give me some pointers on what may have changed to cause this incompatibility that would be grand,
I really have no idea. :-/

Re: A slight non-running issue with 1.75.4

Posted: Wed Jan 04, 2012 6:09 pm
by JensAyton
Possible workaround: in OOLogOutputHandler.m, comment out this part (from line 149):

NSRecursiveLock *lock = GSLogLock();
[lock lock];
_NSLog_printf_handler = OONSLogPrintfHandler;
[lock unlock];
This should stop the specific crash, and that code isn’t very important (it intercepts NSLog messages from within GNUstep and adds them to Oolite’s log). However, I wouldn’t be surprised if the same underlying problem causes a crash somewhere else.

Re: A slight non-running issue with 1.75.4

Posted: Thu Jan 05, 2012 9:46 am
by Spooky
Thanks for the reply,

I reverted to gnustep-base 1.19.3, made your suggested changes and recompiled. I'm afraid it behaves exactly as before.

I'll try and get a Clang build sorted today; in the long term that will be more desirable.