Latest News

Icluster2 has been permanently shut down!

(2008/06/10)

Q1: How do I set up MANPATH (correctly) if necessary?

Because the man command has its own internal list of directories to search for man pages, you have to set MANPATH so that it includes this internal list:

export MANPATH="`man -w`:new_directory1[:new_directory2] ..."

Q2: My jobs are killed randomly and I see strange messages such as kernel: Out of Memory: Killed process 27845 when I run the dmesg command

Our nodes have 3 GB of RAM and 2 GB of swap, for a total of 5 GB of available memory.
When a process requests so much memory that these 5 GB are exhausted, the machine becomes unusable. To prevent this, Linux uses a mechanism called the Out of Memory (OOM) killer. When the available free memory becomes too small, this mechanism is triggered inside the kernel and uses a heuristic to abruptly kill one process, thus freeing memory. This heuristic depends on the Linux kernel version, and I don't know the exact details for the 2.4.18-e.41smp kernel used on the icluster2, but:

Q3: Since we moved to RedHat AS 3.0, my threaded programs don't run or don't behave as previously.

Since we moved to RedHat AS 3.0, the thread library is now the Native POSIX Thread Library (NPTL). This changes the behavior of programs using threads. One simple example:

#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: print the PID as seen from inside the thread. */
void *f(void *args)
{
    (void)args;
    printf("Thread PID:%d\n", (int)getpid());
    pthread_exit(NULL);
}

int main(void)
{
    pthread_t mythread;

    printf("Main PID :%d\n", (int)getpid());
    pthread_create(&mythread, NULL, f, NULL);
    pthread_join(mythread, NULL);
    return 0;
}
This code behaves differently on ita101 (which still runs RedHat 2.1, without NPTL) and on the other nodes:
ita101:~>gcc -g -lpthread thtest.c
ita101:~>./a.out
Main PID :20834
Thread PID:20836
ita101:~>ldd a.out
        libpthread.so.0 => /lib/libpthread.so.0 (0x2000000000054000)
        libc.so.6.1 => /lib/libc.so.6.1 (0x20000000000c4000)
        /lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)

ita19:~>gcc -lpthread thtest.c
ita19:~>./a.out
Main PID :18032
Thread PID:18032
ita19:~>ldd a.out
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x2000000000040000)
        libc.so.6.1 => /lib/tls/libc.so.6.1 (0x2000000000070000)
        /lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)
In fact the new behavior is more correct, but it might expose unexpected bugs in your programs :-) (with NPTL, all threads of a process share the same PID, so pthread_self() must be used to tell threads apart). The old behavior of the pthread library can be restored by setting the LD_ASSUME_KERNEL environment variable to something like 2.4.1:
ita19:~>export LD_ASSUME_KERNEL=2.4.1
ita19:~>./a.out
Main PID :19134
Thread PID:19136
More info :

Conclusion: The i-cluster2 users must do their best and use good memory management practices in their programs :-)
