Subject: Linux-Development Digest #942
From: Digestifier <Linux-Development-Request@senator-bedfellow.MIT.EDU>
To: Linux-Development@senator-bedfellow.MIT.EDU
Reply-To: Linux-Development@senator-bedfellow.MIT.EDU
Date:     Sat, 23 Jul 94 22:13:04 EDT

Linux-Development Digest #942, Volume #1         Sat, 23 Jul 94 22:13:04 EDT

Contents:
  Re: Anyone else lust for threads? (Rob Janssen)
  Re: GOTO haters ..Re: Linux Performance Enhance ? (Erik Fortune)
  PCI ethernet cards supported? (CesareMancini(KB2NOW))
  Found "talk" bug, fixing. (Doug DeJulio)
  Re: IN2000, 1540cf support?? (Jerod Tufte)
  Re: Accessing PCI configuration spaces (Drew Eckhardt)
  Re: Linux Performance Enhance ? (Erik Fortune)

----------------------------------------------------------------------------

From: rob@pe1chl.ampr.org (Rob Janssen)
Subject: Re: Anyone else lust for threads?
Reply-To: pe1chl@rabo.nl
Date: Sat, 23 Jul 1994 21:35:23 GMT

In <30r1ep$kkv@urmel.informatik.rwth-aachen.de> dak@hathi.informatik.rwth-aachen.de (David Kastrup) writes:

>ddt@idcube.idsoftware.com (David Taylor) writes:

>>I may have missed something in the FAQ's, but assuming that there
>>is no thread support in Linux, are there plans to add or document
>>it?  I'm saddened by the fact that our next generation sound code
>>can be developed under NEXTSTEP, Irix, Solaris, etc., but apparently
>>not Linux because of a lack of threads.  Am I wrong?  *hope, hope*

>As far as I remember, the kernel supports threads. That is, the kernel
>trap used for fork has arguments telling it whether to duplicate
>data, environment or similar or not.

>There is currently no C library function to my knowledge making use of
>that. This would mean you'd have to write an Assembler template for
>GNU CC in order to use that kernel functionality.

>Search old posts for information.

It seems like it is time to re-post the clone document again :-)
Maybe this time someone picks up the challenge and uses the info to
really implement something?

Rob


~ Newsgroups: comp.os.linux.development
~ Path: pe1chl!rnzll3!sun4nl!EU.net!howland.reston.ans.net!xlink.net!sbusol.rz.uni-sb.de!coli.uni-sb.de!news.coli.uni-sb.de!uhf.saar.de!midget.saar.de!wg.saar.de!bof
~ From: bof@wg.saar.de (Patrick Schaaf)
~ Subject: Re: Looking for Docs on "clone" system call ?
~ References: <1994Jan3.004554.7942@aio.jsc.nasa.gov>
~ Organization: Yoyodyne Posting Systems, Bellona
~ Date: Mon, 3 Jan 1994 16:09:50 GMT
~ Message-ID: <CJ2A8E.p4C@wg.saar.de>
~ Lines: 230

sohail@trixie (Sohail M. Parekh) asks:
>I was recently told about the "clone" system call but I have been unable to
>find any documentation on it. Could someone point me in the right direction.

I have no idea whether this is in the kernel hackers guide somewhere; if
not, and the following makes any sense, feel free to include it after
adjusting my english appropriately.

Since the question comes up from time to time, I decided to do a bit of
kernel reading.  The following might be a useful tour of the matter.
This is from reading the source only, so I might be totally wrong...

The kernel I base this on is pl14i, but I don't think the area was busy
recently.

Any comments? Linus?

Enjoy
  Patrick


clone() - a slightly bent fork()
====================================

clone() has its own entry in the syscall tables, but runs sys_fork().
This is found in kernel/fork.c

The syscall expects the following parameters:
  %eax          __NR_clone      (the syscall number)
  %ebx          stack pointer   (where the child stack should be)
  %ecx          clone flags

OR some of the following constants together to build the flags argument:

  COPYVM
    page tables will be copied to the new process, separating the address
    spaces of parent and child.  This is forced, for obvious reasons, when
    the stack pointer for the child is the same as for the parent, and it is
    the default for the real fork().

    If COPYVM is not set (this would be normal for a thread implementation),
    mm/memory.c:clone_page_tables() sets up sharing the page directory
    between parent and child.  This should mean that any changes in memory
    mapping in the child or parent will affect the other process, i.e. mmap()
    might work.  I wonder who uses those vm_area thingies.  It looks like
    ->stk_vma is not ok for the clone (a copy of the parents area).  The clone
    has to ensure there is memory at its own stack pointer, probably
    using mmap() for anonymous memory, but it will still get the self-growing
    stack behaviour in the original stk_vma.  I don't want to think about
    what this means for consistency.  When allocating the stack, remember
    those thingies grow towards lower addresses.

    On process exit, if the page directory is still shared with someone,
    the reference is decremented (mm/memory.c:free_page_tables()); the
    tables are left intact until the last decrement.

    It even looks like you can have any of your clones or the parent
    use exec(); that uses mm/memory.c:clear_page_tables(), which does
    the right thing to the page directory.

  COPYFD
    usually, a fork() increments the reference count of the 'struct file'
    for all open file descriptors, parent and child thus share file offset
    and flags.  When COPYFD is given, the child receives duplicates of
    the 'struct file's sharing only the inode (as if they had been
    open()ed independently).

COPYVM and COPYFD come from <linux/sched.h>
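
Since a clone without COPYVM has to bring its own stack, and stacks grow
towards lower addresses, the value handed to the syscall should point at the
high end of the area.  A minimal sketch of that calculation follows;
child_stack_top() is a hypothetical helper, and rounding down to a 16-byte
boundary is an assumption borrowed from current ABIs, not something the
kernel source discussed here requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: initial stack pointer for a child whose stack
 * occupies [base, base + size).  The stack grows towards lower
 * addresses, so the child starts at the high end of the area.
 * The 16-byte rounding is an assumption, see the text above. */
static void *child_stack_top(void *base, size_t size)
{
    uintptr_t top = (uintptr_t)base + size;

    return (void *)(top & ~(uintptr_t)15);
}
```

A push decrements the stack pointer before storing, so starting exactly at
the (aligned) end of the area does not write past it.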

You additionally OR a signal number into the flags, which will be sent to the
parent on child exit instead of the usual SIGCHLD.
If I have the parent die before the clones are dead, this might be a way for
normal user processes to confuse the heck out of init.  Whoops, this is
handled in kernel/exit.c:notify_parent().  It is reassuring to find all
nasty things you can think of already handled properly :-)
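
Building the flags argument can be sketched like this.  The SKETCH_*
constants are stand-ins with assumed values, modelled on what I believe
<linux/sched.h> defined at the time; check your own headers instead of
trusting them.  make_clone_flags() and clone_exit_signal() are hypothetical
helpers.

```c
#include <signal.h>

/* Assumed values, not taken from the post: verify against your kernel. */
#define SKETCH_CSIGNAL 0x000000ffUL   /* low byte carries the exit signal */
#define SKETCH_COPYVM  0x00000100UL   /* copy page tables, like fork()    */
#define SKETCH_COPYFD  0x00000200UL   /* duplicate 'struct file's         */

/* OR the exit signal and the requested copy flags into one word. */
static unsigned long make_clone_flags(int copy_vm, int copy_fd,
                                      int exit_signal)
{
    unsigned long flags = (unsigned long)exit_signal & SKETCH_CSIGNAL;

    if (copy_vm)
        flags |= SKETCH_COPYVM;
    if (copy_fd)
        flags |= SKETCH_COPYFD;
    return flags;
}

/* Recover the exit signal from a flags word. */
static int clone_exit_signal(unsigned long flags)
{
    return (int)(flags & SKETCH_CSIGNAL);
}
```

With these assumptions, make_clone_flags(0, 0, SIGUSR1) yields a flags word
that is just SIGUSR1: shared address space, shared file structures, SIGUSR1
on exit.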

Besides the sharing mentioned, the clones are independent processes.
When one of them opens a file, nobody else sees it.  When one of them closes
a file, the others keep it open.  Maybe you can pass newly opened files via
Unix domain sockets or SysV message queues - is this implemented?

Also not shared is the SysV IPC stuff, and a lot of Posix process
attributes (pids, pgrps, sessions, controlling ttys, whatever).
This especially means that all clones are scheduled independently,
can run with varying persona, and block without interference.


Related to clones is kernel/exit.c:sys_wait4().  You can OR __WCLONE
into the options parameter to modify wait4() behaviour as follows:

  exit signal SIGCHLD for the child (normal fork() case)
    __WCLONE given:
      does not wait on that child
    __WCLONE not given:
      waits
  exit signal not SIGCHLD (child was created by clone())
    __WCLONE given:
      waits
    __WCLONE not given:
      does not wait on that child

Looks like if you want to manage clone() termination, you should choose
an exit signal other than SIGCHLD, and use wait4(..,..,__WCLONE,..) to
wait for the clone()s only, and wait4(..,..,0,..) to wait for normally
fork()ed processes only.
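
The four cases above collapse into a single comparison.  The following is a
sketch of that rule only, not the kernel's code; wait4_considers() is a
hypothetical name, and SKETCH_WCLONE stands in for __WCLONE with an assumed
value.

```c
#include <signal.h>

#define SKETCH_WCLONE 0x80000000U   /* assumed value of __WCLONE */

/* Return 1 if wait4() with these options would wait on a child whose
 * exit signal is exit_signal, 0 if it would skip that child. */
static int wait4_considers(int exit_signal, unsigned int options)
{
    int is_clone    = (exit_signal != SIGCHLD);
    int wants_clone = ((options & SKETCH_WCLONE) != 0);

    return is_clone == wants_clone;
}
```

So a child is waited on exactly when "created with a non-SIGCHLD exit
signal" and "__WCLONE given" agree.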

__WCLONE comes from <linux/wait.h>


clone()'d processes are rather heavyweight for threads, and might not be
appropriate for the threading you want to do.  I do not want to say they
are useless, though; maybe a two-level approach to threading (I hear
that Sun does it) works for all cases (implement user-level threading
within each clone).


Whoever managed to follow me up to here should be rewarded; here is a
working example of using clone():

#include <errno.h>
#include <stdio.h>
#include <signal.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/unistd.h>

/* NOTE: the clone() starts up on a new stack. Thus, we cannot return
 * from this function in the clone (we would have to copy stack frames
 * for main and the call). Instead, we call the given function, and
 * call _exit() with whatever it returns.
 */

#define STR(x) #x
#define DEREF_STR(x) STR(x)

int do_clone(unsigned long esp, unsigned long flags, int (*func)(void))
{
  __asm__ (
           /* make the syscall */
           "movl %2, %%edx\n\t"
           "movl %0, %%ebx\n\t"
           "movl %1, %%ecx\n\t"
           "movl $" DEREF_STR(__NR_clone) ", %%eax\n\t"
           "int $0x80\n\t"
           /* error? */
           "jnc 1f\n\t"
           "movl %%eax, _errno\n\t"
           "movl $-1, %%eax\n\t"
           "jmp 3f\n\t"
           "1:\n\t"
           "testl %%eax, %%eax\n\t"
           "jne 3f\n\t"
           /* the clone */
           "call *%%edx\n\t"
           "pushl %%eax\n\t"
           "call _exit\n\t"
           /* not reached */
           "1:\n\t"
           "jmp 1b\n\t"
           /* the parent */
           "3:\n\t"
           "ret\n\t"
           : /* outputs */
           : /* inputs */  /* %0 */ "m" (esp),
                           /* %1 */ "m" (flags),
                           /* %2 */ "a" (func)
  );
  errno = EINVAL;
  return -1;
}

int clone_pid = -1;
int do_terminate = 0;

/* we use sigusr1() for child termination signalling */
void sigusr1(int sig)
{ unsigned long status;
  int pid;

  printf("parent: got SIGUSR1, waiting for children... clone_pid=%d\n",
         clone_pid);
  pid = wait4(clone_pid, &status, __WCLONE, (struct rusage *)0);
  if (pid < 0) {
    perror("wait4");
    return;
  }
  printf("parent: wait4 returned %d\n", pid);
  if (clone_pid == pid)
    do_terminate = 1;
  signal(SIGUSR1, sigusr1);
  return;
}

char clone_stack[4*4096];

int clone_function(void)
{
  clone_pid = getpid();
  fprintf(stderr, "clone running, pid = %d\n", clone_pid);
  sleep(5);
  fprintf(stderr, "clone terminating\n");
  return 0;
}

int main(int argc, char **argv)
{ int pid;

  printf("parent pid = %d\n", getpid());

  signal(SIGUSR1, sigusr1);

  pid = do_clone((unsigned long)(clone_stack+sizeof(clone_stack)-1),
                 SIGUSR1,
                 clone_function);

  if (pid < 0)
    perror("clone");
  else if (pid == 0)
    fprintf(stderr, "funny, clone() returned pid=0 in parent\n");
  else {
    printf("parent: clone running, pid = %d. waiting for termination.\n", pid);
    while (!do_terminate) {
      printf("parent: child did not signal termination, yet.\n");
      sleep(1);
    }
    printf("parent: looks like our kid is gone. BTW, clone_pid = %d\n", clone_pid);
  }
  return 0;
}


-- 
=========================================================================
| Rob Janssen                | AMPRnet:   rob@pe1chl.ampr.org           |
| e-mail: pe1chl@rabo.nl     | AX.25 BBS: PE1CHL@PI8UTR.#UTR.NLD.EU     |
=========================================================================

------------------------------

From: erik@westworld.esd.sgi.com (Erik Fortune)
Subject: Re: GOTO haters ..Re: Linux Performance Enhance ?
Date: 21 Jul 1994 20:09:25 GMT


In article <cairnss.774748222@ucsu.Colorado.EDU>, cairnss@ucsu.Colorado.EDU writes:
> Maybe it's the FORTRAN in me but I prefer 
>       start:
>       if (cond)
>          {
>            stat;
>            goto start;
>          }
> to the "top-down" invention 
>       while (cond) {
>         stat;
>       }
> 
> IF I was a compiler I would have much less difficulty
> generating efficient code from the first example.
You clearly are not a compiler:-).  

Seriously, a decent compiler can do a *lot* more with the second 
statement than with the first.   In fact, any compiler that 
generates worse code for your while loop than for your goto
should probably be removed from your hard drive:-).
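
To make the comparison concrete, here are the two forms from the quoted
text filled in with a trivial body.  The function names and the summing
body are my own illustration; the point is only that both describe the
same computation.

```c
/* Sum 0..n-1 with an explicit label and goto. */
static unsigned sum_goto(unsigned n)
{
    unsigned i = 0, total = 0;

start:
    if (i < n) {
        total += i;
        i++;
        goto start;
    }
    return total;
}

/* The same loop written as a while statement. */
static unsigned sum_while(unsigned n)
{
    unsigned i = 0, total = 0;

    while (i < n) {
        total += i;
        i++;
    }
    return total;
}
```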

Modern compilers are *very* sophisticated and they can take into 
account subtleties of processor characteristics that would make your 
head spin in very short order.

For example, if performance is important (as opposed to the size of 
the generated code), a compiler can unroll the loop as much as is 
appropriate.  For multiple-issue processors, this can be a *big* win 
(close to an <n>-fold win, where <n> is the number of instructions the 
processor can issue at once).   
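
A hand-written sketch of what unrolling by four looks like at the source
level.  This only illustrates the transformation; I make no claim that any
particular compiler emits exactly this, and the function names are made up.

```c
#include <stddef.h>

/* Straightforward loop: one addition per iteration. */
static long sum_simple(const int *a, size_t n)
{
    long s = 0;

    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by four, with independent accumulators so that the four
 * additions in the body do not depend on each other. */
static long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

The independent accumulators are what give a multiple-issue processor
something to overlap; the loop overhead is also paid once per four elements.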

Loop unrolling is an incredibly simple example. Most processors place 
restrictions on the kinds of instructions that can be executed 
simultaneously; a good compiler can reorder code into completely
non-intuitive (but functionally identical) chunks to maximize
the amount of overlapped execution.

> Just because performance doesn't mean anything in our MS-DOS
> laden world, doesn't mean the GOTO is inappropriate or
> not useful.   If the CPU can use them, I can use them.
You can use gotos, but that would (usually) be wrong.
They don't even help with performance.

Using a while loop says "execute this block of code a number 
of times" very clearly;  using gotos says, um, I'm not sure
what it says.

When writing programs, you should say what you mean at as high a 
level as possible.  You'll do the compiler and the people who
are faced with your code later on a *big* favor.

-- Erik

------------------------------

From: mancic@rembrandt.its.rpi.edu (CesareMancini(KB2NOW))
Subject: PCI ethernet cards supported?
Date: 23 Jul 1994 22:38:05 GMT

Are there any PCI e-net cards that are supported (in any way)?
The ACM is putting together a Linux machine, and we're planning on getting 
PCI with the NCR SCSI.  Any help is appreciated.

-- 
Cesare Mancini  |  mancic@rpi.edu   | Linux, the choice of a GNU generation.
Rensselaer Polytechnic Institute    | You can't spell geek without EE.
Amateur Radio KB2NOW                | "I'm pink, therefore I'm Spam"

------------------------------

From: ddj@zardoz.elbows.org (Doug DeJulio)
Subject: Found "talk" bug, fixing.
Date: 23 Jul 1994 19:55:47 GMT


As some of you may know, the "talk" command distributed with Slackware
2.0 fails for some of the people some of the time.  On my system, it
works for local talk requests and hangs forever for remote talk
requests.

Someone recently mentioned on IRC that they thought it might be fixed
by the "talk.FvK" patch from sunacm.  So, I found that patch and
figured out what it's supposed to do.

It turns out that the patch is already *in* the Slackware talk (at
least in the source area).  BUT, the patch also doesn't do what it's
supposed to.

The original talk got the local address by doing a gethostbyname on
the hostname.  This does not work for people with multiple interfaces
with different addresses -- talk would only work over the interface
that matched the address of your hostname.  The "talk.FvK" patch makes
a quick connect to the remote host and tries to grab the address of
the interface the connection was made through directly from the
socket, via getsockname().

I traced through the get_addrs() call in gdb.  On my system, when it
pulls the address out via getsockname(), it gets 127.0.0.1.  I set the
my_machine_addr variable by hand from within gdb to my sl0 address,
and told the program to continue -- and it all worked perfectly.

So, the "talk.FvK" patch has to be updated.  Given a remote address,
what's the quickest way to get the address of the local interface that
traffic to that address goes through?  I know I can do it by leafing
through /proc/net/route, but is there some more elegant or portable
way?  I'll start implementing the /proc/net/route method tonight, in
any event.
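
For reference, the connect-then-getsockname() idea can be sketched as
below.  This is not the talk.FvK code: local_addr_for() is a hypothetical
helper, and using a UDP socket and port 9 are my assumptions.  Whether it
returns the right interface address depends on the kernel; on the system
described above it evidently yields 127.0.0.1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: store in *local the source address the kernel
 * picks for traffic to remote.  Returns 0 on success, -1 on error.
 * connect() on a UDP socket sends no packets. */
static int local_addr_for(struct in_addr remote, struct in_addr *local)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr = remote;
    sin.sin_port = htons(9);    /* arbitrary port, an assumption */

    if (connect(fd, (struct sockaddr *)&sin, sizeof sin) < 0 ||
        getsockname(fd, (struct sockaddr *)&sin, &len) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    *local = sin.sin_addr;
    return 0;
}
```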

-- 
Doug DeJulio
ddj@zardoz.elbows.org
http://www.pitt.edu/~ddj/

------------------------------

From: jet@b62528.student.cwru.edu (Jerod Tufte)
Subject: Re: IN2000, 1540cf support??
Date: 23 Jul 1994 22:49:41 GMT

Eric Youngdale (ericy@cais.cais.com) wrote:
:       Some people are of the opinion that the old slow release was more 
: unstable.  Any attempt to use the in2000 for swapping would crash the 
: system with the old driver.

:       I would really like to see someone take care of these remaining 
: problems.  Bill (who came up with the most recent versions) does not see 
: these problems, so it is difficult for him to debug the problem.  If 
: someone else would like a challenge, and would like to give this a shot,
: it would be a good project.  Since the old driver was more "stable" in 
: these circumstances, it may be possible to locate some critical difference.

hmm.  I've not seen any problems at all with swapping in the many months
since 0.3 came out.  I've run it hard, under all kinds of conditions
and never a crash.  This is with a BIOS version 1-03 (driver reports
hardware version 39) and a Maxtor 7345 (340meg) drive.  Is there any
chance that Bill or someone will release a driver updated for the latest
kernels?  I'm more than happy to provide any info I can to try to debug
this, or if someone would like to provide a pointer or two, I can
try to mess with it again myself.

Jerod
-- 
       WARNING:  In case of rapture, this computer will be manned.
Drink Jolt!, All the sugar and twice the caffeine.   PopUlating the World
Check out BruceNet at http://b62528.student.cwru.edu/   "Groovy!" --Ash
****jet@b62528.student.cwru.edu  <<< finger me for PGP2.5 public key*****

------------------------------

From: drew@frisbee.cs.Colorado.EDU (Drew Eckhardt)
Subject: Re: Accessing PCI configuration spaces
Date: 21 Jul 1994 20:27:50 GMT

In article <1994Jul18.203105.1370@dagoba.escape.de>,
Frank Strauss <strauss@dagoba.escape.de> wrote:
>Firstly, I'd like to thank Drew Eckhardt for his work on PCI BIOS
>support and the NCR SCSI family driver. I read his article in the german
>iX magazine, and I got the idea to implement a /proc/pci entry for
>reading PCI configuration spaces of available PCI devices. This leads
>to the problem to access the information about the installed device
>before accessing their individual information. In detail:
>
>  How may I find Vendor IDs and Device IDs for the installed devices?

You read the 16 bit configuration registers at offset 0 and 
2 respectively in configuration space for a given bus, device,
and function.

Of course, this assumes that you already know what bus/device/functions
are used for devices.  

Otherwise, you can try to read configuration registers for function 0 
at all possible device addresses, and check the return code for a bad 
device (all the relevant defines are in include/linux/pci.h of a kernel 
patched with the NCR driver).

If bit 7 of the header type register is 1, then it is a multi-function
device and you should check the other function numbers as well.
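
As a sketch, here is how those fields come out of a raw copy of one
function's configuration space.  The SKETCH_* names and helper functions
are hypothetical (the real defines are in include/linux/pci.h); the header
type offset 0x0e and the "vendor ID 0xffff means nothing is there"
convention come from the PCI specification, not from the description above.

```c
#include <stdint.h>

#define SKETCH_CFG_VENDOR_ID   0x00   /* 16 bit, per the text above */
#define SKETCH_CFG_DEVICE_ID   0x02   /* 16 bit, per the text above */
#define SKETCH_CFG_HEADER_TYPE 0x0e   /* 8 bit, from the PCI spec   */

/* Read a little-endian 16 bit register from a config space image. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* A vendor ID of 0xffff means no device answered. */
static int cfg_present(const uint8_t *cfg)
{
    return cfg_read16(cfg, SKETCH_CFG_VENDOR_ID) != 0xffff;
}

/* Bit 7 of the header type marks a multi-function device. */
static int cfg_multifunction(const uint8_t *cfg)
{
    return (cfg[SKETCH_CFG_HEADER_TYPE] & 0x80) != 0;
}
```

A scan would call these for function 0 of each device address, and look at
functions 1 to 7 only when cfg_multifunction() says so.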

>  What is the pci_index for when calling pcibios_find_device?

It specifies which instance of a given PCI device you refer
to.  Ie, you may have multiple ethernets (DEC makes a busmastering
PCI ethernet controller with four PCI ethernet chips behind 
a PCI to PCI bridge) or multiple SCSI chips in a server.  An
index of 0 refers to the first one, 1 the second, etc.

>  What may pcibios_find_class_code be used for?

You can use it to find all devices of a particular class and 
programming interface.  Ie, a search for base class 3, 
sub class 0, programming interface 0 would let you find all
VGA compatible devices in the system.
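
Assuming the three fields are packed the way they sit in configuration
space (base class in the top byte, programming interface in the bottom
byte), the value to search for can be built like this.  pci_class_code()
is a hypothetical helper, not part of the PCI BIOS interface.

```c
#include <stdint.h>

/* Pack base class, sub class and programming interface into a
 * 24 bit class code.  The packing order is an assumption. */
static uint32_t pci_class_code(uint8_t base, uint8_t sub, uint8_t prog_if)
{
    return ((uint32_t)base << 16) | ((uint32_t)sub << 8) | prog_if;
}
```

Under that assumption the VGA example is pci_class_code(3, 0, 0), i.e.
0x030000.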

>  Is there a place where a list of Vendors and Devices is managed?

The PCI SIG maintains the vendor/device assignments, contact 
information is given in the various PCI sources in the NCR 
patches (try include/linux/pci.h).

-- 
Drew Eckhardt drew@Colorado.EDU
1970 Landcruiser FJ40 w/350 Chevy power
1982 Yamaha XV920J Virago

------------------------------

From: erik@westworld.esd.sgi.com (Erik Fortune)
Crossposted-To: comp.lang.c
Subject: Re: Linux Performance Enhance ?
Date: 22 Jul 1994 00:44:40 GMT


In article <774702537snz@genesis.demon.co.uk>, fred@genesis.demon.co.uk writes:
> I agree that C is one of the languages where it can be implemented but it
> isn't the only one (clearly it can be implemented in machine code so
> it isn't inherently a C language issue).

Um, if you find any features of C that *can't* be implemented in machine
code, I'm sure we'd all love to hear about it...

-- Erik

------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: Linux-Development-Request@NEWS-DIGESTS.MIT.EDU

You can send mail to the entire list (and comp.os.linux.development) via:

    Internet: Linux-Development@NEWS-DIGESTS.MIT.EDU

Linux may be obtained via one of these FTP sites:
    nic.funet.fi				pub/OS/Linux
    tsx-11.mit.edu				pub/linux
    sunsite.unc.edu				pub/Linux

End of Linux-Development Digest
******************************
