This demonstration illustrates that with Embedded Bit’s boot time reduction techniques it’s possible to dramatically reduce the cold boot time of an embedded Linux device. Here we’ve demonstrated how machine vision processing can be achieved in less than a second from a cold boot.
The demo consists of an embedded Linux device mounted on a tripod and connected to a camera, an HDMI display and a 7-segment display. The device uses the camera to count the number of yellow balls present on the table beneath and shows the count on the 7-segment display. The device also outputs the camera image and an illustration of the ball detection on the LCD.
When the big red button is pushed, the device is reset and, after a short cold Linux boot, the 7-segment display indicates the detected number of balls. A little while later, once the LCD monitor has woken up, the detection is illustrated on screen.
The Boot Time
It takes just 0.8 seconds from the first instructions being executed in the boot loader to having processed the first frame and displayed the first ball count. The application is started within 0.5 seconds of reset. The demonstration illustrates a boot time reduction of 95% from the original boot time of 15 seconds.
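As a quick check of the quoted figure, the arithmetic can be worked through in a few lines of Python. This is only a sketch using the two timings stated above; the variable names are ours.

```python
# Timings quoted in the text above.
original_boot_s = 15.0   # original cold boot time, in seconds
optimized_boot_s = 0.8   # reset to first ball count displayed, in seconds

# Fraction of the original boot time that was removed.
reduction = 1.0 - optimized_boot_s / original_boot_s
# How many times faster the optimized boot is.
speedup = original_boot_s / optimized_boot_s

print(f"reduction: {reduction:.1%}")  # prints "reduction: 94.7%"
print(f"speed-up:  {speedup:.2f}x")   # prints "speed-up:  18.75x"
```

The result, 94.7%, rounds to the 95% quoted above.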
The video may give the impression that the boot time is longer. There are several reasons for this:
- The Overo EarthSTORM is hard-wired to attempt to boot from the serial port and USB before it executes software instructions from the MMC card. These attempts increase the overall boot time.
- The big red button interrupts the power supply to the board. When power is reapplied, the hardware takes some time before taking the processor out of reset.
The Boot Time Reduction Process
As is common with boot time reduction projects – the process started once the product (our ball counting device) was completed.
The boot time reduction process we used was optimization-based. This means that instead of applying proprietary and costly intellectual property technologies such as advanced hibernation or replacement boot loaders, we took the existing code base and used our experience and expertise to detect and remove boot delays.
The process is iterative and involves using a variety of instrumentation features provided by the software itself and external hardware devices such as logic analyzers. We also use our own software that can analyse boot output for common causes of boot delays. All this instrumentation gives great clarity as to where the delays in the system arise. Once we are armed with this information we can modify the code base to work around the delays. In this demo some of the key techniques applied included:
- We removed features that weren’t required. The demo had a very specific purpose: booting from MMC, counting balls, and displaying the result on the 7-segment display and HDMI output. Everything else was removed. This reduced the size of the boot loader, kernel, filesystem and application images, so that transferring them from the MMC card took less time and the initialization time associated with the removed features was eliminated.
- To further reduce the amount of data that had to be transferred from the MMC card, we also used tools and techniques to reduce the binary size of the application, for example by stripping unused symbols and machine code from binaries.
- To improve the data transfer rate from the MMC card we used DMA and optimized code paths for quicker execution.
- We evaluated the factors which affect getting data into RAM, for example the use of compression, choice of filesystem, choice of storage on the MMC card, etc.
- We ensured that the hardware was being used to its full potential by ensuring caches were enabled and clock rates were set appropriately.
- There was no need for U-Boot, so we removed it.
- We made the application the init process to remove the need for user-space initialization scripts.
The result is optimized software that is targeted exclusively at a particular hardware platform to perform a limited set of required functionality.
Due to the simplicity of the application and its limited feature set, the entire process took only weeks of effort.
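To give a flavour of what analysing boot output can look like, here is a minimal Python sketch that finds the largest silent gaps between timestamped kernel log lines. It is not the software referred to above. It assumes the kernel prints its usual `[seconds.microseconds]` timestamps (CONFIG_PRINTK_TIME), and the function name and sample log are made up for illustration.

```python
import re

# Matches kernel log lines with printk timestamps, e.g. "[    0.512345] text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def find_boot_gaps(log_text, top=3):
    """Return the `top` largest gaps as (gap_seconds, message) tuples.

    Each gap is attributed to the message logged just before the silence,
    since that is usually the step that was still running.
    """
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    gaps = [
        (round(later[0] - earlier[0], 6), earlier[1])
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)[:top]


# Made-up sample log, for illustration only (not output from the demo).
SAMPLE_LOG = """\
[    0.000000] step A
[    0.010000] step B
[    0.410000] step C
[    0.420000] step D
[    0.620000] step E
"""

for gap, message in find_boot_gaps(SAMPLE_LOG, top=2):
    print(f"{gap:.3f}s after: {message}")
```

On the sample log this reports a 0.4 s gap after "step B" and a 0.2 s gap after "step D". In an iterative process like the one described, such a list points at where to look first.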
The Hardware Platform
The demo runs on a Gumstix Waysmall Silverlode computer. This consists of an Overo EarthSTORM Computer-On-Module (COM) and Tobi expansion board. This provides a single-core ARM Cortex-A8 implemented inside a TI Sitara AM3703 running at up to 1 GHz. It has 512 MB of RAM and NAND and has a range of peripherals including MMC/SD, HDMI, USB and Audio.
Connected to the Silverlode is a Gumstix Caspa CMOS camera (Aptina MT9V032), 7-segment display and HDMI display.
The big red button is a push-to-break switch which interrupts the power supply to the Silverlode – providing a convenient means to reset the board and see the cold boot time of the device.
The Software Platform
The following components are used:
- Codesourcery toolchain (arm-2010q1)
- SPL (U-Boot) (2012.04.01)
- Linux (2.6.34)
- Minimal Buildroot filesystem (2012.08)
- OpenCV (2.4.2) and custom application
The processor boots directly from the microSD card and all components are stored there.
There is no DSP on the AM3703 and so all image processing is performed on the ARM. The processing is very simple and performs the following steps:
- HSV colour space conversion
- Gaussian image smoothing
- Hue based thresholding
- Hough circle detection
- RGB colour space conversion
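The colour-based steps in the list above can be illustrated with a small pure-Python sketch. This is not the demo's OpenCV code: it uses only the standard `colorsys` module, covers just the HSV conversion and hue thresholding, and leaves out the Gaussian smoothing and Hough circle detection. The hue window and the saturation and value limits are assumptions chosen for illustration.

```python
import colorsys

# Assumed hue window for "yellow", in degrees; a real system would tune this.
YELLOW_HUE_MIN = 40.0
YELLOW_HUE_MAX = 70.0
MIN_SATURATION = 0.4  # reject washed-out, near-grey pixels
MIN_VALUE = 0.3       # reject very dark pixels


def is_yellow(rgb):
    """Convert one 8-bit RGB pixel to HSV and test it against the window."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360.0
    return (YELLOW_HUE_MIN <= hue_deg <= YELLOW_HUE_MAX
            and s >= MIN_SATURATION
            and v >= MIN_VALUE)


def hue_mask(image):
    """Return a binary mask (1 = yellow) for an image given as rows of RGB tuples."""
    return [[1 if is_yellow(pixel) else 0 for pixel in row] for row in image]


# Tiny 2x2 test image: pure yellow, blue, light grey, and a darker yellow.
image = [
    [(255, 255, 0), (0, 0, 255)],
    [(200, 200, 200), (230, 210, 20)],
]
mask = hue_mask(image)
print(mask)
```

In a pipeline like the one listed, a binary mask of this kind is what a circle detector would then search for ball-shaped regions.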
Can It Be Even Quicker?
Yes, we think so. We stopped as soon as we reached our sub-second target.