Introduction
The state of the art in microcontroller technology has evolved to a very high plateau in the short history of embedded systems. Still, further evolution is necessary because of driving factors such as higher performance, lower cost, greater integration, and lower power consumption. Although the above are key selection criteria, a less heralded but nonetheless important factor to consider is ease of software development.
In some cases, a particular microcontroller may be a perfect fit from a hardware standpoint, but software development problems can jeopardize both the development of a product and its longevity in the marketplace. [True story: an up-and-coming company in wireless communication based a key product line on a microcontroller that was uniquely suited to the application. Unfortunately, only one C compiler existed for it. As competitive pressures required the integration of more features, the code size began to creep up to the limit of the on-chip memory. Because the compiler vendor was unwilling to make the necessary improvements to the tool, the company faced two unpleasant choices: redesign around another part and port the code, or pay an independent software vendor for a major compiler upgrade. Fortunately for the company, the latter choice proved cost effective and timely. Nevertheless, this is a situation that a company would much rather avoid.]
Both the core architecture and the implementation of peripherals bear heavily on the success of a microcontroller architecture for a given application. This article focuses primarily on the core, which in this case is the Cortex M3 core based on the ARMv7-M architecture. Vendor-specific peripherals such as DMA implementations, timers and I/O are beyond the scope of this article.
Unlike products in the desktop/laptop/netbook PC market, embedded systems products vary radically in useful life expectancy. Cell phones have about the same life expectancy as a PC, or less, but an appliance, industrial control, aerospace or military system may be expected to be in service considerably longer. This phenomenon exerts differential pressures on microcontroller development. For many applications, an 8051, which had its genesis 30 years ago, is perfectly adequate, but a disk drive needs the latest, fastest, cheapest part obtainable. Consequently, embedded developers frequently consider older architectures for their applications and have to live with the attendant problems inherent in older microcontroller designs. The table below summarizes some of these issues and how the ARM Cortex architecture addresses them. This article focuses on how the ARM Cortex M3 core lends itself to satisfying more demanding applications while at the same time making the software development experience smoother and faster.
| Problem | Cortex Solution |
| Cost | |
| Disjointed Memory Space | |
| Power Consumption | |
| Software Implementation | |
| Safety Critical | |
| Capability & Efficiency | |
Table I. Problems with traditional microcontrollers and Cortex solutions
Cost
In the selection of a microcontroller for a design, several attributes are obviously desirable. First and foremost in the minds of many developers is cost. Designs are almost always done with moderate to severe cost constraints on the bill of materials (B.O.M.). While of paramount importance, cost considerations almost always go hand-in-hand with functionality. In other words, the buck is important, but bang for the buck is the real concern most of the time. The Cortex attacks this problem by reducing gate count. The fewer the gates, the smaller the die, the better the yield, and the lower the fabrication cost. Lower power consumption resulting from lower gate count is a nice side benefit as well. ARM estimates that efficiencies inherent to the Thumb II instruction set, which runs on all Cortex M3 controllers, reduce memory demand by about 26% for typical applications. Less memory needed means lower cost on the B.O.M. Most reviewers of Cortex agree that this architecture makes considerable progress towards optimizing the cost/benefit ratio in the microcontroller choice.
Memory Model
Veterans of 8 and 16-bit implementations recall, not with fondness, that word length imposes certain conditions on the way memory is addressed, especially in older 8-bit architectures. Although modern compiler toolchains ameliorate this problem greatly, it still imposes awkwardness and unnecessary complexity. 32-bit architectures solve the problem at one stroke, allowing a flat 4 Gigabyte memory space. The system memory map is common to all Cortex M3 implementations, with predefined regions for read-only memory, read-write memory, system peripherals and external peripherals. This also reduces the porting effort between different vendor offerings and simplifies development.
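The fixed memory map can be illustrated with a small address classifier. This is a minimal sketch: the region boundaries follow the default ARMv7-M address map as commonly documented, while the type and function names (`m3_region_t`, `m3_region_of`) are hypothetical and not part of any vendor library.

```c
#include <stdint.h>

/* Coarse regions of the default ARMv7-M address map (names are illustrative). */
typedef enum {
    REGION_CODE,             /* 0x00000000 - 0x1FFFFFFF: typically flash / read-only */
    REGION_SRAM,             /* 0x20000000 - 0x3FFFFFFF: on-chip read-write memory   */
    REGION_PERIPHERAL,       /* 0x40000000 - 0x5FFFFFFF: on-chip peripherals         */
    REGION_EXTERNAL_RAM,     /* 0x60000000 - 0x9FFFFFFF: external memory             */
    REGION_EXTERNAL_DEVICE,  /* 0xA0000000 - 0xDFFFFFFF: external peripherals        */
    REGION_SYSTEM            /* 0xE0000000 - 0xFFFFFFFF: NVIC, SysTick, debug, etc.  */
} m3_region_t;

/* Classify a 32-bit address by comparing it against each region's upper bound. */
m3_region_t m3_region_of(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXTERNAL_RAM;
    if (addr < 0xE0000000u) return REGION_EXTERNAL_DEVICE;
    return REGION_SYSTEM;
}
```

Because the boundaries are the same on every Cortex M3 part, a helper like this (or a linker script built on the same constants) should carry over between vendors; only the amount of memory actually populated inside each region differs.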
Power Consumption
For many devices, particularly handheld and/or battery operated products, low power consumption is the Holy Grail. Here again, reduction of gate count is the first line of attack. A technique called clock gating selectively turns off unused functionality, helping to reduce power consumption considerably. Further, Cortex hardware allows implementation of a sleep/wake-up-on-interrupt/process/go-back-to-sleep strategy that has a proven track record in reducing power consumption to the lowest possible levels. As we will see below, improvements in the Thumb II instruction set mean that more work gets done while the processor is awake, resulting in longer sleep times, which also extends the power budget.
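The effect of the sleep/wake strategy on the power budget can be estimated with simple duty-cycle arithmetic. The sketch below is illustrative only: the function names are hypothetical, and any current or cycle figures fed into it are assumptions to be replaced with data sheet values for a real part.

```c
/* Fraction of time the core is awake, given the cycles needed to service
 * each event, the event rate, and the core clock frequency. */
double active_fraction(double cycles_per_event, double events_per_second,
                       double clock_hz)
{
    return (cycles_per_event * events_per_second) / clock_hz;
}

/* Time-weighted average current for a core that alternates between an
 * awake state and a sleep state. `awake` is the fraction from above. */
double average_current_ma(double active_ma, double sleep_ma, double awake)
{
    return awake * active_ma + (1.0 - awake) * sleep_ma;
}
```

With hypothetical figures of 20 mA awake and 1 mA asleep, halving the awake fraction from 0.25 to 0.125 lowers the average current from 5.75 mA to 3.375 mA. This is why code that finishes its work in fewer cycles translates directly into battery life.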
Aids to Software Implementation
In most projects, software development occupies a significant, if not the largest, fraction of the development effort. This is especially true in groundbreaking designs. Cortex has a number of features that help software developers get code to the shipping-ready state faster. The Nested Vectored Interrupt Controller (NVIC) is standardized on the Cortex M3. Learn it once and you are done. Should you have to switch chip vendors within the Cortex M3 family, that part of the application will not require porting. Developers moving to Cortex from ARM7TDMI will welcome the simplified stack model, which uses only two stacks, Main and Process. They will also find that the Thumb II instruction set, which is standard across Cortex processors, is backwards compatible with the older ARM Thumb instruction set.
A huge jump in software development capability is afforded by the vastly superior debugging capability conferred by the CoreSight debugging architecture, which is standard on the core. This debugging capability is different because it is “built in,” not “bolted on.” CoreSight gives low cost debuggers capabilities that were previously obtained only by buying much more expensive debugging equipment. Developers migrating from ARM7 or older ARM9 parts will relish the fact that the pin count required by Serial Wire Debug (SWD) is at least 50% less than that of the standard JTAG interface. More control is achieved with additional on-die debugging resources such as breakpoints and watchpoints. The Serial Wire Viewer (SWV) allows output of real-time data values to inexpensive debuggers without incurring application overhead, a capability unique to Cortex. Partial trace data is available via the Instrumentation Trace Macrocell (ITM) with minimal application overhead and no further investment over and above a basic JTAG debugger. This is analogous to a low overhead version of printf() debugging. In many cases, this level of partial real-time tracing is good enough to solve practical problems. Investing in a simple JTAG debugging tool (roughly US$75 to US$299) capable of exploiting these features buys an unprecedented amount of debugging power, much more than anything in the 8 or 16-bit world. For demanding applications, full Embedded Trace Macrocell (ETM) capability is optionally implemented by some Cortex vendors, and allows full tracing of code and data transactions with the appropriate external debug hardware. Effective debugging solutions cut weeks, if not months, off the problem resolution and testing phases of software development.
Because of the market share of the ARM7TDMI, ARM9, XScale and ARM11 architectures, there are a large number of vendors of ARM development tools in addition to open source ARM tools. The majority of these vendors realized that taking the short step of adding Cortex capabilities was essential. In this sense, ARM, and by extension Cortex, earns the moniker of the “8051 of the 32-bit world.” There is a wide variety of ARM/Cortex development tools to suit all conceivable budgets and requirements. Open source tools can be obtained for no initial cost. (Note: “no initial cost” does not mean “free.” Users of open source pay with their time one way or another.) Proprietary tools vary considerably in cost from low to high. Higher end tools can be expensive, but offer unrivaled power and capabilities, albeit with a steep learning curve. Tool costs are frequently more than offset by code savings derived from highly efficient compilers, which allow the use of cheaper parts with less memory. Similar considerations apply to other software components of the solution, e.g. operating systems, flash file components, communication stacks, graphics libraries etc. Most middleware intended for embedded systems use has had ARM ports available for some time, with a Cortex port coming soon if not currently available.
Safety Critical Applications
As embedded applications become more complex and demanding, there is a concurrent rise in the demand for safety critical systems. One key to meeting safety critical standards is restricting access to certain regions of memory so that wild pointers and other rogue code cannot overwrite critical sections. Cortex controllers with the optional Memory Protection Unit (MPU) enable you to meet this goal. No comparable capability is found on most 8 and 16-bit processors of a comparable cost/performance profile.
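One practical consequence of using the MPU is that protected regions must obey its sizing rules. As a hedged sketch, assuming the ARMv7-M MPU convention that a region covers 2^(SIZE+1) bytes, is at least 32 bytes, and has a base address aligned to its size, the helpers below compute the SIZE encoding and check alignment. The function names are hypothetical; on a real part the results would be written to the MPU region registers through the vendor's headers.

```c
#include <stdint.h>

/* Return the SIZE field for a region of `bytes` bytes, where the region
 * covers 2^(SIZE+1) bytes, or -1 if the size cannot be encoded
 * (not a power of two, smaller than 32 bytes, or larger than 4 GB). */
int mpu_size_field(uint64_t bytes)
{
    if (bytes < 32u || bytes > 0x100000000ull) return -1;
    if ((bytes & (bytes - 1u)) != 0u) return -1;   /* not a power of two */

    int exponent = 0;
    while ((bytes >> exponent) != 1u) {
        exponent++;                                /* exponent = log2(bytes) */
    }
    return exponent - 1;
}

/* A region's base address must be a multiple of its (power-of-two) size. */
int mpu_base_is_aligned(uint32_t base, uint64_t bytes)
{
    return (base & (uint32_t)(bytes - 1u)) == 0u;
}
```

For example, a 1 KB guard region gives a SIZE value of 9 and must start on a 1 KB boundary, which in turn influences how the linker script lays out the sections to be protected.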
Capability and Efficiency
Many Cortex features allow designers to keep up with demands imposed by application complexity. The first line of attack on application complexity is simple and straightforward: clock speed. The faster the clock, the more instructions are executed and the more work is done in a given unit of time. The issue of how much work gets done within a given clock cycle is very important as well. Here, 32-bit systems have an undeniable advantage because they get more instructions and data to the processor in a given fetch than 8 or 16-bit controllers. Enhancements in the Thumb II instruction set also play a role. ARM estimates that Thumb II instructions execute about 35% faster than Thumb instructions on an ARM7TDMI core at comparable clock speeds for comparable applications. Your mileage may vary, but the increased efficiency of the Thumb II instruction set has been borne out by the experiences of RTOS and other middleware vendors. Single cycle multiply and hardware assisted divide yield considerable efficiency enhancements in computationally intensive applications.
A tradeoff inherent to pipelined architectures such as ARM is that branches require pipeline flushes that negate some of the advantages accrued by pipelining. Branch prediction partially solves this problem, and no doubt contributes to the purported 35% increase in execution efficiency.
A legitimate critique of older ARM architectures was the inability to manipulate individual bits in a word without tedious and expensive read/modify/write cycles. Aficionados of the 8051 particularly missed their bit addressable area that lived between 0x20 and 0x2F in memory. As well as being computationally expensive, the read/modify/write strategy frequently caused further overhead by necessitating interrupt disabling in critical code sections. Bit banding is a technique that, from a software perspective, looks like directly manipulating a given bit. Bit manipulations via banding are atomic, obviating the need to surround critical sections with interrupt disable/enable instructions, which decreases execution overhead.
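The mapping behind bit banding is plain address arithmetic: each bit in the bit-band region has its own word in an alias region. The sketch below assumes the SRAM bit-band region commonly documented for the Cortex M3 (base 0x20000000, alias 0x22000000); the macro and function names are hypothetical, and because bit banding is an implementation option, the device documentation should be checked before relying on it.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE   0x20000000u  /* start of the 1 MB bit-band region */
#define SRAM_BITBAND_ALIAS  0x22000000u  /* start of the 32 MB alias region   */

/* Compute the alias word address for one bit of one byte in the SRAM
 * bit-band region: every byte expands to 32 alias bytes, and every bit
 * to one 4-byte word. */
uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    assert(bit < 8u);
    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE;
    return SRAM_BITBAND_ALIAS + byte_offset * 32u + bit * 4u;
}
```

On the target, writing 1 or 0 through a pointer such as `(volatile uint32_t *)bitband_alias_addr(addr, bit)` sets or clears just that bit in a single store, with no read/modify/write sequence visible to software. The peripheral region follows the same scheme with a base of 0x40000000 and an alias of 0x42000000.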
Taking a look at the instructions added to the original Thumb instruction set sheds further light on claims of enhanced execution efficiency. More and more embedded applications these days include network processing, which makes heavy use of packed data structures. Instructions like BFI (Bit Field Insert) and BFC (Bit Field Clear) increase the efficiency with which data subfields within such structures can be manipulated, greatly reducing the overhead of network transactions. Instructions such as UBFX and SBFX (Unsigned and Signed Bit Field eXtract) are useful in making I/O intensive applications more efficient. A well known RTOS vendor has used the CLZ (Count Leading Zeros) instruction to good effect, stating that it was possible to cut the time needed to find the highest priority task in a context switch by 50%. This is a major overhead reduction in applications with intensive task switching. Rounding out the field are flow control instructions such as IT (If-Then) and table branch instructions such as TBB and TBH (Table Branch Byte/Halfword) that speed up common branching operations and assist in pipeline branch speculation.
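To make these instructions concrete, here are portable C equivalents of what UBFX, BFI and a CLZ-based priority search compute. This is a sketch, not any RTOS vendor's actual code: the function names are hypothetical, the convention that bit 31 of the ready mask represents the highest priority is an assumption, and `__builtin_clz` is a GCC/Clang builtin that a Cortex M3 compiler would be expected to map to the CLZ instruction.

```c
#include <stdint.h>

/* Build a mask of `width` low-order ones (width 1..32). */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* What UBFX computes: extract `width` bits starting at bit `lsb`.
 * Caller must ensure lsb + width <= 32. */
uint32_t ubfx(uint32_t value, unsigned lsb, unsigned width)
{
    return (value >> lsb) & low_mask(width);
}

/* What BFI computes: replace `width` bits of `dest`, starting at `lsb`,
 * with the low-order bits of `field`. Caller ensures lsb + width <= 32. */
uint32_t bfi(uint32_t dest, uint32_t field, unsigned lsb, unsigned width)
{
    uint32_t mask = low_mask(width) << lsb;
    return (dest & ~mask) | ((field << lsb) & mask);
}

/* Find the highest-priority ready task in one step. Assumed convention:
 * bit 31 = priority 0 (highest), bit 0 = priority 31 (lowest).
 * Returns -1 when no task is ready. */
int highest_ready_priority(uint32_t ready_mask)
{
    if (ready_mask == 0u) return -1;        /* CLZ of 0 is undefined here */
    return __builtin_clz(ready_mask);
}
```

Without CLZ, the priority search is typically a loop or a lookup table over the ready mask; with it, the search becomes a single instruction regardless of how many priority levels are in use. Likewise, each bit field helper above collapses from a shift-and-mask sequence to one instruction.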
Exception handling in Cortex is both more deterministic and faster than in previous ARM architectures and is competitive in this regard with the simpler, fixed schemes of older 8 and 16-bit architectures. This is achieved by optimizing the way the NVIC handles back-to-back interrupts, and by introducing hardware assistance in register saving upon entering exception handling routines. An additional technique known as tail chaining lets a pending interrupt be serviced as soon as the current handler finishes, without restoring and re-saving registers in between, which eliminates considerable overhead. ARM states that interrupt latency for most cases is 12 clock cycles for Cortex as opposed to 24-42 cycles for ARM7TDMI.
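A rough cycle count shows why tail chaining matters for bursts of interrupts. The 12-cycle entry figure comes from the text above; the 12-cycle exit and 6-cycle tail-chain figures are assumptions based on numbers ARM has published for zero-wait-state memory, and the function names are hypothetical. Treat the result as an estimate, not a guarantee for any particular device.

```c
/* Assumed per-exception overheads in clock cycles (zero wait states). */
enum {
    ENTRY_CYCLES      = 12,  /* hardware register stacking on entry       */
    EXIT_CYCLES       = 12,  /* hardware register unstacking on exit      */
    TAIL_CHAIN_CYCLES = 6    /* switch between handlers without unstacking */
};

/* Overhead for n back-to-back interrupts if each one paid a full
 * entry and a full exit. */
unsigned overhead_without_tail_chaining(unsigned n)
{
    return n * (ENTRY_CYCLES + EXIT_CYCLES);
}

/* Overhead when pending interrupts are tail-chained: one entry, one
 * exit, and a short hand-off between consecutive handlers. */
unsigned overhead_with_tail_chaining(unsigned n)
{
    if (n == 0u) return 0u;
    return ENTRY_CYCLES + (n - 1u) * TAIL_CHAIN_CYCLES + EXIT_CYCLES;
}
```

Under these assumptions, three back-to-back interrupts cost 72 cycles of overhead without tail chaining and 36 cycles with it, and the saving grows with the length of the burst.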
The standardization offered by the NVIC from vendor to vendor would be a major improvement in and of itself, but there is more. As the name implies, interrupts can be conveniently nested without having to implement difficult and error prone assembly language support. Detailed control of interrupt prioritization is possible at run time as well as at system configuration time. The NVIC is also designed to show the developer how a system enters various exception states, such as bus fault, memory fault, and usage fault. Before this kind of visibility became available, developers on older architectures frequently endured the frustrating exercise of determining why the core was behaving unexpectedly, without detailed information to guide them to a solution. The NVIC is also intimately involved in power saving sleep states. It includes two exceptions, SVC (Supervisor Call) and PendSV (Pendable Service Call), designed to insulate the application program from hardware under control of the OS, and to provide more orderly handling of context switching. Finally, the NVIC provides a dedicated system tick timer, SYSTICK, expressly designed for operating systems, which has the additional benefit of freeing up the general purpose timer(s) for other uses, and which can serve as an extra timer when an OS is not used. Without exaggeration, it can be said that the NVIC is “exception handling on steroids.” Some of the features of the NVIC can be found on 8 and 16-bit architectures, but none of the older parts can match the NVIC in terms of overall performance and flexibility.
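Setting up the system tick timer mostly comes down to one calculation. The sketch below assumes the usual description of SYSTICK as a 24-bit down-counter that is reloaded with a value one less than the number of core clock cycles per tick; the function name is hypothetical, and on a real part the result would be written to the timer's reload register through the vendor or CMSIS headers.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* 24-bit counter limit */

/* Compute the reload value that yields `tick_hz` ticks per second from a
 * core clock of `core_clock_hz`. Returns -1 if the rate is zero, faster
 * than the clock, or too slow to fit in the 24-bit counter. */
int32_t systick_reload_for(uint32_t core_clock_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || tick_hz > core_clock_hz) return -1;

    uint32_t cycles_per_tick = core_clock_hz / tick_hz;
    if (cycles_per_tick - 1u > SYSTICK_MAX_RELOAD) return -1;

    return (int32_t)(cycles_per_tick - 1u);   /* counter counts N-1 down to 0 */
}
```

For a hypothetical 72 MHz core and a 1 kHz RTOS tick, the reload value is 71999. A 1 Hz tick at the same clock does not fit in 24 bits, so slower ticks need a divided clock source or a software counter on top of a faster tick.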
Summary
There are many considerations applied to the choice of a microcontroller for an embedded systems design. Much of the time, backward compatibility with existing code or significant investment in development tools or middleware demands staying with an older controller. In cases where these are not important factors, the Cortex M3 architecture approaches the “no-brainer” category for a wide variety of general purpose embedded products. One vendor of software and hardware development tools for embedded systems has gone so far as to say that 80-90% of all power critical microcontroller applications that have been squarely in the territory of a well known 16-bit part can be handled by the Cortex M3. (He further adds that the remainder can be handled by the upcoming Cortex M0!)
Below is a summary table that rates the Cortex M3 on several critically important selection criteria for inclusion into an embedded systems design.
| Factor | Assessment |
| Cost | |
| Cost/Performance | |
| Efficiency | |
| Exception Handling | |
| RTOS | |
| Power Consumption | |
| Debugging Capability | |
| Safety Critical | |
| Development Tools and Middleware | |
| Support/Sourcing (General purpose uCs) | |
| Integration | |
There is no shortage of clever marketing material in the embedded products world. Vendors frequently over-hype, over-promise, over-sensationalize and under-deliver. Is the Cortex the greatest thing since the Swiss Army knife? Maybe, maybe not, but it is clear that Cortex offers many attractive and useful features as shown in Table II. In eleven important selection categories, Cortex M3 rates Excellent in most of them and Competitive in the rest. Perhaps Cortex is not revolutionary, but it is certainly an important evolutionary step in the development of microcontroller technology.


