Core Technology Analysis and Development Practice
In real-time control systems, computational performance often determines how fast the system responds and how accurately it controls. The Control Law Accelerator (CLA) integrated into many Texas Instruments (TI) C2000 microcontrollers is a hardware accelerator aimed at exactly this workload. It is a fully programmable, independently running 32-bit floating-point coprocessor designed for math-intensive computation. The CLA executes real-time control algorithms in parallel with the C28x main CPU, which in theory can up to double the available computing throughput. It is particularly well suited to low-latency control loops, filtering algorithms, and complex mathematical operations.
This article is aimed at embedded engineers. It offers a systematic technical guide to the CLA, covering its independent operation, programming model, data sharing, task triggering, debugging techniques, and common difficult problems. The goal is to help developers quickly master the key points of CLA development and avoid engineering pitfalls.
Independence and synchronization mechanism of CLA
The CLA is a processing unit independent of the C28x CPU. Once the main CPU has configured it, the CLA executes algorithms autonomously, with no further intervention from the main CPU. It has its own bus structure, register set, pipeline, and processing unit. More importantly, the CLA can directly access many peripheral registers, such as ePWM, HRPWM, eCAP, eQEP, ADC, DAC, and the comparator subsystems. The exact set depends on the device. This direct access makes the CLA well suited to time-critical control tasks.
Synchronization between the C28x and the CLA relies mainly on triggers. A CLA task can be started by a peripheral interrupt, or the main CPU can force it in software by writing the CLA force register or executing IACK. In the other direction, the CLA can interrupt the main CPU to signal that a task has completed or that an abnormal event such as floating-point overflow or underflow has occurred. This bidirectional mechanism lets the two processors divide the work. For example, the C28x can handle system-level communication and diagnostics while the CLA runs the fast, low-level control loop.
CLA programming essentials: C compiler and data type restrictions
The CLA can be programmed in C: TI's TMS320C28x code generation toolchain includes the CLA C compiler. Developers can write CLA programs much like regular C code, but the CLA architecture imposes several limits on C language support. For details, refer to the "CLA Compiler" chapter of the compiler user's guide.
Differences in key data types
The C28x and the CLA interpret some data types differently, which is one of the most easily overlooked issues in CLA development.
Integer types: on the CLA, int is 32 bits; on the C28x, int is 16 bits.
Pointer types: the C28x treats pointers as 32-bit values because its address bus is 22 bits wide and needs 32 bits to represent. The CLA address bus is only 16 bits wide, so the CLA interprets pointers as 16-bit values.
If a structure defined in a shared header file contains pointer members, the C28x and the CLA will disagree on its memory layout, and pointer dereferences will go wrong. For example:
```c
struct {
    float  a;
    float *b;
    float *c;
} X;
```
On the C28x, b and c each occupy 32 bits (two 16-bit words). On the CLA, b and c each occupy 16 bits (one word). If a CLA task dereferences X.c, as in *(X.c), it reads the pointer from the wrong offset and therefore accesses the wrong address.
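The following host-side model makes the mismatch concrete by computing the word offset of each member as each core would lay it out. It is a sketch, not TI code, and rests on three assumptions. First, `float` takes two 16-bit words on both cores. Second, C28x pointers take two words aligned to an even address. Third, CLA pointers take one word with no extra alignment. The names `layout_t` and `member_offsets` are hypothetical.

```c
#include <stdint.h>

/* Word offsets (16-bit words) of the members of
 * struct { float a; float *b; float *c; } as seen by one core.
 * Model only: float = 2 words on both cores; the pointer size and
 * alignment are parameters (C28x: 2 words, even-aligned; CLA: 1 word). */
typedef struct {
    uint16_t a, b, c, size;
} layout_t;

static uint16_t align_up(uint16_t offset, uint16_t alignment)
{
    return (uint16_t)((offset + alignment - 1u) / alignment * alignment);
}

layout_t member_offsets(uint16_t ptr_words, uint16_t ptr_align)
{
    layout_t l;
    l.a = 0;                                           /* float a: words 0-1  */
    l.b = align_up(2, ptr_align);                      /* first pointer       */
    l.c = align_up((uint16_t)(l.b + ptr_words), ptr_align);
    l.size = align_up((uint16_t)(l.c + ptr_words), 2); /* pad to float alignment */
    return l;
}
```

Under these assumptions, `member_offsets(2, 2)` (the C28x view) places `c` at word 4. `member_offsets(1, 1)` (the CLA view) places it at word 3, which is the upper half of the C28x's `b` pointer.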
Solution: wrap the pointer in a union with a 32-bit integer. Both cores then reserve 32 bits for the member, and the CLA compiler places its 16-bit pointer in the lower 16 bits. For example:
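```c
#include <stdint.h>

typedef union {
    float   *ptr;  /* C28x: 32 bits, CLA: 16 bits */
    uint32_t pad;  /* reserves 32 bits on both cores */
} CLA_FPTR;
```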
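A shared structure built on the union might look like the following. This is an illustrative sketch: `SharedParams` and `read_c` are hypothetical names, and the union is repeated so the block stands alone. On a host PC, pointers are 64 bits, so this example only demonstrates the access pattern. The 32-bit layout guarantee applies to the C28x and CLA compilers. Because the CLA uses only the low 16 bits of the pointer, the pointed-to data must also lie in memory the CLA can address.

```c
#include <stdint.h>

typedef union {
    float   *ptr;   /* C28x: 32 bits, CLA: 16 bits in the low half */
    uint32_t pad;   /* forces a 32-bit slot on both cores */
} CLA_FPTR;

/* Every member now has the same size and offset on both cores. */
typedef struct {
    float    a;
    CLA_FPTR b;
    CLA_FPTR c;
} SharedParams;

/* Dereference through the union's pointer member. */
float read_c(const SharedParams *p)
{
    return *(p->c.ptr);
}
```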
More generally, always use the fixed-width types from stdint.h, such as int16_t and uint32_t, instead of ambiguous types such as int and unsigned int. In addition, define shared global variables in a C28x .c file and declare them only as extern in CLA code. The reason is that the C28x places stricter data-page ("blocking") requirements on data. Data defined on the C28x side satisfies those requirements, and the CLA can still access it. Data defined in CLA code may not satisfy them, so the C28x cannot access it reliably.
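Here is a minimal sketch of that split, assuming one shared header plus one C28x source file. The file names and the variables `g_setpoint` and `g_loop_count` are hypothetical. In a real project, the definitions would normally also be placed in a memory section that both cores can access, such as a message RAM. That placement is configured through the linker command file and is not shown here.

```c
/* ---- shared.h: included by both the C28x and the CLA sources ---- */
#include <stdint.h>

extern float    g_setpoint;    /* written by C28x, read by CLA          */
extern uint16_t g_loop_count;  /* fixed-width: same size on both cores  */

/* ---- c28x_main.c (C28x side only): the single definition ---- */
float    g_setpoint   = 0.0f;
uint16_t g_loop_count = 0;

/* ---- CLA task file: includes shared.h and never redefines these ---- */
```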

Task triggering and nesting support
CLA tasks are similar to interrupt service routines (ISRs): each task is started by a trigger source, which can be:
Peripheral interrupts, such as an ePWM period match or an ADC end-of-conversion. To find out which peripherals can trigger which tasks, refer to the device Technical Reference Manual (TRM).
Software force: the C28x can start tasks with the IACK instruction or by setting bits in the CLA force register (MIFRC). For example, IACK #0x0003 triggers Task 1 and Task 2 at the same time.
Another CLA task (with limitations): on some devices, the CLA cannot directly force another task. Instead, it can interrupt the C28x, which then forces the task in software, or it can trigger the task indirectly by writing ePWM registers.
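Both the force register and IACK use one bit per task, with bit n-1 selecting Task n. The helper below (hypothetical name `cla_task_bit`) computes that mask. IACK takes an immediate operand, so on the C28x the mask must be a compile-time constant. The commented lines show the inline-assembly form and a register write using TI's bit-field header naming. Check both against your device headers.

```c
#include <stdint.h>

/* Mask bit for CLA task 1..8 as used by MIFRC and IACK:
 * bit (n-1) selects task n; out-of-range task numbers give 0. */
uint16_t cla_task_bit(unsigned task)
{
    return (task >= 1u && task <= 8u) ? (uint16_t)(1u << (task - 1u)) : 0u;
}

/* Tasks 1 and 2 -> 0x0003, i.e. the operand of "IACK #0x0003".
 * On the C28x (operand must be a constant):
 *     __asm(" IACK #0x0003");
 * or through the force register:
 *     Cla1Regs.MIFRC.all = 0x0003;
 */
```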
Task nesting
Type 0 and Type 1 CLA: task nesting is not supported, so only one task can run at a time.
Type 2 CLA (for example, F28004x and F2838x): supports a background task. The background task can run continuously to handle communication or housekeeping work, and the higher-priority foreground tasks can preempt it. This gives one level of nesting. Sections of background-task code can be marked as uninterruptible critical regions.
Code size limit
Type 0 CLA: the program space uses 12-bit addressing, which gives 4096 words of 16 bits each. Every CLA instruction is 32 bits (two words), so a program can contain at most about 2048 instructions.
Type 1 and later: the program space uses 16-bit addressing and can span up to 64K words. Each task's start address is set by its vector register (MVECT1 to MVECT8), and the MSTOP instruction marks the end of a task.
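The arithmetic behind these limits fits in a tiny helper (hypothetical name `cla_max_instructions`). The address width gives the number of 16-bit words, and each 32-bit instruction uses two of them. This is only the addressing upper bound. The CLA program RAM actually present on a device is usually smaller.

```c
#include <stdint.h>

/* Upper bound on CLA instructions for a program space addressed with
 * addr_bits bits of 16-bit words; each instruction is 32 bits (2 words). */
uint32_t cla_max_instructions(unsigned addr_bits)
{
    uint32_t words = (uint32_t)1u << addr_bits;  /* 16-bit words        */
    return words / 2u;                           /* 2 words/instruction */
}
```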
Memory and Peripheral Access Rules
The CLA cannot access every memory block and peripheral. Which ones it can access depends on the specific device.
Memory Access
The memory map in the device data sheet shows which RAM blocks the CLA can access. There are usually two dedicated message RAMs:
CPU-to-CLA message RAM: read/write for the C28x, read-only for the CLA.
CLA-to-CPU message RAM: read/write for the CLA, read-only for the C28x.
In addition, certain memory blocks can be configured as either CLA program memory or CLA data memory.
Peripheral access
On newer device families, the CLA can directly access a growing number of peripherals. Typical examples:
F2803x: ADC result registers, ePWM+HRPWM, comparator registers.
F2806x: adds eCAP and eQEP.
F2807x/F2837x: ADC modules, ePWM, eCAP, eQEP, comparator subsystem, DAC, SPI, McBSP, uPP, EMIF, GPIO.
Warning: if a device contains multiple CLAs, each may be connected to a different set of peripherals. Always check the block diagram and register map in the device data sheet.
Access arbitration priority
When the C28x and the CLA access the same resource (such as shared RAM or a peripheral register) at the same time, the hardware arbitrates automatically. The priority order is defined in the "Arbitration" section of the TRM. Arbitration does not, however, make a read-modify-write sequence atomic. Suppose the C28x reads a peripheral register, modifies the value, and writes it back. If the CLA writes the same register between that read and write, the CLA's change is overwritten and lost. The best practice is therefore never to let both processors write to the same register.
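The lost update can be replayed step by step in a deterministic host-side model. `model_periph_t` and `lost_update_demo` are hypothetical names, and a plain variable stands in for the peripheral register. On hardware, a C bit-field assignment to a register is typically compiled into exactly this kind of read-modify-write.

```c
#include <stdint.h>

typedef struct {
    uint16_t reg;   /* stands in for one peripheral register */
} model_periph_t;

/* Replays: C28x reads, CLA sets a bit, C28x writes back its stale copy. */
uint16_t lost_update_demo(void)
{
    model_periph_t p = { 0x0000u };

    uint16_t c28_copy = p.reg;   /* 1. C28x reads the register            */
    p.reg |= 0x0002u;            /* 2. CLA sets bit 1 in between           */
    c28_copy |= 0x0001u;         /* 3. C28x modifies its (stale) copy      */
    p.reg = c28_copy;            /* 4. C28x writes back: CLA's bit is lost */

    return p.reg;                /* 0x0001 rather than 0x0003 */
}
```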
Common debugging problems and solutions
The CLA and the C28x share the same JTAG port, and the Code Composer Studio (CCS) debug view shows the C28x and CLA cores side by side. Developers can halt and single-step CLA code independently and inspect its registers and memory.
Typical faults and how to troubleshoot them:
5.1 CLA task never starts
First, try forcing the task in software to confirm that the task vector is configured correctly. If the software force works but the peripheral trigger does not, check the peripheral initialization order. A CLA task responds only to a transition (edge) of its interrupt source, so if the peripheral sets its interrupt flag before the CLA is initialized, that edge is missed. The fix is to clear the peripheral's interrupt flag before initializing the CLA.
Check that the task is enabled in the MIER register (EALLOW-protected).
For Type 0 CLA, make sure the task start address in the MVECT register is an offset from the start of CLA program space, not an absolute address.
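Two of these checks can be expressed as small models. The hypothetical function `cla_sees_trigger` captures why a flag set before CLA initialization is missed: a task is requested only on a 0-to-1 transition that occurs after initialization. The hypothetical function `type0_mvect` computes a Type 0 vector as an offset. The commented line follows the pattern of TI's F2803x examples. There, symbol names such as `Cla1Prog_Start` come from the linker command file and may differ in your project.

```c
#include <stdbool.h>
#include <stdint.h>

/* Edge-sensitive trigger model: level of the peripheral's interrupt
 * flag just before and just after CLA initialization. A task is
 * requested only if the CLA sees a 0 -> 1 transition. */
bool cla_sees_trigger(bool flag_before_init, bool flag_after_init)
{
    return !flag_before_init && flag_after_init;
}

/* Type 0 CLA: MVECTx holds an offset from the start of CLA program
 * space, e.g. (F2803x example style):
 *   Cla1Regs.MVECT1 = (Uint16)((Uint32)&Cla1Task1 - (Uint32)&Cla1Prog_Start);
 */
uint16_t type0_mvect(uint32_t task_addr, uint32_t prog_start)
{
    return (uint16_t)(task_addr - prog_start);
}
```

A flag that was already set and never cleared gives `cla_sees_trigger(true, true) == false`, which is the "task never starts" symptom.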
5.2 Unable to Force Tasks with Software (IACK)
Verify the following:
Software forcing (IACK enable) is turned on in the MCTL register (EALLOW-protected).
The corresponding interrupt is enabled in the MIER register.
The task's trigger source is configured as software, as described in the device TRM.
The IACK operand is correct (for example, IACK #0x0001 triggers Task 1).
If it still fails, compare your code with the software examples in C2000Ware.
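The checklist can be encoded as a small diagnostic over a snapshot of the relevant settings. This is a hypothetical model: `force_setup_t` and `check_force_setup` are illustrative names, and the fields do not mirror TI's register layout exactly. The code returns the first failed check in the order listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the settings an IACK force depends on. */
typedef struct {
    bool     iack_enabled;    /* MCTL IACK-enable bit set (EALLOW)        */
    uint16_t mier;            /* MIER: bit n-1 enables task n             */
    bool     src_is_software; /* task trigger source set to software      */
    uint16_t iack_mask;       /* operand passed to IACK                   */
} force_setup_t;

/* Returns the number of the first failed check (1-4), or 0 if the
 * forced task(s) should start. */
int check_force_setup(const force_setup_t *s)
{
    if (!s->iack_enabled)
        return 1;                                  /* MCTL                 */
    if ((s->mier & s->iack_mask) != s->iack_mask)
        return 2;                                  /* MIER missing a task  */
    if (!s->src_is_software)
        return 3;                                  /* trigger source       */
    if (s->iack_mask == 0u || s->iack_mask > 0x00FFu)
        return 4;                                  /* IACK operand         */
    return 0;
}
```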
5.3 Other tasks execute after MSTOP is reached during single-step debugging
When the CLA reaches MSTOP, any other task that is both pending and enabled starts executing automatically. To prevent this while debugging, temporarily modify the MIER register to disable all other tasks.
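The masking step fits in a small helper (hypothetical name `mier_for_debug`). The register accesses in the trailing comment assume TI's bit-field header naming (`Cla1Regs.MIER`), so check them against your device headers. Disabling a task in MIER does not clear its pending flag. Any task still pending will therefore run once the saved MIER value is restored.

```c
#include <stdint.h>

/* MIER value to use while single-stepping task `task` (1..8): keep only
 * that task's enable bit. Returns 0 if the task was not enabled. */
uint16_t mier_for_debug(uint16_t mier, unsigned task)
{
    uint16_t bit = (task >= 1u && task <= 8u) ? (uint16_t)(1u << (task - 1u)) : 0u;
    return (uint16_t)(mier & bit);
}

/* C28x side (MIER is EALLOW-protected):
 *   EALLOW;
 *   saved = Cla1Regs.MIER.all;
 *   Cla1Regs.MIER.all = mier_for_debug(saved, 3);
 *   EDIS;
 *   ... debug task 3 ...
 *   EALLOW; Cla1Regs.MIER.all = saved; EDIS;
 */
```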
5.4 Variables in CLA code do not update
A common cause is that the linker command file (.cmd) places the .scratchpad or .bss_cla section in read-only CLA program memory. Both sections must be placed in read/write CLA data RAM. This matters especially for .scratchpad, which the compiler uses for temporary storage. Placing either section in a read-only area results in undefined behavior.
5.5 CLA breakpoint failure after reset
For debugging convenience, the CCS GEL file automatically re-enables the CLA clock and CLA breakpoints on reset. If you do not want this behavior, comment out or delete the corresponding lines in the GEL file.

In-Depth Comparison of the CLA and the C28x+FPU
Many engineers ask the same question: since the C28x already integrates an FPU (floating-point unit), why is the CLA needed? The two differ not only in instruction set but also in execution model and performance characteristics.
Instruction set relationship
The CLA instruction set is largely a subset of the C28x+FPU instruction set, plus a small number of CLA-specific instructions. For example, the CLA does not support the repeat block instruction (RPTB). It does have its own native integer instructions (AND, OR, XOR, ADD, SUB, shifts) and its own branch, call, and return instructions.
List of Key Differences
| Feature | CLA | C28x+FPU |
|---|---|---|
| Execution | Independent of the C28x; runs in parallel with it | Shares the pipeline with C28x fixed-point instructions |
| Floating-point registers | 4 (MR0-MR3) | 8 (R0H-R7H) |
| Auxiliary registers | Two 16-bit (MAR0, MAR1) | Eight 32-bit (XAR0-XAR7) |
| Pipeline | 8 stages, fully independent | 8 stages; fetch/decode shared with fixed-point instructions |
| Single-step behavior | Pipeline advances 1 cycle | Pipeline is completely flushed |
| Addressing modes | Direct, and indirect with post-increment; no data page pointer | All C28x addressing modes |
| Interrupt sources | Directly from peripherals (device-dependent) | Peripherals, expanded through the PIE |
| Task nesting | Type 0/1: not supported; Type 2: one level (background task) | Can be enabled in software |
| Floating-point multiply/conversion | 1 cycle, no delay slot | 2p cycles (2 cycles, with a delay slot) |
| Repeat instructions | Not supported | Repeated MACF32 and RPTB supported |
| Communication with the C28x | Shared RAM, message RAM, and interrupts | Same CPU; direct copies between internal registers |
| Memory access | Limited to CLA program/data/message RAM | All memory |
Performance benchmark considerations
For mathematical functions such as division, sine, and cosine, CLA code and C28x+FPU code are not instruction-for-instruction equivalent. The differences come from:
Cycle counts for multiplication and type conversion: 1 cycle on the CLA, 2p cycles on the FPU, although the FPU can place another instruction in the delay slot.
Branch and call instructions: the CLA uses delayed branches, where the three instructions before and the three after the branch are always executed. Developers must fill these delay slots usefully to get good performance.
Register resources: the CLA has only 4 floating-point registers, so values spill to memory more often.
Fewer addressing modes, which can require extra data-movement instructions.
The actual speedup therefore depends on the specific structure of the code. Control algorithms that fully exploit the CLA's independent parallel execution and waste few delay slots typically see significant acceleration.
Method for measuring the execution time of CLA tasks
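Accurate measurement of CLA task execution time is crucial in real-time systems. Two methods are recommended:
Method 1: Use the ePWM time-base counter
The CLA can read the time-base counter (TBCTR) of an ePWM module. Read the counter at the start and end of the measured code and take the difference:
```c
uint16_t ct1, ct2, delta;
ct1 = EPwm1Regs.TBCTR;          /* counter at start */
/* ... code being measured ... */
ct2 = EPwm1Regs.TBCTR;          /* counter at end */
delta = (uint16_t)(ct2 - ct1);  /* elapsed TBCLK ticks, mod 2^16 */
```
Counter rollover must be handled. Unsigned 16-bit subtraction wraps correctly only when the counter itself wraps at 0xFFFF, as in up-count mode with TBPRD = 0xFFFF. With a smaller period, the counter resets to 0 after TBPRD, so the difference must be taken modulo TBPRD + 1, and the measured code must finish within one period. This method has extremely low overhead, but its resolution is limited to one TBCLK period.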
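Here is a sketch of rollover handling for an arbitrary period, assuming the time base runs in up-count mode from 0 to TBPRD. `tbctr_elapsed` and `ticks_to_ns` are hypothetical helpers. The 64-bit arithmetic in `ticks_to_ns` is intended for post-processing on the C28x side or a host PC, not for CLA code.

```c
#include <stdint.h>

/* Elapsed TBCLK ticks between two TBCTR samples in up-count mode.
 * The counter runs 0..tbprd and then resets to 0, so the difference is
 * taken modulo (tbprd + 1). Valid only if at most one wrap occurred. */
uint32_t tbctr_elapsed(uint16_t ct1, uint16_t ct2, uint16_t tbprd)
{
    uint32_t modulus = (uint32_t)tbprd + 1u;
    return (ct2 >= ct1) ? (uint32_t)(ct2 - ct1)
                        : (uint32_t)ct2 + modulus - ct1;
}

/* Convert ticks to nanoseconds for a TBCLK frequency given in Hz. */
uint32_t ticks_to_ns(uint32_t ticks, uint32_t tbclk_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000000u) / tbclk_hz);
}
```

With TBPRD = 999, samples of 990 and 10 give 20 ticks. With TBPRD = 0xFFFF, the helper agrees with plain 16-bit subtraction.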
Method 2: Utilize GPIO pins
If the CLA can access GPIO directly (device-dependent), set a pin high when the task starts and low when it ends, then measure the high time with an oscilloscope. This method is intuitive and unaffected by counter rollover. However, the GPIO writes themselves add instructions, so their effect on measurement accuracy must be evaluated.
Account for trigger and completion overhead
Starting a CLA task and signaling its completion carry a fixed overhead. To isolate the pure algorithm execution time, use an otherwise idle CPU timer on the C28x to measure the whole interval from the software force until the MIRUN flag clears. Then subtract the same measurement taken with an empty task, which captures the trigger and notification overhead.
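The bookkeeping for this method can be sketched with two hypothetical helpers. C28x CPU timers count down, so elapsed time is start minus end. The wrap handling assumes the timer period register is set to 0xFFFFFFFF so the counter wraps like a plain 32-bit value. The subtraction is clamped at zero because measurement noise can make a very short task look shorter than the empty one.

```c
#include <stdint.h>

/* CPU timers count down: elapsed = start - end. Unsigned arithmetic
 * handles one wrap, assuming a period of 0xFFFFFFFF. */
uint32_t cputimer_elapsed(uint32_t start, uint32_t end)
{
    return start - end;
}

/* Net task time: total trigger-to-completion time minus the same
 * measurement taken with an empty task (pure overhead). */
uint32_t cla_net_cycles(uint32_t with_task, uint32_t empty_task)
{
    return (with_task > empty_task) ? (with_task - empty_task) : 0u;
}
```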
How the C28x can terminate a CLA task
In some abnormal situations, the main CPU must forcibly terminate a CLA task.
If a task has been triggered but has not yet started executing, the main CPU can cancel it by clearing the corresponding flag bit through the MICLR register.
If the task is already running, setting the soft reset bit in the MCTL register terminates the current task and clears the MIER register. To reset all CLA registers completely, use the hard reset bit instead.
Note: forced termination can leave data in an inconsistent state, so reinitialize the CLA after a reset.
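The decision between the two cases can be sketched as a pure function of the pending-flag (MIFR) and running-task (MIRUN) values. `cla_abort_action` and its enum are hypothetical names. The comment shows how the result might be applied using TI's bit-field header naming, so check your TRM for which registers are EALLOW-protected. Because the two registers are sampled at one instant, a pending task could start before the MICLR write lands, and a real implementation should re-check MIRUN afterwards.

```c
#include <stdint.h>

typedef enum {
    CLA_ACTION_NONE,        /* task neither pending nor running        */
    CLA_ACTION_CLEAR_FLAG,  /* pending only: write its bit to MICLR    */
    CLA_ACTION_SOFT_RESET   /* running: MCTL soft reset (clears MIER)  */
} cla_abort_action_t;

/* Decide how to cancel task `task` (1..8) from the C28x. */
cla_abort_action_t cla_abort_action(uint16_t mifr, uint16_t mirun, unsigned task)
{
    uint16_t bit = (task >= 1u && task <= 8u) ? (uint16_t)(1u << (task - 1u)) : 0u;
    if (mirun & bit) return CLA_ACTION_SOFT_RESET;
    if (mifr & bit)  return CLA_ACTION_CLEAR_FLAG;
    return CLA_ACTION_NONE;
}

/* Applying the result (C28x side):
 *   CLEAR_FLAG: Cla1Regs.MICLR.all = bit;
 *   SOFT_RESET: EALLOW; Cla1Regs.MCTL.bit.SOFTRESET = 1; EDIS;
 *               then reinitialize and re-enable MIER.
 */
```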
Interpretation of common linker warnings
With C2000 code generation tools v20.2.x LTS and later, the linker may issue the following warning:
"Symbol, X, referenced in a.obj, assumes that data is blocked but is accessing non-blocked data in b.obj. Runtime failures may result"
This warning is a data-access consistency check. It fires when one object file assumes that data is "blocked" (the C28x data-page requirement), but the data is defined as non-blocked in another object file, so the linker warns that runtime failures may result. A typical case is a global variable defined in CLA code and used by C28x code. The fix is to make the access properties of all shared data consistent, for example by defining shared variables on the C28x side. See the compiler release notes for details.
