Instruction set relationship
The CLA instruction set is a subset of the C28x+FPU instruction set, plus a small number of CLA-specific instructions. For example, the CLA does not support the repeat block instruction (RPTB), but it does provide native integer instructions (AND, OR, XOR, ADD, SUB, shift) as well as native branch/call/return instructions.
List of Key Differences
| Item | CLA | C28x+FPU |
|---|---|---|
| Execution | Independent of the C28x; runs in parallel | Shares the pipeline with C28x fixed-point instructions |
| Floating-point registers | 4 (MR0-MR3) | 8 (R0H-R7H) |
| Auxiliary registers | Two 16-bit (MAR0, MAR1) | Eight 32-bit (XAR0-XAR7) |
| Pipeline | 8 stages, completely independent | 8 stages; fetch/decode shared with fixed point |
| Single-step behavior | Pipeline advances 1 cycle | Pipeline is completely flushed |
| Addressing modes | Direct and indirect with post-increment; no data page pointer | All C28x addressing modes |
| Interrupt sources | Directly from peripherals (device dependent) | Expanded through the PIE |
| Task nesting | Type 0/1: not supported; Type 2: one level, via the background task | Nesting enabled by software |
| Floating-point multiply/conversion | 1 cycle, no delay slot | 2p cycles (2 cycles, with a delay slot) |
| Repeat instructions | Not supported | Supports repeated MACF32 and RPTB |
| Communication with the C28x | Shared RAM, message RAM, and interrupts | Same CPU; internal register copy |
| Memory access | Limited to CLA program/data/message RAM | All memory |
Performance benchmark considerations
For math functions such as division, sine, and cosine, the CLA and C28x+FPU instruction sequences are not fully equivalent, so cycle counts differ. The differences come from:
Multiply and type-conversion latency (single cycle on the CLA; 2p cycles on the FPU, although the FPU can schedule another instruction in the delay slot).
Branch and call behavior (CLA branches are delayed: the three instructions before and after the branch are always executed, so developers must fill these slots with useful work to improve performance).
Register resources (the CLA has only 4 floating-point registers, so register spills to memory are more likely).
Addressing modes (the CLA has fewer, which may require additional data-movement instructions).
The actual speedup therefore depends on the structure of the code. For control algorithms that make full use of the CLA's independent, parallel execution and do not waste delay slots, the CLA typically delivers a significant speedup.
Methods for measuring CLA task execution time
Accurate measurement of CLA task time is crucial in real-time systems. The following two methods are recommended:
Method 1: Use PWM counter
The CLA can access the time-base counter (TBCTR) of an ePWM module. Read the counter at the beginning and at the end of the task and compute the difference:
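```c
uint16_t ct1, ct2, delta;

ct1 = EPwm1Regs.TBCTR;   // counter value at the start of the task
// ... measured code ...
ct2 = EPwm1Regs.TBCTR;   // counter value at the end of the task
delta = ct2 - ct1;       // elapsed time-base ticks (unsigned subtraction)
```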
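Take care to handle counter wraparound. Unsigned 16-bit subtraction is performed modulo 2^16, so the difference stays correct across a single rollover as long as the counter runs over the full 16-bit range and the task is shorter than one counter period. This method has very low overhead, but its resolution is limited by the ePWM time-base clock frequency.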
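The wraparound handling can be sketched as two small helpers. This is a minimal illustration, not code from a TI library: the function names are hypothetical, and the second helper assumes the counter runs in up-count mode from 0 to a period value and then restarts at 0, with the task shorter than one counter period.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed ticks for a counter that uses the full
 * 16-bit range (rolls over from 0xFFFF to 0). Unsigned subtraction is
 * modulo 2^16, so one rollover between the two reads is handled. */
static uint16_t elapsed_ticks_full_range(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Hypothetical helper: elapsed ticks for an up-counter that counts
 * 0..period and then restarts at 0 (assumed counting mode). */
static uint16_t elapsed_ticks_with_period(uint16_t start, uint16_t end,
                                          uint16_t period)
{
    if (end >= start) {
        return (uint16_t)(end - start);
    }
    /* One wrap: ticks up to the period, one tick back to 0, then up to end. */
    return (uint16_t)(((uint32_t)period - start) + 1u + end);
}
```

If the counter period is shorter than the task, neither helper can recover the true duration; a slower time-base clock or a longer period would be needed.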
Method 2: Utilize GPIO pins
If the device allows the CLA to access GPIO directly, set a pin high at the start of the task and low at the end, then measure the duration of the high level with an oscilloscope. This method is intuitive and unaffected by counter overflow, but the added GPIO instructions take time themselves, so their effect on measurement accuracy should be evaluated.
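Turning the oscilloscope reading into a cycle count is simple arithmetic, sketched below. The function names, the use of a clock frequency in MHz, and the idea of a separately measured GPIO overhead value are all assumptions made for illustration.

```c
#include <stdint.h>

/* Convert a pulse width read off the oscilloscope (in nanoseconds) to
 * clock cycles, rounded to the nearest cycle. clk_mhz is the clock
 * frequency of the core executing the task, in MHz (assumed known). */
static uint32_t pulse_ns_to_cycles(uint32_t pulse_ns, uint32_t clk_mhz)
{
    return (uint32_t)(((uint64_t)pulse_ns * clk_mhz + 500u) / 1000u);
}

/* Subtract the cycles spent in the GPIO set/clear instructions.
 * gpio_overhead_cycles is an assumed, separately measured value. */
static uint32_t task_cycles_from_pulse(uint32_t pulse_ns, uint32_t clk_mhz,
                                       uint32_t gpio_overhead_cycles)
{
    uint32_t total = pulse_ns_to_cycles(pulse_ns, clk_mhz);
    return (total > gpio_overhead_cycles) ? (total - gpio_overhead_cycles) : 0u;
}
```

One way to estimate the overhead value is to toggle the pin high and low back to back, with no code in between, and convert that pulse width the same way.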
Account for trigger and completion notification overhead
CLA task start and end notifications add a fixed overhead. To measure the pure algorithm execution time, use a free timer on the C28x to record the whole interval from the software trigger until the MIRUN flag clears, then subtract the overhead outside the task body (which can be measured with an empty task).
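The trigger-to-completion measurement and the empty-task subtraction can be sketched as follows. This is a hardware-independent illustration: the hook structure, all function names, and the simulated hooks are hypothetical, and the timer is assumed to count down. On a device, the hooks would wrap the timer read, the software trigger, and the MIRUN flag check, and a real implementation would also need to make sure the task has actually started before polling the flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hooks. On a device they would wrap the timer
 * read, the software trigger, and the MIRUN flag check. */
typedef struct {
    uint32_t (*read_timer)(void);   /* assumed to be a down-counter */
    void     (*trigger_task)(void);
    bool     (*task_running)(void);
} cla_timing_hooks_t;

/* Timer ticks from the software trigger until the running flag clears. */
static uint32_t time_cla_task(const cla_timing_hooks_t *hw)
{
    uint32_t start = hw->read_timer();
    hw->trigger_task();
    while (hw->task_running()) {
        /* busy-wait until the task completes */
    }
    uint32_t end = hw->read_timer();
    return start - end;             /* down-counter: start >= end */
}

/* Remove the fixed overhead measured with an empty task. */
static uint32_t net_algorithm_cycles(uint32_t full_task, uint32_t empty_task)
{
    return (full_task > empty_task) ? (full_task - empty_task) : 0u;
}

/* Host-side stand-ins so the sketch can run off-target. */
static uint32_t sim_timer = 1000u;
static uint32_t sim_task_length = 0u;
static uint32_t sim_cycles_left = 0u;

static uint32_t sim_read_timer(void) { return sim_timer; }
static void sim_trigger_task(void) { sim_cycles_left = sim_task_length; }
static bool sim_task_running(void)
{
    if (sim_cycles_left == 0u) {
        return false;
    }
    sim_cycles_left--;              /* one simulated cycle per poll */
    sim_timer--;
    return true;
}

static const cla_timing_hooks_t sim_hooks = {
    sim_read_timer, sim_trigger_task, sim_task_running
};
```

Running `time_cla_task` once with an empty task and once with the real task, then passing both results to `net_algorithm_cycles`, gives an estimate of the algorithm time alone.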
How the C28x can terminate a CLA task
In certain abnormal situations, the main CPU needs to forcibly terminate the running CLA task.
If a task has been triggered but has not yet started execution, the main CPU can cancel the task by clearing the corresponding flag bit through the MICLR register.
If the task is already running, the main CPU can request a soft reset through the MCTL register, which terminates the current task and clears the MIER register. To reset all CLA registers completely, use the hard reset option instead.
Note: Forced termination may leave data in an inconsistent state, so the CLA state should be reinitialized after the reset.
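The choice between the two termination paths can be sketched as a small decision function. This is illustrative only: the enum and function names are hypothetical, and the sketch assumes eight tasks with task N mapped to bit N-1 of the pending and running flag values, which the caller reads from the device.

```c
#include <stdint.h>

typedef enum {
    CLA_ABORT_NOTHING,        /* task is neither pending nor running */
    CLA_ABORT_CLEAR_PENDING,  /* clear the task's flag through MICLR */
    CLA_ABORT_SOFT_RESET      /* request a soft reset through MCTL */
} cla_abort_action_t;

/* Decide how to stop task 1..8, given the pending and running flags
 * (assumed layout: task N uses bit N-1). */
static cla_abort_action_t cla_choose_abort(uint16_t pending_flags,
                                           uint16_t running_flags,
                                           unsigned task)
{
    uint16_t mask;

    if (task < 1u || task > 8u) {
        return CLA_ABORT_NOTHING;
    }
    mask = (uint16_t)(1u << (task - 1u));

    if (running_flags & mask) {
        return CLA_ABORT_SOFT_RESET;     /* already executing */
    }
    if (pending_flags & mask) {
        return CLA_ABORT_CLEAR_PENDING;  /* triggered, not yet started */
    }
    return CLA_ABORT_NOTHING;
}
```

Because a soft reset also clears MIER, the caller would have to re-enable the required tasks and reinitialize shared data after taking that path.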
Interpreting a common linker warning
With the C2000 code generation tools v20.2.x LTS and later, the linker may issue the following warning:
"Symbol, X, referenced in a.obj, assumes that data is blocked but is accessing non-blocked data in b.obj. Runtime failures may result"
This warning checks the consistency of data access: when one object file assumes that a symbol is accessed as "blocked" data while another object file defines it as "non-blocked" data, the linker warns that a runtime failure may result. A typical scenario is a global variable defined in CLA code and used by C28x code. The solution is to make the access attributes of all shared data consistent; details can be found in the compiler release notes.