http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/t/60530
a problem with NEON under GCC
300
tomrgb
Hi,
I try to use the NEON in a
OMAP3530 on a DevKit8000 board. I run a program without an OS – directly under the U-boot
bootloader. My toolchain worked fine so far. Now, when I try to use the NEON my program hangs on NEON’s intructions. Debugging with a XDS100v2 debugger showed that a program runs to a NEONs instruction (VLD1.32 {D16, D17}, [R7]) to be precise) and jumps to
address 0x00014004, which is Undefined Instruction exception (CPSR.M == 0x1B after jump). The NEON is turned on for sure.
The weird is that the program operation depends on location and size of an array with which I want to work. I didn’t find clear rule, but I manage to run almost successfully program with small array in a stack area. It hanged on a second NEON command, not
first. In one case program couldn’t be run at all. The U-boot hanged after GO command without printing “„## Starting application at 0x80000000”. It looks like problem with a stack, but without NEON’s instructions program works fine. I try to use NEON Intrinsics
and auto-vectorize, but result is the same.
I init the neon with a function:
void NeonInit(void)
{
RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 16);//L2 Cache memory is ON when domain is ON
RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 8);//L2 Cache memory is retained when domain is in RETENTION state
RegisterSet(&CM_CLKSTCTRL_NEON, 0, 2, 0);//Automatic transition of clock state is disabled
RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 2);//Logic & L1 Cache are retained when domain is in RETENTION state
RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 0);//Power state control: ON
RegisterSet(&PM_WKDEP_NEON, 1, 1, 1);//NEON domain is woken-up upon MPU domain wake-up.
RegisterSet(&PM_PWSTCTRL_NEON, 0x3, 2, 0);//Power state control: ON
if (((CM_IDLEST_NEON>>0) & 0x1)==1) print("
NEON standby mode"); else print("
NEON is active");
if (((PM_PWSTST_NEON>>0) & 0x3)==3) print("
NEON Power is ON"); else print("
NEON Power is OFF");
}
A function with neon instructions looks like this:
void NeonArrayInit(int * xOutputArray, int * xInitValArray, int xLength)
{
int i;
printc('a');
uint32x4_t z4 = vld1q_u32((uint32_t *) xInitValArray);
printc('b');
uint32_t *ptrz = (uint32_t *) xOutputArray;
printc('c');
for(i=0;i<(xLength/4);i++)
{
vst1q_u32(ptrz, z4);
ptrz+=4;
}
printc('d');
}
I compile code with Sourcery G++ Lite:
...
arm-none-eabi-gcc -c MPU_neon.c -o MPU_neon.o -Wall -march=armv7-a -mtune=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
...
arm-none-eabi-gcc main.o startup.o [other files] MPU_neon.o -T script.ls -o main.out -Xlinker -Map=main.map.txt -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
arm-none-eabi-objcopy -O binary main.out main.bin
Script file looks like this:
SECTIONS
{
. = 0x80000000;
.main . : { main.o }
.text : { *(.text) }
.data : { *(.data) }
.bss :
{
PROVIDE (__bss_start = .);
*(.bss)
PROVIDE (__bss_end = .);
}
.bss_f : { bss_file.o (.bss)}
.stack ALIGN(256) :
{
. += 0x1000;
PROVIDE (_stack = .);
PROVIDE (_stack_top = .);
}
}
This settings work fine when I don't use NEON's intructions. I tried to change compiler for gcc-none-linux-gnueabi, because I saw that U-boot is compiled with it. But it showed many errors so I give up.
The Program compiles without any errors or warnings. I don’t see anything weird in the MAP file.
What can cause this problem?
OMAP3530 devkit8000 NEON ARM NONE EABI stack
300
tomrgb
-
7645
Steve Kipisz
I think you also need to enable CP10 and CP11 to enable NEON. I took a look in the Linux kernel and it enables CP10 and CP11. Looking at ARM documentation you would use CP15 to read the Coprocessor Access Control Register, set the bits, then write it back
out.
MRC p15, 0, , c1, c0, 2 ; Read Coprocessor Access Control Register
MCR p15, 0, , c1, c0, 2 ; Write Coprocessor Access Control Register
You might want to use a an OS like Linux that has full NEON support.
Steve K.
-
5830
Jeff L
Tomrgb,
Are you enabling NEON/VFP?
Below is a code snippet I've used to enable NEON for testing without an OS.
;*------------------------------------------------------
; SET THE EN BIT, FPEXC[30] TO ENABLE NEON AND VFP
;*------------------------------------------------------
MOV r0,#0x40000000
FMXR FPEXC,r0
Please let me know if this solves your issue?
Regards,
If this post resolves your issue, please click on the verified answer.
Regards,
Jeff L
-
300
tomrgb
In reply to Jeff L:
Hi,
You're both right! But there is nothing about these registers in OMAP's user manual. I didn't know that ARM needs a separate configuration. I just found a useful example on the web:
http://code.google.com/p/puppybits/source/browse/lib/neon.c?r=d4a4059e39cebfb8ec230254b558b02dcac77816
The comment in neon_init function says it all :)
Now my NEON initialization function looks like this:
void NeonInit(void)
{
unsigned int v;
// *** this took a long time to discover ...
// First, need to enable access to co-processors c10 and c11 - vfp and neon
//Coprocessor Access Control Register
asm volatile("mrc p15, 0, %[res], c1, c0, 2" :[res] "=r" (v));//v = mrc("c1, c0, 2");
v |= 0xf<<20;
asm volatile("mcr p15, 0, %[val], c1, c0, 2" ::[val] "r" (v));//mcr("c1, c0, 2", v);
asm volatile("isb");// required apparently
//Enable NEON instructions in FPEXC ("c8, c0, 0") register.
asm volatile("mcr p10, 7, %[val], c8, c0, 0" ::[val] "r" (1<<30));
RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 16);//L2 Cache memory is ON when domain is ON
RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 8);//L2 Cache memory is retained when domain is in RETENTION state
RegisterSet(&CM_CLKSTCTRL_NEON, 0, 2, 0);//Automatic transition of clock state is disabled
RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 2);//Logic and L1 Cache are retained when domain is in RETENTION state
RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 0);//Power state control: ON
RegisterSet(&PM_WKDEP_NEON, 1, 1, 1);//NEON domain is woken-up upon MPU domain wake-up.
RegisterSet(&PM_PWSTCTRL_NEON, 0x3, 2, 0);//Power state control: ON
}
And NEON works fine. With auto-vectorize array copping is 2 times faster. With Neon intrinsics copping is 4 times faster.
But still there is a problem with program hanging at the start when I change position of the arrays. But it's probably other issue so case is close for now.
Thanks for help
Best regards
Tom