DSP

a problem with NEON under GCC

2019-07-13 15:31发布

http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/t/60530

a problem with NEON under GCC

300 tomrgb tomrgb Hi, I try to use the NEON in a OMAP3530 on a DevKit8000 board. I run a program without an OS – directly under the U-boot bootloader. My toolchain worked fine so far. Now, when I try to use the NEON my program hangs on NEON’s intructions. Debugging with a XDS100v2 debugger showed that a program runs to a NEONs instruction (VLD1.32 {D16, D17}, [R7]) to be precise) and jumps to address 0x00014004, which is Undefined Instruction exception (CPSR.M == 0x1B after jump). The NEON is turned on for sure.   The weird is that the program operation depends on location and size of an array with which I want to work. I didn’t find clear rule, but I manage to run almost successfully program with small array in a stack area. It hanged on a second NEON command, not first. In one case program couldn’t be run at all. The U-boot hanged after GO command without printing “„## Starting application at 0x80000000”. It looks like problem with a stack, but without NEON’s instructions program works fine. I try to use NEON Intrinsics and auto-vectorize, but result is the same.   I init the neon with a function:   void NeonInit(void) {        RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 16);//L2 Cache memory is ON when domain is ON        RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 8);//L2 Cache memory is retained when domain is in RETENTION state               RegisterSet(&CM_CLKSTCTRL_NEON, 0, 2, 0);//Automatic transition of clock state is disabled        RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 2);//Logic & L1 Cache are retained when domain is in RETENTION state        RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 0);//Power state control: ON        RegisterSet(&PM_WKDEP_NEON, 1, 1, 1);//NEON domain is woken-up upon MPU domain wake-up.        RegisterSet(&PM_PWSTCTRL_NEON, 0x3, 2, 0);//Power state control: ON          if (((CM_IDLEST_NEON>>0) & 0x1)==1) print(" NEON standby mode"); else print(" NEON is active");        if (((PM_PWSTST_NEON>>0) & 0x3)==3) print(" NEON Power is ON"); else print(" NEON Power is OFF"); }
A function with neon instructions looks like this:   void NeonArrayInit(int * xOutputArray, int * xInitValArray, int xLength) {        int i;        printc('a');        uint32x4_t z4 = vld1q_u32((uint32_t *) xInitValArray);        printc('b');        uint32_t *ptrz = (uint32_t *) xOutputArray;                  printc('c');          for(i=0;i<(xLength/4);i++)        {              vst1q_u32(ptrz, z4);              ptrz+=4;        }        printc('d'); }     I compile code with Sourcery G++ Lite:   ... arm-none-eabi-gcc -c MPU_neon.c  -o MPU_neon.o -Wall -march=armv7-a -mtune=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp ... arm-none-eabi-gcc main.o startup.o [other files] MPU_neon.o -T script.ls -o main.out -Xlinker -Map=main.map.txt -mfpu=neon -ftree-vectorize -mfloat-abi=softfp   arm-none-eabi-objcopy -O binary main.out main.bin     Script file looks like this:   SECTIONS { . = 0x80000000; .main . : { main.o } .text : { *(.text) } .data : { *(.data) }     .bss :   {     PROVIDE (__bss_start = .);     *(.bss)     PROVIDE (__bss_end = .);   }          .bss_f  : { bss_file.o (.bss)}     .stack ALIGN(256) :   {        . += 0x1000;     PROVIDE (_stack = .);        PROVIDE (_stack_top = .);   } }   This settings work fine when I don't use NEON's intructions. I tried to change compiler for gcc-none-linux-gnueabi, because I saw that U-boot is compiled with it. But it showed many errors so I give up.
The Program compiles without any errors or warnings. I don’t see anything weird in the MAP file. What can cause this problem?  
 OMAP3530 devkit8000 NEON ARM NONE EABI stack 300 tomrgb tomrgb
  • 7645 Steve Kipisz Steve Kipisz I think you also need to enable CP10 and CP11 to enable NEON. I took a look in the Linux kernel and it enables CP10 and CP11. Looking at ARM documentation you would use CP15 to read the Coprocessor Access Control Register, set the bits, then write it back out. MRC p15, 0, , c1, c0, 2 ; Read Coprocessor Access Control Register
    MCR p15, 0, , c1, c0, 2 ; Write Coprocessor Access Control Register You might want to use a an OS like Linux that has full NEON support. Steve K.
  • 5830 Jeff L Jeff L Tomrgb, Are you enabling NEON/VFP? Below is a code snippet I've used to enable NEON for testing without an OS.         ;*------------------------------------------------------      ; SET THE EN BIT, FPEXC[30] TO ENABLE NEON AND VFP         ;*------------------------------------------------------        MOV      r0,#0x40000000         FMXR     FPEXC,r0 Please let me know if this solves your issue? Regards, If this post resolves your issue, please click on the verified answer. Regards, Jeff L
  • 300 tomrgb tomrgb In reply to Jeff L: Hi,


    You're both right! But there is nothing about these registers in OMAP's user manual. I didn't know that ARM needs a separate configuration. I just found a useful example on the web: 

    http://code.google.com/p/puppybits/source/browse/lib/neon.c?r=d4a4059e39cebfb8ec230254b558b02dcac77816

    The comment in neon_init function says it all :)

    Now my NEON initialization function looks like this:

    void NeonInit(void)
    {
        unsigned int v;

        // *** this took a long time to discover ...

        // First, need to enable access to co-processors c10 and c11 - vfp and neon
        //Coprocessor Access Control Register
        asm volatile("mrc p15, 0, %[res], c1, c0, 2" :[res] "=r" (v));//v = mrc("c1, c0, 2");
        v |= 0xf<<20;
        asm volatile("mcr p15, 0, %[val], c1, c0, 2" ::[val] "r" (v));//mcr("c1, c0, 2", v);

        asm volatile("isb");// required apparently

        //Enable NEON instructions in FPEXC ("c8, c0, 0") register.
        asm volatile("mcr p10, 7, %[val], c8, c0, 0" ::[val] "r" (1<<30));
           
        RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 16);//L2 Cache memory is ON when domain is ON
        RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 8);//L2 Cache memory is retained when domain is in RETENTION state
       
        RegisterSet(&CM_CLKSTCTRL_NEON, 0, 2, 0);//Automatic transition of clock state is disabled
       
        RegisterSet(&PM_PWSTCTRL_MPU, 1, 1, 2);//Logic and L1 Cache are retained when domain is in RETENTION state
        RegisterSet(&PM_PWSTCTRL_MPU, 0x3, 2, 0);//Power state control: ON
       
        RegisterSet(&PM_WKDEP_NEON, 1, 1, 1);//NEON domain is woken-up upon MPU domain wake-up.
        RegisterSet(&PM_PWSTCTRL_NEON, 0x3, 2, 0);//Power state control: ON
    }

    And NEON works fine. With auto-vectorize array copping is 2 times faster. With Neon intrinsics copping is 4 times faster.
    But still there is a problem with program hanging at the start when I change position of the arrays. But it's probably other issue so case is close for now. Thanks for help Best regards Tom