Author Topic: BBB, PRUs, GPIO latency and determinism (Octoscroller)  (Read 595 times)

Offline ags

  • Newbie
  • *
  • Join Date: Mar 2014
  • Location:
  • Posts: 13
  • Kudos: 0
BBB, PRUs, GPIO latency and determinism (Octoscroller)
« on: February 06, 2017, 07:12:50 PM »
I am exploring creating a variation of a BBB-based standalone player to drive a megatree. This is a good excuse to learn and build something based on the PRUs, which is an interesting concept. I've been struggling to figure out how I'll drive 128 channels with the PRUs, particularly due to the limited pins brought out through the pinmuxes.


I've been studying the existing PRU assembly code for the Octoscroller and direct WS2811 drivers, and I finally realized that these are using standard GPIO (not the dedicated low-latency pins of the PRU itself). If this is correct, then I'm really surprised. While a memory write is non-blocking (according to documentation, "~1 cycle" execution), I presume it is queued and actually hits the register - and therefore the pin - at some later time depending on system load. In other words, while the PRU may run without stalling (as it might when reading) the actual pin state changes seem to be subject to jitter. Has this already been researched and resolved (found to not be an issue)? Am I missing something? Has anyone seen problems if the BBB is running multiple tasks (or is it a requirement that this be avoided)? I was hoping that offloading the bit-banging to the PRUs would allow me to both read/feed the pixel frame data from a userland app as well as playback audio.

Online David Pitts

  • Administrator
  • *****
  • Join Date: Mar 2013
  • Location: Falcon, CO
  • Posts: 3,712
  • Kudos: 61
Re: BBB, PRUs, GPIO latency and determinism (Octoscroller)
« Reply #1 on: February 06, 2017, 07:24:00 PM »
Many are using BBB to drive 48 strings. I would suggest running FPP setup to run WS2811 pixels and give it a try to see if it meets your requirements.
PixelController, LLC
PixelController.com

Offline ags

Re: BBB, PRUs, GPIO latency and determinism (Octoscroller)
« Reply #2 on: February 06, 2017, 09:00:30 PM »
Re-reading the OP I realize I was not clear in my intent. The BBB/Octoscroller and WS2811 drivers work. They work well, with many people successfully using them in their displays. If it seemed that I was criticizing the design that's not intended. It's also refreshing to see an online response that was tempered and calm in such a situation. Kudos to you.


So I've read pretty much everything I can find on the PRUs. Either I have misunderstood (that driving standard GPIOs uses the L3/L4 interconnect, with unpredictable delay), or the "jitter" is within the WS2811 tolerance, or other protections were crafted elsewhere to mitigate this (on the Linux side perhaps). I guess I should ask whether anyone has put a scope on the outputs to see if there is any jitter - and if so, how much? If others have run 48 strings successfully then that's sufficient proof for me. I'm just trying to understand how it works. Thanks.

Online CaptainMurdoch

  • Administrator
  • *****
  • Join Date: Sep 2013
  • Location: Washington
  • Posts: 7,941
  • Kudos: 142
Re: BBB, PRUs, GPIO latency and determinism (Octoscroller)
« Reply #3 on: February 06, 2017, 09:46:28 PM »
It sounds like you are mixing up the PRUs and the main system CPU.  The Octoscroller and ws281x code runs on the PRUs, which are two separate processors. They are not affected by anything running on the main CPU under Linux unless there is manual locking or syncing going on, as with the ws281x code, which waits for a flag set by the library under Linux.  The Octoscroller code just runs as fast as possible and doesn't run in sync with the library code in Linux.

We didn't write the original LEDscape code though, so you may want to read more about that. We just have our own fork on GitHub and updated versions of some of the PRU files.  David also added the serial PRU files for DMX and pixelnet.
-
Chris

Offline ags

Re: BBB, PRUs, GPIO latency and determinism (Octoscroller)
« Reply #4 on: February 06, 2017, 10:07:17 PM »
Yes, I did look at the LEDscape code as well as all the FPP PRU code.


I also understand your point about the separation between the main CPU (A8) and the PRUs. When coded to only use resources within the PRU-ICSS, the PRUs are deterministic. That's actually why I was trying to use the PRU direct outputs - they are considered "local resources" to the PRU-ICSS. What I found was that the LEDscape, Octoscroller, WS2811 and matrix code all use standard GPIO, which is not local to the PRU-ICSS. Driving a standard GPIO (output) requires writing to a (global) memory-mapped address. This means the write passes through the OCP-HostMaster onto the L3/L4 global system interconnect fabric. Being a global resource, it is owned by Linux and subject to other requests. As an example, reading from data RAM (DRAM) local to the PRU-ICSS requires 1+"# of 32-bit words read" cycles. Reading one 32-bit word from system memory requires 60+ cycles - and may be further delayed based on other demands. That's why I was surprised.
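
To make the "global memory-mapped address" point concrete, here is a minimal sketch that only computes the address and bit mask a PRU would have to write in order to change a standard GPIO pin. The bank base addresses and the SETDATAOUT/CLEARDATAOUT offsets are the AM335x values as I understand them from the TRM (worth verifying against your TRM revision); the function name gpio_write is made up for illustration, and nothing here touches hardware. By contrast, a PRU direct output is just a bit in the PRU's own R30 register, so no address or interconnect is involved at all.

```python
# Sketch only: computes the global address and bit mask for a standard
# GPIO write on the AM335x. It does not access any hardware.

# GPIO bank base addresses (assumed from the AM335x memory map).
GPIO_BASE = {
    0: 0x44E07000,
    1: 0x4804C000,
    2: 0x481AC000,
    3: 0x481AE000,
}

# Register offsets within each bank (assumed from the AM335x TRM).
GPIO_CLEARDATAOUT = 0x190  # writing a 1 bit drives that pin low
GPIO_SETDATAOUT = 0x194    # writing a 1 bit drives that pin high


def gpio_write(bank, pin, level):
    """Return (address, value) for the write that sets one pin's level.

    Hypothetical helper: on a real PRU this pair would be the target and
    data of a store instruction that goes out over the L3/L4 interconnect.
    """
    if bank not in GPIO_BASE:
        raise ValueError("bank must be 0..3")
    if not 0 <= pin < 32:
        raise ValueError("pin must be 0..31")
    offset = GPIO_SETDATAOUT if level else GPIO_CLEARDATAOUT
    return GPIO_BASE[bank] + offset, 1 << pin


if __name__ == "__main__":
    addr, mask = gpio_write(1, 21, True)
    print(hex(addr), hex(mask))
```

Each call describes one 32-bit store into a peripheral outside the PRU-ICSS, which is exactly the path whose latency is in question here.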


However, all the current code works. I think what matters isn't so much the delay getting to an output pin as the consistency. Even if writes are delayed 50 PRU clocks, that's only 250 ns (5 ns per clock at 200 MHz). If the delay is relatively constant, it seems reasonable that any skew from audio/video output to other devices would not be perceptible. However, if the delay varies from 50 to 25 to 75 or 100 PRU clocks, that could result in invalid timing for the devices being driven (even the relatively slow WS2811s). But it *does* work, and that's why I'm trying to figure out where my understanding and/or assumptions are incorrect.
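
The arithmetic behind that concern can be sketched as follows. The PRU core clock on the AM335x is 200 MHz, so one cycle is 5 ns. The ±150 ns tolerance is an assumed figure of the kind quoted in WS2811 datasheets (check the datasheet for your part and speed mode), and the helper names are made up for illustration. The idea is that a constant delay shifts every edge equally and leaves pulse widths intact, while a varying delay can change a pulse width by up to the spread between the shortest and longest delay.

```python
# Sketch only: converts PRU cycle counts to time and compares the
# worst-case spread in write delay with an assumed WS2811 tolerance.

PRU_CLOCK_HZ = 200_000_000                        # AM335x PRU core clock
NS_PER_CYCLE = 1_000_000_000 // PRU_CLOCK_HZ      # 5 ns per cycle

# Assumption: pulse widths may be off by this much and still decode.
WS2811_TOLERANCE_NS = 150


def cycles_to_ns(cycles):
    """Convert a PRU cycle count to nanoseconds."""
    return cycles * NS_PER_CYCLE


def jitter_ok(min_delay_cycles, max_delay_cycles,
              tolerance_ns=WS2811_TOLERANCE_NS):
    """True if the delay spread stays within the assumed tolerance.

    A pulse is bounded by two edges; if one edge sees the shortest delay
    and the other the longest, the width is off by (max - min).
    """
    spread_ns = cycles_to_ns(max_delay_cycles - min_delay_cycles)
    return spread_ns <= tolerance_ns


if __name__ == "__main__":
    print(cycles_to_ns(50))    # constant 50-cycle delay, in ns
    print(jitter_ok(50, 50))   # constant delay: no pulse-width error
    print(jitter_ok(25, 100))  # delay varying between 25 and 100 cycles
```

Under these assumptions a fixed 50-cycle delay is harmless, while a delay that swings between 25 and 100 cycles (a 375 ns spread) would exceed the tolerance.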

Online David Pitts

Re: BBB, PRUs, GPIO latency and determinism (Octoscroller)
« Reply #5 on: February 06, 2017, 10:51:31 PM »
Seems like you have done your research. We just used the LEDscape library and it worked for 48 strings. Chris and I never really got into reading all the supporting documentation that exists and never really questioned why it was working. It would be interesting to know if you find anything out. We do know that when we added high-speed serial port data in the other PRU, timing loops were delayed in the pixel PRU. Maybe that was the reason for the delay.

The best way to proceed would be to connect one up and use an O-scope and other tools to see if it will work for you.

Online CaptainMurdoch

Re: BBB, PRUs, GPIO latency and determinism (Octoscroller)
« Reply #6 on: February 06, 2017, 11:31:54 PM »
Ditto, you just exceeded my knowledge of the BBB PRU interface.  Never really had to learn as David said.  My quick fix solution to the pixel issue when we added the serial code was to make a new neutered PRU file which drove fewer GPIO pins.  It still reads the same amount of data from memory though since the code always reads 48 outputs worth of pixels.

Offline ags

Re: BBB, PRUs, GPIO latency and determinism (Octoscroller)
« Reply #7 on: February 07, 2017, 10:01:28 AM »
1) IMO you have done an impressive job with the BBB and entire Falcon line of s/w & h/w. Far more flexible and expansive in scope than my efforts; I tend towards pushing the limits for specific use cases. I'd say you (with the development community you've created/fostered) have had a transformative effect on the hobby.
2) I have a specific interest in learning about the BBB, which I see can have a multitude of applications marrying the power, scope and flexibility of Linux on cheap, fast processors with the hard real-time ability provided by the PRUs. So I'm investing in developing expertise here.
3) I'm connecting with the BBB community, including the kernel maintainer and h/w designer to see if I can confirm my understanding. I will share what I learn that could be useful.
4) On the one hand, the symptoms you report seem aligned with my understanding. If you only used resources contained within the PRU-ICSS, I think there would be no impact between the different PRUs, and each PRU would be completely deterministic in time. On the other hand, if you are reading the same amount of data from memory and yet the timing delay was resolved, that implies that reducing the number of writes to the GPIO ports (fewer output pins used) has had an effect. I didn't expect that if the writes still take 1 PRU cycle and are queued by the interface fabric (bus controllers). It's just a guess, but I wonder if what is happening is that each transition for each pin is a separate request (rather than a burst transfer), and the resulting congestion ultimately slowed down the ability to push data into the shared memory from userspace on the app side. That's all I've got at the moment...
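
A back-of-envelope way to look at the guess in 4) is to count how many separate interconnect writes the PRU issues per WS2811 bit. Everything below is an assumption for illustration, not a measurement and not taken from the FPP or LEDscape code: a 1.25 µs bit slot (800 kHz signalling), and three writes per GPIO bank per bit (raise all pins, drop the pins sending a "0", then drop the rest). The helper names are made up.

```python
# Sketch only: a crude model of GPIO write traffic per WS2811 bit.
# All constants are illustrative assumptions, not measured values.

BIT_PERIOD_NS = 1250          # assumed 800 kHz bit slot
WRITES_PER_BANK_PER_BIT = 3   # assumed: set all, clear "0" bits, clear rest


def writes_per_bit(banks_in_use):
    """Number of separate GPIO register writes issued per bit slot."""
    return banks_in_use * WRITES_PER_BANK_PER_BIT


def ns_between_writes(banks_in_use):
    """Average spacing between writes if spread evenly over the bit slot."""
    return BIT_PERIOD_NS / writes_per_bit(banks_in_use)


if __name__ == "__main__":
    for banks in (4, 2, 1):
        print(banks, writes_per_bit(banks),
              round(ns_between_writes(banks), 1))
```

In this model, driving pins on all four GPIO banks means a write roughly every 100 ns, while using fewer banks leaves the interconnect far more idle time between writes. That would be consistent with fewer output pins easing the timing problem, but only a scope or a cycle counter on real hardware can confirm it.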

 
