Timing irregularities when using WS2811 pixels on BBB GPIO

Started by fauxton, December 01, 2021, 01:47:56 PM


fauxton

Hello, we've been using FPP and Falcon pixel controllers with great success for quite a few years now.  We've run into a scenario where it would be beneficial to have a pixel controller in our own form factor that better suits our application.  We have been testing the BeagleBone Black (BBB) as the pixel controller on our FPP system.  Of course, the FPP control and communication work flawlessly, but I'm running into problems with, presumably, how the pixel data is sent out the GPIO on the BBB.

I'm quite familiar with the WS2811 data timing as I have created several other large projects running on 16 bit micros, creating my own graphics library and sending the data out for roughly 1000 pixels at about 140 FPS using only code that I wrote with no external libraries.  I have a significant background in this type of work.

This is not in any way a complaint; I am just looking for confirmation that what I am seeing is normal, or maybe an explanation of how the hardware on the BBB is used to control the timing of the pixel data.

Side note: The pixel data timings are rock solid on the Falcon pixel controllers that use the Xilinx Spartan part.

What I am seeing is tiny delays in the pixel data in the middle of (or between?) data bits.  When pixels are connected physically near the output drivers, they tend to work fine.  As soon as I add a few feet of wire, the timing glitches, combined with the many adverse effects of sending 600ns pulses down several feet of wire, produce a signal that the pixels can't reliably read.  I have a lot of experience with getting WS2811 data down long runs of wire and have designed several drivers that work great over as much as 50 feet without using differential signals or any termination.  Also, the Falcon pixel controller drivers handle wire length with no issue.

I have no questions about signal integrity on long wires, just focusing on how the software works on the BBB to see if there is a way around this.



To make it easy to see the problem I am describing, I set up a BBB using the FPP display testing mode "test mode fill color".  I can assure you this happens the same way during normal sequence playback or anything else I have tried.

Here we see an excerpt of data (all protocol ones, RGB full) at 10us/div

You can see a delay 8 bits in, and then another 24 bits later.  The delays seem to happen randomly, but always at multiples of 8 bits (8, 16, 24), which makes sense because of the RGB data.

Same sample but at 1us/div

It seems like the delay always happens in the off period of each bit.

Here is a sample of all data zeros (RGB all off) at 10us/div


And that sample zoomed in to 1us/div


It would be super if someone could confirm if this is a known issue caused by some limitation.  It would be even cooler if there was a way to fix it!

As is, we are unable to use the BBB for pixel outputs on our FPP systems since this timing issue causes pixel glitching.


Thank you for reading through my long post.  I look forward to being a part of this online community to give and receive help!
I am building pixel interface capes for BBB - http://www.ledpixeldriver.com

dkulp

I'm a little confused/concerned about why you are seeing pixel glitches when there are hundreds of folks here using BBB based controllers with ws281x pixels who aren't seeing glitches.

Anyway, to answer the question, every 8 bits, the BBB PRU goes out to memory to load the next 8 bits of data into the PRU registers.  There is also some other bookkeeping it may need to do during this period (particularly if there are smart remotes configured).     The memory access may have delays, especially if it needs to go off to main memory.  The first 19.5K of data is put in the PRU's local memory, which is faster, but once the data goes beyond that, the delays you see would likely be even longer.

The other thing that causes delays (though I'm slightly surprised it's only on the 8-bit boundaries) is the GPIO writes.   We use GPIO register access to set the pins.  That involves a 4-byte write to a memory location at each transition, which means going off the PRU again to the main ARM address space to write to the GPIO register.   Depending on which pins are used, it could have to do this for all four GPIO registers.   Each write to that address space involves messages on the peripheral buses on the am335x chip.   If something else in the system is accessing something on those buses, there could be a slight delay with the write.  That said, we do set the PRU as having highest priority, so it's rare.
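To illustrate why the pin choice determines how many GPIO register writes each transition needs, here is a minimal Python sketch. It is not FPP's PRU code: the `bank_masks` helper and the example pin lists are hypothetical, and the only assumption about the hardware is that each of the four GPIO banks has 32 lines and is written with one 32-bit mask.

```python
from collections import defaultdict


def bank_masks(pins):
    """Group (bank, line) pairs into one 32-bit mask per GPIO bank.

    Each key in the result stands for one register that would need a
    separate 4-byte write at every signal transition.
    """
    masks = defaultdict(int)
    for bank, line in pins:
        if not 0 <= bank <= 3:
            raise ValueError(f"bank out of range: {bank}")
        if not 0 <= line <= 31:
            raise ValueError(f"line out of range: {line}")
        masks[bank] |= 1 << line
    return dict(masks)


if __name__ == "__main__":
    # Hypothetical layouts: all strings on one bank vs. spread over four.
    single_bank = [(2, 3), (2, 24)]
    four_banks = [(0, 26), (1, 12), (2, 3), (3, 19)]
    for label, pins in (("single bank", single_bank), ("four banks", four_banks)):
        masks = bank_masks(pins)
        print(label, "->", len(masks), "write(s) per transition")
        for bank, mask in sorted(masks.items()):
            print(f"  GPIO{bank}: 0x{mask:08X}")
```

The fewer banks the strings are spread across, the fewer trips over the peripheral buses each transition costs.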

If we limited the outputs to only 20 pins total (even fewer on the PocketBeagle) instead of 48, we could change the pinmux to R30 outputs from the PRUs.  That would then not require the GPIO writes.  The code would need to be updated to handle that, but I don't think that's major.   A couple of macros would need adjusting, some updates to BBB48String to detect when it could do it, etc...     It would be 8 outputs on one PRU and 12 on the other.   (It might be 14 on the second, not 100% sure on that.)  Basically, the reason we use the GPIO registers instead of R30 is to get the full 48 strings.

Finally, do you know which pins you used for timing?    If using a GPIO0 based pin, the delays there are bigger than those on GPIO1-3.   GPIO0 hangs off the slower L4 bus instead of the L3 bus and thus has additional points of contention.    

One additional note:  on the BBB AI, I believe ALL of the pins are accessible on one of the 4 PRUs.   Each of the PRUs also has more memory, and with the pins split up, they could likely each have all their data locally and not need to go to main memory.   Thus, if someone spent time porting FPP to the AI, there are definitely other options.  That said, the AI is expensive, runs very hot, and uses a lot more power.  That's why the port to the AI was never really started.

Anyway, as I said, there are hundreds of folks not seeing glitches/issues so I'm not really sure why you are.
Daniel Kulp - https://kulplights.com

pixelpuppy

Quote from: fauxton on December 01, 2021, 01:47:56 PM
I'm quite familiar with the WS2811 data timing as I have created several other large projects running on 16 bit micros, creating my own graphics library and sending the data out for roughly 1000 pixels at about 140 FPS using only code that I wrote with no external libraries.  I have a significant background in this type of work.

I'm very curious how that is possible with ws2811 timing.  As I'm sure you know, the ws2811 has two speeds, hi and lo.  The high speed is 800 kHz, which is 1.25us per bit or 30us per pixel.

At 30us per pixel, 1000 pixels would take 30ms, which is about 33 FPS, and that is not accounting for the required 250us inter-frame 'reset' gap (changed from the original 50us 'reset' gap).  On the other hand, 140 FPS is a frame time of about 7ms, and at 30us per pixel that's about 233 pixels per frame.
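The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical; the defaults (800 kHz bit rate, 24 bits per pixel, 250us reset gap) are the figures quoted in this post, and the model assumes a single string with no other overhead.

```python
def frame_time_us(pixels, bit_rate_hz=800_000, bits_per_pixel=24, reset_us=250.0):
    """Time to clock out one frame on a single string, in microseconds."""
    bit_us = 1_000_000 / bit_rate_hz          # 1.25us per bit at 800 kHz
    return pixels * bits_per_pixel * bit_us + reset_us


def max_fps(pixels, **timing):
    """Upper bound on frame rate for a string of the given length."""
    return 1_000_000 / frame_time_us(pixels, **timing)


def max_pixels(fps, bit_rate_hz=800_000, bits_per_pixel=24, reset_us=250.0):
    """Longest single string that still fits in one frame period."""
    bit_us = 1_000_000 / bit_rate_hz
    budget_us = 1_000_000 / fps - reset_us    # frame period minus reset gap
    return int(budget_us // (bits_per_pixel * bit_us))


if __name__ == "__main__":
    print(f"1000 pixels: {frame_time_us(1000) / 1000:.2f} ms per frame, "
          f"{max_fps(1000):.1f} FPS at most")
    print(f"140 FPS: at most {max_pixels(140)} pixels on one string")
```

Including the reset gap makes the pixel budget at 140 FPS slightly smaller than the rough 7ms / 30us estimate.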

Can you elaborate a bit on how you're squeezing 30ms worth of bit timing into a 7ms timeframe?
-Mark

fauxton

dkulp, This is exactly the type of answer I was looking for, thank you.

I am definitely not saying there is a problem with the code, just was looking for the exact explanation you were able to provide.

Here is an excerpt from my prototype interface board (just a power supply for the BBB and level shifters / drivers for the strings, seen in green blocks)

I used the GPIO assignments from the "RGBcape48F" schematic.  I'm not trying to do anything exotic, just wanted connections and form factor to be exactly what our application needed.  When we use the Falcon pixel controllers, all of the power distribution is wasted space / cost since we have to implement a completely different method of power distribution external to the board.

What I was trying to explain in the first post was that it seems like the delays in the middle of the data only seem to cause a problem when the signal integrity is getting near the edge of usable because of excess wire length.  Basically I'm finding that the Falcon pixel controllers, as well as several of my other projects that run pixels with the same exact driver circuits, "Go way further" than I can get the BBB signals to go.  When I have the scope attached to the data, I can see the delays showing up at the same time as the flicker type glitches on the pixels.

Maybe I'm missing something else.  Thank you for taking the time to help me.  I'm open to any suggestions.  I'm working on documenting the signals from each type of source to better understand what else might be limiting the distance.

fauxton

pixelpuppy, 

Not all on one string :)  8 strings of roughly 120 pixels (depending on application)

dkulp

If I were you, I'd redesign the board to avoid all the GPIO0 pins.   I don't believe there are 32 of them on P8, but there are a bunch more on P9 that you can pick up to fill.  The code has to do some strange things if GPIO0 is used due to the delays it causes so avoiding it can definitely help.
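As a rough aid for that kind of redesign, the following Python sketch sorts a candidate pin list into pins that are fine and pins on GPIO0. The helper names are hypothetical, the `GPIO<bank>_<line>` naming (with `-` also accepted, as written elsewhere in this thread) is an assumption about how the pins are listed, and `GPIO0_26` is just a made-up example entry.

```python
import re

# Accepts names such as "GPIO2_24" or "GPIO2-24" (bank 0-3, line 0-31).
_PIN_RE = re.compile(r"^GPIO([0-3])[_-](\d{1,2})$", re.IGNORECASE)


def parse_gpio(name):
    """Return (bank, line) for a pin name, or raise ValueError."""
    match = _PIN_RE.match(name.strip())
    if not match:
        raise ValueError(f"unrecognised pin name: {name!r}")
    bank, line = int(match.group(1)), int(match.group(2))
    if line > 31:
        raise ValueError(f"line out of range in {name!r}")
    return bank, line


def split_by_gpio0(names):
    """Split pin names into (usable, on_gpio0), preserving input order."""
    usable, on_gpio0 = [], []
    for name in names:
        bank, _line = parse_gpio(name)
        (on_gpio0 if bank == 0 else usable).append(name)
    return usable, on_gpio0


if __name__ == "__main__":
    candidates = ["GPIO2_3", "GPIO0_26", "GPIO2-24"]
    usable, on_gpio0 = split_by_gpio0(candidates)
    print("keep:  ", usable)
    print("re-route:", on_gpio0)
```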

In any case, for longer distance runs, we always recommend having f-amps (or null pixels) on hand to reshape the signals.   I generally recommend using one for runs of more than 15-20ft, but even that depends on the wire that's used.    That's likely why it works for most people.

fauxton

Can you point me to where I can find the "definitions" as in the GPIO assignment for the "cape type" selections?

I assume I need to decide on a cape type that avoids GPIO0 pins and then use the GPIO pin/channel assignment from that board on my board.  I don't necessarily need 32 channels so being restricted from a few is no problem.  8 or 16 would be plenty of channels.  I'm curious to see if this solves the little delays I'm seeing.

As the hardware guy in my group, I'm pretty new to the whole FPP system.  The software guys have been using the falcon pixel controllers for years so they have always handled the configuration.  I don't know if there is a way to set my own GPIO choices or if it's just picking a predefined cape type.

Thanks again for all your help

dkulp

They are in JSON files in:
https://github.com/FalconChristmas/fpp/tree/master/capes/bbb/strings

I'd suggest the F8-B-20-v2 if you need <=20.     The first 12 are all on GPIO2 so you can have 12 strings with only the single GPIO register needing updating.    The next 8 are on GPIO3.  Again, avoids GPIO0. 

If you add an i2c eeprom to the board, you can have a custom mapping.  A bit more complex though and obviously some added cost.

fauxton

Perfect, thank you!


For comparison I recorded some video of the data in real-time when using the 48 channel cape type.  I set up channel 1 for just two pixels so that it's easy to trigger on the same spot of the data.  Using GPIO2-3 on P8,8

All zeros




All ones




Then I selected the F8-B (no serial) V2 cape which shows 20 channels in the output configuration.  Fortunately, channel one for this cape is connected to channel 8 on my prototype so I could test very easily.  Using GPIO2-24 on P8,28

All zeros




All ones







You're a legend, mate.  This is exactly the help I was hoping I could find!  The signal is solid now.
Thanks a million

yo3ham

Quote from: dkulp on December 02, 2021, 02:36:05 PM
I'd suggest the F8-B-20-v2 if you need <=20.    The first 12 are all on GPIO2 so you can have 12 strings with only the single GPIO register needing updating.    The next 8 are on GPIO3.  Again, avoids GPIO0.

@dkulp : I have only just read this 2021 post while working on my DIY cape. Your conclusion is that it's better to avoid GPIO0 pins when using a longer cable or a more timing-sensitive setup.
Is this also valid for FPP 5.5, or has the code changed?
What about FPP 6.0?


CaptainMurdoch

The GPIO0 issue is a hardware limitation on the BBB.  Dan has some workarounds in place, but the performance is better if you use the other GPIO registers and avoid GPIO0.  There are several other threads on here where the topic has come up, so you might be able to find more info by searching the forum for GPIO0.
-
Chris

zmatt

Quote from: dkulp on December 01, 2021, 02:54:55 PM
GPIO0 hangs off the slower L4 bus instead of the L3 bus

Just a minor correction: none of the GPIO controllers are directly connected to the L3 interconnect, they all connect to L4 interconnects.

GPIO1/2/3 are attached to L4_PER, which is the main L4 interconnect where the majority of peripherals reside.

GPIO0 is attached to L4_WKUP, the L4 interconnect of the wakeup power domain, along with other peripherals that are used for power management or can be used as wakeup source in deep-sleep.

Both of these L4 interconnects hang off the L3 interconnect (specifically the low-speed part thereof, called L3S), though with two important differences:

First, the latency to L4_WKUP is just higher than to L4_PER, perhaps due to the power domain crossing. It's a bit tricky to determine latency of writes, but read requests to the L4_WKUP interconnect appear to be consistently 4 cycles slower than to L4_PER.

Second, there are four connections from L3S to L4_PER:
  • port 0 used by requests from the Cortex-A8
  • port 1 used by requests from PRUSS
  • port 2 used by requests from EDMA transfer controllers 0 and 1
  • port 3 used by requests from EDMA transfer controller 2 and for JTAG debugging
which allows for a certain amount of parallelism between requests from these different initiators to different peripherals on the L4_PER, and reduces jitter seen by PRUSS due to requests from other initiators. In contrast, there's only a single connection to the L4_WKUP, forcing all requests from all initiators to this interconnect to be serialized.
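The GPIO-to-interconnect mapping described above is also visible in the physical addresses. The Python sketch below is only an illustration: the base addresses and the L4_WKUP / L4_PER address windows are taken from my reading of the AM335x memory map and should be verified against the TRM, and `interconnect_for` is a hypothetical helper name.

```python
# Assumed AM335x GPIO controller base addresses (verify against the TRM).
GPIO_BASE = {
    0: 0x44E07000,
    1: 0x4804C000,
    2: 0x481AC000,
    3: 0x481AE000,
}

# Assumed address windows of the two L4 interconnects (verify against the TRM).
L4_RANGES = {
    "L4_WKUP": (0x44C00000, 0x44FFFFFF),
    "L4_PER": (0x48000000, 0x48FFFFFF),
}


def interconnect_for(bank):
    """Return the name of the L4 interconnect a GPIO bank sits behind."""
    addr = GPIO_BASE[bank]
    for name, (low, high) in L4_RANGES.items():
        if low <= addr <= high:
            return name
    raise LookupError(f"GPIO{bank} base 0x{addr:08X} is in no known L4 window")


if __name__ == "__main__":
    for bank in sorted(GPIO_BASE):
        print(f"GPIO{bank} @ 0x{GPIO_BASE[bank]:08X} -> {interconnect_for(bank)}")
```

Under these assumptions only GPIO0 falls in the L4_WKUP window, which matches the single, serialized connection described above.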
