Software for the UUUSB board
(and CY7C68013 in general)


Introduction

Modern software is horrible.

If you give a software guy the task of adding 2 + 2, he will bring together a bunch of huge software toolboxes and libraries, write a few Java and Python scripts plus modify a few configuration files...
Then he'll present you with a 500MB monster that will take 3 seconds of crunching on an X GHz processor, and will produce a result of 3.85.

...and he will be proud of it, because it is fully web enabled, symmetrically virtualized, object oriented and compliant with the latest client-server transaction model.

YUCK!

USB is a relatively recent standard, so it is quite complicated. I would prefer raw brute force bandwidth, but the world has developed in such a direction that you can't get the bandwidth without swallowing an overdose of unnecessary sophistication first.

USB was designed to support multiple devices on the same bus, where each device can have multiple endpoints, configurations, interfaces and alternate settings.
It also supports different types of data transfers (bulk, interrupt, isochronous and control).

USB is fully "Plug and Pray" capable. When a device is connected to the bus, the host dynamically assigns an address to it ("enumeration") and polls it about its capabilities and resource requirements. Our device must of course be able to respond to the host and provide it with descriptors etc.

These "housekeeping" tasks are done over endpoint zero, which is the "control" endpoint, and serves for special control messages, the Standard Device Requests, as defined in the USB specification.

Besides the standard requests, the USB specification provides the possibility of "Vendor Requests".
The CY7C68013 implements two vendor requests, RAM download and RAM upload. These can be used to reset/unreset the device (by downloading to the CPUCS register), load the firmware etc.

The simplest way of using USB 2.0

As an old school engineer, I know that the best way is the simplest way, so the first thing I wanted to do with USB was to find out what the simplest way of using it is.

Luckily, the CY7C68013 has a lot of hidden built-in intelligence, which can take care of many USB chores behind the curtains. It can also provide other goodies like default descriptors, making its use relatively simple. It enters this mode of operation (the "Default USB Device") each time it wakes up from reset and doesn't find a serial EEPROM with a predefined signature on the I2C bus. It will then enumerate automatically and provide the host with descriptors for the default configuration, all without the help of firmware.

Originally, this mode is intended for a Cypress-patented process, trademarked "ReNumeration", in which the CY7C68013 disconnects from the USB bus and enumerates again under the control of the downloaded firmware.
However, the default device already provides a very nice configuration, so in most cases one can work without re-enumerating. This way the firmware can be kept much simpler.

My first interest was how to read big quantities of data into the PC fast (bandwidth), for my SIDI project.

First, of course, I did a web search to find out what is out there, in the sense of simple USB usage.

Searching the web, by far the best thing I could find was the Volodya project (external link).
He wrote a nice program for playing with the CY7C68013, which uses the LIBUSB library.
He also provides examples of bulk reading through the FIFO port.

I have studied his programs, and then tried to simplify further.

I have managed to combine everything needed for bulk reading through the FIFO port into a single 77-line (including comments and empty lines) C program.
It includes the firmware for the 8051, the downloading routine, all the calls needed to initialize the USB system and CY7C68013, plus the data reading loop.

Of course, for real world usage, it makes a lot more sense to keep the firmware and general USB routines in separate files. This single file exercise was just a way to find out what is the minimum amount of software needed to do something useful with USB, and to have a simple starting point for further explorations of the CY7C68013.

It is also a good pedagogical tool to get an understanding of what is needed and how things work. So next, I will go through it line by line.

Single C file USB 2.0 bulk read

First you must get and install the LIBUSB library (external link). It is the only dependency of this program. I have tested it with LIBUSB version libusb-0.1.8-36 on SUSE 9.3 (Kernel 2.6.11.4-20a-default).

My single C source file is here: simple_prg_rd.c.
Compile it with:

gcc simple_prg_rd.c -o simple_prg_rd -lusb

Your computer must have a USB 2.0 interface. This program uses 512-byte blocks, which are not supported under USB 1.1 (max. 64-byte blocks).

The CY7C68013 should be the only device connected to the USB bus. It should be connected directly, without hubs.

No other kernel modules that recognize the CY7C68013 chip may be loaded on your system.
Some new distros load such modules by default, via hotplug.
One of the most common modules that grab the Cypress chip is USBTEST, but there are others, for example drivers for some USB TV (DVB) receivers, like dib3000mb, dvb-dibusb and similar.
If the uuusb example programs do not work, check with the lsmod command.
Another way to see if there are any interfering modules is with the dmesg command, after plugging in the UUUSB board.
Plugging it in should just add the message:

usb 4-2: new high speed USB device using ehci_hcd and address 4

(the usb number and address will most probably differ)
If you see anything else (especially if it mentions Cypress, EZ-USB or CY7C68013), find the offending modules with lsmod and then add them to the /etc/hotplug/blacklist file. This will stop the hotplug system from loading them.

To run the single C file USB bulk read program properly, you must provide a source of data to the CY7C68013 FIFO bus, like the Simple dual channel A/D system, otherwise the reading call will time out in one second, you will get a bunch of zeros, and the ninth status number will be something negative instead of 512.

WARNING! This program cannot be used with the unmodified Trust camera module, because it has the "*SLOE" pin connected to ground. This makes the FIFO bus pins (ports B and D) outputs, causing output contention, which is potentially dangerous for the hardware. Besides, its FIFOADR pins are grounded too, selecting FIFO 2, which is an output under the "default USB device".
To use the Trust camera module, you must modify it by raising both FIFOADR pins and the *SLOE pin to 3.3V (pins 42, 44 and 45).
Instead of raising the *SLOE pin, it is also possible to change its polarity in firmware, by setting bit 4 in the FIFOPINPOLAR register, like this:
0x90, 0xE6, 0x09, 0x74, 0x10, 0xF0,	//FIFOPINPOLAR=0x10   TRUST!!!
and don't forget to increase the firmware array size by 6!
Also, when using the Trust camera module, don't forget to disable the onboard serial EEPROM, as described here, otherwise it won't report itself as an unconfigured FX2!
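For reference, a sketch of the patched array, under my assumption that the extra write belongs before the idle loop: bytes placed after `0x80, 0xFE` would never run, because that instruction jumps to itself. Letting the compiler size the array, and using `sizeof` in the download loop instead of the literal 60, keeps the two in step. The name `firmware_trust` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical Trust-module variant of the minimal firmware (8051 machine code).
   The FIFOPINPOLAR write sits before the idle loop; anything after
   "0x80, 0xFE" (SJMP to itself) would never be executed. */
static unsigned char firmware_trust[] = {
    0x90, 0xE6, 0x0B, 0x74, 0x03, 0xF0,   /* REVCTL=0x03 */
    0x90, 0xE6, 0x04, 0x74, 0x80, 0xF0,   /* FIFORESET=0x80 */
    0x74, 0x08, 0xF0,                     /* FIFORESET=0x08 */
    0xE4, 0xF0,                           /* FIFORESET=0x00 */
    0x90, 0xE6, 0x01, 0x74, 0xCB, 0xF0,   /* IFCONFIG=0xCB */
    0x90, 0xE6, 0x1B, 0x74, 0x0D, 0xF0,   /* EP8FIFOCFG=0x0D */
    0x90, 0xE6, 0x09, 0x74, 0x10, 0xF0,   /* FIFOPINPOLAR=0x10: *SLOE becomes active high */
    0x80, 0xFE                            /* while (1) {} */
};

/* Use this in the download loop instead of a hard-coded size. */
static const size_t firmware_trust_len = sizeof firmware_trust;
```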

When the hardware is set up correctly and everything works well, the output of simple_prg_rd should look like this:

mc@mcpc11/usbtest> ./simple_prg_rd
 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44

 status:  0 4 7 1 12 1 0 0 512 0

mc@mcpc11/usbtest>

The "matrix" is data read from the FIFO (of course, the actual numbers read from the FIFO will very probably be different), and the "status" numbers below it are the return values of various LIBUSB calls (negative values mean errors).

Program description:

Firmware:

The C program contains the firmware in the form of hex constants (8051 machine language) in the declarations:

unsigned char firmware[60]=
			{0x90, 0xE6, 0x0B, 0x74, 0x03, 0xF0,	//REVCTL=0x03
			0x90, 0xE6, 0x04, 0x74, 0x80, 0xF0,	//FIFORESET=0x80
			0x74, 0x08, 0xF0,			//FIFORESET=0x08
			0xE4, 0xF0,				//FIFORESET=0x00
			0x90, 0xE6, 0x01, 0x74, 0xCB, 0xF0,	//IFCONFIG=0xCB
			0x90, 0xE6, 0x1B, 0x74, 0x0D, 0xF0,	//EP8FIFOCFG=0x0D
			0x80, 0xFE};				//while (1) {}

This is the smallest firmware I could develop. Besides using the "default USB device", it takes advantage of the fact that in high-speed FIFO transfers the 8051 need not participate.
It just needs to set up the CY7C68013 control registers properly and then do nothing (idle loop).
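Every register write in the listing follows one three-instruction pattern: `MOV DPTR,#addr` (0x90), `MOV A,#val` (0x74), `MOVX @DPTR,A` (0xF0). A small host-side helper can emit that pattern, which makes adding a register less error-prone than hand-editing hex. This is my own sketch, not part of simple_prg_rd.c, and the names are hypothetical. Unlike the hand-optimized listing, it reloads DPTR and A for every write, so the result is longer (38 bytes instead of 31).

```c
#include <stddef.h>

/* Emit "MOV DPTR,#addr; MOV A,#val; MOVX @DPTR,A" (6 bytes) into out.
   Returns the number of bytes written. */
static size_t emit_xreg_write(unsigned char *out, unsigned addr, unsigned char val)
{
    out[0] = 0x90;                          /* MOV DPTR,#data16 */
    out[1] = (unsigned char)(addr >> 8);    /* high byte first */
    out[2] = (unsigned char)(addr & 0xFF);
    out[3] = 0x74;                          /* MOV A,#data */
    out[4] = val;
    out[5] = 0xF0;                          /* MOVX @DPTR,A */
    return 6;
}

struct xreg_write { unsigned addr; unsigned char val; };

/* All register writes in order, then SJMP to itself (idle loop).
   out must hold 6 * n + 2 bytes. Returns the total length. */
static size_t build_firmware(unsigned char *out, const struct xreg_write *w, size_t n)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += emit_xreg_write(out + len, w[i].addr, w[i].val);
    out[len++] = 0x80;                      /* SJMP rel */
    out[len++] = 0xFE;                      /* rel = -2: jump to itself */
    return len;
}

/* Register sequence of the minimal firmware (addresses from the listing). */
static const struct xreg_write minimal_fw[] = {
    {0xE60B, 0x03},   /* REVCTL */
    {0xE604, 0x80},   /* FIFORESET: NAK all transfers */
    {0xE604, 0x08},   /* FIFORESET: reset the EP8 FIFO */
    {0xE604, 0x00},   /* FIFORESET: back to normal operation */
    {0xE601, 0xCB},   /* IFCONFIG */
    {0xE61B, 0x0D},   /* EP8FIFOCFG */
};
```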

The CY7C68013 has a lot of control registers (>200!!), but luckily most of them can be left with their default values.

To do asynchronous BULK reading with the "default USB device", alt interface 1, on endpoints 6 or 8, only four registers have to be written.

Some of the registers require a "sync delay" after write (TRM page 15-105), so depending on your CPU and IFCLK frequencies, you might need to put NOPs (0x00) after each register write.
Here, we are running the defaults, CPU 12MHz and IFCLK 48MHz, so the required delay is the smallest, only 2 CPU cycles. Therefore, a single NOP is only needed on two occasions (before writing 0x00 to FIFORESET).

The EZ-USB FX2 Technical Reference Manual v2.2 says (page 9-19 bottom) you must write 0x03 to the REVCTL register, so this is done first.

The sequence of values poked into the FIFORESET register next is the FIFO reset sequence, as described in the same manual, page 15-20. It serves to put the FIFO system into a known initial state.

Writing 0xCB into IFCONFIG (default=0xC0) sets ASYNC mode (Bit3=1) and SLAVE FIFO mode (Bits1,0=11).

Writing 0x0D into EP8FIFOCFG (default=0x05) sets AUTOIN (Bit 3=1).
AUTOIN means that after receiving the number of bytes specified in the EPxAUTOLENH,L registers (default=512), the buffer is automatically (without firmware intervention) committed to the USB (sent to the host PC).
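The two magic numbers can also be assembled from named bits. The bit positions below are my reading of the register descriptions in the TRM (IFCLKSRC and 3048MHZ being the two bits of the 0xC0 default, i.e. internal 48MHz IFCLK; ZEROLENIN and WORDWIDE being the two bits of the 0x05 default). The macro names are my own and may differ from those in fx2regs.h.

```c
/* IFCONFIG bits (macro names are hypothetical). */
#define IFCONFIG_IFCLKSRC   0x80  /* bit 7: IFCLK from the internal source */
#define IFCONFIG_3048MHZ    0x40  /* bit 6: internal IFCLK = 48 MHz */
#define IFCONFIG_ASYNC      0x08  /* bit 3: asynchronous FIFO strobes */
#define IFCONFIG_SLAVE_FIFO 0x03  /* bits 1,0 = 11: slave FIFO mode */

/* EPxFIFOCFG bits (macro names are hypothetical). */
#define EPFIFOCFG_AUTOIN    0x08  /* bit 3: commit IN packets without firmware */
#define EPFIFOCFG_ZEROLENIN 0x04  /* bit 2: zero-length IN packets enabled */
#define EPFIFOCFG_WORDWIDE  0x01  /* bit 0: 16-bit FIFO data bus (ports B and D) */

enum {
    IFCONFIG_DEFAULT   = IFCONFIG_IFCLKSRC | IFCONFIG_3048MHZ,                      /* 0xC0 */
    IFCONFIG_VALUE     = IFCONFIG_DEFAULT | IFCONFIG_ASYNC | IFCONFIG_SLAVE_FIFO,   /* 0xCB */
    EP8FIFOCFG_DEFAULT = EPFIFOCFG_ZEROLENIN | EPFIFOCFG_WORDWIDE,                  /* 0x05 */
    EP8FIFOCFG_VALUE   = EP8FIFOCFG_DEFAULT | EPFIFOCFG_AUTOIN                      /* 0x0D */
};
```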

The last line of the firmware is a simple endless loop. After setting up the registers, the 8051 is not needed anymore in FIFO transfers.

PC side software:

The PC side program initializes the USB system, finds the Cypress device, downloads and starts the firmware, and then reads some data from the FIFO port. (In the following description, I have skipped obvious things like variable declarations etc.)

The program starts with

usb_init();
er[2]=usb_find_busses();
er[3]=usb_find_devices();

which gets the USB system running, and looks for connected devices.
Then we must find the device with vendor ID 0x4b4 (Cypress) and product ID 0x8613 (the CY7C68013):

p=usb_busses;
while(p!=NULL)
	{q=p->devices;
	while(q!=NULL)
		{if ((q->descriptor.idVendor==0x4b4)&&(q->descriptor.idProduct==0x8613))
			current_device=q;
		q=q->next;}
	p=p->next;}
fflush(stdout);

next we open this device, and get a handle which will be used to reference the CY7C68013 from now on:

current_handle=usb_open(current_device);

then, by writing 0x01 into the CPUCS register, we remotely put the CY7C68013 into reset, to prepare it for the firmware download. This is done by sending a control message with request type 0x40 (vendor request, OUT), request number 0xA0 (firmware load), value 0xE600 (used by the chip as the target address, here the CPUCS register), index 0 (has no function here), data reset (pointer to a char with the value 1), length 1 and timeout 1000ms:

er[4]=usb_control_msg(current_handle, 0x40, 0xa0, 0xE600, 0, reset, 1, 1000);     //RESET
usleep(100000);   // 100 ms (sleep() takes whole seconds, so sleep(0.1) would not wait at all)
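usb_control_msg() turns its arguments into the 8-byte SETUP packet defined in chapter 9 of the USB 2.0 specification, with all 16-bit fields sent little-endian. The sketch below only illustrates that layout (libusb builds the packet internally; the function name is hypothetical). It shows how the CPUCS write above looks on the wire: 0xE600 travels in the wValue field, which the CY7C68013 uses as the target address. Request type 0x40 decodes as direction host-to-device (bit 7 = 0), type vendor (bits 6..5 = 10) and recipient device (bits 4..0 = 0).

```c
#include <stdint.h>

/* Lay out a USB SETUP packet: bmRequestType, bRequest, then
   wValue, wIndex and wLength as little-endian 16-bit fields. */
static void make_setup_packet(uint8_t pkt[8], uint8_t bmRequestType, uint8_t bRequest,
                              uint16_t wValue, uint16_t wIndex, uint16_t wLength)
{
    pkt[0] = bmRequestType;              /* e.g. 0x40: vendor request, OUT */
    pkt[1] = bRequest;                   /* e.g. 0xA0: firmware load */
    pkt[2] = (uint8_t)(wValue & 0xFF);   /* for 0xA0: target address, low byte */
    pkt[3] = (uint8_t)(wValue >> 8);
    pkt[4] = (uint8_t)(wIndex & 0xFF);
    pkt[5] = (uint8_t)(wIndex >> 8);
    pkt[6] = (uint8_t)(wLength & 0xFF);  /* number of data bytes that follow */
    pkt[7] = (uint8_t)(wLength >> 8);
}
```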

Firmware download follows (in 16 byte chunks), using the same type of request:

for(i=0;i<60;i+=16)		//LOAD FIRMWARE
	{tlen=60-i;
	if(tlen>16) tlen=16;
	er[5]=usb_control_msg(current_handle, 0x40, 0xa0, i, 0, firmware+i, tlen, 1000);}
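The same loop can be written so that it stops at the first error and can be tested without hardware, by passing the RAM write as a callback. In the real program the callback would wrap `usb_control_msg(current_handle, 0x40, 0xa0, addr, 0, data, len, 1000)`; here a test double just records the chunks. This is a sketch; all names are hypothetical.

```c
/* One RAM write of len bytes at addr; returns bytes written or a negative error. */
typedef int (*ram_write_fn)(void *ctx, unsigned addr, unsigned char *data, int len);

/* Download size bytes of firmware to address 0 upward, in chunks of at most
   chunk bytes. Returns the last result, or the first negative one. */
static int download_firmware(ram_write_fn write_ram, void *ctx,
                             unsigned char *fw, int size, int chunk)
{
    int ret = 0;
    for (int addr = 0; addr < size; addr += chunk) {
        int len = size - addr;
        if (len > chunk)
            len = chunk;
        ret = write_ram(ctx, (unsigned)addr, fw + addr, len);
        if (ret < 0)
            return ret;          /* stop at the first error */
    }
    return ret;
}

/* Test double: records addresses and lengths instead of talking to USB. */
struct chunk_record { int n; unsigned addr[16]; int len[16]; };

static int record_chunk(void *ctx, unsigned addr, unsigned char *data, int len)
{
    struct chunk_record *rec = ctx;
    (void)data;
    if (rec->n < 16) {
        rec->addr[rec->n] = addr;
        rec->len[rec->n] = len;
    }
    rec->n++;
    return len;                  /* like usb_control_msg: bytes transferred */
}
```

With 60 bytes and 16-byte chunks the last chunk is 12 bytes long, which is why the fifth status number in the sample output is 12.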

Now we must take CY7C68013 out of reset, to start the firmware:

er[6]=usb_control_msg(current_handle, 0x40, 0xa0, 0xE600, 0, reset+1, 1, 1000);   //UNRESET
usleep(100000);   // 100 ms

After that we must claim interface zero:

er[7]=usb_claim_interface(current_handle, 0);

The CY7C68013 default USB device only contains one interface (interface 0) with four alternate settings. (TRM page 3-3)

Then we set the alternate setting to 1:

er[8]=usb_set_altinterface(current_handle, 1);

because on the CY7C68013 default USB device, alternate setting 1 has all of the FIFO endpoints set to BULK. Endpoints 2 and 4 are set up for output, endpoints 6 and 8 for input, with a 2 X 512 byte buffer each.
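The `endpoint` variable passed to usb_bulk_read() is not among the lines quoted here. It is an endpoint address, in which bit 7 marks the direction (set for IN, device to host) and the low four bits hold the endpoint number, so reading EP8 (the endpoint the firmware configures through EP8FIFOCFG) would presumably use 0x88. A small hypothetical helper:

```c
#define EP_DIR_IN 0x80   /* bit 7 of the endpoint address: IN (device to host) */

/* Build a USB endpoint address from its number and direction. */
static unsigned char ep_address(unsigned number, int is_in)
{
    return (unsigned char)((number & 0x0F) | (is_in ? EP_DIR_IN : 0));
}
```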

At this point, we are ready to read some real data!
So we read a block of 512 bytes and print it out:

er[9]=usb_bulk_read(current_handle, endpoint, buffer, 512, 1000);
for (i=0;i<512;i++) printf(" %02x", buffer[i]); printf("\n");

After we have finished, we must release the interface and close the device:

usb_release_interface(current_handle, 0);
usb_close(current_handle);

And last (not mandatory) we print out the error returns from LIBUSB calls, just to make it easier to find out what went wrong, if the results are not as expected:

printf("\n status: ");for (i=1;i<11;i++) printf (" %d",er[i]); printf("\n\n");

Bulk read with firmware in C

Like Volodya, I use the SDCC cross compiler (external link) to program the 8051 core inside the CY7C68013.
The SDCC is an overly complicated pain in the ass: for example, I can't find the switch to turn off the optimization, which messes with delay loops and does other stupid things... (reaally, how could anybody come up with the idea of putting optimization into a compiler for a lowest-end 8-bit microcontroller????) But because I don't reaally plan to do much 8051 programming, I don't want to waste time looking for something better. PHEW!! Just compare the compiled firmware size with that of the firmware in the "Single C file..." section!!

On the PC side, I took the downloading loop from Volodya's "fx2_programmer", and stripped it down to bare bones, to make the essential code stand out. In a working application, the error handling code should be present, of course.

To make programming simpler, a header file with the CY7C68013 register definitions is used. I've copied it from Volodya's site, and he adapted it from some Cypress file.

Funnily, I had to modify it further, changing the sfr definitions from
sfr IOB  = 0x90;
to
sfr at 0x90 IOB;
otherwise, in assignment statements like "a=IOA" I got register addresses instead of the contents...
The file is here: fx2regs2.h
I have added the "2" to the name to distinguish it from the original one.

Just place this file in the same directory as your firmware C source when compiling:
sdcc -mmcs51 xxxx.c
Now we can split the single file bulk read program into two: the host-side program simple_dnl_rd.c and the firmware simple_dnl_fw.c.
Of course, the host-side program gets compiled by gcc and the firmware by sdcc!

Other firmware

Other firmware is still under construction. At least I plan to add firmware for I/O port access, I2C access and serial ports access.

For now I have a few (buggy) versions. If you find a bug, please give me a hint.

ep1.c and ep1_fw.c are just a pair of programs to test data transfer over the "small" endpoint 1, which has no FIFO, but is intended for 8051 access.
The firmware just returns the string of char values it receives, each incremented by 3. The host-side program sends the returned string back a few times, decrementing the first three values by one each time, to check that the data flows in both directions.
It is mainly useful as a means of checking if the UUUSB board is alive, without the need for additional hardware (the bulk read programs above need an external source of data).
If everything is working OK, the last part of the output should look like this:
 Before:  41 42 43 44 45

 After:  44 45 46 47 48     status = 5  5

 After:  46 47 48 4a 4b     status = 5  5

 After:  48 49 4a 4d 4e     status = 5  5

 After:  4a 4b 4c 50 51     status = 5  5

 After:  4c 4d 4e 53 54     status = 5  5

 After:  4e 4f 50 56 57     status = 5  5

 After:  50 51 52 59 5a     status = 5  5

 After:  52 53 54 5c 5d     status = 5  5

 After:  54 55 56 5f 60     status = 5  5

 After:  56 57 58 62 63     status = 5  5

These programs are mostly intended as a template/example for writing your own programs using EP1.
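The sample output can be reproduced offline, which makes it easier to spot which byte went wrong when the board misbehaves. This model is inferred from the numbers shown above, not taken from ep1.c or ep1_fw.c: the firmware adds 3 to every byte, and before each resend the host subtracts 1 from the first three bytes. Function names are hypothetical.

```c
#include <stdio.h>

/* What the firmware is assumed to do: add 3 to every byte. */
static void fw_model(unsigned char *buf, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + 3);
}

/* Run `rounds` round trips on a 5-byte string, printing each "After:" line.
   buf ends up holding the last string returned by the (modelled) firmware. */
static void simulate_ep1(unsigned char buf[5], int rounds)
{
    for (int r = 0; r < rounds; r++) {
        if (r > 0)
            for (int i = 0; i < 3; i++)
                buf[i]--;            /* host-side change before resending */
        fw_model(buf, 5);            /* reply from the firmware */
        printf(" After: ");
        for (int i = 0; i < 5; i++)
            printf(" %02x", buf[i]);
        printf("\n");
    }
}
```

Starting from 41 42 43 44 45 and running ten rounds ends with 56 57 58 62 63, the last line of the sample output.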

ports.c and ports_fw.c should enable the use of the port pins. They use endpoint 1. The firmware has three functions: set the port directions (input or output), read all ports and write all ports.
To set the port directions, the host sends a string of six bytes over EP1: the first one is 0x01, and the next five are the values for the OEx registers, ones meaning outputs. The firmware returns a string of five bytes, also over EP1, with the values read back from the OEx registers, as a check.
To read all ports, the host sends one byte, 0x02, and the firmware returns a string of five bytes, as read from the IOx registers.
To write to all ports, the host sends a string of six bytes, the first one 0x03, and the other five the values to be written into the IOx registers.
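A sketch of how the host side could assemble these messages. The command codes and lengths come from the description above; the order of the five values (OEA to OEE, IOA to IOE) is my assumption, and all names are hypothetical. The resulting buffer is what would go out over EP1.

```c
#include <string.h>

/* Command codes of the ports protocol described above. */
enum { CMD_SET_DIR = 0x01, CMD_READ_ALL = 0x02, CMD_WRITE_ALL = 0x03 };
#define NPORTS 5   /* five OEx or IOx values per message */

/* Build an EP1 OUT message into out and return its length.
   For CMD_READ_ALL, vals is ignored and the message is a single byte. */
static int ports_msg(unsigned char out[1 + NPORTS], int cmd,
                     const unsigned char vals[NPORTS])
{
    out[0] = (unsigned char)cmd;
    if (cmd == CMD_READ_ALL)
        return 1;
    memcpy(out + 1, vals, NPORTS);   /* OEx (1 = output) or IOx values */
    return 1 + NPORTS;
}
```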

The PC side program just toggles ports B and D five times, so you can observe that with a multimeter, oscilloscope or LEDs.
Again, it is mostly intended to serve as an example.

bw_meter.c and bw_meter_fw.c measure the available bandwidth of the bulk transfer. It depends on the "chunk" size: with 8192-byte chunks (16 packets of 512 bytes) it reaches about 30MB/s. Smaller chunks give proportionally less. Watching the time between chunk arrivals as you decrease the chunk size, you can see that this interval won't go below 125us. (125us is exactly the length of a USB 2.0 high-speed microframe, which is hardly a coincidence.) The realized bandwidth therefore cannot be more than 8000 * chunksize bytes per second.
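The 125us floor turns into simple arithmetic: at most 8000 chunks per second. A sketch of that ceiling and its inverse, the smallest chunk (in whole 512-byte packets) that the floor alone would allow for a target rate; the function names are hypothetical, and the real throughput will be lower.

```c
/* Ceiling on throughput if at most one chunk arrives per 125 us. */
static unsigned long max_rate_bytes_per_s(unsigned long chunk_bytes)
{
    return 8000UL * chunk_bytes;          /* 1 s / 125 us = 8000 */
}

/* Smallest chunk, rounded up to whole 512-byte packets, whose ceiling
   reaches the requested rate. */
static unsigned long min_chunk_for_rate(unsigned long rate_bytes_per_s)
{
    unsigned long bytes = (rate_bytes_per_s + 7999UL) / 8000UL;   /* ceil */
    return (bytes + 511UL) / 512UL * 512UL;
}
```

For 512-byte chunks the ceiling is about 4MB/s, for 8192-byte chunks about 65MB/s; for 30MB/s the floor alone already requires at least 4096-byte chunks.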
The results of "bw_meter" are optimistic, because this firmware serves data on request from the host. That is, the firmware prepares a new data packet only after the host has read the previous one.
This is OK if you are simulating reading from a device like a CompactFlash card, where the data can wait for you until the host is ready.
In real DSP life, the data from the A/D is coming in with a constant rate, regardless of whether the host is ready to read it or not, so real life BW will be lower - or some data will get lost, because the FIFO on the Cypress will overflow, if the host does not read it in time.

To try to simulate a realistic scenario, where the data is coming in at a constant rate, I wrote bw_real.c and bw_real_fw.c
Here, the firmware tries to send data packets at regular intervals. If the FIFO is not ready, it counts time until it's ready.
The host-side program measures the percentage of lost packets.
The desired data rate can be set with the value of the "del" variable in the firmware.

Pavle S57RA has modified the bw_meter program so that it can be compiled under either Linux or Windows: bw_meter_v1.c
It uses the same firmware as "bw_meter" above.
Before compiling, comment out the inappropriate #define, according to the platform under which you will compile; for example, to compile under Windows:

#define win32 1
//#define linux 1

I do not plan to make other programs Windows-compatible, but this one can be used as an example of how to do it.

Bandwidth issues

As I have already mentioned several times, the biggest motivation to sink my teeth into the USB mess was bandwidth hunger.
At first sight, everything seems fine: on my 3GHz Pentium IV machines, the "bw_meter" (see above) routinely measures an average of about 30MB/s, which is similar to what the USRP guys report.
However, that is only part of the story.

If you check the output of the bw_meter program, among other output, you can see something like this:
  Time between chunks, us:
  499  500  625  501  499  501  499  501  499  501  500  500  500  499  500  503
  877 4498  497  501  500  500  500  500  500  500  500  500  500  500  500  879
  495  500  501  500  500  500  626  499  751  499  499  501  500  500  500  500
  500  500  500  500  500  634  492  499  500  500  500  501  499  501  500  500
  500  500  500  500  500  499  500  500  500  500  500  501  500  625  500  500
  500  499  500  500  500  500  500  501  500  500  500  625  500  500  500  499
  501  500  500  878  497  500  500  626  499  509  491  500  500  559  566  625
  500  500  500  500  500  501  500  499  500  837  538  500  500  500  501  500
  500  500  500  500  499  500  500  500  626  500  500  500  500  500  500  500
  752  497  500  501  499  501  500  500  500  500  500  500  500  501  498  501
  500  500  500  500  500  500  500  501  625  499  499  501  499  501  500  500
  500  500  500  501  500  500  500  507  492  500  500  500  501  499  500  627
  498  501  500  499  500  500  500
These are the times between the usb_bulk_read() calls. The above example is for an average rate of 30MB/s and a chunk size of 8192.

Most of the time, we get about the expected value of chunk size divided by data rate, but on a few occasions, the time is longer! (bw_meter reports the longest wait it has encountered, and the associated maximum possible "lossless" data rate, as the "worst case")

Obviously, this is the result of working on a multitasking system, which sometimes has "other things to do"!
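A sketch of how such a list of intervals can be reduced to an average and a worst case. I'm assuming "worst case" means chunk size divided by the longest gap; this is my reading of the description, not code from bw_meter.c, and the names are hypothetical.

```c
#include <stddef.h>

struct interval_stats {
    double mean_us;      /* average time between chunks */
    unsigned max_us;     /* longest gap seen */
    double worst_rate;   /* chunk_bytes / longest gap, in bytes per second */
};

/* Summarize "Time between chunks" values given in microseconds. */
static struct interval_stats summarize(const unsigned *t_us, size_t n, unsigned chunk_bytes)
{
    struct interval_stats s = {0.0, 0, 0.0};
    unsigned long long sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += t_us[i];
        if (t_us[i] > s.max_us)
            s.max_us = t_us[i];
    }
    if (n > 0)
        s.mean_us = (double)sum / (double)n;
    if (s.max_us > 0)
        s.worst_rate = chunk_bytes / (s.max_us * 1e-6);
    return s;
}
```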

When reading data from a device that can serve it "on demand", like a memory card, this is no problem. However, my main interest is in applications where the data comes in at a fixed rate and will be lost if not read in time.
With double buffering on the Cypress, data loss will occur when the delay reaches twice the expected value (or four times, with quadruple buffering).

Many applications, like broadcast HDTV reception etc., where the output is intended for "human consumption", are not very critical about this, as most people will tolerate a split second of blocky picture.
But I intend to use this for radio interferometry, where every skipped bit can screw up the time alignment needed for correlation - so data loss must be prevented at any cost.

To avoid data loss, at a given data rate, only two things can be done: external FIFO buffering or reduction of dead times.
The bw_meter calculates the approximate external buffer needed to prevent data loss in the case of the longest delay it has encountered. This is an optimistic (small) value, because bw_meter has most probably not caught the longest possible delay on the system! (also, if two long delays occur close together, you will need double the buffer - but luckily, the long delays seem to be comfortably spaced...)
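A back-of-the-envelope version of such an estimate, under my own assumption (not taken from bw_meter.c): during a gap, rate × gap bytes arrive, and whatever exceeds the on-chip buffering has to be held externally. The name is hypothetical. As noted above, the result should be doubled if two long gaps can occur close together.

```c
/* Rough external FIFO size (bytes) needed to survive one gap of gap_us
   at a constant input rate, given onchip_bytes of buffering in the chip. */
static unsigned long ext_buffer_bytes(double rate_bytes_per_s, double gap_us,
                                      unsigned long onchip_bytes)
{
    double arriving = rate_bytes_per_s * gap_us * 1e-6;   /* bytes during the gap */

    if (arriving <= (double)onchip_bytes)
        return 0;                                         /* the chip copes alone */
    return (unsigned long)(arriving - (double)onchip_bytes + 0.5);   /* rounded */
}
```

For example, 30MB/s and a 4498us gap (the longest in the sample list above), with 2 X 512 bytes on chip, gives about 134KB.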

External buffering adds cost and complexity, so I would like to reduce the need for it as far as possible. This means finding ways of reducing the maximum delays, caused by task switching.

I haven't yet come very far in this respect. I have noticed that a lot of X activity (like scrolling a window) will certainly make things worse.
I have tried various "runlevels" (init 3 and init S commands as root), and there is some improvement, but long dead times still occur now and then.
The bw_meter also tries to pump up its priority (it must be run as root to do that), and again, there is some improvement, but long dead times still occur now and then.
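I don't know exactly how bw_meter raises its priority. On Linux, one common combination is real-time FIFO scheduling plus locking the process memory, so that page faults can't add delays; a sketch with a hypothetical function name. Both calls need root (or the corresponding capabilities). A SCHED_FIFO task that never blocks can starve the rest of the system, so a moderate priority is the safer choice; a reader that spends most of its time blocked in usb_bulk_read() is presumably less of a risk.

```c
#include <sched.h>
#include <sys/mman.h>

/* Switch the calling process to SCHED_FIFO at the given priority
   (1..99 on Linux) and lock its memory. Returns 0 if both succeeded,
   -1 if either failed (typically EPERM or ENOMEM when not root). */
static int go_realtime(int priority)
{
    struct sched_param sp = {0};
    int ret = 0;

    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        ret = -1;
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)   /* keep pages resident */
        ret = -1;
    return ret;
}
```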

So I will have to do more experiments, maybe compile the kernel with a higher heartbeat rate, or use one of the "low latency" and "real time" kernels...

Stay tuned....


