Software for the UUUSB board
(and CY7C68013 in general)
Introduction
Modern software is horrible.
If you give a software guy the task of adding 2 + 2, he will bring
together a bunch of huge software toolboxes and libraries, write a few Java
and Python scripts plus modify a few configuration files...
Then he'll present you with a 500MB monster that will take 3 seconds
of crunching on a X GHz processor, and will produce a result of 3.85
...and he will be proud of it, because it is fully web enabled, symmetrically
virtualized, object oriented and compliant with the latest client-server
transaction model.
YUCK!
USB is a relatively recent standard, so it is quite complicated.
I would prefer raw brute force bandwidth, but the world has developed
in such a direction that you can't get the bandwidth without swallowing
an overdose of unnecessary sophistication first.
USB was designed to support multiple devices on the same bus, where each
device can have multiple endpoints, configurations, interfaces and alternate
settings.
It also supports different types of data transfers (bulk, interrupt,
isochronous and control).
USB is fully "Plug and Pray" capable. When a device is connected to the bus,
the host dynamically assigns an address to it ("enumeration") and polls it
about its capabilities and resource requirements. Our device must of course
be able to respond to the host and provide it with descriptors etc.
These "housekeeping" tasks are done over endpoint zero, which is the "control"
endpoint, and serves for special control messages, the Standard Device Requests,
as defined in the USB specification.
Besides the standard requests, the USB specification provides the possibility
of "Vendor Requests".
The CY7C68013 implements two vendor requests, RAM download and RAM upload.
These can be used to reset/unreset the device (by downloading to the CPUCS
register), load the firmware etc.
The simplest way of using USB 2.0
As an old school engineer, I know that the best way is the simplest way,
so the first thing I wanted to do with USB was to find out what is the
simplest way of using it.
Luckily, the CY7C68013 has a lot of hidden built-in intelligence,
which can take care of many USB chores behind the curtains. It can also
provide other goodies like default descriptors, making its use relatively
simple. It enters this mode of operation (the "Default USB Device") each
time it wakes up from reset and doesn't find a serial EEPROM with a
predefined signature on the I2C bus. It will then enumerate automatically
and provide the host with descriptors for the default configuration, all
without the help of firmware.
This mode was originally intended for a Cypress-patented process,
trademarked "ReNumeration", where the CY7C68013 disconnects from
the USB bus and enumerates again under the control of the downloaded
firmware.
However, the default device already provides a very nice configuration,
so in most cases one can work without re-numerating. This way the
firmware can be kept much simpler.
My first interest was how to read big quantities of data
into the PC fast (bandwidth), for my
SIDI project.
First, of course, I did a web search to find out what is out there,
in the sense of simple USB usage.
Searching the web, by far the best thing I could find was the
Volodya project (external link).
He wrote a nice program for playing with the CY7C68013,
which uses the LIBUSB library.
He also provides examples of bulk reading through the FIFO port.
I have studied his programs, and then tried to simplify further.
I have managed to combine everything needed for bulk reading through
the FIFO port into a single 77-line (including comments and empty lines) C program.
It includes the firmware for the 8051, the downloading routine,
all the calls needed to initialize the USB system and CY7C68013, plus
the data reading loop.
Of course, for real world usage, it makes a lot more sense to keep
the firmware and general USB routines in separate files.
This single file exercise was just a way to find out what is the
minimum amount of software needed to do something useful with USB,
and to have a simple starting point for further explorations of the
CY7C68013.
It is also a good pedagogical tool to get an understanding of
what is needed and how things work. So next, I will go through it
line by line.
Single C file USB 2.0 bulk read
First you must get and install the
LIBUSB library (external link).
It is the only dependency of this program.
I have tested it with LIBUSB version libusb-0.1.8-36 on SUSE 9.3 (Kernel 2.6.11.4-20a-default).
My single C source file is here:
simple_prg_rd.c.
Compile it with:
gcc simple_prg_rd.c -o simple_prg_rd -lusb
Your computer must have a USB 2.0 interface. This program
uses 512 byte blocks, which are not supported under USB 1.1
(max 64 byte blocks).
The CY7C68013 should be the only device connected to the USB
bus. It should be connected directly, without hubs.
There must be no other modules that recognize the CY7C68013 chip
loaded on your system.
Some newer distros load such modules by default, via hotplug.
The most common module that grabs the Cypress is the USBTEST module,
but there are others, for example for some webcams, like dib3000mb, dvb-dibusb
and similar.
If the uuusb example programs do not work, check with the lsmod command.
Another way to see if there are any interfering modules is to run the
dmesg command after plugging in the UUUSB board.
Plugging it in should just add the message:
usb 4-2: new high speed USB device using ehci_hcd and address 4
(the usb number and address will most probably differ)
If you see anything else (especially if it mentions Cypress, EZ-USB or CY7C68013),
find the offending modules with lsmod and then add them to the /etc/hotplug/blacklist file.
This will stop the hotplug system from loading them.
To run the single C file USB bulk read program properly, you must provide a source of data to the
CY7C68013 FIFO bus, like the
Simple dual channel A/D system,
otherwise the reading call will time out in one second, you will get
a bunch of zeros, and
the ninth status number will be something negative instead of 512.
WARNING! This program can not be used with the unmodified Trust
camera module, because it has the "*SLOE" pin connected to ground.
This will cause the FIFO bus pins (ports B and D) to be outputs,
causing output contention, which is potentially dangerous for hardware.
Besides, its FIFOADR pins are grounded too, selecting FIFO 2, which is
an output under the "default USB device".
To use the Trust camera module, you must modify it by raising both FIFOADR
pins and the *SLOE pin to 3.3V (pins 42, 44 and 45).
Instead of raising the SLOE pin, it is also possible to change its polarity
in firmware, by setting bit 4 in the FIFOPINPOLAR register, like:
0x90, 0xE6, 0x09, 0x74, 0x10, 0xF0, //FIFOPINPOLAR=0x10 TRUST!!!
and don't forget to increase the firmware array size by 6!
Also, when using the Trust camera module, don't forget to disable the onboard
serial EEPROM, as described
here,
otherwise it won't report itself as an unconfigured FX2!
When the hardware is set up correctly and everything works well,
the output of simple_prg_rd should look like this:
mc@mcpc11/usbtest> ./simple_prg_rd
07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07
44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44
07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 07 44 0
7 44 07 44 07 44
status: 0 4 7 1 12 1 0 0 512 0
mc@mcpc11/usbtest>
The "matrix" is the data read from the FIFO (of course, the actual numbers read from the FIFO will very probably be different), and the "status" numbers below it
are the return values of the various LIBUSB calls (negative values mean errors).
Program description:
Firmware:
The C program contains the firmware in the form of hex constants
(8051 machine language) in the declarations:
unsigned char firmware[60]=
{0x90, 0xE6, 0x0B, 0x74, 0x03, 0xF0, //REVCTL=0x03
0x90, 0xE6, 0x04, 0x74, 0x80, 0xF0, //FIFORESET=0x80
0x74, 0x08, 0xF0, //FIFORESET=0x08
0xE4, 0xF0, //FIFORESET=0x00
0x90, 0xE6, 0x01, 0x74, 0xCB, 0xF0, //IFCONFIG=0xCB
0x90, 0xE6, 0x1B, 0x74, 0x0D, 0xF0, //EP8FIFOCFG=0x0D
0x80, 0xFE}; //while (1) {}
This is the smallest firmware I could develop. Besides using the "default
USB device", it takes advantage of the fact that in high speed FIFO
transfers the 8051 need not participate.
It just needs to properly set up the CY7C68013 control registers and then do
nothing (idle loop).
The CY7C68013 has a lot of control registers (>200!!), but luckily most of
them can be left with their default values.
To do asynchronous BULK reading with the "default USB device", alt interface 1,
on endpoints 6 or 8, only four registers have to be written.
Some of the registers require a "sync delay" after write (TRM page 15-105),
so depending on your CPU and IFCLK frequencies, you might need to put
NOPs (0x00) after each register write.
Here, we are running the defaults, CPU 12MHz and IFCLK 48MHz, so the required
delay is the smallest, only 2 CPU cycles. Therefore, a single NOP is only
needed on two occasions (before writing 0x00 to FIFORESET).
The EZ-USB FX2 Technical Reference Manual v2.2 says (page 9-19 bottom) you
must write 0x03 to the REVCTL register, so this is done first.
The sequence of values poked into the FIFORESET register next, is
the FIFO reset sequence, as described in the same manual, page 15-20. It serves
to put the FIFO system into a known initial state.
Writing 0xCB into IFCONFIG (default=0xC0) sets ASYNC mode (Bit3=1) and
SLAVE FIFO mode (Bits1,0=11).
Writing 0x0D into EP8FIFOCFG (default=0x05) sets AUTOIN (Bit 3=1).
AUTOIN means that after receiving the number of bytes specified in the
EPxAUTOLENH,L registers (default=512), the buffer is automatically (without firmware intervention) committed to the USB (sent to the host PC).
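The two register values can be rebuilt from the defaults and bit positions quoted above. The following is only a sketch: the macro and function names are made up for illustration and are not taken from fx2regs2.h or any Cypress header.

```c
#include <stdio.h>

/* Illustrative names only; the numbers are the ones quoted in the text. */
#define IFCONFIG_DEFAULT    0xC0u        /* default value quoted in the text  */
#define IFCONFIG_ASYNC      (1u << 3)    /* bit 3 = 1: ASYNC mode             */
#define IFCONFIG_SLAVEFIFO  0x03u        /* bits 1,0 = 11: SLAVE FIFO mode    */

#define EP8FIFOCFG_DEFAULT  0x05u        /* default value quoted in the text  */
#define EP8FIFOCFG_AUTOIN   (1u << 3)    /* bit 3 = 1: AUTOIN                 */

/* Default value with the ASYNC and SLAVE FIFO bits added. */
unsigned ifconfig_value(void)
{
    return IFCONFIG_DEFAULT | IFCONFIG_ASYNC | IFCONFIG_SLAVEFIFO;
}

/* Default value with the AUTOIN bit added. */
unsigned ep8fifocfg_value(void)
{
    return EP8FIFOCFG_DEFAULT | EP8FIFOCFG_AUTOIN;
}

/* Prints both values so they can be compared with the firmware listing. */
void print_register_values(void)
{
    printf("IFCONFIG   = 0x%02X\n", ifconfig_value());
    printf("EP8FIFOCFG = 0x%02X\n", ep8fifocfg_value());
}
```

Building the constants this way makes it easier to see which single bit changes if, for example, synchronous mode is wanted instead.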
The last line of the firmware is a simple endless loop. After setting up the registers,
the 8051 is not needed anymore in FIFO transfers.
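To check the hex listing against its comments, the firmware bytes can be walked on the PC. The sketch below is not a general 8051 disassembler: it understands only the six opcodes that appear in this firmware (NOP 0x00, MOV DPTR,#imm16 0x90, MOV A,#imm8 0x74, CLR A 0xE4, MOVX @DPTR,A 0xF0 and SJMP 0x80), and the names trace_writes and reg_write are invented for this example.

```c
#include <stddef.h>

/* One register write performed by the firmware. */
struct reg_write {
    unsigned addr;   /* value of DPTR at the time of the MOVX */
    unsigned value;  /* value of the accumulator              */
};

/* Walks the firmware and records every MOVX @DPTR,A.
   Returns the number of writes found, or -1 on an opcode this
   sketch does not know, a truncated instruction, or a full table. */
int trace_writes(const unsigned char *fw, size_t len,
                 struct reg_write *out, int max_out)
{
    unsigned dptr = 0, acc = 0;
    size_t pc = 0;
    int n = 0;

    while (pc < len) {
        switch (fw[pc]) {
        case 0x00:                               /* NOP */
            pc += 1;
            break;
        case 0x90:                               /* MOV DPTR,#imm16 */
            if (pc + 2 >= len) return -1;
            dptr = ((unsigned)fw[pc + 1] << 8) | fw[pc + 2];
            pc += 3;
            break;
        case 0x74:                               /* MOV A,#imm8 */
            if (pc + 1 >= len) return -1;
            acc = fw[pc + 1];
            pc += 2;
            break;
        case 0xE4:                               /* CLR A */
            acc = 0;
            pc += 1;
            break;
        case 0xF0:                               /* MOVX @DPTR,A */
            if (n >= max_out) return -1;
            out[n].addr = dptr;
            out[n].value = acc;
            n++;
            pc += 1;
            break;
        case 0x80:                               /* SJMP rel8 */
            if (pc + 1 >= len) return -1;
            if (fw[pc + 1] == 0xFE) return n;    /* jump to itself: idle loop */
            return -1;                           /* other jumps not handled  */
        default:
            return -1;
        }
    }
    return n;
}
```

Fed with the bytes of the listing above, it should report six writes: REVCTL (0xE60B), three to FIFORESET (0xE604), IFCONFIG (0xE601) and EP8FIFOCFG (0xE61B).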
PC side software:
It initializes the USB system, finds the Cypress device,
downloads and starts the firmware, and then reads some data
from the FIFO port. (In the following description, I have skipped obvious
things like variable declarations etc.)
The program starts with
usb_init();
er[2]=usb_find_busses();
er[3]=usb_find_devices();
which gets the USB system running, and looks for connected devices.
Then we must find the device with vendor id 0x4b4 (cypress)
and product id 0x8613 (the CY7C68013):
p=usb_busses;
while(p!=NULL)
{q=p->devices;
while(q!=NULL)
{if ((q->descriptor.idVendor==0x4b4)&&(q->descriptor.idProduct==0x8613))
current_device=q;
q=q->next;}
p=p->next;}
fflush(stdout);
next we open this device, and get a handle which will be used
to reference the CY7C68013 from now on:
current_handle=usb_open(current_device);
Then, by writing 0x01 into the CPUCS register, we remotely put the CY7C68013 into reset, to prepare it
for firmware download. This is done by sending a control message with
request type 0x40 (vendor request, OUT), request number 0xA0 (firmware load),
value 0xE600 (the address of the CPUCS register), index 0 (has no function here), data buffer reset
(pointer to a char value of 1), length 1, timeout 1000ms:
er[4]=usb_control_msg(current_handle, 0x40, 0xa0, 0xE600, 0, reset, 1, 1000); //RESET
usleep(100000); //wait 100ms (sleep() takes whole seconds, so sleep(0.1) does not wait at all)
Firmware download follows (in 16 byte chunks), using the same type of request:
for(i=0;i<60;i+=16) //LOAD FIRMWARE
{tlen=60-i;
if(tlen>16) tlen=16;
er[5]=usb_control_msg(current_handle, 0x40, 0xa0, i, 0, firmware+i, tlen, 1000);}
Now we must take CY7C68013 out of reset, to start the firmware:
er[6]=usb_control_msg(current_handle, 0x40, 0xa0, 0xE600, 0, reset+1, 1, 1000); //UNRESET
usleep(100000); //wait 100ms (sleep() takes whole seconds, so sleep(0.1) does not wait at all)
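The reset, download, unreset sequence can be collected into one routine that stops at the first failed transfer (in the loop above, er[5] only keeps the result of the last chunk). This is a sketch, not the code from simple_prg_rd.c: the callback type fx2_write_fn, the function fx2_load and the dry-run recorder fx2_record_write are all invented here, so that the logic can be tried without any hardware attached.

```c
#include <stddef.h>

#define FX2_CPUCS 0xE600u   /* CPUCS register address, as used in the text */

/* Hypothetical transfer callback: write len bytes to FX2 RAM at addr.
   Returns the number of bytes written, or a negative error code,
   in the style of usb_control_msg(). */
typedef int (*fx2_write_fn)(void *ctx, unsigned addr,
                            unsigned char *data, int len);

/* Reset the 8051, load the firmware in 16 byte chunks, release reset.
   Returns 0 on success or the first negative error. */
int fx2_load(void *ctx, fx2_write_fn wr, unsigned char *fw, int fwlen)
{
    unsigned char hold = 1, run = 0;
    int i, n, r;

    r = wr(ctx, FX2_CPUCS, &hold, 1);          /* CPUCS = 1: reset   */
    if (r < 0) return r;

    for (i = 0; i < fwlen; i += 16) {
        n = fwlen - i;
        if (n > 16) n = 16;
        r = wr(ctx, (unsigned)i, fw + i, n);   /* one chunk          */
        if (r < 0) return r;                   /* stop at first fail */
    }

    r = wr(ctx, FX2_CPUCS, &run, 1);           /* CPUCS = 0: run     */
    return r < 0 ? r : 0;
}

/* Dry-run callback: records what would have been sent. */
struct fx2_record {
    int calls;
    unsigned addr[16];
    int len[16];
    unsigned char first[16];   /* first data byte of each transfer */
};

int fx2_record_write(void *ctx, unsigned addr, unsigned char *data, int len)
{
    struct fx2_record *rec = ctx;

    if (rec->calls < 16) {
        rec->addr[rec->calls] = addr;
        rec->len[rec->calls] = len;
        rec->first[rec->calls] = data[0];
    }
    rec->calls++;
    return len;
}
```

In a real program the callback would presumably be a thin wrapper around usb_control_msg(handle, 0x40, 0xa0, addr, 0, data, len, 1000), with the handle passed through ctx.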
After that we must claim the interface zero:
er[7]=usb_claim_interface(current_handle, 0);
The CY7C68013 default USB device only contains one interface
(interface 0) with four alternate settings. (TRM page 3-3)
Then we set the alternate setting to 1:
er[8]=usb_set_altinterface(current_handle, 1);
because on the CY7C68013 default USB device, alternate setting 1
has all of the FIFO endpoints set to BULK. Endpoints 2 and 4 are set up for output,
endpoints 6 and 8 for input, with a 2 X 512 byte buffer each.
At this point, we are ready to read some real data!
So we read a block of 512 bytes and print it out:
er[9]=usb_bulk_read(current_handle, endpoint, buffer, 512, 1000);
for (i=0;i<512;i++) printf(" %02x", buffer[i]); printf("\n");
After we have finished, we must release the interface and close
the device:
usb_release_interface(current_handle, 0);
usb_close(current_handle);
And last (not mandatory) we print out the error returns from
LIBUSB calls, just to make it easier to find out what went
wrong, if the results are not as expected:
printf("\n status: ");for (i=1;i<11;i++) printf (" %d",er[i]); printf("\n\n");
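When something goes wrong, it helps to know which call produced the first negative status. The sketch below maps the er[] indices used in the walkthrough above to the calls that fill them; indices the walkthrough does not mention are left unnamed. The names er_label, first_failed and report_status are invented for this example.

```c
#include <stdio.h>

/* Labels for er[1]..er[10], following the walkthrough. */
static const char *const er_label[11] = {
    NULL,
    NULL,                           /* er[1]: not described in the text */
    "usb_find_busses",              /* er[2] */
    "usb_find_devices",             /* er[3] */
    "reset (CPUCS=1)",              /* er[4] */
    "firmware load (last chunk)",   /* er[5] */
    "unreset (CPUCS=0)",            /* er[6] */
    "usb_claim_interface",          /* er[7] */
    "usb_set_altinterface",         /* er[8] */
    "usb_bulk_read",                /* er[9] */
    NULL                            /* er[10]: not described in the text */
};

/* Returns the index of the first negative status, or 0 if none. */
int first_failed(const int er[11])
{
    int i;

    for (i = 1; i < 11; i++)
        if (er[i] < 0)
            return i;
    return 0;
}

/* Prints a one-line verdict for the status array. */
void report_status(const int er[11])
{
    int i = first_failed(er);

    if (i == 0)
        printf("all calls returned non-negative values\n");
    else
        printf("first failure: er[%d] = %d (%s)\n", i, er[i],
               er_label[i] ? er_label[i] : "unnamed");
}
```

For the sample output shown earlier (0 4 7 1 12 1 0 0 512 0) nothing is negative, and the 12 in the fifth position is the size of the last firmware chunk (60 - 48 bytes).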
Bulk read with firmware in C
Like Volodya, I use the
SDCC (external link) cross compiler
to program the 8051 core inside the CY7C68013.
SDCC is an overly complicated pain in the ass; for example, I can't find the
switch to turn off the optimization, which messes with delay loops and does other
stupid things... (really, how could anybody come up with the idea of putting optimization
into a compiler for a lowest-end 8-bit microcontroller????)
But because I don't really plan to do much 8051 programming, I don't want
to waste time looking for something better. PHEW!! Just compare the compiled firmware
size with that of the firmware in the "Single C file..." program!!
On the PC side, I took the downloading loop from Volodya's "fx2_programmer", and stripped it down to
bare bones, to make the essential code stand out. In a working application,
the error handling code should be present, of course.
To make programming simpler, a header file with the CY7C68013 register definitions
is used. I've copied it from Volodya's site, and he adapted it from some Cypress file.
Funnily, I had to modify it further, changing the sfr definitions from
sfr IOB = 0x90;
to
sfr at 0x90 IOB;
otherwise, in assignment statements like "a=IOA" I got register addresses instead
of the contents...
The file is here:
fx2regs2.h
I have added the "2" to the name to distinguish it
from the original one.
Just place this file in the same directory as your firmware C source when compiling:
sdcc -mmcs51 xxxx.c
Now we can split the single file bulk read program into two, the host-side program
simple_dnl_rd.c.
and the firmware
simple_dnl_fw.c.
Of course, the host-side program gets compiled by gcc and the firmware by sdcc!
Other firmware
Other firmware is still under construction. At least I plan to add firmware
for I/O port access, I2C access and serial ports access.
For now I have a few (buggy) versions. If you find a bug, please give me a hint.
The
ep1.c
and
ep1_fw.c
are just some software to test data transfer over the "small" endpoint 1, which
has no fifo, but is intended for 8051 access.
The firmware just returns a string
of char values, incremented by 3. The host-side program sends the same string a
few times, decrementing some values by one, to check that the data flows in both
directions.
It is mainly useful as a means of checking if the UUUSB board is alive, without
the need for additional hardware (the bulk read programs above need an external
source of data).
If everything is working OK, the last part of the output should look like
this:
Before: 41 42 43 44 45
After: 44 45 46 47 48 status = 5 5
After: 46 47 48 4a 4b status = 5 5
After: 48 49 4a 4d 4e status = 5 5
After: 4a 4b 4c 50 51 status = 5 5
After: 4c 4d 4e 53 54 status = 5 5
After: 4e 4f 50 56 57 status = 5 5
After: 50 51 52 59 5a status = 5 5
After: 52 53 54 5c 5d status = 5 5
After: 54 55 56 5f 60 status = 5 5
After: 56 57 58 62 63 status = 5 5
These programs are mostly intended as a template / example for writing your own programs using EP1.
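The pattern in the sample output can be reproduced on the PC alone. The sketch below is a simulation, not the code of ep1.c or ep1_fw.c: the firmware side adds 3 to every byte, as described, and the host side is assumed to decrement the first three bytes by one before each resend, which is inferred from the printed output rather than read from the source.

```c
#include <stdio.h>

#define EP1_LEN 5

/* Stand-in for the firmware: return every byte incremented by 3. */
void fw_plus3(unsigned char *buf)
{
    int i;

    for (i = 0; i < EP1_LEN; i++)
        buf[i] = (unsigned char)(buf[i] + 3);
}

/* Stand-in for the host: decrement the first three bytes by one
   (assumption, inferred from the sample output). */
void host_tweak(unsigned char *buf)
{
    int i;

    for (i = 0; i < 3; i++)
        buf[i] = (unsigned char)(buf[i] - 1);
}

/* Runs the given number of round trips, printing the buffer after each. */
void ep1_simulate(unsigned char *buf, int rounds)
{
    int r, i;

    for (r = 0; r < rounds; r++) {
        if (r > 0)
            host_tweak(buf);      /* the first round sends the data unchanged */
        fw_plus3(buf);
        printf("After:");
        for (i = 0; i < EP1_LEN; i++)
            printf(" %02x", buf[i]);
        printf("\n");
    }
}
```

Starting from 41 42 43 44 45, ten rounds end at 56 57 58 62 63, the same as the last line of the output above, so a differing line on real hardware points to a transfer problem in one direction.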
The
ports.c
and
ports_fw.c
should enable the use of port pins. It uses endpoint 1.
The firmware has three functions: set the port directions
(input or output), read all ports and write all ports.
To set port direction, the host sends a string of six bytes over EP1, the first one 0x01, and the
next five the values for the OEx registers, ones meaning outputs. The firmware
returns a string of five bytes, also over EP1, representing the values read from the OEx regs, just
for check.
To read all ports, the host sends one byte, 0x02, and the firmware returns a string
of five bytes, as read from the IOx registers.
To write to all ports, the host sends a string of six bytes, the first one 0x03, and the
other five the values to be written into the IOx registers.
The PC side program just toggles ports B and D five times, so you can
observe that with a multimeter, oscilloscope or LEDs.
Again, it is mostly intended to serve as an example.
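The three EP1 command packets described above can be assembled with a few helpers. This is a sketch with invented names (ports_pack_set_dir and so on); only the command bytes 0x01, 0x02, 0x03 and the packet lengths come from the description, and the order of the five ports inside the packet is whatever ports_fw.c expects.

```c
#include <string.h>

enum {
    PORTS_SET_DIR = 0x01,   /* followed by five OEx values (1 = output) */
    PORTS_READ    = 0x02,   /* no payload; firmware answers with 5 bytes */
    PORTS_WRITE   = 0x03    /* followed by five IOx values               */
};

#define PORTS_COUNT 5

/* Builds the 6 byte "set direction" packet; returns its length. */
int ports_pack_set_dir(unsigned char out[6], const unsigned char oe[PORTS_COUNT])
{
    out[0] = PORTS_SET_DIR;
    memcpy(out + 1, oe, PORTS_COUNT);
    return 1 + PORTS_COUNT;
}

/* Builds the 1 byte "read all ports" packet; returns its length. */
int ports_pack_read(unsigned char out[1])
{
    out[0] = PORTS_READ;
    return 1;
}

/* Builds the 6 byte "write all ports" packet; returns its length. */
int ports_pack_write(unsigned char out[6], const unsigned char io[PORTS_COUNT])
{
    out[0] = PORTS_WRITE;
    memcpy(out + 1, io, PORTS_COUNT);
    return 1 + PORTS_COUNT;
}
```

The returned length is the byte count to hand to the write call on EP1; the packet itself never touches the USB library, so the helpers can be tested on their own.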
The
bw_meter.c
and
bw_meter_fw.c
measure the available bandwidth of the bulk transfer. It depends on
the "chunk" size; with 8192 byte chunks (16 packets of 512 bytes) it reaches
about 30MB/s. Smaller chunks give proportionally less. Watching the time
between chunk arrivals as you decrease the chunk size, you can see that this interval won't go below 125us. (Does this have anything to do with the USB 2.0 microframes? No idea...) The realized bandwidth therefore cannot be more than 8000 * chunksize bytes per second.
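The ceiling that follows from the 125us minimum interval is simple arithmetic: at most one chunk per 125us means at most 8000 chunks per second. A small sketch (the function name is invented) makes the numbers concrete:

```c
/* Upper bound on throughput when at most one chunk can arrive
   every 125 us, i.e. 8000 chunks per second. */
unsigned long max_bytes_per_second(unsigned long chunk_bytes)
{
    return 8000UL * chunk_bytes;
}
```

For a single 512 byte packet per chunk this gives 4096000 bytes per second. For 8192 byte chunks it gives 65536000 bytes per second, well above the roughly 30MB/s measured, so at that size the 125us interval is not what limits the transfer.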
The results of "bw_meter" are optimistic,
because this firmware serves data on request from the host. That is, the firmware
prepares a new data packet only after the host has read the previous one.
This is OK if you are simulating reading from a device like a Compact Flash card,
where the data can wait for you, until the host is ready.
In real DSP life, the data from the A/D is coming in with a constant rate, regardless
of whether the host is ready to read it or not, so real life BW will be
lower - or some data will get lost, because the FIFO on the Cypress will overflow, if the host does not read it in time.
To try to simulate a realistic scenario, where the data is coming in
at a constant rate, I wrote
bw_real.c
and
bw_real_fw.c
Here, the firmware tries to send data packets at regular intervals.
If the FIFO is not ready, it counts time until it's ready.
The host-side program measures the percentage of lost packets.
The desired data rate can be set with the value of the "del" variable
in the firmware.
Pavle S57RA has modified the bw_meter program so that it can be compiled
under either linux or windows:
bw_meter_v1.c
It uses the same firmware as "bw_meter" above.
Before compiling, comment out the inappropriate #define, according to the platform under which you will compile, for example to compile under windows:
#define win32 1
//#define linux 1
I do not plan to make other programs windows-compatible, but this one can
be used as an example of how to do it.
Bandwidth issues
As I have already mentioned several times, the biggest motivation to
sink my teeth into the USB mess was bandwidth hunger.
At first sight, everything seems fine, on my 3GHz Pentium IV machines,
the "bw_meter" (see above) routinely measures an average of about 30MB/s,
which is similar to what the USRP guys report.
However, that is only part of the story.
If you check the output of the bw_meter program, among other output, you can see
something like this:
Time between chunks, us:
499 500 625 501 499 501 499 501 499 501 500 500 500 499 500 503
877 4498 497 501 500 500 500 500 500 500 500 500 500 500 500 879
495 500 501 500 500 500 626 499 751 499 499 501 500 500 500 500
500 500 500 500 500 634 492 499 500 500 500 501 499 501 500 500
500 500 500 500 500 499 500 500 500 500 500 501 500 625 500 500
500 499 500 500 500 500 500 501 500 500 500 625 500 500 500 499
501 500 500 878 497 500 500 626 499 509 491 500 500 559 566 625
500 500 500 500 500 501 500 499 500 837 538 500 500 500 501 500
500 500 500 500 499 500 500 500 626 500 500 500 500 500 500 500
752 497 500 501 499 501 500 500 500 500 500 500 500 501 498 501
500 500 500 500 500 500 500 501 625 499 499 501 499 501 500 500
500 500 500 501 500 500 500 507 492 500 500 500 501 499 500 627
498 501 500 499 500 500 500
These are the times between the usb_bulk_read() calls. The above example is for
an average rate of 30MB/s and a chunk size of 8192.
Most of the time, we get about the expected value of chunksize/bitrate, but on a
few occasions, the time is longer! (the bw_meter will report the longest wait
it has encountered, and the associated maximum possible "lossless" data rate
as the "worst case")
Obviously, this is the result of working on a multitasking system, which
sometimes has "other things to do"!
When reading data from a device that can serve it "on demand", like a memory
card, this is no problem. However, my main interest is in applications where
the data comes in at a fixed rate and will be lost if not read in time.
With double buffering on the Cypress, when the delay is twice the expected
value, (or four times with quadruple buffering), data loss will occur.
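The rule of thumb above (trouble starts when a gap reaches the buffer count times the expected gap) can be applied to a list of measured gaps. This is a sketch with invented names, not part of bw_meter; the nominal gap and the number of on-chip buffers are passed in by the caller.

```c
#include <stddef.h>

struct gap_stats {
    unsigned longest_us;   /* longest gap seen                     */
    size_t overruns;       /* gaps at or above buffers * nominal   */
};

/* Scans the gaps (in microseconds) and counts those long enough to
   overflow `buffers` on-chip buffers at a nominal gap of nominal_us. */
struct gap_stats analyse_gaps(const unsigned *gap_us, size_t n,
                              unsigned nominal_us, unsigned buffers)
{
    struct gap_stats s = {0, 0};
    unsigned limit = nominal_us * buffers;
    size_t i;

    for (i = 0; i < n; i++) {
        if (gap_us[i] > s.longest_us)
            s.longest_us = gap_us[i];
        if (gap_us[i] >= limit)
            s.overruns++;
    }
    return s;
}
```

With a nominal gap of 500us, the 4498us outlier in the table above counts as an overrun for both double and quadruple buffering, while the 877us gaps do not.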
Many applications, like broadcast HDTV reception etc., where the output
is intended for "human consumption", are not very critical about this, as
most people will tolerate a split second of blocky picture.
But I intend to use this for radio interferometry, where every skipped bit
can screw up the time alignment needed for correlation - so data loss must
be prevented at any cost.
To avoid data loss, at a given data rate,
only two things can be done: external FIFO buffering
or reduction of dead times.
The bw_meter calculates the approximate external buffer needed to prevent
data loss in the case of the longest delay it has encountered. This is
an optimistic (small) value, because bw_meter has most probably not caught
the longest possible delay on the system! (also, if two long delays occur
close together, you will need double the buffer - but luckily, the long
delays seem to be comfortably spaced...)
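A rough estimate of the buffer size is the amount of data that arrives during the longest gap. The sketch below uses that simple rate times delay product; it is not necessarily the formula bw_meter itself uses, and the function name is invented.

```c
/* Bytes arriving at rate_bytes_per_s during a gap of worst_gap_us
   microseconds, rounded up. 64 bit arithmetic avoids overflow of the
   intermediate product. */
unsigned long long buffer_bytes_needed(unsigned long long rate_bytes_per_s,
                                       unsigned long long worst_gap_us)
{
    return (rate_bytes_per_s * worst_gap_us + 999999ULL) / 1000000ULL;
}
```

At 30000000 bytes per second and the 4498us gap from the table above, this comes to 134940 bytes; allowing for two long delays close together doubles it to 269880 bytes.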
External buffering adds cost and complexity, so I would like to reduce the
need for it as far as possible. This means finding ways of reducing the
maximum delays, caused by task switching.
I haven't yet come very far in this respect. I have noticed that a lot of
X activity (like scrolling a window) will certainly make things worse.
I have tried various "runlevels" (init 3 and init S commands as root),
and there is some improvement, but long dead times still occur now and then.
The bw_meter also tries to pump up its priority (must be run from root
to do that), and again, there is some improvement, but long dead times still occur now and then.
So I will have to do more experiments, maybe compile the kernel with a higher
heartbeat rate, or use one of the "low latency" and "real time" kernels...
Stay tuned....