NTSC, PAL & Interlace Explained







The Motion Picture Camera & Cinema

Motion picture cameras are based on photographic film just like your everyday hand-held photographic camera. Hollywood movies use 35mm film, but professional cameramen often use 16mm and the home enthusiast will usually be content with 8mm. To record a movie, motion picture film is spun around a big reel inside a camera and exposed 24 times a second. As a result it captures 24 photographs, or what we call 24 "frames", every second (fps). Each frame is one complete photograph; it is not digitally stored or compressed - you could almost literally cut each one out and stick it in your family photo album if you wanted! Once the movie is made, the film is developed, placed onto a projector, and projected onto the cinema screen.

Resolution

In terms of resolution it's not really possible to compare 35mm film to a VHS or VGA resolution because, like any photographic film, its resolution comes from myriad tiny light-sensitive crystals embedded in the film. When these are struck by light they change colour to match the light that has hit them, producing a photo. Based on average crystal size, though, a 35mm frame works out to roughly 5000 x 5000 pixels. This is also the resolution Photoshop artists such as Craig Mullins use to create movie backdrops for the cinema. Nevertheless, the human eye can barely resolve the equivalent of 3000 x 3000 pixels over such a small area. So when a 35mm movie is scanned into a computer to try and get its full resolution for digital editing, it will be scanned in at 4096 horizontal pixels, also known as 4K.
 
 

Television

Television, on the other hand, is a whole other ball of wax! As you probably know, a TV screen is basically an empty glass box (or tube) with all the air sucked out of it. The inside front of this glass box is covered with a mesh of red, green and blue phosphor dots. At the back of the tube are three devices (called electron guns) that shoot three beams of electrons at these phosphor dots. When the electrons hit the dots they glow and a colour picture is produced. Increase the beam strength and you can brighten the amount of red, green or blue light produced at any part of the screen. This, in effect, allows the colours to mix into just about any colour and brightness imaginable. You might compare this to mixing coloured paints together to form new colours. Whatever way you look at it, this produces a colour picture that looks almost like real life.

Interlace

Next is the important point! To produce a picture, these electron beams are controlled by electromagnets to scan from side to side across the TV screen (as illustrated in the picture below). The beams fly across the screen in the same motion our eyes use when we are reading a book. They start from the left, finish one line and then shoot back to start the next line.

When TVs were invented in the 1920s, the type of phosphor used to produce the picture did not respond very fast. This meant it was impossible to get a picture in one shot; instead we would get a flickery strobing effect moving down the screen! To solve this they decided that instead of putting the lines on the screen one at a time (i.e. lines: 1, 2, 3, 4, 5) they would put them on every other line in one pass (i.e. lines: 1, 3, 5, 7, 9) and then in between the previous lines on the second pass (i.e. lines: 2, 4, 6, 8 etc.). This allowed a whole picture to be produced in two very fast scans and gave the slow phosphor dots enough time to recover. This, then, prevented any strobing effects from appearing - success! This process is called interlacing!
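If it helps to see the scan order written out, here is a tiny sketch in Python (using a made-up 10-line screen) comparing a straight top-to-bottom scan with the two-pass interlaced scan:

# Hypothetical example: the order lines are drawn on a 10-line screen.
LINES = 10

progressive_order = list(range(1, LINES + 1))     # 1, 2, 3, ... 10
field_one = list(range(1, LINES + 1, 2))          # first pass: 1, 3, 5, 7, 9
field_two = list(range(2, LINES + 1, 2))          # second pass: 2, 4, 6, 8, 10
interlaced_order = field_one + field_two          # two fast passes instead of one

print("Progressive:", progressive_order)
print("Interlaced: ", interlaced_order)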

Resolution

An analog TV's resolution refers to the number of horizontal lines displayed on the screen. This is broken up into the active and non-active areas. The non-active or blanked area (A) is not used for the actual television picture and is basically always 'blanked'. The signal information that would have been put here is often used for closed captioning, sync info or other information such as VITC. But obviously the bit we are interested in is the active part, which refers to where the actual picture will appear (B).
 
 

NTSC

The TV industry is dominated by two main standards for TV design: PAL and NTSC. NTSC is one of my pet hates, basically because of its rather low quality and use of weird framerates. NTSC stands for the National Television Systems Committee; it is the colour video standard used in the United States, Canada, Mexico and Japan. Some engineers have said it should stand for Never Twice Same Color because no two NTSC pictures look alike :). Due to the electric system used in the US it was decided to scan the lines across the NTSC TV screen at about 60Hz (60 fields, or half frames, per second), which produces 29.97 (more precisely 30000/1001) frames every second. NTSC resolution is about one sixth less than that of PAL - roughly 90 lines fewer. This may not seem so bad, but divide a sheet of paper into six even parts and chop one off of the bottom and you will have lost a lot of detail. NTSC uses 525 scan lines, numbered 1 through 525, of which only 485 or so make up the active picture.

Scan lines 1-20 are the vertical blanking interval for field 1. Note: in NTSC (525 scan lines), "line 1" is defined as the first line in the field blanking period of field 1. NTSC line counting is one-based, not zero-based.

Scan lines 21-263 are visible as field 1.
Scan lines 264-282 are the vertical blanking interval for field 2.
Scan lines 283-525 are visible as field 2.
Scan line 263, the last scan line of field 1, is a half-line which only appears on the left side of the screen, and on that side it is the bottom-most scan line, below scan line 525.
Scan line 283, the first scan line of field 2, is a half-line which only appears on the right side of the screen, above scan line 21.

So depending on whether one counts scan lines 263 and 283 integrally (as I do), or only as 0.5 of a scan line (as some people do), you either have 486 scan lines or 485 scan lines (see the tally sketched below).
Due to variances in CRTs, some visible scan lines may be cropped on less expensive CRT monitors.

(Added by John “Eljay” Love-Jensen)
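Here is a small Python sketch that tallies those figures, counting the half-lines 263 and 283 as whole lines as described above:

# NTSC scan-line accounting (half-lines 263 and 283 counted as whole lines).
field1_blanking = len(range(1, 21))      # lines 1-20
field1_visible  = len(range(21, 264))    # lines 21-263
field2_blanking = len(range(264, 283))   # lines 264-282
field2_visible  = len(range(283, 526))   # lines 283-525

print("Field 1:", field1_blanking, "blanked +", field1_visible, "visible")
print("Field 2:", field2_blanking, "blanked +", field2_visible, "visible")
print("Total scan lines:", field1_blanking + field1_visible + field2_blanking + field2_visible)  # 525
print("Active lines per frame:", field1_visible + field2_visible)                                # 486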
 

PAL

PAL stands for Phase Alternating Line; it is the TV standard used in Europe, Hong Kong and the Middle East. It was a new standard based on the old NTSC system but designed to correct the NTSC colour problems produced by phase errors in the transmission path. PAL resolution is 625 horizontal lines, but only 576 of these are used for the picture. PAL is higher quality than NTSC; it keeps a sharper picture and remains closer to the original format produced by motion picture cameras. Due to the European electric standards it was decided to interlace PAL lines at 50Hz, producing 25 whole frames every second.
 
 

TELECINE

This is the bit you've all been dying to read. Unfortunately I have not written this with a bunch of amazing solutions in mind. The idea is more to help you understand what is going on with your video so you can decide how you will process it better.

Just so you don't get confused, you should be clear on the difference between a frame and a field. A 'field' is basically every other scan line of a picture. Two fields stuck together make a single frame on a TV set! In the picture below only one field is displayed on the left. It's hard to see because only every other line is displayed. The picture on the right is a whole frame. It is produced when we stick both fields together.
 
THIS IS ONE FIELD (OR A HALF FRAME)
THIS IS TWO FIELDS (OR A WHOLE FRAME)

Single fields that start from line 1 of the TV screen are called 'odd' because they use the odd-numbered lines (i.e. 1, 3, 5 etc.). Fields that start from the second line to fill the gaps of the first are called 'even' because they use the even-numbered lines (i.e. 2, 4, 6 etc.). Fields that start from line 1 are also, more often, called "Top" fields because they start from the first, "top" line on the screen, whereas fields that start from the second line down are called "Bottom" fields. Okay, now everything you read should make perfect sense =)
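As a rough illustration, here is a minimal Python sketch (using made-up 'fields' that are just lists of scan lines) of what weaving a top and bottom field back into a whole frame amounts to:

# Hypothetical example: weave a top field and a bottom field into one frame.
top_field    = ["line 1", "line 3", "line 5", "line 7"]   # odd lines
bottom_field = ["line 2", "line 4", "line 6", "line 8"]   # even lines

def weave(top, bottom):
    """Interleave the lines of two fields into a single whole frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.append(t)
        frame.append(b)
    return frame

print(weave(top_field, bottom_field))
# ['line 1', 'line 2', 'line 3', ... , 'line 8']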
 
 

TELECINE

As I have already mentioned, a motion picture camera captures its images at 24 frames every second. Each frame is a full image. An NTSC television, however, must play 30 frames per second, and these frames must be interlaced into two fields, top and bottom! So basically what we are saying is we must play 60 half frames (or fields) every second. The only way we are going to be able to play a 24 fps motion picture on NTSC television is to change it from 24 fps to 30 fps and interlace these frames into two fields, making 60 half frames per second. This transformation is done with a machine called a Telecine. A Telecine machine does something called pulldown, which, in its simplest explanation, "pulls down" an extra frame every fourth frame to make five whole frames out of every four!
 
 

3:2 Pulldown NTSC

3:2 pulldown is a name that confuses people, basically because the term "pulldown" is rather ambiguous - in other words, it's not really pulling down anything! The process sounds complex but it's really quite straightforward and I have designed a picture to illustrate it. The top row in the picture below represents four frames from a motion picture camera. These are full frames, not yet interlaced, represented as A, B, C, D.

Now look at the second line in our picture below. The Telecine machine takes the first whole frame A and splits it into three fields (stop reading and take a look now). For the first field it uses the top field (T), which means it takes lines 1, 3, 5 etc. from the original digitized picture. The next field taken from A is the bottom field, so it takes lines 2, 4, 6 and so on. The third field we see labeled (Tr) is just a copy of the first field again (so I labeled it Tr to mean 'top repeated').

Now the Telecine machine goes on to the next frame B. This time it just takes the top and bottom fields. Then we move on to the third frame C; this is split up into three fields, bottom (B), top (T) and a repeat of the bottom one again (Br). Finally, the fourth frame D is split into the top and bottom fields. That's it, that is all a telecine machine does!

In short, this results in a field order of 3 fields, 2 fields, 3 fields, 2 fields! Or, if it's easier to understand, our picture above shows it as: 3 yellow, 2 green, 3 blue, 2 red.

So that is why it is called 3:2 pulldown: it goes in a sequence of 3, 2, 3, 2 and so on. It can be said to "pull down" whole frames and split them alternately into three fields and two fields. Finally, after the Telecine machine has finished the fourth frame D, it will start the process all over again with the next four pictures of the movie.

In short we end up with: 

At Ab At(r) / Bb Bt / Cb Ct Cb(r) / Dt Db

But because it always goes: top, bottom, top, bottom, top, bottom etc., we would just say it without indicating top or bottom fields. So instead of the above we would describe it as:

AAA BB CCC DD

Whichever way you look at it, in the end you get 5 whole frames instead of 4. This turns a 24 fps movie into a 30 fps movie!
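For the programmers among you, here is a minimal sketch of that 3, 2, 3, 2 cadence. It only generates field labels (the function name and labels are my own invention, not anything a real Telecine uses):

# Hypothetical sketch: generate the 3:2 pulldown field sequence for film frames.
def pulldown_32(frames):
    """Split film frames into fields in a 3, 2, 3, 2 cadence."""
    fields = []
    for i, frame in enumerate(frames):
        if i % 4 == 0:                     # frame A: top, bottom, top repeated
            fields += [frame + "t", frame + "b", frame + "t(r)"]
        elif i % 4 == 1:                   # frame B: bottom, top
            fields += [frame + "b", frame + "t"]
        elif i % 4 == 2:                   # frame C: bottom, top, bottom repeated
            fields += [frame + "b", frame + "t", frame + "b(r)"]
        else:                              # frame D: top, bottom
            fields += [frame + "t", frame + "b"]
    return fields

print(pulldown_32(["A", "B", "C", "D"]))
# ['At', 'Ab', 'At(r)', 'Bb', 'Bt', 'Cb', 'Ct', 'Cb(r)', 'Dt', 'Db'] - 10 fields from 4 frames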
 
 

Interlacing the picture back together

Let's look at the picture again. Look at the third line down. Here we can see how these fields would be woven back together to produce whole pictures again, as we would see on a TV or computer screen. The top field of frame A is woven together with the bottom field of frame A. Then the repeated top field of frame A is woven together with the bottom field of frame B. The top field of frame B is woven with the bottom field of frame C. The top field of frame C is woven together with the repeated bottom field of frame C. And finally, the top field of frame D is woven together with the bottom field of frame D.

That's quite a mouthful to explain in words, but examine the picture - it should really explain itself. Since the fields are now stuck together into frames, instead of describing telecine by saying it uses top, bottom, top, bottom in the order:
 
AAA BB CCC DD
We would say:
AA AB BC CC DD

The change is only how we group the letters of course and means nothing more.
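Continuing that sketch, pairing the same field sequence two at a time shows how the AA AB BC CC DD grouping falls out:

# Hypothetical sketch: pair up the 3:2 pulldown fields into interlaced frames.
fields = ["At", "Ab", "At(r)", "Bb", "Bt", "Cb", "Ct", "Cb(r)", "Dt", "Db"]

frames = [(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]
for first, second in frames:
    print(first, "+", second)
# At + Ab, At(r) + Bb, Bt + Cb, Ct + Cb(r), Dt + Db  ->  AA AB BC CC DD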
 
 

A Weird Framerate

This is not quite the end of the saga. The old black and white TVs used to play back at a perfectly round 30 fps. But as usual NTSC found a way to destroy that perfection! With the introduction of colour TV it was decided (for technical reasons - to stop the new colour subcarrier interfering with the sound carrier) that the picture must be played back at 29.97 fps (59.94Hz), which is basically only 99.9% of its full speed. As a result NTSC movies still have the same number of frames they did when they were telecined, but they are played back at a fractionally slower rate.
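If you want the exact figure, it comes from scaling 30 fps by a factor of 1000/1001 - a quick check in Python:

# The NTSC colour frame rate is the old 30 fps slowed by a factor of 1000/1001.
ntsc_fps = 30 * 1000 / 1001
print(ntsc_fps)            # 29.97002997002997...
print(ntsc_fps * 2)        # 59.94005994... fields (half frames) per second
print(ntsc_fps / 30)       # 0.999000999... i.e. about 99.9% of full speed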
 
 

2:2 Pulldown PAL

PAL movies also get telecined, but not in the same way an NTSC movie does. A Telecine machine will use what is sometimes called 2:2 pulldown! This simply turns every frame into two fields so they can be played on a standard PAL television. The 24 film frames per second become 48 fields, which are then played on a TV set at 50Hz to produce 25 whole frames per second. So instead of going 3, 2, 3, 2, 3, 2 it goes 2, 2, 2, 2, 2, 2! This produces the fields:
 
At Ab / Bt Bb / Ct Cb / Dt Db
Or just:
AA BB CC DD

Again, a PAL movie will contain all the frames from a 24fps film with no additional ones, but it will still play those frames back faster, at 25 fps. In a manner of speaking it is just as correct (or wrong) to say that a PAL movie is 24 fps, because no frames have been added to it; they are just played back faster.
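A quick sketch of what that PAL speed-up means in practice, assuming a hypothetical two-hour film:

# Hypothetical example: effect of playing a 24 fps film at 25 fps (PAL speed-up).
film_fps, pal_fps = 24.0, 25.0
speedup = pal_fps / film_fps
print("Speed-up factor:", speedup)                           # 1.0416... (about 4% faster)

film_minutes = 120                                           # a hypothetical two-hour film
pal_minutes = film_minutes / speedup
print("Runs for about %.1f minutes on PAL" % pal_minutes)    # ~115.2 minutes

# Without pitch correction the audio would also rise in pitch by the same ~4%.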
 
 

INVERSE TELECINE (IVTC)

I think I'm correct in saying that there is no such thing as an Inverse Telecine machine :). But, as the name suggests, inverse telecine is a process that turns a 30 fps movie back into a 24 fps movie. Basically what it does is take out all those extra fields that were added to the movie to make it 30fps. It's about now that I start spluttering, because this is an awkward subject and I can't find any information on exactly how Inverse Telecine is performed! So instead I will describe what it looks like should be done, based on how it was telecined in the first place.

Let's go back to our picture! As you can see from the second row down, to turn the 24fps movie into 30fps we had to separate the pictures into 10 single fields (or half frames) by adding two fields that shouldn't normally be there. Counting from left to right, all we would need to do to turn our 10 fields back into 8 fields (to turn 30 fps into 24fps) is to delete fields 3 and 8. Remember we are talking fields here, not frames.







But taking out fields 3 and 8 would produce a movie that has a field order of: top, bottom, bottom, top, bottom, top, top, bottom! Since you cannot weave together two bottom fields or two top fields, we would need to swap them around. So imagine the order of the fields as:
 
Fields 1, 2: T, B
Fields 3, 4: B, T
Fields 5, 6: B, T
Fields 7, 8: T, B

To get the correct order we must change them to: 
 
Fields 1, 2: T, B
Fields 4, 3: T, B
Fields 6, 5: T, B
Fields 7, 8: T, B

Which gives us an order of: 1, 2, 4, 3, 6, 5, 7, 8 which should theoretically fix everything.
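Putting the two steps together, here is a rough Python sketch of this 'looks like it should work' inverse telecine (the field labels are the same made-up ones as before):

# Hypothetical sketch of the inverse telecine described above:
# drop the repeated fields (positions 3 and 8), then swap any pair that
# comes out bottom-before-top so every frame is top, bottom again.
fields = ["At", "Ab", "At(r)", "Bb", "Bt", "Cb", "Ct", "Cb(r)", "Dt", "Db"]

# 1. Delete fields 3 and 8 (counting from 1).
kept = [f for i, f in enumerate(fields, start=1) if i not in (3, 8)]
# kept: At Ab Bb Bt Cb Ct Dt Db  ->  T B B T B T T B

# 2. Swap any pair that starts with a bottom field.
frames = []
for i in range(0, len(kept), 2):
    first, second = kept[i], kept[i + 1]
    if first.endswith("b"):                 # bottom first: swap to top, bottom
        first, second = second, first
    frames.append((first, second))

print(frames)   # [('At','Ab'), ('Bt','Bb'), ('Ct','Cb'), ('Dt','Db')] - 4 whole frames again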
 
 

The Framerate Mystery Unraveled! 23.976 / 24 / 29.970 / 30

If the only framerates we use are 24, 25 and 29.97, then why do people speak of using 23.976? This is to do with how the movie has been created. A 25 fps movie still has the same number of frames as a 24 fps movie because none have been added, but a PAL television nevertheless plays them back at 25 fps. This makes the PAL movie play back in a slightly shorter time and means the audio will be slightly out of sync. To compensate for this, when a movie is telecined a 'pitch correction' is applied which speeds up the audio to match the playback speed; in the case of PAL this means a pitch correction of about 4%.

A movie that has been through 3:2 pulldown telecine contains 30 frames for every second of film. But an NTSC television will play them back slightly slower, at 29.970 fps (59.94Hz). The actual number of frames hasn't changed; none have been added or taken out! Here is where the 23.976 part comes in. If we inverse telecine a true 30 fps movie we end up with 24 fps. But if we inverse telecine a 29.970 fps movie, because it runs at a slightly slower speed, instead of getting 24 fps as we should, we end up with the slightly slower rate of 23.976 fps.
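The arithmetic, for anyone who wants to check it:

# Telecined NTSC playback rate and the rate you get back after inverse telecine.
telecined_fps = 30 * 1000 / 1001        # 29.97002997... as played on an NTSC TV
after_ivtc = telecined_fps * 4 / 5      # remove one frame in every five
print(after_ivtc)                       # 23.976023976... instead of a round 24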
 
 

PROBLEMS WITH INTERLACED MOVIES

Interlaced movies look fine on a standard TV but they appear terrible on a PC monitor!? Let's take a look at our example one last time to see why. Look at the last row, where it shows how the top field of frame B is interlaced together with the bottom field of frame C.

We are getting the top and bottom fields from two completely different frames!! Imagine taking half of one picture and half of the next and trying to put them together into a single picture - it's impossible! On a PC this produces what we see below. Here we have Star Trek's William Riker walking across the room from left to right. Notice that the top field from the previous frame shows him a little to the left and the bottom field of the next frame shows him a little to the right. This is what produces this combing effect, and no amount of shifting the lines to the left or the right will fix it!

Inverse Telecine Troubles

Look at our illustration one more time. A 3:2 pulldown movie can also be encoded as 2:3, which produces exactly the same result but done backwards - instead of getting 3, 2, 3, 2 we get 2, 3, 2, 3! But this distinction hardly matters, because a 3:2 pulldown movie can be cut and edited after it is made, so the very first frame doesn't always start with the top field of A anyway! It could, for example, start with the next one across - the bottom field of A. In fact, it could start with absolutely any of the 10 fields in the sequence!

Hence as far as I can see there must be at least 10 ways to perform inverse telecine: five assuming the first field is a top field and five assuming it is a bottom field. Let me know if you know exactly how IVTC works and I'll update this article to explain it better.
 
 

OTHER ISSUES

Some of the specials and extra features on a DVD seem to have been recorded from a telecined 29.970 fps source! This means the footage was edited as an interlaced picture on a computer and then re-encoded interlaced again! There is absolutely no way to fix such a problem because the combed lines are now literally a part of the original picture. For example, I have taken a frame from the trailer of one DVD and separated the fields into two. When I squash all the lines together from one single field I get the following picture:

Of course, I could be completely mistaken about this, but that is what appears to be the case.
 
 

Capture Cards

Most of the graphics cards, TV tuners and video capture hardware we use to record video to the PC will not perform any kind of IVTC. Neither do they seem to give a damn what order they whack the TV fields together in. This means that regardless of whether you use PAL or NTSC, if you want to capture any video footage above 240 pixels high (for NTSC; PAL is 288) you will get at least some interlace problems! When you are capturing below 240 pixels the capture card will only use one field, and hence interlace problems are almost impossible. If your capture card can get a larger picture without problems, check the instructions to see how it's doing it! You may find to your horror that it is actually just capturing at 240 and enlarging the picture after it has been captured. This is obviously a serious waste of space!
 
 

Deinterlace filters

Since performing inverse telecine (IVTC) to turn a 30 fps movie back into a 24 fps movie is so awkward, a few alternatives have been designed to work on just about any movie. There are only two types that I know of:

Bobbing: To bob basically means to enlarge each field into its own frame by interpolating between the lines. So from one field we produce a full frame. Because the top fields are a line higher than the bottom fields the image may appear to "bob", but this is usually fixed by nudging the whole frame up or down a pixel. You are only really getting half the resolution with bob, but the interpolation is usually very good quality. If you are stuck for a way to bob your video, my AVISynth guide offers a bob feature - check it out Here.
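Here is a minimal sketch of the bobbing idea using NumPy - simple line doubling with averaging between neighbouring lines; real bob filters use better interpolation than this:

import numpy as np

def bob_field(field):
    """Enlarge one field to full frame height by interpolating between its lines."""
    h, w = field.shape
    frame = np.zeros((h * 2, w), dtype=field.dtype)
    frame[0::2] = field                                   # keep the original field lines
    frame[1:-1:2] = (field[:-1] + field[1:]) / 2.0        # interpolate the missing lines
    frame[-1] = field[-1]                                 # repeat the last line
    return frame

field = np.arange(12, dtype=float).reshape(4, 3)          # a tiny made-up 4-line field
print(bob_field(field))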

Blending: Flask Mpeg's deinterlace filter looks for the parts of a picture where the two fields do not match and blends the combing effect together. The lower the threshold, the more the two parts are blended and the less of a combing effect appears. The problem with this method is that the final picture can quite often end up a bit blurrier.
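And a rough sketch of the blending idea, again in NumPy: wherever neighbouring lines differ by more than some threshold (my own made-up number, not Flask Mpeg's), average them to soften the combing:

import numpy as np

def blend_deinterlace(frame, threshold=20.0):
    """Where a line differs strongly from the line above it (combing), blend the two."""
    out = frame.astype(float).copy()
    diff = np.abs(out[1:] - out[:-1])                 # difference between adjacent lines
    mask = diff > threshold                           # likely combing artefacts
    blended = (out[1:] + out[:-1]) / 2.0
    out[1:][mask] = blended[mask]                     # replace only the combed pixels
    return out

frame = np.random.randint(0, 256, (8, 6)).astype(float)   # a tiny made-up frame
print(blend_deinterlace(frame))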
 
 

DVD & TELECINE

DVDs offer a strange twist to the whole Telecine and 3:2 pulldown business. Almost all DVDs have the movie stored as whole pictures at 24 fps. This is the original format of the film, with no Telecine. At the start of every Mpeg-2 DVD file there are certain header codes that tell the player how to play back the DVD. Since the movie is stored digitally, the fields or frames can be handed to the hardware or software in any order the disc likes. It can split each frame into two fields and perform telecine on the fly. To do this there are three flags that can be applied in the header code: RFF (repeat first field), TFF (top field first) and FPS (frames per second).

For a PAL DVD the FPS flag can be set to 25 and the DVD will send the picture information to the hardware at 25 fps instead of 24 fps as is stored on the DVD.

For NTSC DVDs the movie needs to be 29.970 fps, so the FPS flag is set to 29.970. But this alone looks odd because the movie is over far too soon. Imagine it like playing cards: if you throw 4 cards on the floor every second the whole pack will be finished in half the time than if you threw 2 cards onto the floor. The solution is to telecine the movie with 3:2 pulldown to increase the number of "cards" we have to start with. To do this, the RFF and TFF flags are set in the header code. By setting the DVD to repeat the first field you make the video display the fields in the order 3, 2, 3, 2. By setting the TFF flag you set the DVD to start from the top field so the order always goes: top, bottom, top, bottom.
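Here is a simplified sketch of how a player might expand stored progressive frames into fields using RFF/TFF-style flags. The function and the fixed four-frame cadence are my own simplification; a real MPEG-2 stream carries these flags on every picture:

# Hypothetical sketch: a player expanding stored progressive frames into fields
# using repeat-first-field (RFF) and top-field-first (TFF) style flags.
def play_with_soft_telecine(frames):
    # The 3:2 cadence expressed as (top_field_first, repeat_first_field) per frame:
    cadence = [(True, True), (False, False), (False, True), (True, False)]
    fields = []
    for i, frame in enumerate(frames):
        tff, rff = cadence[i % 4]
        first, second = (frame + "t", frame + "b") if tff else (frame + "b", frame + "t")
        fields += [first, second]
        if rff:
            fields.append(first + "(r)")   # the extra, repeated field
    return fields

print(play_with_soft_telecine(["A", "B", "C", "D"]))
# 10 fields from 4 frames: the same 3, 2, 3, 2 cadence as a hardware telecine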

Theoretically then, it should be possible to patch the header code of a DVD's Mpeg-2 file and make it play back at 24 fps instead of the 29.970 fps! In fact some people have made patches to do this, but so far, for another unknown reason they are very unreliable and the video turns out just as bad!

Progressive and Interlaced together!

I don't think I have mentioned what a progressive image is yet? A progressive image is a whole frame that is not interlaced. Motion picture cameras capture images that are progressive. They are not telecined or split into separate fields. Computer monitors do not need to interlace to show the picture on the screen like a TV does; they draw the lines one at a time in perfect order, i.e. 1, 2, 3, 4, 5, 6, 7 etc.

Many DVDs are encoded as progressive pictures, with interlaced field-encoded macroblocks used only when needed for motion. Flask Mpeg tries to take advantage of this fact: if you set it to 24 fps (or 23.976) it will give you the option to reconstruct progressive images. This does not perform any deinterlacing on the video; it ignores all the flags and just reads the DVD one progressive image at a time.

This is another confusing issue for me. I have no idea how a DVD movie can be both interlaced and progressive other than by the fact that a progressive movie can be played back as interlaced due to control flags. If I learn any more about this I will update my articles accordingly.
 
 

VHS, VCD & DVD

To finish, perhaps it would be nice to say a few words about the video formats too. It wasn't long after TV that VHS video recorders appeared on the scene, and a while later the Video CDs did. Of course, there were other video formats, but VHS (Vertical Helical Scan) and MPEG (Moving Picture Experts Group) won the battle, at least as far as home video was concerned. This is a little strange really, because Sony's Betamax video was probably the better quality! Anyway, all video formats to date have required one form of compression or another to be able to record the huge quantities of information needed to store full motion video.

VHS 

VHS video is stored just like audio, on a reel of plastic tape impregnated with ground-up iron oxide. This plastic tape is spun in front of an electromagnet that replicates the strength of the TV's electron beams as they scan across the screen. This causes magnetic 'kinks' in the iron particles of the tape that are almost identical to the original TV signal. Reversing this storage process produces the image back on the TV screen. The signal is simplified before it reaches the tape, making it take up less space.

As anyone who has ever used video tape knows, it soon loses quality. It appears grainy, loses colour accuracy and starts to produce white glitches and audio waver - a better solution was needed.

MPEG-1

As computer technology advanced, CD-ROM video formats became popular and the Moving Picture Experts Group designed a compression format that could store over an hour of VHS-quality video on a single CD-ROM. This soon became very popular in the east but never truly caught on anywhere else, because recording it was difficult and slow and the quality was not really any better than normal VHS anyway. The big advantage of Mpeg-1 video was that it was almost impossible for it to lose picture quality the way a VHS videotape does! It could last perhaps over 100 years of use without any noticeable degradation of image quality!

MPEG-2

Since (at the right bitrate) Mpeg-1 was able to produce TV-quality pictures superior to VHS, the Mpeg organization decided to design another version that combined Mpeg-1-style compression with interlaced images so it could be used for TV broadcasts. This format was called Mpeg-2. Other features were added to Mpeg-2 to make it compress slightly better at higher quality, but the main difference was the addition of interlace support.

Mpeg-1 VideoCDs showed that CD-based digital video was not only a viable option but a very preferable one, provided the storage space was enough. When CD-ROM designs were upgraded to be able to store 4.38 gigabytes or more of information, it was decided that these new discs would be the new storage media for video. The format was called DVD, meaning Digital Video Disc, although this was later changed to Digital Versatile Disc because it was 'versatile' enough to hold other data besides video.
 
 

Resolutions

Resolutions are an important issue for amateur video enthusiasts who want to capture their video at full TV quality. Professional video editors are told to capture at 640 x 480 pixels for highest quality. But a PAL TV resolution is 576 lines. Then we have the Mpeg group saying that 352 x 288 is the full VHS video resolution! The problem seems to lie in the fact that it's hard to equate a TV resolution with a computer image. The TV picture is built up of lines but the dot definition is rather "fuzzy" looking. So rather than me rattling on about the pros and cons here, I will merely end this article by quoting what the Ligos corporation (the creators of the LSX Mpeg-2 encoder) say in regard to this subject:
 
"The resolution of computer video, however, doesn't generally equate to the video world of televisions, VCRs, and camcorders. These devices have standards for resolution that are generally focused on the horizontal resolution (the number of scan lines from top-to-bottom that make up the picture). Here are some numbers for comparison:
 
Video Format      Horizontal Resolution
Standard VHS      210 Horizontal Lines
Hi8               400 Horizontal Lines
Laserdisc         425 Horizontal Lines
DV                500 Horizontal Lines
DVD               540 Horizontal Lines

With these numbers in mind, it is important to remember this rule when bringing the worlds of computer and video together: the quality of an image will never be better than the quality of the original source material. 

We suggest capturing at a resolution that most closely matches the resolution of the video source. For video sources from VHS, Hi8, or Laserdisc, SIF resolution of 352x240 will give good results. For better sources such as a direct broadcast feed, DV, or DVD video, Half D1 resolution of 352x480 is fine. There are other advantages to following these guidelines. Your files will be smaller, consuming less space on the hard drive or on recordable media like CD-R and DVD-RAM. You'll also be able to encode more quickly". 


