Jan 22 2009 7:36AM
When Jim Williams showed me his great article about diode turn-on time causing failures in switching regulators, he mentioned that the best article about sampling, or equivalent-time, oscilloscopes was an old Tektronix white paper from 1964. I went home and typed “Sampling Notes” and “Tektronix” into Google, and the document was not available online. So Jim was nice enough to lend me his original copy. I scanned it into my computer, and rather than just have a giant pdf of page images, I ran optical character recognition (OCR) so that the text of the document is actual text as opposed to an image of the text. This makes the file size much smaller and also makes the entire document text searchable by Google. I saved each of the pages as rtf (rich text format) documents. I then rescanned all of the figures and charts and compressed them into small gif files. I linked those gif files into the rtf documents. I then used Acrobat 4.0 to make pdfs and stitch all the separate pages together. I added a scan date and revision number on the title page. I sent the file to the wonderful folks at Tektronix, and they were nice enough to post it on their site. They added a copyright note on the last page.
Old digital is obsolete digital, but old analog is every bit as good as new analog. This was a ton of work that I did on my own time; since it really is Tektronix’s property, I could not justify doing it for EDN. I hope you folks appreciate Jim donating the original, Tek posting the file, and my work scanning, OCR-ing, and making the pdf. I put a lot of effort into making the file small. The cover page and contents page are scanned images, so if you want to make the pdf really small you can strip off those two pages, but I really like to see them since they add to the character of the document.
What I learned from this is really helping me scan in Jim’s old articles for EDN from the 1970s and 1980s. This is also a personal-time effort, so I appreciate your patience. In case anyone wonders about the details, I used my new Canon CanoScan 8800F. I set it for 600dpi black-and-white and scanned the image into Photoshop 6.0. I then saved each page as a gif using the “save for web” feature. I used the free ScanSoft OmniPage SE 4 that comes with the scanner for the OCR. It seems to be an update of the old TextBridge software I used to use. I brought in each page individually. I did not let the program auto-recognize the pictures and text; instead, I used the selection tools to highlight the text and then the pictures. The OCR was pretty much perfect, other than some headaches with the subscripts and the math. (This is what is making Jim’s old articles so time-consuming to do.) Scanning the page at 600dpi really seems to improve the OCR accuracy.
I saved each page as an rtf file. These files were rather small, maybe 200 to 300kB, since ScanSoft is smart enough to save images as compressed. My Word 2000 would immediately blow the file size of each page up to 2.4MB, since Microsoft thinks that saving images as compressed should only be allowed in PowerPoint. I rescanned the images at 300dpi. (I could have converted, but I did the first scan off a Xerox copy and then Jim found the original document, so I wanted to rescan anyway.) I did this scan into Photoshop 6.0 at 300dpi black-and-white. I then used the wonderful free IrfanView to crop out each figure and used IrfanView’s save-for-web plug-in to make very compact two-color gif images of all the figures.
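For a sense of scale on the 600dpi and 300dpi black-and-white scans described above, here is a rough back-of-the-envelope sketch in Python. It assumes a letter-size page (8.5 x 11 inches), one bit per pixel, and rows padded to whole bytes; the function name `raw_scan_bytes` is made up for this illustration and has nothing to do with the scanner software. It only estimates the uncompressed bitmap, which is the starting point before gif compression does its work.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of a scanned page.

    Assumption: each pixel row is padded up to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # round up to whole bytes
    return row_bytes * height_px


# Compare the raw size of a letter-size page at several scan resolutions.
for dpi in (600, 300, 150):
    size = raw_scan_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size / 1e6:.2f} MB uncompressed")
```

Halving the resolution cuts the raw pixel count to roughly a quarter, which is one reason the 300dpi figure scans are so much easier to squeeze down than the 600dpi page scans used for OCR.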
The most time-consuming job was to go into each of the rtf files and correct all the font types and sizes and paragraph styles and text-box positions. I also forced the pages to be 8.5 x 11 inches in the Word 2000 program. I then deleted the existing giant figures and inserted the figures from my re-scanned gif files, using the little-known “link to image” feature in Word 2000. You can see this when you go to insert a picture; there is a little arrow on the “Insert” button. When you click on the arrow, Word lets you link to the image file rather than bringing it into the document and uncompressing it. This made the Word 2000 rtf files only 20kB instead of megabytes, and the gif figures ranged from 10kB to 20kB. The one bit of typical Microsoft incompetence is that once you close the rtf file and open it again, Word 2000 forgets the sizing of the figure images, so you have to drag them out to the proper size. So before closing the rtf files, I made a pdf using the Acrobat 4.0 printer driver. I did not use Distiller since this was already turning into a life’s work.
The front cover I scanned into Photoshop 6.0 at 300dpi color, and saved-for-web as a low-res jpeg. By resizing the image to half-resolution, the dithering algorithms in Photoshop smooth the image over so the jpeg is much smaller; I got it down to 227kB. (I learned this from scanning old EDN pages. Since it is a screen-printed magazine page, the gifs would be rather large for all the figures. If you resample the image by reducing its size from 600dpi to 300 or even 150, the separate dots of a screen-printed image smooth over. Then I used IrfanView to save-for-web as a two-color or three-color gif, depending on how many colors were in the original figure. This made the magazine page figures go from hundreds of kB to 10 or 20kB.) As you would expect, making a pdf of an already compressed image does not reduce things much. I was making all the pdfs at screen resolution, so the pdf of the front cover was 215kB.
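The way resampling smooths over the separate dots of a screen-printed image can be illustrated with a toy sketch in Python. This is not what Photoshop or IrfanView actually do internally (their resampling filters are not described here); it swaps in a plain 2x2 box average as a stand-in, and the 8x8 checkerboard “halftone” is a made-up test pattern.

```python
def downsample_2x(pixels):
    """Halve the resolution of a grayscale image by averaging 2x2 blocks.

    `pixels` is a list of rows, each a list of values from 0 to 255.
    Box averaging is an assumed stand-in for a real resampling filter.
    """
    out = []
    for y in range(0, len(pixels) - 1, 2):
        row = []
        for x in range(0, len(pixels[y]) - 1, 2):
            total = (pixels[y][x] + pixels[y][x + 1]
                     + pixels[y + 1][x] + pixels[y + 1][x + 1])
            row.append(total // 4)  # integer mean of the four pixels
        out.append(row)
    return out


# Hypothetical halftone patch: alternating black and white dots, one pixel each.
halftone = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
smooth = downsample_2x(halftone)

# Before: only pure black and pure white. After: a single flat gray level.
print(sorted({v for row in halftone for v in row}))
print(sorted({v for row in smooth for v in row}))
```

In this toy case the dot pattern collapses into a flat gray area at half the resolution. A flat area with few distinct levels is the kind of content that reduces well to a two-color or three-color gif, which fits the size drop described above.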
The table of contents was 68kB as a jpeg and 64kB as a pdf. The gifs I made for all the figures ranged from 6kB for Fig 25 to 40kB for Fig 7. If you keep the OCR-rtf and figure-gif files in the same subdirectory, then Word 2000 can find the figures when it opens the rtf files, even if you have moved the directory to a different computer or path. The rtf files were delightfully small, from 10kB to 20kB on average. All I added was a scan date and revision number on the title page. Once I had all the pdf files for each page, including a pdf of the cover page image and the table of contents image, I opened the cover pdf in Acrobat 4.0 (writer), added in all the other pages, and saved it as a master document. One real benefit of creating and assembling the pages in Acrobat 4.0 instead of the newer Adobe crash-ware is that the resultant pdf files can be opened by most any reader, even older ones.
If I had time to do a rev 2.0, I would spend even more time in the rtf files arranging all the columns and margins so they were exact. I am giving the originals to Tektronix since it is really their property; maybe they can give a summer intern that job. A big thank you and shout-out to Amy Higgins, Tek PR wunderkind, who helped get the files posted on the Tek site.