Master Scanned Book Processing: Acrobat Pro: Comprehensive Guide: Optimal Efficiency, Searchability

  • Added: 19 Aug 2024
  • Processing a large PDF of a book scan using Adobe Acrobat Pro DC (2017). Bookmarks, table of contents, Optical Character Recognition (OCR), Searchable Text, Editable Text and Images, Embedded Index. The PDF file contains 249 pages.
    BEST TIPS! SAVE TIME! BEST RESULTS!
    digitize.mosie...
    *Optimizing Scanned Book Processing in Adobe Acrobat Pro with Peter*
    In this video, I'll guide you through the meticulous steps I take to process a scanned book in Adobe Acrobat Pro, aiming to create a highly usable, readable, and compact PDF document. The process begins with the creation of an electronic table of contents using bookmarks, ensuring seamless navigation. To expedite this, I share some shortcuts, like capturing a screenshot of the table of contents for quick reference.
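
The bookmark tree described above is, in effect, a nested list of (title, page label) entries. As a rough illustration only, and not something shown in the video, the Python sketch below turns a typed-up table of contents into such a nested structure, which can serve as a checklist while creating bookmarks by hand in Acrobat. The `Title | page` line format, the two-space indentation rule, and the sample chapter titles are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    title: str
    page_label: str
    children: list["Bookmark"] = field(default_factory=list)


def parse_toc(text: str, indent: int = 2) -> list[Bookmark]:
    """Build a nested bookmark tree from indented 'Title | page' lines.

    Assumes every non-blank line contains a '|' separating title and page.
    """
    roots: list[Bookmark] = []
    stack: list[tuple[int, Bookmark]] = []  # (depth, bookmark) of open ancestors
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        title, _, page = raw.strip().rpartition("|")
        node = Bookmark(title.strip(), page.strip())
        # Close ancestors that sit at the same depth or deeper than this entry.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        (stack[-1][1].children if stack else roots).append(node)
        stack.append((depth, node))
    return roots


# Hypothetical table of contents, for illustration only.
SAMPLE_TOC = """\
Cover | Cover-1
Preface | ix
Part One | 1
  Chapter 1 | 3
  Chapter 2 | 27
Index | 215
"""

outline = parse_toc(SAMPLE_TOC)
```

Here `outline` holds four top-level entries, with the two chapters nested under "Part One", mirroring the nested hierarchy that the Bookmarks panel supports.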
    *Efficient Page Labeling and Initial View Settings Adjustment*
    Next, I meticulously adjust page labels to align with the original book's page numbers, guaranteeing precision in the table of contents. I also demonstrate how to modify the document's initial view settings, making the bookmarks panel visible upon opening. These small tweaks enhance user experience and document accessibility.
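
To make the idea of page labels concrete: a PDF can attach a label scheme (a numbering style, an optional prefix, and a starting number) to each run of pages, so that the label shown in the viewer matches the number printed on the paper page. The Python sketch below models that mapping under assumed page counts; the `LabelRange` type and the sample layout (two cover pages, ten front-matter pages, then the body) are hypothetical and only illustrate the concept, not Acrobat's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelRange:
    first_index: int   # 0-based physical page where this range begins
    style: str         # "decimal", "roman", or "none"
    prefix: str = ""
    start: int = 1     # number shown on the first page of the range


def to_roman(n: int) -> str:
    """Lowercase Roman numeral for a positive integer."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(index: int, ranges: list[LabelRange]) -> str:
    """Return the label for a 0-based physical page index."""
    # The applicable range is the last one that starts at or before this page.
    current = max((r for r in ranges if r.first_index <= index),
                  key=lambda r: r.first_index)
    number = current.start + (index - current.first_index)
    if current.style == "roman":
        return current.prefix + to_roman(number)
    if current.style == "decimal":
        return current.prefix + str(number)
    return current.prefix


# Hypothetical layout: 2 cover pages, 10 front-matter pages, then the body.
RANGES = [
    LabelRange(0, "decimal", prefix="Cover-"),
    LabelRange(2, "roman"),
    LabelRange(12, "decimal"),
]
```

With this layout, the first two physical pages are labeled Cover-1 and Cover-2, the front matter runs i through x, and the thirteenth physical page becomes page 1, so a bookmark for "page 3" of the paper book lands on the right scan.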
    *Enhancing Text Searchability with Optical Character Recognition (OCR)*
    I delve into the Optical Character Recognition (OCR) process, transforming scanned pages into searchable and editable text. I share insights into the advantages of multiple OCR passes, emphasizing the achievement of a smaller file size and a cleaner final result. Efficiency and quality are at the forefront of this critical step.
    *Streamlining Document Search with Embedded Index*
    Following the OCR process, I embed an index to streamline document searching, making it more efficient for users. I explain my approach to saving both searchable and editable text versions, carefully considering file sizes. The video explores the quality and file size comparison of different OCR versions, culminating in the selection of a smaller, sharper, and more efficient editable text version for future use.
    *Organizing Files and Implementing Backup Workflow*
    Concluding the digitization process, I demonstrate how I organize files and archive backups. This step is crucial for maintaining an efficient workflow in digitizing my library. The emphasis is on keeping files organized, accessible, and secure, ensuring a smooth transition to a digital library.
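
For readers who would rather script the cleanup than do it by hand, here is a minimal Python sketch of one possible version of that step. It is not the procedure from the video: the function name, the folder arguments, and the decision to back up the finished file before deleting the original are all assumptions for illustration.

```python
import shutil
from pathlib import Path


def finalize_book(original: Path, editable: Path, final_name: str,
                  watch_folder: Path, backup_folder: Path) -> Path:
    """Archive a backup, hand the small file on, then drop the large original."""
    if not editable.is_file():
        raise FileNotFoundError(f"missing processed file: {editable}")
    backup_folder.mkdir(parents=True, exist_ok=True)
    watch_folder.mkdir(parents=True, exist_ok=True)

    # Keep a backup copy of the finished PDF before anything is deleted.
    shutil.copy2(editable, backup_folder / final_name)

    # Rename the small file and move it to where the library tool looks.
    destination = watch_folder / final_name
    shutil.move(str(editable), str(destination))

    # Only now remove the large original scan.
    original.unlink(missing_ok=True)
    return destination
```

The ordering is deliberate: the original scan is deleted last, so a failure while copying or moving leaves the source files untouched.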
    By following these comprehensive steps, you can optimize your scanned book processing in Adobe Acrobat Pro, creating PDF documents that are user-friendly, compact, and efficient in both navigation and searchability.
    Note for clarity: both "Searchable" and "Editable" versions are searchable; you can CTRL-F and find text in either document, and select+copy text from either document. The "Editable" version also lets you edit text, which I don't talk about in this video.
    1:05 Make electronic table of contents
    1:27 CTRL-B shortcut to create bookmark
    1:40 F2 shortcut to edit bookmark label
    2:30 Make a bookmark BOLD
    2:50 Screenshot the book's ToC using the Windows 10 Snipping tool
    3:05 Place ToC screenshot on the side to speed up subsequent steps
    3:16 Make PDF page labels match the paper book page labels
    4:05 Organize Pages function to change page labels; use prefix where needed (Cover-1, Cover-2, etc.)
    5:30 Refer to screenshot to jump from chapter to chapter, and CTRL-B to create a bookmark at each chapter
    6:25 Example of using nested hierarchy in Bookmarks
    7:12 Fast-forward creating all chapter bookmarks
    8:00 Double-check your work, look for typos and errors
    9:30 OCR description and overview
    11:00 Begin OCR using Searchable Text output
    12:35 Document properties: Initial Page View == Bookmarks Panel and page
    13:05 Document properties: Metadata Description
    14:12 OCR using Editable Text and Images output
    14:48 Add Embedded Index to speed up future document searches
    15:30 OCR Editable (continued)
    17:00 Explanation of why I prepare two different versions, using both "Searchable" and "Editable" OCR
    17:27 Compare file sizes:
    Original file size: 111 MB
    OCR 'Searchable' file size: 85 MB (76.5% of original file size)
    OCR 'Editable' file size: 15.7 MB (14.1% of original file size)
    17:47 Compare text quality of the OCR output. Editable is actually better quality (sharper, with no artifacts), even though the file size is much smaller, because it uses a scalable vector font.
    19:45 File cleanup: delete the original "fat" file, rename the small 15MB file and add it to the Calibre watch folder, so it gets added automatically to Calibre Library.
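
The size comparison at 17:27 is simple arithmetic, and the short Python sketch below reproduces it from the rounded megabyte figures listed above. The helper name is made up for this example. Note that with the rounded inputs the "Searchable" ratio comes out at about 76.6% rather than the 76.5% quoted, presumably because the quoted figure was computed from unrounded file sizes; that explanation is an assumption.

```python
def percent_of_original(size_mb: float, original_mb: float) -> float:
    """Size as a percentage of the original, rounded to one decimal place."""
    if original_mb <= 0:
        raise ValueError("original size must be positive")
    return round(100 * size_mb / original_mb, 1)


# Rounded sizes as listed in the chapter notes.
SIZES_MB = {"Original": 111.0, "Searchable": 85.0, "Editable": 15.7}

for name, size in SIZES_MB.items():
    pct = percent_of_original(size, SIZES_MB["Original"])
    print(f"{name:<10} {size:>6.1f} MB  {pct:>5.1f}% of original")
```

The same helper can be used to check any other book: if the "Editable" percentage comes out larger than the "Searchable" one, as some commenters report for image-heavy books, that is a signal to keep the "Searchable" version instead.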
    Music @ 7:12: Everything Nice by Jingle Punks, available from the CZcams Audio Library and "free to use" in monetized CZcams videos.
    Digitize your books, digitize your library, digitize your life. Scan and declutter.

Comments • 53

  • @stevenwoodfield1658 9 months ago +1

    Thank you so much for your videos! I've found them a perfect starting point for my own digitizing journey. Before I go on to the next step in the process, I come back to reference your videos. Thank you for saving me time and a headache, blessings to you!

  • @etiennedegaulle3817 3 years ago +2

    FYI, as of the December 2018 version of Acrobat DC, "the embedded index in the PDF is no longer used for searching." Hopefully they are adding an internal optimized index by default.
    Great video by the way! I just started digitizing most of my library. This video saved me lots of time and trial and error!

  • @larrythibodeaux7236 8 months ago +1

    Thank you so much for this! I bought the CZUR Shine book scanner, and it scans flat paper OK, but converting the text into a searchable PDF is not good at all. This is way better than CZUR! Adobe doesn't produce unknown characters, and it doesn't combine 2 words into 1. It also doesn't change the order of the paragraphs when you copy and paste. Thank you for showing me this!

  • @gabrielcastejon7914 2 years ago +1

    You're a great samaritan

  • @scotto0010 6 years ago +3

    This is an amazing set of videos. Very concise when it can be but then plenty of detail here in the final video where it is really needed. You have a very good system and seem to have thought about everything. I would love to pick your brain about this type of project but replace the word "books" with "magazines". There are a whole host of additional issues to deal with on the magazine side. Thanks for your time and effort.

    • @PeterMosier 6 years ago +1

      scotto0010 Thank you so much for the kind words. You’ve made my day! As for magazines: when I first started experimenting with and learning how to do this, I did a few magazines. The results weren’t quite as good as text-only books, which benefit from Clear Scan, but they were still pretty good. Some “moire” patterns in the photos, but I didn’t play enough to improve that totally. If you have a few questions, feel free to ask in this thread. Maybe I can help, or perhaps make a video about digitizing mags. Thanks again.

    • @scotto0010 6 years ago

      That would be great. The main things about magazines are:
      #1 How do you deal with yellowing of old magazine pages? Also, what is the best "dpi" to scan them at? Is it best to start big and down-sample from there? How would Clear Scan work with magazines, especially with colored pages and colored text? I suppose a lot of the questions, though, would entail the post-processing in Photoshop in order to make it look right. What size files would we be looking at for a "good", highly readable product? It seems that really only a few things would overlap between creating text-only books and picture/image-heavy magazines.

  • @MrsCalabresesTeachingChannel

    Great information! Thanks! Mac user here, I often find Adobe difficult to navigate, this helps!

  • @wjhyde 5 years ago +1

    Thank you for this video. Very helpful.

  • @rodolfo6168 2 years ago +1

    Recommend Downsample: 300 dpi

    • @DigitizeYourBooks 2 years ago

      I scanned at 300 dpi (see 10:15 in video). Are you recommending down-sampling to something lower than that? Thanks!

  • @lkj234 6 years ago

    Great content! Keep up the good work!

  • @UncleMatte 5 years ago

    Thanks for this video, it showed me LOTS of things I was doing wrong, and how to improve things way beyond what I was doing. I was hoping to ask you a REALLY long question, probably too long to put here. Is there any way to send it to you, or can I put it on my Dropbox account for you to read if you have a spare moment or two? .... I can barely use Facebook, no clue about Twitter or any of the "other ones" .... I can post it here, but it might bore everyone! Thanks Again !!!

  • @theanthropic8114 4 months ago

    Thanks for the tips. For my part, I found it odd that after combining my .tiff files using Adobe Acrobat DC Pro, then converting them to searchable images at 300dpi (originally 600dpi), then to editable text and images, the file size somehow increased from 9mb (after searchable images) to over 10mb (after editable text and images).
    Any idea why? Thanks.

    • @DigitizeYourBooks 4 months ago +1

      🤔 I have no idea why it swelled after editable text & images. Fortunately, going from 9MB to 10 MB is not a big difference. I wouldn't worry about it -- but I'm still wondering why it happened!

  • @larrythibodeaux7236 8 months ago

    Also, can you do a video of audiofying your books with the software balabolka? Meaning making them into audio books? And buying a text to speech voice like IVONA Amy voice?

    • @DigitizeYourBooks 4 months ago

      Ooohhh... this touches a nerve for me. I have produced some audiobooks the old-fashioned way, by performatively reading the text and then painstakingly editing the production. (voice.mosier.ca/). Automated text-to-speech (TTS) is destroying the low end of the audiobook narration vocation. Having said that, I might do a video on this topic just to compare the results to a human reader. Thanks for the suggestion!

    • @larrythibodeaux7236 4 months ago

      @DigitizeYourBooks Haha that sounds great!

  • @gr3yg0at 4 years ago +1

    I just finished digitizing another book. As I started to read through it I noticed Adobe Acrobat had changed some of the words. I compared it to the original scan and the paper book itself and confirmed words were being changed. What I have found is words are being changed during the step when changing text with the editable text and image option. I'm curious if anyone else is seeing this happen.

    • @DigitizeYourBooks 4 years ago

      I haven't noticed this but it may be possible. I have noticed where engineering graphs and drawings get modified during the "Editable text and Images" process. It is for this reason that I first do a conventional OCR, and then repeat the OCR using "Editable Text" -- just in case the "Editable" process messes up.
      My guess for what is happening: OCR is not perfect. And when using "Editable Text" method, the image of the word(s) is replaced by the OCR result. So if there is an error in the OCR, that error is now "baked in" the final text.
      Thanks for commenting. Cheers!

    • @gr3yg0at 4 years ago

      @DigitizeYourBooks I'm also curious why using the "editable text" more than doubles the file size. My book went from 22 MB to 88 MB. The book did have a lot of images, and I wonder if this is what's causing the jump in file size.

    • @DigitizeYourBooks 4 years ago +1

      Another CZcams viewer had the same issue. I suspect you are correct: images seem to be not handled well with Acrobat's "Editable Text and Images" option. I really, REALLY, wish Adobe would give us an option for "Editable Text" which doesn't try to make the images editable. In addition to file size growing, I have seen it mangle the images, but so subtly that it isn't obvious -- truly dangerous for an engineering textbook. That is the main reason that my process is two-step: (1) regular OCR and (2) Editable Text/Images OCR.
      Hope this info helps. Cheers!

  • @UncleMatte 5 years ago

    I tried to contact you through Twitter. What a nightmare: it endlessly looped me to "how to" do this and that, but no way to sign up and fix things! A bit dizzy, I'll try again later. Either way, your help is GREATLY appreciated!

  • @gr3yg0at 4 years ago +1

    Great video. When I followed this for a PDF I have, 578 pages, the final file size is more than double the original file size. It started out as a 43 MB file; after the first OCR "searchable" text pass, the file size was reduced to 31 MB. On the next OCR "editable" text pass, the final file size jumped to 116 MB. That doesn't seem right. I have gone through the process twice with the same results. Any ideas?

    • @DigitizeYourBooks 4 years ago

      Hmmmm, that is a real head-scratcher. I've not seen that before. Perhaps there is something unusual about this book? Perhaps lots of diagrams, that are more difficult to convert to "editable images"? Or maybe lots of weird fonts? Just a guess.
      This reminds me: I wish Adobe gave the option for "editable text" without also trying to create "editable images". I never want the images made to be editable, and have found for some engineering texts it really messes up parts of some images when making them editable (in dangerously subtle ways).
      The good news: because you followed my 2-step solution, you can now ditch the 2nd (larger) version, knowing that it is needlessly big. That is why I always do it as a 2-step: in case there is a problem with the "editable text" version. Cheers!

    • @gr3yg0at 4 years ago +1

      This book does have a lot of pictures and illustrations. Since you have mentioned them, I am starting to think that is what's causing this issue.
      My engineer brain is thinking there must be a way to exclude the images. Now I know how I'm spending my weekend.

  • @koritz123 6 years ago +1

    Would the Adobe Acrobat Pro 2015 upgrade do an equivalent job, as opposed to leasing the 2017 version for a month? The 2015 upgrade is available for $65. I checked and, if I'm not mistaken, the 2017 version of Adobe DC leases for approximately $25 a month as of December 2017.

    • @DigitizeYourBooks 6 years ago +1

      Hi koritz123, thanks for asking, I am flattered that you asked. However, I do not know the answer. The key question is: does the 2015 version do the "Editable Text and Images" OCR feature? I think it was called "ClearScan" back then, and I don't know if Adobe made any changes to the algorithm when they changed the name. You should also know that the 2017 Subscription includes other services that may, or may not, be important to you, so you can consider that. Having said that, if the 2015 has ClearScan (aka Editable Text and Image OCR) and you don't need any newer features, then the 2015 version should be OK for you.

    • @koritz123 6 years ago +1

      Digitize Your Books, I found something online about the new version of Adobe Acrobat Pro DC 2017 being able to resize or rescale pages so they are easier to read in Kindle and other e-reader software. That being the case, I like the idea of being able to resize a PDF rather than just cropping off the white part of the perimeter. This goes along with your comment about the 2017 version potentially having more useful features than an antiquated version that's a few years old.

  • @daithiocinnsealach1982 5 years ago +2

    That book Voodoo Science looks interesting, but the cover is awful. I'm even more shocked to see it's an Oxford Press book. It looks like an attempt by an amateur self-publisher, rather than a professional cover made by one of the largest and most prestigious publishers in the world...

    • @DigitizeYourBooks 5 years ago +1

      Agreed: not a very impressive cover on this book. Contents are interesting, though.

  • @UncleMatte 5 years ago

    Before I start going farther down the "Rabbit Hole": I have an older Acrobat 7 Pro version. Is it worth it for me to spend ($100 to $150) for a much newer version of Acrobat? Thanks!

    • @DigitizeYourBooks 5 years ago +1

      I would only suggest upgrading software if your current software is missing a feature you need. Specifically, I personally MUST have “Editable text and images” feature, as explained in this video. That feature has had different names in previous versions, and I don’t know whether or not v7 has that feature. If it does (by any name) then probably no need to upgrade. Cheers.

    • @UncleMatte 5 years ago +1

      @DigitizeYourBooks Thanks, I upgraded to Ver XI. A little different, but I'll get the hang of it eventually. Thanks for your advice.

  • @DanielRamos-zx1kh 6 years ago

    Hi Peter, is there any way to private message you?

    • @DigitizeYourBooks 6 years ago +2

      I am on Twitter @PeterMosier you can follow me, and then DM me there if you like. What did you want to talk about?

    • @DanielRamos-zx1kh 6 years ago

      I just tried to DM you on Twitter, but I can only do that if you follow me. Anyway, I just OCRed a scanned book, but there are some texts that aren't recognized by the OCR. Look here: i.imgur.com/avlUkD4.jpg
      Do you know how to get these texts recognized?

    • @DigitizeYourBooks 6 years ago +1

      I suspect the problem is poor contrast. That is, instead of black and white (the normal for most books) your example had light grey text on a non-white background. From my experience, poor contrast confuses OCR.
      You can try playing with the contrast settings in your scanning software to try to increase the contrast so that it works. However, you might not be able to ever get it to OCR correctly, especially for the very light grey text.

    • @DanielRamos-zx1kh 6 years ago +1

      Digitize Your Books, thanks for your response! And do you know how to type that specific part manually?

    • @DigitizeYourBooks 6 years ago +1

      Manual corrections may or may not be possible, depending on which software you are using.

  • @festerbutt 3 months ago

    Thanks, this will help me a lot!