Switching from a Mac with OS X to Linux can be tough, especially when it comes to scanning. Many years ago I had some first interactions with the SANE project, which is the solution for scanning under Linux.
Scanning with my Fujitsu ScanSnap has been quite comfortable with OS X and the vendor's software, but not quite FOSS, so it's time to migrate the results to proper utilities!
SANE setup for ScanSnap scanners
dnf it's time to install the drivers.
This page provides download links.
You can copy the file to
sudo after creating the directory.
The rest of the SANE setup is covered in various other blog posts, e.g. the one linked before, so I won't cover it in this post.
PDF and OCR - where to start?
As you may have discovered, it's tough to find a current and still working solution for converting your scans to PDF and adding OCR text data. Some of the tutorials are simply outdated, others refer to graphical applications.
I've been searching for solutions myself and here are some notes:
* scanning under Linux is still a mess
* GUI applications are easier to find
* pdfocr looks simple, but is not a solution (as it requires a broken dependency called
* tesseract is great!
* many projects that cover the issue partially are not maintained anymore
* as always it's useful to verify last commit and release dates
After reading quite a few blog posts, scripts, commit histories and so on, I've decided that my solution must be built with tools I can install with
pip3 to make sure security updates can be installed easily and to have a chance that the projects are available in a later Fedora version as well.
Installing dependencies - using the script
Before using the script, some dependencies should be installed:
dnf install tesseract tesseract-osd tesseract-langpack-deu ocrmypdf netpbm-utils ghostscript fish
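To confirm that everything the script calls is actually on your PATH, a small check like the following can help. It is only a sketch: `missing_tools` is a hypothetical helper name, and the list of commands is taken from the script on this page (note that `scanadf` comes from your SANE installation, not from the packages above).

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands the scan script calls; adjust the list if you change the script.
missing_tools fish scanadf pnmtops ps2pdf ocrmypdf tesseract
```

If the call prints nothing, all commands were found.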
If you're only scanning documents with English text, tesseract-langpack-deu is not required. It can also be switched for other languages, e.g. Russian with the package
You can find the script via this link on GitHub or copy it from this page. After downloading it, adjust the device ID and give it a try.
#!/usr/bin/fish

# change this to your device id, see scanimage -L for a list of your devices
set -x device 'epjitsu:libusb:001:019'

# exit if no title is provided
if not set -q argv[1]
    echo "Please enter at least a title!"
    exit 1
end

# simply convert arguments to variables: title, resolution, mode, destination
if set -q argv[1]
    set -x title (echo $argv[1])
end

if set -q argv[2]
    set -x resolution (echo $argv[2])
else
    set -x resolution 300
end

if set -q argv[3]
    set -x mode $argv[3]
else
    set -x mode Gray
end

if set -q argv[4]
    set -x destinationdir $argv[4]
else
    set -x destinationdir '/home/rullmann/Downloads'
end

# create temporary dir with variable name
set -x tempdir /tmp/scan_(tr -dc 'a-z0-9' < /dev/urandom | head -c 32)
mkdir $tempdir

# create output filename for final pdf by converting the title and adding the date
set -x outputfile $destinationdir/(date +%F)_(echo $title | sed -e 's/\(.*\)/\L\1/' -e 's/\ /_/g').pdf

# actually scan and process the input
scanadf -d $device --resolution $resolution --mode $mode -o $tempdir/%d ;and for file in (ls $tempdir/) ; pnmtops $tempdir/$file ; end | ps2pdf - | ocrmypdf -l deu+eng --rotate-pages --deskew - $outputfile --title "$title"

# remove temp dir
rm -r $tempdir
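The output filename is the part of the script that is easiest to get wrong, so here is the same idea as a standalone sketch in plain sh. `scan_filename` is a hypothetical helper, not part of the script; it uses `tr` instead of the GNU-specific `\L` in `sed`, and it takes the date as an optional second argument purely so the result is reproducible.

```shell
#!/bin/sh
# Build "<date>_<title>.pdf": lowercase the title and turn spaces into underscores.
# The date defaults to today (date +%F), as in the scan script.
scan_filename() {
    title=$1
    day=${2:-$(date +%F)}
    slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')
    printf '%s_%s.pdf\n' "$day" "$slug"
}

scan_filename "Tax Return 2019" 2020-03-01
# prints: 2020-03-01_tax_return_2019.pdf
```

Only spaces and upper-case letters are changed here, just like in the script, so titles containing slashes or other special characters would still need extra care.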