ocr

paperless

Last commit: 20 May 2020 · GitHub stars: 6725 (1519/yr) · Issues: 151

Index and archive all of your scanned paper documents

I hate paper. Environmental issues aside, it's a tech person's nightmare.

In the past few months I've been bitten more than a few times by the problem of not having the right document around. Sometimes I recycled a document I needed (who keeps water bills for two years?) and other times I just lost it... because paper. I wrote this to make my life easier.

Paperless does not control your scanner; it only helps you deal with what your scanner produces.

ambar

Last commit: 28 Apr 2020 · GitHub stars: 1459 (418/yr) · Issues: 3

Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to integrate full-text document search into your workflow.

Ambar 2.0 only supports local file system crawling. If you need to crawl an SMB share or an FTP location, just mount it using standard Linux tools. Crawling is automatic and no schedule is needed, because the crawlers monitor file system events and automatically process new, changed and removed files.
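To make "process new, changed and removed files" concrete, here is a minimal sketch of how a crawler could classify those three cases. It is not Ambar's code: Ambar reacts to file system events, while this sketch swaps in simpler snapshot polling (compare each file's modification time and size between two walks). The function names `take_snapshot` and `diff_snapshots` are hypothetical.

```python
import os
from typing import Dict, List, Tuple

# Maps each file path to (modification time, size in bytes).
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(root: str) -> Snapshot:
    """Walk `root` recursively and record mtime and size for every file."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file was removed between listing and stat
            snap[path] = (st.st_mtime, st.st_size)
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, removed) paths, each sorted."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    # A file counts as changed if its mtime or size differs.
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed
```

In a polling loop you would take a snapshot periodically, send added and changed paths to OCR and indexing, and drop removed paths from the index. An event-based watcher avoids rescanning the whole tree, which matters for large mounted shares.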

docs

Last commit: 23 May 2020 · GitHub stars: 470 (69/yr) · Issues: 43

Teedy is an open source, lightweight document management system for individuals and businesses.

A demo is available at demo.teedy.io.

A preconfigured Docker image is available. It includes OCR and media conversion tools and listens on port 8080. The database is an embedded H2 database, but PostgreSQL is also supported for better performance.

The default admin password is "admin". Don't forget to change it before going to production.

The data directory is /data. Don't forget to mount a volume on it.

To build external URLs, the server expects a DOCS_BASE_URL environment variable (for example, https://teedy.mycompany.com).
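As an illustration of why the server needs DOCS_BASE_URL, the minimal sketch below shows how an application can turn a relative path into an absolute, shareable link from that variable. It is not Teedy's implementation (Teedy is a Java application). The helper name `external_url` and the example path `share/abc` are hypothetical, since Teedy's actual URL scheme is not described here.

```python
import os
from typing import Mapping, Optional


def external_url(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Join DOCS_BASE_URL and a relative path into an absolute URL."""
    env = os.environ if env is None else env
    base = env.get("DOCS_BASE_URL")
    if not base:
        # Without a base URL the server cannot know its public address.
        raise RuntimeError("DOCS_BASE_URL is not set")
    # Normalize slashes so "https://host/" + "/x" and "https://host" + "x" agree.
    return base.rstrip("/") + "/" + path.lstrip("/")
```

Plain string joining is used instead of `urllib.parse.urljoin` on purpose: `urljoin` replaces the last path segment of a base URL without a trailing slash, which would break deployments under a sub-path.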

hrcloud2

Last commit: 21 Nov 2019 · GitHub stars: 107 (28/yr) · Issues: 2

A fully featured, home-hosted cloud storage platform and personal assistant that converts files, OCRs images and documents, creates archives, scans for viruses, protects your server, keeps itself up to date, and runs your own AppLauncher!

HRCloud2 is a personal cloud CMS platform similar to ownCloud, but with far greater capability: it includes the same functionality as a commercial end-user cloud platform, such as file conversion, OCR, archiving, dearchiving, A/V scanning, sharing and more. With HRCloud2 you can run your favorite bash and command-line tools from anywhere, just by selecting checkboxes and clicking buttons.

papermerge

Last commit: 22 May 2020 · GitHub stars: 103 (271/yr) · Issues: 3

In a nutshell, Papermerge is an open-source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of keeping piles of paper documents all over your desk, office or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS.

Papermerge DMS, in turn, will OCR the document and index it. You will then be able to quickly find any (scanned!) document using full-text search.
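The "OCR, then index, then full-text search" pipeline rests on an inverted index: a map from each word to the documents containing it. The minimal sketch below assumes the OCR step has already produced plain text and uses naive lowercase word tokenization with AND semantics for multi-word queries. It is a generic illustration, not Papermerge's actual indexing backend, and the class name `InvertedIndex` is hypothetical.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each token to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Index the (OCR-extracted) text of one document.
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query: str) -> Set[str]:
        # AND semantics: a document must contain every query token.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result
```

For example, after indexing a water bill as "Water bill for March 2019", the query `water march` returns that document. Real systems add stemming, ranking and tolerance for OCR errors on top of this basic structure.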

You can try it with just 3 simple commands (you need git and docker-compose):

docspell

Last commit: 23 May 2020 · GitHub stars: 47 (55/yr) · Issues: 4

Docspell is a personal document organizer. You'll need a scanner to convert your papers into PDF files. Docspell can then assist in organizing the resulting mess 😉.

You can associate tags, set correspondents, specify what a document is concerned with, and give it a name, a date and more. If your documents carry this metadata, you should be able to find them quickly later using the search feature. But adding it manually to each document is tedious. What if most of it could be done automatically?
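To show what "done automatically" might look like at its simplest, here is a naive, rule-based sketch that suggests a date, a correspondent and tags from a document's OCR text. It is not how Docspell works; its actual analysis is not described here. The assumptions are that dates appear as DD.MM.YYYY and that the user already maintains lists of known correspondents and tags. The function names `first_valid_date` and `suggest_metadata` are hypothetical.

```python
import re
from datetime import date
from typing import Dict, Iterable, Optional

# Hypothetical rule: dates written as DD.MM.YYYY.
DATE_RE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def first_valid_date(text: str) -> Optional[date]:
    """Return the first DD.MM.YYYY match that is a real calendar date."""
    for m in DATE_RE.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 31.02.2020 is not a real date
    return None


def suggest_metadata(text: str, correspondents: Iterable[str],
                     tags: Iterable[str]) -> Dict[str, object]:
    """Suggest metadata by case-insensitive substring matching."""
    lowered = text.lower()
    # First known correspondent whose name appears in the text, if any.
    correspondent = next((c for c in correspondents if c.lower() in lowered), None)
    # Every known tag that appears in the text.
    matched_tags = sorted(t for t in tags if t.lower() in lowered)
    return {"date": first_valid_date(text),
            "correspondent": correspondent,
            "tags": matched_tags}
```

The output is meant as a suggestion for the user to confirm, not a final decision: substring matching produces false positives (a tag "tax" would match "taxi"), which is why real tools use more robust text analysis.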