Speeding up smart PDF search for the municipality of Utrecht

To save the Municipality of Utrecht's property managers a great deal of time, we built a web-based PDF search engine that lets them quickly and easily search thousands of scanned documents.

Xomnia proved to be a reliable partner in building this very easy to use and powerful PDF search engine. We are now able to find the right information in minutes instead of hours. A very big improvement on how it used to be!
Else Bezemer, Municipality of Utrecht.

Case

The Municipality of Utrecht’s Property Management department is responsible for maintaining real estate. Its activities primarily involve corrective maintenance and preventive and planned maintenance. Depending on the nature of the work, the activities are carried out on a project basis.

Most projects involve searching through a digital archive containing 10,000 scanned PDF documents, including drawings, from over 500 real estate properties. This is a technical challenge and can be quite a headache. To search these documents, the municipality used an application with an inefficient search system that didn’t allow targeted searches, and employees spent hours waiting for results. An alternative was needed.

Solution

Xomnia designed and implemented a smart PDF search engine comparable to Google. A ranking algorithm puts the document most likely to be the correct result at the top of the list, and results open in a new tab.
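The ranking algorithm itself is not described here. As a rough illustration of how relevance ranking can work, the sketch below scores documents with a simplified BM25 formula, which is the default relevance scoring in Elasticsearch. The function name `bm25_rank`, the whitespace tokenization, and the sample documents are assumptions made for this example, not the implementation that was delivered.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_rank(query: str, docs: Dict[str, str],
              k1: float = 1.2, b: float = 0.75) -> List[str]:
    """Return document names ordered from most to least relevant.

    Simplified BM25 with naive whitespace tokenization (illustrative only).
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / max(n_docs, 1)

    # Document frequency: in how many documents does each term occur?
    df: Counter = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores: Dict[str, float] = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # term absent from this document
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score

    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical sample documents, standing in for text extracted from scans.
sample_docs = {
    "a.pdf": "roof maintenance plan for the town hall",
    "b.pdf": "parking permit overview",
    "c.pdf": "roof roof inspection",
}
ranking = bm25_rank("roof inspection", sample_docs)
```

Here `c.pdf` ranks first because it contains both query terms, `a.pdf` follows with one matching term, and `b.pdf` comes last with none.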

We built a backscan PDF search engine for the municipality of Utrecht. Nginx serves as the web server, and the code is written in Python using the Flask framework. First, the PDFs are parsed to make sure everything in them is searchable: Wand is used to make the images searchable, and Tika is used for the metadata. Elasticsearch is then used to search the PDFs for the given query.
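To make the search step more concrete, the sketch below builds an Elasticsearch request body for a user's search terms. The index mapping and queries used in the project are not described, so the field names `title` and `content` and the helper `build_search_body` are hypothetical.

```python
from typing import Any, Dict


def build_search_body(query: str, size: int = 10) -> Dict[str, Any]:
    """Build an Elasticsearch request body for a full-text search.

    Field names are hypothetical; the real index mapping is not described.
    """
    terms = query.strip()
    if not terms:
        # An empty search box matches nothing rather than everything.
        return {"size": size, "query": {"match_none": {}}}
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": terms,
                # "^2" boosts matches in the title over matches in the body.
                "fields": ["title^2", "content"],
            }
        },
        # Ask Elasticsearch to return matching snippets from the body text.
        "highlight": {"fields": {"content": {}}},
    }


body = build_search_body("  roof plan ")
```

In a setup like the one described, a Flask route would read the search terms from the request, send a body like this to the index's `_search` endpoint, and render the ranked hits as a results list.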

Impact

The end result has saved employees a lot of time, and with it, costs. Previously, searching for a document could take up to an hour; now it takes just minutes.