Computer Vision / Image Recognition
Extracting information from visual data, the task of Computer Vision / Image Recognition, is a very difficult problem. Human brains have evolved to become extremely good at processing visual information, but for computers it remains one of the most challenging problems around.
Image analysis to solve real-world problems
Computer Vision / Image Recognition has been a very active research topic, and we’ve now reached a point where these techniques can solve many real-world problems. Some examples are: recognizing faces or objects in images, tracking objects in video, and classifying medical images to assist with making a diagnosis. For an example of a medical diagnosis problem, see our project on diabetic retinopathy detection.
The right approach to a Computer Vision / Image Recognition problem depends very much on the type of problem. Many image classification solutions separate two important steps: feature extraction and learning.
To a computer, an image or video is just a grid of pixel values. There is a huge gap between low-level information like pixel values and high-level semantic information such as “cat” or “face” (also referred to as the “Semantic Gap”). Instead of trying to infer high-level information directly from pixel values, it is often better to first extract meaningful visual features such as variations in color, texture or shape.
Visual features bridge the semantic gap and make it easier for machine learning methods to infer meaningful information from the data.
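To make the idea of feature extraction concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the pipeline described on this page: the function names, the 4-bin histogram, and the tiny 4x4 grayscale images are all hypothetical choices. It turns raw pixel values into a short feature vector made of a brightness histogram and a crude texture measure.

```python
def intensity_histogram(image, bins=4, max_value=255):
    """Return a normalized histogram of pixel intensities (a simple brightness feature)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            # Map the pixel value to a bin index; clamp so max_value lands in the last bin.
            index = min(value * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    return [count / total for count in counts]


def edge_strength(image):
    """Return the mean absolute difference between horizontally adjacent pixels (a crude texture feature)."""
    diffs = []
    for row in image:
        for left, right in zip(row, row[1:]):
            diffs.append(abs(right - left))
    return sum(diffs) / len(diffs)


def extract_features(image):
    """Concatenate the histogram and the texture measure into one feature vector."""
    return intensity_histogram(image) + [edge_strength(image)]


# Two hypothetical 4x4 grayscale images: a flat dark patch and a striped patch.
flat = [[10, 10, 10, 10] for _ in range(4)]
striped = [[0, 255, 0, 255] for _ in range(4)]

flat_features = extract_features(flat)        # [1.0, 0.0, 0.0, 0.0, 0.0]
striped_features = extract_features(striped)  # [0.5, 0.0, 0.0, 0.5, 255.0]
```

The two images differ far more clearly in these five numbers than in their raw pixels, which is the point of bridging the gap with features. Real systems use much richer descriptors for color, texture and shape.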
Once our visual data is represented by meaningful features instead of raw pixel values, we can apply regular machine learning algorithms to our data. When we feed the algorithm many labeled examples during a training phase (supervised learning), it learns patterns that allow it to classify new data. The image shows an example of a resulting image classification pipeline we use for our Computer Vision / Image Recognition work.
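The learning step can be sketched with one of the simplest supervised methods, a nearest-centroid classifier. This is a swapped-in illustrative technique, not necessarily the algorithm used in the pipeline mentioned here. The feature values, the labels "sky" and "grass", and the function names are assumptions made up for the example.

```python
import math


def train_nearest_centroid(features, labels):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, vector):
    """Return the label whose centroid is closest (Euclidean distance) to the vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Hypothetical 2-D feature vectors with labels (the training phase uses labeled examples).
train_features = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
train_labels = ["sky", "sky", "grass", "grass"]

centroids = train_nearest_centroid(train_features, train_labels)

# Classify a new, unseen feature vector.
prediction = predict(centroids, [0.7, 0.3])  # "sky"
```

Training here just averages the labeled examples per class, and prediction picks the nearest average. More capable algorithms follow the same pattern of fitting on labeled features and then predicting on new ones.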
Choosing and creating visual features to extract from our pixel values can be a difficult and time-consuming process. Deep learning allows us to skip this step altogether. Deep neural networks have been gaining in popularity thanks to advances in computing power and in the techniques used to train them. For a description of deep learning, see our deep learning page.
By using neural networks with many layers, we can automatically learn relevant visual features from the data instead of extracting them ourselves. The layers of neurons in the neural network can be seen as feature extractors, with each consecutive layer learning more complex and higher-level information based on the information in the previous layer. In the example below, the network starts with nodes that represent raw pixel values. A few layers deeper are nodes that have learned to capture visual features like diagonal lines. Finally, very deep into the network are nodes that capture high-level semantic concepts like “face” and “cat”.
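As a rough code illustration of what a single feature-detecting node in an early layer might compute, the sketch below slides a small filter over an image and applies a ReLU activation. It rests on an important assumption: the diagonal filter weights are set by hand here, whereas a deep network would learn them from data. The images and names are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding and return the response map.

    This is the cross-correlation form that deep learning libraries usually call convolution.
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    output = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hand-set filter that responds to a top-left to bottom-right diagonal line.
DIAGONAL_KERNEL = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]

# Hypothetical 4x4 binary images: one diagonal line, one horizontal line.
diagonal_image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
horizontal_image = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

diagonal_response = relu(convolve2d(diagonal_image, DIAGONAL_KERNEL))      # [[3, 0], [0, 3]]
horizontal_response = relu(convolve2d(horizontal_image, DIAGONAL_KERNEL))  # [[0, 0], [0, 0]]
```

The filter fires strongly where the diagonal line is present and stays silent on the horizontal line. Deeper layers combine many such responses into progressively higher-level features.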
Another advantage of using deep learning for Computer Vision & Image Recognition problems is that we can also use unlabeled data to pre-train the neural network (unsupervised learning) before actually training it with labeled data. So, depending on the type and amount of data, deep learning could be a great solution to a visual classification problem.
Can we help you?
Besides Computer Vision & Image Recognition, we’d love to share our big data knowledge with you. Please get in touch! Keen on knowing more about us? Xomnia is a big data analytics firm based in Amsterdam, the Netherlands. We are experts in data science, big data engineering and advanced business intelligence. Our services include: big data consultancy & projects, big data training and big data traineeships.