From theory to the perfect tuber: how computers learned to see
Have you ever wondered how a machine can tell the difference between a clump of earth and a potato? What comes naturally to us humans – effortlessly recognising objects in a fraction of a second – was an almost impossible task for computers for decades. The story of how machines learned to ‘see’ is fascinating and leads directly to the technology used today in modern sorting systems such as those from Karevo.
The long road to machine vision
In the early days of artificial intelligence (AI) in the 1960s, researchers were optimistic: they believed the problem of machine vision (computer vision) could be solved quickly by teaching computers fixed rules. But reality was more complex. A computer does not see shapes; it sees only a long list of numerical values (pixels). For decades, scientists struggled to teach machines to reliably recognise even simple objects under different lighting conditions or viewing angles.
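To make this concrete, here is a minimal sketch (the array values are made up for illustration) of what a computer actually "sees" when it looks at an image: nothing but a grid of brightness numbers.

```python
import numpy as np

# A grayscale "image" is just a grid of brightness values (0 = black, 255 = white).
# This toy 4x4 patch could show anything -- a potato's skin or a clod of earth;
# the computer only ever sees the numbers.
patch = np.array([
    [ 12,  15, 200, 210],
    [ 14,  18, 205, 215],
    [ 10,  16, 198, 207],
    [ 11,  13, 202, 209],
], dtype=np.uint8)

print(patch.shape)        # prints (4, 4)
print(int(patch.mean()))  # average brightness: prints 109
```

Turning these raw numbers into the judgement "that is a potato" is exactly the problem that stumped researchers for decades.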
Dr Fei-Fei Li and the data revolution
An important turning point came with a visionary scientist: Dr Fei-Fei Li. While many of her colleagues were trying to write better algorithms, Li realised that the problem lay elsewhere: the machines lacked experience. Like a child learning about the world through observation, computers needed examples – lots of them.
Inspired by the human ability to distinguish between tens of thousands of object categories, she launched the ImageNet project. Her goal was to create a database that mapped the entire visual world. The result was a collection of millions of images, sorted and named by humans.
2012: The breakthrough by the ‘deep learning trio’
But the data alone was not enough. An architecture was needed that could process this flood of information. This is where three names come into play that changed the world of technology forever: Geoffrey E. Hinton, Alex Krizhevsky and Ilya Sutskever.
In 2012, this team took part in the ImageNet competition. While other researchers were still working with traditional methods, Krizhevsky, Sutskever and Hinton relied on so-called convolutional neural networks (CNNs) – algorithms modelled loosely on how the human visual system processes images.
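The core operation of a CNN can be sketched in a few lines. The snippet below is an illustrative toy, not AlexNet itself: it slides a single hand-picked edge-detecting filter over an image, whereas a real CNN learns thousands of such filters from data.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter over an image and record how strongly it responds."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A toy image with a sharp vertical boundary: dark on the left, bright on the right.
image = np.array([
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
], dtype=float)

# A hand-picked filter that responds strongly to vertical edges.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

response = convolve2d(image, kernel)
print(response)  # large values everywhere the dark/bright boundary appears
```

In a trained network, layers of such filters build on each other: early layers detect edges, later layers combine them into textures, shapes and, eventually, whole objects.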
Their model, known as AlexNet, harnessed the computing power of modern graphics cards (GPUs) to train on huge amounts of data. The result was sensational: AlexNet cut the top-5 error rate in image recognition from around 26% to about 15%, leaving the competition far behind. This moment can fairly be called the birth of modern deep learning – the very technology that controls self-driving cars, powers ChatGPT and even sorts potatoes.
Karevo: High technology on the farm
At Karevo, we build directly on these achievements. Our sorting machines combine Dr Li's data-centric approach (large amounts of labelled potato image data) with the neural network architectures revolutionised by Hinton, Krizhevsky and Sutskever.
Our sorting system, the Karevo Duo85, uses state-of-the-art computer vision not only to imitate the human eye but to surpass it in endurance and objectivity. We have trained our AI to understand the specific characteristics of potatoes.
The system does not work with rigid, deterministic rules, but has learned to identify defects based on visual patterns. This enables us to achieve a detection accuracy of approximately 95%. Our system reliably detects:
Diseases and defects: From green colouring and scab to wireworm infestation, growth cracks, and wet and dry rot.
Foreign objects: Stones and clods are reliably distinguished from the harvest and sorted out.
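The decision logic behind such a system can be sketched in simplified form. The snippet below is a hypothetical illustration, not Karevo's actual code: it assumes a trained model returns a confidence score per defect class for each tuber, and ejects anything whose worst score crosses a threshold.

```python
# Hypothetical defect classes and threshold, for illustration only.
DEFECT_CLASSES = ["greening", "scab", "wireworm", "growth_crack", "rot", "foreign_object"]
EJECT_THRESHOLD = 0.5  # illustrative value, not a real system parameter

def sorting_decision(scores):
    """scores: dict mapping defect class -> model confidence in [0, 1].

    Returns ("eject", worst_class) if any defect is confident enough,
    otherwise ("accept", None).
    """
    worst = max(scores, key=scores.get)
    if scores[worst] >= EJECT_THRESHOLD:
        return ("eject", worst)
    return ("accept", None)

# Example outputs from a hypothetical model:
green_tuber = {"greening": 0.91, "scab": 0.05, "wireworm": 0.02,
               "growth_crack": 0.01, "rot": 0.03, "foreign_object": 0.01}
clean_tuber = {"greening": 0.08, "scab": 0.10, "wireworm": 0.04,
               "growth_crack": 0.06, "rot": 0.02, "foreign_object": 0.03}

print(sorting_decision(green_tuber))  # prints ('eject', 'greening')
print(sorting_decision(clean_tuber))  # prints ('accept', None)
```

The key point is that the scores themselves come from a learned model rather than hand-written rules – which is why the system generalises to the messy variety of real harvests.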
Just like Dr Li's ImageNet models, which learned to distinguish between hundreds of dog breeds, Karevo keeps learning: thanks to machine learning, the system is continually improved with new training data.
Conclusion
The technology, which can now sort up to 5 tonnes of potatoes per hour, is based on decades of research. Thanks to pioneers such as Dr Fei-Fei Li, computers are now able to understand the complex nuances of nature. For farmers, this means less manual labour, higher quality assurance and sorting where, as we say at Karevo, every pixel counts.
Further resources
Li, Fei-Fei (2023): The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI. Flatiron Books: A Moment of Lift Book.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25 (pp. 1097–1105). Curran Associates, Inc.
Marino, S., Beauseroy, P., & Smolarz, A. (2019). Weakly-supervised learning approach for potato defects segmentation. Engineering Applications of Artificial Intelligence, 85, 337–346. https://doi.org/10.1016/j.engappai.2019.06.024