
We are writing a script to search for books in images using Python and OpenCV.

Hello! I was faced with the task of implementing road sign recognition from a video stream. Since I had not encountered tasks of this kind before, the implementation process itself implied a long preliminary "smoking" of forums and merciless dissection of other people's examples. Therefore, I decided to collect everything I had read in one place for future generations, and also, in the course of the story, to ask Habr a few questions.

Preludes.

So, after studying all the tools that could be used for the task, I settled on the Microsoft Visual Studio 2010 development environment, using the wonderful OpenCV library.

The very process of working with OpenCV involves preliminary dances with a tambourine, about which there are plenty of detailed descriptions elsewhere.

The second act of dancing with a tambourine.

As a result, I turned towards training cascades. Having "smoked" in this direction, I realized that I needed two tools, createsamples and haartraining. But I didn't have their exes, and they refused to compile. At that time I had OpenCV 2.4.4, but it was in an article about installation that I first read about using CMake. As a result, I decided to download version 2.3.1 and reinstall the library. After that I was able to launch the necessary tools from the command line, and the question arose of how to work with them. The articles that list the parameters with which createsamples and haartraining need to be run, with a detailed description of each parameter, dotted all the i's.

Code from scratch.

Having finally abandoned the old method, I rewrote the code to plug in trained cascades.

Code 2.0

#include "stdafx.h" #include #include #include using namespace cv; int main(int argc, char** argv) ( Mat frame, gray; string object_cascade = "haarustupi.xml"; CascadeClassifier haar(object_cascade); VideoCapture cap(0); namedWindow("Video", 1); vector objects; while (true) ( ​​cap >> frame; cvtColor(frame, gray, CV_BGR2GRAY); haar.detectMultiScale(gray, objects, 1.9, 10, 0,Size(50, 50)); for (vector ::const_iterator r = objects.begin(); r != objects.end(); r++) rectangle(frame, r->tl(), r->br(), Scalar(0, 0, 255)); imshow("video", frame); if (waitKey(33) >= 0) break; ) return(EXIT_SUCCESS); )

We set up the environment in the same way as in the previous project.

Repetition is the mother of learning.

All that's left is the "small matter" of training the cascades. :)
This is where the most interesting part begins. It was at this point that I decided to write about all these ordeals on Habr and ask for advice.
I have prepared 500 negative images sized 1600x1200, and one image of the sign sized 80x80. One image will be enough, because we are detecting a specific object, not a huge variety of faces.

So, having prepared the pictures and created the neg.dat file with the structure

negative/n (1).jpg
negative/n (2).jpg
negative/n (3).jpg
negative/n (4).jpg
...
negative/n (500).jpg

run the opencv_createsamples.exe file via CMD with the following parameters

C:\OpenCV2.3.1\build\common\x86\opencv_createsamples.exe -vec C:\OpenCV2.3.1\build\common\x86\positive.vect -bg C:\OpenCV2.3.1\build\common\x86\neg.dat -img C:\OpenCV2.3.1\build\common\x86\ustupi.jpg -num 500 -w 50 -h 50 -bgcolor 0 -bgthresh 0 -show
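Incidentally, a description file like neg.dat does not have to be typed by hand. Here is a minimal Python sketch that generates it; the negative folder name and the .jpg extension are assumptions matching the structure above:

import os

# Write a background description file listing every negative image.
# The "negative" folder and the .jpg extension match the structure above.
with open("neg.dat", "w") as f:
    for name in sorted(os.listdir("negative")):
        if name.lower().endswith(".jpg"):
            f.write("negative/" + name + "\n")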

The -show parameter displays the generated positive images, but unlike the pictures shown in other articles, mine turn out like this:


That is, the utility crops the bg-image to the size of the positive image. Changing the -w and -h parameters does not give any result, and the background is still almost invisible. If anyone knows what's going on here, please share your thoughts. I reduced the size of the negative images to 800x600 - the same result.

Next, we start training by running opencv_haartraining.exe:

C:\OpenCV2.3.1\build\common\x86\opencv_haartraining.exe -data C:\OpenCV2.3.1\build\common\x86\haarustupi -vec C:\OpenCV2.3.1\build\common\x86\positive.vect -bg C:\OpenCV2.3.1\build\common\x86\neg.dat -npos 500 -nneg 500 -nstages 6 -nsplits 2 -w 20 -h 24 -mem 1536 -mode ALL -nonsym -minhitrate 0.999 -maxfalsealarm 0.5

after which you will receive the long-awaited xml file, which can be loaded into the source code of the program.
As a result, the cascade learns a little and, with a large number of false positives, reacts to the image of the "give way" sign that I like.
But I can't achieve accurate detections - I think because the background is clipped in the positive images. And I just can't get pictures like the ones in the manuals. But there is still the option of increasing the number of training stages and, having loaded the computer for a whole day, waiting until the cascade becomes more "educated". Which is what I plan to do until other ideas pop up.

Epilogue

This is how my first HelloHabr article turned out. I look forward to your comments on the style of presentation and, of course, to advice on the topic.
I hope that after the advice received there will be something to continue the story with.

OpenCV is an open source computer vision and machine learning library. It includes more than 2500 algorithms, both classical and modern, for computer vision and machine learning. The library has interfaces in various languages, including Python (used in this article), Java, C++ and Matlab.

Installation

Installation instructions are available in the OpenCV documentation for both Windows and Linux.

Import and view an image

import cv2

image = cv2.imread("./path/to/image.extension")
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Note: when an image is read this way, it is in the BGR color space, not RGB (which everyone is used to). This may not matter much at first, but once you start working with color, it is worth knowing about this feature. There are 2 solutions:

  1. Swap the 1st channel (R - red) with the 3rd channel (B - blue), and then red will be (0,0,255), not (255,0,0) (see the sketch after this list).
  2. Change color space to RGB: rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    And then in the code, work no longer with image , but with rgb_image .
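For the first option, a minimal sketch of swapping the channels by hand (the image path is a placeholder):

import cv2

image = cv2.imread("./path/to/image.extension")

# Swap the 1st and 3rd channels by splitting and re-merging
b, g, r = cv2.split(image)
rgb_image = cv2.merge((r, g, b))

# The same thing in one line with NumPy slicing
rgb_image = image[:, :, ::-1]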

Note: to close the window that displays the image, press any key. If you use the window's close button, you may run into freezes.
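One defensive pattern for this is to poll the window state instead of a single waitKey(0) call; a sketch, assuming a build where cv2.getWindowProperty and cv2.WND_PROP_VISIBLE are available:

import cv2

image = cv2.imread("./path/to/image.extension")
cv2.imshow("Image", image)

# Exit on any key press, but also notice when the close button is used
while cv2.getWindowProperty("Image", cv2.WND_PROP_VISIBLE) >= 1:
    if cv2.waitKey(100) != -1:
        break
cv2.destroyAllWindows()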

Throughout the article, the following code will be used to display images:

import cv2

def viewImage(image, name_of_window):
    cv2.namedWindow(name_of_window, cv2.WINDOW_NORMAL)
    cv2.imshow(name_of_window, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Cropping

Dog after cropping

import cv2

# The crop bounds below are example values: image[y1:y2, x1:x2]
cropped = image[10:500, 500:2000]
viewImage(cropped, "Cropped dog")

Here the image is indexed as image[y1:y2, x1:x2]; the bounds above are arbitrary example values.

Resizing

After resizing by 20%

import cv2

scale_percent = 20  # percentage of the original size
width = int(image.shape[1] * scale_percent / 100)
height = int(image.shape[0] * scale_percent / 100)
dim = (width, height)

resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
viewImage(resized, "After resizing by 20%")

This code preserves the aspect ratio of the original image. Other image-resizing options are described in the OpenCV documentation.
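The same 20% scaling can also be expressed through the fx/fy factors of cv2.resize, without computing the pixel dimensions by hand; a minimal sketch:

import cv2

image = cv2.imread("./path/to/image.extension")

# dsize=None makes resize derive the output size from the scale factors
resized = cv2.resize(image, None, fx=0.2, fy=0.2, interpolation=cv2.INTER_AREA)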

Rotation

Dog after turning 180 degrees

import cv2

(h, w, d) = image.shape
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, 180, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
viewImage(rotated, "Dog rotated 180 degrees")

image.shape returns the height, width and number of channels. M is the rotation matrix; it rotates the image 180 degrees around its center. A negative angle rotates the image clockwise, and a positive angle rotates it counterclockwise.
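To illustrate the sign convention, a minimal sketch rotating the image 90 degrees clockwise (note that, unlike the 180-degree case, the corners of a non-square image are clipped by the w x h canvas):

import cv2

(h, w, d) = image.shape
center = (w // 2, h // 2)

# A negative angle means clockwise rotation
M = cv2.getRotationMatrix2D(center, -90, 1.0)
clockwise = cv2.warpAffine(image, M, (w, h))
viewImage(clockwise, "Dog rotated 90 degrees clockwise")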

Converting to grayscale and to a black-and-white image by threshold

Dog in grayscale

Black and white dog

import cv2

gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, threshold_image = cv2.threshold(gray_image, 127, 255, 0)
viewImage(gray_image, "Grayscale dog")
viewImage(threshold_image, "Black and white dog")

gray_image is the single channel version of the image.

The threshold function returns an image in which all pixels darker than (less than) 127 are replaced by 0 and all pixels brighter than (greater than) 127 are replaced by 255.

For clarity, another example:

ret, threshold = cv2.threshold(gray_image, 150, 200, cv2.THRESH_BINARY)

Here the threshold is 150 and the maximum value is 200: everything brighter than 150 is replaced by 200, and everything darker by 0. The fourth argument selects the thresholding type.

The remaining thresholding options are described in the OpenCV documentation.
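For reference, a short sketch of a few of the other standard thresholding types:

import cv2

gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Inverted black and white: dark pixels become 255, bright ones 0
ret, binary_inv = cv2.threshold(gray_image, 127, 255, cv2.THRESH_BINARY_INV)

# Truncation: pixels brighter than 127 are capped at 127, the rest are kept
ret, trunc = cv2.threshold(gray_image, 127, 255, cv2.THRESH_TRUNC)

# To-zero: pixels darker than 127 become 0, the rest are kept
ret, tozero = cv2.threshold(gray_image, 127, 255, cv2.THRESH_TOZERO)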

Blurring/smoothing

Blurred dog

import cv2

blurred = cv2.GaussianBlur(image, (51, 51), 0)
viewImage(blurred, "Blurred dog")

The GaussianBlur function (Gaussian blur) takes 3 parameters:

  1. Original image.
  2. Tuple of 2 positive odd numbers. The larger the number, the greater the strength of the smoothing.
  3. sigmaX and sigmaY. If these parameters are left equal to 0, then their value will be calculated automatically.

Drawing rectangles

Outline the muzzle of the dog with a rectangle

import cv2

output = image.copy()
cv2.rectangle(output, (2600, 800), (4100, 2400), (0, 255, 255), 10)
viewImage(output, "Rectangle around the dog's face")

This function takes 5 parameters:

  1. The image itself.
  2. Upper left corner coordinate (x1, y1) .
  3. Bottom right corner coordinate (x2, y2) .
  4. The color of the rectangle (BGR/RGB depending on the selected color model).
  5. The line width of the rectangle.

Drawing lines

2 dogs separated by a line

import cv2

output = image.copy()
cv2.line(output, (60, 20), (400, 200), (0, 0, 255), 5)
viewImage(output, "2 dogs separated by a line")

The line function takes 5 parameters:

  1. The image itself on which the line is drawn.
  2. Coordinate of the first point (x1, y1) .
  3. Coordinate of the second point (x2, y2) .
  4. Line color (BGR/RGB depending on the selected color model).
  5. Line thickness.

Text on image

Image with text

import cv2

output = image.copy()
cv2.putText(output, "We<3 Dogs", (1500, 3600), cv2.FONT_HERSHEY_SIMPLEX, 15, (30, 105, 210), 40)
viewImage(output, "Image with text")

The putText function takes 7 parameters:

  1. Image directly.
  2. Text for the image.
  3. Coordinate of the lower left corner of the beginning of the text (x, y) .
  4. Font (for example, cv2.FONT_HERSHEY_SIMPLEX).
  5. Font scale.
  6. Text color (BGR/RGB depending on the selected color model).
  7. Thickness of the text lines.

Face recognition

Faces detected: 2

import cv2

image_path = "./path/to/photo.extension"
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(10, 10)
)

faces_detected = "Faces detected: " + format(len(faces))
print(faces_detected)

# Draw rectangles around the faces
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 0), 2)

viewImage(image, faces_detected)

detectMultiScale is a general function for recognizing both faces and other objects. In order for the function to look specifically for faces, we pass it the appropriate cascade.

The detectMultiScale function takes 4 parameters:

  1. The processed image in grayscale.
  2. The scaleFactor parameter. Some faces may be larger than others because they are closer to the camera. This parameter compensates for perspective.
  3. The recognition algorithm uses a sliding window. The minNeighbors parameter specifies the number of neighboring detections around a candidate face: the greater its value, the more similar neighboring detections the algorithm needs before it accepts the current object as a face. Too small a value increases the number of false positives, while too large a value makes the algorithm more demanding.
  4. minSize is the minimum size of a detected face region.

Contours - object recognition

Object recognition is performed using color image segmentation. There are two functions for this: cv2.findContours and cv2.drawContours.

Object detection using color segmentation is described in detail in a separate article; everything you need for it is there.
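As a starting point, a minimal sketch of these two functions applied to a thresholded image (note that in some OpenCV versions findContours returns three values instead of two):

import cv2

image = cv2.imread("./path/to/image.extension")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, 0)

# Find the contours of the white regions; some OpenCV versions return
# (image, contours, hierarchy) here instead of (contours, hierarchy)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Draw all contours (-1) in green with a thickness of 3
with_contours = image.copy()
cv2.drawContours(with_contours, contours, -1, (0, 255, 0), 3)
viewImage(with_contours, "Contours")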

Saving an image

import cv2

image = cv2.imread("./import/path.extension")
cv2.imwrite("./export/path.extension", image)

Conclusion

OpenCV is a great library with lightweight algorithms that can be used in 3D rendering, advanced image and video editing, tracking and identifying objects and people in video, finding identical images in a set, and much, much more.

This library is very important for those who develop projects related to machine learning on images.

The most important sources of information about the outside world for a robot are its optical sensors and cameras. After receiving the image, it is necessary to process it to analyze the situation or make a decision. As I said earlier, computer vision combines many methods of working with images. During the operation of the robot, it is assumed that the video information from the cameras is processed by some program running on the controller. In order not to write code from scratch, you can use ready-made software solutions. At the moment, there are many ready-made computer vision libraries:

  • Matrox Imaging Library
  • Camellia Library
  • Open eVision
  • HALCON
  • libCVD
  • OpenCV
  • etc…
These SDKs can vary greatly in functionality, licensing terms, and the programming languages they support. We will take a closer look at OpenCV. It is free for both educational purposes and commercial use. It is written in optimized C/C++, supports C, C++, Python and Java interfaces, and includes implementations of more than 2500 algorithms. In addition to standard image processing functions (filtering, blurring, geometric transformations, etc.), this SDK allows you to solve more complex tasks, including object detection in a photo and its "recognition". It should be understood that detection and recognition are quite different tasks:
  • search and recognition of a specific object,
  • search for objects of the same category (without recognition),
  • recognition of an object only (on an image already cropped to it).
To detect features in an image and check for a match, OpenCV has the following methods:
  • Histogram of Oriented Gradients (HOG) - can be used for pedestrian detection
  • Viola-Jones algorithm - used to search for faces
  • SIFT (Scale Invariant Feature Transform) Feature Detection Algorithm
  • SURF Feature Detection Algorithm (Speeded Up Robust Features)
For example, SIFT discovers sets of points that can be used to identify an object (a minimal Python sketch of SIFT keypoint detection follows the list of links below). In addition to the methods listed above, OpenCV has other algorithms for detection and recognition, as well as a set of algorithms related to machine learning, such as k-nearest neighbors, neural networks and support vector machines. In general, OpenCV provides tools sufficient for solving the vast majority of computer vision problems. If an algorithm is not included in the SDK, it can usually be programmed without problems. In addition, there are many authors' versions of algorithms written by users on the basis of OpenCV. It should also be noted that in recent years OpenCV has expanded greatly and become somewhat "heavyweight". In this regard, various groups of enthusiasts create "lightweight" libraries based on OpenCV. Examples: SimpleCV, liuliu ccv, tinycv…

Useful sites
  1. http://opencv.org/ - Main project site
  2. http://opencv.willowgarage.com/wiki/ - Old project site with documentation on old versions
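As mentioned above, here is a minimal Python sketch of SIFT keypoint detection (SIFT has moved between OpenCV modules over the years; this assumes a recent build where cv2.SIFT_create is available):

import cv2

image = cv2.imread("./path/to/image.extension")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect keypoints and compute their descriptors in one call
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
print("keypoints found:", len(keypoints))

# Visualize the keypoints with their size and orientation
output = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("sift_keypoints.jpg", output)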

When solving computer vision problems, one cannot do without specialized software. I want to introduce you to one such tool - OpenCV, an open source C++ library. It provides a set of tools for digitizing images and further processing them with numerical algorithms or a neural network.

Basic image processing algorithms: image interpretation, camera calibration from a reference pattern, elimination of optical distortion, detection of similarity, analysis of object movement, detection of object shape and object tracking, 3D reconstruction, object segmentation, gesture recognition.

You can download the library on the official website http://sourceforge.net/projects/opencvlibrary/

Structure of the OpenCV library

cxcore - core
* contains basic data structures and algorithms:
- basic operations on multidimensional numeric arrays
- matrix algebra, mathematical functions, random number generators
- Writing/restoring data structures to/from XML
- basic functions of 2D graphics

CV - Imaging and Computer Vision Module
- basic operations on images (filtering, geometric transformations, color space conversion, etc.)
- image analysis (selection of distinguishing features, morphology, search for contours, histograms)
- motion analysis, object tracking
- detection of objects, in particular faces
- camera calibration, elements of spatial structure restoration

Highgui - a module for input / output of images and videos, creating a user interface
- capturing video from cameras and video files, reading/writing static images.
- functions for organizing a simple UI (all demo applications use HighGUI)

Cvaux - experimental and deprecated features
- spatial vision: stereo calibration, self-calibration
- stereo correspondence search, graph cuts
- finding and describing facial features

CvCam - video capture
- allows you to capture video from digital video cameras (support has been discontinued and this module is not available in the latest versions)


Installing OpenCV under Linux

After downloading the latest version of OpenCV from the developer's site http://sourceforge.net/projects/opencvlibrary/, you need to unpack the archive and build it using CMake version 2.6 or higher.

Installing CMake is standard:

sudo apt-get install cmake

To display OpenCV windows, you will need to install GTK+ 2.x and the libgtk2.0-dev library:

apt-get install libgtk2.0-dev

Building the library:

tar -xjf OpenCV-2.2.0.tar.bz2
cd OpenCV-2.2.0
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ./
make
make install

To test the installed library, you can build examples and run something:

cd samples/c/
chmod +x build_all.sh
./build_all.sh
./delaunay

If instead of a test image you see the error "error while loading shared libraries: libopencv_core.so.2.2: cannot open shared object file: No such file or directory", then this means that the program cannot find the libraries. You need to explicitly specify the path to them:

$ export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

If after that you again get the error:

OpenCV Error: Unspecified error (The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Carbon support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script) in cvNamedWindow, file /usr/src/OpenCV-2.2.0/modules/highgui/src/window.cpp, line 274
terminate called after throwing an instance of "cv::Exception"
what(): /usr/src/OpenCV-2.2.0/modules/highgui/src/window.cpp:274: error: (-2) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Carbon support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function cvNamedWindow
So you forgot to install GTK+ 2.x: libgtk2.0-dev. Run the installation (see above).

When the installation is complete, the header files will be available in the /usr/local/include/opencv directory, and the library files will be in /usr/local/lib

Let's build the program with OpenCV:

test.cpp

//
// for testing
//
// robocraft.ru
//
#include <cv.h>
#include <highgui.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

int main(int argc, char* argv[])
{
    IplImage* image = 0, *dst = 0;
    // image name
    char* filename = "Image0.jpg";
    // get the image
    image = cvLoadImage(filename, 1);
    printf("[i] image: %s\n", filename);
    assert(image != 0);
    // show the image
    cvNamedWindow("image");
    cvShowImage("image", image);
    // wait for a key press
    cvWaitKey(0);
    // release resources
    cvReleaseImage(&image);
    cvReleaseImage(&dst);
    // remove windows
    cvDestroyAllWindows();
    return 0;
}

Makefile

CC := g++
CFLAGS := -I/usr/local/include/opencv -L/usr/local/lib
OBJECTS :=
LIBRARIES := -lopencv_core -lopencv_imgproc -lopencv_highgui

.PHONY: all clean

all: test

test:
	$(CC) $(CFLAGS) -o test test.cpp $(LIBRARIES)

clean:
	rm -f *.o

Start the build with the make command.


hello world!

OpenCV is installed and ready to go. Let's write our first Hello World app!

#include <cv.h>
#include <highgui.h>

int main(int argc, char** argv)
{
    // set the height and width of the image
    int height = 620;
    int width = 440;
    // set the point for text output
    CvPoint pt = cvPoint(height/4, width/2);
    // create an 8-bit, 3-channel image
    IplImage* hw = cvCreateImage(cvSize(height, width), 8, 3);
    // fill the image with black
    cvSet(hw, cvScalar(0, 0, 0));
    // font initialization
    CvFont font;
    cvInitFont(&font, CV_FONT_HERSHEY_COMPLEX, 1.0, 1.0, 0, 1, CV_AA);
    // display text on the image using the font
    cvPutText(hw, "OpenCV Step By Step", pt, &font, CV_RGB(150, 0, 150));
    // create a window
    cvNamedWindow("Hello World", 0);
    // show the image in the created window
    cvShowImage("Hello World", hw);
    // wait for a key press
    cvWaitKey(0);
    // release resources
    cvReleaseImage(&hw);
    cvDestroyWindow("Hello World");
    return 0;
}

Image upload

This example will be the basis of all your OpenCV programs. We will load an image from the file Image0.jpg:

#include <cv.h>
#include <highgui.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

IplImage* image = 0;
IplImage* src = 0;

int main(int argc, char* argv[])
{
    // the image name is given as the first parameter
    char* filename = argc == 2 ? argv[1] : "Image0.jpg";
    // get the image
    image = cvLoadImage(filename, 1);
    // clone the image
    src = cvCloneImage(image);
    printf("[i] image: %s\n", filename);
    assert(src != 0);
    // window for displaying the image
    cvNamedWindow("original", CV_WINDOW_AUTOSIZE);
    // show the image
    cvShowImage("original", image);
    // print information about the image to the console
    printf("[i] channels: %d\n", image->nChannels);
    printf("[i] pixel depth: %d bits\n", image->depth);
    printf("[i] width: %d pixels\n", image->width);
    printf("[i] height: %d pixels\n", image->height);
    printf("[i] image size: %d bytes\n", image->imageSize);
    printf("[i] width step: %d bytes\n", image->widthStep);
    // wait for a key press
    cvWaitKey(0);
    // release resources
    cvReleaseImage(&image);
    cvReleaseImage(&src);
    // delete the window
    cvDestroyWindow("original");
    return 0;
}

Supported image format types:

  • Windows bitmaps - BMP, DIB
  • JPEG files - JPEG, JPG, JPE
  • Portable Network Graphics - PNG
  • Portable image format - PBM, PGM, PPM
  • Sun rasters - SR, RAS
  • TIFF files - TIFF, TIF

To access an image, you can make the following calls:

image->nChannels  // number of image channels (RGB, although in OpenCV it is BGR) (1-4)
image->depth      // bit depth
image->width      // image width in pixels
image->height     // image height in pixels
image->imageSize  // memory occupied by the image (== image->height * image->widthStep)
image->widthStep  // distance between vertically adjacent image points (number of bytes in one image row - may be needed to traverse all pixels independently)

Video upload

Loading a video is not much more complicated than loading an image, except that there is a loop that iterates over the frames.
The delay between frames is set to 33 milliseconds; this delay allows the video stream to be processed at the standard rate of about 30 frames per second.

#include <cv.h>
#include <highgui.h>
#include <stdlib.h>
#include <stdio.h>

IplImage* frame = 0;

int main(int argc, char* argv[])
{
    // the filename is given as the first parameter
    char* filename = argc == 2 ? argv[1] : "test.avi";
    printf("[i] file: %s\n", filename);
    // window for displaying the image
    cvNamedWindow("original", CV_WINDOW_AUTOSIZE);
    // get information about the video file
    CvCapture* capture = cvCreateFileCapture(filename);
    while (1) {
        // get the next frame
        frame = cvQueryFrame(capture);
        if (!frame) {
            break;
        }
        // a processing procedure can be inserted here
        // show the frame
        cvShowImage("original", frame);
        char c = cvWaitKey(33);
        if (c == 27) { // exit if ESC is pressed
            break;
        }
    }
    // release resources
    cvReleaseCapture(&capture);
    // delete the window
    cvDestroyWindow("original");
    return 0;
}

To capture video from the camera, you need to slightly modify the code - instead of the cvCreateFileCapture() function, cvCreateCameraCapture() will be used. When you press ESC, playback will be interrupted and the window will close, and when you press Enter, the current frame will be saved to a jpg file.

#include <cv.h>
#include <highgui.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

int main(int argc, char* argv[])
{
    // get any connected camera
    CvCapture* capture = cvCreateCameraCapture(CV_CAP_ANY); // cvCaptureFromCAM(0);
    assert(capture);

    //cvSetCaptureProperty(capture, CV_CAP_PROP_FRAME_WIDTH, 640);  //1280);
    //cvSetCaptureProperty(capture, CV_CAP_PROP_FRAME_HEIGHT, 480); //960);

    // get the frame width and height
    double width = cvGetCaptureProperty(capture, CV_CAP_PROP_FRAME_WIDTH);
    double height = cvGetCaptureProperty(capture, CV_CAP_PROP_FRAME_HEIGHT);
    printf("[i] %.0f x %.0f\n", width, height);

    IplImage* frame = 0;
    cvNamedWindow("capture", CV_WINDOW_AUTOSIZE);
    printf("[i] press Enter for capture image and Esc for quit!\n\n");

    int counter = 0;
    char filename[512];

    while (true) {
        // get a frame
        frame = cvQueryFrame(capture);
        // show it
        cvShowImage("capture", frame);
        char c = cvWaitKey(33);
        if (c == 27) { // ESC is pressed
            break;
        }
        else if (c == 13) { // Enter
            // save the frame to a file
            sprintf(filename, "Image%d.jpg", counter);
            printf("[i] capture... %s\n", filename);
            cvSaveImage(filename, frame);
            counter++;
        }
    }
    // release resources
    cvReleaseCapture(&capture);
    cvDestroyWindow("capture");
    return 0;
}

OpenCV v1.0 shows and saves the picture at the camera's minimum resolution of 320x240.


Pattern recognition of objects

There is a cvMatchTemplate() function for finding areas of the original image that match a template. The function slides the template over the current image and, according to the selected algorithm, computes the correlation between them. The boundaries of the found template in the source image are determined by the cvMinMaxLoc function, and cvNormalize() is used to normalize the search result.

//
// example cvMatchTemplate()
// match image with template
//
#include <cv.h>
#include <highgui.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

IplImage* image = 0;
IplImage* templ = 0;

int main(int argc, char* argv[])
{
    // the image name is given as the first parameter
    char* filename = argc >= 2 ? argv[1] : "Image0.jpg";
    // get the image
    image = cvLoadImage(filename, 1);
    printf("[i] image: %s\n", filename);
    assert(image != 0);

    // template
    char* filename2 = argc >= 3 ? argv[2] : "eye.jpg";
    printf("[i] template: %s\n", filename2);
    templ = cvLoadImage(filename2, 1);
    assert(templ != 0);

    cvNamedWindow("original", CV_WINDOW_AUTOSIZE);
    cvNamedWindow("template", CV_WINDOW_AUTOSIZE);
    cvNamedWindow("Match", CV_WINDOW_AUTOSIZE);
    cvNamedWindow("res", CV_WINDOW_AUTOSIZE);

    // template size
    int width = templ->width;
    int height = templ->height;

    // original and template
    cvShowImage("original", image);
    cvShowImage("template", templ);

    // image to store the comparison result
    // result size: if image is WxH and templ is wxh, then result = (W-w+1)x(H-h+1)
    IplImage* res = cvCreateImage(cvSize((image->width - templ->width + 1), (image->height - templ->height + 1)), IPL_DEPTH_32F, 1);

    // compare the image with the template
    cvMatchTemplate(image, templ, res, CV_TM_SQDIFF);

    // show what we got
    cvShowImage("res", res);

    // determine the best position of the match
    // (search for minima and maxima in the result image)
    double minval, maxval;
    CvPoint minloc, maxloc;
    cvMinMaxLoc(res, &minval, &maxval, &minloc, &maxloc, 0);

    // normalize
    cvNormalize(res, res, 1, 0, CV_MINMAX);
    cvNamedWindow("res norm", CV_WINDOW_AUTOSIZE);
    cvShowImage("res norm", res);

    // mark the found area with a rectangle
    cvRectangle(image, cvPoint(minloc.x, minloc.y), cvPoint(minloc.x + templ->width - 1, minloc.y + templ->height - 1), CV_RGB(255, 0, 0), 1, 8);

    // show the image
    cvShowImage("Match", image);

    // wait for a key press
    cvWaitKey(0);

    // release resources
    cvReleaseImage(&image);
    cvReleaseImage(&templ);
    cvReleaseImage(&res);
    cvDestroyAllWindows();
    return 0;
}

This article uses the C++ interface, FREAK, and multiple object detection. The reliability of object detection with FREAK is lower than with SURF, but it works much faster, which allows the algorithm to be used on mobile and embedded systems. An example of its work is shown in the figure:

Let's look at the source code that allows this to be achieved. The code is given in full for those who want to quickly insert it into their project.
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include <opencv2/legacy/legacy.hpp> // BruteForceMatcher lives here in OpenCV 2.4
#include <opencv2/calib3d/calib3d.hpp>
#include <iostream>
#include <vector>

using namespace cv;

int main(int argc, char** argv)
{
    if (argc != 3)
        return 1;
    Mat ImageTemple = imread(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
    if (!ImageTemple.data)
        return 2; // error
    Mat Image = imread(argv[2], CV_LOAD_IMAGE_GRAYSCALE);
    if (!Image.data)
        return 3; // error

    std::vector<KeyPoint> keypointsImageTemple, keypointsImage;
    Mat descriptorsImageTemple, descriptorsImage;
    std::vector<DMatch> matches;

    // Initialization of the feature detector class; 1000 is the threshold value
    // for sifting out unimportant features
    SurfFeatureDetector detector(1000);

    // Class for FREAK features. Feature comparison modes can be configured:
    // FREAK extractor(true, true, 22, 4, std::vector<int>());
    FREAK extractor;

    // Used to determine feature matches - the Hamming measure
    BruteForceMatcher<Hamming> matcher;

    // Detection
    double t = (double)getTickCount();
    detector.detect(ImageTemple, keypointsImageTemple);
    detector.detect(Image, keypointsImage);
    t = ((double)getTickCount() - t) / getTickFrequency();
    std::cout << "detection time [s]: " << t / 1.0 << std::endl;

    // Feature extraction
    t = (double)getTickCount();
    extractor.compute(ImageTemple, keypointsImageTemple, descriptorsImageTemple);
    extractor.compute(Image, keypointsImage, descriptorsImage);
    t = ((double)getTickCount() - t) / getTickFrequency();
    std::cout << "extraction time [s]: " << t << std::endl;

    // Matching
    t = (double)getTickCount();
    matcher.match(descriptorsImageTemple, descriptorsImage, matches);
    t = ((double)getTickCount() - t) / getTickFrequency();
    std::cout << "matching time [s]: " << t << std::endl;

    // Draw on the image
    Mat imgMatch;
    drawMatches(ImageTemple, keypointsImageTemple, Image, keypointsImage, matches, imgMatch);
    imwrite("matches.jpeg", imgMatch);

    std::vector<Point2f> obj;
    std::vector<Point2f> scene;
    for (int i = 0; i < (int)matches.size(); i++) {
        obj.push_back(keypointsImageTemple[matches[i].queryIdx].pt);
        scene.push_back(keypointsImage[matches[i].trainIdx].pt);
    }
    Mat H = findHomography(obj, scene, CV_RANSAC);

    std::vector<Point2f> obj_corners(4);
    obj_corners[0] = cvPoint(0, 0);
    obj_corners[1] = cvPoint(ImageTemple.cols, 0);
    obj_corners[2] = cvPoint(ImageTemple.cols, ImageTemple.rows);
    obj_corners[3] = cvPoint(0, ImageTemple.rows);
    std::vector<Point2f> scene_corners(4);
    perspectiveTransform(obj_corners, scene_corners, H);

    //-- Draw lines between the corners (the mapped object in the scene - image_2)
    line(imgMatch, scene_corners[0] + Point2f(ImageTemple.cols, 0), scene_corners[1] + Point2f(ImageTemple.cols, 0), Scalar(0, 255, 0), 4);
    line(imgMatch, scene_corners[1] + Point2f(ImageTemple.cols, 0), scene_corners[2] + Point2f(ImageTemple.cols, 0), Scalar(0, 255, 0), 4);
    line(imgMatch, scene_corners[2] + Point2f(ImageTemple.cols, 0), scene_corners[3] + Point2f(ImageTemple.cols, 0), Scalar(0, 255, 0), 4);
    line(imgMatch, scene_corners[3] + Point2f(ImageTemple.cols, 0), scene_corners[0] + Point2f(ImageTemple.cols, 0), Scalar(0, 255, 0), 4);
    imwrite("matches3.jpeg", imgMatch);
    return 0;
}

For any features in OpenCV, the SurfFeatureDetector class must be initialized. The first action after the various initializations is feature detection with detector.detect for the reference image and for the scene image. After that, FREAK features are computed for each image from the detector's results: extractor.compute.
Feature similarity comparison is done using matcher.match.
Next comes a loop that forms points from the features of both images. From these points, the findHomography image homography is computed. The position and rotation of the object are computed with the perspectiveTransform function, and then the result is drawn on the image.
Reference image:

Scene Image:


The result was shown at the beginning.
However, this raises the question of how to choose the optimal feature threshold: SurfFeatureDetector detector(1000);. The answer is found experimentally.
Suppose we have several objects in the image:


The result of the program will be the following:


Naturally, such a situation is not satisfactory. In order to detect all the objects, the image has to be divided into several parts. However, remember that if the image is divided into non-overlapping blocks (for example, a 100x100 image into 4 blocks of 50x50), a situation may arise where an object lies partially in several blocks and is not detected. To avoid this, the blocks must overlap, which slows the work down somewhat but improves quality (for example, a 100x100 image is divided into 9 overlapping blocks of 50x50). An example program that detects multiple objects is below:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include <opencv2/legacy/legacy.hpp> // BruteForceMatcher lives here in OpenCV 2.4
#include <opencv2/calib3d/calib3d.hpp>
#include <iostream>
#include <vector>

using namespace cv;

int main(int argc, char** argv)
{
    if (argc != 3)
        return 1;
    Mat ImageTemple = imread(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
    if (!ImageTemple.data)
        return 2; // error
    Mat Image = imread(argv[2], CV_LOAD_IMAGE_GRAYSCALE);
    if (!Image.data)
        return 3; // error

    std::vector<KeyPoint> keypointsImageTemple;
    Mat descriptorsImageTemple;
    std::vector<DMatch> matches;

    // Initialization of the feature detector class; 1000 is the threshold value
    // for sifting out unimportant features
    SurfFeatureDetector detector(1000);
    detector.detect(ImageTemple, keypointsImageTemple);

    int maxy = 3;
    int maxx = 3;
    Mat Draw_mat = imread(argv[2], 1);
    for (int y = 0; y < maxy; y++)
        for (int x = 0; x < maxx; x++) {
            // Class for FREAK features. Feature comparison modes can be configured:
            // FREAK extractor(true, true, 22, 4, std::vector<int>());
            FREAK extractor;
            // Used to determine feature matches - the Hamming measure
            BruteForceMatcher<Hamming> matcher;
            std::vector<KeyPoint> keypointsImage;
            Mat descriptorsImage;

            // Overlapping block: the block is twice as wide and tall as the grid step
            CvRect Rect = cvRect(x * (Image.cols / (maxx + 1)), y * (Image.rows / (maxy + 1)),
                                 2 * (Image.cols / (maxx + 1)), 2 * (Image.rows / (maxy + 1)));
            Mat ImageROI(Image, Rect);

            detector.detect(ImageROI, keypointsImage);
            extractor.compute(ImageTemple, keypointsImageTemple, descriptorsImageTemple);
            extractor.compute(ImageROI, keypointsImage, descriptorsImage);
            matcher.match(descriptorsImageTemple, descriptorsImage, matches);

            // Discard values that diverge too much
            for (int i = 0; i < (int)matches.size(); i++) {
                if (matches[i].distance > 150) {
                    matches.erase(matches.begin() + i);
                }
            }

            std::vector<Point2f> obj;
            std::vector<Point2f> scene;
            for (int i = 0; i < (int)matches.size(); i++) {
                obj.push_back(keypointsImageTemple[matches[i].queryIdx].pt);
                scene.push_back(keypointsImage[matches[i].trainIdx].pt);
            }
            Mat H = findHomography(obj, scene, CV_RANSAC);

            std::vector<Point2f> obj_corners(4);
            obj_corners[0] = cvPoint(0, 0);
            obj_corners[1] = cvPoint(ImageTemple.cols, 0);
            obj_corners[2] = cvPoint(ImageTemple.cols, ImageTemple.rows);
            obj_corners[3] = cvPoint(0, ImageTemple.rows);
            std::vector<Point2f> scene_corners(4);
            perspectiveTransform(obj_corners, scene_corners, H);

            //-- Draw lines between the corners (the mapped object in the scene),
            //-- shifted by the origin of the current block
            Point2f shift(x * (Image.cols / (maxx + 1)), y * (Image.rows / (maxy + 1)));
            line(Draw_mat, scene_corners[0] + shift, scene_corners[1] + shift, Scalar(0, 255, 0), 4);
            line(Draw_mat, scene_corners[1] + shift, scene_corners[2] + shift, Scalar(0, 255, 0), 4);
            line(Draw_mat, scene_corners[2] + shift, scene_corners[3] + shift, Scalar(0, 255, 0), 4);
            line(Draw_mat, scene_corners[3] + shift, scene_corners[0] + shift, Scalar(0, 255, 0), 4);
        }
    imwrite("draw_mat.jpeg", Draw_mat);
    return 0;
}

The result of the work is the following:


It can be seen that all the objects are detected, and some twice (because they fall into two blocks).
