Part 4: Computer Vision Algorithms for Motion
Motion Detection (Bulk Motion)
Motion detection works on the basis of frame differencing - comparing how pixels (usually blobs) change location from one frame to the next. There are two ways you can do motion detection.
The first method just looks for a bulk change in the image:
- calculate the average of a selected color in frame 1
wait X seconds
calculate the average of a selected color in frame 2
if (abs(avg_frame_1 - avg_frame_2) > threshold)
then motion detected
- calculate the middle mass in frame 1
wait X seconds
calculate the middle mass in frame 2
if (abs(mm_frame_1 - mm_frame_2) > threshold)
then motion detected
The algorithm also can't handle a rotating object - an object that moves, but whose middle mass does not change location.
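To make the two checks concrete, here is a minimal Python sketch of both using OpenCV and NumPy. The threshold values, color channel, one-second wait, and camera index are assumptions for illustration, not values from the pseudocode above.

import time
import cv2
import numpy as np

def average_color(frame, channel=1):
    # average intensity of one color channel (0=B, 1=G, 2=R in OpenCV)
    return frame[:, :, channel].mean()

def middle_mass(frame, channel=1, min_value=128):
    # center of mass of all sufficiently bright pixels in the chosen channel
    ys, xs = np.where(frame[:, :, channel] > min_value)
    if len(xs) == 0:
        return None
    return np.array([xs.mean(), ys.mean()])

cam = cv2.VideoCapture(0)        # assumed camera index
_, frame_1 = cam.read()
time.sleep(1.0)                  # wait X seconds (here, 1 second)
_, frame_2 = cam.read()

# method 1: bulk change in average color
if abs(average_color(frame_1) - average_color(frame_2)) > 5:
    print("motion detected (bulk color change)")

# method 2: the middle mass moved
mm_1, mm_2 = middle_mass(frame_1), middle_mass(frame_2)
if mm_1 is not None and mm_2 is not None and np.linalg.norm(mm_1 - mm_2) > 10:
    print("motion detected (middle mass moved)")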
Tracking
By doing motion detection with the middle mass, you can run more advanced algorithms such as tracking. Using vector math, and knowing the pixel-to-distance ratio, one may calculate the displacement, velocity, and acceleration of a moving blob.
Here is an example of how to calculate the speed of a car:
- calculate the middle mass in frame 1
wait X seconds
calculate the middle mass in frame 2
speed = abs(mm_frame_1 - mm_frame_2) * distance_per_pixel / X
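As a rough sketch of that calculation in Python, assuming the two middle-mass positions are already known in pixels (the calibration value and time interval below are made-up example numbers):

import numpy as np

mm_frame_1 = np.array([120.0, 240.0])   # centroid in frame 1 (pixels)
mm_frame_2 = np.array([180.0, 238.0])   # centroid in frame 2 (pixels)
distance_per_pixel = 0.05               # meters per pixel (assumed calibration)
X = 0.5                                 # seconds between the two frames

pixel_displacement = np.linalg.norm(mm_frame_1 - mm_frame_2)
speed = pixel_displacement * distance_per_pixel / X   # meters per second
print("estimated speed:", round(speed, 2), "m/s")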
The major issue with this algorithm is determining the distance to pixel ratio. If your camera is at an angle to the horizon (not looking overhead and pointing straight down), or your camera experiences the lens effect (all cameras do, to some extent), then you need to write a separate algorithm that maps this ratio for a given pixel located at X and Y position.
The below image shows an exaggerated lens effect, with pixels further down the trail covering a greater distance than the pixels closer to the camera.
This Mars Rover camera image is a good example of the lens effect:
Lens radial distortion can be modelled by the following equations:
- x_actual = xd * (1 + distortion_constant * (xd^2 + yd^2))
y_actual = yd * (1 + distortion_constant * (xd^2 + yd^2))
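Here is a small Python sketch of that first-order radial distortion model; the coordinates and distortion constant below are arbitrary example values, and (xd, yd) are assumed to already be measured from the image center.

def correct_radial_distortion(xd, yd, distortion_constant):
    # first-order radial distortion: a point is shifted outward (or inward)
    # in proportion to its squared distance from the image center
    r_squared = xd * xd + yd * yd
    x_actual = xd * (1.0 + distortion_constant * r_squared)
    y_actual = yd * (1.0 + distortion_constant * r_squared)
    return x_actual, y_actual

# example: a point near the image corner, with a small positive constant
print(correct_radial_distortion(200.0, 150.0, 1e-6))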
Crossover is the other major problem. This is when multiple objects cross over each other (i.e. one blob passes behind another blob) and the algorithm gets confused about which blob is which. For an example, here is a video showing the problem. Notice how the algorithm gets confused as the man goes behind the tree, or crosses over another tracked object? The algorithm must remember a decent number of features of each tracked object for crossovers to work.
(video is not ours)
Optical Flow
This computer vision method makes no attempt to identify the observed objects. It works by analyzing the bulk or individual motion of pixels. It is useful for tracking, 3D analysis, altitude measurement, and velocity measurement. This method has the advantage that it can work with low-resolution cameras, and the simpler variants require minimal processing power.
Optical flow is a vector field that shows the direction and magnitude of these intensity changes from one image to the other, as shown here:
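For readers who want to compute such a flow field in practice, here is a minimal sketch using OpenCV's dense Farneback method; the file names and parameter values are assumptions for illustration.

import cv2

# two consecutive frames (file names are placeholders)
frame_1 = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)
frame_2 = cv2.imread("frame_2.png", cv2.IMREAD_GRAYSCALE)

# dense optical flow: one (dx, dy) vector per pixel
flow = cv2.calcOpticalFlowFarneback(frame_1, frame_2, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("average pixel motion:", magnitude.mean(), "pixels per frame")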
Applications for Optical Flow
Altitude Measurement (for constant speed)
Ever notice when traveling by plane, the higher you are the slower the ground below you seems to move? For aerial robots that have a known constant speed, the altitude can be calculated by analyzing pixel velocity from a downward-facing camera. The slower the pixels travel, the higher the robot. A potential problem however is when your robot rotates in the air, but this can be accounted for by adding additional sensors like gyros and accelerometers.
Velocity Measurement (for constant altitude)
For a robot that is traveling at some known altitude, by analyzing pixel velocity, the robot velocity can be calculated. This is the converse of the altitude measurement method. It is impossible to gather both altitude and velocity data simultaneously using only optical flow, so a second sensor (such as GPS or an altimeter) needs to be used. If however your robot were an RC car, the altitude is already known (probably an inch above the ground). Velocity can then be calculated using optical flow with no other sensors. Optical flow can also be used to directly compute time to impact for missiles, and it is a technique insects use to gauge flight speed and direction.
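One simple way to see the altitude/velocity trade-off is the pinhole camera relation sketched below; it ignores rotation and lens distortion, and the focal length and measured values are assumptions for illustration.

# pinhole-camera approximation for a downward-facing camera:
# pixel_speed ~= focal_length_px * ground_speed / altitude
focal_length_px = 800.0   # assumed focal length, in pixels
pixel_speed = 40.0        # measured from optical flow, pixels per second

# known speed -> solve for altitude
known_speed = 10.0        # meters per second
altitude = focal_length_px * known_speed / pixel_speed

# known altitude -> solve for velocity
known_altitude = 20.0     # meters
velocity = pixel_speed * known_altitude / focal_length_px

print("altitude estimate:", altitude, "m, velocity estimate:", velocity, "m/s")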
Tracking
Please see tracking above, and background subtraction below. The optical flow method of tracking combines both of those methods together. By removing the background, all that needs to be done is analyze the motion of the moving pixels.
3D Scene Analysis
By analyzing the motion of all pixels, it is possible to generate rough 3D measurements of the observed scene. For example, in the below image of the subway train, the pixels on the far left are moving fast, and they are both converging and slowing down towards the center of the image. With this information, 3D information about the train can be calculated (including the velocity of the train and the angle of the track).
Problems with optical flow . . .
Generally, optical flow corresponds to the motion field, but not always. For example, the motion field and optical flow of a rotating barber's pole are different:
Although it is only rotating about the z-axis, optical flow will say the red bars are moving upward along the z-axis. Obviously, assumptions need to be made about the expected observed objects for this to work properly.
Accounting for multiple objects gets really complicated . . . especially if they cross each other . . .
And lastly, the equations get yet more complicated when you track not just linear motion of pixels, but rotational motion as well. With optical flow, how do you tell if the center point of this Ferris wheel is connected to the outer half?
Background Subtraction
Background subtraction is the method of removing pixels that do not move, focusing only on objects that do. The method works like this: compare each new frame against a reference background frame (or the previous frame), and keep only the pixels whose values change by more than a threshold.
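Here is a minimal frame-differencing sketch of that idea in Python with OpenCV; the file names and threshold value are assumptions.

import cv2

# a reference background and the current frame (file names are placeholders)
background = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# pixels that changed by more than the threshold are treated as moving
difference = cv2.absdiff(frame, background)
_, moving_pixels = cv2.threshold(difference, 30, 255, cv2.THRESH_BINARY)

cv2.imwrite("foreground_mask.png", moving_pixels)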
Here is an example of a guy moving with a static background. Some pixels did not appear to change when he moved, resulting in error:
The problem with this method, as above, is that if the object stops moving, it becomes invisible. If my hand moves, but my body doesn't, all you see is a moving hand. There is also the chance that although something is moving, not all the individual pixels change color because the object is of a uniform color. To correct for this, this algorithm must be combined with other algorithms such as edge detection and blob finding, to make sure all pixels within a moving boundary aren't discarded.
There is one other form of background subtraction called blue-screening (or green-screening, or chroma-key). What you do is physically replace the background with a solid color - a big green curtain (called a chroma-key) typically works best. Then the computer replaces all pixels of that color with pixels from another scene. This technique is commonly used for weather anchor people, and is why they never wear green ties =P
This blue-screening method is more a machine vision technique, as it will not work in everyday situations - only in studios with expert lighting. Here is a video of my ERP that I made using chroma key. If you look carefully, you'll see various chroma key artifacts as I didn't put much effort into getting it perfect. I used Sony Vegas Movie Studio to make the video.
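Here is a minimal chroma-key sketch in Python with OpenCV, assuming both images are the same size; the HSV green range below is an arbitrary starting point, not a standard value.

import cv2
import numpy as np

foreground = cv2.imread("studio_shot.png")     # subject in front of a green screen
new_scene = cv2.imread("new_background.png")   # replacement background, same size

# mark every pixel that falls inside an assumed green range
hsv = cv2.cvtColor(foreground, cv2.COLOR_BGR2HSV)
green_mask = cv2.inRange(hsv, np.array([40, 80, 80]), np.array([80, 255, 255]))

# copy the new scene into the green areas, keep the subject everywhere else
composite = foreground.copy()
composite[green_mask > 0] = new_scene[green_mask > 0]
cv2.imwrite("composite.png", composite)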
Feature Tracking
A feature is a specific identified point in the image that a tracking algorithm can lock onto and follow through multiple frames. Often features are selected because they are bright/dark spots, edges, or corners - depending on the particular tracking algorithm. Template matching is also quite common. What is important is that each feature represents a specific point on the surface of a real object. As a feature is tracked, it becomes a series of two-dimensional coordinates that represent the position of the feature across a series of frames. This series is referred to as a track. Once tracks have been created, they can be used immediately for 2D motion tracking, or then be used to calculate 3D information.
(for a realplayer streaming video example of feature tracking, click the image)
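For experimentation, here is a minimal sketch of feature tracking with OpenCV: corner-like features are detected in one frame and followed into the next with pyramidal Lucas-Kanade. The file names and parameter values are assumptions.

import cv2

frame_1 = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)
frame_2 = cv2.imread("frame_2.png", cv2.IMREAD_GRAYSCALE)

# pick corner-like features for the tracker to lock onto
features = cv2.goodFeaturesToTrack(frame_1, maxCorners=100,
                                   qualityLevel=0.01, minDistance=10)

# follow each feature into the next frame (pyramidal Lucas-Kanade)
new_positions, status, error = cv2.calcOpticalFlowPyrLK(frame_1, frame_2,
                                                        features, None)

# each (old, new) pair is one step of that feature's track
for old, new, ok in zip(features, new_positions, status):
    if ok[0]:
        print("feature moved from", old.ravel(), "to", new.ravel())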
Visual Servoing
Visual servoing is a method of using video data to determine position data for your robot. For example, your robot sees a door and wants to go through it. Visual servoing will allow the front of your robot to align itself with the door and pass through. If your robot wanted to pick something up, it can use visual servoing to move the arm to that location. To drive down a road, visual servoing would track the road with respect to the robot's heading.
To do visual servoing, first you need to use the vision processing methods listed in this tutorial to locate the object. Then your robot needs to decide how to orient itself to reach that location using some type of PID loop - the error being the distance between where the robot wants to be, and where it sees it is.
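As a toy example of that control loop, here is a proportional-only sketch (the P part of a PID loop) that steers so a tracked target drifts toward the image center; the gain, image width, and target position are made-up values.

IMAGE_WIDTH = 640
KP = 0.005                     # proportional gain (would be tuned by hand)

def steering_command(target_x):
    # error = horizontal distance between the tracked target and image center
    error = target_x - IMAGE_WIDTH / 2
    # sign convention: positive turns right, negative turns left
    return KP * error

# example: target detected left of center -> small left turn
print(steering_command(280))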
If you would like to learn more about robot arms for use in visual servoing, see my robot arms tutorial.
Practice What You Learned
The three images below are made from sonar capable of generating a 2D mapped field of an underwater scene with fish (for fisheries counting). Since the data is stored in a similar way to data from a camera, vision algorithms can be applied.
(scene 1, scene 2, and scene 3)
So here is your challenge:
What two different algorithms can achieve the change from scene 1 to scene 2 (hint: scene 2 only shows moving fish)?
Name the algorithm that can achieve the change from scene 2 to scene 3 (hint: color is made binary).
What algorithm allows finding the location of the fish in the scene?
If in scene two we were to identify the types of fish, what three different algorithms might work?
answers are at the bottom of this page
Downloadable Software (not affiliated with SoR)
For those interested in vision code for hacking, here is a great source for computer vision source code.