Thứ Ba, 22 tháng 4, 2014

Part 1: Vision in Biology

    PROGRAMMING - COMPUTER VISION TUTORIAL
    Part 1: Vision in Biology
    Vision in Biology
    Vision in Biology
    So why vision in biology? What does biology have to do with robots? Well,biomimetics is the study of biology to aid in the design of new technology - such as robots. The purpose of this tutorial is so that you can understand how biology approaches the vision problem. As we progress through parts 23, and 4 you will start to draw parallels between how a robot can see the world and how you and I see the world. I will assume you have a basic understanding of biology, so I will try to build upon what you already know with a bottom->up approach, and hopefully not bore you with what you already know.
    The Eye
    The eye is stage one of the human vision system. Here is a diagram of the human eye:
    Human Eye
    Light first passes through the iris. The iris is what adjusts for the amount of light entering the eye - an auto-brightness adjuster. This is so no matter how much light the eye sees, it tries to adjust the eye to always gather a set amount. Note that if the light is still too bright, you will feel naturally compelled to cover your eyes with your hands.
    Light then passes to the lens, which is stretched and compressed by muscles to focus the image. This is similar to auto-focus on a digital camera. Notice how the lens inverts the image upside-down?
    With two eyes creates stereo vision, as they do not look in parallel straight lines. For example, look at your finger, then place your finger on your nose - see how you automatically become cross eyed? The angle of your eyes to each other generates ranging information which is then sent to your brain. Note: this however is not the only method the eyes use to generate range data.
    Cones and Rods
    The light then goes into contact with special neurons in the eye (cones for color and rods for brightness) that convert light energy to chemical energy. This process is complicated, but the end result is neurons that fire in special patterns that are sent to the brain by way of the optical nerve. Cones and Rods are the biological versions of pixels. But unlike in a camera where each pixel is equal, this is not true for the human eye.
    Rods and Cones within Eye
    What the above chart shows is the number of rods and cones in the eye vs location in the eye. At the very center of the eye (fovea = 0) you will notice a huge number of cones, and zero rods. Further out from the center the number of cones sharply decrease, with a gradual increase in rods. What does this mean? It means only the center of your eye is capable of processing color - the information from the rods going to your brain is significantly higher!
    Note the section labeled optic disk. This is where the optic nerve attaches to your eye, leaving no space left for light receptors. It is also called your blind spot.
    Compound Eyes
    Compound eyes work in the same way the human eye above works. But instead of rods and cones being the pixels, each individual compound eye acts as a pixel. Unlike popular folk-lore, the insect doesnt actually see hundreds of images. Instead it is hundreds of pixels, combined.
    Compound Eye Compound Eye Close-up
    An robot example of a compound eye would be getting a hundred photoresistorsand combining them into a matrix to form a single greyscale image.
    What advantage does a compound eye have over a human eye? If you poke a human eye out, his ability to see (total pixels gathered) drops to 50%. If you poke an insect eye out, it will still have 99% visual capability. It can also simply regrow an eye.
    Optic Nerve 'Image Processing'
    Most people dont realize how jumbled the information from the human eye really is. The image is inverted from the lens, rods and cones are not equally distributed, and neither eye sees the exact same image!
    This is where the optic nerve comes into play. By reorganizing neurons physically, it can reassemble an image to something more useful.
    Optic Nerve
    Notice how the criss-crossing reorganizes the information from the eyes - that which is seen on the left is processed in the right brain, and that which is seen on the right is processed in the left brain. The problem of two eyes seeing two different images is partially solved. Also interesting to note, there are significantly fewer neurons in the optic nerve then there are cones and rods in the eye. Theory goes that there is summing and averaging going on of 'pixels' that are in close proximity in the eye.
    What happens after this is still unknown to science, but significant progress has been made.
    Brain Processing
    This is where your brain 'magically' assembles the image into something comprehendable. Although the details are fuzzy, it has been determined that different parts of your brain process different parts of the image. One part may process color, another part detecting motion, yet another determining shape. This should give you clues to how to program such a system, in that everything can be treated as seperate subsystems/algorithms.

    And yet more Brain Processing . . .
    All of the basic visual information is gathered, and then processed again into yet a higher level. This is where the brain asks, what is it do I really see? Again, science has not entirely solved this problem (yet), but we have really good theories on what probably happens. Supposedly the brain keeps a large database of reference information - such as what a mac-n-cheese dinner looks like. The brain 'observes' something, then goes through the reference library to make conclusions on what is observed.
    Mac-n-Cheese
    How could this happen? Well, the brain knows the color should be orange, it knows it should have a shiny texture, and that the shape should be tube-like. Somehow the brain makes this connection, and tells you 'this is mac-n-cheese, yo.' Your other senses work in a similar manner.
    More specifically, the theory is about pattern recognition . . . its sorta like me showing you an ink blot, then asking you 'what do you see?' Your brain will try and figure it out, despite the fact it doesnt actually represent anything. Its a subconscious effort.
    Ink Blot Ink Blot Ink Blot
    Your brain also uses its understanding of the physical world (how things connect together in 3D space) to understand what it sees. Dont believe me? Then tell me how many legs this elephant has.
    Elephant Optical Illusion
    I highly recommend doing a google search on optical illusions. This is when the image processing rules of the brain 'break,' and is often used by scientists to figure out how we understand what we see.
    Stereo Image Processing
    What has baffled scientists for the longest time, and only recently solved (in my opinion), is what allows us to see a 2D image and yet picture it in 3D. Look at a painting of a scene, and you can immediately determine a fairly accurate measurement and distance away of every object in the picture. Scientists at CMU have recently solved how a computer can accomplish this. Basically a computer keeps a huge index of about a 1000 or so images, each with range data assigned (trained) to it. Then by probability analysis, it can make connections with future images that need to be processed.
    Here are examples of figuring out 3D from 2D.
    ALL lines that are parallel in 3D converge in 2D. This is a picture of a traintrack. Notice how the parellel lines converge to a single point? This is a method the brain uses to guestimate range data.
    Parallel lines in 3D converge in 2D
    The brain uses the relation of objects located on the 2D ground to determine 3D scenes. Here is a picture of a forest. By looking at where the trees are located on the ground, you can quickly figure out how far away the trees are located from each other. What tree is closest to the photographer? Why? How do you program that as an algorithm?
    2D Ground Referencing
    If I removed the ground reference, what then would you rely on to figure out how far each tree is from each other? The next method would probably be size comparisons. You would assume trees that are located closer would appear larger.
    Size Referencing
    But this wouldnt work if you had a giant tree far away and a tiny tree close up - as both would appear the same size! So the brain has yet many more methods, such as comparisons of details (size of leaves, for example), shading and shadows, etc. The below image is just a circle, but appears as a sphere because of shading. An algorithm that can process shading can convert 2D images to 3D.
    Shading Reference
    Now that you understand the basics of biological vision processing in our

Không có nhận xét nào:

Đăng nhận xét