Skip to main content

Blog

Go Search
Home
  

Categories
Work
Personal
Other
Other Blogs
There are no items in this list.
Blog of Rick Taylor, containing my thoughts, comments and questions.
Parallel Processing – Some Notes

I have been doing some reading and researching into parallel processing recently. Massive parallelism has always been an interest to me, and has in fact been something I’ve played around with for some time now, resulting in such geekery as building a small redhat linux based Beowulf cluster in my apartment some years ago. Currently, all the computers in my house participate in an Alchemi cluster, which is a .Net-centric way of cluster programming/grid computing and well worth looking into. It is in many respects similar to Digipede (in fact, many things are directly portable), but is open source. It’s quite a fun toy and very powerful.

 

Some of the things I have been reading up on are classical ways of looking at massive parallelism, such as Microsoft’s HPC initiative. Some other things are new and pretty exciting, such as the MS Accelerator project, ParallelFX, and the stuff Google is doing with MapReduce. There is a lot of information out there, and this subject is deep as well as wide, so I’ll try to present this as pointers to where you can find more information, with some notes as to what I found to be cool on each bit.

 

Traditional Approaches

 

On the more traditional side of things, there is Windows HPC Server. Windows HPC Server consists of Windows Server running MS Compute Cluster Server, which is additions to Windows Server 2003, and (soon) Windows Server 2008 which allow Windows servers to be used to create traditional MPI based clusters, based on the MPICH2 standard, with some optimizations and bug fixes. In most cases, scripts are directly portable to Microsoft’s MPI implementation, and programs should work with a recompile. There are some restrictions, however, such as being restricted to a single head node. Otherwise, topologies should be directly implementable. A good (if high level) overview of the programming environment is available by reading the Developer Tools Datasheet, which also gives some overview of some of the newer and less traditional technologies being researched (or already available) from Microsoft. Also, on MPI, for anyone interested in the subject, I can highly recommend the book Using MPI, available from MIT Press.

 

GPGPU (General Purpose GPU) Programming

 

So, you know that graphics card you have in your computer? It turns out it’s a pretty powerful processor in its own right. That’s pretty obvious, but, what might not be so obvious is that that power is usable from programs you can write. There are restrictions, and programming to graphics cards is in quite a lot of ways not standardized. For instance, NVidia provides a toolkit called CUDA, which allows you to do general purpose programming on their (newer) graphics cards. Similarily, ATI provides its CTM (Close To the Metal) toolkit. Neither are compatible with each other, and both provide a low-level way of using the graphics processor to do general purpose processing.

 

There are limitations to what you can do with a GPU. For instance, branching is generally unavailable. Yep, you don’t get an “if” statement. No case/switch statement, either. Looping? Nah. Also, programming to these devices is a different paradigm than what you may be used to. GPUs use what is called data parallelism, which means that you generally have a large array of data, and one instruction that you are going to apply to the whole set (this used to be referred to as SIMD processing – Single Instruction, Multiple Data). Programming in this way allows the GPU to split the data up, and, since GPUs generally have many subprocessors (albeit less powerful than the one(s) you might find in your CPU), allows the requested instruction to be run in parallel on many subprocessors. Another restriction is that the data arrays are immutable. You perform an instruction on a data array, and get another back, but the original never changes. Note the tie-in here to functional programming and functional languages such as LISP, Scheme, and, more interesting to me at this point, F#.

 

One cool thing that I have found recently on the subject of GPGPU programming is the Accelerator project from Microsoft Research. What it does is simplify the process of programming to a GPU by abstracting the underlying implementations (such as CUDA and CTM), and allowing you to program in a higher level language, such as C#.

 

So why would you do GPGPU programming? Because it’s incredibly fast! Check out the Game of Life sample in the Accelerator project. It takes a hit because it must translate from C# down to CUDA (in my case), and ship everything off to the GPU for processing, but, even with that hit, on my machine here, it is quite easily 16 to 19 times faster than the Life implementation I did in F#. An order of magnitude and then some! Very, very impressive.

 

Here is a MS Channel 9 video on the Accelerator project, with David Tarditi and Sidd Puri, who are researchers on the project. Very cool.

 

Distributed, Redundant, and Functional

 

Scatter, gather, map, and reduce. These are terms that are used with distributed, parallel, and functional programming. Because you are programming in a functional manner, and will not change the original data (ie, your functions do not have side effects), your data and its accompanying algorithm can be distributed (scattered) to an arbitrary number of processors. Your function can then be run (mapped) across all that data. The results can then be gathered from those multiple processors, and, if need be, the multiple results can be reduced into a smaller result set. This again is the SIMD approach mentioned in the section on GPGPU programming, and is a common theme across related technologies such as functionally based languages.

 

Add one more bit into the mix here: Computers fail. All the time. In fact, you could consider that, in any given set of compute resources, a certain number will be down at any given time, for hardware reasons, software reasons, or just that Billy tripped over the power cord. It happens. In fact, it happens all the time, and the systems in this section assume that it is the natural state of hardware to have failures. These systems, because they take a functional approach, can move processing tasks from one set of hardware to another, and continue with working hardware when (not “if”!) some compute node or set of nodes fails.

 

There are several systems that act in this manner, most notably at this time Google’s MapReduce, Hadoop (2), and Microsoft’s Dryad research project. Each are massively parallel and redundant systems capable of automatic failure recovery, geared towards functional programming. MapReduce is what Google uses to do its searches (among other processing tasks). An overview and tutorial is available from Google here. Hadoop is an open source implementation of the MapReduce concept, written in Java, which you can download and install on COTS hardware. It is available for Linux and for Windows, but only with Windows as a development platform. More information and a tutorial can be found here. Dryad is a similar technology available from Microsoft Research, but also has tie-ins to LINQ, allowing this technology to be more readily accessible to programmers in general, and to make this type of technology, which traditionally has been used for research, mapping the human genome, secret DoD projects and the like (check out the Top500 list), to be more readily accessible to less ivory-tower uses of computing horsepower at this scale.

 

Our New Robotic Overlords

 

Another area that is producing interesting and quite capable results in the realm of parallel processing, and making that level of computing power available to the average programmer is MS Robotics and the CCR (Concurrency and Coordination Runtime). A very interesting set of videos on what the CCR is and what you can do with it can be found here and here. Basically, robots need a lot of things to happen all at one time. There needs to be a good, robust, and easy to code to way to manage all these concurrent threads, without forcing the end programmer to build thread management, processor affinity management, and other related issues into their code – there should be a way to make this easy, especially since many- and multicore machines are becoming more commonplace. The CCR provides that functionality, and is finding uses outside the realm of robotics.

 

 

 

ParallelFX

LINQ is without a doubt a powerful addition to the .Net framework. With it, you can query over disparate data types, perform reductions, projections, and all the other neat stuff normally associated with database engines of various stripes. But, in order to get to that type of ability, certain other aspects needed to be added into the framework, such as the ability to construct lambda expressions and the ability to create queryable groupings of data. All this has added some decidedly functional aspects to some largely imperative and object oriented languages. Because of this, and, if you write your code so that anything you use inside a LINQ query adheres to some of the tenets of functional programming (that is, no side effects),  LINQ and its related functionality naturally proceeds to a state where parallelism is the obvious next step.

 

Enter ParallelFX. ParallelFX adds several things to the .Net framework, such as parallelism in queries, and loop parallelism. For instance, assuming you have written a LINQ query in a way where it can take advantage of it, PLINQ (Parallel LINQ) can split your query across available processors in your machine, speeding up the query significantly. Also, even if your machines are not multi-core right now, if you write your software in a way where it can take advantage of parallelism, PLINQ will automatically take advantage of multiple cores if they become available, or run without change on a single core machine if that is all it has.

 

Changing your queries is simple. The AsParallel() extension method provides the tie in to PLINQ. For example, a slight variation on some of the code in my A* in LINQ example from a while back might look like :

 

var neighborsQuery = from n in currentNode.Neighbors.AsParallel()

                     where n.NodeState == NodeState.Undetermined

                     select n;

 

ParallelFX also adds in the TPL (Task Parallel Library), which allows you to parallelize loops. For instance, a for each loop like this one

 

foreach (Node<Point> node in domain)

{

   GetNeighbors(map, node, domain);

}

 

Becomes a statement like this

 

Parallel.ForEach(domain, node => GetNeighbors(map, node, domain));

 

- where the lambda expression describes the function that will be mapped over the available data. If possible, the TPL will scatter the data and the function across available processors and cores. If there is only one available, it runs just like the original foreach.

 

ParallelFX can be downloaded here.

 

 

Going Deep

 

Aside from those specific subjects, I have discovered basically a gold mine of information on parallelism in the Going Deep series on Channel 9 (there’s also a lot, lot more neat stuff there than just the parallel programming stuff. It’s well worth checking out). Of particular interest have been the videos involving Erik Meijer (language designer and researcher at Microsoft), Bertrand Meyer (father of the Eiffel language, among many other things), and Anders Hejlsberg (chief architect over C#, and also such languages as Delphi and Turbo Pascal):

 

Anders and Joe Duffy on ParallelFX – very interesting talk on the subject. Anders makes a sideways reference to Gödel numbers J

 

Erik Meijer on Functional Programming – Erik describes F#, contrasts with Haskell, discusses functional programming in C#

 

Erik Meijer, Bertrand Meyer, and Yuri Gurevich discuss concurrency and use some classic problems (such as the sleeping barber problem) as illustrations

 

Burton Smith on General Purpose Supercomputing

 

Dan Reed on Cloud Computing

 

 

Exploring F#  with Conway’s Game of Life

The other day, I asked around here at Vertigo to see if anyone understood some new concepts that I had run across, those being monads and continuations. These are both concepts that you find in more functional languages such as Lisp, Scheme, Haskell, and so forth, but are also finding their way into more mainstream languages – notably in LINQ, due to its relationship to more functional ways of programming (such as its use of lambdas). We have a good environment here for asking questions like that, and the conversation soon evolved into other areas. One subject that came up was that of F#, which is a language that has come out of Microsoft Research. I had heard of the language before, but had not really explored it. I happened to have a little bit of bandwidth over the last few days, so, I thought I would do a little spelunking, and learn something new. Here's what I found J

So what's this F# thing, anyway?

What I found out is that F# is a programming language based on ML and OCaml. These are both languages with a high degree of functional programming aspects to them (which is not the opposite of dysfunctional programming), but which, unlike more 'pure' functional languages such as Scheme or Haskell, allows for more imperative styles of programming than those do – structures such as for and while loops exist, and look much as you would expect.

Unless you specify otherwise, everything in F# is immutable, much like the string construct in C#. This extends to areas that you might not expect. For instance, once you set an int to some specific value, that's it – you can't change it, unless you have marked it as mutable. The reason for this is because the language is primarily a functional one. Programs in functional languages contain functions which return values, which are then further used, etc – but each value or set of values is its own entity, and to change it while within a function is to produce a side effect, something undesirable in a functional language. In the strictest sense, functions return values but do not alter their parameters, or the outside world (which, incidentally, gives rise to the monad pattern mentioned earlier). F# follows these rules of functional programming, but also allows you to break those rules with the mutable keyword.

In F#, indentation is significant. For instance, take this bit of code out of my sample project :

if(i>0)then

   newRect.up <- rectsArray.[i-1,j].rect 

   rectsArray.[i-1,j].down <- newRect.rect

 

In that sample, everything indented under the "if" is part of the if statement's block. (note: this is using the #light directive, which imposes these rules)

F# uses type inference for most things. For instance, given a statement such as

let gridSize = 45

 

the compiler determines by its use that gridSize has to be an integer. This is not a variant from the old days, but instead is the rough equivalent of the var keyword in C# - the language is typesafe, but most often types are inferred. You can, if need be, specify the type explicitly, such as :

let redBrush : SolidColorBrush = new SolidColorBrush(Colors.Red)

but F# seems to do a pretty good job of figuring things out.

F# also makes extensive use of lists, which are not the generic lists you would find in the .Net framework proper. Instead, they bear more of a resemblance to lists in LISP. A short discussion of what lists in F# are and what they can be used for can be found here.

A Sample Application

I kicked around several ideas for an application to build and learn in, but eventually settled on making a version of Conway's Game of Life. Because F# is functional and because of that potentially can execute each of the cells in parallel, it seemed like a good match. Unfortunately, because of the way I wrote the program, and because I needed to update a user interface (producing a side effect!), I am instead iterating over each one of the cells. Maybe in a future version, instead of using the List.iter function to do the heavy lifting, I can instead write it in a way where I can use List.map, which implements function mapping, and, if we also do things such as stick to non-mutable values, opens up a program to parallelism. The main gist was to get my head around the language, and I'm definitely still learning! In some respects I approached the true spirit of the language, but a fair bit of it is still written in an imperative style. Maybe phase two.

I also wanted to hone my WPF skills a little. I wanted to originally write this with a Silverlight front end, but, alas, F# support for Silverlight is not there yet. I also wanted to see if I could write it in such a way that everything is backended in F# with a WPF front end. This is doable, but most examples use F# code to generate WPF objects, which is not what I wanted. I wanted to be able to design in XAML and have the code in F#. In order to do this, I had to create a C# shim application, which basically starts up and then as soon as possible hands over control to the F# code.

Give my creature Life!

You've probably seen versions of the game of life before, but the main idea is, depending on how many of its neighbors are active, a cell is either turned on (that is, 'born', turned off ('dies'), or remains the same. If a cell has two active neighbors, it stays the same. With three active neighbors, it turns on. Otherwise, it turns off. Pretty simple rules, but the patterns which emerge can be complex. One other note about Life is that it is one of the simpler forms of cellular automata.

A sample Life pattern

Here are some interesting patterns for the Life game.

This seed that produces a pulsar, which is a relatively complex shape that pulses between about four states:

The pulsar in one of its final forms :

This is an f-pentomino, a seed that does not stabilize in a short period of time, and produces some interesting results :

 

A "methuselah", which is a pattern that runs for an extended period of time before stabilizing:

After a short period of time, the above pattern grows to:

 

The code for this blog entry can be downloaded here. Enjoy! And, as this is definitely a new skill set for me, please feel free to tear the code apart, tell me how to do it better, etc.

 

Other useful links I found :

The F# Language specification

F# in 20 minutes – contains installation instructions

Bending Cats using F#

a notepad implementation in F#/WPF

ACO (ant Colony Optimization) in F#

a blog about F# and WPF, lots of useful information

lots of information on F#

F# lists and how to use them

 

Fun links :

"Ook", an example of a dysfunctional language (as opposed to functional)

an Ook compiler for .Net

hello world in Ook

Blog post about the task parallel library and F#

 

 

 

 

 

Computer vision processing in .Net, part II

In which we use AForge.Net to process images, find regions, and then use parallax techniques to determine range to those objects and their size.

In my last blog post, I discussed how you might be able to determine distance to objects by using two web cams. There are several ways that you might go about doing this, using such techniques as parallax, standard candle, and adjusting the camera’s focal length. The easiest way to do this was to get two cameras and use parallax to determine the distance between two objects as perceived by the cameras, and then doing some math to determine where that object is in three dimensional space. Well, two dimensional, anyway – currently what I have isn’t concerned with the y axis. It also is not interested in object recognition, although that is an interesting field in itself, and there are tangential connections.

Before we proceed into how to determine where objects are using parallax and two web cams, I have run across some other interesting things that are related. First, it is possible to do depth perception with just one camera. Pigeons do it, and so do a lot of other animals which don’t have their eyes pointing in the same direction. When pigeons bob their heads, or when praying mantises sway from side to side, they are using optical flow to determine where objects are. Some interesting research being done using this technique is using the AIBO, Sony’s robotic dog. The AIBO only has one camera, but it is very likely possible to use optical flow to determine where objects are on the AIBO platform. Information on a project researching just that can be found here. Oh and one more thing about pigeon sight - in humans, persistence of vision makes it so that if something refreshes above a rate of, say, 60 to 70 hertz, we do not see individual frames, but instead see a steady image. Not so with a pigeon. They can see individual frames up to a refresh rate of about 100 hz (see The Flicker Fusion Factor, which also claims humans are ill equipped to drive cars. Assumedly pigeons might do all right). That screen that you’re looking at right now? Totally flickering up a storm for pigeons.

Of other interest is object recognition, which I do not know much about yet. But, an interesting subject I ran across is that of geons, which in this context are geometric primitives. The idea with geons is that there are basic geometric shapes which the brain perceives, and then relates to one another in order to make a map of what it perceives optically (and supposedly by other means as well). To me, this sounds logical, although I’d like to see better proof, but it would fall into some of the basic ideas put forth in Artificial Intelligence: A Modern Approach, in which the authors talk about simplification of the problem domain in order to solve complex problems.

One other thing I have been reading about lately is Ant Colony Optimization – because the world needs more efficient  ant colonies! ACO is basically a highly parallelized search algorithm using multiple agents in order to find optima in n-ary space. (What? Was that English?) In other words, that’s “ants crawl around randomly until they find the goal, and then other ants follow”. The beauty is that in a situation in which there is more than one possible solution, or when the goal changes or the solution changes, ACO can come up with a solution, and keep coming up with solutions when one of the known solutions fails. Here’s an example, written in VB. It’s really kind of fun to watch.

One other thing on the cool & fun list – I have been poking around in a book on Google Books called The Computational Beauty of Nature, which is currently on my Amazon wish list. I may buy it, but then again I may soon have access to a kindle, in which case I may save a tree and just kindle it instead.

 

Cameras

A word on the hardware. The cameras I had picked up at WierdStuff in Sunnyvale Did Not Work Well With Others. They were old, and the driver – of which only an open source driver exists at this time – was incapable of supporting more than one camera on a single machine. Ok for a lot of setups, but not this one. So I broke down and got a couple of Lifecams, which worked perfectly, pretty much right out of the box.

AForge. Net

For this project, I am using a library called AForge.Net for (among other things) processing images, detecting motion, doing edge detection, and so forth. It is open source, written in C#, and has other libraries for such things as genetic algorithms, neural nets, and so forth. It’s a pretty interesting project to poke through. For efficiency, a fair bit of the bitmap-centric code seems to use unsafe code and pointers, but that’s ok, because it’s pretty fast for most things. There are also multiple write-ups on codeplex by one Andrew Kirillov, who is also the project owner for AForge. In order to process the images I am receiving from the two web cams (See my previous post), I am using the AForge libraries.

Steps to depth perception

After we have acquired two images from our cameras, there are several steps we need to take in order to determine where objects are in our images. One of the more common techniques in this process is to use binarization in order to create a black and white image from the source image, where each pixel is either black, or white. Most black and white images are grayscale images, where each pixel represents a sliding scale. When a picture is binarized, each pixel is converted to either an on or off state, depending on a threshold value. Over the threshold, it’s on. Under, it’s off. The piece of code in AForge.Net that determines where blobs are requires this, and it is a common step to take in computer vision systems (see also Computer Vision For Robotics:An Introduction). It allows images to be more easily manipulated in code, and is also another example of simplifying the problem domain in order to derive a solution. Once we have binarized the image, we can then search for objects in the images and compare them.

Another technique, which is a bit more processor intensive, is to use an edge detection algorithm to boil down the images into more succinct representations, and then do basically the same process as before.

AForge.Net has classes which allow us to do all of these. In my example, which you can download below, I have an option to use either of these techniques. The second, while potentially more accurate, is slower, but, because of my target hardware (an 800mhz Via C3 based machine that used to be a thin client terminal but now runs XP), is not necessarily suitable for its target environment.

Image Preparation

AForge.Net provides a wide array of image filters - classes which alter an image in some way. For instance, some might change the brightness or contrast, sharpening, and so forth. Many of these might be useful in what we are doing here, and in fact I have played with a few of them to varying effect. Some produce better results than others, and some produce good results, but at the expense of speed. Some of the ones I have found that work well are Threshold (which I use in this example), Pixellate, and CannyEdgeDetector. Others have proven interesting, such as ContrastCorrection, Dilatation, and CornersMarker, but after much experimentation I came up with two basic filter sequences I used :

Grayscale => Threshold => Invert, which is fast but sometimes inaccurate, and

Grayscale => CannyEdgeDetector, which is slower but sometimes more accurate

To see the code that does this preparation, see PrepareImage() in the accompanying code. Also, at the top of MainForm.cs, there is some commented out code for other filters that were interesting or useful.

 Once we have prepared the images, we can pass them to a detector of some sort in order to determine where objects are in the images. Aforge.Net provides several classes which do object detection of different kinds. For instance, there is an assortment of motion detectors (see here and here for examples) which can outline movement regions, and, in some cases, return an array of rectangles indicating where these motion areas are.

Put It Down, Flip It and Reverse It

For my first attempts at this, what I did was to use the CountingMotionDetector class, which outlines motion areas using rectangles, but also allows you to retrieve a collection of those rectangles. The thought was to get the rectangles from the images, flip the order that the images were processed in, reprocess and get the rectangles from the reversed order, and then calculate distances using the rectangles. Didn’t work. Unfortunately, the CountingMotionDetector class was too smart for me, and, no matter if the images were reversed in processing order or not, it always returned the rectangles at exactly the same positions. What I ended up doing, instead, is using the BlobCounter class to get rectangles from the individual images, and then match those rectangles with their counterparts. This is actually advantageous, because the BlobCounter class is lower level and faster. To see how the BlobCounter class works, see the GetRectangles() method in the accompanying code.

Matching Rectangles

Once we have rectangles in hand, the next step is to match rectangles, and then to do the math to determine distance. This is not as straightforward as it seems, because rectangles overlap, are not of the same size (but are of similar sizes) in the two images, there’s multiple possible matches, etc. What I did was use a bit of probability in order to determine which rectangles are the best matches for one another. Basically I scored the rectangles on how well they match one another, and then ran a LINQ query to determine the best matches, and bind these to one another.  The rectangles are scored on several criteria, such as how similar their areas are, how similar their widths and heights are, and so forth. Each score is a value between 0 and 1, inclusive, which I then add together to come up with a final score. This final score is obviously not in the range of 0 to 1, and therefore isn’t strictly in the realm of what we would classically think of probability, but, it does bubble up the highest ranking match to the top, which is what we need. Also note – this type of probabilistic reasoning, where instead of strict Boolean values we have ranges from 0 to 1, is a beginning point into fuzzy logic and how fuzzy reasoning works. In essence we are using a bit of fuzzy logic to match the rectangles – and using LINQ to do it! To see how all this works, check out the GetSimilarRectangles() and GetBestMatch() methods in MainForm.cs. There you will see LINQ queries to determine what the rank of matches are and how to get the top match. Look in GetNearScore() to see where the probabilistic logic takes place – and where I strayed from it! (For instance, the result of subtracting the abs() of the two y values is out of range for a canonically fuzzy value). You will also see some code that is commented out in there.  While those work well, I have found that the ones I left in are adequate and in general provide good results. The others I left in by way of example.

Doing the Math

From here, what we need to do is some crunching of numbers. The code that does this is in GetDistanceInfo(). An example of the kind of calculation we want to do can be found here. Basically what it does is calculate distance and angle, using the difference between the x values of the two rectangles, and the distance that those two rectangles are from the center line of one of our images. This gives us polar coordinates, with the cameras placed at the origin point. From there, we can manipulate the data as we please, for example plotting locations on a Cartesian graph (which is essentially what the example is doing). The calculation itself uses the disparity between the two objects (the perceived  x distance between them),the camera’s angle of view, and the camera’s focal length to determine polar coordinates for the perceived object. A discussion of this can be found here. From here, by way of example, I display the values on a graphic that looks like a radar screen.

Other interesting things I found along the way

Iplab_demo is an interesting example of the image processing capabilities of the AForge.Net library. It was very useful in determining what filter sequences worked well in the blob-finding process.

XDMessaging is a library which uses the windows message pump to communicate between processes/AppDomains on a single machine. It requires no setup and is relatively performant. I’m using it in the non-demo version of this code to communicate the vision info back to a consuming application.

An interesting and easy to digest page on stereopsis can be found here.

But most of all The Ants! You gotta check out the ants.

The Code

To play with the code for this blog post, download it here. While it is a demo and should work fine standalone, please consider it to be a work in progress. There are many things in it that could be improved upon. Also I used many references and examples to get this together. All cites and references should be intact in this document and in the code. If I missed anyone, please let me know and I will add ex post haste. Enjoy!

Computer vision processing in .Net

In my spare time, I have been playing around with vision processing. Basically, I have an ongoing robotics project I’m fiddling with off and on. What I need to figure out at this stage is how to do rangefinding, that is, finding the distance to objects near the robot. There are many options available, such as sonar, laser, and so forth. But, what I decided on doing, for various reasons, is to play with stereoscopic vision. The main reasons are that web cams are easy to find, and you can find older ones for a bargain. If, I thought, I could get two matching web cams, I may be able to use them to analyze images and determine where objects are and the distance to them. This is an ongoing project, and is far from complete (and with no guarantee of success!), but, I can report on some interesting things I have learned so far.

**Warning** I am far from an expert on any of this subject matter. Pretty much all of it is new to me, but I’m having fun exploring! Also, this describes a work in progress. If you see errors please feel free to point them out, I’ll certainly appreciate it!

How it works

I’m no expert on vision, but it seemed to me that with a little bit of trig and some known values (such as the distance between two cameras), then, assuming that you could somehow determine where objects were in images from those cameras, then you could do a little math and figure out where objects are in 3D space. After a little reading, it turns out that there are indeed several ways that this could be done.

Parallax

One of the ways that you can tell distance to an object by using imaging is to take two images at different locations, and then compare the two images in order to determine how far away they are. Objects in the images will shift depending on how far away they are – nearer images will shift more than ones that are far away. This is the method that astronomers use to determine distances to celestial objects, such as stars and galaxies. It is also one of the ways that we as humans use to determine how far away things are in our environment. For a fuller explanation of what parallax is and how it works, see the Wikipedia article. There is also a very good discussion of this and subjects related to it here.

So, it turns out that yes, we certainly can do what I stated above, in that if we can examine two images and figure out the positions of objects, we can then measure how much they have changed from one picture to another, and determine distance. The trick will be determining where objects are in the images. This may be possible, though, by processing the image using edge detection of some sort.

Standard Candle

Another way that we may possibly figure out how far away an object is, is by examining its luminosity. Photons radiating out from an object tend to dissipate over distance, for many reasons, such as hitting molecules in the atmosphere, the fact that they are radiating in an expanding sphere from a central point, and so forth. This leads to another way astronomers and people who use photogrammetry determine distance to objects. For instance, astronomers use red giants and supernovae as standard sources of light (standard candles) If we can determine what a standard source of light in our image is, and then compare other objects in the image to that known value, we may be able to determine distance. An advantage of this method is that we can do it with one camera. A disadvantage is that not only do we have to find objects in our image as before, but we also have to recognize what that object is in order to determine what its luminosity should be if it were at a standard distance. This requires knowledge of the problem domain that we may not necessarily be able to provide, and in fact probably cannot. Remember, we are after range finding here, not object recognition.

So standard candle is very likely out. It would require us to recognize objects to some degree, and also have knowledge of any environment that we may find ourselves in. That’s a big database and hefty processing power – and if we’re going there, buying a laser range finder at Lowe’s and hacking it might be an easier solution than all this vision stuff.

Focus

Another method we may be able to use is by using properties of the camera itself. Basically, if we use the focus of the camera and know what the focal point of the camera is at any given time, anything that is in focus is at the specified range. For instance, if we know the camera is focused at five feet, anything that is in focus is five feet away from us. This may work. The only problem is, is that the cameras I currently have cannot be programmatically focused. I would have to rig up a servo or a stepper motor and then alter the focus in that way. Doable, and probably fairly accurate if we can find edges correctly, but it means buying hardware and soldering things J.

So, parallax it is, for the moment, with the possibility of using the focus technique later, and probably not using standard candle techniques.

WIA and DirectShow

The first thing we need to do in order to make this happen is to get images from two cameras. The cameras will be mounted at a known distance from each other, be identical in make, and be calibrated to point in as close to the same direction as possible. A quick trip to WierdStuff netted me some cheap $5 web cams to play with. Unfortunately, there were some lessons learned here. WIA, (2) the Microsoft architecture for acquiring images from imaging devices such as scanners and web cams, proved to have some shortcomings that were insurmountable with the cameras I chose. Specifically, it appears that calls to capDriverConnect can only connect one imaging device at a time. That won’t do – we want both cameras up to speed and streaming video at the same time. Turning them on and connecting to them is a time consuming process. We will be unlikely to get realtime processing, with all this, but waiting 30 seconds to a minute between images is too much. Also, it appears that the drivers for the (very obsolete!) cameras I picked up assume that only one of these cameras are going to be connected to a computer at any given time, ever. If you connect two, Windows recognizes the second device, but the driver will only ever talk to the first one.

Unfortunate.  But, with more modern cameras, it is possible to use other technologies to have both cameras running at the same time and to talk to them simultaneously.

DirectShow

Fortunately, I did have some newer webcams. They don’t match, so they won’t do for the image acquisition in the long run, but they were enough for me to get up and running and write some code that will do image acquisition from two web cameras simultaneously. In order to do this, I used a library I found on SourceForge called DirectShow.Net, which allowed me to start up both web cams at the same time, begin image acquisition on them, and then acquire sample bitmaps from the image stream at predetermined intervals. The code I started with is from their July 2007 samples.  DirectShow.Net is a wrapper around the COM objects comprising DirectShow.  I can’t say I am fully up on graphs and pins and how to associate them correctly, but by using their DxSnap example (in the DirectShowSamples-2007-July\Samples\Capture\DxSnap directory), I was able to create an application that snaps two pictures simultaneously from two separate web cams. The gist of what I needed to do was set up two Capture objects. So, at the class level of the main form :

private Capture cam1;

private Capture cam2;

 

and then in the constructor:

 

cam1 = new Capture(1, VIDEOWIDTH, VIDEOHEIGHT, VIDEOBITSPERPIXEL, preview1PictureBox);

cam2 = new Capture(2, VIDEOWIDTH, VIDEOHEIGHT, VIDEOBITSPERPIXEL, preview2PictureBox);

 

The reason why I have values 1 and 2 above is because I also have a PCI TV tuner card in my dev box. Without that card, the values would be 0 and 1. And, of course, when this becomes a more robust program, those will be configurable values in a setting file of some sort. Because the target machine is relatively low power, and processing a web cam can be fairly process intensive, there were some other tweaks I did. Ultimately, the images are not for human consumption, so we do not need to look good. We can reduce the resolution (in the main form):

 

const int VIDEOWIDTH = 320;

const int VIDEOHEIGHT = 240;

 

And, since we don’t need to worry about flicker or problems with persistence of vision, we can drop the frame rate significantly. Normally, directShow will use the camera’s settings, which are usually the maximum it will support. We don’t need that. In fact, we can go well below 12 FPS if we so choose. What I did to do that, is in the Capture class, alter the SetConfigParams method to specify the frame rate. The VideoInfoheader structure, which defines how the video stream will be rendered. The AvgTimePerFrame property expresses how much time will be spent on each frame, expressed in nanoseconds. Since I wanted it to be approximately 12 FPS, I set it to 833000, which, 1 second in nanoseconds (1,000,000,000 per second) divided by 12 gives us 833,333.333, which I approximated. After these changes, the processor hit was much less intense.

 

After that, I created a small routine (using examples from the DxSnap example – thanks!) in the main form which I can call, specifying a capture device, which will snap a picture from the specified stream and return it :

 

private Bitmap TakePicture(Capture cam)

{

 

        // Release any previous buffer

        if (m_ip != IntPtr.Zero)

        {

                Marshal.FreeCoTaskMem(m_ip);

                m_ip = IntPtr.Zero;

        }

 

        // capture image

        m_ip = cam.Click();

        Bitmap b = new Bitmap(cam.Width, cam.Height, cam.Stride, PixelFormat.Format24bppRgb, m_ip);

 

        // If the image is upsidedown

        b.RotateFlip(RotateFlipType.RotateNoneFlipY);

        return b;

}

 

From here, it’s now a matter of processing images. I’ve used a good bit of some sample code in the DirectShow.Net samples download, but we’re now at a point where we can get images from two web cameras at regular intervals, which I’ve done by setting up a timer:

 

private void SnapshotTimer_Tick(object sender, EventArgs e)

{

        SnapshotTimer.Enabled = false;

 

        Bitmap b1;

        Bitmap b2;

 

        b1= TakePicture(cam1);

        b2=TakePicture(cam2);

 

        ProcessImages(b1, b2);

 

        SnapshotTimer.Enabled = true;

}

 

(We’ll get into what ProcessImage does in future installments!)

 

 From here, what we will need to do to meet our original goal of determining distance to objects by using the parallax method, we will need to do several things. First, we need to determine where objects are in the individual images. Once we have those locations, we need to correlate those to each other – that is, we need to know which images in one image correspond with images in the second. After that, we can measure differences between the objects we find and come up with numbers which we can then do computations on using what we know about computing distance via the parallax methodology.

 

Next :  Motion detection and object detection using AForge.Net

 

Implementing the A* algorithm in C# using LINQ and Generics

Some ongoing interests of mine have been things like robotics and artificial intelligence. In my spare time, I build small robotics projects just for fun, and have been involved in a larger robotics/art project, called OrbSWARM, which is a large scale kinetic art project experimenting in swarming behaviors and human/machine interaction. One of the current interests of mine have been implementing a controller program, in C#, which implements a hybridization of the agent/behaviors based approach to machine intelligence proposed by Marvin Minsky in his Society of Mind, and in other resources.

One of the ongoing problems in machine intelligence is to find a way for the machine to efficiently find a solution to specific problems presented to it. For instance, some of the cliché problems may be things like Towers of Hanoi, the travelling salesman problem, the knapsack problem, etc. For each of them, a typical way to implement a solution to the problem might be for the machine to find specific subgoals en route to the main goal. In my particular example, the program I am working needs to be able to find a route between two points on a map. Other parts of the program generate an internal map of the encountered world, represented as a bitmap, and yet other parts of the program choose a point where we desire the machine to be. The part of the program we are interested in here is the part that takes the map, and determines a route between where we are now, and where we would like to be.

There are many ways we can solve this problem, and each has its drawbacks and advantages. For instance, in an earlier iteration of the program, I used a binary search algorithm which used the bitmap itself, by drawing on it and sensing pixels directly to find paths between the goal and start positions. Basically, this works the same way the Paint bucket works in Paintbrush to paint an area, but starting at two points at once, and not filling in everything (it also uses a best-first approach). While this works, and has the advantage of allowing for stochastic variations in the resultant path, it is computationally intensive, and perhaps not suitable for lower power machines (such as Arm9, Transmeta, or Via Eden/C3 based machines) that a robotic controller program might find itself running on. For machines like these, efficiency in both memory and computational iterations really matters – the lower your O(n), the better.  Because of this, if we were able to represent the world in a more abstract manner, and have fewer data points to work with, it would be better. Also, abstracting the problem might allow us to reuse the code in more problems than just the current one – for instance, we’re talking about waypoints on a map for this specific problem, but what if we were able to apply what we build to generating workflows – for instance, a set of known delegates that point to methods that may or may not get us to a specific goal state? If we could assign costs to each one of those delegates, and determine what possible next steps there are for each, then an abstracted solution here might help us find solutions in that problem domain, as well.

The A* algorithm

Enter the venerable A* algorithm, one of the primary ways you might use to search a problem domain to find a solution. At a high level, the A* algorithm uses a best-first approach using a cost and a heuristic, to iterate through the nodes in a problem domain in order to find a path between a start state and a goal state.  Some definition of terms is in order. The cost in the above statement basically means the cost to get to a particular intermediate state. For instance, if we are trying to find a route on a map, the cost might be the distance travelled to get to a specific point. Cost can be things like distance, but can also take into account the difficulty of acquiring a particular state – for instance, if we had to go through an alligator infested swamp surrounded by thistle bushes, or worse, had to deal with a nest of the Dreaded Copperheaded Water Rattler, the cost is higher than a waypoint in an easy to traverse area. The heuristic is similar to the cost, but is more or less its inverse. The heuristic is basically a guess of what the remaining cost is to reach a specific goal. In our case, it may be the length of a straight line-of-sight line between the current state (a point on the map), and the desired goal state (our ending point).  The idea is that we will look at each possible next state, choose the one with the best possibilities (calculated as cost + heuristic), and examine that node – and this is what ‘best first’ means.

It would be nice if we could abstract some of that, and generics in C# allow us to do so.

Description of Node<>

In my implementation of A*, what I have basically done is divide the problem into several sections. The first, and easiest to implement, is a generic Node class. The Node class represents states in a particular problem domain. For instance, in our concrete example here, each node needs to keep track of possible waypoints in a path between physical points on a map, represented as the .Net Point structure. In order to be abstract about this, what I have done is to create a generic class which contains several properties which will help us track the node and our progress. I’ll highlight some of the interesting bits here.

Firstly, each node needs to know about the nodes that it is connected to.

public List<Node<T>> Neighbors { get; set; }

 

In our particular instance, this will be the points to the left, right, up and down directions from the current point, (if any exist there – if the map says there is a solid object there, we don’t create a point),

Secondly, we need to know what state the node is in. Later on, we will need to make decisions based on whether a node has been examined, is currently able to be examined, or has not been examined yet  (“closed”, “open”, and indeterminate state in normal A* parlance).

public NodeState NodeState { get; set; }

NodeState is an enum, defined as :

public enum NodeState

{

    Undetermined,

    Open,

    Closed

}

Next, we need to track what value this node represents, and it will also be useful to know what the goal value is. Also, in order to construct the path between start and goal states, each node in an A* scenario needs to track what its previous node was – what point we are adjacent to that was examined right before this one was examined.

        public T Value { get; set; }

        public T GoalValue { get; set; }

        public Node<T> PreviousNode { get; set; }

 

So that’s some basic housekeeping, and now for the interesting stuff.  As far as figuring out the cost and the heuristic, we can write code in the Node class that adds them together, but in order to abstract the definition of each of those, we really need to allow the consumer of our class to plug in a definition for each. Using lambda expressions would be nice, and the Func generic delegate allows us to do this.

        public Func<Node<T>, int> GetHeuristic { get; set; }

        public Func<Node<T>, int> GetCost { get; set; }

 

Also, because not all objects we will be dealing with implement IComparable (in this case, conspicuously, Points) we cannot constrict the Node class to something like IComparable or IEquatable<>, so, what we can do instead is define the method of obtaining equality elsewhere :

        public Func<Node<T>, bool> IsEqualFunc { get; set; }

 

All this will allow us later on to plug in our cost and heuristic in a more readily readable way. For instance, if we are dealing with a Node<Point>, we could do this :

Node<Point> point =