COMPUTER VISION


Miguel Araujo @maraujop

http://bit.ly/pycones



DISCLAIMER

Just an amateur


















RED-LIGHT HAL




HARDWARE

CAMERAS


  • Compact cameras
  • DSLR cameras (Reflex)
  • Micro cameras
  • USB cameras (webcams)
  • IP cameras
  • Depth field / 3D cameras

CHOOSING A CAMERA


  • Volume / Weight
  • Sensor size (bigger generally means better image quality and low-light performance)
  • Focal Length
  • Resolution
  • Light conditions
  • Adjustable
  • Price

PHOTOGRAPHY 101


3 Pillars

  • Shutter speed
  • Aperture
  • ISO (Film speed)

ALSO

  • White balance
  • etc
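The three pillars trade off against each other through exposure: each full "stop" doubles or halves the light. A minimal sketch of the standard exposure-value formula (my own helper, not from the talk):

```python
from math import log2

def exposure_value(f_number, shutter_seconds, iso=100):
    """Exposure value: EV = log2(N^2 / t) - log2(ISO / 100).

    Higher EV means less light recorded; changing any of the three
    pillars by one stop shifts EV by exactly 1.
    """
    return log2(f_number ** 2 / shutter_seconds) - log2(iso / 100)

# f/1.0 at 1 s, ISO 100 is the reference point: EV 0
print(exposure_value(1.0, 1.0))   # 0.0
# Halving the shutter time costs one stop
print(exposure_value(1.0, 0.5))   # 1.0
```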

Shutter speed



APERTURE

Depth of field


ISO


LIBGPHOTO2


  • Linux Open Source project
  • Controls digital cameras (DSLRs and compact cameras) over USB.
  • Supports MTP and PTP v1 & v2.


VISION

Compact Cameras

  • Many take 6-15 seconds per capture through libgphoto2.
  • Rarely can stream video in real time.
  • Rarely allow adjusting camera settings on the fly.


VISION

DSLRs

  • Good time response.
  • Very well supported, many features.
  • Many camera parameters adjustable on the fly.


VISION

Micro Cameras

  • Custom drivers
  • Proprietary ports


VISION

Webcams

  • Low resolution
  • Handled through V4L2 
  • Poor performance in bad lighting conditions
  • Not very adjustable

EXTRA

  • Lenses
  • Number of cameras



SOFTWARE

OpenCV


  • Open Source
  • Known and respected
  • C++ powered
  • Python bindings
  • Low-level concepts, hard for newcomers
  • opencv-processing and others

Simplecv


  • Built on top of OpenCV using Python
  • Not a replacement
  • High level concepts and data structures
  • It also stands on the shoulders of other giants: numpy, Orange, scipy...
  • Well, yeah, it uses camelCase
  • simplecv-js

HELLO WORLD

COORDINATES

FEATURE DETECTION


  • Edges
  • Lines
  • Corners
  • Circles
  • Blobs

BLOB


A region of an image in which some properties are constant or vary within a prescribed range of values.

Blue M&Ms are blobs


m_and_ms = Image('m&ms.jpg')
blue_dist = m_and_ms.colorDistance(Color.BLUE)
blue_dist.show()
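Conceptually, colorDistance measures how far each pixel is from the target color. A simplified per-pixel sketch of the idea (not SimpleCV's actual implementation, which I believe also rescales the result to 0-255):

```python
from math import sqrt

def color_distance(pixel, target):
    """Euclidean distance between two (R, G, B) triples.

    Applied per pixel this yields a grayscale image that is dark
    where the pixel is close to the target color.
    """
    return sqrt(sum((a - b) ** 2 for a, b in zip(pixel, target)))

print(color_distance((0, 0, 255), (0, 0, 255)))   # 0.0 -> pure blue
print(color_distance((255, 0, 0), (0, 0, 255)))   # far from blue
```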

BLUE BLOBS



blue_dist = blue_dist.invert()
blobs = blue_dist.findBlobs()
print len(blobs)
>> 122

blobs.draw(Color.RED, width=-1)
blue_dist.show()


Polishing it

findBlobs(minsize, maxsize, threshval, ...)
blue_dist.findBlobs(minsize=200)
blobs = blobs.filter(blobs.area() > 200)
len(blobs)
>> 36

average_area = np.average(blobs.area())
>> 37792.77

blue_dist = blue_dist.scale(0.35)
blobs = blue_dist.findBlobs(threshval=177, minsize=100)
len(blobs)
>> 25
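Under the hood, findBlobs is essentially connected-component labeling on the binarized image. A pure-Python sketch of that idea (illustrative only, not SimpleCV's code):

```python
def find_blobs(mask):
    """Group neighbouring foreground pixels (4-connectivity) into
    blobs, returned as lists of (row, col) coordinates."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, blob = [(r, c)], []
                seen[r][c] = True
                while stack:  # flood fill from this seed pixel
                    y, x = stack.pop()
                    blob.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                blobs.append(blob)
    return blobs

mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
print(len(find_blobs(mask)))   # 2 blobs
```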


RULES


  • Dynamic is better than fixed, but harder to achieve.
  • If color is not needed, drop it, at least until it is needed.
  • The smaller the picture, the less information and the faster the processing.
  • Always use the easiest solution, which will usually be the fastest too.
  • Real life vs laboratory conditions.
  • Some things are harder than they look.
  • When working in artificial vision, don't forget about other input sources (time, sound, etc.).

GOLDEN RULE

  • Whatever you can solve in hardware (lighting, optics, camera placement), solve in hardware rather than in software.

COLOR SPACES

RGB / BGR

image.toRGB()

HSV (Hue Saturation Value)

image.toHSV()

YCbCr

image.toYCbCr()
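Why HSV helps: hue separates "what color" from "how bright", so color-based segmentation survives lighting changes better. The conversion itself is in the standard library (colorsys works on floats in [0, 1]; this degree-based wrapper is my own):

```python
import colorsys

def rgb_to_hsv_255(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation %, value %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(v * 100)

print(rgb_to_hsv_255(255, 0, 0))   # pure red  -> hue 0
print(rgb_to_hsv_255(0, 0, 255))   # pure blue -> hue 240
```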

hueDistance



blue_hue_dist = m_and_ms.hueDistance((0,117,245))


IDEAL



blue_hue_dist = m_and_ms.hueDistance(Color.BLUE)
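One subtlety hueDistance must handle: hue is an angle, so it wraps around. A sketch of circular hue distance (using degrees for clarity, rather than SimpleCV's internal scale):

```python
def hue_distance(h1, h2, wrap=360):
    """Distance between two hues on the color wheel: 350 and 10
    degrees are 20 apart, not 340."""
    d = abs(h1 - h2) % wrap
    return min(d, wrap - d)

print(hue_distance(350, 10))   # 20, not 340
print(hue_distance(0, 180))    # 180: opposite side of the wheel
```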


binarize


  • Creates a binary (black/white) image. It has many parameters you can tweak.
  • Uses Otsu's method by default, adjusting the threshold dynamically for better results.

blue_dist.binarize(blocksize=501).show() 
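Otsu's method picks the threshold that best separates the image histogram into two classes, by maximizing the between-class variance. An illustrative implementation (not SimpleCV's):

```python
def otsu_threshold(histogram):
    """Return the Otsu threshold for a list of per-level pixel counts."""
    total = sum(histogram)
    sum_all = sum(i * h for i, h in enumerate(histogram))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0
    for t, count in enumerate(histogram):
        w0 += count                      # pixels in the "dark" class
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * count
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t

# Two clearly separated intensity clusters, around levels 1-2 and 6-7
hist = [0, 10, 8, 0, 0, 0, 9, 11]
print(otsu_threshold(hist))   # 2
```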





MATCHING




Detector
Descriptor
Matcher
Filtering or Pruning best matches

DETECTORS

They need to be effective with changes in:

  • Viewpoint 
  • Scale 
  • Blur 
  • Illumination
  • Noise

DETECTORS

Find ROIs


Corners

  • Hessian Affine
  • Harris Affine
  • FAST

Keypoints

  • SIFT
  • SURF
  • MSER
  • ORB (Tracking)
  • BRISK (Tracking)
  • FREAK (Tracking)

Many more

DESCRIPTORS

Speed vs correctness

  • SURF
  • SIFT
  • LAZY
  • ORB
  • BRIEF
  • RIFF
  • etc.

MATCHERS


  • FLANN
  • Brute Force

PRUNING


  • Cross-check
  • Ratio-Test
  • shape overlapping 
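The ratio test (from Lowe's SIFT paper) keeps a match only when the best candidate is clearly better than the runner-up. A sketch, where `matches` is a hypothetical mapping from keypoint id to its two smallest descriptor distances:

```python
def ratio_test(matches, ratio=0.75):
    """Lowe's ratio test: accept a match only if the best distance is
    well below the second-best. 0.75 is a common default; tune per
    detector/descriptor."""
    return [
        kp for kp, (best, second) in matches.items()
        if best < ratio * second
    ]

matches = {
    "a": (10.0, 40.0),   # unambiguous -> kept
    "b": (30.0, 31.0),   # two near-identical candidates -> dropped
}
print(ratio_test(matches))   # ['a']
```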

MATCHING


  • Template or Query image (Choose wisely)
  • Sample or Train image




result_image = sample.drawKeypointMatches(template)


skp, tkp = sample.findKeypointMatches(template)

skp - Keypoints matched in sample
tkp - Keypoints matched in template

FINDKEYPOINTMATCH


  • Detection: Hessian affine
  • Description: SURF
  • Matching: FLANN kNN
  • Filtering: Lowe's ratio test
  • Finds a homography
  • Returns a FeatureSet with one KeypointMatch

 TEMPLATE 

 SAMPLE 

FINDKEYPOINTMATCH



coupons = Image("coupons.jpg")
coupon = Image("coupon.jpg")
match = coupons.findKeypointMatch(coupon)
match.draw(width=10, color=Color.GREEN)
coupons.save("result.jpg")


2nd example





FAILS

 Many OUTLIERS 

CLUSTERING


def find_clusters(keypoints, separator=None):
    features = FeatureSet(keypoints)
    if separator is None:
        separator = np.average(features.area())

    features = features.filter(
        features.area() > separator
    )
    return features.cluster(
        method="hierarchical", 
        properties="position"
    )

BIGGEST CLUSTER



def find_biggest_cluster(clusters):
    biggest_cluster = None
    max_cluster_size = 0
    for cluster in clusters:
        if len(cluster) > max_cluster_size:
            biggest_cluster = cluster
            max_cluster_size = len(cluster)

    return biggest_cluster



NORMAL DISTRIBUTION


from collections import namedtuple
from math import sqrt

Point = namedtuple('Point', 'x y')
def distance_between_points(point_one, point_two):
    return sqrt(
        pow((point_one.x - point_two.x), 2) + \
        pow((point_one.y - point_two.y), 2)
    )

skp_set = FeatureSet(biggest_cluster)
x_avg, y_avg = find_centroid(skp_set)
centroid = Point(x_avg, y_avg)
uno.drawRectangle(
    x_avg, y_avg, 20, 20, width=30, color=Color.RED
)

NORMAL DISTRIBUTION


distances = []
for kp in biggest_cluster:
    distances.append(distance_between_points(kp, centroid))

mu, sigma = cv2.meanStdDev(np.array(distances))
mu = mu[0][0]
sigma = sigma[0][0]

for kp in skp:
    if distance_between_points(kp, centroid) < (mu + 2*sigma):
        uno.drawRectangle(
            kp.x, kp.y, 20, 20, width=30, color=Color.GREEN
        )
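The same mu + 2*sigma filter can be stated with only the standard library, which makes the idea easier to see than the cv2.meanStdDev version (stdlib-only restatement, my own code):

```python
from statistics import mean, pstdev

def filter_outliers(distances, sigmas=2):
    """Keep only distances within mu + sigmas * sigma of the mean,
    rejecting keypoints far from the centroid as outliers."""
    mu, sigma = mean(distances), pstdev(distances)
    cutoff = mu + sigmas * sigma
    return [d for d in distances if d <= cutoff]

dists = [1.0] * 10 + [100.0]    # ten inliers, one far-away outlier
print(filter_outliers(dists))   # the 100.0 is rejected
```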

NORMAL DISTRIBUTION

REAL WORLD EXAMPLE






DETECTION

HAAR

FACE DETECTION
Haar-like features, Viola-Jones (2001)



HAAR

  • Needs to be trained with hundreds/thousands of sample images
  • Scale invariant
  • NOT Rotation invariant
  • Fast and robust
  • Not only for faces
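Part of why Viola-Jones is fast: Haar-like features are sums over rectangles, and with an integral image any rectangular sum costs four lookups regardless of size. An illustrative sketch:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] holds the sum of img over the
    rectangle from (0, 0) to (y-1, x-1)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] in O(1): four lookups."""
    return (ii[bottom][right] - ii[top][right]
            - ii[bottom][left] + ii[top][left])

img = [[1, 2], [3, 4]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))   # 10, the whole image
print(rect_sum(ii, 1, 0, 2, 2))   # 7, the bottom row
```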

HAAR


friends.listHaarFeatures()
['right_ear.xml', 'right_eye.xml', 'nose.xml', 'face4.xml', 'glasses.xml', ...]    

faces = friends.findHaarFeatures("face.xml")
faces.draw(width=10, color=Color.RED)
friends.save('result.jpg')

1 miss with face.xml

face2.xml




TRACKING

TRACKING

  • Detection != tracking
  • Uses information from previous frames
  • Needs an initial region telling it what to track


SOME Alternatives

  • Optical flow: Lucas-Kanade
  • Descriptors: SURF
  • Probability/Statistics and histograms: Camshift

CAMSHIFT


  • Effective for tracking simple, constant objects with homogeneous colors, like faces.
  • Gary Bradski, 1998.
  • The original implementation has problems with similar-colored objects nearby, crossing trajectories, and lighting changes.

Simple example

from SimpleCV import *

video = VirtualCamera("jack.mp4", 'video')
video_stream = VideoStream(
    "jack_tracking.mp4", framefill=False, codec="mp4v"
)

disp = Display()

track_set = []
current = video.getImage()

while (disp.isNotDone()):
    frame = video.getImage()
    track_set = frame.track(
        'camshift', track_set, current, [100, 100, 50, 50]
    )
    track_set.drawBB()
    current = frame
    frame.save(video_stream)


MORE COMPLEX

Initialization
video_stream = VideoStream(
    "jack_tracking.avi", framefill=False, 
    codec="mp4v"
)
video = VirtualCamera("jack.mp4", 'video')

disp = Display()

detected = False
current = video.getImage().scale(0.6)
tracked_objects = []
last_diff = None

while (disp.isNotDone()):
    frame = video.getImage().scale(0.6)

    # Scene changes
    diff = cv2.absdiff(frame.getNumpyCv2(), current.getNumpyCv2())
    if last_diff and diff.sum() > last_diff * 6:
        detected = False
    last_diff = diff.sum()

    # Detects faces and restarts tracking
    faces = frame.findHaarFeatures('face2.xml')
    if faces and not detected:
        tracked_objects = []
        final_faces = []
        for face in faces:
            if face.area() > 65:
                tracked_objects.append([])
                final_faces.append(face)
                detected = True


    # Restart if tracking grows too much
    if detected:
        for i, track_set in enumerate(tracked_objects):
            track_set = frame.track(
                'camshift', track_set, current,
                final_faces[i].boundingBox()
            )

            # Restart detection and tracking
            if track_set[-1].area > final_faces[i].area() * 3 \
                or not detected:
                    detected = False
                    break

            # Update tracked object and draw it
            tracked_objects[i] = track_set
            track_set.drawBB()

    current = frame
    frame.save(video_stream)




MOG

BACKGROUND SUBTRACTION


  • Separate people and objects that move (foreground) from the fixed environment (background)
  • MOG: Adaptive Gaussian Mixture Model
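MOG keeps a mixture of several Gaussians per pixel; as a much simpler illustration of the same adaptive idea, here is a toy single-mean running-average background model (my own sketch, not MOG itself):

```python
class RunningAverageBackground:
    """Toy background model: exponential running average per pixel.
    Pixels far from their long-term average are flagged foreground."""

    def __init__(self, learning_rate=0.1, threshold=30):
        self.lr = learning_rate
        self.threshold = threshold
        self.background = None

    def apply(self, frame):
        if self.background is None:
            self.background = [row[:] for row in frame]
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                bg = self.background[y][x]
                mask_row.append(abs(value - bg) > self.threshold)
                # Blend the new frame into the model (adaptation)
                self.background[y][x] = (1 - self.lr) * bg + self.lr * value
            mask.append(mask_row)
        return mask

model = RunningAverageBackground()
model.apply([[10, 10], [10, 10]])           # learn the static scene
mask = model.apply([[10, 200], [10, 10]])   # an object appears
print(mask)   # [[False, True], [False, False]]
```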

BACKGROUND SUBTRACTION

mog = MOGSegmentation(
    history=200, nMixtures=5, backgroundRatio=0.3, noiseSigma=16,
    learningRate=0.3
)

video = VirtualCamera('semaforo.mp4', 'video')
video_stream = VideoStream("mog.mp4", framefill=False, codec="mp4v")
disp = Display()

while (disp.isNotDone()):
    frame = video.getImage().scale(0.5)

    mog.addImage(frame)
    # segmentedImage = mog.getSegmentedImage()
    blobs = mog.getSegmentedBlobs()
    if blobs:
        blobs.draw(width=-1)

    frame.save(video_stream)




RED-LIGHT HAL

Red light runners


1- Detect whether the traffic light is red (otherwise assume green), using hysteresis.
2- Project a line that runners must cross.
3- Run MOG and prune the resulting blobs to find cars.
4- While the light is RED, if a car blob intersects the line, it's a runner.
5- Recognize the car so it is counted only once.
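The hysteresis in step 1 debounces the red/green decision so a single noisy frame cannot flip the state. The talk's code implements it with module-level globals; the same idea isolated as a small self-contained class (my own restatement):

```python
class Hysteresis:
    """Only switch state after `frames` consecutive opposite readings."""

    def __init__(self, frames=5):
        self.frames = frames
        self.red = False
        self.opposite_count = 0

    def update(self, red_detected):
        if red_detected != self.red:
            self.opposite_count += 1
            if self.opposite_count >= self.frames:
                self.red = red_detected
                self.opposite_count = 0
        else:
            self.opposite_count = 0
        return self.red

light = Hysteresis(frames=3)
readings = [True, False, True, True, True]   # one noisy 'green' frame
states = [light.update(r) for r in readings]
print(states)   # [False, False, False, False, True]
```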



red_light_bb = [432, 212, 13, 13]
cross_line = Line(
    frame.scale(0.5), ((329, 230), (10, 360))
)

RED = False
number_of_opposite = 0
HISTERESIS_FRAMES = 5




def is_traffic_light_red(frame):
    red_light = frame.crop(*red_light_bb)

    # BLACK (30, 28, 35)
    # RED  (21, 17, 51)
    if red_light.meanColor()[2] > 42:
        return True

    return False


def hysteresis(red_detected=False, green_detected=False):
    global RED, number_of_opposite
    
    if RED and green_detected:
        number_of_opposite += 1
        if number_of_opposite == HISTERESIS_FRAMES:
            RED = False
            number_of_opposite = 0
    elif not RED and red_detected:
        number_of_opposite += 1
        if number_of_opposite == HISTERESIS_FRAMES:
            RED = True
            number_of_opposite = 0
    else:
        number_of_opposite = 0

while (disp.isNotDone()):
    frame = video.getImage()
    small_frame = frame.scale(0.5)
    mog.addImage(small_frame)

    if is_traffic_light_red(frame):
        hysteresis(red_detected=True)
        if RED:
            blobs = mog.getSegmentedBlobs()
            if blobs:
                big_blobs = blobs.filter(blobs.area() > 1000)

                for car in big_blobs:
                    if cross_line.intersects(car.getFullMask()):
                        # RED LIGHT RUNNER
                        small_frame.drawRectangle(
                            *car.boundingBox(), color=Color.RED, width=3
                        )
    else:
        hysteresis(green_detected=True)

    small_frame.save(disp)

FIRST PROTOTYPE

RASPBERRY


  • Raspberry Pi + SimpleCV + Raspicam
  • Autonomous system, ethernet-connected, uploads runner videos online.
  • No night-time support yet.
  • Slower, not real time: frames during green lights are discarded.




THANKS

QUESTIONS?

COMPUTER VISION

By Miguel Araujo Pérez