Tri thức là sức mạnh (Knowledge Is Power): Image and Video Processing


Monday, January 4, 2016

What is pattern recognition?

I am supposed to be an expert in pattern recognition, yet when asked what pattern recognition is, I could not recite a definition from memory. It turns out the answer was already in my own pile of documents, in a report I had once skimmed. It says:

Pattern recognition is a branch of machine learning. Its goal is to classify data (the patterns) based either on a priori knowledge or on statistical information extracted from available samples. The patterns to be classified are usually represented as groups of measurements or observations, each group being a point in a suitable multidimensional space. This is the feature space on which the classification is based. Recognition that relies on training samples whose classes are known in advance is called supervised recognition or supervised learning; the opposite case is unsupervised learning.
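To make the idea of "a point in a feature space" concrete, here is a rough sketch of supervised classification. It assumes a 1-nearest-neighbor rule, which the quoted report does not prescribe, and the names `Sample`, `squaredDistance` and `classifyNearest` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A labeled training sample: one point in the feature space (hypothetical type).
struct Sample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance between two feature vectors of equal length.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Supervised step: give the query the label of its closest training sample.
int classifyNearest(const std::vector<Sample>& training, const std::vector<double>& query) {
    assert(!training.empty());
    double best = std::numeric_limits<double>::max();
    int bestLabel = training.front().label;
    for (const Sample& s : training) {
        const double d = squaredDistance(s.features, query);
        if (d < best) {
            best = d;
            bestLabel = s.label;
        }
    }
    return bestLabel;
}
```

With two training points near the origin labeled 0 and two near (5, 5) labeled 1, a query at (0.5, 0.2) gets label 0 and a query at (5.5, 4.0) gets label 1. An unsupervised method would have to find those two groups without being given the labels.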

In pattern recognition theory there are, broadly, three different approaches:

- Recognition based on partitioning the feature space.

- Structural recognition.

- Recognition based on neural network techniques.

The first two are classical techniques. The third is entirely different: it relies on a mechanism for recognizing, storing, and distinguishing objects that imitates how the human nervous system works. The report presents these approaches in its later sections.

Common applications include automatic speech recognition, classifying text into categories (for example, spam versus non-spam email), automatic recognition of handwritten postal codes on envelopes, and identity recognition from faces. The last two examples form the image-analysis subfield of pattern recognition, whose input is digital images.

Friday, March 20, 2015

Some camera terminology

CAMERA TERMINOLOGY

I. What the numbers mean:

Video is made up of consecutive still images, called frames, and each frame is built from scan lines. Numbers such as 480, 720, ... give the number of horizontal scan lines in one frame; one second of video typically contains 24 frames. The number of pixels along each line follows from the line count and the aspect ratio. The letter "p" after the number stands for progressive scan: the lines are scanned in order, from line 1 to the last line (1080, for example). The letter "i" stands for interlaced scan: the odd lines 1, 3, 5, 7, ..., 1079 are scanned first, then the even lines 2, 4, 6, ..., 1080, so the two passes together still cover 1080 lines per frame.
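The difference between "p" and "i" is easiest to see as the order in which the lines get visited. The sketch below only illustrates that ordering; the function names `progressiveOrder` and `interlacedOrder` are made up here and do not belong to any video library.

```cpp
#include <cassert>
#include <vector>

// Order in which the lines of one frame are scanned, numbered from 1.
// Progressive ("p"): 1, 2, 3, ..., lines.
std::vector<int> progressiveOrder(int lines) {
    assert(lines > 0);
    std::vector<int> order;
    order.reserve(lines);
    for (int line = 1; line <= lines; ++line) order.push_back(line);
    return order;
}

// Interlaced ("i"): the odd field first (1, 3, 5, ...), then the even field (2, 4, 6, ...).
std::vector<int> interlacedOrder(int lines) {
    assert(lines > 0);
    std::vector<int> order;
    order.reserve(lines);
    for (int line = 1; line <= lines; line += 2) order.push_back(line);
    for (int line = 2; line <= lines; line += 2) order.push_back(line);
    return order;
}
```

For 1080 lines both functions return 1080 entries; the interlaced order visits line 1079 at position 540 and only then starts over at line 2.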

II. What is resolution in pixels (px)?

For example, a resolution of 1920x1200 means 1920 x 1200 = 2,304,000 pixels, about 2.3 megapixels, on the screen in use.

This is the number of pixels in one image (the resolution). The pixel size can be understood as the size of the physical element that displays one pixel. The smaller that element, the higher the pixel density and the smoother the image looks (a 5-inch phone screen can have the same number of pixels as a 52-inch TV). For example, a single LED on an electronic billboard is also a pixel.
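The two quantities in this section, pixel count and pixel density, can be worked out with a few lines of code. This is only a sketch: the helper names `megapixels` and `pixelsPerInch` are invented here, and the 1920x1080 figure used for both the phone and the TV in the note below is an assumed example, not something stated above.

```cpp
#include <cassert>
#include <cmath>

// Total pixel count of a width x height image, in megapixels (millions of pixels).
double megapixels(int width, int height) {
    assert(width > 0 && height > 0);
    return static_cast<double>(width) * height / 1e6;
}

// Pixel density in pixels per inch for a screen with the given diagonal size.
double pixelsPerInch(int width, int height, double diagonalInches) {
    assert(width > 0 && height > 0 && diagonalInches > 0.0);
    const double diagonalPixels = std::sqrt(static_cast<double>(width) * width +
                                            static_cast<double>(height) * height);
    return diagonalPixels / diagonalInches;
}
```

`megapixels(1920, 1200)` gives 2.304. If a 5-inch phone and a 52-inch TV both show 1920x1080, `pixelsPerInch` gives about 441 for the phone and about 42 for the TV: same pixel count, very different physical pixel size.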

What does a camera's resolution in px mean?

For example, a camera with a resolution of 8 Mpx can capture images of at most 8 Mpx. In general, the higher the Mpx count, the more detail the image holds and the less it breaks up into visible pixels when zoomed in.

4. How does the aspect ratio (4:3, 16:9, etc.) relate to the screen resolution? (Yahoo Answers gives a whole list of standards, for example "4:3 goes with 960x720", which I did not understand at all.) As I understand it, the aspect ratio can be adjusted while the resolution cannot (from some scattered searching, it seems CRT screens did allow the resolution to be adjusted).
For a 720x960 frame (ratio 3:4, that is, 960x720 at 4:3): there are 720 horizontal scan lines, so the other dimension is 720 x 4/3 = 960.
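The same arithmetic works for any line count and aspect ratio. A tiny sketch, with `widthFromLines` as a made-up helper name:

```cpp
#include <cassert>

// Pixels per line for a frame with `lines` scan lines and aspect ratio aspectW:aspectH.
int widthFromLines(int lines, int aspectW, int aspectH) {
    assert(lines > 0 && aspectW > 0 && aspectH > 0);
    // Integer division; it is exact for the common sizes listed in the note below.
    return lines * aspectW / aspectH;
}
```

`widthFromLines(720, 4, 3)` returns 960, `widthFromLines(480, 4, 3)` returns 640, `widthFromLines(720, 16, 9)` returns 1280 and `widthFromLines(1080, 16, 9)` returns 1920.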

III. What is TVL?

For example, a camera specified as 700 TVL (TV lines) is said to resolve 700 lines in one frame (as discussed above). When it comes to transmitting the picture, every camera is rated in TVL; an IP camera differs from an analog one only in the transmission technique.

Megapixel is the unit for still images.
TVL is the unit for video. The highest video standard at the time of writing is Full HD, 1080x1920: the video is made of frames with 1080 x 1920 = 2,073,600 pixels (about 2 Mpx), or equivalently 1080 scan lines per frame.
Previously, analog cameras were limited by the PAL/NTSC systems and could not reach HD resolution; today the HD-SDI line can reach Full HD.


IV. Terms you need to know when installing surveillance cameras:

Image Sensors : image sensors
CCD (Charge-Coupled Device) : one type of image sensor

CMOS (Complementary Metal-Oxide-Semiconductor) : a technology used to manufacture integrated circuits

Horizontal : horizontal resolution

TV lines : unit of resolution

Total pixels : total number of pixels

Effective Pixels : image resolution (the pixels actually used to form the image)

Scanning System : scanning system

Scanning Frequency : scanning frequency

S/N Ratio (signal-to-noise ratio) : ratio of signal level to noise level
Visible Distance : viewing distance
NR (Noise Reduction) : reduction of image noise

Illumination : light sensitivity (measured in lux)

Minimum Illumination : lowest light level at which the camera works (measured in lux)
White Balance : white balance
AWB (Auto White Balance) : automatic white balance
AGC (Auto Gain Control) : automatic gain control, which boosts the image signal
Backlight Compensation : compensation for backlit scenes

Day/Night : day/night mode
ATR (Adaptive Tone Reproduction, a form of digital wide dynamic range) : adapts the image to poorly lit scenes
HLC (High Light Compensation) : masks overly bright spots, for viewing scenes with uneven lighting
WDR (Wide Dynamic Range) : compensates exposure when the light level differs widely between parts of the image

IR (Infrared rays) : infrared rays
Infrared Distance : infrared range
IR Effective Distance : working range of the infrared illumination
IR LED : number of infrared LEDs
IR Status : infrared status (when the infrared switches on)
IR Power On : infrared power
IR Cut Filter : filter that blocks infrared light
Night Vision : night vision (when the surveillance camera switches on its infrared)

Fixed Focal Lens : lens with a fixed focal length
Varifocal Lens : lens whose focal length can be adjusted, also described as a lens with zoom capability

Normal Lens : standard lens
Wide Angle Lens : wide-angle lens
Telephoto Lens : long-range lens

Picture Adjustment : picture adjustment
Dual Voltage : dual voltage
Auto Electronic Shutter : automatic electronic shutter (adjusts the exposure time to the light level)
Water Resistance / Waterproof : water resistant
Vandal Proof : impact resistant

Indoor/Outdoor : whether the camera is for indoor or outdoor use
Pan/Tilt/Zoom : rotate left-right / tilt up-down / zoom in and out

Operation Temperature : operating temperature
Power Source : power supply
Power Consumption : power consumption
Dimension : dimensions
Weight : weight

Monday, February 23, 2015

Some things to remember about images

1. The color histogram does not change when the image is rotated, translated, and so on, as long as the operation only rearranges the pixels. So if the image size is fixed across a set of images, the color histogram is a good feature for image matching.

An exercise on histograms could be designed as follows: given a matrix (an image), compute its histogram; then transpose the matrix, or shuffle its elements, and compute the histogram again.
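One possible solution to that exercise, as a sketch in plain C++ without OpenCV. It assumes an 8-bit single-channel image stored as nested vectors; the names `Image`, `Histogram`, `histogramOf`, `transposed` and `shuffled` are chosen here for illustration only.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Image = std::vector<std::vector<std::uint8_t>>;  // rows of 8-bit gray levels
using Histogram = std::array<std::size_t, 256>;

// Count how many pixels take each gray level 0..255.
Histogram histogramOf(const Image& img) {
    Histogram h{};  // zero-initialized
    for (const auto& row : img)
        for (std::uint8_t v : row) ++h[v];
    return h;
}

// Transpose a rectangular image (rows become columns).
Image transposed(const Image& img) {
    assert(!img.empty());
    Image t(img[0].size(), std::vector<std::uint8_t>(img.size()));
    for (std::size_t r = 0; r < img.size(); ++r) {
        assert(img[r].size() == img[0].size());
        for (std::size_t c = 0; c < img[r].size(); ++c) t[c][r] = img[r][c];
    }
    return t;
}

// Shuffle all pixels of the image with a fixed seed, keeping its shape.
Image shuffled(const Image& img, unsigned seed) {
    std::vector<std::uint8_t> flat;
    for (const auto& row : img) flat.insert(flat.end(), row.begin(), row.end());
    std::mt19937 rng(seed);
    std::shuffle(flat.begin(), flat.end(), rng);
    Image out = img;
    std::size_t k = 0;
    for (auto& row : out)
        for (auto& v : row) v = flat[k++];
    return out;
}
```

For any image `img`, `histogramOf(transposed(img))` and `histogramOf(shuffled(img, 42u))` both compare equal to `histogramOf(img)`, because neither operation changes which values occur or how often. The flip side is that a histogram says nothing about where the pixels are.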

Saturday, December 13, 2014

List of classes and functions in OpenCV

OpenCV 2.4 Cheat Sheet (C++)

The OpenCV C++ reference manual is here: http://docs.opencv.org. Use Quick Search to find descriptions of the particular functions and classes.

1.1. Key OpenCV Classes

Key	Classes
Point_	Template 2D point class
Point3_	Template 3D point class
Size_	Template size (width, height) class
Vec	Template short vector class
Matx	Template small matrix class
Scalar	4-element vector
Rect	Rectangle
Range	Integer value range
Mat	2D or multi-dimensional dense array (can be used to store matrices, images, histograms, feature descriptors, voxel volumes etc.)
SparseMat	Multi-dimensional sparse array
Ptr	Template smart pointer class

1.2. Matrix Basics

Create a matrix

Mat image(240, 320, CV_8UC3);

[Re]allocate a pre-declared matrix

image.create(480, 640, CV_8UC3);

Create a matrix initialized with a constant

Mat A33(3, 3, CV_32F, Scalar(5));
Mat B33(3, 3, CV_32F); B33 = Scalar(5);
Mat C33 = Mat::ones(3, 3, CV_32F)*5.;
Mat D33 = Mat::zeros(3, 3, CV_32F) + 5.;

Create a matrix initialized with specified values

double a = CV_PI/3;
Mat A22 = (Mat_<float>(2, 2) << cos(a), -sin(a), sin(a), cos(a));
float B22data[] = {(float)cos(a), (float)-sin(a), (float)sin(a), (float)cos(a)};
Mat B22 = Mat(2, 2, CV_32F, B22data).clone();

Initialize a random matrix

randu(image, Scalar(0), Scalar(256)); // uniform dist
randn(image, Scalar(128), Scalar(10)); // Gaussian dist

Convert matrix to/from other structures (without copying the data)

Mat image_alias = image;
float* Idata = new float[480*640*3];
Mat I(480, 640, CV_32FC3, Idata);
vector<Point> iptvec(10);
Mat iP(iptvec); // iP is a 10x1 CV_32SC2 matrix
IplImage* oldC0 = cvCreateImage(cvSize(320,240),16,1);
Mat newC = cvarrToMat(oldC0);
IplImage oldC1 = newC; CvMat oldC2 = newC;

Convert matrix to/from other structures (with copying the data)

Mat newC2 = cvarrToMat(oldC0).clone();
vector<Point2f> ptvec = Mat_<Point2f>(iP);

Access matrix elements

A33.at<float>(i,j) = A33.at<float>(j,i)+1;

Mat dyImage(image.size(), image.type());
for(int y = 1; y < image.rows-1; y++) {
    Vec3b* prevRow = image.ptr<Vec3b>(y-1);
    Vec3b* nextRow = image.ptr<Vec3b>(y+1);
    for(int x = 0; x < image.cols; x++)
        for(int c = 0; c < 3; c++)
            dyImage.at<Vec3b>(y,x)[c] =
                saturate_cast<uchar>(nextRow[x][c] - prevRow[x][c]);
}

Mat_<Vec3b>::iterator it = image.begin<Vec3b>(), itEnd = image.end<Vec3b>();
for(; it != itEnd; ++it)
    (*it)[1] ^= 255;

1.3. Matrix Manipulations: Copying, Shuffling, Part Access

Function	Meaning
src.copyTo(dst)	Copy matrix to another one
src.convertTo(dst,type,scale,shift)	Scale and convert to another datatype
m.clone()	Make deep copy of a matrix
m.reshape(nch,nrows)	Change matrix dimensions and/or number of channels without copying data
m.row(i), m.col(i)	Take a matrix row/column
m.rowRange(Range(i1,i2)) m.colRange(Range(j1,j2))	Take a matrix row/column span
m.diag(i)	Take a matrix diagonal
m(Range(i1,i2),Range(j1,j2)), m(roi)	Take a submatrix
m.repeat(ny,nx)	Make a bigger matrix from a smaller one
flip(src,dst,dir)	Reverse the order of matrix rows and/or columns
split(...)	Split multi-channel matrix into separate channels
merge(...)	Make a multi-channel matrix out of the separate channels
mixChannels(...)	Generalized form of split() and merge()
randShuffle(...)	Randomly shuffle matrix elements

Example 1. Smooth image ROI in-place

Mat imgroi = image(Rect(10, 20, 100, 100));

GaussianBlur(imgroi, imgroi, Size(5, 5), 1.2, 1.2);

Example 2. Somewhere in a linear algebra algorithm

m.row(i) += m.row(j)*alpha;

Example 3. Copy image ROI to another image with conversion

Rect r(1, 1, 10, 20);

Mat dstroi = dst(Rect(0,10,r.width,r.height));

src(r).convertTo(dstroi, dstroi.type(), 1, 0);

1.4. Simple Matrix Operations

OpenCV implements most common arithmetical, logical and other matrix operations, such as

Function	Meaning
add(), subtract(), multiply(), divide(), absdiff(), bitwise_and(), bitwise_or(), bitwise_xor(), max(), min(), compare()	Correspondingly, addition, subtraction, element-wise multiplication ... comparison of two matrices or a matrix and a scalar.

Example. Alpha compositing function:

void alphaCompose(const Mat& rgba1, const Mat& rgba2, Mat& rgba_dest)
{
    Mat a1(rgba1.size(), rgba1.type()), ra1;
    Mat a2(rgba2.size(), rgba2.type());
    // copy the alpha channel (index 3) into all four channels of a1 and a2
    int mixch[]={3, 0, 3, 1, 3, 2, 3, 3};
    mixChannels(&rgba1, 1, &a1, 1, mixch, 4);
    mixChannels(&rgba2, 1, &a2, 1, mixch, 4);
    // ra1 = 255 - alpha1
    subtract(Scalar::all(255), a1, ra1);
    // set the weight of the alpha channel itself to 255
    bitwise_or(a1, Scalar(0,0,0,255), a1);
    bitwise_or(a2, Scalar(0,0,0,255), a2);
    // scale the weights of image 2 by (255 - alpha1)/255
    multiply(a2, ra1, a2, 1./255);
    // weight each image, then sum
    multiply(a1, rgba1, a1, 1./255);
    multiply(a2, rgba2, a2, 1./255);
    add(a1, a2, rgba_dest);
}
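Reading through alphaCompose step by step, each output pixel appears to be the first image laid "over" the second: color = c1*alpha1 + c2*alpha2*(1 - alpha1), alpha = alpha1 + alpha2*(1 - alpha1), with everything scaled to 0..255. The sketch below spells that out for a single pixel in plain C++, without OpenCV. The names `Rgba`, `div255` and `composeOver` are invented here, and the rounding is my own choice, so results may differ from the OpenCV version by a count or so.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Rgba = std::array<std::uint8_t, 4>;  // three color channels, then alpha

// Round x / 255 to the nearest integer, for non-negative x.
inline int div255(int x) {
    assert(x >= 0);
    return (x + 127) / 255;
}

// Composite pixel p1 over pixel p2 (both with straight, non-premultiplied alpha).
Rgba composeOver(const Rgba& p1, const Rgba& p2) {
    const int a1 = p1[3];
    const int w2 = div255(p2[3] * (255 - a1));  // weight of p2: alpha2 * (1 - alpha1)
    Rgba out{};
    for (int c = 0; c < 3; ++c) {
        const int v = div255(a1 * p1[c]) + div255(w2 * p2[c]);
        out[c] = static_cast<std::uint8_t>(v > 255 ? 255 : v);  // saturate like add()
    }
    const int a = a1 + w2;  // resulting alpha: alpha1 + alpha2 * (1 - alpha1)
    out[3] = static_cast<std::uint8_t>(a > 255 ? 255 : a);
    return out;
}
```

A fully opaque p1 comes out unchanged, a fully transparent p1 over an opaque p2 gives p2, and half-transparent red {255, 0, 0, 128} over opaque blue {0, 0, 255, 255} gives {128, 0, 127, 255}.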

Function	Meaning
sum(), mean(), meanStdDev(), norm(), countNonZero(), minMaxLoc(),	various statistics of matrix elements.
exp(), log(), pow(), sqrt(), cartToPolar(), polarToCart()	the classical math functions.
scaleAdd(), transpose(), gemm(), invert(), solve(), determinant(), trace(), eigen(), SVD	the algebraic functions + SVD class.
dft(), idft(), dct(), idct(),	discrete Fourier and cosine transformations

For some operations a more convenient algebraic notation can be used, for example:

Mat delta = (J.t()*J + lambda*Mat::eye(J.cols, J.cols, J.type())).inv(CV_SVD)*(J.t()*err);

implements the core of Levenberg-Marquardt optimization algorithm.
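Written out in conventional notation, the expression above is the damped least-squares step shown below. The symbols $\delta$, $I$ and $e$ are just names chosen here for `delta`, the identity matrix built by `Mat::eye`, and `err`; `inv(CV_SVD)` computes the inverse through a singular value decomposition.

```latex
\delta = \left(J^{\top} J + \lambda I\right)^{-1} J^{\top} e
```

With $\lambda = 0$ this reduces to the Gauss-Newton step; a larger $\lambda$ shortens the step and turns it toward the gradient direction $J^{\top} e$.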

1.5. Image Processing

1.5.1. Filtering

Function	Meaning
filter2D()	Non-separable linear filter
sepFilter2D()	Separable linear filter
boxFilter(), GaussianBlur(), medianBlur(), bilateralFilter()	Smooth the image with one of the linear or non-linear filters
Sobel(), Scharr()	Compute the spatial image derivatives
Laplacian()	compute Laplacian
erode(), dilate()	Morphological operations

Example. Filter image in-place with a 3x3 high-pass kernel (preserve negative responses by shifting the result by 128):

filter2D(image, image, image.depth(), (Mat_<float>(3,3) << -1, -1, -1, -1, 9, -1, -1, -1, -1), Point(1,1), 128);

1.5.2. Geometrical Transformations

Function	Meaning
resize()	Resize image
getRectSubPix()	Extract an image patch
warpAffine()	Warp image affinely
warpPerspective()	Warp image perspectively
remap()	Generic image warping
convertMaps()	Optimize maps for a faster remap() execution

Example. Decimate image by a factor of sqrt(2):

Mat dst; resize(src, dst, Size(), 1./sqrt(2), 1./sqrt(2));

1.5.3. Various Image Transformations

Function	Meaning
cvtColor()	Convert image from one color space to another
threshold(), adaptiveThreshold()	Convert grayscale image to binary image using a fixed or a variable threshold
floodFill()	Find a connected component using region growing algorithm
integral()	Compute integral image
distanceTransform()	build distance map or discrete Voronoi diagram for a binary image.
watershed(),grabCut()	marker-based image segmentation algorithms. See the samples watershed.cpp and grabcut.cpp.

1.5.4. Histograms

Function	Meaning
calcHist()	Compute image(s) histogram
calcBackProject()	Back-project the histogram
equalizeHist()	Normalize image brightness and contrast
compareHist()	Compare two histograms

Example. Compute Hue-Saturation histogram of an image:

Mat hsv, H;

cvtColor(image, hsv, CV_BGR2HSV);

int planes[]={0, 1}, hsize[] = {32, 32};

calcHist(&hsv, 1, planes, Mat(), H, 2, hsize, 0);

1.5.5. Contours

See contours2.cpp and squares.cpp samples on what are the contours and how to use them.

1.6. Data I/O

XML/YAML storages are collections (possibly nested) of scalar values, structures and heterogeneous lists.

1.6.1. Writing data to YAML (or XML)

// Type of the file is determined from the extension

FileStorage fs("test.yml", FileStorage::WRITE);

fs << "i" << 5 << "r" << 3.1 << "str" << "ABCDEFGH";

fs << "mtx" << Mat::eye(3,3,CV_32F);

fs << "mylist" << "[" << CV_PI << "1+1" << "{:" << "month" << 12 << "day" << 31 << "year" << 1969 << "}" << "]";

fs << "mystruct" << "{" << "x" << 1 << "y" << 2 << "width" << 100 << "height" << 200 << "lbp" << "[:";

const uchar arr[] = {0, 1, 1, 0, 1, 1, 0, 1};

fs.writeRaw("u", arr, (int)(sizeof(arr)/sizeof(arr[0])));

fs << "]" << "}";

Scalars (integers, floating-point numbers, text strings), matrices, STL vectors of scalars and some other types can be written to the file storages using << operator.

1.6.2. Reading the data back

// Type of the file is determined from the content

FileStorage fs("test.yml", FileStorage::READ);

int i1 = (int)fs["i"]; double r1 = (double)fs["r"];

string str1 = (string)fs["str"];

Mat M; fs["mtx"] >> M;

FileNode tl = fs["mylist"];

CV_Assert(tl.type() == FileNode::SEQ && tl.size() == 3);

double tl0 = (double)tl[0]; string tl1 = (string)tl[1];

int m = (int)tl[2]["month"], d = (int)tl[2]["day"];

int year = (int)tl[2]["year"];

FileNode tm = fs["mystruct"];

Rect r; r.x = (int)tm["x"], r.y = (int)tm["y"];

r.width = (int)tm["width"], r.height = (int)tm["height"];

int lbp_val = 0;

FileNodeIterator it = tm["lbp"].begin();

for(int k = 0; k < 8; k++, ++it)

lbp_val |= ((int)*it) << k;

Scalars are read using the corresponding FileNode's cast operators. Matrices and some other types are read using the >> operator. Lists can be read using FileNodeIterator's.
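The last loop in the reading example rebuilds `lbp_val` by treating the eight stored 0/1 values as bits, least significant first. Here is the same packing as a standalone sketch without FileStorage; `packBits` is a name made up for this note.

```cpp
#include <cassert>
#include <cstddef>

// Pack `count` 0/1 flags into an integer; flag k becomes bit k (least significant first).
int packBits(const unsigned char* bits, std::size_t count) {
    assert(bits != nullptr && count <= 31);
    int value = 0;
    for (std::size_t k = 0; k < count; ++k)
        value |= static_cast<int>(bits[k] & 1u) << k;
    return value;
}
```

For the array written in 1.6.1, {0, 1, 1, 0, 1, 1, 0, 1}, bits 1, 2, 4, 5 and 7 are set, so the packed value is 2 + 4 + 16 + 32 + 128 = 182.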

1.6.3. Writing and reading raster images

imwrite("myimage.jpg", image);

Mat image_color_copy = imread("myimage.jpg", 1);

Mat image_grayscale_copy = imread("myimage.jpg", 0);

The functions can read/write images in the following formats:

BMP (.bmp), JPEG (.jpg, .jpeg), TIFF (.tif, .tiff),

PNG (.png), PBM/PGM/PPM (.p?m), Sun Raster (.sr), JPEG 2000 (.jp2).

Every format supports 8-bit, 1 or 3-channel images. Some formats (PNG, JPEG 2000) support 16 bits per channel.

1.6.4. Reading video from a file or from a camera

VideoCapture cap;

if(argc > 1) cap.open(string(argv[1])); else cap.open(0);

Mat frame; namedWindow("video", 1);

for(;;) {

cap >> frame; if(!frame.data) break;

imshow("video", frame); if(waitKey(30) >= 0) break;

}

1.7. Simple GUI (highgui module)

Function	Meaning
namedWindow(winname,flags)	Create named highgui window
destroyWindow(winname)	Destroy the specified window
imshow(winname, mtx)	Show image in the window
waitKey(delay)	Wait for a key press during the specified time interval (or forever). Process events while waiting. Do not forget to call this function several times a second in your code.
createTrackbar(...)	Add trackbar (slider) to the specified window
setMouseCallback(...)	Set the callback on mouse clicks and movements in the specified window

See camshiftdemo.cpp and other OpenCV samples on how to use the GUI functions.

1.8. Camera Calibration, Pose Estimation and Depth Estimation

Function	Meaning
calibrateCamera()	Calibrate camera from several views of a calibration pattern.
findChessboardCorners()	Find feature points on the checker board calibration pattern.
solvePnP()	Find the object pose from the known projections of its feature points.
stereoCalibrate()	Calibrate stereo camera.
stereoRectify()	Compute the rectification transforms for a calibrated stereo camera.
initUndistortRectifyMap()	Compute rectification map (for remap()) for each stereo camera head.
StereoBM, StereoSGBM	The stereo correspondence engines to be run on rectified stereo pairs.
reprojectImageTo3D()	Convert disparity map to 3D point cloud.
findHomography()	Find best-fit perspective transformation between two 2D point sets.