Saturday, September 5, 2015

Researching OCR (Optical Character Recognition) libraries

Recently I got a project request for a program that automatically submits data to a web form like the one below:



I want to build a desktop program so that it is easier for the user to run the auto submission, and I chose the .NET Framework as the platform.

As you can see from the image above, a captcha image is generated every time the page is accessed, so the natural solution is an OCR library. In my experience, Tesseract is one of the best options available. However, it is difficult to use directly from .NET because it is written in C/C++.

So is there a convenient wrapper that makes the library easier to use? Yes, there are several. I am not going to go through all of them, only the one I tested and found working:
https://github.com/charlesw/tesseract/tree/release/2.4.0

The GitHub project lists these steps:

1. Add the Tesseract NuGet Package by running Install-Package Tesseract from the Package Manager Console.
2. Ensure you have Visual Studio 2012 x86 & x64 runtimes installed (see note above).
3. Download language data files for tesseract 3.02 from tesseract-ocr and add them to your project, ensure 'Copy to output directory' is set to Always.
4. Check out the Samples solution ~/Samples/Tesseract.Samples.sln for a working example

The instructions are correct, I followed all of them, and I really appreciate the contributors' work. I opened the solution file in SharpDevelop 4.4 and tried the Tesseract.ConsoleDemo project, which is a console program. When I tried to build it, something was missing: the reference to "Tesseract.dll" could not be found!

Why? There had to be something I had missed, and indeed I had missed a big chunk. Tesseract is a C/C++ library, and I was stuck in a .NET mindset at that moment. I cannot expect native C/C++ code to be portable the way .NET or Java code is; it has to be compiled for the target platform. I had to build my own DLL, silly me :D

Knowing what was missing was a major help, and it was not hard to spot "build.bat" right there at the top of the project folder. So I launched my favourite command prompt, gnuwin32, and ran "build.bat". Then came another difficulty: the build failed with the message below:

Text Output 

Project "C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\build.proj" on node 1 (default targets).
PrepareBuild:
  Copying file from "C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\src\AssemblyVersionInfo.template.cs" to "C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\src\AssemblyVersionInfo.cs".
  copy /y "C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\src\AssemblyVersionInfo.template.cs" "C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\src\AssemblyVersionInfo.cs"
C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\build.proj(55,9): error MSB4062: The "MSBuild.ExtensionPack.FileSystem.File" task could not be loaded from the assembly C:\Projects\Temp\Http_Submit%28OCR%29\tesseract-release-2.4.0\tesseract-release-2.4.0\tools\MSBuild.ExtensionPack\MSBuild.ExtensionPack.dll. Could not load file or assembly 'file:///C:\Projects\Temp\Http_Submit%28OCR%29\tesseract-release-2.4.0\tesseract-release-2.4.0\tools\MSBuild.ExtensionPack\MSBuild.ExtensionPack.dll' or one of its dependencies. The system cannot find the file specified. Confirm that the <UsingTask> declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.
Done Building Project "C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\build.proj" (default targets) -- FAILED.


Build FAILED.

"C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\build.proj" (default target) (1) ->
(PrepareBuild target) ->
  C:\Projects\Temp\Http_Submit(OCR)\tesseract-release-2.4.0\tesseract-release-2.4.0\build.proj(55,9): error MSB4062: The "MSBuild.ExtensionPack.FileSystem.File" task could not be loaded from the assembly C:\Projects\Temp\Http_Submit%28OCR%29\tesseract-release-2.4.0\tesseract-release-2.4.0\tools\MSBuild.ExtensionPack\MSBuild.ExtensionPack.dll. Could not load file or assembly 'file:///C:\Projects\Temp\Http_Submit%28OCR%29\tesseract-release-2.4.0\tesseract-release-2.4.0\tools\MSBuild.ExtensionPack\MSBuild.ExtensionPack.dll' or one of its dependencies. The system cannot find the file specified. Confirm that the <UsingTask> declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.

    0 Warning(s)
    1 Error(s)

Image Screenshot

I tried to find the reason for the failure and failed countless times. Just when the problem seemed too big for me to solve, thanks to this person, I found the answer here: the parentheses escape problem. Finally it made sense. The build log shows the folder name "Http_Submit(OCR)" turning into "Http_Submit%28OCR%29" when the assembly is loaded. Renaming my project folder from "Http_Submit(OCR)" to "Http_Submit_OCR" solved all the compile errors. Now it compiles smoothly, as shown below:
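The escape problem can be reproduced outside MSBuild. The following is a minimal Python sketch, not part of the Tesseract wrapper or its build script, showing that percent-encoding a folder name turns "(" and ")" into "%28" and "%29", the same sequences that appear in the error log above. The helper name `risky_path_chars` is made up for this illustration, and treating every character that percent-encoding changes as risky is an assumption that is probably stricter than MSBuild needs.

```python
from urllib.parse import quote


def risky_path_chars(path):
    """Return the characters in a Windows path that percent-encoding would change.

    Path separators and the drive colon are ignored because they are part of
    the path syntax rather than of a folder name.
    """
    return sorted({ch for ch in path if ch not in "\\/:" and quote(ch) != ch})


folder = "Http_Submit(OCR)"
print(quote(folder))  # Http_Submit%28OCR%29

# The original folder is flagged, the renamed one is not.
print(risky_path_chars(r"C:\Projects\Temp\Http_Submit(OCR)"))  # ['(', ')']
print(risky_path_chars(r"C:\Projects\Temp\Http_Submit_OCR"))   # []
```

A check like this could be run on a project folder before building, so that a name with parentheses or spaces is caught early instead of surfacing as a missing assembly.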


If you have read this far and are still reading, I bet you are either a programmer or have a lot of free time :) . The next step is to add a reference to the correct "Tesseract.dll".

After the configuration, this was my first attempt with the original captcha images (250 pixels wide by 30 pixels high):



Not bad for a first attempt, but there are still many recognition errors.

In the second attempt I added:

engine.SetVariable("tessedit_char_whitelist", "?.0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");

and got the following results:



Yes, there is some improvement now that the special characters are eliminated. Finally, to increase accuracy, I added code to enlarge the image before recognition and got better results:


That is better still, but there are some errors left, so there is still room for improvement. Any suggestions are most welcome :)
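The enlargement step can be sketched as follows. This is a Python illustration rather than the .NET code used in the project, which is not shown in this post. The function name `upscale_nearest` and the choice of nearest-neighbour scaling are assumptions; a real program would more likely call the resize routine of an imaging library and pick a smoother interpolation.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2D grid of pixel values by an integer factor.

    Every source pixel becomes a factor x factor block in the result.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    enlarged = []
    for row in pixels:
        # Repeat each pixel horizontally.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row vertically (copied so rows stay independent).
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


tiny = [[0, 255],
        [255, 0]]
for row in upscale_nearest(tiny, 2):
    print(row)

# A blank image with the captcha dimensions from this post (250 x 30),
# enlarged three times, becomes 750 x 90.
captcha = [[0] * 250 for _ in range(30)]
big = upscale_nearest(captcha, 3)
print(len(big[0]), len(big))  # 750 90
```

The scale factor is worth experimenting with: the characters need to be large enough for the OCR engine to separate them, but this sketch makes no claim about which factor works best.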


PS: I was running Windows 7 when I did this research.

You can always contact me via my website: www.weeprogramming.com

Monday, August 31, 2015

Recent experience on free upgrade to Windows 10

Recently, I took the free upgrade from Windows 7 to Windows 10.

Overall, the transition was smooth. I did not need to do anything during the upgrade; I just sat there and waited while the download and installation ran by themselves. Here is one of the screenshots I took during the upgrade.


After the upgrade, I faced the following issues :

1) The sound card was not working. The driver reported no issues, but no sound came out of the speakers. Somehow it started working again after I downloaded the Lenovo tools for Windows 10 and/or played a video with VideoLAN. I am not even sure how I solved it, but the important thing is that it is working now.

2) The laptop froze in the middle of use. Nothing responded: not the keyboard, the mouse, or even the power button. After I powered it off and on again it recovered automatically, although that took quite a long time. I suspect it was scanning the hard disk for corruption just like previous Windows versions, the only difference being that it now does so in the background.

Update on September 4, 2015
Windows 10 froze again while I was using Chrome to browse websites, with the same symptoms as before: no response to the keyboard, power button, reset button, and so on. I had to power it off and turn it back on. After powering on, it displayed these 2 screens:




Laptop spec : Lenovo G470
RAM : Original is 2GB, upgraded to 6 GB
Processor : Intel Core i5-2410M 2.30GHz
64 bit

I checked the Windows Event Logs and found nothing suspicious. Has anyone had the same experience? Want to share your opinions?




Saturday, August 29, 2015

Batu Caves Trip after Hua Yang AGM

I went to the Hua Yang AGM on August 26, 2015. The AGM was rather boring, with only 1 person asking questions, so I had plenty of time left in my schedule and decided to drive to the nearby tourist spot, Batu Caves. Here are some photos:










Thursday, July 30, 2015

Introduction to software programming

Programming Tutorial Series 1
Introduction to software programming

Q) What is software programming?
A) In short, software programming is the process of writing instructions in a programming language that humans can understand, so that they can be turned into something the machine understands.
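As a small illustration of that answer, here is a sketch in Python, chosen only for brevity. The one-line program in `source` is human-readable text; `compile` turns it into bytecode, the instruction format the Python virtual machine executes. Bytecode is not native machine code, so this only approximates what a C/C++ compiler does, and the variable names are made up for the example.

```python
import dis

# Human-readable source code: just text until it is translated.
source = "total = 2 + 3"

# Translate the text into a code object holding bytecode instructions.
code_object = compile(source, "<example>", "exec")

# Run the translated instructions and read the result back.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])  # 5

# List the low-level instructions the interpreter actually runs.
dis.dis(code_object)
```

The listing printed by `dis.dis` differs between Python versions, which is a reminder that the machine-level form is an implementation detail while the source line stays the same.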

Q) What are the requirements to be a programmer?
A) Short answer: based on my personal experience, passion, patience, dedication and skill.
Long answer: Passion is the driving force behind anything we do. If you are passionate about something, you will certainly do your best.

You need to be patient to learn programming skills, patient to think through the program flow, and patient when debugging (a term for the process of finding errors in program code). Most importantly, be patient and do not bang your head against the desk when you cannot find the bug you are looking for :)

Dedication is a must for completing anything. If you are dedicated, it is almost certain you will finish the job.

Skill is the last requirement because, with passion, patience and dedication, you will almost certainly acquire the skills you need.

Wednesday, July 29, 2015

Android hidden easter eggs

Android version 4.1.1
How to display :
Settings > About phone > tap Firmware version several times really fast, and you will get the image below.

Then you can play with it by moving the beans around.

Android version history and their API level numbers

For easy reference, here is a list of Android versions and their API level numbers:
 
Android 1.0 (API level 1)
Android 1.1 (API level 2)
Android 1.5 Cupcake (API level 3)
Android 1.6 Donut (API level 4)
Android 2.0 Eclair (API level 5)
Android 2.0.1 Eclair (API level 6)
Android 2.1 Eclair (API level 7)
Android 2.2–2.2.3 Froyo (API level 8)
Android 2.3–2.3.2 Gingerbread (API level 9)
Android 2.3.3–2.3.7 Gingerbread (API level 10)
Android 3.0 Honeycomb (API level 11)
Android 3.1 Honeycomb (API level 12)
Android 3.2–3.2.6 Honeycomb (API level 13)
Android 4.0–4.0.2 Ice Cream Sandwich (API level 14)
Android 4.0.3–4.0.4 Ice Cream Sandwich (API level 15)
Android 4.1–4.1.2 Jelly Bean (API level 16)
Android 4.2–4.2.2 Jelly Bean (API level 17)
Android 4.3–4.3.1 Jelly Bean (API level 18)
Android 4.4–4.4.4 KitKat (API level 19)
Android 4.4W–4.4W.2 KitKat, with wearable extensions (API level 20)
Android 5.0–5.0.2 Lollipop (API level 21)
Android 5.1–5.1.1 Lollipop (API level 22)
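The list above can also be turned into a small lookup. The following Python sketch is only an illustration: the names `API_LEVELS`, `parse_version` and `api_level` are made up here, and the 4.4W wearable entry (API level 20) is left out because its version string is not purely numeric.

```python
# (API level, first version, last version), copied from the list above.
API_LEVELS = [
    (1, "1.0", "1.0"), (2, "1.1", "1.1"), (3, "1.5", "1.5"),
    (4, "1.6", "1.6"), (5, "2.0", "2.0"), (6, "2.0.1", "2.0.1"),
    (7, "2.1", "2.1"), (8, "2.2", "2.2.3"), (9, "2.3", "2.3.2"),
    (10, "2.3.3", "2.3.7"), (11, "3.0", "3.0"), (12, "3.1", "3.1"),
    (13, "3.2", "3.2.6"), (14, "4.0", "4.0.2"), (15, "4.0.3", "4.0.4"),
    (16, "4.1", "4.1.2"), (17, "4.2", "4.2.2"), (18, "4.3", "4.3.1"),
    (19, "4.4", "4.4.4"), (21, "5.0", "5.0.2"), (22, "5.1", "5.1.1"),
]


def parse_version(version):
    """Turn '4.1' into (4, 1, 0) so versions compare numerically."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def api_level(version):
    """Return the API level for a numeric Android version, or None if unlisted."""
    wanted = parse_version(version)
    for level, first, last in API_LEVELS:
        if parse_version(first) <= wanted <= parse_version(last):
            return level
    return None


print(api_level("4.1.1"))  # 16
print(api_level("2.3.5"))  # 10
print(api_level("9.9"))    # None
```

Comparing tuples of integers rather than raw strings matters here, because as plain text "2.3.10" would sort before "2.3.3".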

Tuesday, July 28, 2015

PHP 7 is coming

As of 2014, work is underway on a new major PHP version named PHP 7. There was some dispute as to whether the next major version of PHP was to be called PHP 6 or PHP 7. While the PHP 6 unicode experiment had never been released, a number of articles and book titles referenced the old PHP 6 name, which might have caused confusion if a new release were to reuse the PHP 6 name.[37] After a vote, the name PHP 7 was chosen.[38]
PHP 7 gets its foundations from an experimental PHP branch that was originally named phpng (PHP next generation), which aims at optimizing PHP performance by refactoring the Zend Engine while retaining near-complete language compatibility.[39] As of 14 July 2014, WordPress-based benchmarks, which serve as the main benchmark suite for the phpng project, show an almost 100% increase in performance. Changes from phpng are also expected to make it easier to improve performance in the future, as more compact data structures and other changes are seen as better suited for a successful migration to a just-in-time (JIT) compiler.[40] Because of the significant changes, this reworked Zend Engine will be called Zend Engine 3, succeeding the Zend Engine 2 used in PHP 5.[41]
Because of phpng's major internal changes, it would have to go into a new major version of PHP, rather than a minor 5.x release, according to PHP's release process,[42] thus spawning PHP 7. Major versions of PHP are allowed to break code backwards-compatibility, and so PHP 7 presented an opportunity to make other improvements beyond phpng that require backwards-compatibility breaks. In particular, the following backwards-compatibility breaks were made:
  • Many "fatal" or "recoverable"-level legacy PHP "errors" were replaced with modern object-oriented exceptions[43]
  • The syntax for variable dereferencing was reworked to be more internally consistent and complete, allowing the use of ->, [], (), {}, and :: operators with arbitrary meaningful left-hand-side expressions[44]
  • Support for legacy PHP 4-style constructor methods was deprecated[45]
  • The behaviour of the foreach statement was changed to be more predictable[46]
  • Constructors for the few classes built-in to PHP which returned null upon failure were changed to throw an exception instead, for consistency[47]
  • Several unmaintained or deprecated SAPIs and extensions were removed from the PHP core, most notably the legacy mysql extension[48]
  • The behaviour of the list() operator was changed to remove support for strings[49]
  • Support for legacy ASP-style PHP code delimiters (<% and %>, <script language=php> and </script>) was removed[50]
  • An oversight allowing a switch statement to have multiple default clauses was fixed[51]
  • Support for hexadecimal numbers in some implicit conversions from strings to number types was removed[52]
  • The left-shift and right-shift operators were changed to behave more consistently across platforms[53]
  • Conversions between integers and floating point numbers were tightened and made more consistent across platforms[53][54]
PHP 7 will also include new language features. Most notably, it will introduce return type declarations,[55] which will complement its existing parameter type declarations, and support for the scalar types (integer, float, string and boolean) in parameter and return type declarations.[56]

Sourced from Wikipedia on July 29, 2015

Monday, July 27, 2015

Welcome to Wee Software Programming

I am a professional freelance software programmer with a Bachelor of Science in Computer Science from the University of Missouri-Columbia, Missouri, USA.

I have over 15 years of full-time working experience as a programmer, including over 1 year as an Analyst Programmer in the U.S.

Currently, I am providing Fulltime Freelance Software Programming Services.
You are welcome to check out more details in my website : www.weeprogramming.com

Whether you are running a business, a working professional, a student, or just learning programming for fun, everyone is welcome to outsource projects to me.

*~*~ The price depends on each project's complexity and features.
Please send me your project's requirements or details so I can give you a quotation ~*~*





Services Provided:

1) Accept outsourced IT projects.
2) Develop custom software systems for companies and individuals.
3) Develop Android and iPhone apps.
4) Macro programming for reporting and other tasks.
5) Web-based application programming.
6) Open-source project modification and customization.


For those who are uncertain about my programming skills, below is a list of the programming and scripting languages I know (the list keeps growing):




Languages
Objective-C, Java, C/C++, PHP, Perl, Visual Basic, Pascal, LISP, VB.NET, ASP.NET, C#, etc.

Scripting
JavaScript, VBScript, ASP, ActionScript, etc.

Database
Oracle, MySQL, MSSQL (SQL Server), Access, etc.

Server
Apache, Tomcat, Linux, etc.

Platform
Pocket PC, Mobile.NET, Windows CE, MIDP, CLDC, J2ME, etc.

Others
JSP, JBoss, WAP, Servlet, Applet, HTML, DHTML, XML, Flash, Object Oriented Programming, JDBC, ODBC, VBA, CGI, PL/SQL, CSS, Palm applications development, etc.


*******************************************************************************************
All my source code includes inline comments. I also provide after-sales support for all my projects,
including code explanation via email, online chat and meetups.
*******************************************************************************************





Some Completed Projects:
**Please check my website for the latest list**

1) Smartcard (IRIS) Reading and Writing System On Windows CE (VB.NET, Oracle)


2) RFID Smart Tag Reader/Writer Prototype (Java, MySQL)


3) SMS Gateway System For Maxis, Celcom, Digi  (Java, SOAP, MySQL )


4) OCR (Optical Character Recognition) System with JTWAIN Scanning (Java, MySQL)


5) Open-source website/project customization (osCommerce, Zen Cart, Moodle, Mambo, Joomla, etc.)


6) Online Leave Application System (PHP, MySQL)


7) Online Scheduling Portal (PHP, MySQL)


8) Webcam Surveillance System Portal (PHP, MySQL)


9) Hotel Reservation System (VB.NET, MySQL)


10) Pocket PC Parcel Tracking System (VB.NET)


11) Customized VOIP Application (C,C++)


12) Video Streaming Server and Client (VB.NET)


13) Multiple Users Chat Client and Server (VB.NET, Java)


14) Payroll System with Multi-levels Commission (VB.NET, MySQL)


15) Bus Ticketing Information System (ASP.NET, VB.NET)


16) E-Business Cards Management System (Java, J2ME, MySQL)


17) Online Tuition System (ASP)


18) Peer-2-Peer File Sharing (VB.NET)


19) Network Packet Sniffing  for PC (C++)


20) J2ME Diary Management Program (Java)


21) J2ME Games (Java)


22) Movie Booking System using GSM modem (VB.NET, MySQL )


23) WAP-Based Movie and Restaurant Booking System (ASP.NET)


24) Shopping Mall Inquiry System on Pocket PC (VB.NET)


25) Home Alarm System with LED Indicators (VB.NET)


26) Sport Clubs Web Portal (PHP, MySQL)


27) Shortest Path Algorithms (Java)



~ P/S: I am a fast learner. I learned PHP in just 1 week and completed 3 projects immediately afterward :o) ~






Please don't hesitate to call or sms me :
My HP Number : 6012-9537993

or email me at :
wsc2004@gmail.com 
wschong2004@yahoo.com (Yahoo Messenger ID and MSN ID)

Office Hours: 11 A.M. to 11 P.M. Malaysian Time (GMT +08:00)

Thank you.