Kasun’s Blog

Kasun Indrasiri

  • Info

    Department of CSE
    University of Moratuwa,
    Sri Lanka


Moving to … P A N O R A M A : Kasun’s Blog

Posted by kasun04 on January 11, 2010

I’ve moved this blog to my new Blogger site.

Thank YOU!

Please Visit

http://kasunpanorama.blogspot.com/

Posted in Uncategorized | Leave a Comment »

Let’s start Waving… – Google Wave

Posted by kasun04 on December 2, 2009

Real-time Communication and Collaboration tool

‘Google’, a name renowned for revolutionary products and ideas, has come up with yet another revolutionary product called ‘Google Wave’. There is no need to recap all the remarkable Google products such as Google Web Search, Google Earth, Gmail, Google Docs etc., which have made a tremendous impact on society. Their latest innovative product is Google Wave, a real-time communication platform built with Google Web Toolkit and HTML5.

Preview

This was Google’s biggest product launch in recent memory and it looks very promising indeed. Currently Google Wave is in its preview stage and allows only limited access to normal web users. (This is very similar to the way they started Gmail, offering the service by invitation.) A few weeks back I signed up for it and got the invitation recently. I tried out most of its features and was really amazed by the idea and its performance. (Chrome is the best browser for trying out Google Wave, Firefox often crashed during ‘waving’, and Safari looks OK. He he… No IE please… Google decided to completely ban IE users from Google Wave :D).

Mind blowing features

  • Real-time – you can see what someone else (I mean any participant in a given thread) is typing, character by character. (No worries, there is a way to turn this off too.)
  • Embeddability – Waves can be embedded in any blog or website.
  • Applications and Extensions – Gadgets, games .. far, far superior to wearisome Facebook ..
  • Shared Documents – Offers a wonderful way to manage shared documents. Participants can modify a document in real time, and they can also view the changes from its initial state (diff).
  • Open source – maybe this is a business strategy
  • Playback – Awesome feature. You can replay almost the whole life cycle of a wave (like watching a video).
  • Natural language – Auto-correction on the fly, as well as auto-translation (with the use of Robots)
  • Drag-and-drop file sharing – Just drag and drop the images that you want to share with others (maybe this will be extended to other formats)
  • Cool Gadgets
    You can embed Google Maps, Google Images or search directly into a wave

Please check out this video.

From Maps to Wave – Rasmussen Brothers

Among the coolest Google products, Google Maps is flying pretty high. As with most Google products, the turning point here was the ‘idea’. And that ‘idea’ of Google Maps came from two brothers based in Sydney, Australia.

This is how CNN reports it.

“(CNN) — Lars and Jens Rasmussen were broke and jobless — with only $16 between them — when they made it big in the Web world by selling their idea for Google Maps.”

Lars and Jens Rasmussen were the genius brothers behind the superb idea of Google Maps. Now they have come up with yet another revolutionary idea: ‘Google Wave’.

Lars Rasmussen and Jens Rasmussen during Google Wave internal demo.

So.. that was a brief walk-through of Google Wave… I’m pretty sure that this will become a super-duper communication tool in the near future. It will be a huge challenge for existing communication tools (AIM sucks and FB is boring now.. OK!). So it’s time to try it out.. be among the very first people to experience the beauty of real-time communication and collaboration. Try it OUT!

See this video to get a complete overview of the product.

Posted in Computer Science | Leave a Comment »

Crafty ‘Pimpl’ : Compilation Firewall

Posted by kasun04 on November 9, 2009

The Pimpl idiom, or ‘Pointer to Implementation’ (aka the compilation firewall or Cheshire Cat technique), hides implementation details by moving the private members (both data and functions) of a class into an internal private struct (or class). It minimizes coupling by separating interface from implementation and then hiding the implementation. The pimpl idiom therefore insulates clients from all knowledge of the encapsulated parts of a class.

Although ‘pimpl’ is a cool technique for achieving ‘pure encapsulation’ and decoupling, it is often used as a compile-time optimization technique. In a non-pimpl scenario, clients depend on the header file of a class, so any change to the header forces them to recompile, even if the change is made to the private or protected sections. With the pimpl idiom, those private details are hidden by putting the private data (and functions) in a separate type defined in the implementation file, forward declaring that type in the header file and storing a pointer to it (the pointer to implementation).

Source code sample – here!

‘Pimpl’ : How to do it

This is how we can convert a non-pimpl class to a pimpl class (say FooStr is the main class and Ximpl is the implementation class).

  • Put all the private member variables into a struct/class (move all private data from FooStr to Ximpl).
  • Define the struct/class in the .cpp file (the Ximpl definition).
  • Forward declare the struct/class in the .h file (class Ximpl).
  • Declare a private member that is a (smart) pointer to the implementation class – boost::scoped_ptr<Ximpl> pimpl
  • Instantiate the struct/class in the ctor of the main class.
  • We may need to write the copy ctor ourselves, and also the destructor (if boost smart ptrs are not used).
  • (Please refer to the source code for further clarification.)
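The steps above can be sketched in a single listing. This is only a minimal illustration, not the linked source code: it uses std::unique_ptr in place of boost::scoped_ptr so that it needs nothing beyond the standard library, the member functions append, length and text are hypothetical, and the two "files" are marked by comments.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// ---- FooStr.h: everything a client sees ----
class FooStr {
public:
    explicit FooStr(const std::string& text);
    FooStr(const FooStr& other);             // deep copy of the hidden state
    FooStr& operator=(const FooStr& other);
    ~FooStr();                               // defined where Ximpl is complete

    void append(const std::string& more);    // hypothetical operations
    std::size_t length() const;
    std::string text() const;

private:
    struct Ximpl;                            // forward declaration only
    std::unique_ptr<Ximpl> pimpl;            // the pointer to implementation
};

// ---- FooStr.cpp: the private details live here ----
struct FooStr::Ximpl {
    std::string buffer;                      // formerly a private member of FooStr
};

FooStr::FooStr(const std::string& text) : pimpl(new Ximpl{text}) {}
FooStr::FooStr(const FooStr& other) : pimpl(new Ximpl(*other.pimpl)) {}
FooStr& FooStr::operator=(const FooStr& other) {
    *pimpl = *other.pimpl;                   // copy the state, keep our own Ximpl
    return *this;
}
FooStr::~FooStr() = default;

void FooStr::append(const std::string& more) { pimpl->buffer += more; }
std::size_t FooStr::length() const { return pimpl->buffer.size(); }
std::string FooStr::text() const { return pimpl->buffer; }
```

A client that writes FooStr a("abc"); FooStr b(a); b.append("de"); only ever includes the header part, so adding or changing members of Ximpl does not force that client to recompile.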

I know this makes your code look confusing at first sight, but since this is a standard C++ technique we don’t need to worry about the complexity. Let’s take a look at the pros and cons of pimpl.

Benefits

  • Changing the private member variables of a class does not require recompiling the dependent client classes – faster compile time
  • The header file does not need to ‘#include’ the classes that are used in private member variables – again, compiles faster
  • True and pure ‘Encapsulation’

Drawbacks

  • Tough work for the developer :)
  • Less readable
  • Doesn’t work for ‘protected’ members, since derived classes cannot see inside the hidden implementation

 

For further information on pimpl, please refer to ‘Exceptional C++’.

 

Posted in Uncategorized | 3 Comments »

B-Trees and Inverted Index – De facto standard for file organization (II)

Posted by kasun04 on September 27, 2009

In the context of modern IR systems, using an in-memory dictionary or hash table to represent the inversion lists would be impractical, as it requires a huge amount of memory. Therefore it’s obvious that we need the help of secondary storage to store the content and retrieve it when required. Then again, there is a huge overhead in using secondary storage to store and read a dictionary. The solution is the B-Tree, which is used in almost each and every implementation of an inverted index.

Definition of a B-Tree of order ‘m’

  • The root node holds between 1 and 2m keys (so, unless it is a leaf, it has at least 2 children)
  • All the other nodes hold between m and 2m keys
  • All keys within a node are kept in ascending order
  • All leaves are on the same level, so the tree is always balanced

[Figure: a B-Tree node of order ‘d’, with an example]
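As a rough stand-in for a node picture, here is a minimal C++ sketch of one node and of a lookup that walks down the tree. It is an assumption-laden illustration: the names BTreeNode, node_is_valid and contains are made up for this post, the nodes live in memory rather than on disk pages, and the root is allowed to hold as few as one key, which is the usual textbook rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory layout of one B-Tree node of order d.
struct BTreeNode {
    std::vector<std::string> keys;      // kept in ascending order
    std::vector<BTreeNode*> children;   // empty for a leaf, else keys.size() + 1 entries
    bool is_leaf() const { return children.empty(); }
};

// Checks the per-node rules: key count within bounds, keys strictly ascending,
// and one more child than keys for an internal node.
bool node_is_valid(const BTreeNode& node, std::size_t d, bool is_root) {
    const std::size_t min_keys = is_root ? 1 : d;
    if (node.keys.size() < min_keys || node.keys.size() > 2 * d) return false;
    for (std::size_t i = 1; i < node.keys.size(); ++i) {
        if (!(node.keys[i - 1] < node.keys[i])) return false;
    }
    return node.is_leaf() || node.children.size() == node.keys.size() + 1;
}

// Looks for key below node. pages_read is incremented once per visited node,
// which stands in for one page read from secondary storage.
bool contains(const BTreeNode* node, const std::string& key, int& pages_read) {
    while (node != nullptr) {
        ++pages_read;
        auto it = std::lower_bound(node->keys.begin(), node->keys.end(), key);
        if (it != node->keys.end() && *it == key) return true;
        if (node->is_leaf()) return false;
        node = node->children[it - node->keys.begin()];  // descend into the matching gap
    }
    return false;
}
```

The point of the sketch is that a lookup touches one node per level, so the number of page reads is bounded by the height of the tree and not by the number of keys.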

Inversion list structures are used because they provide optimum performance when searching large databases. The optimality comes from minimizing the data flow needed to resolve a query: only data directly related to the query is retrieved from secondary storage. The beauty of B-Trees lies in their insertion and deletion methods, which always leave the tree balanced.

On the most common secondary storage, the hard disk, the number of disk accesses is measured in terms of the number of pages of information that need to be read from or written to the disk. The disk access time is not constant: it depends on the distance between the current track and the desired track, and also on the initial rotational state of the disk. We shall nonetheless use the number of pages read or written as a first-order approximation of the total time spent accessing the disk.

In a typical IR System B-tree, the amount of data handled is so large that all the data do not fit into main memory at once. The B-tree algorithms copy selected pages from disk into main memory as needed and write back onto disk the pages that have changed. B-tree algorithms are designed so that only a constant number of pages are in main memory at any time; thus, the size of main memory does not limit the size of B-trees that can be handled.

It can be proved that the cost of a ‘find’ operation in a B-Tree grows as the logarithm of the file size. For example, a B-Tree of order 50 that indexes a file of 1 million records can be searched within 4 disk accesses in the worst case. Similarly, for a B-Tree of order d over a file of n records, insertion and deletion time is proportional to log_d n (the logarithm of n to base d).

However, B-Trees have some practical limitations: the amount of data that can be transferred in one secondary storage access is limited, and the track size of the hardware should be taken into account as well. So, in practice, the optimum node size depends critically on the characteristics of the hardware.
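The "order 50, 1 million records, 4 disk accesses" figure can be checked with a few lines of code. The sketch below assumes the usual definition (every node except the root holds at least d keys, the root at least one), under which a tree with h levels holds at least 2(d+1)^(h-1) - 1 keys. The function name worst_case_levels is made up for this illustration.

```cpp
#include <cstdint>

// Largest number of levels a B-Tree of order d can have while holding n keys,
// using the bound n >= 2*(d+1)^(h-1) - 1 for a tree with h levels.
// Intended for moderate inputs; the arithmetic is not overflow-checked.
int worst_case_levels(std::uint64_t d, std::uint64_t n) {
    if (n == 0) return 0;
    int levels = 1;
    std::uint64_t power = d + 1;                // (d+1)^levels
    std::uint64_t min_keys_next = 2 * power - 1; // fewest keys a tree one level taller needs
    while (min_keys_next <= n) {
        ++levels;
        power *= (d + 1);
        min_keys_next = 2 * power - 1;
    }
    return levels;
}
```

For d = 50 the thresholds are 101, 5,201, 265,301 and 13,530,401 keys for 2, 3, 4 and 5 levels, so a file of 1 million records can never need more than 4 levels, which matches the 4 disk accesses quoted above.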

Further Information on B-Trees

http://www.cl.cam.ac.uk/~smh22/docs/ubiquitous_btree.pdf

Posted in Uncategorized | Leave a Comment »

B-Trees and Inverted Index – De facto standard for file organization (I)

Posted by kasun04 on September 27, 2009

‘Secondary storage’ is still a nightmare when it comes to achieving higher performance in modern computer systems, and often the power of a multi-core CPU is more or less negated by the low performance of secondary storage IO. Of course the performance of secondary storage IO has improved in the recent past, but it is still far behind CPU performance.

A computer must retrieve an item and place it in main memory before it can be processed. To overcome this bottleneck, one must organize the files intelligently and make retrieval efficient. The file organization depends on the retrieval method: sequential or random. Secondary storage IO is a particularly huge overhead for the random access method. Therefore a large, randomly accessed file in a computer system usually has an index associated with it. An index is often itself a file stored on the disk, and it contains a mapping between terms and content.

Inverted Index

An index is often used in Information Retrieval systems, database systems, file indexers and general-purpose access methods. The important question here is how we physically represent an index in a computer system. That’s where B-Trees come into play. So, before we go deep into B-Trees, it’s worth having a close look at the index and its structure.

‘Inverted Index’ is the more specific name given to this index, and it’s the most common data structure used in Database Management Systems and Information Retrieval Systems (search engines etc.). It consists of three basic files:

  • Document File
  • Inversion list (posting files)
  • Dictionary

The name “inverted file” comes from its underlying methodology of storing an inversion of the documents: for each word, a list of the documents in which the word is found is stored (the inversion list for that word). Each document in the system is given a unique numerical identifier (Document ID). It is that identifier that is stored in the inversion list. The way to locate the inversion list for a particular word is via the Dictionary. The Dictionary is typically a sorted list of all unique words (processing tokens/terms) in the system, each with a pointer to the location of its inversion list. Dictionaries can also store other information used in query optimization, such as the length of the inversion lists.

The above is the simplest representation of the inverted index, but in modern IR systems there can be more attributes in the inversion list itself. For example, to support proximity search, contiguous word phrases and term weighting algorithms, the inversion list contains the offset of each word in the document. So if, for Document ID = 1, the term ‘bit’ is the 4th, 8th and 30th term in the document, the inversion list would look as follows.

  • bit – 1(4), 1(8), 1(30)
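A positional inversion list like this one can be sketched in a few lines of C++. This is only an in-memory illustration under simplifying assumptions: terms are split on whitespace with no normalization, a std::map plays the role of the sorted Dictionary, and the names Posting, InvertedIndex and add_document are invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One entry of an inversion list: which document, and the 1-based
// offset of the term within that document.
struct Posting {
    int doc_id;
    std::size_t position;
};

// Sorted dictionary of terms, each pointing at its inversion list.
using InvertedIndex = std::map<std::string, std::vector<Posting>>;

// Splits text on whitespace and records every term occurrence.
void add_document(InvertedIndex& index, int doc_id, const std::string& text) {
    std::istringstream words(text);
    std::string term;
    std::size_t position = 0;
    while (words >> term) {
        ++position;
        index[term].push_back(Posting{doc_id, position});
    }
}
```

After add_document(index, 1, "a bit of a bit"), the list stored under "bit" holds the postings 1(2) and 1(5), written in the same doc(offset) notation as above.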

Once we have created the inverted index, the search process is the component that actually uses it. When a search is performed for a given query, the inversion lists for the terms in the query are located and the appropriate logic is applied to those sets of inversion lists. The outcome is the set of hits for the query (the inversion lists may be ordered by rank), given in the form of document IDs. These document IDs are then used to retrieve the actual documents.

So where are the B-Trees? Well, they come into play when we implement the above conceptual model. Although we described the set of normalized terms as a ‘dictionary’, an implementation actually uses a B-Tree to hold the normalized terms and make secondary storage IO more efficient. We should keep in mind that an enormous set of documents produces an extremely large set of terms, and the inverted index is equally large. Storing these dictionaries in memory would be very costly, if not impossible. So we tend to store them in secondary storage, and of course we then face the performance nightmare that always comes with secondary storage. We’ll see how B-Trees solve this problem in part II of this post.
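As a small illustration of "applying the appropriate logic" to inversion lists, the sketch below answers an AND query by intersecting the document-ID sets of the query terms. It is a simplified assumption of how such a step might look: ranking is ignored, the lists hold only document IDs, and the names DocIdIndex and search_all_terms are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Dictionary of terms, each mapped to the sorted set of document IDs containing it.
using DocIdIndex = std::map<std::string, std::set<int>>;

// Returns the IDs of the documents that contain every term of the query.
std::set<int> search_all_terms(const DocIdIndex& index,
                               const std::vector<std::string>& terms) {
    std::set<int> hits;
    bool first = true;
    for (const std::string& term : terms) {
        auto entry = index.find(term);
        if (entry == index.end()) return std::set<int>();  // unknown term: no hits
        if (first) {
            hits = entry->second;
            first = false;
            continue;
        }
        std::set<int> common;
        std::set_intersection(hits.begin(), hits.end(),
                              entry->second.begin(), entry->second.end(),
                              std::inserter(common, common.begin()));
        hits.swap(common);
    }
    return hits;
}
```

An OR query would use std::set_union in the same place. Either way the result is a set of document IDs, which is then used to fetch the documents themselves.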

Posted in Uncategorized | Leave a Comment »

Powered by MS Word 2007

Posted by kasun04 on September 21, 2009

This is my first blog post through MS Word 2007. And I’m really happy about the fact that WordPress can interact with MS products. This is a big achievement for MS Word as well as WordPress, as MS Word is considered to be the best word processing software.

    

Posted in Uncategorized | Leave a Comment »

Unique Perspective on C++

Posted by kasun04 on June 21, 2009

There’s no need to write a blog post on the impact of software development on the economy, society and lifestyle: it’s one of the major driving forces of the modern world. When it comes to software development, programming languages have an immense impact on software engineering. As an incontestable candidate among programming languages, I would like to talk about C++ a bit.


Well, the most common questions that arise at this point are “Why C++?” and “Isn’t it dying?”.

Yes.. C++ is not used as frequently as Java or C#. But it’s not dying, it’s not obsolete.. In fact it’s at the heart of modern software development.

‘Choosing the right tool’ is so important in the software development process. In that context, C++ is a tool; a ‘complicated tool’. And it’s a tool that is difficult to learn to use properly.

The complexity of C++ has encouraged most software developers to go for alternatives like Java, C# etc. Then again, that complexity is a consequence of its programming power and performance.

C++ -Complexity

The uniqueness of a proficient C++ programmer can be observed in many distinct ways:

  • For a proficient C++ programmer, it’s essential to have a solid understanding of what the compiler does, what the linker does, what the run-time system does and what the OS does.
  • Memory management. No garbage collection, so the developer needs to take care of memory allocation and de-allocation. (‘Pointers’ are always a nightmare for a C++ developer 😦 )
  • OS-level concurrency. No pseudo processes or threads (like the threads provided by run-time systems); one uses the processes and threads provided directly by the operating system.
  • Generic programming. (Templates)
  • Interfacing with hardware. (C++ is the best programming language for communicating with hardware)

Why C++ ?

Nowadays, selecting C++ for the development of a conventional software system is simply a waste of time and money. We don’t need C++ to develop a traditional business management system or a web application.

However, popular programming languages like Java or C# often fall short of the requirements of certain software systems. These applications are often termed “Demanding Applications”.

Demanding Applications

Performance is a major and crucial requirement of a demanding application. Robustness, responsiveness and fault tolerance are also critical in the context of demanding systems. If you are still confused about what a demanding application is, right now you are working on one of the most complicated demanding systems: the operating system. Linux or Windows; both use C/C++. The role played by C++ in the modern era of computer science is obvious when we consider the most common demanding systems.

  • Operating Systems – Windows, Mac and Linux are mostly (if not totally) developed using C/C++.
  • Embedded Systems – Every embedded system is a real mess of constrained resources and unlimited requirements. So, to maximize throughput and minimize resource consumption, C++ emerges as the best development choice. In fact, embedded systems are the next era of C++ development.
  • Hardware Drivers – Almost all hardware drivers are implemented in C/C++.
  • ‘The’ most popular Windows application – Well, everybody knows about it. It’s the modern success story of Microsoft. Yes.. it’s the “Microsoft Office suite”. Each and every module of MS Office is developed using C++.
  • Search Engine Core – The Google search engine is the best example of a demanding system. Performance is the most crucial factor for a search engine. Therefore C++ is the natural candidate for developing the core (or heart) of a search engine.
  • Trading Systems – Again this is for the sake of performance and handling an immense amount of load. C++ is a must.
  • Real-Time Systems – This is also a well-known case where C++ is mandatory.

So, this is the showstopper for the jerks who claim that ‘C++ is dying.. it’s obsolete’. (Inspired by an interview with the C++ genius Scott Meyers.)

Posted in Uncategorized | Leave a Comment »

True Colors of Patriotism

Posted by kasun04 on June 3, 2009

For the last few weeks or so, the whole of Colombo has been covered with the Sri Lankan national flag. Sri Lankans are still celebrating not the victory over the LTTE but true independence. As well as enjoying the independence, I am also enchanted by the beauty of our national flag. To me.. it’s the most beautiful flag in the world (I have even heard some cricket commentators say that it’s a very colorful flag). Now we can hoist our national flag with the pride of having been able to wipe out terrorism from our country. It took almost three decades to bring true independence to the island nation. And we should keep in mind that the independence we achieved was won with the blood and tears of our true patriots. The credit should go to HE the President, the armed forces and the whole Sri Lankan community, which is the sheer power behind everything.

However, there are so many people who act as co-owners of the victory and independence of Sri Lanka. Patriotism is suddenly in vogue. It’s everywhere.. Nowadays in SL anybody you meet in the street is a ‘lion-hearted patriot’.. anybody who migrated to the US or Europe is a ‘true nationalist’.. There are some people whose ‘patriotism’ is often ignited by alcohol… It’s puzzling how a man who migrated to some other country can boast about the sovereignty and independence of his motherland. All these ‘non-returning Sri Lankans’ are genuine illustrations of ‘Pseudo Patriotism’.


Defeating terrorism in Sri Lanka is a sacred and precious thing for all its citizens. As we enjoy the independence, we should salute and show our gratitude to those who contributed: the armed forces, the President, the families of all the armed forces, the government, and the local and international community (India, Pakistan, China and Russia in particular). (But not those bloody ‘pseudo patriots’)

Finally, the courage and dauntlessness of our heroes cannot be explained in a couple of sentences.. But here is a nice poem (in translation) written during the Indian revolution, around the 1940s, when they were fighting against the British governors.

This day, we walk along with death, and laugh at its pale spectre..

We will not fear those cruel swords, our courage is far sharper..

Mistake not our silence for submission. For beneath lies lava, molten..

O Martyr, O men of valour.. One day the enemy will sing your praises…

We will show our mettle .. When the moment of truth arrives….

For courage lives in deeds, not boastful lies.

We’ve gathered in the enemy’s lair, my friend

In the hope of dying for our motherland…

We will not fear those cruel swords, our courage is far sharper.

This day, we walk along with death and laugh at its pale spectre…..

Posted in Uncategorized | Leave a Comment »

Google – Business, Ethics and Life

Posted by kasun04 on April 15, 2009

‘Google’ – one of the most innovative and revolutionary concepts, which blossomed in early ’96 as a research project of Larry Page and Sergey Brin (Google’s President in charge of Technology). Google took a span of only 13 years to climb to the peak of internet search technology, and it is now in a position that is almost untouchable and unreachable for its competitors.


I recently watched a nice interview with Sergey Brin, and it’s really awesome as it reveals some interesting facts about Google, its challenges and Sergey himself.

Google manufactures Nothing!


Yeah… it manufactures nothing but produces an IDEA. The idea, or the mission, is
‘Accumulate all the world’s information and make it accessible and useful to everyone’. This sounds non-commercial or non-profit, but we should keep in mind that there is an underlying business process, which may be based on several revenue sources. In general, “search engines” use one or more of the following revenue sources.

– Charging advertisers for presenting online “banner” ads to users
– Collecting marketing data on consumer habits, then selling the data or using it for targeted advertising
– Charging websites to become listed
– Charging websites for better placement in lists
– Charging websites to purchase keywords for themselves
– Charging users for searches
– Charging other search engines to use their catalog

Google’s mission is critical, as having the correct information that you need, accessible any time you want it, is very important to each and every aspect of human life.

Google is a  gigantic “Jewish” company?

Both founders of Google, Sergey and Larry are Jewish. Its first employee was Jewish and many of the seniors were Jewish too. So definitely, Google has some Jewish characteristic  in its business process. In this case, Google’s main opponent, Microsoft was also  founded by a Jewish person.

Larry Page and Sergey Brin (both are Jewish)

Unbiased Search Results!

Google doesn’t do any moderation or alteration of the search results (‘search hits’), which means Google provides results based purely on the query that the user executed. For example, I googled “Internet Browser” and the result set contained the ranked hits in this order: Opera, Firefox, Safari, IE7 and finally Chrome. This shows the integrity of the search engine. Google is supposed to provide users with the information they require, and the presentation of the result set is not modified by the search engine, as it uses an unbiased scheme in ‘Query and Result Processing’.

Google – Privacy and Identity


When it comes to internet search, Google tracks all the activities that you carry out using Google (maybe based on your IP, or your account if you use Gmail while searching; and even if you use a dynamic IP, it still reveals something about your identity, such as your country and region). Google knows what your interests are, and there may be things that you don’t want to reveal to anybody. So it’s clear that there is some conflict here. And the most important thing here is the commercial (or maybe military) value of the information kept by Google. It can be used as a global business survey, where you can identify the potential clients of a given product and the geographical distribution of the community.

In this particular interview, Sergey Brin was questioned about this issue, but his response was that there had been no cases of users’ search history being exposed. Yet he didn’t deny the fact that a certain amount of user information is kept by Google, nor say to what extent.

Google Earth’s impact on military and security activities

When we use Google Earth for the first time, it’s no surprise that its possible impact on military activities comes to mind.


But according to Sergey, most of the high-ranking security agencies have claimed that Google Earth has no impact on military and security activities.

Google – No 1 Place to work and 20% of working time is yours!


Google is considered to be the best place to work, with numerous facilities and good exposure to almost every technology. Apart from that, employees are supposed to spend 20% of their working time on personal projects (technical stuff). Products like “Orkut” (social network) and “Google News” are outcomes of this nice policy. I guess this is a really cool thing. Every company should think about giving their engineers free time in which they can work on their own projects etc. (Of course this allocated 20% time is not supposed to be wasted on Facebook or similar stuff :D)

Finally… Is Sergey Brin a happy person?

He claimed that he is a happy person, and also commented that most business professionals at his level ‘Are NOT’.


Sergey Brin and Anne Wojcicki

You can watch the complete interview of Sergey Brin here.

Posted in Computer Science | 3 Comments »

i Love .. iTunes

Posted by kasun04 on March 22, 2009

Recently I’ve started using iTunes as my music player. The latest version of iTunes (8.1.0.52) looks really cool, and the GUI is as splendid as any other Apple application. I also noticed that the sound quality is improved with the AAC format, though it’s only a slight difference from MP3.

You can organize your music library with ease in iTunes, and playing an album is just like fetching a CD and playing it. And you can search for any ‘artist’, ‘album’, ‘genre’, ‘song’ etc. using the embedded quick search. Albums are displayed on a 3-D pane where you can select an album by just clicking on its cover. (see the screen shot)

Also you can listen to any type of radio stream through iTunes, and you’ll find 1000+ radio streams.
You have ‘alternative’, ‘ambient’, ‘blues’, ‘Folk’, ‘Jazz’, ‘Latino’, ‘Pop’ (you name it.. iTunes has got it) and many more streams.


Looks soo.. ‘sympathetic’ to Vista too…


Use it once and enjoy forever….

Posted in Uncategorized | Leave a Comment »