Kasun’s Blog

Kasun Indrasiri

  • Kasun Indrasiri

  • Info

    Department of CSE
    University of Moratuwa,
    Sri Lanka

  • Archives

  • Categories

Archive for the ‘Uncategorized’ Category

Moving to … P A N O R A M A : Kasun’s Blog

Posted by kasun04 on January 11, 2010

I’ve moved this blog to my new blogger.

Thank YOU!

Please Visit

http://kasunpanorama.blogspot.com/

Advertisements

Posted in Uncategorized | Leave a Comment »

Crafty ‘Pimpl’ : Compilation Firewall

Posted by kasun04 on November 9, 2009

The Pimpl idiom or ‘Pointer to Implementation’ (aka compilation firewall or Cheshire Cat technique) is a technique of hiding implementation details by moving private members (both data and functions) of a class into an internal private struct (or class). So it simply minimizes coupling via the separation of interface and implementation and then implementation hiding. Therefore pimpl idiom insulates clients from all knowledge about the encapsulated parts of a class.

Although, ‘pimpl’ is a cool technique to achieve ‘pure encapsulation’ and decoupling, it is often used as a compiling time optimization technique. In a non-pimpl scenario, the clients depend on the header file of a class, any changes to the header will affect clients, even if they are made to the private or protected sections. But with pimpl idiom, it hides those private details by putting private data (and functions) in a separate type defined in the implementation file and then forward declaring the type in the header file and storing a pointer to it (pointer to implementation).

Source code sample – here!

‘Pimpl’ : How to do it

This is how we can convert a non-pimpl class to a pimpl. (say FooStr is the main class and Ximpl is the implementation class)

  • Put all the private member variables into a struct/class. (move all private data from FooStr to Ximpl)
  • Define the struct/class in .cpp. (Ximpl definition)
  • Forward declaration of the struct/class in .h (class Ximpl)
  • Declare a pointer to the implementation class as a (smart) pointer (of course as a private attribute) – boost::scoped_ptr<Ximpl> pimpl
  • Instantiate struct/class in the ctor of the main class
  • May be we need to modify the copy ctor and also the destructor (if boost smart ptrs are not used)
  • (please refer to source code for further clarifications)

I know this makes your code looks so ambiguous at first sight, but since this is a standard C++ technique we don’t need to worry about the complexity. Let’s take a look at pros and cons of pimpl.

Benefits

  • Changing private member variables of a class does not require recompiling the dependent client classes – Faster Compiling time
  • The header file does not need to ‘#include’ classes that are used in private member variables – again compiles faster
  • True and pure ‘Encapsulation’

Drawbacks

  • Tough work for the developer (J)
  • Less readable
  • Doesn’t support ‘protected’ member variables

 

For further information on pimpl, please refer ‘Exceptional C++’.

 

Posted in Uncategorized | 3 Comments »

B-Trees and Inverted Index – De facto standard for file organization (II)

Posted by kasun04 on September 27, 2009

In the context of modern IR Systems, using a dictionary or hash table to represent inversion list would be kind of tedious as it request huge amount of memory. There for its obvious that we need to get the help of secondary storage to store the content and retrieve when required. Then again there is a huge overhead of using secondary storage to store and read a dictionary. So the solution that we came across was B-Tree, which is a used almost each and every implementation of inverted index.

Definition of a B-Tree of order ‘m’

  • A root node with 2 ~ 2m keys
  • All the other internal nodes have between m ~ 2m keys
  • All keys are kept in ascending order
  • All levels have the same level of differ (at most 1)

B-Tree node of order ‘d’ as shown below.

e.g.:

Inversion lists structures are used because they provide optimum performance in searching large databases. The optimality comes from the minimization of data flow in resolving a query. Only data directly related to the query are retrieved from secondary storage. The beauty of B-Trees lies in the methods for insertion and deletion of records which always leaves the tree balanced.

In the most command secondary storage ; the hard disk, the number of disk accesses is measured in terms of the number of pages of information that need to be read from or written to the disk. And the disk access time is not constant-it depends on the distance between the current track and the desired track and also on the initial rotational state of the disk. We shall nonetheless use the number of pages read or written as a first-order approximation of the total time spent accessing the disk.

In a typical IR System B-tree, the amount of data handled is so large that all the data do not fit into main memory at once. The B-tree algorithms copy selected pages from disk into main memory as needed and write back onto disk the pages that have changed. B-tree algorithms are designed so that only a constant number of pages are in main memory at any time; thus, the size of main memory does not limit the size of B-trees that can be handled.

It can be prove that the cost of processing a ‘find’ operation in a B-Tree grows as the logarithm of the file size. For example a B-Tree of order 50 which can index a file of 1 million records can be searched with in 4 disk accesses (worst case). Also for insertion and deletion, a B-Tree of order d, for a file of n records, insertion/deletion time is proportional to log(d) n.

However, for B-Trees there are some practical limitations where as the amount of data that can be transferred to with one secondary storage access is limited as well as the track size of hardware should be taken in to the account. So, in practice, optimum node size is depends critically on the characteristics of hardware.

Further Information on B-Trees

http://www.cl.cam.ac.uk/~smh22/docs/ubiquitous_btree.pdf

Posted in Uncategorized | Leave a Comment »

B-Trees and Inverted Index – De facto standard for file organization (I)

Posted by kasun04 on September 27, 2009

‘Secondary Storage’ is still a nightmare in achieving higher performance in modern computer systems and often the power of multi-core CPU is more or less negate due secondary storage low performance IO. Of course the performance of secondary storage IO has improved in the recent past but it is still inferior relative to CPU performances.

A computer must retrieve an item and place it in main memory before it can be processed. In order to overcome the low performances of the system one must organize the files intelligently and making the retrieval efficient. The file organization depends on the retrieval method; sequential or random. Particularly secondary storage IO is a huge overhead in the context of random access method. Therefore associated with a large, randomly accessed file in a computer system is an index. An index is often a file that stored in the disk and it contains a mapping between the terms and content.

Inverted Index

An index if often used in Information Retrieval systems, Database systems, file indexer for user and general purpose access methods. The important thing here is that how we physically represent index in a computer system. That’s where B-Trees come in to play. So, before we going deep into B-Trees it’s worthy to have a close look at index and its structure.

‘Inverted Index’ is the more specific name given to the index as it’s the most common data structure used in Database Management Systems and Information Retrieval Systems (Search Engines etc.). It consists of three different basic files,

  • Document File
  • Inversion list (posting files)
  • Dictionary

The name “inverted file” comes from its underlying methodology of storing an inversion of the documents: inversion of the document from the perspective that, for each word, a list of documents in which the word is found in is stored (the inversion list for that word). Each document in the system is given a unique numerical identifier (Document ID). It is that identifier that is stored in the inversion list. The way to locate the inversion list for a particular word is via the Dictionary. The Dictionary is typically a sorted list of all unique words (processing tokens/terms) in the system and a pointer to the location of its inversion list. Dictionaries can also store other information used in query optimization such as the length of inversion lists.

The above inverted index is the simplest representation of the inverted index but for modern IR systems there can be more attributes in the inversion list itself. For example, to support proximity, contiguous word phrases and term weighting algorithms the inversion list contains the offset of each word in the document. So for Document ID = 1, the term bit is the 4th, 8th and 30th term in the document, then the inversion list would look as follows.

  • bit -1(4), 1(8), 1(30)

So once we create the inverted index, the search process or system is the actual component that uses the created inverted index. So, when a search is performed for a given query, the inversion lists for the terms in the query are located and the appropriate logic is applied on the inversion list sets. So the outcome is the set of hits for a given query (inversion list may ordered based on ranks), and the hits will be given in the form of document IDs. These document ID will be used to retrieve the accrual documents.

So where are the B-trees ?.. Well that comes in to play when we implement the above conceptual models. Although we stored a set of normalized terms in the form of a ‘dictionary’, it actually uses a B-Tree to hold the normalized set of terms and make Secondary Storage IO more efficient. We should keep in mind that, for an enormous set of document we must get an extremely huge set of terms and the inverted index would be equally huge. If we are going to store these dictionaries in the memory, it would be very costly if not impossible. So we tend to store in the secondary storage and of course we have the performance nightmare which is always bounded to secondary storage. So we’ll see how B-Trees solve our problem in this context from part II of this post.

Posted in Uncategorized | Leave a Comment »

Powered by MS Word 2007

Posted by kasun04 on September 21, 2009

This is my first blog post through MS Word 2007. And I’m really happy about the fact that wordpress can interact with MS products. This is a big achievement by MS Word as well as WordPress as MS Word is considered to be the best word processing software.

    

Posted in Uncategorized | Leave a Comment »

Unique Perspective on C++

Posted by kasun04 on June 21, 2009

Its not worthy to write a blog post on the impacts of the Software Development on economy, society and life style. Its one of the major driving forces in the modern world. When it comes to software development, the programming languages has an immense impact on software engineering. As an incontestably candidate of programming languages, I would like to talk about C++ a bit.

software-outsourcing-cartoon-3

Well, the most common questions arises at this particular point are, “Why C++ ?”, “Isn’t it dying ?”.

Yes.. C++ is not used in the same frequency as Java or C#. But, its not dying, its not obsolete.. In fact it’s the heart of modern software development process.

‘Choosing the right tool’ is so important in the Software Development process. So in that context, C++ is a tool; a ‘complicate tool’. And it’s a tool that is difficult to learn, how to use properly.

The complexity of C++ has encouraged most software developers to go for alternatives like Java, C# etc. Then again, the complexity is a consequence of its programming power and the performance.

C++ -Complexity

The uniqueness of a proficient C++ programmers can be observed in many distinct ways

  • For a proficient C++ programmer, it’s essential to have a solid understanding of, What compiler does, what linker does, what run time system does and what OS does.
  • Memory Management. No garbage collection, so the developer needs to taken care of memory allocation and de-allocation. (‘pointers’ is always a nightmare to a C++ developer 😦 )
  • OS level Concurrency. No pseudo processes or threads (like threads provided by run time systems). Should use processes and threads directly provided by the operating system.
  • Generic Programming. (Templates)
  • Interfacing with hardware. (C++ is the best programming language to communicate with hardware)
  • Generic Programming. (Templates)

Why C++ ?

Nowadays, selecting C++ for the development of the conventional software system is simply a waste of time and money. We don’t need C++ to develop a traditional business management software system or a web application.

The famous programming languages like Java or C#.Net are often humiliated by some of the requirements of the software system. These applications are often termed “Demanding Applications”.

Demanding Applications

Performance is a major and a crucial requirement of a demanding application. Also, robustness, responsiveness and fault tolerance is also critical in the context of demanding systems. If you still confused about the demanding applications or systems, just now you are working on one of the most complicated demanding system. It’s the operating system. Linux or Windows; both use C/C++. The role that played by C++ in the modern era of Computer Science is obvious when we consider the most common demanding systems .

  • Operating Systems – Windows, Mac and Linux is mostly (if not totally) developed using C/C++.
  • Embedded SystemsEvery embedded system is a real mess of constrained resources and unlimited requirements. So, to maximize the throughput and minimize the resource consumption, C++ immerges as the best development methodology. In fact, embedded systems is the next era of C++ development.
  • Hardware Drivers – Almost all the hardware drivers are implemented by C++
  • ”The’ most popular windows application. – Well, everybody knows about it. It’s modern success story of Microsoft. Yes.. Its “Microsoft Office suit”. Each and every module of MS-Office is developed using C++.
  • Search Engine CoreGoogle Search Engines is a best example of a demanding system. The performance is the most crucial factor of a search engine. Therefore C++ is the consentaneous candidate to develop the core (or heart) of a search engine.
  • Trading Systems – Again this is for the sake of performance and handling the immense amount of load. C++ is a must.
  • Real Time Systems – This also a well known case where C++ is mandatory.

So, this is the showstopper for the jerks who claimed that ‘C++ is dying..its obsolete’ . (inspired from a interview with C++ genius, Scott Meyers)

Posted in Uncategorized | Leave a Comment »

True Colors of Patriotism

Posted by kasun04 on June 3, 2009

Last few weeks or so, the whole Colombo city was covered with Sri Lankan national flag. Sri Lankans are still celebrating not the victory over LTTE but the true independence. As well as enjoying the independence I also enchanted with the beauty of our national flag. To me.. It’s the most beautiful flag in the world (I even heard some Cricket commentators claim that it’s a very colorful flag.). Now we can hoist our national flag with the pride of being able to wipe out the terrorism from our country. It almost took three decades to bring the true independence to the island nation. And we should keep in mind that, the independence that we achieved is formed with the blood and tears of our true patriots. The unanimous credit should goes to HE the president, Armed forces and to the whole Sri Lankan community which is the sheer power behind everything.

However, there are so many people who acts as co-possessor of Victory and Independence of Sri Lanka. Patriotism is suddenly in vogue. Its everywhere.. Nowadays in SL anybody that you meet in a street is a ‘lion hearted patriot’.. anybody who migrated to US or Europe is a ‘true nationalist’ .. There are some people whose ‘patriotism’ is often ignited by alcohols… It’s a dilemma that how a man who migrated to some other country can boast about the sovereignty and the independency of his motherland. All these ‘Non-returning Sri Lankans are the genuine illustrations of ‘Pseudo Patriotism’.

Untitled

Defeating Terrorism in Sri Lanka is a sacred and a precious thing for all its citizens. As we enjoy the independency we should salute and show our gratitude to its contributor; armed forces, the president, families of all the armed forces, government, local and international community (India, Pakistan, China and Russia in particular). (But not those bloody ‘pseudo patriots’)

Finally, the courage and the dauntlessness of our heroes cannot be explain through a couple of sentences.. But this is a nice poem(translation) written during the Indian revolution around 1940′ when they fight against the British governors.

This day, we walk along with death, and laugh at its pale spectre..

We will not fear those cruel swords, our courage is far sharper..

Mistake not our silence for submission. For beneath lies lava, molten..

O Martyre, O men of valour.. One day the enemy will sing your praises…

We will show our mettle .. When the moment of truth arrives….

For courage lives in deeds, not boastful lies.

We’ve gathered in the enemy’s lair, my friend

In the hope of dying for our motherland…

We will not fear those cruel swords, our courage is far sharper.

This day, we walk along with death and laugh at its pale spectre…..

Posted in Uncategorized | Leave a Comment »

i Love .. iTunes

Posted by kasun04 on March 22, 2009

Recently I’ve started using iTunes as my music player. Latest version of iTunes (8.1.0.52) looks really cool and GUI is splendid as any other apple application. As I noticed that the sound quality is also improved with ACC format though it’s a slight difference from mp3.

You can organize your music repository with the ease of use with iTunes and playing an album is just like fetching a CD and playing it. And you can search any ‘artist’, ‘album’, ‘Genre’, ‘song’ etc using the embedded quick search. Albums are displayed on a 3-D pane where you can select the album by just clicking on album cover. (see the screen shot)
albums

Also you can listen to any type of radio streaming through iTunes and you’ll nearly 1000+ radio streams.
You have ‘alternative’, ‘ambient’, ‘blues’, ‘Folk’, ‘Jazz’, ‘Latino’, ‘Pop’ (you name it.. iTunes got it)and many more streams.

radio

Looks soo.. ‘sympathetic’ to Vista too…

all

Use it once and enjoy forever….

Posted in Uncategorized | Leave a Comment »

The Romantic Opera – Kasun Kalhara

Posted by kasun04 on February 21, 2009

frontcoverlo8

Nowadays Sri Lankan music stream is flooded with a lot RnB and Hiphop music and many people merely want to listen to such styles of music. And all the radio stations are backing such music and ruining the natural music taste of Sri Lankans. All those commercial oriented artists tend to release their 5-6 tracks per a month. This is something simply related to the quantity and they simply want to create a music track using computers and other equipments. Once they create the track there are people who can write some strange words to tally with the track and the beat. Finally they’ll come up with a music video and all the TV stations eagerly waiting to play their music video.This is the story of Iraj, BnS, .. (you name it)

But .. They have forgotten the fact that there are some (I think its few :() people in Sri Lanka who still loves the instinctive slow music (mood songs) which is a blend of soothing voices and acoustic music.

l22550177690_5867

Kasun Kalhara is one of the exceptions thrown from the Sri Lankan modern music stream. As I see he is the best production by Dr. Premasiri Kemadasaa (Kemadasa master) and doubtless selection of the best voice in the modern generation. Not only his voice but he is a composer too.. But not a synthetic composer who solitary depends on Computer and keyboard.

‘The Romantic Opera’ is his latest album and in my opinion it is the best music release of him. The titled rack, ‘The Romantic Opera’ is inspired from Opera style music.

Opera is an art form in which singers and musicians perform a dramatic work (called an opera) which combines a text (called a libretto) and a musical score -wikipedia

And this album is a combination of a set of distinct music tracks. It has some ‘Latin’ music, some of ‘Chinese or Japanese’ music, some of ‘Indian’ music and a lot of ‘Western’ and ‘Sri Lankan’ music. Most of the songs are composed using acoustic music and very less number of music instruments were used.

You’ll not able to listen to this sort of music when you switch on the radio or TV.. You’ll never hear any of these songs when you are travelling on a bus… Soo… you need to find/buy it and listen to it… because that’s the way that most true Sri Lankan music lovers used to do…

scan0003

‘The Romantic Opera’ is a must listen music album of true Sri Lankan music fan.. Believe me .. It awesome.

We always call that the ‘music’ is an universal thing… yes it is.. That’s why we love A.R Rahman’s … that’s why we love Josh Groban.. and that why we should love Kasun Kalhara.

Posted in Uncategorized | 19 Comments »

emacs rocks!

Posted by kasun04 on September 18, 2008

emacs with ecb!

Posted in Uncategorized | 1 Comment »