More memcached

November 20th, 2008

While playing with memcached I wanted to find a way to monitor what was getting set.  You see, we are trying to pre-warm our cache with lots of colours (roughly 16.5 million of them :) ).

Anyway, I remembered I’d had a play a (long) while ago with expect and decided it would be just the thing…

So here we are - a status report from memcached every 5 minutes.

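#!/bin/bash
# script to monitor memcached: prints stats every 5 minutes

# quoted 'EOF' stops bash from expanding anything inside the expect script
expect << 'EOF'
set timeout 1
spawn telnet localhost 11211
while 1 {
    # memcached finishes each stats reply with the word END
    send "stats\n"
    expect "END"
    send "stats slabs\n"
    expect "END"
    puts "sleeping for 5 mins"
    sleep 300
}
EOF

# ends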

Our pre_warming script has been running for almost an hour and has warmed the cache with 319116 items - so about 52 hours to go….

In terms of memory usage, I am running memcached with a gig of RAM, and after the hour we are only using 4.7 meg of that :). So my estimate is that the full set of colours would need approximately 250MB.

Target = 16,581,375 items (colours)
1 hour = 319,116 items
1 hour = 5,005,741 bytes written
so 16,581,375/319,116 = 51.96 Hours
so 5,005,741*51.96 = 260,098,302 bytes = 248.05 MB

And now I’ve done these calculations, something makes me want to revisit my memcache monitor script so that I pass it a target and then get a “time left” estimate :)
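A minimal sketch of that "time left" idea, in Python rather than expect. It assumes the script is handed the target, the number of items stored so far, the elapsed seconds and the bytes used (for instance read from the curr_items and bytes lines of the stats output); the function name and parameters are made up for illustration, and the estimate is a simple linear projection.

```python
def estimate_remaining(target_items, items_done, elapsed_seconds, bytes_used):
    """Project hours left and final memory use from the progress so far.

    Hypothetical helper: assumes items keep arriving at the same average
    rate and that each item keeps costing the same number of bytes.
    """
    if items_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("need some progress before estimating")
    rate = items_done / elapsed_seconds              # items per second
    remaining_items = max(target_items - items_done, 0)
    hours_left = remaining_items / rate / 3600
    projected_bytes = bytes_used / items_done * target_items
    return hours_left, projected_bytes


# Figures from the post: one hour in, 319,116 items and 5,005,741 bytes.
hours_left, projected = estimate_remaining(16_581_375, 319_116, 3600, 5_005_741)
print(f"about {hours_left:.2f} hours left, "
      f"about {projected / 2**20:.2f} MB when the cache is full")
```

Note that this reports the time still to go, which is an hour less than the total run time worked out above.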

Colourphon ontology for digital images

November 17th, 2008

I was recently talking about my experiences with OWL over on ITO, and so have decided to go ahead and put my OWL ontology out there, and invite comment.  I am quite open to criticism, as I am sure that I will need to revise this, but here it is.

The aim of this ontology is to clarify some of the classes of owl:Thing that concern us here at Colourphon.

We are describing the colours found within digital images. Note: I use the term ‘digital image’ as opposed to just ‘image’ as we are specifically dealing with data gleaned from digital representations of images, and not images in any other format.

Currently the ontology deals with DigitalImage, the image itself; Pixel, a point within the image; Coordinate, the location of a pixel; RGBValue, the RGB value of the pixel; ColourName, the word or phrase used to describe the colour; and GuessedColourName, a word or phrase used to describe the colour (but that word or phrase may not be accurate).
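For readers who think in code rather than OWL, here is a rough sketch of those six classes as Python dataclasses. It is only an illustration, not the ontology itself: the field names and the way the classes link to each other are assumptions, since the post lists the classes but not their properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(frozen=True)
class Coordinate:
    """The location of a pixel (field names are assumed)."""
    x: int
    y: int


@dataclass(frozen=True)
class RGBValue:
    """The RGB value of a pixel, one integer per channel."""
    red: int
    green: int
    blue: int


@dataclass(frozen=True)
class ColourName:
    """The word or phrase used to describe a colour."""
    label: str


@dataclass(frozen=True)
class GuessedColourName:
    """A word or phrase for a colour that may not be accurate."""
    label: str


@dataclass
class Pixel:
    """A point within the image; the links below are assumptions."""
    coordinate: Coordinate
    rgb: RGBValue
    colour_name: Optional[Union[ColourName, GuessedColourName]] = None


@dataclass
class DigitalImage:
    """The image itself, modelled here as a list of pixels."""
    pixels: List[Pixel] = field(default_factory=list)


pixel = Pixel(Coordinate(0, 0), RGBValue(0, 0, 255), GuessedColourName("blue"))
image = DigitalImage(pixels=[pixel])
```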

BTW, this ontology is modelled using Protégé 4, if you want to see it as I see it…

If you are aware of any work already in this area, then please do let us know.

memcached

November 7th, 2008

The Wolverhampton Hell’s Angels are shaking our windows and foundations with their stomach-churning firework display, which I would be watching now rather than typing this, but for the trees at the back of the garden that are in the way!

This post comes in two parts, the good news and the bad news.

First the good news. At lunchtime, Rich and I had a chat while doing our lunchtime circuit of the business park, and we were talking about ways to make Colourphon quicker.  We know we can do the analysis which takes a bunch of coloured pixels and puts a human friendly name to the most frequently occurring in the image.  But the problem was that this was taking upwards of 30 seconds, so PHP, quite rightly, kept throwing back a maximum execution timeout error.

Rich has been thinking a lot about application architecture, and in particular memcached.

So tonight, on my Ubuntu development machine, I installed memcached:

sudo apt-get install memcached

I installed a pecl extension for PHP.

sudo pecl install memcache

Then I added a function to instantiate a Memcache object with an array of servers, finding a neat way to get an array stored as a constant: the constant holds a string of PHP code (generated by var_export) that returns the array when evaluated.

$arr=array(1,2,3);

define("ARRAY_CONSTANT","return ".var_export($arr,1).";");

Then when we want to use the constant we simply evaluate it.

foreach (eval(ARRAY_CONSTANT) as $val) {
    // do stuff with $val
}

The upshot is:
1st page load: 29.94 seconds - including several hundred calls to the class with most processing overhead.
2nd page load: 1.61 seconds - with not a single call to the class with most processing overhead!!

So if you need a massive performance boost, use memcache - originally designed for database calls, but if you want to cache a bunch of frequently used data - even objects - simply serialize and deserialize when you need it.
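The serialize-then-cache pattern looks roughly like this. It is a sketch in Python, not the 12 lines of PHP from the real site: FakeCache is an in-memory stand-in for a memcached client, and name_colour is a made-up placeholder for the expensive class.

```python
import pickle


class FakeCache:
    """In-memory stand-in for a memcached client (assumed get/set only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, like most memcached clients.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


calls = {"count": 0}  # tracks how often the expensive code actually runs


def name_colour(rgb):
    """Hypothetical expensive lookup that puts a name to a colour."""
    calls["count"] += 1
    red, green, blue = rgb
    name = "blue-ish" if blue > max(red, green) else "other"
    return {"rgb": rgb, "name": name}


def cached_name_colour(cache, rgb):
    """Check the cache first; only do the expensive work on a miss."""
    key = "colour:%02x%02x%02x" % rgb
    blob = cache.get(key)
    if blob is not None:
        return pickle.loads(blob)          # deserialize the cached object
    result = name_colour(rgb)
    cache.set(key, pickle.dumps(result))   # serialize before storing
    return result


cache = FakeCache()
first = cached_name_colour(cache, (10, 20, 200))   # miss: does the work
second = cached_name_colour(cache, (10, 20, 200))  # hit: served from cache
```

The second call never touches name_colour, which is the same effect as the second page load above.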

It has taken longer to write and proofread this post than it did to add the 12 lines of code to get it to work.

The bad news?  This isn’t live on a public machine yet :).

LibraryThing gives away covers!

August 8th, 2008

Interesting news from LibraryThing:

LibraryThing: A million free covers from LibraryThing
A few days ago, just before hitting thirty million books, we hit one million user-uploaded covers. So, we’ve decided to give them away—to libraries, to bookstores, to everyone.

Excellent stuff. We’ll have to see if we can get a key :)


We’ve been spotted :)

March 13th, 2008

Dave Pattern up in Huddersfield has reminded me that he has already done some work on identifying colours in book jackets and then retrieving them to good effect.  Now I have been prompted, I vaguely remember having looked at Dave’s work.  Yet when Richard suggested we could do something interesting with book jackets… nope… blank… Thanks Dave for reminding me!  I see also that Ed Vielmetti has spotted us at work…  Ed, a bit of the back story can be found on the Colourphon Wiki ;) .

Of course this little project is in very early days - we are not even storing anything yet!! - so come back later and see how we are getting on.  And of course, if you have any ideas or things that you think might be rather cool applications - let us know, either here, or on the aforementioned Wiki.

Exif captured from jpg images

March 8th, 2008

Tonight I added support for Exif data capture from the image (if it is a JPEG). We will need to map this to something useful, but I have already found a schema and a potential description vocabulary.

Try this example.

Weighted colour matches

March 6th, 2008

We were figuring that it might be useful to have a colour match that was in some way relevancy-ranked, so I have been working on ways to achieve this.

What we have now is a result set that is sorted according to the position of the frequent colours in the image.  We also know where each cell is in the image, and can calculate - at its simplest - a sort order based on centre weighting.

Need an example?  These examples will take a moment or two to calculate…

Try this one: Test number one.
Or this one: Test number two.

We divide the image into an odd number of cells, ensuring that there is one in the middle. We then scan each cell and analyse the colour content.  Then we give you the results.
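Here is a small sketch of that cell-by-cell scan with centre weighting, in Python. The details are assumptions for illustration: the image is a plain 2D list of colour values, each cell votes for its most frequent colour, and a vote is worth 1 / (1 + distance from the middle cell). The weighting Colourphon really uses is not spelled out here.

```python
from collections import Counter


def centre_weighted_colours(pixels, grid=3):
    """Rank colours so that those nearer the centre of the image score more.

    pixels is a list of rows, each row a list of hashable colour values.
    grid is the number of cells per side and must be odd.
    """
    if grid % 2 == 0:
        raise ValueError("grid must be odd so that one cell sits in the middle")
    height, width = len(pixels), len(pixels[0])
    mid = grid // 2
    scores = Counter()
    for row in range(grid):
        for col in range(grid):
            # Pixel bounds of this cell.
            top, bottom = row * height // grid, (row + 1) * height // grid
            left, right = col * width // grid, (col + 1) * width // grid
            counts = Counter(
                pixels[y][x]
                for y in range(top, bottom)
                for x in range(left, right)
            )
            if not counts:
                continue  # image smaller than the grid: empty cell
            colour, _ = counts.most_common(1)[0]
            # Number of "rings" between this cell and the middle cell.
            distance = max(abs(row - mid), abs(col - mid))
            scores[colour] += 1.0 / (1 + distance)
    return [colour for colour, _ in scores.most_common()]


tiny_image = [
    ["green", "blue", "blue"],
    ["blue", "red", "blue"],
    ["blue", "blue", "blue"],
]
ranking = centre_weighted_colours(tiny_image)
```

In the tiny example red and green each fill exactly one cell, but red sits in the middle cell and so ranks above green.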

Simple yet strangely satisfying, and not only that, but you can analyse an image from any source, be it File, URL or book jacket retrieved by ISBN search courtesy of the Talis Platform.

Progress

February 26th, 2008

After some early prototypes, proofs of concept and an all-round “learning” experience, Tim and I have overhauled all that we have written and kept all functionality. My Uni days are now long gone, with every bit of 5 minute code at prototype easily taking 15 minutes to perfect… :)

Anyway, Tim has worked hard and put in some groovy functions, which in all means what we originally set out to do is *almost* complete - with one major flaw - we’re not *yet* storing anything.

This was our original problem, mulled over lunch whilst walking around the Business Park …

So, someone walks into a library and says … ‘I saw this book last week, in the history section and it was blue … do you know where it is now?’

The librarian replies ‘Do you know the Author, or the Title?’

‘No’ said the *now* frustrated customer…

Ah - but if we could somehow harvest the colours of a book, store them and allow users to search against them …

And this is how we were born!

Submit an ISBN

February 16th, 2008

Wouldn’t it be great if you could submit an ISBN, and have Colourphon go away and find a jacket image for you?

Well now you can!

Try this out now!  And if you can’t find an ISBN to test, try this one: 0764555871

Cracked it!

February 15th, 2008

Yeay! Fixed a rather niggly bug.

Displayed colour counts were inaccurate, although the displayed colours were correctly identified as being the most prevalent in the image. This was a frustrating bug, as it meant that the ‘percentage of image of particular colour’ could not be reliably calculated. I mean, I am looking at 36,000-odd pixels, and the top colour is reported to have occurred 12 times. Hmmm.

The bug was mainly caused by an incorrect comparison, which assumed that all colours in the incoming array were repeated sequentially rather than randomly. All I had to do was check for existence of the key, rather than comparing the current value to the previous value. Voila.
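To make the difference concrete, here is a guess at what the two approaches look like, sketched in Python. Both functions are reconstructions for illustration, not the real Colourphon code: the first compares each colour with the previous one, the second checks whether the key already exists.

```python
def count_sequential(colours):
    """Buggy version: only counts runs of identical neighbouring colours."""
    counts = {}
    previous = None
    for colour in colours:
        if colour == previous:
            counts[colour] += 1
        else:
            # Resets the tally every time a colour reappears after a gap.
            counts[colour] = 1
        previous = colour
    return counts


def count_by_key(colours):
    """Fixed version: check for existence of the key, whatever the order."""
    counts = {}
    for colour in colours:
        if colour in counts:
            counts[colour] += 1
        else:
            counts[colour] = 1
    return counts


sample = ["red", "blue", "red", "red", "blue", "red"]
buggy = count_sequential(sample)
fixed = count_by_key(sample)
```

With colours arriving in a mixed order like this, the buggy count ends up with whatever the last run of each colour happened to be, which is why a colour covering much of an image could be reported as occurring only a handful of times.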

Go on, try it…