Garo Garabedyan- a private blog

Everything here is written by me and belongs to me unless otherwise noted.

Archive for January 2008

Google’s proposal to truncate words in order to save memory

with one comment

In a video uploaded to YouTube (“Google Developers Day US – Theorizing from Data”, http://youtube.com/watch?v=nU8DcBF-qo4), Peter Norvig from Google presents, in one part of his talk (31:17-33:00), results from tests which aim to find the shortest length to which a word can be cut while still keeping it unique and not mixing it up with other words. The need to cut words comes from the need to save memory, to ignore the lexical form and to keep only the semantics (as much as possible). Many times when you search with Google you see bolded words like “robots” or “robotical” although you have typed “robot” in the search query. This is useful when we want to decide whether two strings of characters are one and the same word in different forms, or completely different things.

Google’s Director of Research proposes cutting words down to 4 letters.
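To make the idea concrete, here is a minimal sketch of such a truncation, assuming a plain prefix cut after lower-casing. The function names `truncate` and `group_by_prefix` and the word list are my own illustration and are not taken from the talk.

```python
from collections import defaultdict


def truncate(word: str, length: int = 4) -> str:
    """Keep only the first `length` letters of a lower-cased word."""
    return word.lower()[:length]


def group_by_prefix(words, length: int = 4):
    """Map each truncated key to the set of distinct words that share it."""
    groups = defaultdict(set)
    for word in words:
        groups[truncate(word, length)].add(word.lower())
    return dict(groups)


groups = group_by_prefix(["robot", "robots", "robotical", "robin", "Rome"])
# "robot", "robots" and "robotical" all collapse to the key "robo",
# while "robin" and "Rome" keep keys of their own.
```

A key shared by several words is where memory is saved, but it is also where different words can be confused with each other, so the chosen length is a trade-off.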

Here I want to present to your attention a passage written by people from Cambridge. Read it.

THE PAOMNNEHAL PWEOR OF THE HMUAN MNID

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

Amzanig huh?

Let’s apply the lesson of this example to what we do not know about the natural genesis of words and the form in which the mind memorizes them. Psychologists say that we remember the beginning, the end and the rest of the content, but without the letter order. If this is the way the human mind works, applying it in machines which operate on natural human words should not cause problems. Nevertheless, this has to hold true not only for English texts, but for any language which uses an array of characters to represent words.

I think that in order to understand the nature of a phenomenon you should look analytically back at its history and find the circumstances that created (caused) it, its behavior and its characteristics. Maybe it is wiser to apply the above algorithm to the first and the last sounds instead of the first and the last letters, because language was born spoken and only later received its written representation.

We have to trust that by reordering all the letters except the first and the last we will not lose the identity (uniqueness) of the word and will not fool ourselves that it is equal to another, different word.

Let’s apply this Cambridge approach as a solution for saving memory when manipulating large amounts of text. I think the application will look like this:

Aoccdrnig // first, keep the first and the last letters

A ccdinor g // then put the remaining letters in alphabetical order

A c2dinor g // code repeated letters by keeping only one of the group and placing next to it the count of the identical letters

Maybe the resulting array of characters, which can even contain some numbers, will be hard for a human to read, but if the Cambridge study is right about how people memorize words in English, the above transformations will keep the words unique. (This can be used by a spell checker which generates suggestions with this algorithm, provided the word is written with the proper first and last letter.)
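The three steps above can be sketched in a few lines of Python. This is only my reading of the steps; the function name `encode_word` and the choice to leave words of three letters or fewer untouched are assumptions.

```python
from itertools import groupby


def encode_word(word: str) -> str:
    """Encode a word as: first letter, sorted and counted middle, last letter."""
    if len(word) <= 3:
        # At most one middle letter, so there is nothing to reorder.
        return word
    first, middle, last = word[0], word[1:-1], word[-1]
    parts = []
    for letter, run in groupby(sorted(middle)):
        count = len(list(run))
        # A repeated letter is written once, followed by its count.
        parts.append(letter if count == 1 else f"{letter}{count}")
    return f"{first} {''.join(parts)} {last}"


scrambled = encode_word("Aoccdrnig")
correct = encode_word("According")
# Both spellings give the same code, "A c2dinor g".
```

The sketch also shows the limit of the approach: words which are anagrams of each other with the same first and last letter, like “calm” and “clam”, get the same code, so uniqueness is not guaranteed for every pair of words.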

To apply these ideas to international words, such as names of geographical places, I offer one observation of mine which I find useful: international words keep at least their first two groups of consonants in every language they are translated to.

Google and Peter Norvig are completely right about data.
I share Google’s view that by collecting input data we, as algorithm architects, understand the problem better than when we have no data. It is a very deep insight that collecting data can help you when you are making decisions.

But

“Mathematics was founded on the analogy between two apples and two pears, not by collecting two tons of apples only. In this way Mathematics teaches us that data is not the only way to model a problem and build a solution which deals with every case of the problem reflected in the input data. Statistics is not the only general algorithm by which people make solutions. We all know that there is data which can’t be collected, and this uncollected data is relevant (important) to the final results. In these cases the brain works by analogy: it finds problems which are close in some aspect (crucial for the concrete problem) and which have solutions, or even better, solutions whose semantics are proved (understood). In this way it finds solutions which are equivalent in the concrete situation and can be applied to the raised problem. Data contains unsorted information about many problem-solving possibilities; it is important to understand what the data says (analogy) and to collect it (statistics).” [part of AI-research page]

We have to recognize Google’s deep understanding of statistical problems and of how and where to apply statistical methods. They are very good at this.

Written by garabedyan

January 27, 2008 at 19:52

Tragedy and how to fight it (about the film Titanic 1997)

without comments

I was watching the film Titanic (the film from 1997) a couple of weeks ago, and what actually made me angry was that I couldn’t do anything. Just sitting and watching how the tragedy becomes reality on the screen. I really hate this. It makes me feel so weak, as if I were in such a bad situation that I just gave up, which I don’t want to do.

I think that in order to get out of a tragedy you should not give up. Something becomes a tragedy just at the moment when you stop fighting against it.

I think that Jack and Rose didn’t give up. And I find this to be the moral of the film.

As for the rest of the passengers as the film shows them, I hate that they gave up. I can’t judge them, but I still can’t believe that God’s creature, man, has to give up.

I hope that God’s hand will always be on me and keep me from feeling fear. I believe that everything else is possible with God’s permission.

As the story of the Titanic goes, the ship was called “unsinkable”; watching the film you will hear many times “Even God can’t sink this ship”.

Even the whole of science, which is based on the observation of many phenomena, is too weak to say why something is the way it is. Science is based on hypotheses only and nothing else. Everything is treated as true as long as there is no evidence that it is false, but only evidence that it is true.

Things are not understandable by man, because knowledge, from the human point of view, is infinite and can’t be resolved with statistical methods.

Written by garabedyan

January 27, 2008 at 18:11

Posted in Uncategorized

HTTPS and whether it is used properly

without comments

HTTPS ensures reasonable protection from eavesdroppers and man-in-the-middle attacks. Data sent to the server over secure HTTP will be protected to some degree. But is all the data that is sent protected as it has to be in order to ensure security?

Gmail and Yahoo! Mail, for example, establish an HTTPS connection with every visitor, no matter whether the visitor is logged in, and in this way secure the login form and the user name and password which are sent through the form.

Do all sites do it this way? I think most of them secure the connection (if they secure it at all) only after you log in.

Be careful not to type a user name and password on non-secure web pages. If you see a site which asks you to enter identification data over a non-secure connection, immediately inform the administrators of the application.
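The check described above can be written down as a small sketch. It assumes that a login form is safe only when both the page that shows it and the address it submits to use HTTPS; the function name `login_form_is_secure` and the example addresses are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def login_form_is_secure(page_url: str, form_action: str) -> bool:
    """Return True only if both the page and the form target use HTTPS."""
    # A relative action is resolved against the page that hosts the form.
    target = urljoin(page_url, form_action)
    return (urlparse(page_url).scheme == "https"
            and urlparse(target).scheme == "https")


secure = login_form_is_secure("https://mail.example.com/login", "/auth")
late = login_form_is_secure("http://www.example.com/login",
                            "https://www.example.com/auth")
# `late` is False: the form itself arrived over plain HTTP, so it could have
# been changed on the way even though it submits to an HTTPS address.
```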

Written by garabedyan

January 26, 2008 at 23:19

Posted in Uncategorized

The IP address is personal data

without comments

There was a debate about treating the IP address as personal information.

I share the opinion that the IP address is personal data.

We all know that there exist networks which have one gateway, and all computers in the network use the IP address of the gateway to identify themselves on the WWW. But the fact that an IP address is used by a group of people doesn’t mean that it (the IP address) stops being personal data. Even if in these cases the IP address doesn’t contain any kind of personal information, I don’t see a problem in treating it as personal information.

If we take the name as an example of personal information, these three strings of characters aren’t unique; you need to know more about somebody in order to find him. Let’s make an analogy with the IP address: there can be many machines which use one IP address, and some conditions have to be met in order to find the exact person behind the machine. As the name remains private data, the IP address has to be private data, too.
You can’t be sure that either an IP address or a name is a unique reference to somebody.

I know that the IP address itself carries information about the country, the city and the ISP of the machine which has this particular IP. If you want to compile statistics or present special content to people from different geographical locations, there is no problem in writing down these three pieces of information and erasing the IP address.

Storing the IP address for a long period of time can allow you, if some technical circumstances are met, to collect information about the user which the law recognizes as private.

Why don’t you get the information that you need from the IP address and then delete the address?
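As a sketch of what “get what you need and delete the address” could look like in a logging step: the lookup table `GEO_TABLE` below is invented for illustration (the networks are reserved documentation ranges and the locations are made up), and a real system would use some geolocation database in its place.

```python
import ipaddress

# Hypothetical lookup table: documentation-only networks, invented locations.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): ("Bulgaria", "Sofia", "Example ISP"),
    ipaddress.ip_network("198.51.100.0/24"): ("Germany", "Berlin", "Sample Net"),
}


def lookup(ip: str):
    """Return (country, city, isp) for an address, or 'unknown' values."""
    addr = ipaddress.ip_address(ip)
    for network, info in GEO_TABLE.items():
        if addr in network:
            return info
    return ("unknown", "unknown", "unknown")


def anonymize(record: dict) -> dict:
    """Replace the 'ip' field of a log record with country, city and ISP."""
    country, city, isp = lookup(record["ip"])
    cleaned = {key: value for key, value in record.items() if key != "ip"}
    cleaned.update(country=country, city=city, isp=isp)
    return cleaned


stored = anonymize({"ip": "192.0.2.17", "page": "/index.html"})
# `stored` keeps the page and the three derived fields, but no IP address.
```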

Written by garabedyan

January 26, 2008 at 12:09

Posted in Uncategorized

Web 2.0 Looking Like an Operating System, Some Things that Ought to be Done

without comments

YouTube video name truncation (Case Study on YouTube’s Interface)

One of the things I like in graphical operating systems is that they try to understand what the user needs. If we look at this video list (on the left), I really can’t understand what the videos are about. For Web 2.0 to look more powerful and become a leading competitor of operating systems, it needs to present the user with as much information as needed in an easy way. Imagine you mouse over the video link and see its whole name, not the shortened version anymore. Wouldn’t that be a great interaction between web interface and user?


Scroll Bar: You can see on the left a custom-designed scroll bar for viewing video lists (this control can be used for presenting any big list).

While you explore a list of videos, the page offers you a “See all ** videos” link which opens a new page containing all the videos from the viewed category, author or whatever.

The idea of this user interface control is to let the user get as many items as he wants from the huge number of items that exist on the server (or wherever). Of course, at some moment you will get interested in the videos, and viewing them this way in a small list will not satisfy you, so it is good to have below the list a link which takes you to a page containing the full video list (as YouTube does now), of course not from the beginning but from the place where you stopped using the list box.

The control doesn’t give the user the ability to undo an addition or to remove the last items from the loaded list.

Horizontal Scroll: While watching a very long video (e.g. 52 minutes long) you will realize that the scroll bar of the video becomes practically unusable. This custom scroll bar lets you set a step which helps you navigate around.

These two scroll bars can be combined into one universal scroll bar.

The controls can be implemented with JavaScript and Ajax.

It would be cool to be able to comment on a concrete scene from a movie, not on the whole one, and in this way to search within long movies.

I believe that the trend of Web 2.0 has to be letting users keep everything about one topic on one page. There is no need for many tabs to do just one thing. People want to place everything about a piece of their work on one page. From the web developers’ perspective this will help them understand a lot about the user, how he is using their page and for what. Just help him and/or give him an ad.

Written by garabedyan

January 24, 2008 at 21:15

Posted in Uncategorized

“Design patterns” or just Implementation Patterns

without comments

I have in mind the Gang of Four’s best-seller, Design Patterns.

Design patterns are a big achievement in object-oriented programming. They opened a new era of finding, describing and collecting recipes for composing objects. But are they really helpful in object design, or are they only part of the implementation process, giving developers a level of abstraction that allows fast and easy recomposition and reuse of the business logic of the system?

The GoF book, Design Patterns, presents examples of the implementation of each listed pattern in the C++ programming language. Now a lot of people are sure that any concrete pattern has to be written in the programming language in which it is going to be used. I can’t agree that object design has to deal with problems from the abstraction level of programming languages and implementation.

Design patterns, in my view, are one level above drawing objects and naming attributes, but both UML techniques and design patterns are the final product of object design. They do not themselves contain design, but only the implementation of the technological needs of the roles, responsibilities and collaborations between already designed objects.

“Design in its nature is a human act of recognizing and well translating the important (needed for the purpose of the system) parts of one or more phenomena from the phenomena’s area of existence to an artificial system-world.” Written on the About me page.

I think we all know that the technique of translation examined in the paragraph above can be upgraded, and we see this upgrade in the history of object design, which in the past was only design, when object-oriented programming had not yet been invented.

Written by garabedyan

January 23, 2008 at 16:34

Posted in Uncategorized

Off-road vehicle not 4×4, but 6×6

without comments

I like off-road cars. I think that they underline their owners’ serious characters and how they succeed in everything they engage in. I want to share with you one of my dreams about an off-road car model.
We all know that cars have 4 tires, but I propose adding an additional pair of tires at the back and reconfiguring the vehicle from 4×4 to 6×6.

I think that if we take as an example the Toyota Sequoia or the Hyundai Santa Fe, it will look extremely powerful and full of strength when it has not 4 but 6 wheels, each of them powered by the engine.

I hope that this additional pair of tires will not increase the length of the car by more than 0.5 meters.

I was thinking about the friction of the two pairs of rear tires while the car is taking a turn; I think that the tires are not wide enough to cause serious wear.

Just as Michelangelo composes different volumes of physical objects into one final homogeneous work in order to convey power and motion, I think making the rear of the car a little higher than the front will make a strong impression (of security and control) in the eyes of spectators.

I found that this idea is already implemented in the Hummer H6.

http://www.automotoportal.com/article/freelander-2-hst-tops-the-range
The Land Rover Freelander 2 HST has nice wings. While the vehicle is taking a turn, these wings make it look like an extremely big machine; because of their bold design, they make the big tire look like a small thing compared with the whole car.

Written by garabedyan

January 23, 2008 at 16:03

Posted in Uncategorized

View page redirection history and cookie setting times

without comments

I want to express two ideas about web browser tools which are usable by professional web developers, but also by any web surfer.

For the user’s eye:

  1. Viewing all page redirections (the history of the page) and manipulating them as strings: when a string is changed and Enter is pressed, the changed address is loaded in the current tab/window.
  2. A cookie session viewer presenting which web site sets which cookie(s) and for how long they have known what you are doing. When a web page contains an image from another domain (different from that of the page), let the user choose whether or not to send the cookies (which are set by the domain hosting the image). This will prevent the two domains from finding out whether the user visiting them at the same time is one person or not.

Why browsers (by default) send/receive cookies to/from third-party web resources: Same-Origin Policy and Third-party cookies.

1)

Web pages often redirect visitors. Mostly this is not bad, but when they sell web traffic it compromises the web surfing experience. I haven’t yet seen any web browser which presents all the page redirections.

When the address of the page comes from a link on another page, capture the href value as the first redirection element, in order to capture changes of the address made by JavaScript code in the page hosting the link.

When a form is submitted from a page (loaded in the same browser), set the action attribute of the form DOM element as the first redirection element.
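A rough sketch of how such a redirection history could be collected is given here. It follows only HTTP redirects, and the `fetch` argument is a hypothetical callable which returns the status code and the Location header for an address; the small `FAKE_WEB` table stands in for real requests so that the example can run without a network.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_chain(start_url, fetch, max_hops=10):
    """Return every address visited, from the first link to the final page.

    `fetch` is a hypothetical callable: it takes a URL and returns
    (status_code, location_header_or_None).
    """
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        chain.append(url)
    return chain


# Simulated responses standing in for real HTTP requests.
FAKE_WEB = {
    "http://a.example/link": (302, "http://ads.example/track?to=b"),
    "http://ads.example/track?to=b": (301, "/landing"),
    "http://ads.example/landing": (200, None),
}

chain = redirect_chain("http://a.example/link", lambda url: FAKE_WEB[url])
# `chain` lists the clicked href first, then each intermediate address.
```

Redirections made by JavaScript or by meta refresh tags are not covered by this sketch; a browser tool would have to record those separately.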

2)

It is important to have this tool because cookies in general remain stored after the page which sets them is unloaded. I find it good to inform the user about the information that web sites can collect about him over the lifetime of any particular cookie and of the whole group of cookies set by a particular domain (every time the user visits a web page the server receives all the cookies through HTTP and is free to set new ones; the web page can change the values and names of the cookies every time you connect to it, but this will not reduce its ability to capture as much information as when the cookie names and values did not change).

Web cookies, in my opinion, should not store information about you, but only about your habits. Following this way of thinking, a cookie should not contain any long string of code which is unique for every visitor. The need to authenticate the visitor is fulfilled by hidden fields in forms and by hrefs (links). This technique is multi-platform, and the unique codes are lost when you close the pages containing them, that is, when you lose the ability to access a page through a link or to submit a form which contains a hidden field with session information.

Nevertheless, from the web development point of view it is not good practice to store session information only in a cookie name/value pair. After the boom of Ajax techniques, web developers face a new problem called Cross-site Request Forgery. It exploits the fact that the Same-Origin Policy does not stop a page from submitting forms/links to another site with the cookies of a user who doesn’t know anything about it. If the user is logged in at that moment, an attacker who knows the forms and links (the interface of the web application) can do a lot of things with the user’s account.
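The core of such a cookie viewer can be sketched with Python’s standard `http.cookies` module. The sketch assumes that the tool sees each Set-Cookie header together with the host which sent it, and it reads only the Max-Age attribute; the function name `cookie_lifetimes` and the example hosts are my own.

```python
from collections import defaultdict
from http.cookies import SimpleCookie


def cookie_lifetimes(set_cookie_headers):
    """Group cookies by domain and report each cookie's Max-Age in seconds.

    `set_cookie_headers` is a list of (responding_host, header_value) pairs.
    Cookies without Max-Age are reported as None (session cookies).
    """
    report = defaultdict(dict)
    for host, header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            # Fall back to the responding host when no Domain is given.
            domain = morsel["domain"] or host
            max_age = morsel["max-age"]
            report[domain][name] = int(max_age) if max_age else None
    return dict(report)


report = cookie_lifetimes([
    ("shop.example", "sid=abc123; Max-Age=3600; Path=/"),
    ("ads.example", "track=xyz; Max-Age=31536000; Domain=ads.example"),
    ("shop.example", "theme=dark"),
])
# The third-party host ads.example asks to be remembered for a whole year.
```

Cookies which use the Expires attribute in place of Max-Age would need an extra date calculation, which is left out here.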
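One common defence combines the two ideas above: the session stays in a cookie, but every form also carries an unpredictable value in a hidden field, which a foreign page cannot read. The sketch below is a minimal illustration; the field name `csrf_token` and the function names are assumptions, not part of any particular framework.

```python
import hmac
import secrets


def new_csrf_token() -> str:
    """Create an unpredictable token to store in the server-side session."""
    return secrets.token_hex(16)


def hidden_field(token: str) -> str:
    """Render the token as a hidden form field (the field name is an assumption)."""
    return f'<input type="hidden" name="csrf_token" value="{token}">'


def is_request_allowed(session_token: str, submitted_token) -> bool:
    """Accept the request only if the form carried the session's token."""
    if not submitted_token:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(session_token, submitted_token)


token = new_csrf_token()
form_html = hidden_field(token)
own_form = is_request_allowed(token, token)         # sent from our own page
forged = is_request_allowed(token, None)            # forged request: cookie only
guessed = is_request_allowed(token, "0" * 32)       # attacker guesses a value
```

A forged request still carries the user’s cookies, but it cannot include the right hidden value, so the server can refuse it.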

Written by garabedyan

January 20, 2008 at 12:19

OpenOffice documents storing pictures by their web addresses make it possible to learn when and by whom an OpenOffice file is viewed

without comments

I copied some text and pictures from Mozilla and pasted them into OpenOffice Writer. When I opened the document again I saw that it loaded very slowly and that there was some internet transfer happening. I know that ODF files are archived XML files with additional folders containing the files which are referenced in the XML(s). I thought that when I copied the content from the web browser, OpenOffice would copy not the link to the image but the whole picture file, and would place it in the folder for additional parts of the document. But I was wrong.

In this case the pictures were not copied into the OpenDocument file; only their web addresses were saved, and when you open such an OpenOffice document the computer downloads all the needed pictures from the internet.

If you are a web publisher, you can use ODF files(*) to record how many times someone downloaded your picture, and in this way estimate approximately how many times your document has been opened. The same approach is well known to e-mail users who want to know how many times and when a message they sent was opened; imagine sending the message as an ODF file attachment. In the same way, you can trace the IPs of the viewers in order to try to tell apart the different persons opening your document. I am not sure that you can set cookies on the viewers’ machines; I think you can’t, but this has to be tested on different platforms to be sure that when the internet component connects to the picture host it ignores any cookie it receives and doesn’t send any.

I think that this is a problem not of the Open Document Format but of OpenOffice (I don’t know how things stand in MS Office, which can save and open ODF files), whose program policy is technologically free to choose how to process web pictures pasted into a document.

* No matter which product opens the ODF file, it (the product) can’t avoid downloading the picture in order to show it to the user.
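Since an ODF file is a ZIP archive, it is possible to check a document for such linked pictures before opening it. The sketch below assumes that the links sit in `draw:image` elements of `content.xml` as `xlink:href` attributes; the function name `external_images` is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace-qualified names as used in ODF content.xml.
DRAW_IMAGE = "{urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}image"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def external_images(odf_file):
    """List image references in content.xml that point to the web.

    `odf_file` is a path or a binary file-like object holding an ODF document.
    """
    with zipfile.ZipFile(odf_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    links = []
    for image in root.iter(DRAW_IMAGE):
        href = image.get(XLINK_HREF, "")
        # Embedded pictures use relative paths such as "Pictures/...".
        if href.startswith(("http://", "https://")):
            links.append(href)
    return links
```

Calling `external_images("letter.odt")` on a document (the file name is only an example) returns the web addresses which would be contacted when the document is opened; an empty list means that all pictures are stored inside the file.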

Written by garabedyan

January 19, 2008 at 22:03

Posted in Uncategorized

Garbage Collection in Java and Design Patterns

with 2 comments

Object deletion in Java is managed automatically by the Garbage Collector. With this in mind, it is crucial to know how to implement patterns (like the Observer) which have a communication line (realized by storing references) between objects.

Instead of the default object reference, I suggest using the reference classes in java.lang.ref.*

It is important to use non-default object references in order to instruct the Garbage Collector to delete the object when only weak references point to it (these references are meant to keep the object in some pattern, but not to keep it away from the Garbage Collector).
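The idea can be shown with a small Observer sketch. It is written in Python, whose `weakref` module plays a role similar to `java.lang.ref.WeakReference`; it is an analogue of the Java approach, not Java code, and the class names `Subject` and `Logger` are only illustrative.

```python
import gc
import weakref


class Subject:
    """Observer-pattern subject that does not keep its observers alive."""

    def __init__(self):
        # A WeakSet drops an entry once nothing else references the object.
        self._observers = weakref.WeakSet()

    def attach(self, observer):
        self._observers.add(observer)

    def notify(self, event):
        # Iterate over a copy so a collected observer cannot disturb the loop.
        for observer in list(self._observers):
            observer.update(event)

    def observer_count(self):
        return len(self._observers)


class Logger:
    def __init__(self):
        self.events = []

    def update(self, event):
        self.events.append(event)


subject = Subject()
logger = Logger()
subject.attach(logger)
subject.notify("saved")

del logger      # drop the only strong reference to the observer
gc.collect()    # ask the collector to run; the exact timing is not guaranteed
```

With an ordinary list of observers the subject itself would keep the logger alive, and it would go on receiving events until someone remembered to detach it. In Java the same effect needs each observer wrapped in a `WeakReference`, with a check for `null` from `get()` before every call.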

Written by garabedyan

January 8, 2008 at 22:05