Is it possible to protect your content? Is it still possible to identify copycats?

Kjell Gunnar Bleivik First version 02.15.2012

Last updated 02.15.2012 Status: Will be updated with additional information.

This article is a follow-up to the article above. We start with a quote (is it unique on the internet?) from one of my favourite books, Professor Bjørn Kirkerud (1989): "Object Oriented Programming With Simula", Addison-Wesley Publishing Company, ISBN 0 201 17574 6, chapter 10.10, with the heading: Can computers "learn" real games?

Whether or not the program in this chapter can be said to learn from experience does not seem to be a very interesting question, since its answer depends almost solely on how you choose to use words like "learn" and "experience". More interesting is the question of whether or not the ideas behind the program are generally applicable. Can the same ideas be utilized to "teach" a computer to play chess or to "learn" to perform other tasks?

Fortunately (some would say "alas") the ideas are not applicable to any but the simplest tasks. The main reason for this is that the number of possible situations that could arise is overwhelmingly large for most interesting games. For example, the number of possible situations in the game of chess is larger than the number of elementary particles in the known universe. That means that it is impossible - and will always remain impossible, no matter what size of computers may be constructed in the future - to declare arrays that are large enough to hold bead values for every possible situation in a game of chess. It also means that the number of training games necessary to produce acceptable values in a bead array is far too large to be played in the lifetime of any human being. Even the period between two "Big Bangs" of the universe would be too short to produce a chess-player that plays measurably better than completely randomly by the method described in this chapter. There do exist programs that enable computers to play acceptable chess, but they are based on very different ideas, and cannot be said to have learned to play chess.

Even if search engines don't like duplicate content, landing pages and gateway pages, are they able to find them? Are search engines able to identify duplicate content produced by article spinning (read the article above)? How many symbols do you need before the number of combinations becomes practically infinite? Two (0 and 1) is definitely enough. With a long enough bitstream, the number of possible strings can always be made "infinite" for practical purposes. A 64-bit word can represent 2 raised to the power of 64 distinct values, so its finest step on a fraction between 0 and 1 is 2 raised to the negative power of 64. Even with all the computing power in the universe, you would not be able to compute all the digits of pi. Pi is irrational (indeed transcendental), so its decimal expansion never ends or repeats.
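The counting argument can be made concrete with a minimal Python sketch. It assumes only that there are 2^n distinct bitstrings of length n, so every extra bit doubles the number of possible strings. The function name `count_bitstrings` is hypothetical and chosen for illustration. The last lines contrast the step size of a 64-bit fixed-point fraction with the machine epsilon of a standard 64-bit floating-point number.

```python
import sys

def count_bitstrings(length: int) -> int:
    """Hypothetical helper: number of distinct strings of 0s and 1s of a given length."""
    return 2 ** length

# Every extra bit doubles the number of possible strings.
for n in (8, 64, 256):
    print(n, count_bitstrings(n))

# A 64-bit word holds 2**64 distinct values, so a 64-bit fixed-point
# fraction in [0, 1) advances in steps of 2**-64.
print(2.0 ** -64)

# A standard 64-bit IEEE 754 double has a 52-bit fraction, so the gap
# between 1.0 and the next representable double is 2**-52.
print(sys.float_info.epsilon == 2.0 ** -52)
```

Python integers have arbitrary precision, so even `count_bitstrings(256)` is printed exactly rather than overflowing.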

Don't let services like Copyscape fool you. Of course, identical copies of your articles can be identified with this tool. Search engines can identify some kinds of duplicate content, but they cannot identify advanced automatic article generation and spinning. How many combinations are there of the letters in the English alphabet? I have read that Shakespeare had a vocabulary of about 100 000 words. Some years ago, the Global Language Monitor reported that the number of words in the English language had passed one million. These words can of course be combined in endless ways, and new words are created daily, not least words related to the digital age and the internet.
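The question of how many combinations exist can be put into numbers, assuming symbols are drawn with repetition and ignoring grammar and meaning entirely. Under those assumptions there are 26^n strings of n letters, and V^k ordered sequences of k words from a vocabulary of V words. The sketch below is a minimal illustration; both helper names are hypothetical.

```python
def count_letter_strings(length: int, alphabet_size: int = 26) -> int:
    """Hypothetical helper: strings of `length` letters, repetition allowed."""
    return alphabet_size ** length

def count_word_sequences(words: int, vocabulary: int) -> int:
    """Hypothetical helper: ordered sequences of `words` words, repetition allowed."""
    return vocabulary ** words

# Letters: 26 one-letter strings, 676 two-letter strings, 11 881 376 five-letter strings.
print(count_letter_strings(1), count_letter_strings(2), count_letter_strings(5))

# Words: a 10-word sequence over a 100 000-word vocabulary (the Shakespeare
# estimate above) has 10**50 possible orderings before grammar rules out most.
print(count_word_sequences(10, 100_000) == 10 ** 50)
```

These counts are upper bounds. Real sentences must obey grammar, which removes most of the sequences, but the remaining space is still far too large to enumerate.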

Infinity divided by 2 is still infinite, and so is the result divided by 4. By contrast, a string of 0s and 1s, starting with 1 and written in the character size used in this article all the way around our galaxy, the Milky Way, is absolutely finite. The number of sand grains in the Sahara can be counted, or, more precisely, approximated. It is absolutely finite, but for practical purposes it can be "regarded as infinite". So what if a content spinner runs a network of screen-scraping bots that gather information from the internet 24 hours a day, seven days a week, and merge the content more or less semantically? Is it possible to identify the copy? If you are a serious content writer who sells ad space on your sites, could you end up competing with what you have written yourself?
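The article does not say how copy-detection services work internally. The following is therefore only a sketch of one widely described technique, w-shingling with Jaccard similarity, and not any particular service's or search engine's method. It compares the sets of overlapping three-word sequences in two texts. An exact copy scores 1.0, while a thoroughly spun copy that shares no three-word run scores 0.0, which illustrates why spinning can defeat naive matching. The example sentences and helper names are made up for illustration.

```python
import re

def shingles(text: str, w: int = 3) -> set:
    """Return the set of w-word shingles (contiguous word tuples) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; returns 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

original = "Search engines can identify some sort of duplicate content"
exact_copy = "Search engines can identify some sort of duplicate content"
spun_copy = "Web crawlers are able to detect certain kinds of copied text"

print(jaccard(shingles(original), shingles(exact_copy)))  # 1.0
print(jaccard(shingles(original), shingles(spun_copy)))   # 0.0
```

The shingle size w = 3 is an arbitrary choice here. A smaller w catches more partial overlap but also more coincidental matches. Replacing words with synonyms, as spinners do, breaks the shared shingles even though the meaning is preserved.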

Related links