Edit filter log

Details for log entry 5947615

00:55, 13 December 2011: 156.56.172.235 (talk) triggered filter 135, performing the action "edit" on Network science. Actions taken: Tag; Filter description: Repeating characters

Changes made in edit

====Web link analysis====
Several [[Web search]] [[ranking]] algorithms use link-based centrality metrics, including (in order of appearance) [[Massimo Marchiori|Marchiori]]'s [[Hyper Search]], [[Google]]'s [[PageRank]], Kleinberg's [[HITS algorithm]], and the [[CheiRank]] and [[TrustRank]] algorithms. Link analysis is also conducted in information science and communication science in order to understand and extract information from the structure of collections of web pages. For example, the analysis might be of the interlinking between politicians' web sites or blogs.

====PageRank====
==Description==
[[File:PageRank-hi-res.png|thumb|right|Principles of PageRank]]

A PageRank results from a mathematical algorithm based on the [[webgraph]]: the [[Graph (mathematics)|graph]] created by all World Wide Web pages as nodes and [[hyperlink]]s as edges, taking into consideration authority hubs such as [[cnn.com]] or [[usa.gov]]. The rank value indicates the importance of a particular page. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined [[recursion|recursively]] and depends on the number and PageRank metric of all pages that link to it ("[[incoming link]]s"). A page that is linked to by many pages with high PageRank receives a high rank itself. If there are no links to a web page, there is no support for that page.

Numerous academic papers concerning PageRank have been published since Page and Brin's original paper.<ref name="originalpaper">{{cite doi|10.1016/S0169-7552(98)00110-X}}</ref> In practice, the PageRank concept has proven to be vulnerable to manipulation, and extensive research has been devoted to identifying falsely inflated PageRank and ways to ignore links from documents with falsely inflated PageRank.

Other link-based ranking algorithms for Web pages include the [[HITS algorithm]] invented by [[Jon Kleinberg]] (used by [[Teoma]] and now [[Ask.com]]), the IBM [[CLEVER project]], and the [[TrustRank]] algorithm.
==History==
PageRank was developed at [[Stanford University]] by [[Larry Page]] (hence the name ''Page''-Rank<ref>{{cite book | author = David Vise and Mark Malseed | year = 2005 | title = The Google Story | url = http://www.thegooglestory.com/ | isbn = 0-553-80457-X | page = 37}}</ref>) and [[Sergey Brin]] as part of a research project about a new kind of search engine.<ref>Page, Larry, [http://web.archive.org/web/20020506051802/www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0072?1 "PageRank: Bringing Order to the Web"], Stanford Digital Library Project, talk. August 18, 1997 (archived 2002)</ref> Sergey Brin had the idea that information on the web could be ordered in a hierarchy by "link popularity": a page ranks higher the more links there are to it.<ref name="gpower">[http://www.google-watch.org/gpower.pdf 187-page study from Graz University, Austria], which notes that human judgments are also used when determining the page rank in Google</ref> The first paper about the project, describing PageRank and the initial prototype of the [[Google search]] engine, was co-authored by [[Rajeev Motwani]] and [[Terry Winograd]] and published in 1998;<ref name="originalpaper" /> shortly after, Page and Brin founded [[Google Inc.]], the company behind the Google search engine. While just one of many factors that determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools.<ref name="googletechnology">{{cite web|url=http://www.google.com/technology/ |title=Google Technology |publisher=Google.com |accessdate=2011-05-27}}</ref>

PageRank has been influenced by [[citation analysis]], first developed by [[Eugene Garfield]] in the 1950s at the University of Pennsylvania, and by [[Hyper Search]], developed by [[Massimo Marchiori]] at the University of Padua. In the same year PageRank was introduced (1998), [[Jon Kleinberg]] published his influential work on [[HITS algorithm|HITS]]. Google's founders cite Garfield, Marchiori, and Kleinberg in their original paper.<ref name="originalpaper" />

A small search engine called "RankDex" from IDD Information Services, designed by [[Robin Li]], had been exploring a similar strategy for site-scoring and page ranking since 1996.<ref>{{cite journal |last=Li |first=Yanhong |date=August 6, 2002 |title=Toward a qualitative search engine |journal=Internet Computing, IEEE |volume=2 |issue=4 |pages=24–29 |publisher=IEEE Computer Society |doi=10.1109/4236.707687}}</ref> The technology in RankDex was patented by 1999<ref>USPTO, [http://www.google.com/patents?hl=en&lr=&vid=USPAT5920859&id=x04ZAAAAEBAJ&oi=fnd&dq=yanhong+li&printsec=abstract#v=onepage&q=yanhong%20li&f=false "Hypertext Document Retrieval System and Method"], U.S. Patent number: 5920859, Inventor: Yanhong Li, Filing date: Feb 5, 1997, Issue date: Jul 6, 1999</ref> and used later when Li founded [[Baidu]] in China.<ref>Greenberg, Andy, [http://www.forbes.com/forbes/2009/1005/technology-baidu-robin-li-man-whos-beating-google_2.html "The Man Who's Beating Google"], ''Forbes'' magazine, October 05, 2009</ref><ref>[http://www.rankdex.com/about.html "About: RankDex"], ''rankdex.com''</ref> Li's work would be referenced by some of Larry Page's U.S. patents for his Google search methods.<ref>Cf. especially Lawrence Page, U.S. patents 6,799,176 (2004) "Method for scoring documents in a linked database", 7,058,628 (2006) "Method for node ranking in a linked database", and 7,269,587 (2007) "Scoring documents in a linked database"</ref>
==Algorithm==
PageRank is a [[probability distribution]] used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes, called "iterations", through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.

A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a "50% chance" of something happening. Hence, a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank.

===Simplified algorithm===
Assume a small universe of four web pages: '''A''', '''B''', '''C''' and '''D'''. The initial approximation of PageRank would be evenly divided between these four documents. Hence, each document would begin with an estimated PageRank of 0.25.

In the original form of PageRank, initial values were simply 1, meaning that the sum of all PageRank values equaled the total number of pages on the web at that time. Later versions of PageRank (see the formulas below) would assume a probability distribution between 0 and 1. Here a simple probability distribution will be used, hence the initial value of 0.25.

If pages '''B''', '''C''', and '''D''' each only link to '''A''', they would each confer 0.25 PageRank to '''A'''. All PageRank '''PR( )''' in this simplistic system would thus gather to '''A''' because all links would be pointing to '''A'''.

:<math>PR(A)= PR(B) + PR(C) + PR(D).\,</math>
This is 0.75.

Suppose that page '''B''' has a link to page '''C''' as well as to page '''A''', while page '''D''' has links to all three pages. The ''value of the link-votes is divided among all the outbound links on a page''. Thus, page '''B''' gives a vote worth 0.125 to page '''A''' and a vote worth 0.125 to page '''C'''. Only one third of '''D'''<nowiki>'</nowiki>s PageRank is counted for A's PageRank (approximately 0.083).

:<math>PR(A)= \frac{PR(B)}{2}+ \frac{PR(C)}{1}+ \frac{PR(D)}{3}.\,</math>

In other words, the PageRank conferred by an outbound link is equal to the document's own PageRank score divided by the normalized number of outbound links '''L( )''' (it is assumed that links to specific URLs only count once per document).

:<math>PR(A)= \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}. \,</math>

In the general case, the PageRank value for any page '''u''' can be expressed as:

:<math>PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}</math>,

i.e. the PageRank value for a page '''u''' is dependent on the PageRank values for each page '''v''' out of the set '''B<sub>u</sub>''' (this set contains all pages linking to page '''u'''), divided by the number ''L''(''v'') of links from page '''v'''.
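
As an illustration, here is a minimal Octave/MATLAB sketch of one application of this update rule to the four-page example above; the page ordering '''A''', '''B''', '''C''', '''D''' and the link counts are assumptions drawn from the running example, not a reference implementation:

<source lang="matlab">
% One application of PR(u) = sum over v in B_u of PR(v)/L(v) for the example:
% B links to A and C (L(B) = 2), C links only to A (L(C) = 1),
% D links to A, B and C (L(D) = 3). Pages are ordered [A B C D].
PR = [0.25; 0.25; 0.25; 0.25];          % initial uniform values
L  = [0; 2; 1; 3];                      % outbound link counts (A's is unused here)
PR_A = PR(2)/L(2) + PR(3)/L(3) + PR(4)/L(4)
% PR_A = 0.125 + 0.25 + 0.0833... = 0.4583...
</source>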

===Damping factor===
The PageRank theory holds that even an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor ''d''. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.<ref name="originalpaper" />

The damping factor is subtracted from 1 (and in some variations of the algorithm, the result is divided by the number of documents (''N'') in the collection), and this term is then added to the product of the damping factor and the sum of the incoming PageRank scores. That is,

: <math>PR(A) = {1 - d \over N} + d \left( \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}+\,\cdots \right).</math>

So any page's PageRank is derived in large part from the PageRanks of other pages. The damping factor adjusts the derived value downward. The original paper, however, gave the following formula, which has led to some confusion:

: <math>PR(A)= 1 - d + d \left( \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}+\,\cdots \right).</math>

The difference between them is that the PageRank values in the first formula sum to one, while in the second formula each PageRank gets multiplied by ''N'' and the sum becomes ''N''. A statement in Page and Brin's paper that "the sum of all PageRanks is one"<ref name="originalpaper" /> and claims by other Google employees<ref>[[Matt Cutts]]'s blog: [http://www.mattcutts.com/blog/seo-for-bloggers/ Straight from Google: What You Need to Know], see page 15 of his slides.</ref> support the first variant of the formula above.

To be more specific, in the latter formula the probability of the random surfer reaching a page is weighted by the total number of web pages. So, in this version PageRank is an expected value for the random surfer visiting a page, when he restarts this procedure as often as the web has pages. If the web had 100 pages and a page had a PageRank value of 2, the random surfer would reach that page on average twice if he restarted the procedure 100 times. Fundamentally, the two formulas do not differ from each other: a PageRank that has been calculated using the former formula has to be multiplied by the total number of web pages to obtain the corresponding PageRank that the latter formula would yield. Even Page and Brin mixed up the two formulas in their most popular paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", where they claim the latter formula forms a probability distribution over web pages with the sum of all pages' PageRanks being one.<ref name="originalpaper" />
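
The factor-of-''N'' relationship can be checked numerically. The following Octave/MATLAB sketch solves both variants in closed form (using the algebraic solution given in the [[#Computation|computation]] section below) for a hypothetical three-page web; the link matrix is an illustrative assumption:

<source lang="matlab">
% Check: the two damping-factor formulas differ only by a factor of N.
M = [0 0 1; 0.5 0 0; 0.5 1 0];   % hypothetical column-stochastic link matrix
N = 3; d = 0.85; I = eye(N);
R1 = (I - d*M) \ ((1-d)/N * ones(N,1));  % first variant: entries sum to 1
R2 = (I - d*M) \ ((1-d)   * ones(N,1));  % second variant: entries sum to N
max(abs(R2 - N*R1))                      % zero up to rounding error
</source>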

Google recalculates PageRank scores each time it crawls the Web and rebuilds its index. As Google increases the number of documents in its collection, the initial approximation of PageRank decreases for all documents.

The formula uses a model of a ''random surfer'' who gets bored after several clicks and switches to a random page. The PageRank value of a page reflects the chance that the random surfer will land on that page by clicking on a link. It can be understood as a [[Markov chain]] in which the states are pages, and the transitions are all equally probable and are the links between pages.

If a page has no links to other pages, it becomes a sink and therefore terminates the random surfing process. If the random surfer arrives at a sink page, it picks another [[Uniform Resource Locator|URL]] at random and continues surfing again.

When calculating PageRank, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are therefore divided evenly among all other pages. In other words, to be fair with pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability usually set to ''d'' = 0.85, estimated from the frequency that an average surfer uses his or her browser's bookmark feature.

So, the equation is as follows:

:<math>PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR (p_j)}{L(p_j)}</math>

where <math>p_1, p_2, ..., p_N</math> are the pages under consideration, <math>M(p_i)</math> is the set of pages that link to <math>p_i</math>, <math>L(p_j)</math> is the number of outbound links on page <math>p_j</math>, and ''N'' is the total number of pages.
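
As a sketch of this summation form, the following Octave/MATLAB fragment performs a fixed number of sweeps over a hypothetical three-page web; the link structure and the sweep count are illustrative assumptions:

<source lang="matlab">
% PR(p_i) = (1-d)/N + d * sum over p_j in M(p_i) of PR(p_j)/L(p_j).
% Hypothetical web: p1 -> {p2, p3}, p2 -> {p3}, p3 -> {p1}.
N = 3; d = 0.85;
inlinks = {[3], [1], [1 2]};   % inlinks{i} = set M(p_i) of pages linking to p_i
L = [2 1 1];                   % L(j) = number of outbound links of p_j
PR = ones(1, N) / N;           % uniform initial distribution
PR_new = zeros(1, N);
for sweep = 1:50               % fixed number of iterations, for brevity
  for i = 1:N
    PR_new(i) = (1 - d)/N + d * sum(PR(inlinks{i}) ./ L(inlinks{i}));
  end
  PR = PR_new;
end
PR                              % approximate PageRank values; sum(PR) is ~1
</source>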

The PageRank values are the entries of the dominant [[eigenvector]] of the modified [[adjacency matrix]]. This makes PageRank a particularly elegant metric: the eigenvector is

:<math>
\mathbf{R} =
\begin{bmatrix}
PR(p_1) \\
PR(p_2) \\
\vdots \\
PR(p_N)
\end{bmatrix}
</math>
where '''R''' is the solution of the equation
:<math>
\mathbf{R} =

\begin{bmatrix}
{(1-d)/ N} \\
{(1-d) / N} \\
\vdots \\
{(1-d) / N}
\end{bmatrix}

+ d

\begin{bmatrix}
\ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\
\ell(p_2,p_1) & \ddots & & \vdots \\
\vdots & & \ell(p_i,p_j) & \\
\ell(p_N,p_1) & \cdots & & \ell(p_N,p_N)
\end{bmatrix}

\mathbf{R}

</math>

where the adjacency function <math>\ell(p_i,p_j)</math> is 0 if page <math>p_j</math> does not link to <math>p_i</math>, and normalized such that, for each ''j''

:<math>\sum_{i = 1}^N \ell(p_i,p_j) = 1</math>,

i.e. the elements of each column sum up to 1, so the matrix is a [[stochastic matrix]] (for more details see the [[#Computation|computation]] section below). Thus this is a variant of the [[eigenvector centrality]] measure used commonly in [[network theory|network analysis]].
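
A short Octave/MATLAB sketch of this eigenvector characterization, for the same illustrative three-page link matrix used in the examples above (an assumption, not data from the text):

<source lang="matlab">
% PageRank as the dominant eigenvector of G = (1-d)/N * ones(N) + d * Mlink,
% where the columns of the link matrix Mlink each sum to 1.
Mlink = [0 0 1; 0.5 0 0; 0.5 1 0];
N = 3; d = 0.85;
G = (1 - d)/N * ones(N) + d * Mlink;
[V, D] = eig(G);
[~, k] = max(abs(diag(D)));        % index of the dominant eigenvalue (= 1)
R = V(:, k) / sum(V(:, k))         % normalize so the entries sum to 1
</source>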

Because of the large eigengap of the modified adjacency matrix above,<ref>{{cite journal | author = Taher Haveliwala and Sepandar Kamvar | year = 2003 | month = March | title = The Second Eigenvalue of the Google Matrix | journal = Stanford University Technical Report | url = http://www-cs-students.stanford.edu/~taherh/papers/secondeigenvalue.pdf | format = PDF | arxiv = math/0307056}}</ref> the values of the PageRank eigenvector are fast to approximate (only a few iterations are needed).

As a result of [[Markov process|Markov theory]], it can be shown that the PageRank of a page is the probability of being at that page after many clicks. This happens to equal <math>t^{-1}</math> where <math>t</math> is the [[expected value|expectation]] of the number of clicks (or random jumps) required to get from the page back to itself.
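
This relationship can be checked empirically with a small random-surfer simulation; the following Octave/MATLAB sketch (the link matrix and step count are illustrative assumptions) estimates visit frequencies, whose reciprocals approximate the expected return times:

<source lang="matlab">
% Simulate the random surfer on G; the visit frequency of each page
% approximates its PageRank, and its reciprocal the mean return time t.
Mlink = [0 0 1; 0.5 0 0; 0.5 1 0];
N = 3; d = 0.85;
G = (1 - d)/N * ones(N) + d * Mlink;   % column j = distribution of the next page
steps = 1e5; visits = zeros(N, 1); page = 1;
for t = 1:steps
  visits(page) = visits(page) + 1;
  page = find(rand() < cumsum(G(:, page)), 1);   % sample the next page
end
visits / steps            % empirical PageRank estimates
steps ./ visits           % empirical mean return times, ~ 1 ./ PageRank
</source>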

The main disadvantage of PageRank is that it favors older pages, because a new page, even a very good one, will not have many links unless it is part of an existing site (a site being a densely connected set of pages, such as [[Wikipedia]]). The Google Directory (itself a derivative of the [[Open Directory Project]]) allows users to see results sorted by PageRank within categories. The Google Directory is the only service offered by Google where PageRank directly determines display order.{{Citation needed|date=October 2009}} In Google's other search services (such as its primary Web search) PageRank is used to weight the relevance scores of pages shown in search results.

Several strategies have been proposed to accelerate the computation of PageRank.<ref>{{cite journal |citeseerx=10.1.1.118.5422 | title=Fast PageRank Computation via a Sparse Linear System | author= Gianna M. Del Corso, Antonio Gullí, Francesco Romani | journal=Internet Mathematics | year = 2005 | volume = 2 | issue = 3}}</ref>

Various strategies to manipulate PageRank have been employed in concerted efforts to improve search results rankings and monetize advertising links. These strategies have severely impacted the reliability of the PageRank concept, which seeks to determine which documents are actually highly valued by the Web community.

Google is known to penalize [[link farm]]s and other schemes designed to artificially inflate PageRank. In December 2007 Google started ''actively'' penalizing sites selling paid text links. How Google identifies link farms and other PageRank manipulation tools is among Google's [[trade secret]]s.

===Computation===

To summarize, PageRank can be computed either iteratively or algebraically. The iterative method can also be viewed as the [[power iteration]] method,<ref>{{cite conference |citeseerx=10.1.1.18.5264 | title=PageRank computation and the structure of the web: Experiments and algorithms | author=Arasu, A. and Novak, J. and Tomkins, A. and Tomlin, J. | year=2002 | booktitle = Proceedings of the Eleventh International World Wide Web Conference, Poster Track | pages = 107–117 | location = Brisbane, Australia }}</ref><ref>{{cite arxiv |eprint=1002.2858 |author1=Massimo Franceschet |title=PageRank: Standing on the shoulders of giants |class=cs.IR |year=2010}}</ref> or power method. The basic mathematical operations performed in the iterative method and the power method are identical.

====Iterative====

In the former case, at <math>t=0</math>, an initial probability distribution is assumed, usually
:<math>PR(p_i; 0) = \frac{1}{N}</math>.

At each time step, the computation, as detailed above, yields
:<math>PR(p_i;t+1) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR (p_j; t)}{L(p_j)}</math>,
or in matrix notation
:<math>\mathbf{R}(t+1) = d \mathcal{M}\mathbf{R}(t) + \frac{1-d}{N} \mathbf{1}</math>, &nbsp; &nbsp; &nbsp; (*)
where
<math>\mathbf{R}_i(t)=PR(p_i; t)</math> and <math>\mathbf{1}</math> is the column vector of length <math>N</math> containing only ones.

The matrix <math>\mathcal{M}</math> is defined as
: <math>\mathcal{M}_{ij} = \begin{cases} 1 /L(p_j) , & \mbox{if }j\mbox{ links to }i\ \\ 0, & \mbox{otherwise} \end{cases}
</math>
i.e.,
:<math>\mathcal{M} := (K^{-1} A)^T</math>,
where
<math>A</math> denotes the [[adjacency matrix]] of the graph and <math>K</math> is the diagonal matrix with the outdegrees in the diagonal.
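
For instance, in Octave/MATLAB (a sketch; the adjacency matrix is an illustrative assumption):

<source lang="matlab">
% Constructing M = (K^{-1} A)^T from an adjacency matrix A with
% A(i,j) = 1 if page i links to page j. No page may be a sink here,
% since K must be invertible.
A = [0 1 1; 0 0 1; 1 0 0];   % p1 -> {p2, p3}, p2 -> {p3}, p3 -> {p1}
K = diag(sum(A, 2));         % diagonal matrix of outdegrees
M = (K \ A)';                % equals (inv(K)*A)'; each column sums to 1
</source>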

The computation ends when for some small <math>\epsilon</math>
:<math>|\mathbf{R}(t+1) - \mathbf{R}(t)| < \epsilon</math>,
i.e., when convergence is assumed.
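
A minimal Octave/MATLAB sketch of this iteration in matrix form, using the illustrative matrix <math>\mathcal{M}</math> from the sketches above and an assumed tolerance:

<source lang="matlab">
% Iterate R(t+1) = d*M*R(t) + (1-d)/N * 1 until |R(t+1) - R(t)| < epsilon.
M = [0 0 1; 0.5 0 0; 0.5 1 0];
N = 3; d = 0.85; epsilon = 1e-9;
R = ones(N, 1) / N;                % R(0): uniform initial distribution
delta = inf;
while delta > epsilon
  R_next = d * M * R + (1 - d)/N * ones(N, 1);
  delta = norm(R_next - R, 1);
  R = R_next;
end
R                                   % converged PageRank vector; sum(R) = 1
</source>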

====Algebraic====

In the latter case, for <math>t \to \infty</math> (i.e., in the [[steady state]]), the above equation (*) reads
:<math>\mathbf{R} = d \mathcal{M}\mathbf{R} + \frac{1-d}{N} \mathbf{1}</math>. &nbsp; &nbsp; &nbsp; (**)
The solution is given by
:<math>\mathbf{R} = (\mathbf{I}-d \mathcal{M})^{-1} \frac{1-d}{N} \mathbf{1}</math>,
with the [[identity matrix]] <math>\mathbf{I}</math>.

The solution exists and is unique for <math>0 < d < 1</math>. This can be seen by noting that <math>\mathcal{M}</math> is by construction a [[stochastic matrix]], whose largest eigenvalue equals one by the [[Perron–Frobenius theorem]]; the eigenvalues of <math>d \mathcal{M}</math> therefore have modulus at most <math>d < 1</math>, so the matrix <math>\mathbf{I}-d \mathcal{M}</math> is invertible.
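
In code, the algebraic solution is a single linear solve; a sketch with the same illustrative matrix as above:

<source lang="matlab">
% Solve (I - d*M) * R = (1-d)/N * 1 directly instead of iterating.
M = [0 0 1; 0.5 0 0; 0.5 1 0];
N = 3; d = 0.85;
R = (eye(N) - d * M) \ ((1 - d)/N * ones(N, 1))   % same R as the iteration
</source>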

====Power method====

If the matrix <math>\mathcal{M}</math> is a transition probability, i.e., column-stochastic with no columns consisting of just zeros, and <math>\mathbf{R}</math> is a probability distribution (i.e., <math>|\mathbf{R}|=1</math> and <math>\mathbf{E}\mathbf{R}=\mathbf{1}</math>, where <math>\mathbf{E}</math> is the matrix of all ones), then Eq. (**) is equivalent to
:<math>\mathbf{R} = \left( d \mathcal{M} + \frac{1-d}{N} \mathbf{E} \right)\mathbf{R} =: \widehat{ \mathcal{M}} \mathbf{R}</math>. &nbsp; &nbsp; &nbsp; (***)
Hence PageRank <math>\mathbf{R}</math> is the principal eigenvector of <math>\widehat{\mathcal{M}}</math>. A fast and easy way to compute this is using the [[Power iteration|power method]]: starting with an arbitrary vector <math>x(0)</math>, the operator <math>\widehat{\mathcal{M}}</math> is applied in succession, i.e.,
:<math> x(t+1) = \widehat{\mathcal{M}} x(t)</math>,
until
:<math>|x(t+1) - x(t)| < \epsilon</math>.

Note that in Eq. (***) the matrix on the right-hand side in the parenthesis can be interpreted as
:<math> \frac{1-d}{N} \mathbf{E} = (1-d)\mathbf{P} \mathbf{1}^t</math>,
where <math>\mathbf{P}</math> is an initial probability distribution. In the current case
:<math>\mathbf{P} := \frac{1}{N} \mathbf{1}</math>.

Finally, if <math>\mathcal{M}</math> has columns with only zero values, they should be replaced with the initial
probability vector
<math>\mathbf{P}</math>. In other words
:<math>\mathcal{M}^\prime := \mathcal{M} + \mathcal{D}</math>,
where the matrix <math>\mathcal{D}</math> is defined as
:<math>\mathcal{D} := \mathbf{P} \mathbf{D}^t</math>,
with
: <math>\mathbf{D}_i = \begin{cases} 1, & \mbox{if }L(p_i)=0\ \\ 0, & \mbox{otherwise} \end{cases}</math>
In this case, the above two computations using <math>\mathcal{M}</math> only give the same PageRank if their
results are normalized:
: <math> \mathbf{R}_{\textrm{power}} = \frac{\mathbf{R}_{\textrm{iterative}}}{|\mathbf{R}_{\textrm{iterative}}|} =
\frac{\mathbf{R}_{\textrm{algebraic}}}{|\mathbf{R}_{\textrm{algebraic}}|}</math>.
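
A sketch of the sink correction <math>\mathcal{M}^\prime = \mathcal{M} + \mathcal{D}</math> in Octave/MATLAB (the adjacency matrix, with one sink page, is an illustrative assumption):

<source lang="matlab">
% Replace the all-zero columns of M (sink pages) with the uniform vector P.
A = [0 1 1; 0 0 1; 0 0 0];        % page 3 has no outbound links (a sink)
N = 3;
L = sum(A, 2);                    % outdegrees; L(3) = 0
M = zeros(N);
for j = 1:N
  if L(j) > 0
    M(:, j) = A(j, :)' / L(j);    % usual column entries 1/L(j)
  end
end
P = ones(N, 1) / N;               % uniform initial probability vector
Dvec = (L == 0);                  % indicator D of the sink pages
Mprime = M + P * Dvec';           % column 3 of Mprime is now P
</source>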

A MATLAB/Octave implementation of PageRank using the power method:
<source lang="matlab">
% PAGERANK  Power-method computation of PageRank.
% Parameter M: link matrix where M(i,j) represents the link from page j to
%   page i, such that for all j, sum(M(:,j)) = 1 (column-stochastic).
% Parameter d: damping factor (e.g. 0.85).
% Parameter v_quadratic_error: convergence tolerance on the change in v.
% Returns v: rank vector such that v(i) is the rank of page i, in [0, 1].
% (Named "pagerank" rather than "rank" to avoid shadowing the built-in rank.)
function [v] = pagerank(M, d, v_quadratic_error)
N = size(M, 2);                          % number of pages
v = rand(N, 1);                          % arbitrary starting vector ...
v = v ./ norm(v, 2);                     % ... normalized
last_v = ones(N, 1) * inf;
M_hat = (d .* M) + (((1 - d) / N) .* ones(N, N));   % the matrix of Eq. (***)
while (norm(v - last_v, 2) > v_quadratic_error)
    last_v = v;
    v = M_hat * v;                       % apply the operator ...
    v = v ./ norm(v, 2);                 % ... and renormalize
end
end
</source>

Example call for the MATLAB/Octave pagerank function above:
<source lang="matlab">
M = [0 0 0 0 1 ; 0.5 0 0 0 0 ; 0.5 0 0 0 0 ; 0 1 0.5 0 0 ; 0 0 0.5 1 0];
pagerank(M, 0.80, 0.001)
</source>

====Efficiency====

Depending on the framework used to perform the computation, the exact implementation of the methods, and the required accuracy of the result, the computation time of these methods can vary greatly.
==Variations==
===Google Toolbar===
The [[Google Toolbar]]'s PageRank feature displays a visited page's PageRank as a whole number between 0 and 10. The most popular websites have a PageRank of 10; the least have a PageRank of 0. Google has not disclosed the precise method for determining a Toolbar PageRank value. The displayed value is not the actual value Google uses, so it is only a rough guide. ‘Toolbar’ PageRank is different from Google's internal PageRank because the value displayed in the toolbar is not fully reflective of the way Google judges the value of a website.

PageRank measures the number of sites that link to a particular page.<ref>[http://www.google.com/support/forum/p/Webmasters/thread?tid=4aeb4d5fce33350b&hl=en Google Webmaster central] discussion on PR</ref> The PageRank of a particular page is roughly based upon the quantity of inbound links as well as the PageRank of the pages providing the links. Other factors are also part of the algorithm, such as the size of a page, the number of changes and how recently it was updated, the key texts in headlines, and the words of hyperlinked anchor texts.<ref name="gpower"/>

The Google Toolbar's PageRank is updated infrequently, so it often shows out-of-date values. The last confirmed update was 27 June 2011;<ref name="autogenerated1">{{cite web|url=http://googlepagerankupdate.com |title=Google Page Rank Update |date=May 29, 2011}}</ref><ref>[http://www.seroundtable.com/pagerank-update-jan11-12832.html SeRoundTable] January 2011 Google Toolbar PageRank Update</ref> before that, however, there had been no confirmed update for more than a year.

===SERP Rank===
The [[Search engine results page]] (SERP) is the actual result returned by a search engine in response to a keyword query. The SERP consists of a list of links to web pages with associated text snippets. The SERP rank of a web page refers to the placement of the corresponding link on the SERP, where higher placement means higher SERP rank. The SERP rank of a web page is not only a function of its PageRank, but depends on a relatively large and continuously adjusted set of factors (over 200),<ref name="Vaughn09">{{cite web
| author = Aubuchon, Vaughn
| title = Google Ranking Factors - SEO Checklist
| url = http://www.vaughns-1-pagers.com/internet/google-ranking-factors.htm
}}</ref><ref>{{cite web
| url = http://www.seomoz.org/article/search-ranking-factors
| title = Search Engine Ranking Factors - Version 2
| first = Rand
| last = Fishkin
| authorlink = Rand Fishkin
| coauthors = Jeff Pollard
| publisher = seomoz.org
| date = April 2, 2007
| accessdate = May 11, 2009
}}</ref> commonly referred to by internet marketers as "Google Love".<ref>http://www.infoworld.com/t/search-engines/google-corrupt-search-me-428</ref> [[Search engine optimization]] (SEO) is aimed at achieving the highest possible SERP rank for a website or a set of web pages.

With the introduction of [[Google Places]] into the mainstream organic SERP, PageRank plays little to no role in ranking a business in the Local Business Results.<ref>{{cite web|url=http://google.com/support/places/bin/answer.py?hl=en&answer=7091 |title=Ranking of listings : Ranking - Google Places Help |publisher=Google.com |accessdate=2011-05-27}}</ref> While citations are still computed in the algorithm, PageRank is not a factor, since Google ranks business listings and not web pages.

===Google directory PageRank===
The [[Google Directory]] PageRank is an 8-unit measurement. These values can be viewed in the Google Directory. Unlike the Google Toolbar, which shows the PageRank value by a mouseover of the green bar, the Google Directory does not show the PageRank as a numeric value but only as a green bar.

===False or spoofed PageRank===
While the PageRank shown in the Toolbar is considered to be derived from an accurate PageRank value (at some time prior to the time of publication by Google) for most sites, this value was at one time easily manipulated. A previous flaw was that any low PageRank page that was redirected, via a [[HTTP 302]] response or a "Refresh" [[meta tag]], to a high PageRank page caused the lower PageRank page to acquire the PageRank of the destination page. In theory a new, PR 0 page with no incoming links could have been redirected to the Google home page—which is a PR 10—and then the PR of the new page would be upgraded to a PR 10. This [[Website spoofing|spoofing]] technique, also known as [[Page hijacking|302 Google Jacking]], was a known failing or bug in the system. Any page's PageRank could have been spoofed to a higher or lower number of the webmaster's choice and only Google has access to the real PageRank of the page. Spoofing is generally detected by running a Google search for a URL with questionable PageRank, as the results will display the URL of an entirely different site (the one redirected to) in its results.

===Manipulating PageRank===
For [[search engine optimization]] purposes, some companies offer to sell high PageRank links to webmasters.<ref name="Cutts-0414"/> As links from higher-PR pages are believed to be more valuable, they tend to be more expensive. It can be an effective and viable marketing strategy to buy link advertisements on content pages of quality and relevant sites to drive traffic and increase a webmaster's link popularity. However, Google has publicly warned webmasters that if they are or were discovered to be selling links for the purpose of conferring PageRank and reputation, their links will be devalued (ignored in the calculation of other pages' PageRanks). The practice of buying and selling links is intensely debated across the Webmaster community. Google advises webmasters to use the [[nofollow]] [[HTML]] [[attribute (computing)|attribute]] value on sponsored links. According to [[Matt Cutts]], Google is concerned about webmasters who try to [[game the system]], and thereby reduce the quality and relevancy of Google search results.<ref name="Cutts-0414">{{cite web|publisher=mattcutts.com/blog|date=April 14, 2007|accessdate=2007-05-28|title=How to report paid links|url=http://www.mattcutts.com/blog/how-to-report-paid-links/}}</ref>

===The intentional surfer model===
The original PageRank algorithm reflects the so-called random surfer model, meaning that the PageRank of a particular page is derived from the theoretical probability of visiting that page when clicking on links at random. However, real users do not randomly surf the web, but follow links according to their interest and intention. A page ranking model that reflects the importance of a particular page as a function of how many actual visits it receives by real users is called the ''intentional surfer model''.<ref name="Jos07">{{Cite book | author = Jøsang, A. | contribution = Trust and Reputation Systems | title = Foundations of Security Analysis and Design IV, FOSAD 2006/2007 Tutorial Lectures. | year = 2007 | publisher = Springer LNCS 4677 | editor = Aldini, A. | pages = 209–245 | doi = 10.1007/978-3-540-74810-6 | url = http://www.unik.no/people/josang/papers/Jos2007-FOSAD.pdf|format=PDF | volume = 4677}}</ref> The Google toolbar sends information to Google for every page visited, and thereby provides a basis for computing PageRank based on the intentional surfer model. The introduction of the [[nofollow]] attribute by Google to combat [[spamdexing]] has the side effect that webmasters commonly use it on outgoing links to increase their own PageRank. This causes a loss of actual links for the Web crawlers to follow, thereby making the original PageRank algorithm based on the random surfer model potentially unreliable. Using information about users' browsing habits provided by the Google toolbar partly compensates for the loss of information caused by the [[nofollow]] attribute. The [[Search engine results page|SERP]] rank of a page, which determines a page's actual placement in the search results, is based on a combination of the random surfer model (PageRank) and the intentional surfer model (browsing habits) in addition to other factors.<ref name="SEOnotepad09">{{cite web | author = SEOnotepad | title = Myth of the Google Toolbar Ranking | url = http://www.seonotepad.com/search-engines/google-seo/myth-of-the-google-toolbar-ranking/ }}</ref>

===Other uses===
A version of PageRank has recently been proposed as a replacement for the traditional [[Institute for Scientific Information]] (ISI) [[impact factor]],<ref>{{cite journal | author = Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel | year = 2006 | month = December | title = Journal Status | journal = Scientometrics | volume = 69 | issue = 3 | arxiv = cs.GL/0601030}}</ref> and implemented at [http://www.eigenfactor.org eigenfactor.org]. Instead of merely counting total citations to a journal, the "importance" of each citation is determined in a PageRank fashion.

A similar new use of PageRank is to rank academic doctoral programs based on their records of placing their graduates in faculty positions. In PageRank terms, academic departments link to each other by hiring their faculty from each other (and from themselves).<ref>{{cite journal | author = Benjamin M. Schmidt and Matthew M. Chingos | title=Ranking Doctoral Programs by Placement: A New Method | year=2007 | journal=PS: Political Science and Politics | volume=40 | issue=July | pages=523–529 | url=http://www.people.fas.harvard.edu/~gillum/rankings_paper.pdf |format=PDF}}</ref>

PageRank has been used to rank spaces or streets to predict how many people (pedestrians or vehicles) come to the individual spaces or streets.<ref>{{cite journal | author = B. Jiang | title = Ranking spaces for predicting human movement in an urban environment | year=2006 | journal=International Journal of Geographical Information Science | volume=23 | issue = 7 | pages=823–837 | arxiv=physics/0612011 | doi = 10.1080/13658810802022822}}</ref><ref>{{cite journal | author = Jiang B., Zhao S., and Yin J. | title = Self-organized natural roads for predicting traffic flow: a sensitivity study | year=2008 | journal=Journal of Statistical Mechanics: Theory and Experiment | page = 008 | issue = 07 | volume=P07008 | arxiv=0804.1630 | doi = 10.1088/1742-5468/2008/07/P07008 | bibcode = 2008JSMTE..07..008J }}</ref> In [[lexical semantics]] it has been used to perform [[Word Sense Disambiguation]]<ref>Roberto Navigli, Mirella Lapata. [http://www.dsi.uniroma1.it/~navigli/pubs/PAMI_2010_Navigli_Lapata.pdf "An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation"]. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(4), IEEE Press, 2010, pp. 678–692.</ref> and to automatically rank [[WordNet]] [[synsets]] according to how strongly they possess a given semantic property, such as positivity or negativity.<ref>{{cite web | author = Andrea Esuli and Fabrizio Sebastiani | title=PageRanking WordNet synsets: An Application to Opinion-Related Properties | work=In Proceedings of the 35th Meeting of the Association for Computational Linguistics, Prague, CZ, 2007, pp. 424–431 | url=http://nmis.isti.cnr.it/sebastiani/Publications/ACL07.pdf |format=PDF | accessdate=June 30, 2007 }}</ref>

A dynamic weighting method similar to PageRank has been used to generate customized reading lists based on the link structure of Wikipedia.<ref>{{cite journal | author = Wissner-Gross, A. D. | url = http://www.alexwg.org/publications/ProcIEEEICALT_6-825.pdf | title=Preparation of topical readings lists from the link structure of Wikipedia | journal = Proceedings of the IEEE International Conference on Advanced Learning Technology | location = Rolduc, Netherlands | year = 2006 | page = 825 | doi = 10.1109/ICALT.2006.1652568}}</ref>

A [[Web crawler]] may use PageRank as one of a number of importance metrics it uses to determine which URL to visit during a crawl of the web. One of the early working papers
<ref>{{cite web
| title=Working Papers Concerning the Creation of Google
| work=Google
| url=http://dbpubs.stanford.edu:8091/diglib/pub/projectdir/google.html
| accessdate=November 29, 2006
}}</ref> that were used in the creation of Google is ''Efficient crawling through URL ordering'',<ref>{{cite journal
|author = Cho, J., Garcia-Molina, H., and Page, L.
| url = http://dbpubs.stanford.edu:8090/pub/1998-51
| title = Efficient crawling through URL ordering
| journal = Proceedings of the seventh conference on World Wide Web
| location = Brisbane, Australia
| year = 1998}}</ref>
which discusses the use of a number of different importance metrics to determine how deeply, and how much of a site, Google will crawl. PageRank is presented as one of a number of these importance metrics, though there are others listed such as the number of inbound and outbound links for a URL, and the distance from the root directory on a site to the URL.

PageRank may also be used as a [http://de.scientificcommons.org/23846375 methodology] to measure the apparent impact of a community like the [[Blogosphere]] on the overall Web itself. This approach therefore uses PageRank to measure the distribution of attention in reflection of the [[Scale-free network]] paradigm.

In any ecosystem, a modified version of PageRank may be used to determine species that are essential to the continuing health of the environment.<ref>{{cite news|last=Burns |first=Judith |url=http://news.bbc.co.uk/2/hi/science/nature/8238462.stm |title=Google trick tracks extinctions |publisher=BBC News |date=2009-09-04 |accessdate=2011-05-27}}</ref>

An application of PageRank to the analysis of protein networks in biology has recently been reported.<ref>{{cite journal |doi = 10.1093/bioinformatics/btq680 |author = G. Ivan and V. Grolmusz | url = http://bioinformatics.oxfordjournals.org/content/27/3/405 | title = When the Web meets the cell: using personalized PageRank for analyzing protein interaction networks | journal = Bioinformatics |volume = 27 |issue = 3 |pages = 405–7 | year = 2011 |pmid = 21149343}}</ref>
==nofollow==
In early 2005, Google implemented a new value, "[[nofollow]]",<ref>{{cite web | title=Preventing Comment Spam | work=Google | url=http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html | accessdate=January 1, 2005 }}</ref> for the [[Semantic link|rel]] attribute of HTML link and anchor elements, so that website developers and [[blog]]gers can make links that Google will not consider for the purposes of PageRank—they are links that no longer constitute a "vote" in the PageRank system. The nofollow relationship was added in an attempt to help combat [[spamdexing]].

As an example, people could previously create many message-board posts with links to their website to artificially inflate their PageRank. With the nofollow value, message-board administrators can modify their code to automatically insert "rel='nofollow'" to all hyperlinks in posts, thus preventing PageRank from being affected by those particular posts. This method of avoidance, however, also has various drawbacks, such as reducing the link value of legitimate comments. (See: [[Spam in blogs#nofollow]])

In an effort to manually control the flow of PageRank among pages within a website, many webmasters practice what is known as PageRank Sculpting<ref>{{cite web|url=http://www.seomoz.org/blog/pagerank-sculpting-parsing-the-value-and-potential-benefits-of-sculpting-pr-with-nofollow |title=PageRank Sculpting: Parsing the Value and Potential Benefits of Sculpting PR with Nofollow |publisher=SEOmoz |date= |accessdate=2011-05-27}}</ref>—which is the act of strategically placing the nofollow attribute on certain internal links of a website in order to funnel PageRank towards those pages the webmaster deemed most important. This tactic has been used since the inception of the nofollow attribute, but may no longer be effective since Google announced that blocking PageRank transfer with nofollow does not redirect that PageRank to other links.<ref>{{cite web|url=http://www.mattcutts.com/blog/pagerank-sculpting/ |title=PageRank sculpting |publisher=Mattcutts.com |date=2009-06-15 |accessdate=2011-05-27}}</ref>
==Deprecation==

PageRank was once available for the verified site maintainers through the Google Webmaster Tools interface. However, on October 15, 2009, a Google employee confirmed<ref name="Moskwa"/> that the company had removed PageRank from its ''Webmaster Tools'' section, explaining that "We’ve been telling people for a long time that they shouldn’t focus on PageRank so much; many site owners seem to think it's the most important metric for them to track, which is simply not true."<ref name="Moskwa">{{cite web | author = Susan Moskwa | title = PageRank Distribution Removed From WMT | url = http://www.google.com/support/forum/p/Webmasters/thread?tid=6a1d6250e26e9e48&hl=en | accessdate = October 16, 2009}}</ref> The PageRank indicator is not available in Google's own [[Google Chrome|Chrome]] browser.

The visible page rank is updated very infrequently. After the rank in the Google toolbar was updated on 3 April 2010,<ref name="autogenerated1"/> there was no confirmed update for more than 9 months, before it was finally updated.{{Citation needed|date=August 2011}} Some authors ask not ''when'' but rather ''if'' the rank will be updated again.<ref name='ditch'>{{cite web|url=http://www.infocarnivore.com/2010/12/05/is-google-ditching-pagerank/ |title=Is Google Ditching PageRank? |publisher=Infocarnivore.com |date=2010-12-05 |accessdate=2011-05-27}}</ref>

Google may be moving away from a publicly visible page rank in order to motivate work on content quality by discouraging [[content farm]]ing as a means of boosting a site's reputation. A visible, high rank can inflate the value of a site, and provides useful feedback for paid [[search engine optimization|SEOs]] that add no value to the system.<ref name='ditch'/> In addition, Google has been criticized and [[Search engine optimization#Legal precedents|even sued]] for perceived low ranks.

On {{date|6 October 2011}}, many users mistakenly thought Google PageRank was gone. As it turned out, it was simply an update to the URL used to query the PageRank from Google.<ref name=whatculture>{{cite news|first=Peter|title=Google PageRank Is Not Dead|url=http://whatculture.com/technology/google-pagerank-is-not-dead.php|accessdate=7 October 2011|newspaper=WhatCulture!|date=6 October 2011}}</ref>
==Notes==
{{Reflist|2}}


===Centrality measures===

Action parameters

VariableValue
Name of the user account (user_name)
'156.56.172.235'
Page ID (page_id)
16981683
Page namespace (page_namespace)
0
Page title without namespace (page_title)
'Network science'
Full page title (page_prefixedtitle)
'Network science'
Action (action)
'edit'
Edit summary/reason (summary)
'/* Web link analysis */ '
Whether or not the edit is marked as minor (no longer in use) (minor_edit)
false
Old page wikitext, before the edit (old_wikitext)
''''Network science''' is an area of [[computer science]] and [[network science]] and part of [[graph theory]] that examines the interconnections among diverse physical or [[communication networks|engineered networks]], [[information networks]], [[Biological neural network|biological networks]], cognitive and [[semantic networks]], and [[social networks]]. This field of science seeks to discover common principles, algorithms and tools that govern network behavior. The National Research Council defines Network Science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena." == Background and history == The study of networks has emerged in diverse disciplines as a means of analyzing complex relational data. The earliest known paper in this field is the famous [[Seven Bridges of Königsberg]] written by [[Leonhard Euler]] in 1736. Euler's mathematical description of vertices and edges was the foundation of [[graph theory]], a branch of mathematics that studies the properties of pairwise relations in a network structure. The field of [[graph theory]] continued to develop and found applications in chemistry (Sylvester, 1878). In the 1930s [[Jacob Moreno]], a psychologist in the Gestalt tradition, arrived in the United States. He developed the sociogram and presented it to the public in April 1933 at a convention of medical scholars. Moreno claimed that "before the advent of sociometry no one knew what the interpersonal structure of a group 'precisely' looked like (Moreno, 1953). The sociogram was a representation of the social structure of a group of elementary school students. The boys were friends of boys and the girls were friends of girls with the exception of one boy who said he liked a single girl. The feeling was not reciprocated. This network representation of social structure was found so intriguing that it was printed in [[The New York Times]] (April 3, 1933, page 17). The sociogram has found many applications and has grown into the field of [[social network analysis]]. Probabilistic theory in network science developed as an off-shoot of [[graph theory]] with [[Paul Erdős]] and [[Alfréd Rényi]]'s eight famous papers on [[random graphs]]. For [[social networks]] the [[exponential random graph model]] or p* is a notational framework used to represent the probability space of a tie occurring in a [[social network]]. An alternate approach to network probability structures is the [[network probability matrix]], which models the probability of edges occurring in a network, based on the historic presence or absence of the edge in a sample of networks. In 1998, David Krackhardt and Kathleen Carley introduced the idea of a meta-network with the PCANS Model. They suggest that "all organizations are structured along these three domains, Individuals, Tasks, and Resources". Their paper introduced the concept that networks occur across multiple domains and that they are interrelated. This field has grown into another sub-discipline of network science called [[dynamic network analysis]]. More recently other network science efforts have focused on mathematically describing different network topologies. Duncan Watts reconciled empirical data on networks with mathematical representation, describing the [[small-world network]]. 
[[Albert-László Barabási]] and Reka Albert developed the [[scale-free network]] which is a loosely defined network topology that contains hub vertices with many connections, that grow in a way to maintain a constant ratio in the number of the connections versus all other nodes. Although many networks, such as the internet, appear to maintain this aspect, other networks have long tailed distributions of nodes that only approximate scale free ratios. Today, network science is an exciting and growing field. Scientists from many diverse fields are working together. Network science holds the promise of increasing collaboration across disciplines, by sharing data, algorithms, and software tools. == Network Models == Network models serve as a foundation to understanding interactions within empirical complex networks. Various [[random graph]] generation models produce network structures that may be used in comparison to real-world complex networks. ===Department of Defense Initiatives=== The U.S. military first became interested in [[network-centric warfare]] as an operational concept based on network science in 1996. John A. Parmentola, the U.S. Army Director for Research and Laboratory Management, proposed to the Army’s Board on Science and Technology (BAST) on December 1, 2003 that Network Science become a new Army research area. The BAST, the Division on Engineering and Physical Sciences for the National Research Council (NRC) of the National Academies, serves as a convening authority for the discussion of science and technology issues of importance to the Army and oversees independent Army-related studies conducted by the National Academies. The BAST conducted a study to find out whether identifying and funding a new field of investigation in basic research, Network Science, could help close the gap between what is needed to realize Network-Centric Operations and the current primitive state of fundamental knowledge of networks. As a result, the BAST issued the NRC study in 2005 titled Network Science (referenced above) that defined a new field of basic research in Network Science for the Army. Based on the findings and recommendations of that study and the subsequent 2007 NRC report titled Strategy for an Army Center for Network Science, Technology, and Experimentation, Army basic research resources were redirected to initiate a new basic research program in Network Science. To build a new theoretical foundation for complex networks, some of the key Network Science research efforts now ongoing in Army laboratories address: * Mathematical models of network behavior to predict performance with network size, complexity, and environment * Optimized human performance required for network-enabled warfare * Networking within ecosystems and at the molecular level in cells. As initiated in 2004 by Frederick I. Moxley with support he solicited from [[David S. Alberts]], the Department of Defense helped to establish the first Network Science Center in conjunction with the U.S. Army at the United States Military Academy. Subsequently, the U.S. Department of Defense has funded numerous research projects in the area of Network Science. In 2006, the U.S. Army and the United Kingdom (UK) formed the Network and Information Science [[International Technology Alliance]], a collaborative partnership among the Army Research Laboratory, UK Ministry of Defense and a consortium of industries and universities in the U.S. and UK. 
The goal of the alliance is to perform basic research in support of Network- Centric Operations across the needs of both nations. In 2009, the U.S. Army formed the [[Network Science CTA]], a collaborative research alliance among the [[Army Research Laboratory]], [[CERDEC]], and a consortium of about 30 industrial R&D labs and universities in the U.S. The goal of the alliance is to develop a deep understanding of the underlying commonalities among intertwined social/cognitive, information, and communications networks, and as a result improve our ability to analyze, predict, design, and influence complex systems interweaving many kinds of networks. ==Network analysis== ===Social network analysis=== '''[[Social network]] analysis''' examines the structure of relationships between social entities.<ref>Wasserman, Stanley and Katherine Faust. 1994. ''Social Network Analysis: Methods and Applications.'' Cambridge: Cambridge University Press. </ref> These entities are often persons, but may also be [[Group (sociology)|groups]], [[organizations]], [[nation states]], [[web sites]], [[scientometrics|scholarly publications]]. Since the 1970s, the empirical study of networks has played a central role in social science, and many of the [[Mathematics|mathematical]] and [[Statistics|statistical]] tools used for studying networks have been first developed in [[sociology]].<ref name="Newman">Newman, M.E.J. ''Networks: An Introduction.'' Oxford University Press. 2010</ref> Amongst many other applications, social network analysis has been used to understand the [[diffusion of innovations]], news and rumors. Similarly, it has been used to examine the spread of both [[epidemiology|diseases]] and [[Medical sociology|health-related behaviors]]. It has also been applied to the [[Economic sociology|study of markets]], where it has been used to examine the role of trust in [[Social exchange|exchange relationships]] and of social mechanisms in setting prices. Similarly, it has been used to study recruitment into [[political movement]]s and social organizations. It has also been used to conceptualize scientific disagreements as well as academic prestige. More recently, network analysis (and its close cousin [[traffic analysis]]) has gained a significant use in military intelligence, for uncovering insurgent networks of both hierarchical and [[leaderless resistance|leaderless]] nature. ===Biological network analysis=== With the recent explosion of publicly available high throughput biological data, the analysis of molecular networks has gained significant interest. The type of analysis in this content are closely related to social network analysis, but often focusing on local patterns in the network. For example [[network motif]]s are small subgraphs that are over-represented in the network. [[Activity motifs]] are similar over-represented patterns in the attributes of nodes and edges in the network that are over represented given the network structure. ===Link analysis=== Link analysis is a subset of network analysis, exploring associations between objects. An example may be examining the addresses of suspects and victims, the telephone numbers they have dialed and financial transactions that they have partaken in during a given timeframe, and the familial relationships between these subjects as a part of police investigation. Link analysis here provides the crucial relationships and associations between very many objects of different types that are not apparent from isolated pieces of information. 
Computer-assisted or fully automatic computer-based link analysis is increasingly employed by [[bank]]s and [[insurance]] agencies in [[fraud]] detection, by telecommunication operators in telecommunication network analysis, by medical sector in [[epidemiology]] and [[pharmacology]], in law enforcement [[Criminal procedure|investigation]]s, by [[search engine]]s for [[relevance]] rating (and conversely by the [[search engine spammer|spammers]] for [[spamdexing]] and by business owners for [[search engine optimization]]), and everywhere else where relationships between many objects have to be analyzed. ====Network robustness==== The structural robustness of networks <ref>{{cite book |title= Complex Networks: Structure, Robustness and Function | author =R. Cohen, S. Havlin|year= 2010 |publisher= Cambridge University Press |url= http://havlin.biu.ac.il/Shlomo%20Havlin%20books_com_net.php}}</ref> is studied using [[percolation theory]]. When a critical fraction of nodes is removed the network becomes fragmented into small clusters. This phenomenon is called percolation <ref>{{cite book |title= Fractals and Disordered Systems | author =A. Bunde, S. Havlin|year= 1996 |publisher= Springer |url= http://havlin.biu.ac.il/Shlomo%20Havlin%20books_fds.php}}</ref> and it represents an order-disorder type of [[phase transition]] with [[critical exponents]]. ====Percolation==== ==Introduction== A representative question (and the [[etymology|source]] of the name) is as follows. Assume that some liquid is poured on top of some [[porosity|porous]] material. Will the liquid be able to make its way from hole to hole and reach the bottom? This physical question is [[mathematical model|modelled]] mathematically as a three-dimensional network of ''n''&nbsp;&times;&nbsp;''n''&nbsp;&times;&nbsp;''n'' points (or [[graph (mathematics)|vertices]]/sites) the connections (or [[graph (mathematics)|edge]]s/bonds) between each two neighbors may be open (allowing the liquid through) with probability ''p'', or closed with probability 1&nbsp;–&nbsp;''p'', and they are assumed to be independent. Therefore, for a given ''p'', what is the probability that an open path exists from the top to the bottom? The behavior for large&nbsp;''n'' is of primary interest. This problem, called now bond-percolation, was introduced in the mathematics literature by {{harvtxt|Broadbent|Hammersley|1957}}, and has been studied intensively by mathematicians and physicists since. When a site is occupied with probability ''p'' or empty (its edges are also removed) with probability ''1-p'', the problem is called "site percolation". The question is the same: for a given ''p'', what is the probability that a path exists between top and bottom? Of course the same questions can be asked for any lattice dimension. As is quite typical, it is actually easier to examine [[Infinite graph|infinite]] networks than just large ones. In this case the corresponding question is: does an infinite open cluster exist? That is, is there a path of connected points of infinite length "through" the network? By [[Kolmogorov's zero-one law]], for any given ''p'', the probability that an infinite cluster exists is either zero or one. Since this probability is an increasing function of ''p'' (proof via [[Coupling (probability)|coupling]] argument), there must be a '''critical''' ''p'' (denoted by&nbsp;''p''<sub>c</sub>) below which the probability is always 0 and above which the probability is always&nbsp;1. In practice, this criticality is very easy to observe. 
Even for ''n'' as small as 100, the probability of an open path from the top to the bottom increases sharply from very close to zero to very close to one in a short span of values of&nbsp;''p''. [[Image:Bond percolation p 51.png|thumb|Detail of a bond percolation on the square lattice in two dimensions with percolation probability ''p''&nbsp;=&nbsp;0.51]] In some cases ''p''<sub>c</sub> may be calculated explicitly. For example, for the [[square lattice]] '''Z'''<sup>2</sup> in two dimensions, ''p''<sub>c</sub>&nbsp;=&nbsp;1/2, a fact which was an open question for more than 20 years and was finally resolved by [[Harry Kesten]] in the early 1980s, see {{harvtxt|Kesten|1982}}. A limit case for lattices in many dimensions is given by the [[Bethe lattice]], whose threshold is at ''p''<sub>c</sub>&nbsp;=&nbsp;1/(''z''&nbsp;&minus;&nbsp;1) for a [[coordination number]]&nbsp;''z''. For most infinite lattice graphs, ''p''<sub>c</sub> cannot be calculated exactly. For example, ''p''<sub>c</sub> is not known for bond-percolation in [[hypercubic]] lattice in two dimensions. ==Universality== The [[Universality (dynamical systems)|universality principle]] states that the value of ''p''<sub>c</sub> is connected to the local structure of the graph, while the behavior of clusters below, at, and above ''p''<sub>c</sub> are invariant with respect to the local structure, and therefore, in some sense are more natural quantities to consider. This universality also means that for the same dimension independent of the type of the lattice or type of percolation (e.g., bond or site) the [[fractal dimension]] of the clusters at ''p''<sub>c</sub> is the same. == Phases == ===Subcritical and supercritical=== The main fact in the subcritical phase is "exponential decay". That is, when ''p''&nbsp;<&nbsp;''p''<sub>c</sub>, the probability that a specific point (for example, the origin) is contained in an open cluster of size ''r'' decays to zero [[Big O notation#Common orders of functions|exponentially]] in&nbsp;''r''. This was proved for percolation in three and more dimensions by {{harvtxt|Menshikov|1986}} and independently by {{harvtxt|Aizenman|Barsky|1987}}. In two dimensions, it formed part of Kesten's proof that ''p''<sub>c</sub>&nbsp;=&nbsp;1/2. The [[dual graph]] of the square lattice '''Z'''<sup>2</sup> is also the square lattice. It follows that, in two dimensions, the supercritical phase is dual to a subcritical percolation process. This provides essentially full information about the supercritical model with ''d''&nbsp;=&nbsp;2. The main result for the supercritical phase in three and more dimensions is that, for sufficiently large&nbsp;''N'', there is an infinite open cluster in the two-dimensional slab&nbsp;'''Z'''<sup>2</sup>&nbsp;&times;&nbsp;[0,&nbsp;''N'']<sup>''d''&minus;2</sup>. This was proved by {{harvtxt|Grimmett|Marstrand|1990}}. In two dimensions with ''p''&nbsp;<&nbsp;1/2, there is with probability one a unique infinite closed cluster. Thus the subcritical phase may be described as finite open islands in an infinite closed ocean. When ''p''&nbsp;>&nbsp;1/2 just the opposite occurs, with finite closed islands in an infinite open ocean. The picture is more complicated when ''d''&nbsp;≥&nbsp;3 since ''p''<sub>c</sub>&nbsp;<&nbsp;1/2, and there is coexistence of infinite open and closed clusters for ''p'' between ''p''<sub>c</sub> and&nbsp;1&nbsp;&minus;&nbsp;''p''<sub>c</sub>. 
======Critical======
The model has a [[mathematical singularity|singularity]] at the critical point ''p''&nbsp;=&nbsp;''p''<sub>c</sub>, believed to be of power-law type. [[Critical scaling|Scaling theory]] predicts the existence of [[critical exponents]], depending on the number ''d'' of dimensions, that determine the class of the singularity. When ''d''&nbsp;=&nbsp;2 these predictions are backed up by arguments from [[quantum field theory]] and [[quantum gravitation]], and include predicted numerical values for the exponents. Most of these predictions are conjectural except when the number ''d'' of dimensions satisfies either ''d''&nbsp;=&nbsp;2 or&nbsp;''d''&nbsp;≥&nbsp;19. They include:
* There are no infinite clusters (open or closed).
* The probability that there is an open path from some fixed point (say the origin) to a distance of ''r'' decreases ''polynomially'', i.e. is [[big O notation|on the order of]] ''r''<sup>&nbsp;''α''</sup> for some&nbsp;''α''.
** ''α'' does not depend on the particular lattice chosen, or on other local parameters. It depends only on the value of the dimension ''d'' (this is an instance of the [[Universality (dynamical systems)|universality]] principle).
** ''α''<sub>''d''</sub> decreases from ''d''&nbsp;=&nbsp;2 until ''d''&nbsp;=&nbsp;6 and then stays fixed.
** ''α''<sub>6</sub>&nbsp;=&nbsp;&minus;1
** ''α''<sub>2</sub>&nbsp;=&nbsp;&minus;5/48.
* The shape of a large cluster in two dimensions is [[conformal map|conformally invariant]].
See {{harvtxt|Grimmett|1999}}. In dimension&nbsp;≥&nbsp;19, these facts are largely proved using a technique known as the [[lace expansion]]. It is believed that a version of the lace expansion should be valid for 7 or more dimensions, perhaps with implications also for the threshold case of 6 dimensions. The connection of percolation to the lace expansion is found in {{harvtxt|Hara|Slade|1990}}. In dimension&nbsp;2, the first fact ("no percolation in the critical phase") is proved for many lattices, using duality. Substantial progress has been made on two-dimensional percolation through the conjecture of [[Oded Schramm]] that the [[scaling limit]] of a large cluster may be described in terms of a [[Schramm&ndash;Loewner evolution]]. This conjecture was proved by {{harvtxt|Smirnov|2001}} in the special case of site percolation on the triangular lattice.

=====Different models=====
* The first model studied was [[Bernoulli percolation]]. In this model all bonds are independent. This model is called bond percolation by physicists.
* A generalization was next introduced as the [[Fortuin–Kasteleyn random cluster model]], which has many connections with the [[Ising model]] and other [[Potts model]]s.
* Bernoulli (bond) percolation on [[complete graph]]s is an example of a [[random graph]]. The critical probability is&nbsp;''p''&nbsp;=&nbsp;1/''N''.
* [[Directed percolation]], which has connections with the [[contact process (mathematics)|contact process]].
* [[First passage percolation]].
* [[Invasion percolation]].
* Percolation with dependency links was introduced by [http://havlin.biu.ac.il/Publications.php?keyword=bashan+%26+parshani&year=2011&match=any Parshani et al.]
* Opinion model.
<!-- LOTS of todo! Other models of percolation. references.
Percolation on graphs.-->

==References==
{{Reflist}}
*{{Citation | last1=Aizenman | first1=Michael | authorlink1=Michael Aizenman | last2=Barsky | first2=David | title=Sharpness of the phase transition in percolation models | journal=Communications in Mathematical Physics | volume=108 | year=1987 | pages=489–526 | doi=10.1007/BF01212322 | bibcode=1987CMaPh.108..489A | issue=3}}
*{{Citation | url=http://www.cambridge.org/catalogue/catalogue.asp?isbn=0521872324 | title=Percolation | first1=Béla | last1=Bollobás | authorlink1=Béla Bollobás | first2=Oliver | last2=Riordan | publisher=Cambridge University Press | year=2006 | doi=10.2277/}}
*{{Citation | last1=Broadbent | first1=Simon | last2=Hammersley | first2=John | authorlink2=John Hammersley | title=Percolation processes I. Crystals and mazes | journal=Proceedings of the Cambridge Philosophical Society | volume=53 | pages=629–641 | year=1957 | bibcode=1957PCPS...53..629B | doi=10.1017/S0305004100032680}}
*{{Citation | author=Bunde A. and [[Shlomo Havlin|Havlin S.]] | title=Fractals and Disordered Systems | publisher=Springer | year=1996 | url=http://havlin.biu.ac.il/Shlomo%20Havlin%20books_fds.php}}
*{{Citation | author=Cohen R. and [[Shlomo Havlin|Havlin S.]] | title=Complex Networks: Structure, Robustness and Function | publisher=Cambridge University Press | year=2010 | url=http://havlin.biu.ac.il/Shlomo%20Havlin%20books_com_net.php}}
*{{Citation | last1=Grimmett | first1=Geoffrey | authorlink1=Geoffrey Grimmett | title=Percolation | publisher=Springer | year=1999 | url=http://www.statslab.cam.ac.uk/~grg/papers/perc/perc.html}}
*{{Citation | last1=Grimmett | first1=Geoffrey | authorlink1=Geoffrey Grimmett | last2=Marstrand | first2=John | title=The supercritical phase of percolation is well behaved | journal=Proceedings of the Royal Society (London), Series A | volume=430 | year=1990 | pages=439–457 | doi=10.1098/rspa.1990.0100 | bibcode=1990RSPSA.430..439G | issue=1879}}
*{{Citation | last1=Hara | first1=Takashi | last2=Slade | first2=Gordon | year=1990 | title=Mean-field critical behaviour for percolation in high dimensions | journal=Communications in Mathematical Physics | volume=128 | pages=333–391 | doi=10.1007/BF02108785 | bibcode=1990CMaPh.128..333H | issue=2}}
*{{Citation | title=Percolation theory for mathematicians | last1=Kesten | first1=Harry | publisher=Birkhäuser | year=1982 | authorlink1=Harry Kesten}}
*{{Citation | last1=Menshikov | first1=Mikhail | authorlink1=Mikhail Vasiliyevich Menshikov | title=Coincidence of critical points in percolation problems | journal=Soviet Mathematics Doklady | volume=33 | year=1986 | pages=856–859}}
*{{Citation | last1=Smirnov | first1=Stanislav | authorlink1=Stanislav Smirnov | title=Critical percolation in the plane: conformal invariance, Cardy's formula, scaling limits | year=2001 | journal=Comptes Rendus de l'Académie des Sciences | volume=333 | pages=239–244 | doi=10.1016/S0764-4442(01)01991-7 | bibcode=2001CRASM.333..239S | issue=3}}
*{{citation | last1=Stauffer | first1=Dietrich | last2=Aharony | first2=Anthony | title=Introduction to Percolation Theory | edition=2nd | publisher=CRC Press | year=1994 | isbn=978-0748402533}}

====Web link analysis====
Several [[Web search]] [[ranking]] algorithms use link-based centrality metrics, including (in order of appearance) [[Massimo Marchiori|Marchiori]]'s [[Hyper Search]], [[Google]]'s [[PageRank]], Kleinberg's [[HITS algorithm]], the [[CheiRank]] and [[TrustRank]] algorithms.
Link analysis is also conducted in information science and communication science in order to understand and extract information from the structure of collections of web pages. For example, the analysis might be of the interlinking between politicians' web sites or blogs.

===Centrality measures===
Information about the relative importance of nodes and edges in a graph can be obtained through [[centrality]] measures, widely used in disciplines like [[sociology]]. For example, [[eigenvector centrality]] uses the [[eigenvectors]] of the [[adjacency matrix]] to determine nodes that tend to be frequently visited.

==Spread of content in networks==
Content in a [[complex network]] can spread via two major methods: conserved spread and non-conserved spread.<ref>Newman, M., Barabási, A.-L., Watts, D.J. [eds.] (2006) The Structure and Dynamics of Networks. Princeton, N.J.: Princeton University Press.</ref> In conserved spread, the total amount of content that enters a complex network remains constant as it passes through. The model of conserved spread can best be represented by a pitcher containing a fixed amount of water being poured into a series of funnels connected by tubes. Here, the pitcher represents the original source and the water is the content being spread. The funnels and connecting tubing represent the nodes and the connections between nodes, respectively. As the water passes from one funnel into another, it drains instantly from the funnel it has just left.

In non-conserved spread, the amount of content changes as it enters and passes through a complex network. The model of non-conserved spread can best be represented by a continuously running faucet feeding water through a series of funnels connected by tubes. Here, the amount of water from the original source is infinite. Also, any funnels that have been exposed to the water continue to carry water even as it passes into successive funnels. The non-conserved model is the most suitable for explaining the transmission of most [[infectious diseases]].
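The distinction can be made concrete in a few lines of Octave/MATLAB. The sketch below is an illustration added here (it does not come from the cited sources; the path graph and all variable names are arbitrary): conserved spread redistributes a fixed total along links, like the pitcher of water, while non-conserved spread lets every reached node keep the content and pass copies onward, like the running faucet.

<source lang="matlab">
% Conserved vs. non-conserved spread on a small undirected path graph
% 1-2-3-4, with all content starting at node 1.  Illustrative only.
A = [0 1 0 0; 1 0 1 0; 0 1 0 1; 0 0 1 0];  % adjacency matrix
W = A ./ sum(A, 1);         % column-stochastic: content split over links
x = [1; 0; 0; 0];           % conserved: one unit of content at node 1
y = logical([1; 0; 0; 0]);  % non-conserved: node 1 "has" the content
for t = 1:5
  x = W * x;                % total sum(x) stays exactly 1
  y = y | (A * y > 0);      % reached nodes stay reached and keep spreading
end
fprintf('conserved total: %.2f, nodes reached: %d\n', sum(x), sum(y));
</source>

After a few steps the conserved total is still 1, merely spread out over the nodes, while the non-conserved indicator has reached every connected node, which is why the non-conserved model suits infectious-disease transmission.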
==Interdependent networks==
Interdependent networks are systems of coupled networks in which nodes of one or more networks depend on nodes in other networks. Such dependencies are enhanced by the developments in modern technology. Dependencies may lead to cascading failures between the networks, and a relatively small failure can lead to a catastrophic breakdown of the system. Blackouts are a striking demonstration of the important role played by the dependencies between networks. A recent study developed a framework to study the cascading failures in an interdependent networks system.<ref>{{Cite journal|author = S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, S. Havlin |title = Catastrophic cascade of failures in interdependent networks|journal = Nature |volume = 464 |pages = 1025–28 |year = 2010 | doi=10.1038/nature08932 |issue=7291|url=http://havlin.biu.ac.il/Publications.php?keyword=Catastrophic+cascade+of+failures+in+interdependent+networks&year=*&match=all}}</ref><ref>{{cite journal|author=Jianxi Gao, Sergey V. Buldyrev, Shlomo Havlin, H. Eugene Stanley|title=Robustness of a Network of Networks|journal=Phys. Rev. Lett.|year=2011|volume=107|pages=195701|url=http://havlin.biu.ac.il/Publications.php?keyword=Robustness+of+a+Tree-like+Network+of+Interdependent+Networks&year=*&match=all | doi = 10.1103/PhysRevLett.107.195701 }}</ref>

==Network optimization==
Network problems that involve finding an optimal way of doing something are studied under the name of [[combinatorial optimization]]. Examples include [[flow network|network flow]], the [[shortest path problem]], the [[transport problem]], the [[transshipment problem]], the [[location problem]], the [[Matching (graph theory)|matching problem]], the [[assignment problem]], the [[packing problem]], the [[routing|routing problem]], [[Critical Path Analysis]] and [[PERT]] (Program Evaluation & Review Technique).

==Network Analysis and Visualization Tools==
*[[Graph-tool]] and [[NetworkX]], [[Free Software|free]] and efficient Python modules for manipulation and statistical analysis of networks. [http://graph-tool.skewed.de/] [http://networkx.lanl.gov/]
*[[Orange (software)|Orange]], a free data mining software suite, module [http://www.ailab.si/orange/doc/modules/orngNetwork.htm orngNetwork]
*[http://pajek.imfm.si/doku.php Pajek], program for (large) network analysis and visualization.
*[[Tulip (software)|Tulip]], a free data mining and visualization software dedicated to the analysis and visualization of relational data. [http://tulip.labri.fr/]

==See also==
* [[Complex network]]
* [[Collaborative innovation network]]
* [[Constructal law]]<ref>[http://www.constructal.org/en/art/Phil.%20Trans.%20R.%20Soc.%20B%20%282010%29%20365,%201335%961347.pdf] Bejan A., Lorente S., The Constructal Law of Design and Evolution in Nature. ''Philosophical Transactions of the Royal Society B'', Biological Science, Vol. 365, 2010, pp. 1335–1347.</ref>
* [[Dynamic network analysis]]
* [[Higher category theory]]
* [[Immune network theory]]
* [[Irregular warfare]]
* [[Network analyzer]]
* [[Network Theory in Risk Assessment]]
* [[Network topology]]
* [[Percolation]]
* [[Polytely]]
* [[Scale-free networks]]
* [[Sequential dynamical system]]
* [[Small-world networks]]
* [[Social-circles network model|Social circles]]
* [[Systems theory]]

==References==
* "Network Science Center," http://www.dodccrp.org/files/Network_Science_Center.asf
* "Connected: The Power of Six Degrees," http://ivl.slis.indiana.edu/km/movies/2008-talas-connected.mov
* R. Cohen, K. Erez, D. ben-Avraham, S. Havlin, "[http://havlin.biu.ac.il/Publications.php?keyword=Resilience+of+the+Internet+to+random+breakdown&year=*&match=all Resilience of the Internet to random breakdown]" Phys. Rev. Lett. 85, 4626 (2000).
* R. Cohen, K. Erez, D. ben-Avraham, S. Havlin, "[http://havlin.biu.ac.il/Publications.php?keyword=Breakdown+of+the+Internet+under+intentional+attack&year=*&match=all Breakdown of the Internet under intentional attack]" Phys. Rev. Lett. 86, 3682 (2001).
* R. Cohen, S. Havlin, "[http://havlin.biu.ac.il/Publications.php?keyword=Scale-free+networks+are+ultrasmall&year=*&match=all Scale-free networks are ultrasmall]" Phys. Rev. Lett. 90, 058701 (2003).

==Further reading==
* "The Burgeoning Field of Network Science," http://themilitaryengineer.com/index.php?option=com_content&task=view&id=88
* S.N. Dorogovtsev and J.F.F. Mendes, ''Evolution of Networks: From biological networks to the Internet and WWW'', Oxford University Press, 2003, ISBN 0-19-851590-1
* ''Linked: The New Science of Networks'', A.-L. Barabási (Perseus Publishing, Cambridge, 2002)
* ''[http://www.nap.edu/catalog.php?record_id=11516 Network Science]'', Committee on Network Science for Future Army Applications, National Research Council. The National Academies Press (2005) ISBN 0-309-10026-7
* ''Network Science Bulletin'', USMA (2007) ISBN 978-1-934808-00-9
* ''The Structure and Dynamics of Networks'', Mark Newman, Albert-László Barabási, & Duncan J. Watts (Princeton University Press, 2006) ISBN 0-691-11357-2
* ''Dynamical processes on complex networks'', Alain Barrat, Marc Barthelemy, Alessandro Vespignani (Cambridge University Press, 2008) ISBN 978-0-521-87950-7
* ''Network Science: Theory and Applications'', Ted G. Lewis (Wiley, March 11, 2009) ISBN 0470331887
* ''Nexus: Small Worlds and the Groundbreaking Theory of Networks'', Mark Buchanan (W. W. Norton & Company, June 2003) ISBN 0393324427
* ''Six Degrees: The Science of a Connected Age'', Duncan J. Watts (W. W. Norton & Company, February 17, 2004) ISBN 0393325423
* [http://netwiki.amath.unc.edu/ netwiki] Scientific wiki dedicated to network theory
* [http://www.networkcultures.org/networktheory/ New Network Theory] International Conference on 'New Network Theory'
* [http://nwb.slis.indiana.edu/ Network Workbench]: A Large-Scale Network Analysis, Modeling and Visualization Toolkit
* [http://www.orgnet.com/SocialLifeOfRouters.pdf Network analysis of computer networks]
* [http://www.orgnet.com/orgnetmap.pdf Network analysis of organizational networks]
* [http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/941/863 Network analysis of terrorist networks]
* [http://www.orgnet.com/AJPH2007.pdf Network analysis of a disease outbreak]
* [http://linkanalysis.wlv.ac.uk/ Link Analysis: An Information Science Approach] (book)
* [http://gephi.org/2008/how-kevin-bacon-cured-cancer/ Connected: The Power of Six Degrees] (documentary)
* [http://havlin.biu.ac.il/Publications.php?keyword=Identification+of+influential+spreaders+in+complex+networks++&year=*&match=all Influential Spreaders in Networks], M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, H.A. Makse, ''Nature Physics'' 6, 888 (2010)
* [http://havlin.biu.ac.il/course4.php A short course on complex networks]

==External links==
* [http://www.netscience.usma.edu Network Science Center at the U.S. Military Academy at West Point, NY]
* http://press.princeton.edu/titles/8114.html
* http://www.cra.org/ccc/NSE.ppt.pdf
* http://www.ifr.ac.uk/netsci08/
* [http://www.fis.ua.pt/grupoteorico/gteorico.htm GNET] &mdash; Group of Complex Systems & Random Networks
* http://www.netsci09.net/
* [http://cns.slis.indiana.edu/cyber.html Cyberinfrastructure]
* [http://www.prospectmagazine.co.uk/2010/02/let%E2%80%99s-all-be-friends/ Prof. Nicholas A Christakis' introduction to network science in Prospect magazine]
* [http://havlin.biu.ac.il/videos1.php Video Lectures on complex networks] by [[Shlomo Havlin|Prof. Shlomo Havlin]]

==Notes==
{{reflist}}

[[Category:Cybernetics]]
[[Category:Networks]]
[[Category:Network theory]]

[[de:Netzwerkforschung]]
[[es:Análisis de redes]]
[[fr:Science des réseaux]]
[[ko:네트워크 이론]]
[[nl:Netwerkwetenschap]]
[[pt:Análise de rede]]
[[fi:Verkkoteoria]]'
New page wikitext, after the edit (new_wikitext)
''''Network science''' is an area of [[computer science]] and part of [[graph theory]] that examines the interconnections among diverse physical or [[communication networks|engineered networks]], [[information networks]], [[Biological neural network|biological networks]], cognitive and [[semantic networks]], and [[social networks]]. This field of science seeks to discover common principles, algorithms and tools that govern network behavior. The National Research Council defines Network Science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

== Background and history ==
The study of networks has emerged in diverse disciplines as a means of analyzing complex relational data. The earliest known paper in this field is the famous [[Seven Bridges of Königsberg]] written by [[Leonhard Euler]] in 1736. Euler's mathematical description of vertices and edges was the foundation of [[graph theory]], a branch of mathematics that studies the properties of pairwise relations in a network structure. The field of [[graph theory]] continued to develop and found applications in chemistry (Sylvester, 1878).

In the 1930s [[Jacob Moreno]], a psychologist in the Gestalt tradition, arrived in the United States. He developed the sociogram and presented it to the public in April 1933 at a convention of medical scholars. Moreno claimed that "before the advent of sociometry no one knew what the interpersonal structure of a group 'precisely' looked like" (Moreno, 1953). The sociogram was a representation of the social structure of a group of elementary school students. The boys were friends of boys and the girls were friends of girls, with the exception of one boy who said he liked a single girl. The feeling was not reciprocated. This network representation of social structure was found so intriguing that it was printed in [[The New York Times]] (April 3, 1933, page 17). The sociogram has found many applications and has grown into the field of [[social network analysis]].

Probabilistic theory in network science developed as an offshoot of [[graph theory]] with [[Paul Erdős]] and [[Alfréd Rényi]]'s eight famous papers on [[random graphs]]. For [[social networks]] the [[exponential random graph model]] or p* is a notational framework used to represent the probability space of a tie occurring in a [[social network]]. An alternate approach to network probability structures is the [[network probability matrix]], which models the probability of edges occurring in a network, based on the historic presence or absence of the edge in a sample of networks.

In 1998, David Krackhardt and Kathleen Carley introduced the idea of a meta-network with the PCANS Model. They suggest that "all organizations are structured along these three domains, Individuals, Tasks, and Resources". Their paper introduced the concept that networks occur across multiple domains and that they are interrelated. This field has grown into another sub-discipline of network science called [[dynamic network analysis]].

More recently other network science efforts have focused on mathematically describing different network topologies. Duncan Watts reconciled empirical data on networks with mathematical representation, describing the [[small-world network]].
[[Albert-László Barabási]] and Réka Albert developed the [[scale-free network]], a loosely defined network topology that contains hub vertices with many connections, and that grows in a way that maintains an approximately constant ratio between the number of connections at the hubs and the number at all other nodes. Although many networks, such as the internet, appear to maintain this aspect, other networks have long-tailed degree distributions that only approximate scale-free ratios. Today, network science is a growing field in which scientists from many diverse disciplines are working together. Network science holds the promise of increasing collaboration across disciplines, by sharing data, algorithms, and software tools.

== Network Models ==
Network models serve as a foundation for understanding interactions within empirical complex networks. Various [[random graph]] generation models produce network structures that may be used in comparison to real-world complex networks.
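As a concrete illustration of this model-versus-data comparison (a sketch added here, not part of the article; all names and parameter values are arbitrary), the Octave/MATLAB fragment below samples an Erdős–Rényi random graph, whose degrees concentrate near the mean, and a preferential-attachment graph in the spirit of Barabási and Albert, which develops the high-degree hubs discussed above.

<source lang="matlab">
% Degree statistics of two random-graph models; illustrative sketch only.
n = 1000;

% Erdos-Renyi G(n, p): every edge present independently with probability p
p = 0.01;
A = rand(n) < p;
A = triu(A, 1);                  % upper triangle only: no self-loops
A = A | A';                      % symmetrize into an undirected graph
deg_er = sum(A, 2);

% Preferential attachment: each new node links to m existing nodes chosen
% with probability proportional to their current degree (multi-edges are
% tolerated for simplicity)
m = 5;
deg = zeros(n, 1);
deg(1:m+1) = m;                  % fully connected seed of m+1 nodes
ends = repmat((1:m+1)', m, 1);   % multiset of edge ends: degree-biased urn
for v = (m+2):n
  picked = ends(randi(numel(ends), m, 1));
  deg(v) = m;
  deg = deg + accumarray(picked, 1, [n 1]);  % count repeated picks correctly
  ends = [ends; picked; v * ones(m, 1)];
end

fprintf('max degree: Erdos-Renyi %d, preferential attachment %d\n', ...
        max(deg_er), max(deg));
</source>

With these (arbitrary) parameters both graphs have roughly the same mean degree, but the preferential-attachment maximum degree is typically several times larger, the signature of hubs in a long-tailed degree distribution.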
===Department of Defense Initiatives===
The U.S. military first became interested in [[network-centric warfare]] as an operational concept based on network science in 1996. John A. Parmentola, the U.S. Army Director for Research and Laboratory Management, proposed to the Army's Board on Science and Technology (BAST) on December 1, 2003 that Network Science become a new Army research area. The BAST, the Division on Engineering and Physical Sciences for the National Research Council (NRC) of the National Academies, serves as a convening authority for the discussion of science and technology issues of importance to the Army and oversees independent Army-related studies conducted by the National Academies. The BAST conducted a study to find out whether identifying and funding a new field of investigation in basic research, Network Science, could help close the gap between what is needed to realize Network-Centric Operations and the current primitive state of fundamental knowledge of networks.

As a result, the BAST issued the NRC study in 2005 titled Network Science (referenced above) that defined a new field of basic research in Network Science for the Army. Based on the findings and recommendations of that study and the subsequent 2007 NRC report titled Strategy for an Army Center for Network Science, Technology, and Experimentation, Army basic research resources were redirected to initiate a new basic research program in Network Science. To build a new theoretical foundation for complex networks, some of the key Network Science research efforts now ongoing in Army laboratories address:
* Mathematical models of network behavior to predict performance with network size, complexity, and environment
* Optimized human performance required for network-enabled warfare
* Networking within ecosystems and at the molecular level in cells.

As initiated in 2004 by Frederick I. Moxley with support he solicited from [[David S. Alberts]], the Department of Defense helped to establish the first Network Science Center in conjunction with the U.S. Army at the United States Military Academy. Subsequently, the U.S. Department of Defense has funded numerous research projects in the area of Network Science. In 2006, the U.S. Army and the United Kingdom (UK) formed the Network and Information Science [[International Technology Alliance]], a collaborative partnership among the Army Research Laboratory, UK Ministry of Defence and a consortium of industries and universities in the U.S. and UK. The goal of the alliance is to perform basic research in support of Network-Centric Operations across the needs of both nations. In 2009, the U.S. Army formed the [[Network Science CTA]], a collaborative research alliance among the [[Army Research Laboratory]], [[CERDEC]], and a consortium of about 30 industrial R&D labs and universities in the U.S. The goal of the alliance is to develop a deep understanding of the underlying commonalities among intertwined social/cognitive, information, and communications networks, and as a result to improve our ability to analyze, predict, design, and influence complex systems interweaving many kinds of networks.

==Network analysis==
===Social network analysis===
'''[[Social network]] analysis''' examines the structure of relationships between social entities.<ref>Wasserman, Stanley and Katherine Faust. 1994. ''Social Network Analysis: Methods and Applications.'' Cambridge: Cambridge University Press.</ref> These entities are often persons, but may also be [[Group (sociology)|groups]], [[organizations]], [[nation states]], [[web sites]], or [[scientometrics|scholarly publications]]. Since the 1970s, the empirical study of networks has played a central role in social science, and many of the [[Mathematics|mathematical]] and [[Statistics|statistical]] tools used for studying networks were first developed in [[sociology]].<ref name="Newman">Newman, M.E.J. ''Networks: An Introduction.'' Oxford University Press. 2010</ref> Amongst many other applications, social network analysis has been used to understand the [[diffusion of innovations]], news and rumors. Similarly, it has been used to examine the spread of both [[epidemiology|diseases]] and [[Medical sociology|health-related behaviors]]. It has also been applied to the [[Economic sociology|study of markets]], where it has been used to examine the role of trust in [[Social exchange|exchange relationships]] and of social mechanisms in setting prices. Similarly, it has been used to study recruitment into [[political movement]]s and social organizations. It has also been used to conceptualize scientific disagreements as well as academic prestige. More recently, network analysis (and its close cousin [[traffic analysis]]) has gained significant use in military intelligence, for uncovering insurgent networks of both hierarchical and [[leaderless resistance|leaderless]] nature.

===Biological network analysis===
With the recent explosion of publicly available high-throughput biological data, the analysis of molecular networks has gained significant interest. The type of analysis in this context is closely related to social network analysis, but often focuses on local patterns in the network. For example, [[network motif]]s are small subgraphs that are over-represented in the network. [[Activity motifs]] are similar over-represented patterns in the attributes of nodes and edges in the network, given the network structure.

===Link analysis===
Link analysis is a subset of network analysis, exploring associations between objects. An example may be examining the addresses of suspects and victims, the telephone numbers they have dialed, and the financial transactions that they have partaken in during a given timeframe, and the familial relationships between these subjects as a part of police investigation. Link analysis here provides the crucial relationships and associations between very many objects of different types that are not apparent from isolated pieces of information.
====PageRank====
[[File:PageRank-hi-res.png|thumb|right|Principles of PageRank]]
PageRank results from a mathematical algorithm based on the [[Graph (mathematics)|graph]] of the web, the [[webgraph]], created by all World Wide Web pages as nodes and [[hyperlink]]s as edges, taking into consideration authority hubs such as [[cnn.com]] or [[usa.gov]]. The rank value indicates the importance of a particular page. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined [[recursion|recursively]] and depends on the number and PageRank metric of all pages that link to it ("[[incoming link]]s"). A page that is linked to by many pages with high PageRank receives a high rank itself. If there are no links to a web page, there is no support for that page.

Numerous academic papers concerning PageRank have been published since Page and Brin's original paper.<ref name="originalpaper">{{cite doi|10.1016/S0169-7552(98)00110-X}}</ref> In practice, the PageRank concept has proven to be vulnerable to manipulation, and extensive research has been devoted to identifying falsely inflated PageRank and ways to ignore links from documents with falsely inflated PageRank. Other link-based ranking algorithms for Web pages include the [[HITS algorithm]] invented by [[Jon Kleinberg]] (used by [[Teoma]] and now [[Ask.com]]), the IBM [[CLEVER project]], and the [[TrustRank]] algorithm.

==History==
PageRank was developed at [[Stanford University]] by [[Larry Page]] (hence the name ''Page''-Rank<ref>{{cite book | author = David Vise and Mark Malseed | year = 2005 | title = The Google Story | url = http://www.thegooglestory.com/ | isbn = 0-553-80457-X | page = 37}}</ref>) and [[Sergey Brin]] as part of a research project about a new kind of search engine.<ref>Page, Larry, [http://web.archive.org/web/20020506051802/www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0072?1 "PageRank: Bringing Order to the Web"], Stanford Digital Library Project, talk. August 18, 1997 (archived 2002)</ref> Sergey Brin had the idea that information on the web could be ordered in a hierarchy by "link popularity": a page ranks higher the more links there are to it.<ref name="gpower">[http://www.google-watch.org/gpower.pdf 187-page study from Graz University, Austria], includes the note that human brains are also used when determining the page rank in Google</ref> The project was co-authored by [[Rajeev Motwani]] and [[Terry Winograd]]. The first paper about the project, describing PageRank and the initial prototype of the [[Google search]] engine, was published in 1998;<ref name="originalpaper" /> shortly after, Page and Brin founded [[Google Inc.]], the company behind the Google search engine. While just one of many factors that determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools.<ref name="googletechnology">{{cite web|url=http://www.google.com/technology/ |title=Google Technology |publisher=Google.com |date= |accessdate=2011-05-27}}</ref>

PageRank was influenced by [[citation analysis]], developed early on by [[Eugene Garfield]] in the 1950s at the University of Pennsylvania, and by [[Hyper Search]], developed by [[Massimo Marchiori]] at the University of Padua.
In the same year PageRank was introduced (1998), [[Jon Kleinberg]] published his important work on [[HITS algorithm|HITS]]. Google's founders cite Garfield, Marchiori, and Kleinberg in their original paper.<ref name="originalpaper" />

A small search engine called "RankDex" from IDD Information Services, designed by [[Robin Li]], had been exploring a similar strategy for site-scoring and page ranking since 1996.<ref>{{cite journal |last=Li |first=Yanhong |date=August 6, 2002 |title=Toward a qualitative search engine |journal=Internet Computing, IEEE |volume=2 |issue=4 |pages=24–29 |publisher=IEEE Computer Society |doi=10.1109/4236.707687}}</ref> The technology in RankDex would be patented by 1999<ref>USPTO, [http://www.google.com/patents?hl=en&lr=&vid=USPAT5920859&id=x04ZAAAAEBAJ&oi=fnd&dq=yanhong+li&printsec=abstract#v=onepage&q=yanhong%20li&f=false "Hypertext Document Retrieval System and Method"], U.S. Patent number: 5920859, Inventor: Yanhong Li, Filing date: Feb 5, 1997, Issue date: Jul 6, 1999</ref> and used later when Li founded [[Baidu]] in China.<ref>Greenberg, Andy, [http://www.forbes.com/forbes/2009/1005/technology-baidu-robin-li-man-whos-beating-google_2.html "The Man Who's Beating Google"], ''Forbes'' magazine, October 05, 2009</ref><ref>[http://www.rankdex.com/about.html "About: RankDex"], ''rankdex.com''</ref> Li's work would be referenced by some of Larry Page's U.S. patents for his Google search methods.<ref>Cf. especially Lawrence Page, U.S. patents 6,799,176 (2004) "Method for scoring documents in a linked database", 7,058,628 (2006) "Method for node ranking in a linked database", and 7,269,587 (2007) "Scoring documents in a linked database"</ref>

==Algorithm==
PageRank is a [[probability distribution]] used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes, called "iterations", through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.

A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a "50% chance" of something happening. Hence, a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank.

===Simplified algorithm===
Assume a small universe of four web pages: '''A''', '''B''', '''C''' and '''D'''. The initial approximation of PageRank would be evenly divided between these four documents. Hence, each document would begin with an estimated PageRank of 0.25. In the original form of PageRank, initial values were simply 1, meaning that the sum of PageRank over all pages was the total number of pages on the web at that time. Later versions of PageRank (see the formulas below) assume a probability distribution between 0 and 1. Here a simple probability distribution will be used, hence the initial value of 0.25.

If pages '''B''', '''C''', and '''D''' each only link to '''A''', they would each confer 0.25 PageRank to '''A'''. All PageRank '''PR( )''' in this simplistic system would thus gather to '''A''' because all links would be pointing to '''A''':
:<math>PR(A)= PR(B) + PR(C) + PR(D).\,</math>
This is 0.75.
Suppose instead that page '''B''' has a link to page '''C''' as well as to page '''A''', while page '''D''' has links to all three pages. The ''value of the link-votes is divided among all the outbound links on a page''. Thus, page '''B''' gives a vote worth 0.125 to page '''A''' and a vote worth 0.125 to page '''C'''. Only one third of '''D'''<nowiki>'</nowiki>s PageRank is counted for A's PageRank (approximately 0.083):
:<math>PR(A)= \frac{PR(B)}{2}+ \frac{PR(C)}{1}+ \frac{PR(D)}{3}.\,</math>
In other words, the PageRank conferred by an outbound link is equal to the document's own PageRank score divided by the number of outbound links '''L( )''' (it is assumed that links to specific URLs only count once per document):
:<math>PR(A)= \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}. \,</math>
In the general case, the PageRank value for any page '''u''' can be expressed as
:<math>PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)},</math>
i.e. the PageRank value for a page '''u''' is dependent on the PageRank values for each page '''v''' out of the set '''B<sub>u</sub>''' (this set contains all pages linking to page '''u'''), divided by the number ''L''(''v'') of links from page '''v'''.
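These formulas are easy to check numerically. The short Octave/MATLAB fragment below (an illustration added here, not from the original paper; the matrix layout and variable names are ours) performs one step of the simplified update on the example above, with pages ordered A, B, C, D.

<source lang="matlab">
% One step of the simplified update PR(u) = sum_{v in B_u} PR(v)/L(v)
% for the example: B links to A and C; C links to A; D links to A, B, C.
% M(i,j) = 1/L(j) if page j links to page i, and 0 otherwise.
M = [0   1/2  1  1/3;
     0   0    0  1/3;
     0   1/2  0  1/3;
     0   0    0  0  ];
PR = [0.25; 0.25; 0.25; 0.25];   % initial uniform values
PR = M * PR;                     % PR(1) is now 0.125 + 0.25 + 0.083 ~ 0.458
disp(PR');
</source>

Note that page '''A''' has no outbound links here, so its column of <code>M</code> is all zeros; iterating this update would leak rank out of the system, which is one motivation for the damping factor and sink handling introduced next.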
===Damping factor===
The PageRank theory holds that even an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor ''d''. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.<ref name="originalpaper" />

The damping factor is subtracted from 1 (and in some variations of the algorithm, the result is divided by the number of documents (''N'') in the collection) and this term is then added to the product of the damping factor and the sum of the incoming PageRank scores. That is,
:<math>PR(A) = {1 - d \over N} + d \left( \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}+\,\cdots \right).</math>
So any page's PageRank is derived in large part from the PageRanks of other pages. The damping factor adjusts the derived value downward. The original paper, however, gave the following formula, which has led to some confusion:
:<math>PR(A)= 1 - d + d \left( \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}+\,\cdots \right).</math>
The difference between them is that the PageRank values in the first formula sum to one, while in the second formula each PageRank is multiplied by ''N'' and the sum becomes ''N''. A statement in Page and Brin's paper that "the sum of all PageRanks is one"<ref name="originalpaper" /> and claims by other Google employees<ref>[[Matt Cutts]]'s blog: [http://www.mattcutts.com/blog/seo-for-bloggers/ Straight from Google: What You Need to Know], see page 15 of his slides.</ref> support the first variant of the formula above.

To be more specific, in the latter formula the probability of the random surfer reaching a page is weighted by the total number of web pages. So, in this version PageRank is an expected value for the random surfer visiting a page, when he restarts this procedure as often as the web has pages. If the web had 100 pages and a page had a PageRank value of 2, the random surfer would reach that page an average of twice if he restarted 100 times. Basically, the two formulas do not differ fundamentally from each other. A PageRank that has been calculated by using the former formula has to be multiplied by the total number of web pages to get the corresponding PageRank that would have been calculated by using the latter formula. Even Page and Brin mixed up the two formulas in their most popular paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine", where they claim the latter formula to form a probability distribution over web pages with the sum of all pages' PageRanks being one.<ref name="originalpaper" />

Google recalculates PageRank scores each time it crawls the Web and rebuilds its index. As Google increases the number of documents in its collection, the initial approximation of PageRank decreases for all documents.<!-- If a million new documents are added and ALL of them point to page "A", does the initial approximation of A's PageRank decrease? Or should it say the AVERAGE PageRank of all old document decreases, rather than that all of them, unanimously, decrease? -->

The formula uses a model of a ''random surfer'' who gets bored after several clicks and switches to a random page. The PageRank value of a page reflects the chance that the random surfer will land on that page by clicking on a link. It can be understood as a [[Markov chain]] in which the states are pages, and the transitions, which are all equally probable, are the links between pages.

If a page has no links to other pages, it becomes a sink and therefore terminates the random surfing process. If the random surfer arrives at a sink page, it picks another [[Uniform Resource Locator|URL]] at random and continues surfing again. When calculating PageRank, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are therefore divided evenly among all other pages. In other words, to be fair with pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability usually set to ''d''&nbsp;=&nbsp;0.85, estimated from the frequency that an average surfer uses his or her browser's bookmark feature. So, the equation is as follows:
:<math>PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR (p_j)}{L(p_j)}</math>
where <math>p_1, p_2, ..., p_N</math> are the pages under consideration, <math>M(p_i)</math> is the set of pages that link to <math>p_i</math>, <math>L(p_j)</math> is the number of outbound links on page <math>p_j</math>, and ''N'' is the total number of pages.

The PageRank values are the entries of the dominant [[eigenvector]] of the modified [[adjacency matrix]]. This makes PageRank a particularly elegant metric: the eigenvector is
:<math> \mathbf{R} = \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix} </math>
where '''R''' is the solution of the equation
:<math> \mathbf{R} = \begin{bmatrix} {(1-d)/ N} \\ {(1-d) / N} \\ \vdots \\ {(1-d) / N} \end{bmatrix} + d \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\ \ell(p_2,p_1) & \ddots & & \vdots \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_N,p_1) & \cdots & & \ell(p_N,p_N) \end{bmatrix} \mathbf{R} </math>
where the adjacency function <math>\ell(p_i,p_j)</math> is 0 if page <math>p_j</math> does not link to <math>p_i</math>, and normalized such that, for each ''j'',
:<math>\sum_{i = 1}^N \ell(p_i,p_j) = 1</math>,
i.e. the elements of each column sum up to 1, so the matrix is a [[stochastic matrix]] (for more details see the [[#Computation|computation]] section below).
Thus this is a variant of the [[eigenvector centrality]] measure used commonly in [[network theory|network analysis]]. Because of the large eigengap of the modified adjacency matrix above,<ref>{{cite journal | author = Taher Haveliwala and Sepandar Kamvar. | year = 2003 | month = March | page = 7056 | title = The Second Eigenvalue of the Google Matrix | journal = Stanford University Technical Report | url = http://www-cs-students.stanford.edu/~taherh/papers/secondeigenvalue.pdf |format=PDF | bibcode = 2003math......7056N | arxiv = math/0307056 | class = math.FA }}</ref> the values of the PageRank eigenvector are fast to approximate (only a few iterations are needed).

As a result of [[Markov process|Markov theory]], it can be shown that the PageRank of a page is the probability of being at that page after many clicks. This happens to equal <math>t^{-1}</math> where <math>t</math> is the [[expected value|expectation]] of the number of clicks (or random jumps) required to get from the page back to itself.

The main disadvantage of PageRank is that it favors older pages, because a new page, even a very good one, will not have many links unless it is part of an existing site (a site being a densely connected set of pages, such as [[Wikipedia]]). The Google Directory (itself a derivative of the [[Open Directory Project]]) allows users to see results sorted by PageRank within categories. The Google Directory is the only service offered by Google where PageRank directly determines display order.{{Citation needed|date=October 2009}} In Google's other search services (such as its primary Web search) PageRank is used to weight the relevance scores of pages shown in search results.

Several strategies have been proposed to accelerate the computation of PageRank.<ref>{{cite journal |doi=10.1.1.118.5422 | title=Fast PageRank Computation via a Sparse Linear System | author= Gianna M. Del Corso, Antonio Gullí, Francesco Romani | journal=Internet Mathematics | year = 2005 | volume = 2 | issue = 3}}</ref> Various strategies to manipulate PageRank have been employed in concerted efforts to improve search results rankings and monetize advertising links. These strategies have severely impacted the reliability of the PageRank concept, which seeks to determine which documents are actually highly valued by the Web community. Google is known to penalize [[link farm]]s and other schemes designed to artificially inflate PageRank. In December 2007 Google started ''actively'' penalizing sites selling paid text links. How Google identifies link farms and other PageRank manipulation tools is among Google's [[trade secret]]s.

===Computation===
To summarize, PageRank can be computed either iteratively or algebraically. The iterative method can also be viewed as the [[power iteration]] method,<ref>{{cite conference |doi=10.1.1.18.5264 | title=PageRank computation and the structure of the web: Experiments and algorithms | author=Arasu, A. and Novak, J. and Tomkins, A. and Tomlin, J. | year=2002 | booktitle = Proceedings of the Eleventh International World Wide Web Conference, Poster Track | pages = 107–117 | location = Brisbane, Australia }}</ref><ref>{{cite arxiv |eprint=1002.2858 |author1=Massimo Franceschet |title=PageRank: Standing on the shoulders of giants |class=cs.IR |year=2010}}</ref> or power method. The basic mathematical operations performed in the iterative method and the power method are identical.
====Iterative==== In the former case, at <math>t=0</math>, an initial probability distribution is assumed, usually :<math>PR(p_i; 0) = \frac{1}{N}</math>. At each time step, the computation, as detailed above, yields :<math>PR(p_i;t+1) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR (p_j; t)}{L(p_j)}</math>, or in matrix notation :<math>\mathbf{R}(t+1) = d \mathcal{M}\mathbf{R}(t) + \frac{1-d}{N} \mathbf{1}</math>, &nbsp; &nbsp; &nbsp; (*) where <math>\mathbf{R}_i(t)=PR(p_i; t)</math> and <math>\mathbf{1}</math> is the column vector of length <math>N</math> containing only ones. The matrix <math>\mathcal{M}</math> is defined as : <math>\mathcal{M}_{ij} = \begin{cases} 1 /L(p_j) , & \mbox{if }j\mbox{ links to }i\ \\ 0, & \mbox{otherwise} \end{cases} </math> i.e., :<math>\mathcal{M} := (K^{-1} A)^T</math>, where <math>A</math> denotes the [[adjacency matrix]] of the graph and <math>K</math> is the diagonal matrix with the outdegrees in the diagonal. The computation ends when for some small <math>\epsilon</math> :<math>|\mathbf{R}(t+1) - \mathbf{R}(t)| < \epsilon</math>, i.e., when convergence is assumed. ====Algebraic==== In the latter case, for <math>t \to \infty</math> (i.e., in the [[steady state]]), the above equation (*) reads :<math>\mathbf{R} = d \mathcal{M}\mathbf{R} + \frac{1-d}{N} \mathbf{1}</math>. &nbsp; &nbsp; &nbsp; (**) The solution is given by :<math>\mathbf{R} = (\mathbf{I}-d \mathcal{M})^{-1} \frac{1-d}{N} \mathbf{1}</math>, with the [[identity matrix]] <math>\mathbf{I}</math>. The solution exists and is unique for <math>0 < d < 1</math>. This can be seen by noting that <math>\mathcal{M}</math> is by construction a [[stochastic matrix]] and hence has an eigenvalue equal to one because of the [[Perron–Frobenius theorem]]. ====Power Method==== If the matrix <math>\mathcal{M}</math> is a transition probability, i.e., column-stochastic with no columns consisting of just zeros and <math>\mathbf{R}</math> is a probability distribution (i.e., <math>|\mathbf{R}|=1</math>, <math>\mathbf{E}\mathbf{R}=1</math> where <math>\mathbf{E}</math> is matrix of all ones), Eq. (**) is equivalent to :<math>\mathbf{R} = \left( d \mathcal{M} + \frac{1-d}{N} \mathbf{E} \right)\mathbf{R} =: \widehat{ \mathcal{M}} \mathbf{R}</math>. &nbsp; &nbsp; &nbsp; (***) Hence PageRank <math>\mathbf{R}</math> is the principal eigenvector of <math>\widehat{\mathcal{M}}</math>. A fast and easy way to compute this is using the [[Power iteration|power method]]: starting with an arbitrary vector <math>x(0)</math>, the operator <math>\widehat\mathcal{M}</math> is applied in succession, i.e., :<math> x(t+1) = \widehat{\mathcal{M}} x(t)</math>, until :<math>|x(t+1) - x(t)| < \epsilon</math>. Note that in Eq. (***) the matrix on the right-hand side in the parenthesis can be interpreted as :<math> \frac{1-d}{N} \mathbf{I} = (1-d)\mathbf{P} \mathbf{1}^t</math>, where <math>\mathbf{P}</math> is an initial probability distribution. In the current case :<math>\mathbf{P} := \frac{1}{N} \mathbf{1}</math>. Finally, if <math>\mathcal{M}</math> has columns with only zero values, they should be replaced with the initial probability vector <math>\mathbf{P}</math>. 
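To illustrate, here is a minimal octave/matlab sketch (using the same assumed three-page example as in the worked example above, not taken from the original paper) comparing the algebraic solution of Eq. (**) with the iteration of Eq. (*):

<source lang="matlab">
% Column-stochastic link matrix for the assumed example:
% page 1 links to pages 2 and 3, page 2 links to page 3, page 3 links to page 1.
M = [0    0    1  ;
     0.5  0    0  ;
     0.5  1    0 ];
N = size(M, 2);
d = 0.85;

% Algebraic solution of (**): R = (I - d*M)^-1 * ((1-d)/N) * 1
R_algebraic = (eye(N) - d .* M) \ (((1 - d) / N) .* ones(N, 1));

% Iterative solution of (*): repeat R <- d*M*R + ((1-d)/N) * 1
R_iterative = ones(N, 1) ./ N;                    % initial uniform distribution
for t = 1:100
    R_iterative = d .* M * R_iterative + ((1 - d) / N) .* ones(N, 1);
end

disp([R_algebraic R_iterative])   % both columns approx. [0.388; 0.215; 0.397]
</source>

Since this example has no dangling pages (no zero columns in <math>\mathcal{M}</math>), no extra normalization is needed and the two methods agree to within floating-point error.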
A PageRank matlab/octave implementation:

<source lang="matlab">
% Parameter M                  adjacency matrix where M(i,j) represents the link
%                              from page j to page i, such that every column sums to 1
% Parameter d                  damping factor
% Parameter v_quadratic_error  convergence threshold for v
% Return    v                  vector of ranks such that v(i) is the rank of page i
% (note: this function shadows octave/matlab's built-in rank())
function [v] = rank(M, d, v_quadratic_error)

N = size(M, 2);                                      % number of pages
v = rand(N, 1);                                      % random initial vector
v = v ./ norm(v, 2);
last_v = ones(N, 1) * inf;
M_hat = (d .* M) + (((1 - d) / N) .* ones(N, N));    % the matrix of Eq. (***)

while (norm(v - last_v, 2) > v_quadratic_error)
    last_v = v;
    v = M_hat * v;                                   % power iteration step
    v = v ./ norm(v, 2);                             % 2-norm normalization; divide by
                                                     % the 1-norm for a probability vector
end
</source>

Code example for the octave/matlab rank function:

<source lang="matlab">
M = [0   0   0   0   1 ;
     0.5 0   0   0   0 ;
     0.5 0   0   0   0 ;
     0   1   0.5 0   0 ;
     0   0   0.5 1   0];
rank(M, 0.80, 0.001)
</source>

====Efficiency====
Depending on the framework used to perform the computation, the exact implementation of the methods, and the required accuracy of the result, the computation time of these methods can vary greatly.

==Variations==
===Google Toolbar===
The [[Google Toolbar]]'s PageRank feature displays a visited page's PageRank as a whole number between 0 and 10: the most popular websites have a PageRank of 10, the least a PageRank of 0. Google has not disclosed the precise method for determining a Toolbar PageRank value; the displayed value is not the actual value Google uses, so it serves only as a rough guide. Toolbar PageRank differs from Google PageRank in that the value displayed in the toolbar does not fully reflect the way Google judges the value of a website.

PageRank measures the number of sites that link to a particular page.<ref>[http://www.google.com/support/forum/p/Webmasters/thread?tid=4aeb4d5fce33350b&hl=en Google Webmaster central] discussion on PR</ref> The PageRank of a particular page is roughly based upon the quantity of inbound links as well as the PageRank of the pages providing the links. Other factors also enter the algorithm, such as the size of a page, the number of changes and its up-to-dateness, the key texts in headlines, and the words of hyperlinked anchor texts.<ref name="gpower"/>

The Google Toolbar's PageRank is updated infrequently, so it often shows out-of-date values. The last confirmed update was 27 June 2011,<ref name="autogenerated1">http://googlepagerankupdate.com{{cite web|title=Google Page Rank Update |date=May 29, 2011}}</ref><ref>[http://www.seroundtable.com/pagerank-update-jan11-12832.html SeRoundTable] January 2011 Google Toolbar PageRank Update</ref> but there was previously no confirmed update for more than a year. <!-- Please cite Google-confirmed references *only* about the PR updates, lots of unconfirmed rumors on the web -->

===SERP Rank===
The [[Search engine results page]] (SERP) is the actual result returned by a search engine in response to a keyword query.
The SERP consists of a list of links to web pages with associated text snippets. The SERP rank of a web page refers to the placement of the corresponding link on the SERP, where higher placement means higher SERP rank. The SERP rank of a web page is a function not only of its PageRank, but of a relatively large and continuously adjusted set of factors (over 200),<ref name="Vaughn09">{{cite web | author = Aubuchon, Vaughn | title = Google Ranking Factors - SEO Checklist | url = http://www.vaughns-1-pagers.com/internet/google-ranking-factors.htm }}</ref><ref>{{cite web | url = http://www.seomoz.org/article/search-ranking-factors | title = Search Engine Ranking Factors - Version 2 | first = Rand | last = Fishkin | authorlink = Rand Fishkin | coauthors = Jeff Pollard | publisher = seomoz.org | date = April 2, 2007 | accessdate = May 11, 2009 }}</ref> commonly referred to by internet marketers as "Google Love".<ref>http://www.infoworld.com/t/search-engines/google-corrupt-search-me-428</ref> [[Search engine optimization]] (SEO) is aimed at achieving the highest possible SERP rank for a website or a set of web pages.

With the introduction of [[Google Places]] into the mainstream organic SERP, PageRank plays little to no role in ranking a business in the Local Business Results.<ref>{{cite web|url=http://google.com/support/places/bin/answer.py?hl=en&answer=7091 |title=Ranking of listings : Ranking - Google Places Help |publisher=Google.com |accessdate=2011-05-27}}</ref> While citation analysis still enters the algorithm, PageRank is not a factor, since Google ranks business listings rather than web pages.

===Google directory PageRank===
The [[Google Directory]] PageRank is an 8-unit measurement. These values can be viewed in the Google Directory. Unlike the Google Toolbar, which shows the PageRank value on a mouseover of the green bar, the Google Directory does not show the PageRank as a numeric value but only as a green bar.

===False or spoofed PageRank===
While the PageRank shown in the Toolbar is considered to be derived from an accurate PageRank value (at some time prior to its publication by Google) for most sites, this value was at one time easily manipulated. A previous flaw was that any low-PageRank page that was redirected, via an [[HTTP 302]] response or a "Refresh" [[meta tag]], to a high-PageRank page caused the lower-PageRank page to acquire the PageRank of the destination page. In theory a new, PR 0 page with no incoming links could have been redirected to the Google home page, which is a PR 10, and the PR of the new page would then be upgraded to a PR 10. This [[Website spoofing|spoofing]] technique, also known as [[Page hijacking|302 Google Jacking]], was a known failing or bug in the system. Any page's PageRank could have been spoofed to a higher or lower number of the webmaster's choice, and only Google has access to the real PageRank of a page. Spoofing is generally detected by running a Google search for a URL with questionable PageRank, as the results will display the URL of an entirely different site (the one redirected to).

===Manipulating PageRank===
For [[search engine optimization]] purposes, some companies offer to sell high-PageRank links to webmasters.<ref name="Cutts-0414"/> As links from higher-PR pages are believed to be more valuable, they tend to be more expensive.
It can be an effective and viable marketing strategy to buy link advertisements on content pages of quality, relevant sites in order to drive traffic and increase a webmaster's link popularity. However, Google has publicly warned webmasters that if they are or were discovered to be selling links for the purpose of conferring PageRank and reputation, their links will be devalued (ignored in the calculation of other pages' PageRanks). The practice of buying and selling links is intensely debated across the webmaster community. Google advises webmasters to use the [[nofollow]] [[HTML]] [[attribute (computing)|attribute]] value on sponsored links. According to [[Matt Cutts]], Google is concerned about webmasters who try to [[game the system]] and thereby reduce the quality and relevancy of Google search results.<ref name="Cutts-0414">{{cite web|publisher=mattcutts.com/blog|date=April 14, 2007|accessdate=2007-05-28|title=How to report paid links|url=http://www.mattcutts.com/blog/how-to-report-paid-links/}}</ref>

===The intentional surfer model===
The original PageRank algorithm reflects the so-called random surfer model, meaning that the PageRank of a particular page is derived from the theoretical probability of visiting that page when clicking on links at random. However, real users do not surf the web randomly; they follow links according to their interests and intentions. A page ranking model that reflects the importance of a particular page as a function of how many actual visits it receives from real users is called the ''intentional surfer model''.<ref name="Jos07">{{Cite book | author = J&oslash;sang, A. | contribution = Trust and Reputation Systems | title = Foundations of Security Analysis and Design IV, FOSAD 2006/2007 Tutorial Lectures | year = 2007 | publisher = Springer LNCS 4677 | editor = Aldini, A. | pages = 209–245 | doi = 10.1007/978-3-540-74810-6 | url = http://www.unik.no/people/josang/papers/Jos2007-FOSAD.pdf |format=PDF | volume = 4677}}</ref> The Google toolbar sends information to Google for every page visited, and thereby provides a basis for computing PageRank based on the intentional surfer model.

The introduction of the [[nofollow]] attribute by Google to combat [[spamdexing]] has the side effect that webmasters commonly use it on outgoing links to increase their own PageRank. This causes a loss of actual links for the Web crawlers to follow, thereby making the original PageRank algorithm, based on the random surfer model, potentially unreliable. Using information about users' browsing habits provided by the Google toolbar partly compensates for the loss of information caused by the nofollow attribute. The [[Search engine results page|SERP]] rank of a page, which determines a page's actual placement in the search results, is based on a combination of the random surfer model (PageRank) and the intentional surfer model (browsing habits) in addition to other factors.<ref name="SEOnotepad09">{{cite web | author = SEOnotepad | title = Myth of the Google Toolbar Ranking | url = http://www.seonotepad.com/search-engines/google-seo/myth-of-the-google-toolbar-ranking/ }}</ref>

===Other uses===
A version of PageRank has recently been proposed as a replacement for the traditional [[Institute for Scientific Information]] (ISI) [[impact factor]],<ref>{{cite journal | author = Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel | year = 2006 | month = December | title = Journal Status | page = 1030 | journal = Scientometrics | volume = 69 | issue = 3 | arxiv = cs.GL/0601030 | bibcode = 2006cs........1030B}}</ref> and implemented at [http://www.eigenfactor.org eigenfactor.org]. Instead of merely counting total citations to a journal, the "importance" of each citation is determined in a PageRank fashion.

A similar use of PageRank is to rank academic doctoral programs based on their records of placing their graduates in faculty positions. In PageRank terms, academic departments link to each other by hiring their faculty from each other (and from themselves).<ref>{{cite journal | author = Benjamin M. Schmidt and Matthew M. Chingos | title = Ranking Doctoral Programs by Placement: A New Method | year = 2007 | journal = PS: Political Science and Politics | volume = 40 | issue = July | pages = 523–529 | url = http://www.people.fas.harvard.edu/~gillum/rankings_paper.pdf |format=PDF}}</ref>

PageRank has been used to rank spaces or streets in order to predict how many people (pedestrians or vehicles) come to the individual spaces or streets.<ref>{{cite journal | author = B. Jiang | title = Ranking spaces for predicting human movement in an urban environment | year = 2006 | journal = International Journal of Geographical Information Science | volume = 23 | issue = 7 | pages = 823–837 | arxiv = physics/0612011 | doi = 10.1080/13658810802022822}}</ref><ref>{{cite journal | author = Jiang B., Zhao S., and Yin J. | title = Self-organized natural roads for predicting traffic flow: a sensitivity study | year = 2008 | journal = Journal of Statistical Mechanics: Theory and Experiment | page = 008 | issue = 07 | volume = P07008 | arxiv = 0804.1630 | doi = 10.1088/1742-5468/2008/07/P07008 | bibcode = 2008JSMTE..07..008J}}</ref>

In [[lexical semantics]] it has been used to perform [[Word Sense Disambiguation]]<ref>Roberto Navigli, Mirella Lapata. [http://www.dsi.uniroma1.it/~navigli/pubs/PAMI_2010_Navigli_Lapata.pdf "An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation"]. ''IEEE Transactions on Pattern Analysis and Machine Intelligence'' (TPAMI), 32(4), IEEE Press, 2010, pp. 678–692.</ref> and to automatically rank [[WordNet]] [[synsets]] according to how strongly they possess a given semantic property, such as positivity or negativity.<ref>{{cite web | author = Andrea Esuli and Fabrizio Sebastiani | title = PageRanking WordNet synsets: An Application to Opinion-Related Properties | work = In Proceedings of the 35th Meeting of the Association for Computational Linguistics, Prague, CZ, 2007, pp. 424–431 | url = http://nmis.isti.cnr.it/sebastiani/Publications/ACL07.pdf |format=PDF | accessdate = June 30, 2007 }}</ref>

A dynamic weighting method similar to PageRank has been used to generate customized reading lists based on the link structure of Wikipedia.<ref>{{cite journal | author = Wissner-Gross, A. D. | url = http://www.alexwg.org/publications/ProcIEEEICALT_6-825.pdf | title = Preparation of topical readings lists from the link structure of Wikipedia | journal = Proceedings of the IEEE International Conference on Advanced Learning Technology | location = Rolduc, Netherlands | year = 2006 | page = 825 | doi = 10.1109/ICALT.2006.1652568}}</ref>

A [[Web crawler]] may use PageRank as one of a number of importance metrics it uses to determine which URL to visit during a crawl of the web.
One of the early working papers<ref>{{cite web | title=Working Papers Concerning the Creation of Google | work=Google | url=http://dbpubs.stanford.edu:8091/diglib/pub/projectdir/google.html | accessdate=November 29, 2006 }}</ref> used in the creation of Google is ''Efficient crawling through URL ordering'',<ref>{{cite journal |author = Cho, J., Garcia-Molina, H., and Page, L. | url = http://dbpubs.stanford.edu:8090/pub/1998-51 | title = Efficient crawling through URL ordering | journal = Proceedings of the seventh conference on World Wide Web | location = Brisbane, Australia | year = 1998}}</ref> which discusses the use of a number of different importance metrics to determine how deeply, and how much of a site, Google will crawl. PageRank is presented as one of these importance metrics, alongside others such as the number of inbound and outbound links for a URL and the distance from the root directory of a site to the URL.

PageRank may also be used as a [http://de.scientificcommons.org/23846375 methodology] to measure the apparent impact of a community like the [[Blogosphere]] on the overall Web itself. This approach therefore uses PageRank to measure the distribution of attention in reflection of the [[Scale-free network]] paradigm.

In any ecosystem, a modified version of PageRank may be used to determine species that are essential to the continuing health of the environment.<ref>{{cite news|last=Burns |first=Judith |url=http://news.bbc.co.uk/2/hi/science/nature/8238462.stm |title=Google trick tracks extinctions |publisher=BBC News |date=2009-09-04 |accessdate=2011-05-27}}</ref> An application of PageRank to the analysis of protein networks in biology has also been reported.<ref>{{cite journal |doi = 10.1093/bioinformatics/btq680 |author = G. Ivan and V. Grolmusz | url = http://bioinformatics.oxfordjournals.org/content/27/3/405 | title = When the Web meets the cell: using personalized PageRank for analyzing protein interaction networks | journal = Bioinformatics | volume = 27 | issue = 3 | pages = 405–407 | year = 2011 |pmid = 21149343}}</ref>

==nofollow==
In early 2005, Google implemented a new value, "[[nofollow]]",<ref>{{cite web | title=Preventing Comment Spam | work=Google | url=http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html | accessdate=January 1, 2005 }}</ref> for the [[Semantic link|rel]] attribute of HTML link and anchor elements, so that website developers and [[blog]]gers can make links that Google will not consider for the purposes of PageRank; such links no longer constitute a "vote" in the PageRank system. The nofollow relationship was added in an attempt to help combat [[spamdexing]].

As an example, people could previously create many message-board posts with links to their website to artificially inflate their PageRank. With the nofollow value, message-board administrators can modify their code to automatically insert "rel='nofollow'" into all hyperlinks in posts, thus preventing PageRank from being affected by those particular posts. This method of avoidance, however, also has various drawbacks, such as reducing the link value of legitimate comments.
(See: [[Spam in blogs#nofollow]])

In an effort to manually control the flow of PageRank among pages within a website, many webmasters practice what is known as PageRank Sculpting,<ref>{{cite web|url=http://www.seomoz.org/blog/pagerank-sculpting-parsing-the-value-and-potential-benefits-of-sculpting-pr-with-nofollow |title=PageRank Sculpting: Parsing the Value and Potential Benefits of Sculpting PR with Nofollow |publisher=SEOmoz |accessdate=2011-05-27}}</ref> the act of strategically placing the nofollow attribute on certain internal links of a website in order to funnel PageRank towards those pages the webmaster deems most important. This tactic has been used since the inception of the nofollow attribute, but may no longer be effective since Google announced that blocking PageRank transfer with nofollow does not redirect that PageRank to other links.<ref>{{cite web|url=http://www.mattcutts.com/blog/pagerank-sculpting/ |title=PageRank sculpting |publisher=Mattcutts.com |date=2009-06-15 |accessdate=2011-05-27}}</ref>

==Deprecation==
PageRank was once available to verified site maintainers through the Google Webmaster Tools interface. However, on October 15, 2009, a Google employee confirmed<ref name="Moskwa"/> that the company had removed PageRank from its ''Webmaster Tools'' section, explaining that "We've been telling people for a long time that they shouldn't focus on PageRank so much; many site owners seem to think it's the most important metric for them to track, which is simply not true."<ref name="Moskwa">{{cite web | author = Susan Moskwa | title = PageRank Distribution Removed From WMT | url = http://www.google.com/support/forum/p/Webmasters/thread?tid=6a1d6250e26e9e48&hl=en | accessdate = October 16, 2009}}</ref> The PageRank indicator is not available in Google's own [[Google Chrome|Chrome]] browser.

The visible page rank is updated very infrequently. After the rank in the Google toolbar was updated on 3 April 2010,<ref name="autogenerated1"/> there was no confirmed update for more than 9 months before it was finally updated.{{Citation needed|date=August 2011}}<!-- most likely was, cannot find original reference from Google --> Some authors ask not ''when'' but rather ''if'' the rank will be updated again.<ref name='ditch'>{{cite web|url=http://www.infocarnivore.com/2010/12/05/is-google-ditching-pagerank/ |title=Is Google Ditching PageRank? |publisher=Infocarnivore.com |date=2010-12-05 |accessdate=2011-05-27}}</ref> Google may be moving away from a publicly visible page rank in order to motivate work on content quality by discouraging [[content farm]]ing as a means of boosting a site's reputation. A visible, high rank can inflate the value of a site and provides useful feedback for paid [[search engine optimization|SEOs]] that add no value to the system.<ref name='ditch'/> In addition, Google has been criticized and [[Search engine optimization#Legal precedents|even sued]] for perceived low ranks.

On {{date|6 October 2011}}, many users mistakenly thought Google PageRank was gone.
As it turns out, it was simply an update to the URL used to query the PageRank from Google.<ref name=whatculture>{{cite news|first=Peter|url=http://whatculture.com/technology/google-pagerank-is-not-dead.php|accessdate=7 October 2011|newspaper=WhatCulture!|date=6 October 2011}}</ref>

===Centrality measures===
Information about the relative importance of nodes and edges in a graph can be obtained through [[centrality]] measures, which are widely used in disciplines like [[sociology]]. For example, [[eigenvector centrality]] uses the [[eigenvectors]] of the [[adjacency matrix]] to determine nodes that tend to be frequently visited.

==Spread of content in networks==
Content in a [[complex network]] can spread via two major methods: conserved spread and non-conserved spread.<ref>Newman, M., Barabási, A.-L., Watts, D.J. [eds.] (2006) ''The Structure and Dynamics of Networks''. Princeton, N.J.: Princeton University Press.</ref> In conserved spread, the total amount of content that enters a complex network remains constant as it passes through. The model of conserved spread can best be represented by a pitcher containing a fixed amount of water being poured into a series of funnels connected by tubes. Here, the pitcher represents the original source and the water is the content being spread. The funnels and connecting tubing represent the nodes and the connections between nodes, respectively. As the water passes from one funnel into another, it disappears instantly from the funnel that was previously exposed to it.

In non-conserved spread, the amount of content changes as it enters and passes through a complex network. The model of non-conserved spread can best be represented by a continuously running faucet feeding a series of funnels connected by tubes. Here, the amount of water from the original source is infinite. Also, any funnels that have been exposed to the water continue to experience the water even as it passes into successive funnels. The non-conserved model is the most suitable for explaining the transmission of most [[infectious diseases]]. The contrast between the two modes is illustrated by the sketch below.
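As a minimal octave/matlab sketch of the two spreading modes (on an assumed four-node chain network chosen purely for illustration; neither the network nor the code comes from the sources cited above):

<source lang="matlab">
% Assumed example: a 4-node chain network; node 1 is the source.
A = [0 1 0 0 ;
     1 0 1 0 ;
     0 1 0 1 ;
     0 0 1 0];                   % adjacency matrix of the chain
T = A * diag(1 ./ sum(A, 1));    % column-stochastic: each node splits its
                                 % content equally among its neighbours
source = [1; 0; 0; 0];

x_conserved = source;            % a fixed amount of content enters once
x_nonconserved = source;         % the source keeps emitting new content

for t = 1:10
    x_conserved = T * x_conserved;                  % total stays constant
    x_nonconserved = T * x_nonconserved + source;   % total grows each step
end

disp(sum(x_conserved))           % = 1:  conserved spread preserves the total
disp(sum(x_nonconserved))        % = 11: non-conserved spread accumulates
</source>

In the conserved case a fixed budget is merely redistributed over the nodes (the column sums of the transition matrix are 1, so the total never changes), while in the non-conserved case every step injects new content at the source, matching the faucet analogy above.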
==Interdependent networks==
An interdependent network is a system of coupled networks in which nodes of one or more networks depend on nodes in other networks. Such dependencies are enhanced by developments in modern technology. Dependencies may lead to cascading failures between the networks, and a relatively small failure can lead to a catastrophic breakdown of the system. Blackouts are a striking demonstration of the important role played by the dependencies between networks. A recent study developed a framework to study the cascading failures in an interdependent networks system.<ref>{{Cite journal|author = S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, S. Havlin |title = Catastrophic cascade of failures in interdependent networks |journal = Nature |volume = 464 |pages = 1025–28 |year = 2010 |doi = 10.1038/nature08932 |issue = 7291 |url = http://havlin.biu.ac.il/Publications.php?keyword=Catastrophic+cascade+of+failures+in+interdependent+networks&year=*&match=all}}</ref><ref>{{cite journal|author = Jianxi Gao, Sergey V. Buldyrev, Shlomo Havlin, and H. Eugene Stanley |title = Robustness of a Network of Networks |journal = Phys. Rev. Lett. |year = 2011 |volume = 107 |pages = 195701 |url = http://havlin.biu.ac.il/Publications.php?keyword=Robustness+of+a+Tree-like+Network+of+Interdependent+Networks&year=*&match=all |doi = 10.1103/PhysRevLett.107.195701}}</ref>

==Network optimization==
Network problems that involve finding an optimal way of doing something are studied under the name of [[combinatorial optimization]]. Examples include [[flow network|network flow]], [[shortest path problem]], [[transport problem]], [[transshipment problem]], [[location problem]], [[Matching (graph theory)|matching problem]], [[assignment problem]], [[packing problem]], [[routing|routing problem]], [[Critical Path Analysis]] and [[PERT]] (Program Evaluation & Review Technique).

==Network Analysis and Visualization Tools==
*[[Graph-tool]] and [[NetworkX]], [[Free Software|free]] and efficient Python modules for manipulation and statistical analysis of networks. [http://graph-tool.skewed.de/] [http://networkx.lanl.gov/]
*[[Orange (software)|Orange]], a free data mining software suite, module [http://www.ailab.si/orange/doc/modules/orngNetwork.htm orngNetwork]
*[http://pajek.imfm.si/doku.php Pajek], program for (large) network analysis and visualization.
*[[Tulip (software)|Tulip]], a free data mining and visualization software dedicated to the analysis and visualization of relational data. [http://tulip.labri.fr/]

==See also==
* [[Complex network]]
* [[Collaborative innovation network]]
* [[Constructal law]]<ref>Bejan A., Lorente S., "[http://www.constructal.org/en/art/Phil.%20Trans.%20R.%20Soc.%20B%20%282010%29%20365,%201335%961347.pdf The Constructal Law of Design and Evolution in Nature]", ''Philosophical Transactions of the Royal Society B'', Biological Science, Vol. 365, 2010, pp. 1335–1347.</ref>
* [[Dynamic network analysis]]
* [[Higher category theory]]
* [[Immune network theory]]
* [[Irregular warfare]]
* [[Network analyzer]]
* [[Network Theory in Risk Assessment]]
* [[Network topology]]
* [[Percolation]]
* [[Polytely]]
* [[Scale-free networks]]
* [[Sequential dynamical system]]
* [[Small-world networks]]
* [[Social-circles network model|Social circles]]
* [[Systems theory]]

==References==
* "Network Science Center," http://www.dodccrp.org/files/Network_Science_Center.asf
* "Connected: The Power of Six Degrees," http://ivl.slis.indiana.edu/km/movies/2008-talas-connected.mov
* R. Cohen, K. Erez, D. ben-Avraham, S. Havlin, "[http://havlin.biu.ac.il/Publications.php?keyword=Resilience+of+the+Internet+to+random+breakdown&year=*&match=all Resilience of the Internet to random breakdown]", Phys. Rev. Lett. 85, 4626 (2000).
* R. Cohen, K. Erez, D. ben-Avraham, S. Havlin, "[http://havlin.biu.ac.il/Publications.php?keyword=Breakdown+of+the+Internet+under+intentional+attack&year=*&match=all Breakdown of the Internet under intentional attack]", Phys. Rev. Lett. 86, 3682 (2001).
* R. Cohen, S. Havlin, "[http://havlin.biu.ac.il/Publications.php?keyword=Scale-free+networks+are+ultrasmall&year=*&match=all Scale-free networks are ultrasmall]", Phys. Rev. Lett. 90, 058701 (2003).

==Further reading==
* "The Burgeoning Field of Network Science," http://themilitaryengineer.com/index.php?option=com_content&task=view&id=88
* S.N. Dorogovtsev and J.F.F. Mendes, ''Evolution of Networks: From Biological Networks to the Internet and WWW'', Oxford University Press, 2003, ISBN 0-19-851590-1
* ''Linked: The New Science of Networks'', A.-L. Barabási (Perseus Publishing, Cambridge, 2002)
* ''[http://www.nap.edu/catalog.php?record_id=11516 Network Science]'', Committee on Network Science for Future Army Applications, National Research Council, The National Academies Press (2005) ISBN 0-309-10026-7
* ''Network Science Bulletin'', USMA (2007) ISBN 978-1-934808-00-9
* ''The Structure and Dynamics of Networks'', Mark Newman, Albert-László Barabási & Duncan J. Watts (Princeton University Press, 2006) ISBN 0-691-11357-2
* ''Dynamical Processes on Complex Networks'', Alain Barrat, Marc Barthelemy, Alessandro Vespignani (Cambridge University Press, 2008) ISBN 978-0-521-87950-7
* ''Network Science: Theory and Applications'', Ted G. Lewis (Wiley, March 11, 2009) ISBN 0470331887
* ''Nexus: Small Worlds and the Groundbreaking Theory of Networks'', Mark Buchanan (W. W. Norton & Company, June 2003) ISBN 0393324427
* ''Six Degrees: The Science of a Connected Age'', Duncan J. Watts (W. W. Norton & Company, February 17, 2004) ISBN 0393325423
*[http://netwiki.amath.unc.edu/ netwiki] Scientific wiki dedicated to network theory
*[http://www.networkcultures.org/networktheory/ New Network Theory] International Conference on 'New Network Theory'
*[http://nwb.slis.indiana.edu/ Network Workbench]: A Large-Scale Network Analysis, Modeling and Visualization Toolkit
*[http://www.orgnet.com/SocialLifeOfRouters.pdf Network analysis of computer networks]
*[http://www.orgnet.com/orgnetmap.pdf Network analysis of organizational networks]
*[http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/941/863 Network analysis of terrorist networks]
*[http://www.orgnet.com/AJPH2007.pdf Network analysis of a disease outbreak]
*[http://linkanalysis.wlv.ac.uk/ Link Analysis: An Information Science Approach] (book)
*[http://gephi.org/2008/how-kevin-bacon-cured-cancer/ Connected: The Power of Six Degrees] (documentary)
*[http://havlin.biu.ac.il/Publications.php?keyword=Identification+of+influential+spreaders+in+complex+networks++&year=*&match=all Influential Spreaders in Networks], M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, H. A. Makse, ''Nature Physics'' 6, 888 (2010)
*[http://havlin.biu.ac.il/course4.php A short course on complex networks]

==External links==
* [http://www.netscience.usma.edu Network Science Center at the U.S. Military Academy at West Point, NY]
* http://press.princeton.edu/titles/8114.html
* http://www.cra.org/ccc/NSE.ppt.pdf
* http://www.ifr.ac.uk/netsci08/
* [http://www.fis.ua.pt/grupoteorico/gteorico.htm GNET] &mdash; Group of Complex Systems & Random Networks
* http://www.netsci09.net/
* [http://cns.slis.indiana.edu/cyber.html Cyberinfrastructure]
* [http://www.prospectmagazine.co.uk/2010/02/let%E2%80%99s-all-be-friends/ Prof. Nicholas A Christakis' introduction to network science in Prospect magazine]
* [http://havlin.biu.ac.il/videos1.php Video Lectures on complex networks] by [[Shlomo Havlin|Prof. Shlomo Havlin]]

==Notes==
{{reflist}}

[[Category:Cybernetics]]
[[Category:Networks]]
[[Category:Network theory]]

[[de:Netzwerkforschung]]
[[es:Análisis de redes]]
[[fi:Verkkoteoria]]
[[fr:Science des réseaux]]
[[ko:네트워크 이론]]
[[nl:Netwerkwetenschap]]
[[pt:Análise de rede]]