Dirichlet-multinomial distribution

In probability and statistics, the Dirichlet-multinomial distribution is a probability distribution for a multivariate discrete random variable. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution (after George Pólya). It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and a set of discrete samples is drawn from the categorical distribution with probability vector p. The compounding corresponds to a Pólya urn scheme. In document classification, for example, the distribution is used to represent the distributions of word counts for different document types.
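
As a concrete illustration of the compounding just described, the following minimal sketch (not from the article; the parameter values and variable names are arbitrary, assuming NumPy) draws a probability vector p from a Dirichlet distribution and then draws N categorical samples from it. The resulting count vector is a single draw from the Dirichlet-multinomial distribution.

import numpy as np

rng = np.random.default_rng(0)

K = 3
alpha = np.array([1.0, 2.0, 3.0])      # hypothetical Dirichlet parameter vector
N = 10                                 # number of categorical draws

p = rng.dirichlet(alpha)               # p ~ Dirichlet(alpha)
z = rng.choice(K, size=N, p=p)         # each draw ~ Categorical(p)
counts = np.bincount(z, minlength=K)   # n_k = number of draws equal to k
print(p, z, counts)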

Probability mass function

Conceptually, we are doing $N$ independent draws from a categorical distribution with $K$ categories. Let us represent the independent draws as random categorical variables $z_n$ for $n = 1, \dots, N$. Let us denote the number of times a particular category $k$ has been seen (for $k = 1, \dots, K$) among all the categorical variables as $n_k$. Note that $\sum_k n_k = N$. Then, we have two separate views onto this problem:

  1. A set of $N$ categorical variables $z_1, \dots, z_N$.
  2. A single vector-valued variable $x = (n_1, \dots, n_K)$, distributed according to a multinomial distribution.

The former case is a set of random variables specifying each individual outcome, while the latter is a variable specifying the number of outcomes of each of the K categories. The distinction is important, as the two cases have correspondingly different probability distributions.

The parameter of the categorical distribution is $p = (p_1, p_2, \dots, p_K)$, where $p_k$ is the probability of drawing value $k$; $p$ is likewise the parameter of the multinomial distribution $P(x \mid p)$. Rather than specifying $p$ directly, we give it a conjugate prior distribution, and hence it is drawn from a Dirichlet distribution with parameter vector $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)$.

By integrating out p, we obtain a compound distribution. However, the form of the distribution is different depending on which view we take.

For a set of individual outcomes

Joint distribution

For categorical variables $\mathbb{Z} = z_1, \dots, z_N$, the marginal joint distribution is obtained by integrating out $p$:

$$\Pr(\mathbb{Z} \mid \alpha) = \int_p \Pr(\mathbb{Z} \mid p) \Pr(p \mid \alpha) \, dp$$

which results in the following explicit formula:

$$\Pr(\mathbb{Z} \mid \alpha) = \frac{\Gamma(A)}{\Gamma(N+A)} \prod_{k=1}^{K} \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}$$

where Γ is the gamma function, with

$A = \sum_k \alpha_k$ and $N = \sum_k n_k$, and where $n_k$ is the number of $z_n$'s with the value $k$.

Note that, although the variables $z_1, \dots, z_N$ do not appear explicitly in the above formula, they enter in through the $n_k$ values.
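
For illustration, here is a minimal sketch (assuming NumPy/SciPy; the function name is ours) that evaluates the logarithm of the joint formula above directly from the counts $n_k$, using log-gamma functions for numerical stability.

import numpy as np
from scipy.special import gammaln

def joint_log_prob(counts, alpha):
    # log Pr(Z | alpha) = log Gamma(A) - log Gamma(N + A)
    #                     + sum_k [log Gamma(n_k + alpha_k) - log Gamma(alpha_k)]
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    A = alpha.sum()
    N = counts.sum()
    return (gammaln(A) - gammaln(N + A)
            + np.sum(gammaln(counts + alpha) - gammaln(alpha)))

# e.g. joint_log_prob([3, 2, 5], [1.0, 1.0, 1.0])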

Conditional distribution

Another useful formula, particularly in the context of Gibbs sampling, asks what the conditional density of a given variable $z_n$ is, conditioned on all the other variables (which we will denote $\mathbb{Z}^{(-n)}$). It turns out to have an extremely simple form:

$$\Pr(z_n = k \mid \mathbb{Z}^{(-n)}, \alpha) \propto n_k^{(-n)} + \alpha_k$$

where $n_k^{(-n)}$ specifies the number of counts of category $k$ seen in all variables other than $z_n$.

It may be useful to show how to derive this formula. In general, conditional distributions are proportional to the corresponding joint distributions, so we simply start with the above formula for the joint distribution of all the $z_1, \dots, z_N$ values and then eliminate any factors not dependent on the particular $z_n$ in question. To do this, we make use of the notation $n_k^{(-n)}$ defined above, and note that

$$n_j = \begin{cases} n_j^{(-n)} & \text{if } j \neq k \\ n_j^{(-n)} + 1 & \text{if } j = k \end{cases}$$

We also use the fact that

$$\Gamma(n+1) = n\,\Gamma(n)$$

Then:

$$\begin{aligned}
\Pr(z_n = k \mid \mathbb{Z}^{(-n)}, \alpha) &\propto \Pr(z_n = k, \mathbb{Z}^{(-n)} \mid \alpha) \\
&= \frac{\Gamma(A)}{\Gamma(N+A)} \prod_{j=1}^{K} \frac{\Gamma(n_j + \alpha_j)}{\Gamma(\alpha_j)} \\
&\propto \prod_{j=1}^{K} \Gamma(n_j + \alpha_j) \\
&= \Gamma(n_k + \alpha_k) \prod_{j \neq k} \Gamma(n_j + \alpha_j) \\
&= \Gamma\bigl(n_k^{(-n)} + 1 + \alpha_k\bigr) \prod_{j \neq k} \Gamma\bigl(n_j^{(-n)} + \alpha_j\bigr) \\
&= \bigl(n_k^{(-n)} + \alpha_k\bigr)\, \Gamma\bigl(n_k^{(-n)} + \alpha_k\bigr) \prod_{j \neq k} \Gamma\bigl(n_j^{(-n)} + \alpha_j\bigr) \\
&= \bigl(n_k^{(-n)} + \alpha_k\bigr) \prod_{j} \Gamma\bigl(n_j^{(-n)} + \alpha_j\bigr) \\
&\propto n_k^{(-n)} + \alpha_k
\end{aligned}$$

In general, it is not necessary to worry about the normalizing constant at the time of deriving the equations for conditional distributions. The normalizing constant will be determined as part of the algorithm for sampling from the distribution (see Categorical distribution#Sampling). However, when the conditional distribution is written in the simple form above, it turns out that the normalizing constant assumes a simple form:

$$\sum_k \bigl(n_k^{(-n)} + \alpha_k\bigr) = A + \sum_k n_k^{(-n)} = A + N - 1$$

Hence

$$\Pr(z_n = k \mid \mathbb{Z}^{(-n)}, \alpha) = \frac{n_k^{(-n)} + \alpha_k}{A + N - 1}$$

This formula is closely related to the Chinese restaurant process, which results from taking the limit as $K \to \infty$.
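
As a minimal sketch of how this conditional is used in practice (hypothetical helper names, assuming NumPy): given the counts of each category among all variables other than $z_n$, the normalized probabilities are just the counts plus the prior, divided by $A + N - 1$.

import numpy as np

def conditional_probs(counts_minus_n, alpha):
    # counts_minus_n[k] = n_k^(-n), the count of category k among all other variables
    unnorm = np.asarray(counts_minus_n, dtype=float) + np.asarray(alpha, dtype=float)
    return unnorm / unnorm.sum()   # denominator equals A + N - 1

# Resampling z_n in a Gibbs sweep (hypothetical variable names):
# probs = conditional_probs(counts_minus_n, alpha)
# z_n = np.random.default_rng().choice(len(probs), p=probs)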

In a Bayesian network

In a larger Bayesian network in which categorical (or so-called "multinomial") distributions occur with Dirichlet distribution priors as part of a larger network, all Dirichlet priors can be collapsed provided that the only nodes depending on them are categorical distributions. The collapsing happens for each Dirichlet-distribution node separately from the others, and occurs regardless of any other nodes that may depend on the categorical distributions. It also occurs regardless of whether the categorical distributions depend on nodes additional to the Dirichlet priors (although in such a case, those other nodes must remain as additional conditioning factors). Essentially, all of the categorical distributions depending on a given Dirichlet-distribution node become connected into a single Dirichlet-multinomial joint distribution defined by the above formula. The joint distribution as defined this way will depend on the parent(s) of the integrated-out Dirichlet prior nodes, as well as any parent(s) of the categorical nodes other than the Dirichlet prior nodes themselves.

In the following sections, we discuss different configurations commonly found in Bayesian networks. We repeat the probability density from above, and define it using the symbol $\operatorname{DirMult}(\mathbb{Z} \mid \alpha)$:

$$\Pr(\mathbb{Z} \mid \alpha) = \operatorname{DirMult}(\mathbb{Z} \mid \alpha) = \frac{\Gamma\!\left(\sum_k \alpha_k\right)}{\Gamma\!\left(\sum_k n_k + \alpha_k\right)} \prod_{k=1}^{K} \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}$$

Multiple Dirichlet priors with the same hyperprior

Imagine we have a hierarchical model as follows:

$$\begin{aligned}
\alpha &\sim \text{some distribution} \\
\theta_{d=1 \dots M} &\sim \operatorname{Dirichlet}_K(\alpha) \\
z_{d=1 \dots M,\, n=1 \dots N_d} &\sim \operatorname{Categorical}_K(\theta_d)
\end{aligned}$$

In cases like this, we have multiple Dirichlet priors, each of which generates some number of categorical observations (possibly a different number for each prior). The fact that they are all dependent on the same hyperprior, even if this is a random variable as above, makes no difference. The effect of integrating out a Dirichlet prior links the categorical variables attached to that prior, whose joint distribution simply inherits any conditioning factors of the Dirichlet prior. The fact that multiple priors may share a hyperprior makes no difference:

$$\Pr(\mathbb{Z} \mid \alpha) = \prod_d \operatorname{DirMult}(\mathbb{Z}_d \mid \alpha)$$

where $\mathbb{Z}_d$ is simply the collection of categorical variables dependent on prior $d$.

Accordingly, the conditional probability distribution can be written as follows:

$$\Pr(z_{dn} = k \mid \mathbb{Z}^{(-dn)}, \alpha) \propto n_{k,d}^{(-n)} + \alpha_k$$

where $n_{k,d}^{(-n)}$ specifically means the number of variables among the set $\mathbb{Z}_d$, excluding $z_{dn}$ itself, that have the value $k$.

Note in particular that we need to count only the variables having the value k that are tied together to the variable in question through having the same prior. We do not want to count any other variables also having the value k.
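
To make the grouping concrete, here is a minimal sketch (hypothetical data layout, assuming NumPy): z is a list of integer arrays, one per prior $d$, and the conditional for $z_{dn}$ is computed from counts within group $d$ only.

import numpy as np

def grouped_conditional_probs(z, d, n, alpha, K):
    # z[d] holds the current assignments of the variables attached to prior d;
    # only these enter the conditional for z_{dn}, other groups are ignored.
    counts = np.bincount(np.delete(z[d], n), minlength=K)   # n_{k,d}^(-n): exclude z_{dn} itself
    unnorm = counts + np.asarray(alpha, dtype=float)
    return unnorm / unnorm.sum()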

Multiple Dirichlet priors with the same hyperprior, with dependent children

Now imagine a slightly more complicated hierarchical model as follows:

$$\begin{aligned}
\alpha &\sim \text{some distribution} \\
\theta_{d=1 \dots M} &\sim \operatorname{Dirichlet}_K(\alpha) \\
z_{d=1 \dots M,\, n=1 \dots N_d} &\sim \operatorname{Categorical}_K(\theta_d) \\
\phi &\sim \text{some other distribution} \\
w_{d=1 \dots M,\, n=1 \dots N_d} &\sim F(w_{dn} \mid z_{dn}, \phi)
\end{aligned}$$

This model is the same as above, but in addition, each of the categorical variables has a child variable dependent on it. This is typical of a mixture model.

Again, in the joint distribution, only the categorical variables dependent on the same prior are linked into a single Dirichlet-multinomial:

$$\Pr(\mathbb{Z}, \mathbb{W} \mid \alpha, \phi) = \prod_d \operatorname{DirMult}(\mathbb{Z}_d \mid \alpha) \prod_{d=1}^{M} \prod_{n=1}^{N_d} F(w_{dn} \mid z_{dn}, \phi)$$

The conditional distribution of the categorical variables dependent only on their parents and ancestors would have the identical form as above in the simpler case. However, in Gibbs sampling it is necessary to determine the conditional distribution of a given node $z_{dn}$ dependent not only on $\mathbb{Z}^{(-dn)}$ and ancestors such as $\alpha$ but on all the other parameters.

Note however that we derived the simplified expression for the conditional distribution above simply by rewriting the expression for the joint probability and removing constant factors. Hence, the same simplification would apply in a larger joint probability expression such as the one in this model, composed of Dirichlet-multinomial densities plus factors for many other random variables dependent on the values of the categorical variables.

This yields the following:

$$\Pr(z_{dn} = k \mid \mathbb{Z}^{(-dn)}, \mathbb{W}, \alpha, \phi) \propto \bigl(n_{k,d}^{(-n)} + \alpha_k\bigr)\, F(w_{dn} \mid z_{dn}, \phi)$$

Here the probability density of $F$ appears directly. To do random sampling over $z_{dn}$, we would compute the unnormalized probabilities for all $K$ possibilities for $z_{dn}$ using the above formula, then normalize them and proceed as normal using the algorithm described in the categorical distribution article.

NOTE: Correctly speaking, the additional factor that appears in the conditional distribution is derived not from the model specification but directly from the joint distribution. This distinction is important when considering models where a given node with Dirichlet-prior parent has multiple dependent children, particularly when those children are dependent on each other (e.g. if they share a parent that is collapsed out). This is discussed more below.
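
A minimal sketch of this two-factor conditional (hypothetical names, assuming NumPy; child_lik[k] stands for the child likelihood $F(w_{dn} \mid z_{dn} = k, \phi)$ evaluated at the observed $w_{dn}$): the count-plus-prior term is simply multiplied by the child factor before normalizing and sampling.

import numpy as np

def conditional_with_child(counts_minus_n, alpha, child_lik):
    # counts_minus_n[k] = n_{k,d}^(-n); child_lik[k] = F(w_dn | z_dn = k, phi)
    unnorm = (np.asarray(counts_minus_n, dtype=float)
              + np.asarray(alpha, dtype=float)) * np.asarray(child_lik, dtype=float)
    return unnorm / unnorm.sum()   # normalize, then sample as in the categorical case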

Multiple Dirichlet priors with shifting prior membership

Now imagine we have a hierarchical model as follows:

$$\begin{aligned}
\theta &\sim \text{some distribution} \\
z_{n=1 \dots N} &\sim \operatorname{Categorical}_K(\theta) \\
\alpha &\sim \text{some distribution} \\
\phi_{k=1 \dots K} &\sim \operatorname{Dirichlet}_V(\alpha) \\
w_{n=1 \dots N} &\sim \operatorname{Categorical}_V(\phi_{z_n})
\end{aligned}$$

Here we have a tricky situation where we have multiple Dirichlet priors as before and a set of dependent categorical variables, but the relationship between the priors and dependent variables isn't fixed, unlike before. Instead, the choice of which prior to use is dependent on another random categorical variable. This occurs, for example, in topic models, and indeed the names of the variables above are meant to correspond to those in latent Dirichlet allocation. In this case, the set $\mathbb{W}$ is a set of words, each of which is drawn from one of $K$ possible topics, where each topic is a Dirichlet prior over a vocabulary of $V$ possible words, specifying the frequency of different words in the topic. However, the topic membership of a given word isn't fixed; rather, it's determined from a set of latent variables $\mathbb{Z}$. There is one latent variable per word, a $K$-dimensional categorical variable specifying the topic the word belongs to.

In this case, all variables dependent on a given prior are tied together (i.e. correlated) in a group, as before — specifically, all words belonging to a given topic are linked. In this case, however, the group membership shifts, in that the words are not fixed to a given topic but the topic depends on the value of a latent variable associated with the word. However, note that the definition of the Dirichlet-multinomial density doesn't actually depend on the number of categorical variables in a group (i.e. the number of words in the document generated from a given topic), but only on the counts of how many variables in the group have a given value (i.e. among all the word tokens generated from a given topic, how many of them are a given word). Hence, we can still write an explicit formula for the joint distribution:

$$\Pr(\mathbb{W} \mid \alpha, \mathbb{Z}) = \prod_{k=1}^{K} \operatorname{DirMult}(\mathbb{W}_k \mid \mathbb{Z}, \alpha) = \prod_{k=1}^{K} \left[ \frac{\Gamma\!\left(\sum_v \alpha_v\right)}{\Gamma\!\left(\sum_v n_v^k + \alpha_v\right)} \prod_{v=1}^{V} \frac{\Gamma(n_v^k + \alpha_v)}{\Gamma(\alpha_v)} \right]$$

Here we use the notation $n_v^k$ to denote the number of word tokens whose value is word symbol $v$ and which belong to topic $k$.

The conditional distribution still has the same form:

$$\Pr(w_n = v \mid \mathbb{W}^{(-n)}, \mathbb{Z}, \alpha) \propto n_v^{k,(-n)} + \alpha_v$$

Here again, only the categorical variables for words belonging to a given topic are linked (even though this linking will depend on the assignments of the latent variables), and hence the word counts need to be over only the words generated by a given topic. Hence the symbol $n_v^{k,(-n)}$, which is the count of word tokens having the word symbol $v$, but only among those generated by topic $k$, and excluding the word itself whose distribution is being described.

(Note that the reason why excluding the word itself is necessary, and why it even makes sense at all, is that in a Gibbs sampling context, we repeatedly resample the values of each random variable, after having run through and sampled all previous variables. Hence the variable will already have a value, and we need to exclude this existing value from the various counts that we make use of.)

A combined example: LDA topic models

We now show how to combine some of the above scenarios to demonstrate how to Gibbs sample a real-world model, specifically a smoothed latent Dirichlet allocation (LDA) topic model.

The model is as follows:

$$\begin{aligned}
\alpha &\sim \text{A Dirichlet hyperprior, either a constant or a random variable} \\
\beta &\sim \text{A Dirichlet hyperprior, either a constant or a random variable} \\
\theta_{d=1 \dots M} &\sim \operatorname{Dirichlet}_K(\alpha) \\
\phi_{k=1 \dots K} &\sim \operatorname{Dirichlet}_V(\beta) \\
z_{d=1 \dots M,\, n=1 \dots N_d} &\sim \operatorname{Categorical}_K(\theta_d) \\
w_{d=1 \dots M,\, n=1 \dots N_d} &\sim \operatorname{Categorical}_V(\phi_{z_{dn}})
\end{aligned}$$

Essentially we combine the previous three scenarios: We have categorical variables dependent on multiple priors sharing a hyperprior; we have categorical variables with dependent children (the latent variable topic identities); and we have categorical variables with shifting membership in multiple priors sharing a hyperprior. Note also that in the standard LDA model, the words are completely observed, and hence we never need to resample them. (However, Gibbs sampling would equally be possible if only some or none of the words were observed. In such a case, we would want to initialize the distribution over the words in some reasonable fashion — e.g. from the output of some process that generates sentences, such as a machine translation model — in order for the resulting posterior latent variable distributions to make any sense.)

Using the above formulas, we can write down the conditional probabilities directly:

$$\begin{aligned}
\Pr(w_{dn} = v \mid \mathbb{W}^{(-dn)}, \mathbb{Z}, \beta) &\propto \#\mathbb{W}_v^{k,(-dn)} + \beta_v \\
\Pr(z_{dn} = k \mid \mathbb{Z}^{(-dn)}, w_{dn} = v, \mathbb{W}^{(-dn)}, \alpha) &\propto \bigl(\#\mathbb{Z}_k^{d,(-dn)} + \alpha_k\bigr)\, \Pr(w_{dn} = v \mid \mathbb{W}^{(-dn)}, \mathbb{Z}, \beta)
\end{aligned}$$

Here we have defined the counts more explicitly to clearly separate counts of words and counts of topics:

$$\begin{aligned}
\#\mathbb{W}_v^{k,(-dn)} &= \text{number of words having value } v \text{ among topic } k \text{, excluding } w_{dn} \\
\#\mathbb{Z}_k^{d,(-dn)} &= \text{number of topics having value } k \text{ among document } d \text{, excluding } z_{dn}
\end{aligned}$$

Note that, as in the scenario above with categorical variables with dependent children, the conditional probability of those dependent children appears in the definition of the parent's conditional probability. In this case, each latent variable has only a single dependent child word, so only one such term appears. (If there were multiple dependent children, all would have to appear in the parent's conditional probability, regardless of whether there was overlap between different parents and the same children, i.e. regardless of whether the dependent children of a given parent also have other parents. In a case where a child has multiple parents, the conditional probability for that child appears in the conditional probability definition of each of its parents.)

Note, critically, however, that the definition above specifies only the unnormalized conditional probability of the words, while the topic conditional probability requires the actual (i.e. normalized) probability. Hence we have to normalize by summing over all word symbols:

$$\begin{aligned}
\Pr(z_{dn} = k \mid \mathbb{Z}^{(-dn)}, w_{dn} = v, \mathbb{W}^{(-dn)}, \alpha) &\propto \bigl(\#\mathbb{Z}_k^{d,(-dn)} + \alpha_k\bigr) \frac{\#\mathbb{W}_v^{k,(-dn)} + \beta_v}{\sum_{v'=1}^{V} \bigl(\#\mathbb{W}_{v'}^{k,(-dn)} + \beta_{v'}\bigr)} \\
&= \bigl(\#\mathbb{Z}_k^{d,(-dn)} + \alpha_k\bigr) \frac{\#\mathbb{W}_v^{k,(-dn)} + \beta_v}{\#\mathbb{W}^{k} + B - 1}
\end{aligned}$$

where

$$\begin{aligned}
\#\mathbb{W}^{k} &= \text{number of words generated by topic } k \\
B &= \sum_{v=1}^{V} \beta_v
\end{aligned}$$

It's also worth making another point in detail, which concerns the second factor above in the conditional probability. Remember that the conditional distribution in general is derived from the joint distribution, and simplified by removing terms not dependent on the domain of the conditional (the part on the left side of the vertical bar). When a node $z$ has dependent children, there will be one or more factors $F(\dots \mid z)$ in the joint distribution that are dependent on $z$. Usually there is one factor for each dependent node, and it has the same density function as the distribution appearing in the mathematical definition. However, if a dependent node has another parent as well (a co-parent), and that co-parent is collapsed out, then the node will become dependent on all other nodes sharing that co-parent, and in place of multiple terms for each such node, the joint distribution will have only one joint term. We have exactly that situation here. Even though $z_{dn}$ has only one child $w_{dn}$, that child has a Dirichlet co-parent that we have collapsed out, which induces a Dirichlet-multinomial over the entire set of nodes $\mathbb{W}^{k}$.

It happens in this case that this issue does not cause major problems, precisely because of the one-to-one relationship between zdn and wdn. We can rewrite the joint distribution as follows:

$$\begin{aligned}
p(\mathbb{W}^{k} \mid z_{dn}) &= p(w_{dn} \mid \mathbb{W}^{k,(-dn)}, z_{dn})\, p(\mathbb{W}^{k,(-dn)} \mid z_{dn}) \\
&= p(w_{dn} \mid \mathbb{W}^{k,(-dn)}, z_{dn})\, p(\mathbb{W}^{k,(-dn)}) \\
&\propto p(w_{dn} \mid \mathbb{W}^{k,(-dn)}, z_{dn})
\end{aligned}$$

where we note that in the set $\mathbb{W}^{k,(-dn)}$ (i.e. the set of nodes $\mathbb{W}^{k}$ excluding $w_{dn}$), none of the nodes have $z_{dn}$ as a parent. Hence it can be eliminated as a conditioning factor (line 2), meaning that the entire factor can be eliminated from the conditional distribution (line 3).
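
Putting the two conditionals together, here is a minimal collapsed-Gibbs sweep for the smoothed LDA model above. This is a sketch under the assumption of symmetric scalar hyperparameters alpha and beta; the count arrays n_dk, n_kv, n_k correspond to $\#\mathbb{Z}_k^{d}$, $\#\mathbb{W}_v^{k}$, and $\#\mathbb{W}^{k}$ in the text, and all variable names are ours. Because the words are observed, only the topic assignments are resampled.

import numpy as np

def lda_gibbs_sweep(docs, z, n_dk, n_kv, n_k, alpha, beta, rng):
    # docs[d] is a list of word ids; z[d][n] is the current topic of token n in document d.
    # n_dk[d, k] = topic counts per document, n_kv[k, v] = word counts per topic,
    # n_k[k] = total number of words assigned to topic k.
    K, V = n_kv.shape
    for d, doc in enumerate(docs):
        for n, v in enumerate(doc):
            k_old = z[d][n]
            # remove the token from the counts, giving the "(-dn)" counts
            n_dk[d, k_old] -= 1
            n_kv[k_old, v] -= 1
            n_k[k_old] -= 1
            # unnormalized conditional:
            # (#Z_k^{d,(-dn)} + alpha) * (#W_v^{k,(-dn)} + beta) / (#W^{k,(-dn)} + V*beta)
            # (the denominator equals #W^k + B - 1 in the notation above)
            probs = (n_dk[d] + alpha) * (n_kv[:, v] + beta) / (n_k + V * beta)
            probs /= probs.sum()
            k_new = rng.choice(K, p=probs)
            # add the token back under its newly sampled topic
            z[d][n] = k_new
            n_dk[d, k_new] += 1
            n_kv[k_new, v] += 1
            n_k[k_new] += 1
    return z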

A second example: Naive Bayes document clustering

Here is another model, with a different set of issues. This is an implementation of an unsupervised Naive Bayes model for document clustering. That is, we would like to classify documents into multiple categories (e.g. "spam" or "non-spam", or "scientific journal article", "newspaper article about finance", "newspaper article about politics", "love letter") based on textual content. However, we don't already know the correct category of any documents; instead, we want to cluster them based on mutual similarities. (For example, a set of scientific articles will tend to be similar to each other in word use but very different from a set of love letters.) This is a type of unsupervised learning. (The same technique can be used for doing semi-supervised learning, i.e. where we know the correct category of some fraction of the documents and would like to use this knowledge to help in clustering the remaining documents.)

The model is as follows:

$$\begin{aligned}
\alpha &\sim \text{A Dirichlet hyperprior, either a constant or a random variable} \\
\beta &\sim \text{A Dirichlet hyperprior, either a constant or a random variable} \\
\theta_{d=1 \dots M} &\sim \operatorname{Dirichlet}_K(\alpha) \\
\phi_{k=1 \dots K} &\sim \operatorname{Dirichlet}_V(\beta) \\
z_{d=1 \dots M} &\sim \operatorname{Categorical}_K(\theta_d) \\
w_{d=1 \dots M,\, n=1 \dots N_d} &\sim \operatorname{Categorical}_V(\phi_{z_d})
\end{aligned}$$

In many ways, this model is very similar to the LDA topic model described above, but it assumes one topic per document rather than one topic per word, with a document consisting of a mixture of topics. This can be seen clearly in the above model, which is identical to the LDA model except that there is only one latent variable per document instead of one per word. Once again, we assume that we are collapsing all of the Dirichlet priors.

The conditional probability for a given word is almost identical to the LDA case. Once again, all words generated by the same Dirichlet prior are interdependent. In this case, this means the words of all documents having a given label — again, this can vary depending on the label assignments, but all we care about is the total counts. Hence:

$$\Pr(w_{dn} = v \mid \mathbb{W}^{(-dn)}, \mathbb{Z}, \beta) \propto \#\mathbb{W}_v^{k,(-dn)} + \beta_v$$

where

$$\#\mathbb{W}_v^{k,(-dn)} = \text{number of words having value } v \text{ among documents with label } k \text{, excluding } w_{dn}$$

However, there is a critical difference in the conditional distribution of the latent variables for the label assignments, which is that a given label variable has multiple child nodes instead of just one — in particular, the nodes for all the words in the label's document. This relates closely to the discussion above about the factor $F(\dots \mid z_d)$ that stems from the joint distribution. In this case, the joint distribution needs to be taken over all words in all documents containing a label assignment equal to the value of $z_d$, and has the value of a Dirichlet-multinomial distribution. Furthermore, we cannot reduce this joint distribution down to a conditional distribution over a single word. Rather, we can reduce it down only to a smaller joint conditional distribution over the words in the document for the label in question, and hence we cannot simplify it using the trick above that yields a simple sum of expected count and prior. Although it is in fact possible to rewrite it as a product of such individual sums, the number of factors is very large, and is not clearly more efficient than directly computing the Dirichlet-multinomial distribution probability.
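
A minimal sketch of resampling a document label $z_d$ in this clustering model (assuming NumPy/SciPy; the helper name and data layout are ours). For each candidate label $k$, the document's word counts are scored against the Dirichlet-multinomial induced by the other documents currently assigned label $k$; this is the "smaller joint conditional" discussed above, computed with log-gamma terms rather than a single count-plus-prior sum. The label-prior factor is taken here as $\alpha_k$, which is what collapsing the per-document $\theta_d$ in the model as written gives.

import numpy as np
from scipy.special import gammaln

def label_log_probs(doc_counts, n_kv_minus_d, alpha, beta):
    # doc_counts[v]: word counts of document d (length V)
    # n_kv_minus_d[k, v]: word counts over all *other* documents currently assigned label k
    # alpha: length-K Dirichlet parameter for the label; beta: scalar or length-V word prior
    doc_counts = np.asarray(doc_counts, dtype=float)
    Nd = doc_counts.sum()
    post = n_kv_minus_d + beta                    # Dirichlet pseudo-counts per label
    A_k = post.sum(axis=1)                        # total pseudo-count per label
    # Dirichlet-multinomial log-probability of the whole document under each label
    # (the multinomial coefficient is constant in k and omitted)
    like = (gammaln(A_k) - gammaln(A_k + Nd)
            + np.sum(gammaln(post + doc_counts) - gammaln(post), axis=1))
    prior = np.log(alpha)                         # from collapsing the per-document theta_d
    return prior + like

# To resample z_d: lp = label_log_probs(...); p = np.exp(lp - lp.max()); p /= p.sum(); then sample.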

For a multinomial distribution over category counts

For a random vector of category counts $x = (n_1, \dots, n_K)$, distributed according to a multinomial distribution, the marginal distribution is obtained by integrating out $p$:

$$\Pr(x \mid \alpha) = \int_p \Pr(x \mid p) \Pr(p \mid \alpha) \, dp$$

which results in the following explicit formula:

$$\Pr(x \mid \alpha) = \frac{N!}{\prod_k n_k!} \, \frac{\Gamma(A)}{\Gamma(N+A)} \prod_k \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}$$

where $A$ is defined as the sum $A = \sum_k \alpha_k$. Note that this differs crucially from the above formula in having an extra term at the front that looks like the factor at the front of a multinomial distribution. Another form for this same compound distribution, written more compactly in terms of the beta function, $\mathrm{B}$, is as follows:

$$\Pr(x \mid \alpha) = \frac{N \,\mathrm{B}(A, N)}{\prod_{k : n_k > 0} n_k \,\mathrm{B}(\alpha_k, n_k)}.$$
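
As a minimal numerical check of the two equivalent forms above (a sketch assuming NumPy/SciPy; the function names are ours), both functions below evaluate the same log-probability of a count vector $x$ under parameter vector $\alpha$.

import numpy as np
from scipy.special import gammaln, betaln

def log_pmf_gamma(x, alpha):
    # gamma-function form, including the multinomial coefficient N! / prod_k(n_k!)
    x = np.asarray(x, dtype=float); alpha = np.asarray(alpha, dtype=float)
    N, A = x.sum(), alpha.sum()
    return (gammaln(N + 1) - gammaln(x + 1).sum()
            + gammaln(A) - gammaln(N + A)
            + np.sum(gammaln(x + alpha) - gammaln(alpha)))

def log_pmf_beta(x, alpha):
    # compact beta-function form: N B(A, N) / prod_{k: n_k > 0} n_k B(alpha_k, n_k)
    x = np.asarray(x, dtype=float); alpha = np.asarray(alpha, dtype=float)
    N, A = x.sum(), alpha.sum()
    nz = x > 0
    return (np.log(N) + betaln(A, N)
            - np.sum(np.log(x[nz]) + betaln(alpha[nz], x[nz])))

# log_pmf_gamma([2, 0, 3], [1.0, 2.0, 3.0]) and log_pmf_beta([2, 0, 3], [1.0, 2.0, 3.0]) agree.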

Related distributions

The one-dimensional version of the multivariate Pólya distribution is known as the Beta-binomial distribution.

Uses

The multivariate Pólya distribution is used in automated document classification and clustering, genetics, economics, combat modeling, and quantitative marketing.

