The core Bitcoin network can scale to much higher transaction rates than are seen today, assuming that nodes in the network are primarily running on high end servers rather than desktops. Bitcoin was designed to support lightweight clients that only process small parts of the block chain (see simplified payment verification below for more details on this). A configuration in which the vast majority of users sync lightweight clients to full nodes.
The requirement of expensive full nodes is controversial. While having a high-capacity set of full nodes increases the on-blockchain transaction rate, the requirement of full nodes to be high capacity has a centralizing effect. Scalability solutions that don't require high capacity full nodes include sidechains and the Lightning Network.
VISA handles on average around 2,000 transactions per second (tps), so call it a daily peak rate of 4,000 tps. It has a peak capacity of around 56,000 transactions per second,  however they never actually use more than about a third of this even during peak shopping periods. 
PayPal, in contrast, handles around 10 million transactions per day for an average of 115 tps. 
Let's take 4,000 tps as starting goal. Obviously if we want Bitcoin to scale to all economic transactions worldwide, including cash, it'd be a lot higher than that, perhaps more in the region of a few hundred thousand tps. And the need to be able to withstand DoS attacks (which VISA does not have to deal with) implies we would want to scale far beyond the standard peak rates. Still, picking a target let us do some basic calculations even if it's a little arbitrary.
Today the Bitcoin network is restricted to a sustained rate of 7 tps due to the bitcoin protocol restricting block sizes to 1MB.
Increasing Block Size
Increasing the blocksize is the easiest solution to implement, as it only requires changing one line of code, and is possible today. However, increasing the power of full nodes increases the cost and increasing the cost of full nodes has a centralizing effect. While a miner can run a full node, they have less and less reason to as blocks become more and more expensive to process. As less miners run full nodes, power over the mining network and ability to reverse transactions increases.
The protocol has two parts. Nodes send "inv" messages to other nodes telling them they have a new transaction. If the receiving node doesn't have that transaction it requests it with a getdata.
The big cost is the crypto and block chain lookups involved with verifying the transaction. Verifying a transaction means some hashing and some ECDSA signature verifications. RIPEMD-160 runs at 106 megabytes/sec (call it 100 for simplicity) and SHA256 is about the same. So hashing 1 megabyte should take around 10 milliseconds and hashing 1 kilobyte would take 0.01 milliseconds - fast enough that we can ignore it.
Bitcoin is currently able (with a couple of simple optimizations that are prototyped but not merged yet) to perform around 8000 signature verifications per second on an quad core Intel Core i7-2670QM 2.2Ghz processor. The average number of inputs per transaction is around 2, so we must halve the rate. This means 4000 tps is easily achievable CPU-wise with a single fairly mainstream CPU.
As we can see, this means as long as Bitcoin nodes are allowed to max out at least 4 cores of the machines they run on, we will not run out of CPU capacity for signature checking unless Bitcoin is handling 100 times as much traffic as PayPal. As of late 2012 the network is handling 0.5 transactions/second, so even assuming enormous growth in popularity we will not reach this level for a long time.
Of course Bitcoin does other things beyond signature checking, most obviously, managing the database. We use LevelDB which does the bulk of the heavy lifting on a separate thread, and is capable of very high read/write loads. Overall Bitcoin's CPU usage is dominated by ECDSA.
Let's assume an average rate of 2000tps, so just VISA. Transactions vary in size from about 0.2 kilobytes to over 1 kilobyte, but it's averaging half a kilobyte today.
That means that you need to keep up with around 8 megabits/second of transaction data (2000tps * 512 bytes) / 1024 bytes in a kilobyte / 1024 kilobytes in a megabyte = 0.97 megabytes per second * 8 = 7.8 megabits/second.
This sort of bandwidth is already common for even residential connections today, and is certainly at the low end of what colocation providers would expect to provide you with.
When blocks are solved, the current protocol will send the transactions again, even if a peer has already seen it at broadcast time. Fixing this to make blocks just list of hashes would resolve the issue and make the bandwidth needed for block broadcast negligable. So whilst this optimization isn't fully implemented today, we do not consider block transmission bandwidth here.
At very high transaction rates each block can be over half a gigabyte in size.
It is not required for most fully validating nodes to store the entire chain. In Satoshi's paper he describes "pruning", a way to delete unnecessary data about transactions that are fully spent. This reduces the amount of data that is needed for a fully validating node to be only the size of the current unspent output size, plus some additional data that is needed to handle re-orgs. As of October 2012 (block 203258) there have been 7,979,231 transactions, however the size of the unspent output set is less than 100MiB, which is small enough to easily fit in RAM for even quite old computers.
Only a small number of archival nodes need to store the full chain going back to the genesis block. These nodes can be used to bootstrap new fully validating nodes from scratch but are otherwise unnecessary.
The primary limiting factor in Bitcoin's performance is disk seeks once the unspent transaction output set stops fitting in memory. It is quite possible that the set will always fit in memory on dedicated server class machines, if hardware advances faster than Bitcoin usage does.
To account for 2 transactions per day for every human on earth, blocks would need to be 24GB, and would require the processing power of approximately 40 Intel Core i7-2670QM 2.2Ghz processors. Some argue that as block sizes increase, we cannot simply throw more resources at processing blocks unless said resources become cheaper at a rate faster than the need for these resources increases. Otherwise our result is a few, or just one, centralized parties that control the blockchain and are able to reverse or censor transactions, evidenced by the network previously beginning to converge to centralization to under much much smaller load. Others argue that under any realistic maximum usage scenario, like in the case of 24 GB blocks, full node requirements would not be so great. This is argued in spite of the fact that the processor requirements are 40 i7s costing $6000, one petabyte worth of disk space per year costing $30,000, and $50,000 yearly for 2PB of bandwidth (that is with the generous assumption that upload costs are the same as download costs).
The Lightning Network can provide trustless off-chain transactions and scales better than O(N*M) (requiring every full node to have every unspent output). Other advantages include not having a counterparty risk, no one can steal your bitcoin. Despite Lightning transactions being trustless, the system doesn't require everyone to process a copy of transactions from everyone else in order to be secure. In addition, Lightning has instant confirmations and doesn't require a hardfork. Due to these assumed qualities of the Lightning Network, those wary of the potential centralizing effects of large blocks believe this technology can replace increases in block size as the means to scale Bitcoin usage.
The description above applies to the current software with only minor optimizations assumed (the type that can and have been done by one man in a few weeks).
However there is potential for even greater optimizations to be made in future, at the cost of some additional complexity.
Pieter Wuille has implemented a custom verifier tuned for secp256k1 that can go beyond 20,000 signature verifications per second. It would require careful review by professional cryptographers to gain as much confidence as OpenSSL, but this can easily halve CPU load.
Algorithms exist to accelerate batch verification over elliptic curve signatures. This means if there are several inputs that are signing with the same key, it's possible to check their signatures simultaneously for another 9x speedup. This is a somewhat more complex implementation, however, it can work with existing signatures (clients don't need upgrading).
Newly developed digital signature algorithms, like ed25519 have extremely efficient software implementations that can reach speeds of nearly 80,000 verifications per second, even on an old Westmere CPU. That is a 10x improvement over secp256k1, and most likely is even higher on more modern CPUs. Supporting this in Bitcoin would mean a chain fork. This ECC algorithm has also been shown to be easily accelerated using GPUs, yielding scalar point multiplication times of less than a microsecond (i.e. a million point multiplications per second).
At these speeds it is likely that disk and memory bandwidth would become the primary limiting factor, rather than signature checking.
Simplified payment verification
It's possible to build a Bitcoin implementation that does not verify everything, but instead relies on either connecting to a trusted node, or puts its faith in high difficulty as a proxy for proof of validity. bitcoinj is an implementation of this mode.
In Simplified Payment Verification (SPV) mode, named after the section of Satoshi's paper that describes it, clients connect to an arbitrary full node and download only the block headers. They verify the chain headers connect together correctly and that the difficulty is high enough. They then request transactions matching particular patterns from the remote node (ie, payments to your addresses), which provides copies of those transactions along with a Merkle branch linking them to the block in which they appeared. This exploits the Merkle tree structure to allow proof of inclusion without needing the full contents of the block.
As a further optimization, block headers that are buried sufficiently deep can be thrown away after some time (eg. you only really need to store as low as 2016 headers).
The level of difficulty required to obtain confidence the remote node is not feeding you fictional transactions depends on your threat model. If you are connecting to a node that is known to be reliable, the difficulty doesn't matter. If you want to pick a random node, the cost for an attacker to mine a block sequence containing a bogus transaction should be higher than the value to be obtained by defrauding you. By changing how deeply buried the block must be, you can trade off confirmation time vs cost of an attack.
Programs implementing this approach can have fixed storage/network overhead in the null case of no usage, and resource usage proportional to received/sent transactions.
See also: Thin Client Security.
There are a few proposals for optimizing Bitcoin's scalability. Some of these require an alt-chain / hard fork.
- Ultimate blockchain compression - the idea that the blockchain can be compressed to achieve "trust-free lite nodes"
- The Finite Blockchain paper that describes splitting the blockchain into three data structures, each better suited for its purpose. The three data structures are a finite blockchain (keep N blocks into the past), an "account tree" which keeps account balance for every address with a non-zero balance, and a "proof chain" which is an (ever growing) slimmed down version of the blockchain.