CCIE Studies: Performance Routing PfR/OER


Hey fellow CCIE candidates and networking geeks. Today I want to step deep into the realm of PfR, or Performance Routing. First, let’s go back in time to its predecessor, Optimized Edge Routing or OER. As crazy as this sounds, OER came out in 2006 with IOS 12.3. So, technically, before all this SDN fanfare, Cisco actually decoupled the control plane (part of it at least) from the data plane with OER/PfR back in the dizay.


OER/PfR was created to help with a major issue that plagues many mid-market customers even to this day: proper load sharing and/or balancing on the edge of the network. Who wants to have redundant Internet connections, possibly even with diverse providers, and have one of those connections sit there idle until something blows up? The short answer: pretty much nobody. You’re paying for that circuit, so you should be using it. Well, Shaun, why not just use BGP? That’s a great question! You sure could, advertising part of your networks off one connection and the remaining networks off the other. That would achieve a level of load sharing inbound to the enterprise. Traffic egressing the enterprise could also be split across the two connections. The issue with BGP peering is sometimes the complexity and the requirements. When I worked at the SP, a class C (/24) was the longest prefix that you could advertise. I heard it’s now a /23, but that has not been confirmed. Working with ARIN for a direct assignment of two IPv4 /24s will be an exercise in patience. Remember, we are running out of IPv4 space; perhaps you could get an IPv6 block for half price… J/K. All that said, it can be a pain in the you know what to make this happen, and not all companies have the resources to manage that type of edge peering agreement with the providers.

Well, that’s where OER/PfR comes into play. Let’s keep this simple, because OER/PfR can be quite a deep subject. Rather than base forwarding decisions only on destination and lowest-cost metric, why not take a path’s characteristics into consideration, such as jitter, delay, utilization, load distribution, packet loss/health, or even MOS? That’s the power of OER/PfR!!!

This is right from

“PfR can also improve application availability by dynamically routing around network problems like black holes and brownouts that traditional IP routing may not detect. In addition, the intelligent load balancing capability of PfR can optimize path selection based on link use or circuit pricing.”

So, what did we do without BGP or OER/PfR? Typically, static routes with a floating static route for the redundant link, using IP SLA/object tracking for state monitoring (far-end reachability). Again, we are paying for something we can’t use. To quote Brian Dennis from INE: “It’s something we always accepted, like STP. You’re paying for something you can’t use”. The good news: you don’t need to live in that world any more. We have evolved with technologies like FabricPath/TRILL, vPC, OER/PfR, and SDN. Man, it’s a good time to be into networking!

Let’s think about some use cases: Internet connection load sharing/balancing, application specific traffic steering based on performance (latency), loss/delay sensitive hosted IP telephony traffic, leverage burstable based circuits, etc…

In summary, PfR allows the network to intelligently choose link resources as needed to reduce operational costs. Sounds like a sales pitch, right? Well, I am a Cisco SE after all, it’s in my DNA, plus I found that ditty in one of the PfR FAQs.

OK, now that you have a good background on the origins of OER/PfR, let’s talk about the major difference between OER and PfR. In short, OER was destination prefix based, and PfR expanded the capabilities to include route control on a per-application basis.

Let’s also get one major thing out of the way before we drill into the specifics. With a holistic view of the EDGE network you’re able to accomplish this level of traffic engineering on a per-application level. If there is something wrong within the PfR network devices, the traffic will FALL BACK to old school forwarding. Got that? No catastrophic failure where the routers are sticking their hands up screaming for help.


OK, let’s talk a little about the components required for a PfR edge network.

***IOS 15.1+ minimum recommended for production network***

Versioning: Major versions must match! If running 12.4(T), the version is 2.x. If running IOS 15, the version is 3.x. It’s OK to have, say, a 2.1 and a 2.2, but mixing a 2.x and a 3.x version is NOT supported.
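
If it helps to picture that rule, here is a tiny Python sketch of it. This is only my reading of the versioning rule above, not anything the routers actually run, and the function name `pfr_versions_compatible` is made up for illustration.

```python
def pfr_versions_compatible(mc_version: str, br_version: str) -> bool:
    """Return True when both version strings share the same major number.

    Hypothetical helper: it only encodes the rule "2.1 with 2.2 is fine,
    2.x with 3.x is not" from the text above.
    """
    mc_major = mc_version.split(".")[0]
    br_major = br_version.split(".")[0]
    return mc_major == br_major


if __name__ == "__main__":
    print(pfr_versions_compatible("2.1", "2.2"))  # same major version
    print(pfr_versions_compatible("2.2", "3.0"))  # major versions differ
```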

Border Router (BR): Sits in the data plane of the edge network, monitors prefixes, and reports back to the MC.
Master Controller (MC): Centralized control plane for central processing and a database for statistics collection.
1x Internal Interface: BRs ONLY peer with each other over internal interfaces (directly connected or via tunnel). Also used between BR and MC.
2x External Interfaces: OER/PfR expects traffic to flow between internal and external interfaces.
Route Control: Parent Route REQUIRED! This explanation is right from the Cisco FAQ.

A parent route is a route that is equal to, or less specific than, the destination prefix of the traffic class being optimized by Performance Routing. The parent route should have a route through the Performance Routing external interfaces. All routes for the parent prefix are called parent routes. For Performance Routing to control a traffic class on a Performance Routing external interface, the parent route must exist on the Performance Routing external interface. BGP and Static routes qualify as Performance Routing parent routes. In Cisco IOS Release 12.4(24)T and later releases, any route in RIB, with an equal or less specific mask than the traffic class, will qualify as a parent route.

For any route that PfR modifies or controls (BGP, Static, PIRO, EIGRP, PBR), having a Parent prefix in the routing table eliminates the possibility of a routing loop occurring. This is naturally a good thing to prevent in routed networks.
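
The “equal to, or less specific than” wording in the FAQ text can be sketched in a few lines of Python with the standard `ipaddress` module. This is just an illustration of the prefix comparison, assuming the 12.4(24)T and later behavior where any RIB route qualifies; `is_parent_route` and the example prefixes are my own, not from the FAQ.

```python
import ipaddress


def is_parent_route(route: str, traffic_class: str) -> bool:
    """Return True if `route` is equal to, or less specific than, `traffic_class`.

    Hypothetical helper: a route covers the traffic class when the traffic
    class prefix is the same network or a subnet of the route.
    """
    route_net = ipaddress.ip_network(route)
    tc_net = ipaddress.ip_network(traffic_class)
    return tc_net.subnet_of(route_net)


if __name__ == "__main__":
    # A default route is less specific than everything, so it qualifies.
    print(is_parent_route("0.0.0.0/0", "10.1.1.0/24"))
    # A more specific route (/25) cannot be the parent of a /24 traffic class.
    print(is_parent_route("10.1.1.0/25", "10.1.1.0/24"))
```

This is also why a plain static default route out each external interface is usually enough to satisfy the parent route requirement.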

Now, since I’m an active CCIE candidate I’m gonna say this, IOS 12.4(T) has bugs with PfR. For one, the command operative syntax is still “OER” and certain functionality just seems downright broken. My lab consists of real 3560’s and ISR routers, so it’s not like I’m using emulation/GNS/dynamips and that’s my issue. I cannot stress enough, if doing a POC in a non-PROD environment feel free to use IOS 12.4(T). In a “real world” production environment, never settle for less than 15.1. ASR 1K requires IOS XE 2.6 or higher for PfR support.

Hardware Platform Support: ISR G1(RIP), G2, ASR, 7600, Cat6500, and 7200’s (RIP)
Classic IOS Feature Set Required: SP Services/Advanced IP/Enterprise/Advanced Enterprise
Universal IOS Image: Data Package required


[Figure: Pfr faq fig3.jpg]

I was going to use a complex CCIE sample config, but there are so many good examples of PfR already on the Cisco PfR Wiki.

Instead, let me concentrate on the basic requirements starting with the border router.

BR Config: 

key chain PFR
 key 1
  key-string PFR

oer border
 local Loopback0
 master key-chain PFR

ip route Serial1/2 (PARENT ROUTE)
ip route Serial1/1 (PARENT ROUTE)

MC Config: 

oer master
 border key-chain PFR
  interface Serial1/2 external
  interface Serial1/1 external
  interface Serial1/0 internal
 learn
  periodic-interval 0
  monitor-period 1
 mode route control
 resolve utilization priority 1 variance 10
 no resolve delay
 no resolve range


Granted, this is the most basic form of route control, but it will inject a route for the monitored prefix based on interface throughput utilization. I believe the default is 75% utilized.

Here are some useful commands to monitor/troubleshoot PfR.

“show pfr/oer master”

Conn Status: SUCCESS, PORT: 3949
Version: 2.2
Number of Border routers: 1
Number of Exits: 2
Number of monitored prefixes: 1 (max 5000)
Max prefixes: total 5000 learn 2500
Prefix count: total 1, learn 1, cfg 0
PBR Requirements met
Nbar Status: Inactive

Border           Status   UP/DOWN             AuthFail  Version
                 ACTIVE   UP       03:29:17          0  2.2

Global Settings:
max-range-utilization percent 20 recv 0
mode route metric bgp local-pref 5000
mode route metric static tag 5000
trace probe delay 1000
exit holddown time 60 secs, time remaining 0

Default Policy Settings:
backoff 300 3000 300
delay relative 50
holddown 300
periodic 0
probe frequency 56
number of jitter probe packets 100
mode route control
mode monitor both
mode select-exit good
loss relative 10
jitter threshold 20
mos threshold 3.60 percent 30
unreachable relative 50
resolve utilization priority 1 variance 10

Learn Settings:
current state : STARTED
time remaining in current state : 115 seconds
no delay
no inside bgp
no protocol
monitor-period 1
periodic-interval 0
aggregation-type prefix-length 24
prefixes 100
expire after time 720

“show pfr/oer master border detail” 

Border           Status   UP/DOWN             AuthFail  Version
8.8.8.8          ACTIVE   UP       03:31:46          0  2.2

External      Capacity     Max BW   BW Used   Load Status            Exit Id
Interface       (kbps)     (kbps)    (kbps)    (%)
---------     --------     ------   -------   ---- ------            ------
Se1/2     Tx      1544       1158         0      0 UP                      2
          Rx                 1544         0      0
Se1/1     Tx      1544       1158         0      0 UP                      1
          Rx                 1544         0      0
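
The Max BW column above lines up with my 75% guess from earlier: 75% of a 1544 kbps T1 is 1158 kbps. Here is a quick Python sketch of that arithmetic. The function names and the idea of flagging an exit once BW Used passes Max BW are my own simplification for illustration, not the actual PfR policy engine.

```python
def max_bw_kbps(capacity_kbps: int, max_utilization_pct: int = 75) -> int:
    """Usable bandwidth on an exit, assuming a 75% max utilization default."""
    return capacity_kbps * max_utilization_pct // 100


def exit_over_threshold(bw_used_kbps: int, capacity_kbps: int,
                        max_utilization_pct: int = 75) -> bool:
    """Hypothetical check: is the exit carrying more than its Max BW?"""
    return bw_used_kbps > max_bw_kbps(capacity_kbps, max_utilization_pct)


if __name__ == "__main__":
    print(max_bw_kbps(1544))                # T1 capacity from the output above
    print(exit_over_threshold(0, 1544))     # idle link, as in the output above
    print(exit_over_threshold(1300, 1544))  # made-up load above the threshold
```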

“show ip cache flow”

IP packet size distribution (25713 total packets):
   1-32   64   96  128  160  192  224  256  288  320  352  384  416  448  480
   .000 .040 .000 .200 .000 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000

    512  544  576 1024 1536 2048 2560 3072 3584 4096 4608
   .003 .000 .007 .000 .743 .000 .000 .000 .000 .000 .000

IP Flow Switching Cache, 4456704 bytes
2 active, 65534 inactive, 1007 added
16475 ager polls, 0 flow alloc failures
Active flows timeout in 1 minutes
Inactive flows timeout in 15 seconds
IP Sub Flow Cache, 533256 bytes
2 active, 16382 inactive, 1151 added, 1007 added to flow
0 alloc failures, 0 force free
1 chunk, 1 chunk added
last clearing of statistics never
Protocol         Total    Flows   Packets Bytes  Packets Active(Sec) Idle(Sec)
--------         Flows     /Sec     /Flow  /Pkt     /Sec     /Flow     /Flow
TCP-Telnet          10      0.0       256   144      0.1      19.5       6.9
TCP-other           59      0.0        68   110      0.2       9.0       2.3
ICMP                13      0.0      1470  1500      1.3      52.3       3.5
Total:              82      0.0       313  1146      1.8      17.1       3.1

SrcIf SrcIPaddress DstIf DstIPaddress Pr SrcP DstP Pkts

“show pfr/oer master traffic-class”

OER Prefix Statistics:
Pas - Passive, Act - Active, S - Short term, L - Long term, Dly - Delay (ms),
P - Percentage below threshold, Jit - Jitter (ms),
MOS - Mean Opinion Score
Los - Packet Loss (packets-per-million), Un - Unreachable (flows-per-million),
E - Egress, I - Ingress, Bw - Bandwidth (kbps), N - Not applicable
U - unknown, * - uncontrolled, + - control more specific, @ - active probe all
# - Prefix monitor mode is Special, & - Blackholed Prefix
% - Force Next-Hop, ^ - Prefix is denied

DstPrefix           Appl_ID Dscp Prot     SrcPort     DstPort SrcPrefix
           Flags             State     Time            CurrBR  CurrI/F Protocol
         PasSDly  PasLDly   PasSUn   PasLUn  PasSLos  PasLLos      EBw      IBw
         ActSDly  ActLDly   ActSUn   ActLUn  ActSJit  ActPMOS  ActSLos  ActLLos
--------------------------------------------------------------------------------
                          N defa    N           N           N N
               U        U        0        0        0        0        0        0
               U        U        0        0        N        N        N        N

“show oer border routes static”

Flags: C - Controlled by oer, X - Path is excluded from control,
       E - The control is exact, N - The control is non-exact

Flags Network            Parent             Tag
CE                                          5000


Well folks, that’s all the steam I have left after pouring out my heart on PfR/OER. I hope this post was informative. Please drop me a line if you have any questions or I was not clear on any of my points. I appreciate any and all feedback. In my mind, Cisco gave us a glimpse into the future of networking way back in 2006. With data center technologies evolving on a daily basis, it’s only a matter of time before there is an MC for the enterprise network rather than just the edge. Heck Google is doing that already with 25% of all the Internet traffic TODAY! Until next time, keep those blinky lights flashing.




Unlike IGPs, BGP does not use a metric to select the best path. Instead, BGP is a path-vector protocol: the best path is determined with Path Attributes (PAs). The default PA, if no others are set, is AS_PATH. The shortest AS path to the destination prefix is the best path.

Building the neighbor relationship:

TCP port 179 (session established based on neighbor address), then Open messages, the Established state, and finally Updates (which contain the prefix information). If there is a problem/error, a Notification message is sent.

Keepalive is 60 sec and hold time is 3 times that, or 180 sec. The hold time is sent in the Open message, and the timers DO NOT have to match. The lower of the two hold times is used by both peers.
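
To make the “lower of the two” rule concrete, here is a small Python sketch. It assumes the keepalive is simply one third of the negotiated hold time (matching the 60/180 defaults above); real implementations may also honor a lower configured keepalive, and `negotiate_timers` is a name I made up.

```python
def negotiate_timers(local_hold: int, peer_hold: int):
    """Return (hold_time, keepalive) in seconds for a BGP session.

    Sketch only: the hold time is the lower of the two values from the Open
    messages, and the keepalive is assumed to be one third of it.
    """
    hold_time = min(local_hold, peer_hold)
    keepalive = hold_time // 3
    return hold_time, keepalive


if __name__ == "__main__":
    # One side at the 180 sec default, the other configured for 90 sec.
    print(negotiate_timers(180, 90))
```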

Authentication: MD5 only
Loopbacks require an extra TTL hop, so multihop may be necessary for eBGP neighbors (iBGP TTL is 255, eBGP TTL is 1). Overcome the eBGP limit with “ebgp-multihop 255”.

Two components to the BGP Table
1) NLRI: Prefix and mask
2) PAs (NLRIs that share the same PAs are grouped together)

Redistribution: When redistributing INTO BGP, if the metric is set it will alter the MED PA.
Auto-summary only affects network injection locally either through redistribution or the “network” command.  

Use “aggregate-address” to perform manual summarization. AS-SET will hold an unordered list of the ASNs in the component subnets. Without this option the AS_PATH is set to NULL. Could be good to hide the originating path, bad because it can create a route loop.
A summary can also be made with a local static route to null0 and injected with the “network” command. This will NOT suppress component subnets.
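
Here is a rough Python sketch of what the AS-SET ends up holding. It is a simplification: it just takes the union of every ASN seen on the component subnets that fall inside the aggregate, and ignores how a real router keeps any common leading AS sequence. The function name, prefixes, and ASNs are all made up for illustration.

```python
import ipaddress


def aggregate_as_set(aggregate: str, components: dict) -> set:
    """Collect the unordered set of ASNs from component subnets of `aggregate`.

    `components` maps a prefix string to the list of ASNs in its AS_PATH.
    Prefixes outside the aggregate are ignored.
    """
    agg_net = ipaddress.ip_network(aggregate)
    asns = set()
    for prefix, as_path in components.items():
        if ipaddress.ip_network(prefix).subnet_of(agg_net):
            asns.update(as_path)
    return asns


if __name__ == "__main__":
    table = {
        "10.1.0.0/16": [65001, 65010],
        "10.2.0.0/16": [65002],
        "192.168.0.0/24": [65003],  # not part of 10.0.0.0/8
    }
    print(sorted(aggregate_as_set("10.0.0.0/8", table)))
```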

BGP Sync: Not really used today because the full BGP table is too big to redistribute into an IGP. Use RRs or Confeds instead. It was designed to prevent black-holing: with sync enabled, a route learned via iBGP is only considered best if the IGP also has the route. If concerned about the number of devices that have to run BGP, you could use MPLS.

Redistribution solves the routing to black-hole and sync solves the problem of advertising a black-hole route to another AS. USE WITH CAUTION WHEN REDISTRIBUTING BGP INTO IGP. 

Without RR or Confederations, a full mesh of iBGP peers is required. If you have more than 3 BGP nodes, this would be a royal pain in the tush. Full Mesh formula is n(n-1)/2.

 (8) Node Example:  8*(8-1)/2 = 28 TCP connections! That’s too many. 
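
The same math in Python, in case you want to plug in your own numbers. The route reflector comparison assumes a single RR with every other router as a client, which is my own simplified scenario for contrast.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions in a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2


def single_rr_sessions(n: int) -> int:
    """Sessions needed if one router is the RR and the other n-1 are clients."""
    return n - 1


if __name__ == "__main__":
    print(full_mesh_sessions(8))   # the 8 node example above
    print(single_rr_sessions(8))   # same 8 routers behind one route reflector
```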

BGP: Server/Client (Use update source to force the “client”). Only necessary on one side, but it should be on both to ensure clarity.

eBGP neighbors must be directly connected. So, if you’re using loopbacks to peer, the “disable-connected-check” command is required if you do not modify “ebgp-multihop”. The other option is just to modify the eBGP multihop.

Route Reflector: 

A route reflector breaks the iBGP rule that routes learned from one iBGP neighbor are not advertised to another iBGP neighbor. A new loop prevention mechanism must be used.

Originator ID: Originator of the prefix sent by the RR (used to prevent loops between the clients)
Cluster List/ID: Route reflector ID (used to prevent loops between RR’s)
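
A minimal Python sketch of those two checks, as I understand them: a router drops a reflected route that carries its own router ID as the Originator ID, and an RR drops a route whose Cluster List already contains its own cluster ID. The function name and the IDs in the example are hypothetical.

```python
def accept_reflected_route(my_router_id: str, my_cluster_id: str,
                           originator_id: str, cluster_list: list) -> bool:
    """Return False when either RR loop prevention check trips."""
    if originator_id == my_router_id:
        # The route came back to the router that originated it.
        return False
    if my_cluster_id in cluster_list:
        # The route has already passed through this RR cluster.
        return False
    return True


if __name__ == "__main__":
    print(accept_reflected_route("1.1.1.1", "10.10.10.10",
                                 "2.2.2.2", ["20.20.20.20"]))
    print(accept_reflected_route("1.1.1.1", "10.10.10.10",
                                 "1.1.1.1", ["20.20.20.20"]))
```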

Confederations:

An alternative to Route Reflectors that accomplishes the same functionality (no need for a full mesh), but is more intricate. Used for LARGE scale BGP deployments.

The AS to be presented outside the Confederation (eBGP) is configured with “bgp confederation identifier xxxxx”.
For example, my private ASN in the confed is 64512 and my public ASN is 75:

router bgp 64512
 bgp confederation identifier 75

Sub-ASs count as a single AS no matter how many sub-ASs are included in the path. Lowest router-id wins a metric tie.

If recursion cannot occur for the next-hop IP and “next-hop-self” is not enabled, the prefix will show in the BGP table but not in the route table, because it’s not a “best” path (“>”).
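
In other words, a path can only be “best” if the router can resolve its next hop against something in the routing table. A rough Python sketch of that lookup, with a made-up function name and made-up prefixes:

```python
import ipaddress


def next_hop_resolvable(next_hop: str, rib_prefixes: list) -> bool:
    """Return True if `next_hop` falls inside any prefix in the routing table.

    Sketch only: a real lookup is longest-match and has extra rules; this
    just checks that some covering route exists.
    """
    nh = ipaddress.ip_address(next_hop)
    return any(nh in ipaddress.ip_network(prefix) for prefix in rib_prefixes)


if __name__ == "__main__":
    rib = ["10.0.12.0/24", "10.0.23.0/24"]
    # Next hop on a link this router knows about: path can be best.
    print(next_hop_resolvable("10.0.23.3", rib))
    # Next hop on the far side eBGP link with no route to it: path is not best.
    print(next_hop_resolvable("192.0.2.1", rib))
```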

Another way to change the next-hop IP is using a route-map on the neighbor and “set ip next-hop x.x.x.x”. If you leave the match empty it will match all prefixes coming from the specified neighbor. This can be used in a TE use case, where the next-hop is not even the originating router.

Redistributing BGP into IGP: USE WITH CAUTION! If necessary, make sure to use an AS-PATH access-list to limit the routes to the prefixes originating on the peer router. IGPs can be overwhelmed by a full BGP Internet route table. On a side note: RIB failures in BGP are still advertised to neighbors. To prevent this default behavior, issue the following command under the BGP process: “bgp suppress-inactive”.

iBGP into IGP redistribution is NOT recommended because of the potential for loops to occur. Remember, the AS_PATH is NOT preserved once a route is redistributed into the IGP. If you MUST, do so with caution… You have been warned.

Override the default behavior (iBGP routes are not redistributed into the IGP) under the BGP process with “bgp redistribute-internal”.

BGP “auto-summary” works with 1) Redistribution of routes into BGP or 2) using the network command to advertise a classful address.

BGP Best Path Selection:

1) Weight – (non-transitive/local only) Can be set per neighbor or per an inbound route-map
2) Local Preference (transitive within a single AS)- Can be set per an inbound route-map
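
Here is a stripped down Python sketch of how those first steps stack up, using only the attributes covered in these notes: highest weight, then highest local preference, then shortest AS path. The real selection process has more steps than this, and the `Path` class, field defaults (weight 0 and local preference 100 for learned routes), and example values are my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Path:
    next_hop: str
    weight: int = 0            # assumed default for routes learned from a peer
    local_pref: int = 100      # assumed default local preference
    as_path: Tuple[int, ...] = ()


def best_path(paths):
    """Pick the best path: highest weight, highest local pref, shortest AS path."""
    return max(paths, key=lambda p: (p.weight, p.local_pref, -len(p.as_path)))


if __name__ == "__main__":
    a = Path("10.0.0.1", local_pref=200, as_path=(65001,))
    b = Path("10.0.0.2", weight=100, as_path=(65001, 65002, 65003))
    # Weight is checked first, so b wins despite the longer AS path.
    print(best_path([a, b]).next_hop)
```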

Un-suppress on a per-neighbor basis, and use a route-map to un-suppress/suppress globally. In the route-map, use deny on the prefix to be allowed and permit on the prefix to suppress.

Local-AS: Use this to allow a peer to use a different ASN from the global one. Could be used for an AS migration. “no-prepend” will remove the old AS from the string for INCOMING prefixes. This does NOT work for advertised prefixes. “replace-as” will remove the new AS from the string. Finally, “dual-as” allows a peer to use either ASN for peering.

“Remove-private-AS” on external peers only.

BGP Timers:

BGP Scanner: Default of 60 seconds, Conditional route advertisements, next-hop check, imports routes, route dampening. Change with “bgp scan-time”

Route Refresh/Soft Reconfiguration: Route Refresh replaced Soft Reconfiguration.

Batch routing updates: Change the minimum interval between updates with “neighbor x.x.x.x advertisement-interval <seconds>”.

Timers Hello/Hold: Default of 60 hello and 180 sec. hold.

BGP Fast Failover: By default, if an interface goes down the peer session will go down. This feature is good for PTP links but not so good for shared links. Disable it with “no bgp fast-external-fallover”

Fast peering: Use “neighbor x.x.x.x fall-over” iBGP or eBGP based on route availability to the peer.

BGP Nexthop trigger: Event driven and enabled by default. Change with “bgp nexthop trigger delay xx”.