如题所述
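Preface: this article explains the three modes of the Linux IO scheduling layer (cfq, deadline and noop) and gives tuning advice and suitable use cases for each.
IO scheduling happens in the IO scheduler layer of the Linux kernel. "Layer" here refers to Linux's overall IO stack. Seen from the read() or write() system call, the Linux IO stack can be divided into seven layers:
VFS layer: the virtual file system layer. The kernel has to deal with many file systems, and each may implement its data structures and methods differently. The kernel therefore abstracts this layer to adapt to all of them and to offer one uniform interface to the outside.
File system layer: each file system implements its own operations and provides its own features. I won't go into details here; if you are interested, read the code.
Page cache layer: responsible for caching pages.
Generic block layer: the vast majority of IO operations deal with block devices, so Linux provides a block device abstraction layer here, similar to VFS. Its lower side connects to block devices with different properties, and its upper side offers a uniform Block IO request standard.
IO scheduler layer: most block devices are disk-like, so it makes sense to provide different scheduling algorithms and queues to suit these devices and different applications. This lets disk read/write efficiency be improved in a targeted way in each environment. This is where the famous Linux elevator does its work, and the various scheduling methods for mechanical hard disks are implemented here.
Block device driver layer: the driver layer offers relatively high-level device operation interfaces, usually in C. On its lower side, it connects to the device's own operation methods and specifications.
Block device layer: this layer is the concrete physical device, which defines the operation methods and specifications for the device.
There is a well-organized [Linux IO stack diagram] that has become a classic; a picture is worth a thousand words.
What we study today lives mainly in the IO scheduler layer.
The core problem it solves is how to improve the overall IO performance of block devices. This layer was designed mainly around the structure of mechanical hard disks.
As is well known, a mechanical hard disk stores data on magnetic platters, and the head moves across a platter to reach the right track, much like playing a record.
In this structure, sequential access achieves high throughput. As soon as the platter is accessed randomly, however, a lot of time is wasted moving the head. This lengthens the response time of each IO and greatly reduces IO responsiveness.
Moving the head across the platter to find tracks resembles scheduling an elevator. In fact, in its earliest days Linux named this algorithm the Linux elevator algorithm, namely:
if, while seeking, requests for the tracks the head passes along the way are handled "in passing", overall IO throughput can be raised with only a small impact on response time.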
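The "handle what you pass along the way" idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not kernel code. The function names are made up, and the one-way (C-SCAN style) sweep is an assumption. Requests at or beyond the head position are served in ascending sector order, then the sweep wraps to the lowest pending sector. Total head movement stands in for seek time.

```python
def elevator_order(head, sectors):
    """One-way sweep: serve sectors >= head in ascending order, then wrap around."""
    ahead = sorted(s for s in sectors if s >= head)
    behind = sorted(s for s in sectors if s < head)
    return ahead + behind


def seek_distance(head, order):
    """Total head movement (in sectors) when serving `order` starting at `head`."""
    total = 0
    for sector in order:
        total += abs(sector - head)
        head = sector
    return total
```

Take a head at sector 50 and requests arriving as [95, 10, 60, 20, 80]. Serving them in arrival order moves the head 280 sectors. The sweep order [60, 80, 95, 10, 20] moves it only 140.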
This is why we design IO scheduling algorithms.
The kernel currently enables three algorithms (modes) by default: noop, cfq and deadline. Strictly speaking, there are only two:
the first one, noop, is the no-op scheduler. It performs no scheduling and does not sort IO requests; it is just a FIFO queue that does appropriate IO merging.
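A minimal sketch of what "a FIFO queue that does appropriate merging" means. It assumes a simplified request model (start sector plus sector count) and only merges a new request into the most recently queued one when the two are contiguous. The class and field names are hypothetical, and the real block layer has more merge paths than this.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    sector: int   # first sector of the request
    count: int    # number of sectors


class NoopQueue:
    """FIFO order, no sorting; contiguous requests are back-merged into the tail."""

    def __init__(self):
        self.fifo = deque()

    def add(self, req):
        if self.fifo:
            tail = self.fifo[-1]
            if tail.sector + tail.count == req.sector:
                tail.count += req.count   # contiguous with the tail: merge
                return
        self.fifo.append(req)             # otherwise keep arrival order

    def dispatch(self):
        return self.fifo.popleft() if self.fifo else None
```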
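The kernel's current default scheduler should be cfq, Completely Fair Queueing. As the name says, it tries to give every process a completely fair IO environment.
Note: please remember this term, cfq (Completely Fair Queueing), otherwise the rest of the article will be hard to follow.
cfq creates a synchronous IO scheduling queue for each process. By default, it allocates IO resources by time slice and by a limit on the number of requests, so that every process gets a fair share. cfq also implements per-process priority scheduling, which we explain in detail later.
The current IO scheduler can be viewed and changed through the per-device file /sys/block/sda/queue/scheduler (replace sda with your disk name). Reading the file lists the available schedulers with the active one in square brackets, and writing a scheduler name into it switches schedulers.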
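For scripting, here is a small Python sketch of reading and switching the scheduler through that file. The file's contents look like `noop deadline [cfq]`, with the active scheduler in brackets. Writing a name to it switches the scheduler and requires root. The helper names are hypothetical.

```python
from pathlib import Path


def scheduler_path(dev):
    """Per-device scheduler file, e.g. /sys/block/sda/queue/scheduler."""
    return Path("/sys/block") / dev / "queue" / "scheduler"


def parse_scheduler(text):
    """Split 'noop deadline [cfq]' into ('cfq', ['noop', 'deadline', 'cfq'])."""
    active, available = None, []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def current_scheduler(dev):
    return parse_scheduler(scheduler_path(dev).read_text())


def set_scheduler(dev, name):
    # Same effect as: echo <name> > /sys/block/<dev>/queue/scheduler (root only)
    scheduler_path(dev).write_text(name)
```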
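cfq is a good IO scheduler for general-purpose servers and also a good choice for desktop users.
It is not a good fit, however, for many scenarios with heavy IO pressure, especially when the pressure is concentrated on a few processes.
In those scenarios we need to favor the IO response time of one or a few processes rather than let all processes share IO fairly; database applications are a typical example.
deadline scheduling is better suited to those scenarios. deadline implements four queues:
two of them hold normal reads and writes respectively. They are sorted by sector number and merge ordinary IO to improve throughput. IO requests may concentrate on certain disk locations, so newly arriving requests may keep being merged while IO requests for other disk locations starve.
The other two queues are FIFO queues for reads and writes, ordered by request creation time; every request is also placed in one of them. When the oldest request in a FIFO expires (reaches its deadline), the scheduler serves it first, so no request starves.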
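The four-queue arrangement can be sketched as a simplified model; this is not the kernel's code. The class name is hypothetical, and times are plain milliseconds. The 500 ms and 5000 ms expiry values are the usual read_expire and write_expire defaults. Several simplifications apply:

- Reads are always preferred over writes (the writes_starved limit is ignored).
- Expiry is checked on every dispatch; the real scheduler checks between batches.
- The sorted pass simply wraps around to the lowest sector.

```python
import bisect
from collections import deque

READ_EXPIRE_MS = 500     # default read_expire
WRITE_EXPIRE_MS = 5000   # default write_expire


class DeadlineSketch:
    def __init__(self):
        # Two sector-sorted queues plus two arrival-ordered FIFO queues.
        self.sorted = {"read": [], "write": []}
        self.fifo = {"read": deque(), "write": deque()}   # (deadline_ms, sector)
        self.head = 0                                     # last dispatched sector

    def add(self, kind, sector, now):
        bisect.insort(self.sorted[kind], sector)
        expire = READ_EXPIRE_MS if kind == "read" else WRITE_EXPIRE_MS
        self.fifo[kind].append((now + expire, sector))

    def _take(self, kind, sector):
        # Remove the request from both of its queues and move the head.
        self.sorted[kind].remove(sector)
        for i, (_, s) in enumerate(self.fifo[kind]):
            if s == sector:
                del self.fifo[kind][i]
                break
        self.head = sector
        return kind, sector

    def dispatch(self, now):
        # 1. A request whose deadline has passed is served first.
        for kind in ("read", "write"):
            if self.fifo[kind] and self.fifo[kind][0][0] <= now:
                return self._take(kind, self.fifo[kind][0][1])
        # 2. Otherwise continue in sector order from the head position.
        for kind in ("read", "write"):
            queue = self.sorted[kind]
            if queue:
                i = bisect.bisect_left(queue, self.head)
                return self._take(kind, queue[i] if i < len(queue) else queue[0])
        return None
```

Suppose a read for sector 1000 is queued first and then keeps losing to the nearby sectors 10, 20 and 30 in the sorted pass. Once its 500 ms deadline passes, it is served ahead of sector 30, even though 30 is closer to the head.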
Until not long ago, the kernel shipped four algorithms by default; the fourth was as (the Anticipatory scheduler). With such a grand name, I thought for a while that the Linux kernel had learned to tell fortunes.
It turned out that it merely waits a little before doing deadline-style scheduling. If a mergeable IO request arrives during that wait, it is merged, which raises deadline's throughput for sequential reads and writes.
This is not prediction at all; I would rather call it the "try your luck" scheduler. Admittedly, the strategy does work well in certain specific scenarios.
In most scenarios, though, it not only failed to improve throughput but also hurt response time, so the kernel simply dropped it from the default configuration. Linux is nothing if not practical, and we won't spend more words on this scheduler either.
1. cfq: Completely Fair Queueing
cfq is the kernel's default IO scheduler and a good choice for desktops and most common workloads.
How do we implement a so-called Completely Fair Queueing scheduler?
First we need to understand who the fairness is for. From the operating system's point of view, the entity that performs operations is the process. Fairness here is therefore per process: we try to let processes share IO resources fairly.
So how do processes share IO resources fairly? We first need to understand what an IO resource is. IO resources are usually measured in two units: read/write bandwidth and read/write IOPS.
Bandwidth is the amount of data read or written per unit of time, for example 100 MB/s, while IOPS is the number of reads or writes per unit of time. The two can behave differently under different workloads, but one thing is certain: whichever reaches its limit first becomes the IO bottleneck.
Given how a mechanical disk works, sequential reads and writes can reach high bandwidth with relatively few IOPS. Many IOs can be merged, and read-ahead can speed up reading.
When IO is mostly random, more IO operations are needed and requests are less likely to be merged. The less data each IO request carries, the lower the bandwidth.
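The link between the two units is simple arithmetic: bandwidth equals IOPS times the average request size. A one-line helper (hypothetical name) shows how the same IOPS can mean very different bandwidth.

```python
def bandwidth_mib_s(iops, avg_request_kib):
    """Bandwidth in MiB/s = IOPS x average request size in KiB / 1024."""
    return iops * avg_request_kib / 1024
```

At 200 IOPS, 4 KiB random requests give about 0.78 MiB/s. Merged 512 KiB sequential requests at the same 200 IOPS give 100 MiB/s.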
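From this we can see that a process's use of IO resources shows up in two forms: the number of IO requests it submits per unit of time, and the bandwidth it occupies.
Either way, both are closely tied to how much IO processing time the process is given.
Some workloads use a lot of bandwidth with few IOPS, while others use little bandwidth with many IOPS. Scheduling the time each process occupies the IO device is therefore the fairest approach.
That is: I don't care whether your IOPS or your bandwidth is high. When your time is up, the next process gets its turn, and what you did with your time is your business.
So cfq tries to give every process an equal time slice on the block device. Within its time slice, a process may submit the IO requests it produces to the block device. When the slice ends, its remaining requests wait in its own queue until its next turn. That is the basic principle of cfq.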
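The basic time-slice idea can be simulated as a simplified model; this is not CFQ's implementation. The function name is hypothetical. Each request is given a fixed service time in milliseconds. A process keeps the device until its next request would overrun its slice, though it always gets at least one request per turn. Unfinished work then waits at the back of the line.

```python
from collections import deque


def timeslice_schedule(queues, slice_ms):
    """queues: {process: [service time of each request in ms]}.
    Returns the dispatch log as (process, service_ms) tuples."""
    pending = {name: deque(reqs) for name, reqs in queues.items()}
    turn = deque(name for name in queues if pending[name])
    log = []
    while turn:
        name = turn.popleft()
        q, used = pending[name], 0
        # Dispatch while the slice lasts; always dispatch at least one request.
        while q and (used == 0 or used + q[0] <= slice_ms):
            cost = q.popleft()
            used += cost
            log.append((name, cost))
        if q:
            turn.append(name)   # slice over: wait for the next turn
    return log
```

Process A issues many small requests (high IOPS), and B issues fewer, larger ones. With a 2 ms slice, each still gets 2 ms of device time per turn.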
Of course, there is no true "fairness" in real life. In common scenarios we are likely to need to assign IO priorities to processes by hand, just as we set priorities for their CPU usage.
So, in addition to fair queueing by time slice, cfq also supports priorities. Every process can be given an IO priority, and cfq uses it as an important factor when scheduling.
Priorities are first divided into three classes: RT (Real Time), BE (Best Effort) and IDLE (Idle), and cfq handles the IO of each class with a different strategy. The RT and BE classes are each further subdivided into 8 priority levels for finer-grained QoS, while IDLE has only one level.
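Userspace hands a class and a level to the kernel packed into one integer. This is the form used by the ioprio_set(2) system call and, underneath, by `ionice -c <class> -n <level>`. The constants below are assumed to follow the kernel's ioprio definitions: the class sits above bit 13, with NONE=0, RT=1, BE=2 and IDLE=3. Treat the two helper functions as a hypothetical sketch.

```python
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 0, 1, 2, 3
IOPRIO_BE_NR = 8          # RT and BE levels 0..7, 0 = highest priority


def ioprio_value(klass, level=0):
    """Pack a class and level into a single ioprio value."""
    if klass == IOPRIO_CLASS_IDLE:
        level = 0                     # IDLE has a single level
    elif klass in (IOPRIO_CLASS_RT, IOPRIO_CLASS_BE):
        if not 0 <= level < IOPRIO_BE_NR:
            raise ValueError("RT/BE level must be in 0..7")
    else:
        raise ValueError("unknown class")
    return (klass << IOPRIO_CLASS_SHIFT) | level


def ioprio_decode(value):
    """Split a packed value back into (class, level)."""
    return value >> IOPRIO_CLASS_SHIFT, value & ((1 << IOPRIO_CLASS_SHIFT) - 1)
```

For example, `ioprio_value(IOPRIO_CLASS_BE, 4)` corresponds to what `ionice -c 2 -n 4 -p <pid>` asks for.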
Also, as we all know, the kernel by default sends storage reads and writes through the cache (buffer/cache). In that case cfq cannot tell which process the request it is handling came from.
Only when a process reads or writes synchronously (sync read or sync write), or uses direct IO, can cfq tell which process an IO request came from.
So, besides the per-process IO queues, cfq also implements a shared queue for asynchronous requests.
The kernel now implements cgroup isolation of IO resources, so on top of the structure above, cfq also supports scheduling by cgroup.
In short, cfq uses a series of data structures to support all of these complex features. You can find the implementation in block/cfq-iosched.c in the kernel source tree.
1.1 cfq design principles
Here is a brief description of the overall data structures. First, cfq maintains the whole scheduler state in a data structure called cfq_data. In a cfq with cgroup support, all processes are divided into control groups (cgroups) for management.
Each cgroup is described in cfq by a cfq_group structure. All cgroups are placed, as scheduling entities, into a red-black tree sorted by vdisktime.
vdisktime records the IO time the cgroup has consumed. Each time a cgroup is scheduled, cfq picks the one with the smallest vdisktime from the red-black tree. This keeps IO resource usage across all cgroups "fair".
We also know that cgroups can divide blkio resources by weight. The mechanism is that a cgroup with a larger weight accumulates vdisktime more slowly, and one with a smaller weight accumulates it faster: the growth rate is inversely proportional to the weight.
This way, different cgroups receive different shares of IO while the allocation still looks "fair" from cfq's point of view.
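The vdisktime rule can be simulated with a min-heap standing in for the red-black tree. This is a minimal sketch with a hypothetical function name. It assumes each pick consumes one fixed slice and that the charge is scaled by default_weight / weight, using 500 as the default blkio weight.

```python
import heapq

DEFAULT_WEIGHT = 500   # assumed default blkio weight


def simulate_vdisktime(weights, slice_ms, rounds):
    """Always serve the group with the smallest vdisktime, then advance its
    vdisktime by the slice scaled by DEFAULT_WEIGHT / weight.
    Returns the total service time (ms) each group received."""
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    served = {name: 0 for name in weights}
    for _ in range(rounds):
        vtime, name = heapq.heappop(heap)
        served[name] += slice_ms
        vtime += slice_ms * DEFAULT_WEIGHT / weights[name]
        heapq.heappush(heap, (vtime, name))
    return served
```

Take a group with weight 1000 and a group with weight 500, 10 ms slices and 30 rounds. The heavier group receives 200 ms of device time, and the lighter one receives 100 ms, matching the 2:1 weight ratio.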
Once the cgroup (cfq_group) to serve has been chosen, the scheduler must decide which service_tree to use next.
Each service_tree is a red-black tree, and together they classify requests by priority class, i.e. RT, BE and IDLE. Every cfq_group maintains 7 service trees: a two-dimensional array service_trees[2][3] plus one service_tree_idle.
service_tree_idle is the red-black tree that queues IDLE-class requests.
In the two-dimensional array service_trees[2][3], the first dimension has one entry for RT and one for BE. Each entry holds three red-black trees, one for each kind of request: SYNC, SYNC_NOIDLE and ASYNC.
We can think of SYNC as "SYNC_IDLE", the counterpart of SYNC_NOIDLE. Idling is a mechanism cfq adds to merge consecutive IO requests as much as possible and so raise throughput; think of it as a kind of "idle wait".
Idling means that after a queue finishes a request, cfq waits briefly before scheduling again. If the next request arrives within that time, a head seek is avoided and sequential IO processing continues.
To implement this, cfq adds the SYNC tree at the service_tree level. A synchronous sequential request goes into this tree, where cfq idles to see whether the next request is also sequential. A synchronous random request goes into the SYNC_NOIDLE tree instead.
All asynchronous writes go into the ASYNC service tree, and there is no idle waiting for it.
cfq also makes special adjustments for SSD-like drives. When cfq detects that the storage device has a deep queue, such as an SSD, idling for individual queues is disabled and all synchronous IO requests go into the SYNC_NOIDLE service tree.
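The placement rules in the last few paragraphs amount to a small decision function. This is a hypothetical sketch: the flag names are assumptions, and in the kernel the decision is made per cfq_queue (whether idling is enabled for it) rather than per request.

```python
def workload_tree(is_sync, is_sequential, ssd_like):
    """Pick the SYNC, SYNC_NOIDLE or ASYNC tree as described above.

    ssd_like: a device with no seek cost and a deep queue, such as an SSD.
    """
    if not is_sync:
        return "ASYNC"          # asynchronous writes: never idled on
    if is_sequential and not ssd_like:
        return "SYNC"           # idle briefly, hoping the next request is adjacent
    return "SYNC_NOIDLE"        # random sync IO, or any sync IO on an SSD-like device
```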
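Each service tree holds a number of cfq_queue queues, each corresponding to one process; we explain this in detail later.
cfq_group also maintains asynchronous IO request queues shared by all processes in the cgroup. They are declared as an array async_cfqq[2][IOPRIO_BE_NR] plus a single async_idle_cfqq.
Asynchronous requests are likewise divided into the three classes RT, BE and IDLE, and queued in cfq_queue structures per class.
RT and BE also support priorities: each has IOPRIO_BE_NR priority levels, a value defined as 8, so the array index runs from 0 to 7.
The kernel code we are analyzing is Linux 4.4, and it shows that cfq can already apply cgroups to asynchronous IO. We need to define what "asynchronous IO" means here. It refers only to IO requests that flush data from the memory buffer/cache to disk. It does not mean aio (man 7 aio), Linux native asynchronous IO or the libaio mechanism. Those so-called "asynchronous" IO mechanisms are in fact implemented synchronously in the kernel (at heart, a von Neumann computer has no truly "asynchronous" mechanism).
As explained above, processes normally write data into the buffer/cache first. This kind of asynchronous IO is therefore handled uniformly by the async request queues in cfq_group.
Then why does the service_tree structure above also implement an ASYNC type?
Naturally, it is to prepare for telling apart the asynchronous IO of individual processes and making it "completely fair" as well.
In the latest cgroup v2 blkio framework, the kernel already supports cgroup throttling of buffered IO. The few easily confused types above are all type markers needed by the new framework.
The new framework is more complex and more powerful, but be patient: the official cgroup v2 arrives with the release of Linux 4.5.
Back to choosing a service_tree: the trees of the three priority classes are chosen strictly by class priority. RT is highest, BE next and IDLE lowest. In other words, as long as RT has requests, RT is served; BE is served only when RT has none.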
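The seven trees and the strict class order can be modelled as a small sketch. Plain lists stand in for the red-black trees, and the dictionary layout and function names are hypothetical. Only the 2 x 3 + 1 shape and the RT > BE > IDLE order come from the description above.

```python
from itertools import product

CLASSES = ("RT", "BE")                          # first dimension of service_trees[2][3]
WORKLOADS = ("SYNC", "SYNC_NOIDLE", "ASYNC")    # second dimension


def make_service_trees():
    """Seven trees per cfq_group: 2 x 3 plus service_tree_idle."""
    trees = {(c, w): [] for c, w in product(CLASSES, WORKLOADS)}
    trees[("IDLE", None)] = []                  # service_tree_idle
    return trees


def pick_class(trees):
    """Strict class priority: any queued RT work wins over BE, BE over IDLE."""
    for klass in ("RT", "BE", "IDLE"):
        if any(queues for (c, _), queues in trees.items() if c == klass):
            return klass
    return None
```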
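Each service_tree is a red-black tree whose elements are cfq_queue structures. Each cfq_queue is the request queue the kernel creates for a process (thread).
Each cfq_queue maintains a variable called rb_key, which is in effect the queue's IO service time.
Here again, the red-black tree is used to find and serve the cfq_queue with the shortest service time, to keep things "completely fair".
Once a cfq_queue is chosen, its IO requests are processed. The scheduling here is basically similar to deadline.
cfq_queue enqueues every incoming request twice: once in a FIFO and once in a red-black tree keyed by the sector being accessed.
By default, requests are taken from the red-black tree. When a request's waiting time reaches its deadline, the longest-waiting request is taken from the FIFO instead, so that no request starves.
That is the whole cfq scheduling flow, although many details have been glossed over, such as merging and sequential-request handling.
1.2 cfq parameter tuning
Understanding the scheduling flow helps us decide how to tune cfq. All of cfq's tunable parameters are in the /sys/class/block/sda/queue/iosched/ directory (on your system, replace sda with the appropriate disk name). Let's see what is there.
Some of these parameters relate to how the head of a mechanical disk seeks. If the explanations don't make sense to you, please read up on that first.
back_seek_max: the maximum distance the head may seek backward, in KB; the default is 16384 (16 MB).
back_seek_penalty: the penalty factor for backward seeks, applied when comparing a backward seek with a forward one; the default is 2.
These two parameters keep the head from thrashing back and forth and slowing seeks down. The basic idea: when an IO request arrives, cfq estimates its seek cost from the position it addresses.
back_seek_max sets a maximum. For a request whose sector lies behind the head, cfq treats it like a forward-seek request as long as the backward distance does not exceed this value.
back_seek_penalty then sets a cost factor: the distance of a backward seek is multiplied by it. With the default of 2, a backward seek of half the distance (1/back_seek_penalty) of a forward seek is considered equally expensive.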
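Here is a sketch of that cost comparison in Python. It assumes 512-byte sectors, back_seek_max given in KiB as in sysfs (default 16384), and the default back_seek_penalty of 2. The function names are hypothetical, and the tie-break for two requests that are both too far behind is a simplification.

```python
def seek_cost(head, sector, back_seek_max_kib=16384, back_seek_penalty=2):
    """Estimated cost, in 512-byte sectors, of moving from head to sector.
    Returns None when a backward seek exceeds back_seek_max."""
    if sector >= head:
        return sector - head                          # forward: plain distance
    if head - sector <= back_seek_max_kib * 2:        # KiB -> sectors
        return (head - sector) * back_seek_penalty    # backward: penalized
    return None                                       # too far behind the head


def choose_request(head, a, b, **tunables):
    """Return whichever of the two sectors is cheaper to reach next."""
    ca, cb = seek_cost(head, a, **tunables), seek_cost(head, b, **tunables)
    if ca is None and cb is None:
        return min(a, b)        # both too far behind: take the lower sector
    if ca is None:
        return b
    if cb is None:
        return a
    return a if ca <= cb else b
```

With the head at sector 1000, a request 40 sectors behind costs 40 x 2 = 80 and beats one 100 sectors ahead. A request 60 sectors behind costs 120 and loses.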
In effect, these two parameters are the conditions cfq uses to decide which requests to handle together. Any request that satisfies them is, as far as possible, processed along with the current request.
fifo_expire_async: sets the timeout for asynchronous requests.
Synchronous and asynchronous requests are handled in separate queues. When scheduling, cfq generally serves synchronous requests first and asynchronous ones afterwards, unless an asynchronous request falls within the merge conditions above.
When a process's queue is scheduled, cfq first checks whether any asynchronous request has expired, i.e. waited longer than fifo_expire_async. If so, it dispatches one expired request first. The remaining requests are still processed by priority and sector number.
fifo_expire_sync: like the previous parameter, but sets the timeout for synchronous requests.
slice_idle: sets a wait time. cfq waits this long before switching to another cfq_queue or service tree, to improve throughput on mechanical disks.
IO requests from the same cfq_queue or service tree usually have better locality, so waiting reduces the number of disk seeks. This value defaults to non-zero on mechanical disks.
On SSDs or hardware RAID devices, however, a non-zero value lowers storage efficiency. An SSD has no head to seek, so on such devices the value should be set to 0 to disable this feature.
group_idle: similar to the previous parameter, except that cfq waits when it is about to switch cfq_group.
In a cgroup setting, if we keep using slice_idle, an idle wait may happen every time cfq switches between the cfq_queues of processes within a cgroup.
Then, if one process keeps issuing requests, the other processes in the same group may not get scheduled until the cgroup's quota is used up. They starve, and an IO bottleneck results.
In this situation we can set slice_idle = 0 and group_idle = 8. Idle waiting then happens per cgroup instead of per process (cfq_queue), which avoids the problem above.
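Applying that recommendation from a script could look like the sketch below. The helper names are hypothetical, and so is the `root` parameter, which is added so the code can be tried against a scratch directory. On a real system the files live under /sys/class/block/<dev>/queue/iosched/ while cfq is the active scheduler, and writing them requires root.

```python
from pathlib import Path


def iosched_dir(dev, root="/sys/class/block"):
    return Path(root) / dev / "queue" / "iosched"


def set_iosched_tunables(dev, values, root="/sys/class/block"):
    """Write each tunable, e.g. {"slice_idle": 0, "group_idle": 8}."""
    base = iosched_dir(dev, root)
    for name, value in values.items():
        (base / name).write_text(f"{value}\n")


def read_iosched_tunables(dev, root="/sys/class/block"):
    """Return {tunable name: current value as a string}."""
    base = iosched_dir(dev, root)
    return {p.name: p.read_text().strip() for p in sorted(base.iterdir()) if p.is_file()}
```

On a host using cgroups, this would be called as `set_iosched_tunables("sda", {"slice_idle": 0, "group_idle": 8})`.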
low_latency: the switch that turns cfq's low-latency mode on or off.
When it is on, cfq recalculates each process's slice time based on the target_latency setting.
This favors fairness in throughput (by default, fairness is in how time slices are allocated).
Turning it off (setting it to 0) makes cfq ignore target_latency, so processes get IO resources purely by time slice. The switch is on by default.
We already know that cfq's design includes idling. Its purpose is to merge consecutive reads and writes as much as possible, reducing head seeks and increasing throughput.
If a process keeps issuing fast sequential reads or writes, it will hit cfq's idle window very often, which slows down other processes that need IO. If those other processes do not issue much sequential IO themselves, IO throughput across processes becomes very uneven.
For example, suppose the page cache holds many dirty pages to write back when a desktop user opens a browser. The background writeback is likely to hit the idle window a lot, so the browser's small IOs keep waiting and the browser feels sluggish.
low_latency is an optimization for exactly this case. When it is on, the system uses target_latency to rein in processes that grab a lot of IO throughput by hitting the idle window, balancing throughput across processes. It is best turned on in desktop-like scenarios.
target_latency: when low_latency is on, cfq uses this value to recalculate the length of the IO time slice given to each process.
quantum: sets how many IO requests are dispatched from a cfq_queue at a time. Within one queue processing cycle, requests beyond this number are not dispatched. This parameter applies only to synchronous requests.
slice_sync: when a cfq_queue is scheduled, this value determines the total time it may be given, via the formula time_slice = slice_sync + (slice_sync/5 * (4 - prio)). Here prio is the queue's priority level (0 to 7). This parameter applies to synchronous requests.
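The formula is easy to tabulate. This sketch assumes integer millisecond arithmetic (the divisor 5 is the scale factor in the formula above) and the commonly cited slice_sync default of 100 ms. The function name is hypothetical.

```python
CFQ_SLICE_SCALE = 5   # the divisor in the formula above


def cfq_time_slice(base_ms, prio):
    """time_slice = base + (base / 5) * (4 - prio), prio 0 (highest) .. 7."""
    if not 0 <= prio <= 7:
        raise ValueError("prio must be in 0..7")
    return base_ms + (base_ms // CFQ_SLICE_SCALE) * (4 - prio)
```

With a 100 ms base, level 0 gets 180 ms, level 4 gets 100 ms and level 7 gets 40 ms.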
slice_async: like the previous one, but applies to asynchronous requests.
slice_async_rq: limits the maximum number of asynchronous requests a queue can dispatch within one slice. The maximum also depends on the IO priority of the process.
1.3 cfq's IOPS mode
We know that, by default, cfq keeps IO usage fair through time slices with priority support.
Higher-priority processes get longer time slices, lower-priority ones shorter.
When the storage is a fast device with NCQ (Native Command Queuing), we would rather let it process requests from several cfq queues at once, to make better use of NCQ.
Allocating by time slice then no longer makes sense, because time slices allow only one request queue to be served at a time.
In that case, we switch cfq to IOPS mode. The switch is simple: set slice_idle=0. The kernel detects whether your storage device supports NCQ, and if it does, cfq switches to IOPS mode automatically.
In addition, under the default priority-based time-slice mode, we can use the ionice command to adjust a process's IO priority. A process's default IO priority is derived from its nice value; the calculation is described in man ionice, so we won't repeat it here.
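For reference, the ioprio_set(2) man page gives the mapping for a process that has not set an IO priority. Its best-effort level is derived from the CPU nice value as `(nice + 20) / 5`. A one-line sketch (function name hypothetical):

```python
def default_io_priority(nice):
    """Best-effort IO level implied by a CPU nice value (-20..19)."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..19")
    return (nice + 20) // 5
```

So nice -20 maps to level 0, the default nice 0 maps to level 4, and nice 19 maps to level 7.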
2. deadline: deadline scheduling
The deadline scheduler is much simpler than cfq. Its design goal is:
to serve requests in the device's sector order while making sure no other request starves, i.e. every request is scheduled before a final deadline.
We know that the head can access the platter sequentially or randomly. Because of seek latency, sequential access gives higher IO throughput and random access lower.
To optimize a mechanical disk for throughput, the scheduler can sort IO requests into as sequential an order as possible. Sending them to the disk in that order maximizes throughput.
But this creates another problem. Suppose a request arrives whose track is far from the head's current track, while the application's requests are concentrated around the current track.
The many nearby requests keep getting merged and served first, and the request for the distant track starves because it never gets scheduled.
deadline is a scheduler that keeps IO throughput as high as possible while making sure such distant requests are scheduled within a time limit instead of starving.