Linux Device Driver Development in Detail: Based on the Latest Linux 4.0 Kernel, 19. Linux Power Management System Architecture and Drivers

Published: Sunday

int __init omap4_opp_init(void)
{
	…
	r = omap_init_opp_table(omap44xx_opp_def_list,
			ARRAY_SIZE(omap44xx_opp_def_list));

	return r;
}
device_initcall(omap4_opp_init);

int __init omap_init_opp_table(struct omap_opp_def *opp_def,
		u32 opp_def_size)
{
	…
	/* Lets now register with OPP library */
	for (i = 0; i < opp_def_size; i++, opp_def++) {
		…
		if (!strncmp(opp_def->hwmod_name, "mpu", 3)) {
			/*
			 * All current OMAPs share voltage rail and
			 * clock source, so CPU0 is used to represent
			 * the MPU-SS.
			 */
			dev = get_cpu_device(0);
		} …
		r = opp_add(dev, opp_def->freq, opp_def->u_volt);
		…
	}
	return 0;
}

opp_add() adds a new OPP to the domain of the device that the pointer dev refers to; the freq and u_volt parameters are the frequency and voltage of that OPP.

int opp_enable(struct device *dev, unsigned long freq);
int opp_disable(struct device *dev, unsigned long freq);

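These APIs enable and disable an OPP. Once an OPP is disabled, its available flag becomes false, and no device driver can select that OPP afterwards. For example, if the system must not run at 1 GHz once the temperature exceeds a certain threshold, code like the following can be used: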

if (cur_temp > temp_high_thresh) {
	/* Disable 1GHz if it was enabled */
	rcu_read_lock();
	opp = opp_find_freq_exact(dev, 1000000000, true);
	rcu_read_unlock();
	/* just error check */
	if (!IS_ERR(opp))
		ret = opp_disable(dev, 1000000000);
	else
		goto try_something_else;
}

opp_find_freq_exact(), called in the code above, looks for an OPP that matches both an exact frequency and the given available state. Its prototype is:

struct opp *opp_find_freq_exact(struct device *dev, unsigned long freq, bool available);
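The interplay between disabling an OPP and the available argument of the exact lookup can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_opp, model_find_exact() and model_set_available() are hypothetical names made up for illustration, and the model uses a plain array rather than the kernel's RCU-protected OPP list.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of one OPP entry; NOT the kernel's struct opp. */
struct model_opp {
	unsigned long freq_hz;
	unsigned long u_volt;
	int available;
};

/* Exact lookup: both the frequency and the available flag must match. */
struct model_opp *model_find_exact(struct model_opp *tbl, size_t n,
				   unsigned long freq_hz, int available)
{
	size_t i;

	assert(tbl);
	for (i = 0; i < n; i++) {
		if (tbl[i].freq_hz == freq_hz &&
		    !tbl[i].available == !available)
			return &tbl[i];
	}
	return NULL;
}

/* Enable or disable the entry with this frequency; 0 on success, -1 if absent. */
int model_set_available(struct model_opp *tbl, size_t n,
			unsigned long freq_hz, int available)
{
	size_t i;

	assert(tbl);
	for (i = 0; i < n; i++) {
		if (tbl[i].freq_hz == freq_hz) {
			tbl[i].available = !!available;
			return 0;
		}
	}
	return -1;
}
```

In this model, once the 1 GHz entry has been disabled, a lookup with available set to true no longer finds it, while a lookup with available set to false still does.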

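Linux also provides two variants. opp_find_freq_floor() finds the available OPP with the highest frequency that is less than or equal to the specified frequency; opp_find_freq_ceil() finds the available OPP with the lowest frequency that is greater than or equal to the specified frequency. In both cases *freq is updated to the frequency of the OPP that was found. Their prototypes are: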

struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq);

struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq);

The following code finds a device's maximum and minimum operating frequencies, respectively:

freq = ULONG_MAX;
rcu_read_lock();
opp_find_freq_floor(dev, &freq);
rcu_read_unlock();

freq = 0;
rcu_read_lock();
opp_find_freq_ceil(dev, &freq);
rcu_read_unlock();
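The floor and ceil semantics, including why ULONG_MAX and 0 yield the maximum and minimum frequencies, can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct model_opp, model_find_floor() and model_find_ceil() are hypothetical names, and the table is a plain array instead of the kernel's RCU-protected list.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of one OPP entry; NOT the kernel's struct opp. */
struct model_opp {
	unsigned long freq_hz;
	unsigned long u_volt;
	int available;
};

/*
 * Floor: the available entry with the highest frequency <= *freq.
 * On success *freq is overwritten with the matched frequency.
 * Returns NULL (and leaves *freq alone) if nothing matches.
 */
const struct model_opp *model_find_floor(const struct model_opp *tbl,
					 size_t n, unsigned long *freq)
{
	const struct model_opp *best = NULL;
	size_t i;

	assert(tbl && freq);
	for (i = 0; i < n; i++) {
		if (!tbl[i].available || tbl[i].freq_hz > *freq)
			continue;
		if (!best || tbl[i].freq_hz > best->freq_hz)
			best = &tbl[i];
	}
	if (best)
		*freq = best->freq_hz;
	return best;
}

/*
 * Ceil: the available entry with the lowest frequency >= *freq.
 * Same return convention as model_find_floor().
 */
const struct model_opp *model_find_ceil(const struct model_opp *tbl,
					size_t n, unsigned long *freq)
{
	const struct model_opp *best = NULL;
	size_t i;

	assert(tbl && freq);
	for (i = 0; i < n; i++) {
		if (!tbl[i].available || tbl[i].freq_hz < *freq)
			continue;
		if (!best || tbl[i].freq_hz < best->freq_hz)
			best = &tbl[i];
	}
	if (best)
		*freq = best->freq_hz;
	return best;
}
```

With a table whose available entries are 300 MHz, 600 MHz and 800 MHz, a floor search starting from the largest unsigned long value ends at 800 MHz, and a ceil search starting from 0 ends at 300 MHz, which mirrors the maximum/minimum idiom above.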

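When the frequency is lowered, the voltage needed to sustain it can usually be lowered dynamically as well; conversely, raising the frequency may require raising the voltage. The following two APIs return the voltage and the frequency of a given OPP: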

unsigned long opp_get_voltage(struct opp *opp);
unsigned long opp_get_freq(struct opp *opp);

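For example, when a CPUFreq driver wants to switch the CPU to a given frequency, it may set the voltage at the same time. The code flow looks like this: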

soc_switch_to_freq_voltage(freq)
{
	/* do things */
	rcu_read_lock();
	opp = opp_find_freq_ceil(dev, &freq);
	v = opp_get_voltage(opp);
	rcu_read_unlock();
	if (v)
		regulator_set_voltage(.., v);
	/* do other things */
}

The following simple API returns the number of OPPs a device supports:

int opp_get_opp_count(struct device *dev);

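As mentioned earlier, the TI OMAP CPUFreq driver relies on the OPP mechanism underneath to obtain the list of frequencies and voltages the CPU supports. The OPPs are added in omap_init_opp_table(), and the CPUFreq driver for TI OMAP chips, drivers/cpufreq/omap-cpufreq.c, uses the helper opp_init_cpufreq_table() to build the CPUFreq frequency table from the OPPs registered earlier: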

static int __cpuinit omap_cpu_init(struct cpufreq_policy *policy)
{
	if (!freq_table)
		result = opp_init_cpufreq_table(mpu_dev, &freq_table);
	…
}

The CPUFreq driver's target callback, omap_target(), in turn uses the OPP APIs to obtain the frequency and the voltage:

static int omap_target(struct cpufreq_policy *policy,
		unsigned int target_freq,
		unsigned int relation)
{
	if (mpu_reg) {
		opp = opp_find_freq_ceil(mpu_dev, &freq);
		…
		volt = opp_get_voltage(opp);
		…
	}
	…
}

drivers/cpufreq/omap-cpufreq.c is comparatively well structured: it relies on OPP for the <frequency, voltage> table and uses the standard Regulator API when setting the voltage.

Newer drivers generally prefer not to hard-code the OPP table in C code; instead, they add an operating-points property to the relevant node, as in imx27.dtsi:

cpus {
	#size-cells = <0>;
	#address-cells = <1>;

	cpu: cpu@0 {
		device_type = "cpu";
		compatible = "arm,arm926ej-s";
		operating-points = <
			/* kHz  uV */
			266000  1300000
			399000  1450000
		>;
		clock-latency = <62500>;
		clocks = <&clks IMX27_CLK_CPU_DIV>;
		voltage-tolerance = <5>;
	};
};
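To make the units of this property concrete, the sketch below converts a flat list of <kHz uV> cells into (Hz, uV) entries and computes the voltage window implied by a percentage tolerance. It is a userspace illustration under stated assumptions: struct model_opp, model_parse_operating_points() and model_voltage_tolerance_uv() are hypothetical names, the cells are assumed to be already in host byte order, and voltage-tolerance is assumed to be a percentage of the target voltage.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of one parsed entry; NOT a kernel structure. */
struct model_opp {
	unsigned long freq_hz;
	unsigned long u_volt;
};

/*
 * Convert flat <kHz uV> cells into (Hz, uV) entries.
 * Returns the number of entries written, or -1 if the cell count is
 * odd or the output array is too small.
 */
int model_parse_operating_points(const unsigned int *cells, size_t ncells,
				 struct model_opp *out, size_t max_out)
{
	size_t i, n = ncells / 2;

	assert(cells && out);
	if (ncells % 2 || n > max_out)
		return -1;
	for (i = 0; i < n; i++) {
		/* The property stores kHz; OPP frequencies are in Hz. */
		out[i].freq_hz = (unsigned long)cells[2 * i] * 1000UL;
		out[i].u_volt = cells[2 * i + 1];
	}
	return (int)n;
}

/* Allowed deviation in uV around a target voltage for a tolerance in percent. */
unsigned long model_voltage_tolerance_uv(unsigned long u_volt,
					 unsigned int percent)
{
	return u_volt * percent / 100;
}
```

Applied to the imx27.dtsi values, the two entries become 266000000 Hz at 1300000 uV and 399000000 Hz at 1450000 uV, and a 5 percent tolerance on 1450000 uV allows a deviation of 72500 uV.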

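If the CPUFreq transitions can be carried out entirely through the standard regulator and clk APIs, we can even use the drivers/cpufreq/cpufreq-dt.c driver directly. In that case it is enough to fill in the frequency/voltage table on the CPU node and register the cpufreq-dt device in the platform code. Similar examples can be found in arch/arm/mach-imx/imx27-dt.c and arch/arm/mach-imx/mach-imx51.c: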

static void __init imx27_dt_init(void)
{
	struct platform_device_info devinfo = { .name = "cpufreq-dt", };

	of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);

	platform_device_register_full(&devinfo);
}

7. PM QoS

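The Linux kernel's PM QoS system provides a set of interfaces to both the kernel and applications, through which users can state their performance expectations. One class consists of system-wide requirements, set through parameters such as cpu_dma_latency, network_latency and network_throughput; the other consists of per-device PM QoS requests that an individual device raises according to its own performance needs.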

In kernel space, pm_qos_add_request() registers a PM QoS request:

void pm_qos_add_request(struct pm_qos_request *req, int pm_qos_class, s32 value);

pm_qos_update_request() updates a PM QoS request that has already been registered:

void pm_qos_update_request(struct pm_qos_request *req, s32 new_value);

void pm_qos_update_request_timeout(struct pm_qos_request *req, s32 new_value, unsigned long timeout_us);

pm_qos_remove_request() removes a registered PM QoS request:

void pm_qos_remove_request(struct pm_qos_request *req);
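The add/update/remove life cycle can be pictured with a small userspace model in which several requesters post latency constraints and the effective constraint is the smallest one still active. This is a sketch under stated assumptions, not the kernel implementation: struct model_qos and the model_qos_*() functions are hypothetical names, the fixed-size array replaces the kernel's internal list, and MODEL_NO_CONSTRAINT_US is an assumed large default standing in for "no constraint".

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_REQS 8
/* Assumed default when nobody has posted a request (2000 seconds in us). */
#define MODEL_NO_CONSTRAINT_US 2000000000

/* Userspace model of one latency class; NOT a kernel structure. */
struct model_qos {
	int value_us[MODEL_MAX_REQS];
	int in_use[MODEL_MAX_REQS];
};

/* Register a request; returns a handle (slot index) or -1 if full. */
int model_qos_add(struct model_qos *q, int value_us)
{
	int i;

	assert(q);
	for (i = 0; i < MODEL_MAX_REQS; i++) {
		if (!q->in_use[i]) {
			q->in_use[i] = 1;
			q->value_us[i] = value_us;
			return i;
		}
	}
	return -1;
}

/* Change the value of an existing request. */
void model_qos_update(struct model_qos *q, int handle, int value_us)
{
	assert(q && handle >= 0 && handle < MODEL_MAX_REQS && q->in_use[handle]);
	q->value_us[handle] = value_us;
}

/* Drop a request so it no longer constrains the system. */
void model_qos_remove(struct model_qos *q, int handle)
{
	assert(q && handle >= 0 && handle < MODEL_MAX_REQS && q->in_use[handle]);
	q->in_use[handle] = 0;
}

/* Effective latency constraint: the minimum over all active requests. */
int model_qos_effective(const struct model_qos *q)
{
	int i, best = MODEL_NO_CONSTRAINT_US;

	assert(q);
	for (i = 0; i < MODEL_MAX_REQS; i++) {
		if (q->in_use[i] && q->value_us[i] < best)
			best = q->value_us[i];
	}
	return best;
}
```

Taking the minimum reflects the idea that a latency constraint is only satisfied if the strictest requester is satisfied; removing that requester relaxes the effective value to the next strictest one.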

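For example, in the camera driver drivers/media/platform/via-camera.c, once the camera starts streaming, the following statement prevents the CPU from entering the deep C3 idle state: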

static int viacam_streamon(struct file *filp, void *priv, enum v4l2_buf_type t)
{
	…
	pm_qos_add_request(&cam->qos_request, PM_QOS_CPU_DMA_LATENCY, 50);
	…
}

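This works because the CPUIdle subsystem chooses a suitable C state according to the outstanding PM_QOS_CPU_DMA_LATENCY requests. For instance, ladder_select_state() in drivers/cpuidle/governors/ladder.c compares the exit_latency of the target C state with the QoS requirement, as shown in Listing 19.11.

Listing 19.11 QoS check in the CPUIdle LADDER governor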

01 static int ladder_select_state(struct cpuidle_driver *drv,
02                                struct cpuidle_device *dev)
03 {
04     …
05     int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
06
07     …
08
09     /* consider promotion */
10     if (last_idx < drv->state_count - 1 &&
11         !drv->states[last_idx + 1].disabled &&
12         !dev->states_usage[last_idx + 1].disable &&
13         last_residency > last_state->threshold.promotion_time &&
14         drv->states[last_idx + 1].exit_latency <= latency_req) {
15         last_state->stats.promotion_count++;
16         last_state->stats.demotion_count = 0;
17         if (last_state->stats.promotion_count >=
18             last_state->threshold.promotion_count) {
19             ladder_do_selection(ldev, last_idx, last_idx + 1);
20             return last_idx + 1;
21         }
22     }
23     …
24 }

When LADDER decides whether to move to a deeper C state, it requires that state's exit_latency to be no greater than the latency obtained from pm_qos_request(PM_QOS_CPU_DMA_LATENCY); see line 14.
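The effect of that latency comparison can be isolated in a small userspace model that climbs from the shallowest state and stops at the first deeper state whose exit latency exceeds the request. This is only a sketch of the latency condition: struct model_state and model_deepest_allowed() are hypothetical names, the residency thresholds and promotion counters of the real governor are deliberately left out, and any exit latency values fed to it are illustrative rather than taken from real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of an idle state; NOT the kernel's struct cpuidle_state. */
struct model_state {
	const char *name;
	int exit_latency_us;
};

/*
 * States are ordered from shallowest (index 0) to deepest.
 * Climb one rung at a time and stop before the first state whose
 * exit latency is larger than the requested latency.
 * Returns the index of the deepest state that is still allowed.
 */
size_t model_deepest_allowed(const struct model_state *st, size_t n,
			     int latency_req_us)
{
	size_t idx = 0;

	assert(st && n > 0);
	while (idx + 1 < n && st[idx + 1].exit_latency_us <= latency_req_us)
		idx++;
	return idx;
}
```

For instance, with three hypothetical states whose exit latencies are 1, 20 and 100 microseconds, a request of 50 microseconds stops the climb at the second state, so the deepest state is never entered while the request is in place.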

The same logic also appears in drivers/cpuidle/governors/menu.c; see lines 18 to 19 of Listing 19.12.

Listing 19.12 QoS check in the CPUIdle MENU governor

01 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
02 {
03     struct menu_device *data = &__get_cpu_var(menu_devices);
04     int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
05     …
06     /*
07      * Find the idle state with the lowest power while satisfying
08      * our constraints.
09      */
10     for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
11         struct cpuidle_state *s = &drv->states[i];
12         struct cpuidle_state_usage *su = &dev->states_usage[i];
13
14         if (s->disabled || su->disable)