Scraping Data from Teambition

Background

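After our studio adopted Teambition as its project management tool, we started logging bugs there as well, so that everything would be managed in one place.

Overall, Teambition is lightweight, supports multiple clients, and has a good-looking interface. Using it to manage bugs, however, exposes a problem.

We are on the free plan, whose reporting features are too limited to be useful. So how do we produce bug statistics? Take the count of currently open (unfixed) bugs as an example.

For the first few days, the process went like this: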

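Open the Teambition client and go to the home page of the test group
Right-click to open the developer tools and save the entire HTML content to the desktop
Run a small Python program that parses the HTML and prints the open bug statistics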

The output looked like this:

Total open bugs: 18
SC 6
ZP 1
QR 3
DR 3
FF 5

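This approach was good enough for the time being, but it was still tedious to run, so I decided to improve it:

Goal: fully automate the collection of bug data and save it in JSON format

Development environment: Python 2.7 + Windows 10 + PyCharm

First attempt: the requests library

Logging in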

import requests

# Reuse one session so that the login cookies carry over to later requests
s = requests.session()
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded',
}
# Fill in your own account details
payload = {
    'phone': None,
    'email': '',
    'password': '',
}
r = s.post("https://account.Teambition.com/login", data=payload, headers=header)
print r.status_code

The login succeeded.

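The next step, fetching all the bug information, is where I got stuck. Teambition pages are generated dynamically by JavaScript, and what requests receives for the test group page is the page as it is before the JavaScript runs and renders it, so the bug data is not there. Continuing down this path would mean working out the URL that serves the actual data, so I set this approach aside for now.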
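A quick way to confirm this diagnosis is to check whether the HTML returned by requests contains any task markup at all. The sketch below is only an illustration, not code from the original script: `has_rendered_tasks` is a hypothetical helper, the two HTML samples are made up, and it assumes that rendered task cards carry the `task-content` class, which is the class the bs4 code later in this post looks for.

```python
import re

# Assumption: a rendered task card contains an element whose class list
# includes "task-content" (the class the bs4 parsing step relies on).
TASK_CLASS_RE = re.compile(r'class="(?:[^"]*\s)?task-content(?:\s[^"]*)?"')


def has_rendered_tasks(html):
    """Return True if the HTML text already contains rendered task cards."""
    return TASK_CLASS_RE.search(html) is not None


# Hypothetical samples: a bare application shell versus a rendered page.
shell_html = '<html><body><div id="content"></div><script src="app.js"></script></body></html>'
rendered_html = '<li data-id="1"><div class="task-content">Crash on login</div></li>'

print(has_rendered_tasks(shell_html))
print(has_rendered_tasks(rendered_html))
```

If the check fails for `r.text` but passes for HTML saved from the browser, the data is being loaded by JavaScript after the initial response.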

Second attempt: selenium + bs4

Logging in

from selenium import webdriver

driver = webdriver.Chrome()
driver.set_page_load_timeout(5)  # Set a timeout, since individual scripts sometimes take too long to load
try:
    driver.get('https://account.Teambition.com/login')
except Exception as e:
    # A page load timeout is reported here and the script carries on
    print e
# Enter the username
driver.find_element_by_xpath("/html/body/section/div[2]/form/div[2]/input").send_keys('')
# Enter the password
driver.find_element_by_xpath("/html/body/section/div[2]/form/div[3]/input").send_keys('')
# Click the login button
driver.find_element_by_xpath("/html/body/section/div[2]/form/button").click()

Select the project and open the test group page

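import time

# Select the project
driver.find_element_by_xpath('//*[@id="content"]/div/ul/li[2]/ul/li[2]').click()
time.sleep(1)  # The original called an undefined sleep_for helper; time.sleep is used instead
# Open the drop-down menu
driver.find_element_by_xpath('//*[@id="content"]/div/nav/div/section/ul/li[1]/i').click()
time.sleep(2)
# Look up the popover's id, which is needed for the XPath below
popover_id = driver.find_element_by_class_name("popover")
aid = popover_id.get_attribute('id')
# Open the test group page
driver.find_element_by_xpath('//*[@id="{0}"]/div/div/ul[1]/li[3]'.format(aid)).click()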
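The fixed pauses above can fail when the page is slow and waste time when it is fast. One alternative is to poll until the element actually appears. The helper below is a minimal sketch in plain Python; `wait_until` is a hypothetical name and not part of Selenium, which provides `WebDriverWait` for the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.2):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        time.sleep(interval)


# Possible usage with the driver from this post (not run here):
#   popovers = wait_until(lambda: driver.find_elements_by_class_name("popover"))
#   aid = popovers[0].get_attribute("id")
```

`find_elements_by_class_name` returns an empty list while nothing matches, so it can serve as the predicate without any exception handling.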

Parse the page with bs4 and save the data

import json
import re
import time

import bs4

# Assumption: the original does not show how time_now is defined; a formatted timestamp is used here
time_now = time.strftime('%Y-%m-%d %H:%M:%S')

soup = bs4.BeautifulSoup(driver.page_source, 'lxml')  # Uses lxml as the parser, which must be installed separately
bugs = dict()
all_bugs = soup.find_all('li', {'class': re.compile("scrum-stage")})  # Get all groups
for i in all_bugs:
    group = i.find('div', {'class': 'stage-name'}).text.strip().split()[0]  # Get the group name, e.g. "待处理" (pending)
    for ii in i.find_all('li'):
        aid = ii['data-id']
        name = ii.find('div', {'class': 'avatar img-circle img-24 hinted'})['data-title']
        title = ii.find('div', {'class': 'task-content'}).text.strip()
        bugs[aid] = {'name': name, "title": title, "group": group, "record_time": time_now}
# Save the data
collect_txt = "bugstat.txt"
with open(collect_txt, 'w') as f:
    f.write(json.dumps(bugs))

Done.
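With the data saved as JSON, the per-person summary shown at the start of this post can be rebuilt without touching the HTML again. The following is a sketch under assumptions rather than the original counting script: `summarize_open_bugs` is a hypothetical helper, the sample records are made up, and which group names count as fixed (`closed_groups`) is a guess, because the post does not list the board's columns.

```python
import json
from collections import Counter


def summarize_open_bugs(bugs, closed_groups=()):
    """Count bugs per assignee, skipping bugs whose group is in closed_groups."""
    return Counter(
        bug["name"] for bug in bugs.values()
        if bug["group"] not in closed_groups
    )


# Hypothetical records in the same shape as the saved bugs dict.
sample = json.loads("""
{
  "a1": {"name": "SC", "title": "Crash on login", "group": "Pending", "record_time": "t0"},
  "a2": {"name": "SC", "title": "Wrong icon", "group": "Pending", "record_time": "t0"},
  "a3": {"name": "FF", "title": "Typo in menu", "group": "Fixed", "record_time": "t0"},
  "a4": {"name": "QR", "title": "Slow startup", "group": "Pending", "record_time": "t0"}
}
""")

counts = summarize_open_bugs(sample, closed_groups=("Fixed",))
print("Total open bugs: %d" % sum(counts.values()))
for name, count in counts.most_common():
    print("%s %d" % (name, count))
```

To run it on real data, load `bugstat.txt` with `json.load` in place of `sample` and pass the actual names of the columns that hold fixed bugs.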