With so many friend links, checking them by hand at any reasonable frequency is impractical, so I wrote a Python script that reads the local YAML file and sends a HEAD request to each link to see whether it is reachable. When it finishes, it writes a txt file listing every URL that could not be reached, so I can visit them one by one manually to confirm whether something is actually wrong. Links that turn out to be genuinely dead get moved to the lost-contact list. This makes the whole process far more efficient.

Friend link format

My friend links are stored in the following format:

- class_name: 友情鏈接
  class_desc: 那些人,那些事
  link_list:
    - name: JerryC
      link: https://jerryc.me/
      avatar: https://jerryc.me/img/avatar.png
      descr: 今日事,今日畢
    - name: Hexo
      link: https://hexo.io/zh-tw/
      avatar: https://d33wubrfki0l68.cloudfront.net/6657ba50e702d84afb32fe846bed54fba1a77add/827ae/logo.svg
      descr: 快速、簡單且強大的網誌框架
- class_name: 網站
  class_desc: 值得推薦的網站
  link_list:
    - name: Youtube
      link: https://www.youtube.com/
      avatar: https://i.loli.net/2020/05/14/9ZkGg8v3azHJfM1.png
      descr: 視頻網站
    - name: Weibo
      link: https://www.weibo.com/
      avatar: https://i.loli.net/2020/05/14/TLJBum386vcnI1P.png
      descr: 中國最大社交分享平台
    - name: Twitter
      link: https://twitter.com/
      avatar: https://i.loli.net/2020/05/14/5VyHPQqR6LWF39a.png
      descr: 社交分享平台
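
For reference, pyyaml parses this file into a list of dicts, one per group, each holding class_name, class_desc, and a link_list of entries. A minimal sketch of walking that structure (assuming the file is saved as link.yml in the current directory):

import yaml

# Load the friend-links file; the path here is an assumption, adjust to your setup
with open('link.yml', 'r', encoding='utf-8') as f:
    data = yaml.safe_load(f)

# data is a list of groups; each group has class_name, class_desc, link_list
for group in data:
    print(group['class_name'])
    for item in group.get('link_list', []):
        print(f"  {item['name']}: {item['link']}")

The checker script below relies on exactly this shape: it iterates over each group's link_list and pulls out the link field.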

Python script

First install the dependencies (skip if already installed):

pip install pyyaml requests

The script:

import yaml
import requests
import concurrent.futures

# Path to the YAML file containing the link information
yaml_file_path = 'path/to/your/link.yml'

# Path to the output text file that will list all inaccessible links
output_txt_path = 'path/to/inaccessible_links.txt'

# Load the YAML data
with open(yaml_file_path, 'r', encoding='utf-8') as file:
    data = yaml.safe_load(file)

# User-Agent string to mimic a web browser
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

# Dictionaries to store accessible and inaccessible links with their original index
accessible_links = {}
inaccessible_links = {}

# Function to check if a link is accessible with a HEAD request
def check_link_accessibility(link, index):
    headers = {"User-Agent": user_agent}  # Add User-Agent to headers
    try:
        # Send a HEAD request instead of GET; follow redirects, because
        # requests.head() does not follow them by default
        response = requests.head(link, headers=headers, timeout=5, allow_redirects=True)
        if response.status_code == 200:
            accessible_links[index] = link  # Store accessible link with its index
            print(f"Accessible: {link}", flush=True)  # Print accessible links
        else:
            inaccessible_links[index] = link  # Non-200 status: flag for manual review
    except requests.RequestException:
        inaccessible_links[index] = link  # Connection error or timeout

# Use a ThreadPoolExecutor to check multiple links concurrently
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    # Collect all links from the YAML data
    links_to_check = []
    index = 0  # Index to maintain the original order
    for section in data:
        if 'link_list' in section:
            for item in section['link_list']:
                links_to_check.append((index, item['link']))  # Keep track of index
                index += 1

    # Submit all the tasks to the executor with the original index
    futures = [executor.submit(check_link_accessibility, link, idx) for idx, link in links_to_check]

    # Ensure all futures are completed
    concurrent.futures.wait(futures)

# Write the inaccessible links to the output text file in original order
with open(output_txt_path, 'w', encoding='utf-8') as file:
    if inaccessible_links:
        file.write("Inaccessible Links:\n")
        for idx in sorted(inaccessible_links.keys()):  # Sort by index to maintain order
            file.write(f"{inaccessible_links[idx]}\n")
    else:
        file.write("All links are accessible.")

# Print the accessible links in the original order
print("Accessible Links:")
for idx in sorted(accessible_links.keys()):  # Sort by index to maintain order
    print(accessible_links[idx], flush=True)
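
To use it, point yaml_file_path and output_txt_path at your own paths, save the script (the filename check_links.py here is just an example), and run it:

python check_links.py

Accessible links are printed as they are confirmed, and everything else ends up in the output txt file. One caveat: some servers reject HEAD requests outright (405) or block script-like clients (403) even though the site opens fine in a browser, so the output file is a shortlist for manual confirmation rather than a final verdict, which is exactly the workflow described at the top.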