With so many friend links, checking them by hand at any reasonable frequency is impractical, so I wrote a Python script that reads the local YAML file and sends a HEAD request to each link to see whether it is reachable. When it finishes, it writes a txt file listing every URL that could not be reached, so I can visit them one by one manually to confirm whether something is actually wrong. Links that turn out to be genuinely dead get moved to the lost-contact list. This makes the whole process far more efficient.

Friend link format

My friend links are stored in the following format:

- class_name: 友情鏈接
  class_desc: 那些人,那些事
  link_list:
    - name: JerryC
      link: https://jerryc.me/
      avatar: https://jerryc.me/img/avatar.png
      descr: 今日事,今日畢
    - name: Hexo
      link: https://hexo.io/zh-tw/
      avatar: https://d33wubrfki0l68.cloudfront.net/6657ba50e702d84afb32fe846bed54fba1a77add/827ae/logo.svg
      descr: 快速、簡單且強大的網誌框架
- class_name: 網站
  class_desc: 值得推薦的網站
  link_list:
    - name: Youtube
      link: https://www.youtube.com/
      avatar: https://i.loli.net/2020/05/14/9ZkGg8v3azHJfM1.png
      descr: 視頻網站
    - name: Weibo
      link: https://www.weibo.com/
      avatar: https://i.loli.net/2020/05/14/TLJBum386vcnI1P.png
      descr: 中國最大社交分享平台
    - name: Twitter
      link: https://twitter.com/
      avatar: https://i.loli.net/2020/05/14/5VyHPQqR6LWF39a.png
      descr: 社交分享平台
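
For reference, pyyaml parses this file into a list of dicts, one per group, each holding class_name, class_desc, and a link_list of entries. A minimal sketch of walking that structure (assuming the file is saved as link.yml in the current directory):

import yaml

# Load the friend-links file; the path here is an assumption, adjust to your setup
with open('link.yml', 'r', encoding='utf-8') as f:
    data = yaml.safe_load(f)

# data is a list of groups; each group has class_name, class_desc, link_list
for group in data:
    print(group['class_name'])
    for item in group.get('link_list', []):
        print(f"  {item['name']}: {item['link']}")

The checker script below relies on exactly this shape: it iterates over each group's link_list and pulls out the link field.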

Python script

First install the dependencies (skip if already installed):

pip install pyyaml requests

The script:

import yaml
import requests
import concurrent.futures

# Path to the YAML file containing the link information
yaml_file_path = 'path/to/your/link.yml'

# Path to the output text file that will list all inaccessible links
output_txt_path = 'path/to/inaccessible_links.txt'

# Load the YAML data
with open(yaml_file_path, 'r', encoding='utf-8') as file:
    data = yaml.safe_load(file)

# User-Agent string to mimic a web browser
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

# Dictionaries to store accessible and inaccessible links with their original index
accessible_links = {}
inaccessible_links = {}

# Function to check if a link is accessible with a HEAD request
def check_link_accessibility(link, index):
    headers = {"User-Agent": user_agent}  # Add User-Agent to headers
    try:
        # Send a HEAD request instead of GET; follow redirects, because
        # requests.head() does not follow them by default
        response = requests.head(link, headers=headers, timeout=5, allow_redirects=True)
        if response.status_code == 200:
            accessible_links[index] = link  # Store accessible link with its index
            print(f"Accessible: {link}", flush=True)  # Print accessible links
        else:
            inaccessible_links[index] = link  # Non-200 status: flag for manual review
    except requests.RequestException:
        inaccessible_links[index] = link  # Connection error or timeout

# Use a ThreadPoolExecutor to check multiple links concurrently
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    # Collect all links from the YAML data
    links_to_check = []
    index = 0  # Index to maintain the original order
    for section in data:
        if 'link_list' in section:
            for item in section['link_list']:
                links_to_check.append((index, item['link']))  # Keep track of index
                index += 1

    # Submit all the tasks to the executor with the original index
    futures = [executor.submit(check_link_accessibility, link, idx) for idx, link in links_to_check]

    # Ensure all futures are completed
    concurrent.futures.wait(futures)

# Write the inaccessible links to the output text file in original order
with open(output_txt_path, 'w', encoding='utf-8') as file:
    if inaccessible_links:
        file.write("Inaccessible Links:\n")
        for idx in sorted(inaccessible_links.keys()):  # Sort by index to maintain order
            file.write(f"{inaccessible_links[idx]}\n")
    else:
        file.write("All links are accessible.")

# Print the accessible links in the original order
print("Accessible Links:")
for idx in sorted(accessible_links.keys()):  # Sort by index to maintain order
    print(accessible_links[idx], flush=True)
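
To use it, point yaml_file_path and output_txt_path at your own paths, save the script (the filename check_links.py here is just an example), and run it:

python check_links.py

Accessible links are printed as they are confirmed, and everything else ends up in the output txt file. One caveat: some servers reject HEAD requests outright (405) or block script-like clients (403) even though the site opens fine in a browser, so the output file is a shortlist for manual confirmation rather than a final verdict, which is exactly the workflow described at the top.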