
Lighthouse: automatically run audits and report the results for a website

Lighthouse is an SEO and performance audit tool from Google Chrome. It produces reports on performance, SEO and more, with suggestions for improvement.
In this guide I try to write some Bash scripts and/or a Node.js module to automate the process.
This guide uses the following tools:
Screaming Frog (SEO tool; the full version requires a paid licence)
Lighthouse: https://developers.google.com/web/tools/lighthouse/
Lighthouse Viewer: https://googlechrome.github.io/lighthouse/viewer2x/
Git Bash (Unix tools on Windows)
Python 3 (on Windows you have to install it yourself)
Node.js: Lighthouse and many other tools are built on it (or need it to install)

Script to print only the "Optimize images" (uses-optimized-images) performance audit from a folder of saved Lighthouse JSON reports:

#!/bin/bash
# Print the uses-optimized-images audit of every saved Lighthouse JSON report
# in ./LH_json that contains image optimization suggestions.
for f in ./LH_json/*
do
  audit=$(python3 -c "import sys, json; print(json.load(sys.stdin)['audits']['uses-optimized-images'])" < "$f")
  report_length=$(printf '%s\n' "$audit" | wc -m)

  # A normal page without any image to optimize has an audit 720 characters long.
  # So we only care about pages whose audit is longer than that:
  # those contain image optimization suggestions.
  if [ "$report_length" -gt 720 ]; then
    echo "$f"
    echo "$audit"
  else
    printf "\n"
  fi

  # TODO: exclude external resources (e.g. farm4.) with grep
  # TODO: loop through the report images to get the percentage (if needed), or just get the image URL and page URL
done
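The 720-character threshold depends on the exact wording of the audit, so it can break when Lighthouse changes its text. Below is a minimal sketch of a structural check instead, assuming the older report layout seen in the sample output in this post (an `extendedInfo.value.results` list whose entries carry `url`, `potentialSavings` and `isCrossOrigin`); newer Lighthouse versions structure reports differently. It also covers the "exclude external resources" TODO by skipping cross-origin images. The function names `images_to_optimize` and `scan_folder` are hypothetical.

```python
import json
from pathlib import Path

AUDIT_ID = 'uses-optimized-images'


def images_to_optimize(report, include_external=False):
    """Return (url, potentialSavings) pairs from one parsed Lighthouse JSON report.

    Assumes the older layout seen in this post:
    audits -> uses-optimized-images -> extendedInfo -> value -> results.
    """
    audit = report.get('audits', {}).get(AUDIT_ID, {})
    results = audit.get('extendedInfo', {}).get('value', {}).get('results', [])
    found = []
    for item in results:
        # Skip images served from another origin (e.g. an external photo host).
        if item.get('isCrossOrigin') and not include_external:
            continue
        found.append((item.get('url'), item.get('potentialSavings')))
    return found


def scan_folder(folder='./LH_json'):
    """Print every report in folder that has same-origin images to optimize."""
    folder = Path(folder)
    if not folder.is_dir():
        return
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        with open(path, encoding='utf-8') as f:
            images = images_to_optimize(json.load(f))
        if images:
            print(path)
            for url, savings in images:
                print('   ', url, savings)
```

Calling `scan_folder('./LH_json')` gives roughly the same listing as the Bash loop, but the decision is based on whether any image entries exist rather than on the length of the printed text.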

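Using a plain print(json.load(...)) does not produce valid JSON. It prints Python's dict representation, which uses single-quoted strings and True/False, while JSON requires double quotes and true/false. Output like this causes errors when parsed as JSON: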


{'rawValue': 90, 'description': 'Optimize images', 'extendedInfo': {'value': {'wastedKb': 12, 'results': [{'fromProtocol': True, 'preview': {'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'mimeType': 'image/jpeg', 'type': 'thumbnail'}, 'totalMs': '380\xa0ms', 'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'totalBytes': 49508, 'wastedBytes': 12223, 'totalKb': '48\xa0KB', 'wastedKb': '12\xa0KB', 'isCrossOrigin': False, 'wastedMs': '90\xa0ms', 'potentialSavings': '12\xa0KB (25%)'}], 'wastedMs': 90}}, 'informative': True, 'score': 90, 'helpText': 'Optimized images load faster and consume less cellular data. [Learn more](https://developers.google.com/web/tools/lighthouse/audits/optimize-images).', 'details': {'header': 'View Details', 'type': 'table', 'itemHeaders': [{'text': '', 'type': 'text', 'itemType': 'thumbnail'}, {'text': 'URL', 'type': 'text', 'itemType': 'url'}, {'text': 'Original', 'type': 'text', 'itemType': 'text'}, {'text': 'Potential Savings', 'type': 'text', 'itemType': 'text'}], 'items': [[{'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'mimeType': 'image/jpeg', 'type': 'thumbnail'}, {'text': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'type': 'url'}, {'text': '48\xa0KB', 'type': 'text'}, {'text': '12\xa0KB (25%)', 'type': 'text'}]]}, 'displayValue': 'Potential savings of 12\xa0KB (~90\xa0ms)', 'scoringMode': 'binary', 'name': 'uses-optimized-images'}

So we have to wrap the result in json.dumps() to serialize it as valid JSON (one object per line), for example inside the loop above:
python3 -c "import sys, json; print(json.dumps(json.load(sys.stdin)['audits']['uses-optimized-images']))" < "$f" >> lh_opt_img.json
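The difference is easy to check in plain Python: `str()` of a dict (what a bare `print` writes) gives Python's repr, which `json.loads` rejects, while `json.dumps` gives valid JSON. A small sketch using a made-up dict shaped like a piece of the audit:

```python
import json

# Made-up example dict, shaped like part of the uses-optimized-images audit.
audit = {'name': 'uses-optimized-images', 'informative': True, 'score': 90}

python_repr = str(audit)     # what a bare print(...) writes
as_json = json.dumps(audit)  # what print(json.dumps(...)) writes

print(python_repr)  # {'name': 'uses-optimized-images', 'informative': True, 'score': 90}
print(as_json)      # {"name": "uses-optimized-images", "informative": true, "score": 90}

try:
    json.loads(python_repr)
except json.JSONDecodeError as err:
    print('not valid JSON:', err)

assert json.loads(as_json) == audit  # the dumped text round-trips
```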

To get a Lighthouse (LH) report for a URL:
lighthouse --quiet --chrome-flags="--headless" --perf --output=json http://yoursite.test.co/courses/test-coures-1 --output-path course1.json
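To audit many pages, the same command can be generated for each URL. A minimal sketch, assuming the `lighthouse` CLI is installed globally (`npm install -g lighthouse`) and is on PATH, and reusing only the flags from the command above. `--perf` belongs to older Lighthouse releases and may not exist in newer ones, so check `lighthouse --help` for your version. The helpers `output_name`, `lighthouse_command` and `audit_urls` and the file naming scheme are hypothetical.

```python
import re
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def output_name(url):
    """Hypothetical naming scheme: host + path, unsafe characters replaced by '_'."""
    parts = urlparse(url)
    raw = (parts.netloc + parts.path).strip('/')
    return re.sub(r'[^A-Za-z0-9.-]+', '_', raw) + '.json'


def lighthouse_command(url, out_path):
    # Same flags as the single-URL command in the text.
    return ['lighthouse', '--quiet', '--chrome-flags=--headless', '--perf',
            '--output=json', url, '--output-path', str(out_path)]


def audit_urls(urls, out_dir='LH_json'):
    """Run Lighthouse once per URL, saving each JSON report into out_dir."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for url in urls:
        cmd = lighthouse_command(url, out / output_name(url))
        # No check=True: one failing page should not stop the whole batch.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print('lighthouse failed for', url)
```

For example, `audit_urls(['http://yoursite.test.co/courses/test-coures-1'])` writes `LH_json/yoursite.test.co_courses_test-coures-1.json`, which is the folder the Bash loop at the top reads from. Runs are sequential here; running several Chrome instances in parallel is possible but heavier on the machine.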

To get HTML instead of JSON output, simply change --output=json to --output=html.
To view a JSON audit result, open the Lighthouse Viewer and load the JSON file.
If you have HTML output, simply open it in Chrome.
A trick to open many audit results at once: in Bash or Git Bash, pass the files to open (macOS) or to your Chrome executable, for example:
open path_to_html_results/*.html
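Python's standard `webbrowser` module can do the same for a whole folder of HTML reports. A small sketch, assuming the reports were saved as `*.html` files in one folder; it opens each report in the system's default browser, which is Chrome only if Chrome is your default. The function names are hypothetical.

```python
import webbrowser
from pathlib import Path


def report_uris(folder):
    """file:// URIs for every .html report in folder, in name order."""
    return [p.resolve().as_uri() for p in sorted(Path(folder).glob('*.html'))]


def open_reports(folder):
    """Open each HTML report in a new tab of the default browser."""
    for uri in report_uris(folder):
        webbrowser.open_new_tab(uri)
```

Usage: `open_reports('path_to_html_results')`.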

Combine with Python to get only the audit you want:
lighthouse --quiet --chrome-flags="--headless" --perf --output=json http://tafecourses.uat.pgtest.co/courses/massage-therapy/rockhampton/ | python3 -c "import sys, json; print(json.load(sys.stdin)['audits']['uses-optimized-images'])"

This example shows the limit of what a one-line Python command should do. Beyond this, it is better to move to a complete Python script, or at least to call a separate .py file from Bash.
https://unix.stackexchange.com/questions/116228/parse-json-using-python
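Following that idea, the one-liner and the Bash loop can be replaced by a small standalone script. A minimal sketch; the file name `lh_audit.py` and its command-line interface are hypothetical. It takes an audit id and one or more saved report files, and always prints valid JSON via `json.dumps`, one object per line, so its output can be appended to `lh_opt_img.json`.

```python
#!/usr/bin/env python3
"""lh_audit.py (hypothetical name): print one audit from saved Lighthouse JSON reports.

Usage: python3 lh_audit.py AUDIT_ID REPORT.json [REPORT.json ...] >> out.json
"""
import json
import sys


def extract_audit(report_text, audit_id):
    """Return the audit as a single line of valid JSON, or None if it is missing."""
    audit = json.loads(report_text).get('audits', {}).get(audit_id)
    return None if audit is None else json.dumps(audit)


def main(audit_id, paths):
    for path in paths:
        with open(path, encoding='utf-8') as f:
            line = extract_audit(f.read(), audit_id)
        if line is None:
            print(path + ': audit not found: ' + audit_id, file=sys.stderr)
        else:
            print(line)  # one JSON object per line


if __name__ == '__main__':
    if len(sys.argv) < 3:
        print(__doc__, file=sys.stderr)
    else:
        main(sys.argv[1], sys.argv[2:])
```

From Bash: `python3 lh_audit.py uses-optimized-images ./LH_json/* >> lh_opt_img.json`.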

Python script to read the JSON audits collected in lh_opt_img.json and generate the expected CSV report:
import csv
import json
import sys
from collections import namedtuple

# lh_opt_img.json holds one uses-optimized-images audit (valid JSON) per line.
writer = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL)
with open('lh_opt_img.json', encoding='utf-8') as f:
    for line in f:
        # Parse JSON into an object with attributes corresponding to dict keys.
        rp = json.loads(line, object_hook=lambda d: namedtuple('X', d.keys(), rename=True)(*d.values()))
        value = rp.extendedInfo.value
        first = value.results[0]  # only the first image of each page
        writer.writerow([str(rp.rawValue) + '(ms)', value.wastedKb,
                         first.totalKb, first.potentialSavings, first.url])
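The script above reports only the first image (`results[0]`) of each page. A sketch of a variant that writes one CSV row per image instead, using plain dicts rather than namedtuples. It assumes the same `extendedInfo.value.results` layout as the sample output in this post; the column names and the `convert_file` helper are hypothetical.

```python
import csv
import json

# Hypothetical column names for the CSV header.
COLUMNS = ['wasted_ms', 'page_wasted_kb', 'image_total_kb', 'potential_savings', 'image_url']


def rows_from_audit(audit):
    """Yield one CSV row per image listed in a uses-optimized-images audit."""
    value = audit.get('extendedInfo', {}).get('value', {})
    for item in value.get('results', []):
        yield [audit.get('rawValue'), value.get('wastedKb'),
               item.get('totalKb'), item.get('potentialSavings'), item.get('url')]


def jsonl_to_csv(lines, out):
    """Convert JSON lines (one audit per line) into CSV rows written to out."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(COLUMNS)
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerows(rows_from_audit(json.loads(line)))


def convert_file(src='lh_opt_img.json', dst='lh_opt_img.csv'):
    with open(src, encoding='utf-8') as f, \
            open(dst, 'w', newline='', encoding='utf-8') as out:
        jsonl_to_csv(f, out)
```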

...
Optimize images.
Write a Node module instead of Bash.
Other reports besides performance.
Using the Screaming Frog tool.
WordPress case study.
What can we learn from a complex stylesheet like this? There are many CSS styles worth learning from here.
Unused images (cropped sizes).
How to speed up Lighthouse scans.
Cover all images from TF => a list of URLs covering every image to scan.
...
