
Automatically running Lighthouse audits and reporting the results for a website

Lighthouse is an SEO and performance audit tool from Google Chrome. It produces reports on performance and SEO, with suggestions for improvement.
In this guide I will try to write some bash scripts and/or a NodeJS module to automate the process.
This guide uses several tools:
Screaming Frog (SEO tool; the full version requires a paid licence)
Lighthouse: https://developers.google.com/web/tools/lighthouse/
Lighthouse viewer: https://googlechrome.github.io/lighthouse/viewer2x/
Git Bash (some Unix tools on Windows)
Python3 (if you use Windows, you will need to install it yourself).
NodeJS: Lighthouse and many other tools are built on it (or require it to be installed)

Script to pull only the "Optimize images" performance audit out of the saved Lighthouse JSON reports:

#!/bin/bash
# Print the 'uses-optimized-images' audit of each saved report that has image suggestions.
for f in ./LH_json/*
do
  # Extract the audit once and reuse it for both the length check and the output.
  report="$(python3 -c "import sys, json; print(json.load(sys.stdin)['audits']['uses-optimized-images'])" < "$f")"
  report_length=$(printf '%s\n' "$report" | wc -m)

  # A normal page with no images to optimize has a report length of 720 characters,
  # so we only care about pages whose report is longer than that. Those reports contain image optimization suggestions.

  if [ "$report_length" -gt 720 ]; then
    echo "$f"
    printf '%s\n' "$report"
  else
    printf "\n"
  fi

  # TODO: exclude external resources (e.g. farm4.) with grep.
  # TODO: loop through the reported images to get the percentage (if needed), or just get the image URL and page URL.
done
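
The 720-character threshold above is a heuristic. As a rough alternative, here is a minimal Python sketch of the same filter. It assumes the report layout shown in the sample output in this post, where extendedInfo.value.results holds one entry per image. It also assumes the reports are saved with a .json extension. The function name reports_with_image_suggestions is made up for illustration, and other Lighthouse versions may lay the JSON out differently.

```python
import json
from pathlib import Path


def reports_with_image_suggestions(report_dir):
    """Yield (path, results) for reports whose audit lists at least one image.

    Assumption: each file is a Lighthouse JSON report where
    audits['uses-optimized-images']['extendedInfo']['value']['results']
    is a list with one entry per image that could be optimized.
    """
    for path in sorted(Path(report_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            report = json.load(f)
        audit = report.get("audits", {}).get("uses-optimized-images", {})
        value = (audit.get("extendedInfo") or {}).get("value") or {}
        results = value.get("results") or []
        if results:  # an empty list means nothing to optimize on this page
            yield path, results
```

Looping over reports_with_image_suggestions("./LH_json") and printing each path together with the url field of every result would give the page and image pairs mentioned in the TODO, without depending on the length of the printed report.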

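Using a plain print(json.load(...)) produces output that is not valid JSON.
print() writes the Python representation of the dict, with single quotes and True/False, while JSON requires double-quoted strings and true/false. Output like this cannot be parsed as JSON: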


{'rawValue': 90, 'description': 'Optimize images', 'extendedInfo': {'value': {'wastedKb': 12, 'results': [{'fromProtocol': True, 'preview': {'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'mimeType': 'image/jpeg', 'type': 'thumbnail'}, 'totalMs': '380\xa0ms', 'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'totalBytes': 49508, 'wastedBytes': 12223, 'totalKb': '48\xa0KB', 'wastedKb': '12\xa0KB', 'isCrossOrigin': False, 'wastedMs': '90\xa0ms', 'potentialSavings': '12\xa0KB (25%)'}], 'wastedMs': 90}}, 'informative': True, 'score': 90, 'helpText': 'Optimized images load faster and consume less cellular data. [Learn more](https://developers.google.com/web/tools/lighthouse/audits/optimize-images).', 'details': {'header': 'View Details', 'type': 'table', 'itemHeaders': [{'text': '', 'type': 'text', 'itemType': 'thumbnail'}, {'text': 'URL', 'type': 'text', 'itemType': 'url'}, {'text': 'Original', 'type': 'text', 'itemType': 'text'}, {'text': 'Potential Savings', 'type': 'text', 'itemType': 'text'}], 'items': [[{'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'mimeType': 'image/jpeg', 'type': 'thumbnail'}, {'text': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'type': 'url'}, {'text': '48\xa0KB', 'type': 'text'}, {'text': '12\xa0KB (25%)', 'type': 'text'}]]}, 'displayValue': 'Potential savings of 12\xa0KB (~90\xa0ms)', 'scoringMode': 'binary', 'name': 'uses-optimized-images'}

So we have to wrap the value in json.dumps() before printing it:
cat "$line" | python3 -c "import sys, json; print(json.dumps(json.load(sys.stdin)['audits']['uses-optimized-images']))" >> lh_opt_img.json
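
To make the difference concrete, here is a small sketch using a cut-down, made-up audit dict (not a real Lighthouse report). It shows that the printed form of a dict does not parse as JSON, while the json.dumps() form parses back to the same data.

```python
import json

# A tiny stand-in for one audit entry (illustrative values only).
audit = {"score": 90, "informative": True, "displayValue": "12\xa0KB"}

# print(audit) writes str(audit): single quotes and True, which is not JSON.
as_repr = str(audit)
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

# json.dumps() writes double quotes and true, which parses back to the same dict.
as_json = json.dumps(audit)
round_tripped = json.loads(as_json)

print(repr_is_json)
print(as_json)
print(round_tripped == audit)
```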

To get a Lighthouse (LH) report for a URL:
lighthouse --quiet --chrome-flags="--headless" --perf --output=json http://yoursite.test.co/courses/test-coures-1 --output-path course1.json
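
To run the same command over many pages, one option is a small Python wrapper. This is only a sketch: it reuses the flags from the command above unchanged and assumes the lighthouse executable is on PATH. The helper names (output_name, lighthouse_command, audit_all) and the file naming scheme are made up for illustration. On Windows the npm-installed command may need to be called as lighthouse.cmd or through a shell.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def output_name(url):
    """Derive a report file name from the URL path (hypothetical naming scheme)."""
    path = urlparse(url).path.strip("/")
    return (path.replace("/", "_") or "index") + ".json"


def lighthouse_command(url, output_path):
    """Build the command line shown above for one URL."""
    return [
        "lighthouse", "--quiet", "--chrome-flags=--headless",
        "--perf", "--output=json", url, "--output-path", output_path,
    ]


def audit_all(urls, out_dir="LH_json"):
    """Run Lighthouse once per URL and save each report under out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = str(Path(out_dir) / output_name(url))
        # check=True stops the batch if Lighthouse exits with an error.
        subprocess.run(lighthouse_command(url, target), check=True)
```

Calling audit_all with a list of page URLs would fill the LH_json folder that the bash script reads from.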

To get HTML instead of JSON output, simply change output=json to output=html.
To view a JSON audit result, go to the Lighthouse viewer and browse to the JSON file.
If you have HTML output, simply open it with the Chrome browser.
You can use a trick to open many audit results at once.
In bash or Git Bash, type something like this:
open or chrome-browser path_to_html_result
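
The same trick can be done from Python with the standard webbrowser module. This is a sketch under two assumptions: the HTML reports all sit in one folder, and the system default browser is the one you want. The function names report_uris and open_reports are made up for illustration.

```python
import webbrowser
from pathlib import Path


def report_uris(report_dir):
    """Return file:// URIs for every .html report in report_dir, sorted by name."""
    return [p.resolve().as_uri() for p in sorted(Path(report_dir).glob("*.html"))]


def open_reports(report_dir):
    """Open each report in a new tab of the system default browser."""
    for uri in report_uris(report_dir):
        webbrowser.open_new_tab(uri)
```

Calling open_reports on the folder that holds the HTML results opens one tab per report.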

Combine with Python to get only the wanted report:
lighthouse --quiet --chrome-flags="--headless" --perf --output=json http://tafecourses.uat.pgtest.co/courses/massage-therapy/rockhampton/ | python3 -c "import sys, json; print(json.load(sys.stdin)['audits']['uses-optimized-images'])"

This example is about as far as a Python one-liner should be pushed. From here we should move to a complete Python script, or at least call a separate .py file from bash.
https://unix.stackexchange.com/questions/116228/parse-json-using-python
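
As a sketch of what that separate .py file could look like, the script below reads a Lighthouse JSON report from stdin and prints one audit as valid JSON. The --audit option and the function names are made up for illustration, and the report is assumed to have a top-level "audits" dict as in the examples above.

```python
import argparse
import json
import sys


def extract_audit(report, audit_name):
    """Return one audit from a parsed Lighthouse report as a JSON string."""
    audits = report.get("audits", {})
    if audit_name not in audits:
        raise KeyError(f"audit {audit_name!r} not found in report")
    return json.dumps(audits[audit_name])


def main(argv=None):
    """Read a report from stdin and print the requested audit."""
    parser = argparse.ArgumentParser(
        description="Print one audit from a Lighthouse JSON report read on stdin."
    )
    parser.add_argument("--audit", default="uses-optimized-images")
    args = parser.parse_args(argv)
    print(extract_audit(json.load(sys.stdin), args.audit))
```

Saved under a name such as extract_audit.py (hypothetical), with the usual `if __name__ == "__main__": main()` line at the bottom, it could replace the one-liner at the end of the lighthouse pipe.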

Python script to read the JSON reports and generate the expected CSV report:
import csv
import json
import sys
from collections import namedtuple

writer = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL)  # quote every field

with open('lh_opt_img.json', encoding='utf-8') as f:
    for line in f:
        # Parse JSON into an object with attributes corresponding to dict keys.
        rp = json.loads(line, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
        first = rp.extendedInfo.value.results[0]  # first reported image only
        # One CSV row per report.
        writer.writerow([str(rp.rawValue) + '(ms)', rp.extendedInfo.value.wastedKb,
                         first.totalKb, first.potentialSavings, first.url])

...
Optimize images
Write a Node module instead of bash.
Other reports besides performance.
Using the Screaming Frog tool
WordPress case study.
What can we learn from a complex stylesheet like this? There are many CSS styles that can be learned from here.
Unused images (size, crop)
How to speed up the LH scan
Cover all images from TF => list the URLs that cover all images to scan
...
