Yesterday morning I took a few minutes to work on some automation with getID3 and my mp3 upload site. I have been using getID3 to validate mp3 uploads for a while. See my post:
For the validation that is live now, I use getID3 to check that the upload is the correct MIME type and also has a playtime_seconds value. Any file that is not playable will not have a playtime, so it is a good method to ensure that what the user uploaded is an actual mp3. While processing each upload does load the tag data, I had not been storing the additional ID3 mp3 tags. It would be good to improve the upload to insert the tags; however, today my task was to do that for over 5,000 files already in the system without tag data.
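The playability check above can be sketched as a small helper. This is a minimal sketch, not the site's actual code: the function name `is_playable_mp3` is mine, and it simply operates on the array that getID3's `analyze()` returns.

```php
<?php
// Hypothetical helper: decide whether a getID3 analysis array
// describes a playable mp3. Checks the MIME type and requires a
// positive playtime_seconds, since unplayable files have none.
function is_playable_mp3(array $info): bool
{
    $mime = $info['mime_type'] ?? '';
    if (!in_array($mime, ['audio/mpeg', 'audio/mp3'], true)) {
        return false;
    }
    return ($info['playtime_seconds'] ?? 0) > 0;
}

// Usage with the real getID3 library (requires getid3/getid3.php):
// $getID3 = new getID3;
// $info   = $getID3->analyze($uploadedPath);
// if (!is_playable_mp3($info)) { /* reject the upload */ }
```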
First I created a getTags method and a database table, then did a few test runs with the ID3v2 tag data. Once I was able to process all 5,000 mp3s, I started improving the method to recognize the image data tag and save that as well. The most important tags (title, artist, band, album, year, genre, bpm, bitrate) are then inserted into the database. Once all the tags were processed, I wrote another method, makeMp3Html, which saves HTML for the last 99 mp3s that have a valid image. During automation the HTML markup is saved to the public site folder as mp3s.html.
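The tag-extraction step might look something like the sketch below. The array paths follow getID3's `tags['id3v2']` layout as I understand it; the function name and the flat column names are my own assumptions, and the row would then feed a parameterized INSERT.

```php
<?php
// Sketch: pull the tags described above out of a getID3 analysis
// array into one flat row ready for a parameterized INSERT.
// Array paths follow getID3's id3v2 layout; column names are mine.
function extract_tag_row(array $info): array
{
    $tags  = $info['tags']['id3v2'] ?? [];
    $first = fn(string $key) => $tags[$key][0] ?? null; // id3v2 values are arrays

    return [
        'title'   => $first('title'),
        'artist'  => $first('artist'),
        'band'    => $first('band'),
        'album'   => $first('album'),
        'year'    => $first('year'),
        'genre'   => $first('genre'),
        'bpm'     => $first('bpm'),
        'bitrate' => $info['audio']['bitrate'] ?? null,
    ];
}
```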
For automation timing I set crontab to run getTags every 5 minutes and makeMp3Html every 15 minutes. My timing is overkill, to make sure it works even when nothing is changing. After running all night, it only processed 3 new images into the web site. Most user uploads do NOT have a valid mp3 image.
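The schedule above would look something like this in crontab; the PHP binary path and script locations are placeholders, not the site's real paths.

```
# Every 5 minutes: pull tags for new uploads
*/5  * * * * /usr/bin/php /var/www/automation/getTags.php
# Every 15 minutes: rebuild the cached mp3s.html
*/15 * * * * /usr/bin/php /var/www/automation/makeMp3Html.php
```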
It is important to know that for this project I chose to cache the HTML markup on purpose. I could have executed this data-to-HTML process in the page load, but where is the fun in that? For this purpose I really needed the website to not have a live database connection, or even a session for that matter. The front page (landing page, entry page) is accessed via 404s and redirecting domains by tons of IPs/devices doing multiple hits per second. This creates a lot of requests for the site that will never be actual page loads. A session and/or MySQL connection for each hit is not necessary and is poor for performance.
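The cached-markup idea reduces the page to streaming a pre-built fragment. A minimal sketch, assuming the fragment file name from above; note there is no `session_start()` and no database connect anywhere in the request path.

```php
<?php
// Minimal sketch of serving cached markup: the page streams a
// pre-built HTML fragment. No session, no MySQL connection.
// The fallback comment text is an assumption.
function render_cached(string $path): string
{
    return is_readable($path) ? file_get_contents($path) : '<!-- no mp3s yet -->';
}

// echo render_cached(__DIR__ . '/mp3s.html');
```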
One of my next projects for this is to implement a method to deliver the new top mp3s in a dispersed manner, to spread out the requests for non-existing mp3s. In the past I have used .htaccess to deliver these requests for .mp3s to a promo.mp3 file. This is usually a track that I am promoting. The system is capable of doing 1 million hits in 30 days on the promo.mp3, so it is possible to really push new mp3s hard.
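The .htaccess trick described above can be sketched with mod_rewrite: any request for an .mp3 that does not exist on disk gets the promo track instead. The promo.mp3 location is a placeholder.

```
# Requests for .mp3 files that don't exist on disk get the promo track
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.mp3$ /promo.mp3 [L]
```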
I found a neat mp3 player that dropped in pretty easily. All I had to do out of the box was hide the play/pause button in front of each mp3 block. You can find the player here. I then created URLs in the format /mp3s/ID/ that display a modal with the requested upload. In order to load the modals with content and no database connection, I made another automation method, makeMp3Modals, that saves modal HTML code for every upload. When a request comes in to the application with an upload ID, the page knows to load that upload's corresponding Upload Modal. I have also integrated this into the Live and Deleted-from-live uploads. Now that this works, I need to get into improving the modal markup to have players, downloads, share, embed, etc., as well as functionality to undelete a file.
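The /mp3s/ID/ routing plus pre-built modals could be sketched as below. The helper name, regex, and modals directory are my assumptions, but the shape matches the idea: parse the ID from the path, then include a pre-generated file instead of hitting the database.

```php
<?php
// Hypothetical routing helper: pull the upload id out of a
// /mp3s/ID/ request path so the page can include the matching
// pre-built modal file with no database hit.
function parse_upload_id(string $uri): ?int
{
    if (preg_match('#^/mp3s/(\d+)/?$#', $uri, $m)) {
        return (int) $m[1];
    }
    return null;
}

// $id = parse_upload_id($_SERVER['REQUEST_URI']);
// if ($id !== null) { readfile(__DIR__ . "/modals/{$id}.html"); }
```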
By the end of day two I had also created hourly /stats/ using CanvasJS and a custom Apache log file scripted into MySQL. It took quite a while to get this working, as I spent a great deal of time trying to do direct integration from Apache into SQL. I have scripted logs in the past, and the current implementation works but is not ideal. For direct inserts into MySQL, it seems the very outdated mod_log_sql will not work with the current server's MariaDB. From my research I found that the same methods are possible on Debian/Ubuntu flavors, which is an interesting project to attempt later.
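One alternative to mod_log_sql worth noting: Apache can pipe a log directly to a program via `CustomLog`, so a small script can batch the INSERTs into MariaDB itself. The script path below is a placeholder, and the format string is the standard combined format.

```
# Pipe access log lines to a script that inserts them into MariaDB
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog "|/usr/bin/php /var/www/automation/logToMysql.php" combined
```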
Today I turned on the automation that runs the Apache logs into MySQL, and then created another automation method, makeStats, that saves a block of HTML for the front page. I wanted to see the number of requests per second and total uploads for today, and this achieves that purpose. None of the uploads were timestamped, so I had to handle that as well. On the /stats/ side I have created metric views of IPs per hour, mp3 requests per hour, 404 requests per hour, uploads per hour, and upload pages, as well as the many referrer pages already set up in /stats/ from before. It was very kewl to have that platform already ready to use once I had the same type of data rolling in, and some good visual metrics:
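The logs-into-MySQL step boils down to parsing each combined-format line into the fields the hourly views need. A sketch of that parse, with the field names being my assumptions for the table columns:

```php
<?php
// Sketch: parse one combined-format access log line into the
// fields the stats views need (IP, timestamp, request, status).
function parse_log_line(string $line): ?array
{
    $re = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)/';
    if (!preg_match($re, $line, $m)) {
        return null; // not a combined-format line
    }
    return [
        'ip'     => $m[1],
        'time'   => $m[2],
        'method' => $m[3],
        'path'   => $m[4],
        'status' => (int) $m[5],
    ];
}
```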
Today I spent most of my time working with Ruby on Rails. I did get some time into inspecting and creating more stat metrics. I was able to see some player.swf 404s that I could link back up to players that were not working. These in turn had created 404s for the mp3 files requested within the swf, which did not work previously.
While further inspecting and improving the Apache/MySQL log automation, I added the accessing host so I can measure which domains are directing requests. I also noticed this morning that some BIG HISTORICAL subdomains were not routed in DNS after moving the server. Oops. I quickly added all the known subdomains on all of the active URLs and saw traffic jump back up to pre-migration levels. Excellent! Here is the visual of that jump:
Next I spent some creative time and made the Mp3 Monster Modal for uploads. I then spent some time making the total upload size work for up to 250MB and 25 total files, and made sure this worked by uploading a 60MB and a 240MB mix. Last but not least, I set the Mp3 Monster Modal to open on all pages not requesting an Upload Modal. I want to measure whether the number of uploads will increase, i.e. how effective the Mp3 Monster is. Historically the number of landing-page-to-upload conversions is always low. Tomorrow will be a new benchmark. Here is yesterday:
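For reference, raising the upload cap to 250MB and 25 files involves a few php.ini directives; the exact values below are my assumptions (post_max_size padded slightly above the file limit so the request body fits).

```
; Upload limits consistent with the 250MB / 25-file cap
upload_max_filesize = 250M
post_max_size = 260M
max_file_uploads = 25
```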
Today I also did some pre-automation testing of routing 404 mp3 requests to a different existing asset versus delivering HTML source. Just 404-ing everything to the front page of the site creates responses that contain HTML for mp3 requests expecting mp3 encoding. This is a useless hit on the front page and all of its linked assets (images, css, js). In the past I have used .htaccess to direct these 404 .mp3 requests to a specific file. For the current need I want to measure when requests need to be routed to mp3 encoding and when they need to be routed to an HTML page. The historical questions of “Is this a player or a browser?” and “Is this a stream or a download?”… I am going to try to solve some of these issues with a fresh outlook on automation and server technology that is not LAMP.
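One rough way to answer "player or browser?" at the 404 is to branch on the request's Accept header: browsers ask for text/html, while players and download clients generally do not. This is a sketch to test, not a settled rule; the redirect targets are placeholders.

```
# Missing .mp3 + client prefers HTML => probably a browser: send a page
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{HTTP_ACCEPT} text/html
RewriteRule \.mp3$ /index.html [R=302,L]
# Otherwise: probably a player/downloader: send real mp3 encoding
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.mp3$ /promo.mp3 [L]
```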
And after updating the blog here, we can see that the automation has operated… my new 240MB upload mix mp3 is fully processed (ID3 tags and HTML) and operational at the URL /mp3s/[uploadid]/ (link):
Today I spent some time looking at the # of uploads, # of actual files in the database, etc.
I then adjusted the uploader class to log an error to a file when an upload fails. I want to watch what types of files are not uploading. I adjusted the stats just a bit, and then spent some time working front-page paths into the mp3 pages. When you hover over an artwork cover now, a link appears which loads that song's individual /mp3s/ID/ URL. I also added the link to the download button. Next I spent some more time trying to find a good player. I want something like SoundCloud's, so that I can use it to share my songs without using SoundCloud.
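The failure logging could be as simple as the sketch below; the function name, log format, and path handling are hypothetical, but appending one line per failure is enough to review which file types keep failing.

```php
<?php
// Hypothetical failure logger for the uploader class: append one
// line per failed upload so failing file types can be reviewed.
function log_upload_failure(string $logPath, string $filename, string $reason): void
{
    $line = sprintf("[%s] %s :: %s\n", date('c'), $filename, $reason);
    file_put_contents($logPath, $line, FILE_APPEND | LOCK_EX);
}
```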