- UCLA M.S. Computer Science
- Android Developer
- Freelance IT
or, thoughts from a random graduate student
I've been working at Google this summer as an intern, which is part of the reason there haven't been any updates to any of my things (this blog, MicDroid, etc.). They say Google is a place that engineers disappear into and are not heard from again, and that seems somewhat true for me this summer. The other part of it is simply that my life has become busy, and I've had to make some sacrifices in what I spend my time on.
One thing that's come up fairly frequently is that people are interested in what goes on at Google. Today I'll attempt to talk about some of the things I found interesting.
Please note: these are my personal views. They are not meant to be official announcements of any sort, nor are they in any way endorsed by Google.
Let's begin with what I've learned this summer at Google.
Next let's talk about how things are done at Google.
As expected of the engineering-driven culture at Google, the toolchain for working in the main Google source tree is pretty heavily developed.
Infrastructure at Google is actually quite interesting too. Having to deal with a lot of machines also means having a good number of tools to deal with them.
In addition to being known for engineering prowess, Google is also known for being an interesting place to work.
I'm definitely going to miss working at Google (and not just for the free food!).
There are a few things that I did find irritating while working at Google though. Here are some.
Overall, Google has been great to me this summer, and I really think I would enjoy working there in the future (provided this post doesn't disqualify me, oops).
I've heard that T-Pain himself mentioned that the famous I Am T-Pain app will be coming to Android, and that there will be much rejoicing. What does this mean for MicDroid? Well, even though I definitely don't have the backing of a company to match it on features, I don't plan on closing up shop. It will, however, let me work at a more leisurely pace (as if the pace I work at isn't leisurely enough already), because there are a few things which are interesting enough to me to keep me working on this project.
Those of you out there who actually keep up to date with my commits on GitHub will probably have noticed that MicDroid is in fact capable of background music now, albeit with some pretty serious caveats. First and foremost, in order to set a track as background music, the track must be in WAVE format, and not MP3, AAC, or any of the other less popular audio formats. This is primarily because Android does not expose any sort of audio decoder (or encoder) API to developers, despite the fact that most phones have the hardware capability to do so. The current workaround is to use Lame4Android to decode MP3 to WAVE, then set that WAVE as background music. Ideally I'd like to add liblame to MicDroid and just decode MP3s on the fly, but the MP3 codec does in fact have licensing issues, and I'd prefer not to deal with that for now, at least not until other more interesting problems have been solved.
The other two remaining issues that I'd like to look at are audio resampling and echo cancellation. Resampling is necessary since phones may be recording at an arbitrary sample rate, while background music may be playing at a different sample rate. Since MicDroid uses a simplistic Output = Mic / 2 + Music / 2 sort of formula to mix, the PCM data fed to the mixer needs to be sampled at the same rate on both inputs. Currently I'm looking to tap CCRMA's resample library to do this, which means reading and understanding the code. Sadly my background is in software, not DSP, so this is not a trivial task for me and may take longer than I hope.
Secondly, echo cancellation, which will (hopefully) make it far easier to deal with live correction, is something else I'd love to add. Personally I'm not too sure there's a lot that can be done, due to the lack of any low-latency audio API in Android, but I think it's a direction worth looking into. The best sources for this seem to lie in VoIP software and solutions, and I'm looking into sourcing some code from Oslec and csipsimple. Again, DSP is not my strong suit, so it remains to be seen how much I can do with the code.
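To make the sample-rate requirement concrete, here is a minimal sketch of the Output = Mic / 2 + Music / 2 mix on 16-bit PCM, preceded by a resampling step. The class and method names (`PcmMixer`, `resampleLinear`, `mix`) are made up for illustration and are not MicDroid's code. The resampler uses plain linear interpolation as a stand-in; it is not the bandlimited interpolation that CCRMA's resample library implements, and it does no anti-alias filtering, so treat it as a placeholder for the real thing.

```java
public final class PcmMixer {
    private PcmMixer() {}

    /**
     * Hypothetical helper: converts mono 16-bit PCM from inRate to outRate
     * using linear interpolation. Illustrative only; a real resampler would
     * low-pass filter to avoid aliasing.
     */
    public static short[] resampleLinear(short[] in, int inRate, int outRate) {
        if (in.length == 0 || inRate == outRate) {
            return in.clone();
        }
        int outLen = (int) ((long) in.length * outRate / inRate);
        short[] out = new short[outLen];
        double step = (double) inRate / outRate; // input samples per output sample
        for (int i = 0; i < outLen; i++) {
            double pos = i * step;
            int i0 = (int) pos;                        // sample at or before pos
            int i1 = Math.min(i0 + 1, in.length - 1);  // next sample, clamped at the end
            double frac = pos - i0;
            out[i] = (short) Math.round(in[i0] * (1.0 - frac) + in[i1] * frac);
        }
        return out;
    }

    /**
     * Output = Mic / 2 + Music / 2, sample by sample. Both inputs must already
     * be at the same sample rate. Halving each input first keeps the sum
     * inside the 16-bit range, so no clipping logic is needed.
     */
    public static short[] mix(short[] mic, short[] music) {
        int n = Math.min(mic.length, music.length);
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            out[i] = (short) (mic[i] / 2 + music[i] / 2);
        }
        return out;
    }
}
```

With this shape, music at one rate would be passed through `resampleLinear(music, musicRate, micRate)` before being handed to `mix`. Mixing without that step would pair up samples that represent different points in time, which is why the two inputs have to agree on a rate.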
So in summary, there are still plenty of things left to build, and plenty of directions to expand. I'm thinking that once I get basic audio mixing working with resampling, I will release an update with instrumental support as an experimental feature. Additionally, I plan to do a write-up about the various audio processing that MicDroid does.
Until then though, I have plenty of coding to do.
I've put an updated version of Lame4Android out on the Android Market today, and while I'm glad to say it now supports encoding WAV->MP3, it does still have one small glitch: feeding in files created using Lame4Android will not encode properly. I have a feeling that this is due to the first few bytes of the file not being skipped correctly, which causes the encoder to crap out immediately. I will have a fix for this soon.
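If the bytes in question are the RIFF/WAVE header (an assumption; the post does not say), one way to skip them robustly is to walk the chunk list until the `data` chunk shows up rather than hard-coding a 44-byte offset. The sketch below is only an illustration of that idea and is not Lame4Android's code; `WavHeader` and `findDataOffset` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public final class WavHeader {
    private WavHeader() {}

    /**
     * Returns the byte offset of the first PCM sample (the payload of the
     * "data" chunk), or -1 if the bytes are not a RIFF/WAVE file with a
     * data chunk.
     */
    public static int findDataOffset(byte[] wav) {
        if (wav.length < 12 || !tagAt(wav, 0, "RIFF") || !tagAt(wav, 8, "WAVE")) {
            return -1;
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts right after "RIFF", size, "WAVE"
        while (pos + 8 <= wav.length) {
            if (tagAt(wav, pos, "data")) {
                return pos + 8; // skip the 4-byte tag and 4-byte size
            }
            long size = buf.getInt(pos + 4) & 0xFFFFFFFFL; // unsigned chunk size
            long next = pos + 8 + size + (size & 1);       // chunks are padded to even length
            if (next > wav.length) {
                break; // truncated or corrupt chunk
            }
            pos = (int) next;
        }
        return -1;
    }

    /** True if the four ASCII bytes at off match tag. */
    private static boolean tagAt(byte[] b, int off, String tag) {
        byte[] t = tag.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < 4; i++) {
            if (b[off + i] != t[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For a canonical file with a 16-byte `fmt ` chunk and nothing else before `data`, this returns 44. Files with extra chunks before `data` give a different offset, which is one way a fixed skip could feed header bytes to the encoder.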
Now, on to what I wanted to talk about. Any programmer over the span of their career will have written an API of some shape or form. It may not be good, or widely used, but they'll have done it. This is one of those parts of programming you wish they taught everyone else in school, along with variable naming and spacing, in that you'll wish everyone wrote their APIs like you would. It's also one of those things where everyone can tell when there is a good API, but cannot necessarily write one themselves.
I've had plenty of experience (attempting) to write APIs, but the first time I truly had to worry about what I was writing was when I worked at Deluxe. At the time I was given the task of rewriting the framework on which all of the Universal Blu-ray releases were built, since the previous version was getting a bit long in the tooth and did not support some of the newer base framework functionality. Additionally, it was full of strange quirks and unnecessary wrapper functions.
Now here, I think, is the crux of writing a good API: exposing the right amount of magic to the client program. What I mean by this is that the amount of magic the API does behind the scenes to provide the functionality underneath must be neither too little nor too much.
Too little magic in the API and you're left with nothing more than a thin wrapper around the underlying functionality. An example of this is the Universal Blu-ray framework I mentioned earlier. In quite a few places the old framework provided functions to switch audio tracks (think switching from French audio to German) that were nothing more than wrappers around the underlying base framework's switch audio track functions.
On the other hand, too much magic in the API leaves you with a very inflexible set of exposed functionality. An example I'm thinking of is the Android MediaRecorder API. While the designers expose pretty much all the functionality a developer would need to record audio, it just isn't very configurable. You can only record in a certain format, and then only directly to a file, not to a buffer, or to the speaker, or anything else. Nor can you (at least not until Gingerbread) apply effects to the recorded stream. In order to do any of the above, you as a developer must use AudioRecord, which only outputs to a buffer that the developer must manipulate. Too much work is done by the underlying layers, all of it magical to the client programs.
Now, to get back to the point of everything. Today I'd like to say the liblame API is finished (or as finished as a hobby project ever gets), and I'd like to say I did a good job of exposing enough magic to anyone who would like to use it. Please let me know if you do; I'd love to hear from you.
I've recently been bitten (again) by Samsung Galaxy S AudioRecord bugs, and after having to ignore them to focus on classwork for the past few weeks, it's time to get back into it. The last MicDroid update featured improved error handling, thanks to proper (sort of) use of exceptions instead of an Android Handler that routed all errors to the error handling code. Unfortunately this broke Galaxy S support due to what I believe is described in these posts. It appears the Galaxy S phone just locks up the AudioRecord if you call AudioRecord.startRecording() while it is initialized improperly. This behavior certainly seems consistent with the (mass of) bug reports I've been getting from users. I think I'm going to have to try some of the dirty hacks mentioned above to fix it.
Another project which appears not to have had too many issues with the Galaxy S series is the rather famous Sipdroid. They also have AudioRecord code which supposedly works on the Galaxy S series. They use an interesting method of delaying until the next frame is ready before reading from AudioRecord again. I fully intend to try this method out as well, since if it works, it means I can potentially simplify a lot of the complex recording code currently around. It could also mean fixing the horrible buffer size hacks I'm using right now and finally allowing the Galaxy S series to take advantage of live recording.
Should I get it to actually work, I definitely plan on releasing details.
Finally, the last, best hope for the Galaxy S series is the upcoming release of an official 2.2 ROM. There have been rumors that the 2.2 ROM fixes a lot of the audio bugs that plague the 2.1 Galaxy S ROM, and should an update that fixes these fully roll out, all of these horrible Galaxy S problems should be solved. Obviously this is the best solution for everyone :)
Something else I've been throwing around in my head has been the fact that the Galaxy S has a relatively slow internal SD card. I'm wondering if this has any effect on recording, especially if AudioRecord.read() reads audio at a rate faster than the phone can write.
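For a rough sense of scale, the write rate that uncompressed PCM demands is easy to compute. The sketch below is only arithmetic; the sample rates and formats in the usage note are assumed values chosen for illustration, not figures measured on a Galaxy S, and `PcmDataRate` is a hypothetical name.

```java
public final class PcmDataRate {
    private PcmDataRate() {}

    /**
     * Bytes per second produced by uncompressed PCM:
     * sample rate x channel count x bytes per sample.
     */
    public static long bytesPerSecond(int sampleRateHz, int channels, int bitsPerSample) {
        return (long) sampleRateHz * channels * (bitsPerSample / 8);
    }
}
```

Under these assumptions, 44.1 kHz mono 16-bit audio comes to `bytesPerSecond(44100, 1, 16)` = 88,200 bytes per second, and stereo doubles that to 176,400. The sketch says nothing about whether the card can sustain that or how long individual writes stall; it only gives the number to compare against when measuring.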
It's problems like these that really have me looking forward to Gingerbread, as its OpenSL ES and effects pipeline support can potentially go a LONG way towards resolving basic audio functionality issues like these. Now if only we could get Google to release the source code and Samsung to follow standards and release fully baked hardware...
MicDroid 0.40 has been released!
Release highlights include fixes for many force close errors, including some more weird AsyncTask-related rotation issues. Ads have also been added to the recording library. The AdMob ads can be removed via a setting in the options, however, so they're purely optional. Code for instrumental support has been added but is not enabled, since it currently only supports WAVE files, and that isn't much use for most users. Future work primarily involves getting LAME built as a library for proper MP3 support. Hopefully this release will be relatively trouble-free and will have fixed most of the outstanding issues from 0.39. Thanks for your support!